
Rakshit Sawarn

Roll Number – 23B0426


PROJECT REPORT
Deep Learning-Based Approach to Detect Drowsiness of Human Beings

Abstract:
This report presents a deep learning method for detecting and classifying drowsiness from eye
images. I use a convolutional neural network (CNN) to detect driver drowsiness. The model
achieved an overall accuracy of 99.75% on the test set, which demonstrates its effectiveness.

1. Introduction:
The identification and categorization of images of human eyes are critical for diagnosing a
driver's level of drowsiness and preventing serious accidents. Deep learning methods offer a
promising solution to this challenge: they provide fast, accurate detection and can alert
drivers so that they stay attentive at all times or take a rest, whichever they feel is
appropriate.

2. Methodology
2.1 Dataset

The dataset I used for the demonstrations is the MRL Eye Dataset. It contains eye images of
men and women, with and without eyeglasses, in both open and closed states. The dataset was
first preprocessed so that all images have the same size and pixel intensity range.

The dataset is publicly available on websites such as Kaggle; a link is provided at the end of
this report.

Some sample images in the dataset.


2.2 Preprocessing

The preprocessing applies the following transforms to the training and test datasets to
achieve better model performance:

Training Augmentations

• Resize: Resizes the image to a uniform size of 224x224 pixels.

Such augmentations add diversity to the dataset and thereby increase the robustness of the
model.

Test Augmentations

• Resize: Resizes the image to a uniform size of 224x224 pixels.

These augmentations help the model generalize better by keeping the test inputs consistent
with the training inputs.
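
A minimal sketch of this transform pipeline, assuming torchvision (the report does not name
the library used for the transforms):

    from torchvision import transforms

    # Resize every eye image to 224x224; ToTensor is assumed here so the
    # images can be fed to the PyTorch model (only Resize is specified above).
    train_transforms = transforms.Compose([
        transforms.Resize((224, 224)),
        transforms.ToTensor(),
    ])

    # The test pipeline mirrors the training pipeline for consistency.
    test_transforms = transforms.Compose([
        transforms.Resize((224, 224)),
        transforms.ToTensor(),
    ])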

2.3 Model Architecture

Base Model Selection

I have used the Xception model, a deep convolutional neural network architecture that has
shown impressive performance on image classification tasks, as demonstrated in many papers.
The model was initialized with pretrained weights to take advantage of transfer learning,
which can improve performance and reduce training time.

I used the timm library to load the Xception model with pretrained weights. The
pretrained=True parameter ensures that the model is initialized with weights pre-trained on a
large dataset (typically ImageNet).

The Xception model is designed to accept RGB (3-channel) images, and the dataset also
contains RGB images.
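
A minimal sketch of this loading step, assuming the model is registered in timm under the
name "xception":

    import timm

    # Load the Xception backbone with weights pre-trained on ImageNet.
    model = timm.create_model("xception", pretrained=True)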
Modifying the Output Layer

The original Xception model is designed for classification tasks with a large number of
classes, so I changed the final fully connected layer to adapt it to our specific task of
classifying eye states into two classes.

• Checked the number of input features to the final fully connected layer (in_features =
model.fc.in_features).

• Replaced the final fully connected layer with a new one (nn.Linear(in_features, 2)) to
classify the input images into two classes: 'OPEN' and 'CLOSED'.
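
Putting these two steps together, a minimal sketch (continuing from the model loaded above):

    import torch.nn as nn

    # Inspect the input size of the original classification head, then replace
    # it with a two-way head for the 'OPEN' and 'CLOSED' classes.
    in_features = model.fc.in_features
    model.fc = nn.Linear(in_features, 2)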

Xception Model Architecture


Layer Name    Layer Type            Output Shape         Number of Parameters

conv1         Conv2d                [1, 32, 149, 149]    2880
bn1           BatchNorm2d           [1, 32, 149, 149]    64
act1          ReLU                  [1, 32, 149, 149]    0
conv2         Conv2d                [1, 64, 147, 147]    18432
bn2           BatchNorm2d           [1, 64, 147, 147]    128
act2          ReLU                  [1, 64, 147, 147]    0
block1        Block                 [1, 128, 73, 73]     194816
block2        Block                 [1, 256, 37, 37]     485376
block3        Block                 [1, 728, 19, 19]     2350184
block4        Block                 [1, 728, 19, 19]     1545664
block5        Block                 [1, 728, 19, 19]     1545664
block6        Block                 [1, 728, 19, 19]     1545664
block7        Block                 [1, 728, 19, 19]     1545664
block8        Block                 [1, 728, 19, 19]     1545664
block9        Block                 [1, 728, 19, 19]     1545664
block10       Block                 [1, 728, 19, 19]     1545664
block11       Block                 [1, 728, 19, 19]     1545664
block12       Block                 [1, 1024, 10, 10]    2280096
conv3         SeparableConv2d       [1, 1536, 10, 10]    2370048
bn3           BatchNorm2d           [1, 1536, 10, 10]    3072
act3          ReLU                  [1, 1536, 10, 10]    0
conv4         SeparableConv2d       [1, 2048, 10, 10]    3161600
bn4           BatchNorm2d           [1, 2048, 10, 10]    4096
act4          ReLU                  [1, 2048, 10, 10]    0
global_pool   SelectAdaptivePool2d  [1, 2048]            0
fc            Linear                [1, 2]               8196

Total Parameters                                         20,857,668

Model Training and Optimization

For training the model, the following training and optimization strategy was used for the
modified Xception model (a short sketch follows the list):

1. Loss Function: Cross-entropy loss, which is appropriate for multi-class classification
tasks.
2. Optimizer: The Adam optimizer, which works well for training neural networks because it
uses adaptive learning rates and momentum.
3. nn.CrossEntropyLoss() computes the error between the predicted class probabilities and
the true labels.
4. optim.Adam(model.parameters(), lr=0.0001) updates the model weights to reduce the loss.
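
A minimal sketch of this setup, using the calls named in the list above:

    import torch.nn as nn
    import torch.optim as optim

    # Cross-entropy loss over the two eye-state classes, and the Adam
    # optimizer with the learning rate used in this report.
    criterion = nn.CrossEntropyLoss()
    optimizer = optim.Adam(model.parameters(), lr=0.0001)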

2.4 Training

The model was trained on Google Colab with a GPU enabled, using the Adam optimizer with a
learning rate of 0.0001. Epochs: the model was trained for 5 epochs.

In just 5 epochs the training accuracy increased to 99.677% and the loss dropped to 0.0088.
The dataset was passed in batches of 32 with 2 worker processes. Here is a summary of the
training process:

Epoch    Loss       Training Accuracy (%)

1        0.03877    98.6384
2        0.02402    99.1359
3        0.01754    99.3611
4        0.01280    99.5451
5        0.00881    99.6776
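
A minimal sketch of the training loop behind these numbers, continuing from the sketches
above. The name train_dataset and the label encoding (0 = 'CLOSED', 1 = 'OPEN') are
assumptions for illustration, not taken from the report:

    import torch
    from torch.utils.data import DataLoader

    # Batch the preprocessed images as described above (batch size 32, 2 workers).
    train_loader = DataLoader(train_dataset, batch_size=32, shuffle=True, num_workers=2)

    device = "cuda" if torch.cuda.is_available() else "cpu"
    model.to(device)

    for epoch in range(5):  # 5 epochs, as reported
        model.train()
        running_loss, correct, total = 0.0, 0, 0
        for images, labels in train_loader:
            images, labels = images.to(device), labels.to(device)
            optimizer.zero_grad()
            outputs = model(images)              # logits of shape [batch, 2]
            loss = criterion(outputs, labels)
            loss.backward()
            optimizer.step()
            running_loss += loss.item() * images.size(0)
            correct += (outputs.argmax(dim=1) == labels).sum().item()
            total += labels.size(0)
        print(f"Epoch {epoch + 1}: loss = {running_loss / total:.5f}, "
              f"accuracy = {100 * correct / total:.4f}%")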

2.5 Evaluation Metrics

Below are the metrics I used to evaluate the performance of the model; a short sketch of how
they can be computed follows the list:

• Accuracy: the percentage of all instances that were classified correctly.
• Precision: the proportion of true positives among the instances predicted as positive.
• Recall: the proportion of true positives among all actual positive instances (true
positives plus false negatives).
• F1-Score: the harmonic mean of precision and recall.
• Confusion Matrix: a matrix presenting the true vs. predicted classifications.
• ROC Curve and AUC: the Receiver Operating Characteristic curve and the Area Under the
Curve.
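
A minimal sketch of computing these metrics, assuming scikit-learn (the report does not name
the library) and hypothetical arrays y_true, y_pred, and y_score collected over the test set:

    from sklearn.metrics import (accuracy_score, classification_report,
                                 confusion_matrix, roc_auc_score)

    # y_true: ground-truth labels, y_pred: predicted labels,
    # y_score: predicted probability of the 'Open' class for each test image.
    print("Accuracy:", accuracy_score(y_true, y_pred))
    print(classification_report(y_true, y_pred, target_names=["Closed", "Open"]))
    print("Confusion matrix:\n", confusion_matrix(y_true, y_pred))
    print("ROC AUC:", roc_auc_score(y_true, y_score))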

3. Results
3.1 Classification Performance

My proposed model achieves a testing accuracy of 99.75% on the test dataset. The model
appears well fitted: precision, recall, and F1-score are high for both classes in the
classification report, suggesting that the model distinguishes the two eye states (open vs.
closed) effectively.

Table 1: Classification Report


Class           Precision    Recall    F1-Score    Support

Closed          1.00         1.00      1.00        8346
Open            1.00         1.00      1.00        8634
accuracy                               1.00        16980
macro avg       1.00         1.00      1.00        16980
weighted avg    1.00         1.00      1.00        16980


3.2 Confusion Matrix

The confusion matrix provides a visual representation of the model's performance. Each cell
in the matrix shows the number of instances with a given predicted class versus their actual
class.

Figure 1: Confusion Matrix

3.3 ROC Curves

An ROC curve illustrates the trade-off between the true positive rate and the false positive
rate for each class. In addition, the Area Under the Curve (AUC) was used to evaluate the
model's ability to discriminate between the classes.
Figure 2: ROC Curves
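
A minimal sketch of producing such a curve, assuming scikit-learn and matplotlib and reusing
the hypothetical y_true and y_score arrays from above, with 'Open' treated as the positive
class:

    import matplotlib.pyplot as plt
    from sklearn.metrics import auc, roc_curve

    # Compute the ROC curve and its AUC, then plot them against the chance line.
    fpr, tpr, _ = roc_curve(y_true, y_score)
    plt.plot(fpr, tpr, label=f"Open vs. Closed (AUC = {auc(fpr, tpr):.3f})")
    plt.plot([0, 1], [0, 1], linestyle="--", label="Chance")
    plt.xlabel("False positive rate")
    plt.ylabel("True positive rate")
    plt.legend()
    plt.show()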

4. Discussion
The high accuracy and robust performance of the proposed model indicate that it is suitable
for practical application. Automated drowsiness detection can assist drivers and help prevent
many accidents.

5. Future Work
Future research could explore the following directions:

• Expanding the dataset to include more diverse images.


• Implementing more advanced data augmentation techniques.
• Integrating the model into a web application so that everyone can use it: the camera would
continuously capture images, and closed-eye detections beyond a set threshold would trigger
an alarm (a rough sketch of this idea follows the list).
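
As a rough sketch of that last idea only, assuming OpenCV for frame capture and reusing
model and device from the sketches above; CLOSED_IDX, THRESHOLD, the whole-frame
classification, and the printed alarm are placeholders, not part of the report:

    import cv2
    import torch

    CLOSED_IDX, THRESHOLD = 0, 15      # assumed label index and frame threshold
    closed_frames = 0
    cap = cv2.VideoCapture(0)          # default webcam

    model.eval()
    try:
        while True:
            ok, frame = cap.read()
            if not ok:
                break
            # Placeholder: a real system would detect and crop the eye region
            # first; here the whole frame is resized and classified directly.
            rgb = cv2.cvtColor(cv2.resize(frame, (224, 224)), cv2.COLOR_BGR2RGB)
            tensor = torch.from_numpy(rgb).permute(2, 0, 1).float().unsqueeze(0) / 255.0
            with torch.no_grad():
                pred = model(tensor.to(device)).argmax(dim=1).item()
            closed_frames = closed_frames + 1 if pred == CLOSED_IDX else 0
            if closed_frames >= THRESHOLD:
                print("Drowsiness detected - sound the alarm")  # alarm placeholder
    except KeyboardInterrupt:
        pass                           # stop with Ctrl+C
    finally:
        cap.release()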

6. References
• Coursera course: Neural Networks and Deep Learning by Andrew Ng:
  https://www.coursera.org/learn/neural-networks-deep-learning?specialization=deep-learning
• CS231n: Convolutional Neural Networks for Visual Recognition, Stanford:
  http://cs231n.stanford.edu/2017/index.html
• CDEEP course (IIT Bombay):
  https://www.cse.iitb.ac.in/~shivaram/teaching/old/cs337+335-s2019/index.html
• CS229: Machine Learning, Stanford: https://cs229.stanford.edu/
• CS131: Computer Vision course, Stanford:
  http://vision.stanford.edu/teaching/cs131_fall2021/
• OpenCV tutorial: https://docs.opencv.org/4.x/d6/d00/tutorial_py_root.html
• PyTorch tutorial:
  https://pytorch.org/tutorials/beginner/deep_learning_60min_blitz.html
• Michael Nielsen, Neural Networks and Deep Learning:
  http://neuralnetworksanddeeplearning.com/
• Aurélien Géron, Hands-On Machine Learning with Scikit-Learn, Keras, and TensorFlow.

7. Dataset and Source Code


Link to the dataset, available on Kaggle:
https://www.kaggle.com/datasets/imadeddinedjerarda/mrl-eye-dataset

Link to the GitHub repository with the code:

https://github.com/Rakshit-Sawarn-iitb/Drowsiness-Detection/tree/main

The repository contains two versions of the code: a notebook with detailed comments, named
drowsiness detection(1), and another with the testing metrics added.
