
Drowsy Driver Detection

This document proposes a method to detect drowsy drivers using convolutional neural networks (CNNs) and long short-term memory (LSTM) networks. It collects video data of subjects imitating alert and drowsy behaviors. It uses a CNN to extract features from each frame, and feeds sequences of these features into an LSTM to classify the full video sequence as alert or drowsy. The CNN features are extracted using a pre-trained Inception model, and the LSTM is used to predict drowsiness based on temporal patterns in the CNN features across video frames.

Drowsy Driver Detection in Video Sequences using LSTM with CNN Features

Nisha Gandhi, Tejas Naik, Aditya Yele

Abstract - Around 100,000 accidents per year are caused by driver drowsiness. To add to the seriousness of the matter, no test exists to determine sleepiness as there is for intoxication. Detection of driver drowsiness is gaining importance in the fields of Computer Vision and Machine Learning. Recurrent Neural Networks (RNNs) and Long Short-Term Memory (LSTM) units have been very successful in processing sequential multimedia data. In this project, we propose a novel driver drowsiness detection method that uses Convolutional Neural Networks (CNNs) to extract information from images and feeds a sequence of such information to an LSTM for prediction.

Index Terms - Computer Vision, Machine Learning, Deep Learning, Drowsy Driver Detection, Face Detection, Eye Tracking, Convolutional Neural Network, Recurrent Neural Networks, Long Short Term Memory.

I. INTRODUCTION

Accidents caused by drowsy driving are a major problem in the United States. The National Highway Traffic Safety Administration estimates that drowsy driving was responsible for 72,000 crashes, 44,000 injuries, and 800 deaths in 2013 [1]. Drowsiness detection technologies attempt to prevent such incidents by predicting, from various inputs, whether a driver is falling asleep. These technologies can be classified into three main categories [2]. The first category involves measuring cerebral and muscular signals and cardiovascular activity. These techniques are invasive and not commercially viable. The second category includes techniques that measure overall driver behavior from vehicle patterns. Examples include monitoring the vehicle's position in a lane and monitoring steering patterns. These measurements need to take into account many parameters, such as vehicle type, driver experience, and condition of the road [2]. Measuring most of these parameters requires a significant amount of time and user data. These techniques also do not work with microsleeps, in which the driver falls asleep for a few seconds without causing any significant change in driving patterns. The third category consists of using Computer Vision techniques as a non-invasive way to monitor the driver's sleepiness. We present a system in the third category for drowsiness detection using CNNs and LSTMs. After detecting the face with the Viola-Jones face detector, we track the eyes. The tracked eye patches are fed to a pre-trained CNN. The sequences of features extracted by the CNN are then given to an LSTM for detecting drowsiness.

II. RELATED WORKS

Efforts reported in the literature have focused on all three categories of drowsiness detection systems. Here we present a survey of the literature on non-intrusive detection using computer vision.

Alshaqaqi et al. [3] have presented a detection system based on edge detection and on exploiting the symmetry of facial features to extract the eyes. The state of the eyes is determined as open or closed by taking the Hough transform for circles and comparing the intersection of the Hough transform and the edge image with a threshold. The state of drowsiness is then determined using Percentage of Eyelid Closure (PERCLOS), a scientifically supported measure of drowsiness associated with slow eye closure.

Grace et al. [4] have presented two drowsiness detection methods. In the first method they develop a camera that exploits the fact that the retina reflects different amounts of infrared light at different frequencies. Two images of the driver's face are taken at fixed wavelengths. The difference between these images is used to measure percentage eye closure. The second method, although in its infancy, uses a neural network to predict PERCLOS by finding the right combinations of driver performance variables.

Malla et al. [5] have built a system for detecting microsleep. The system uses a remotely placed camera with near-infrared illumination to acquire the video. A Haar object detection algorithm is used to detect the face. The eye region of interest is detected using anthropometric parameters. Eye closure is detected by taking the ratio of the closed portion of the eye to the average height of the open portion.

In light of the above, methods for drowsiness detection have involved detecting the face, the eyes, and/or facial features.

III. PROPOSED APPROACH: CONV-LSTM

The difficulty in detecting drowsiness is that a single frame does not reveal whether the person is blinking or falling asleep. To overcome this problem, we introduce our method, Conv-LSTM, which comprises two sub-models: a CNN model for feature extraction and an LSTM for interpreting the features across consecutive frames. The procedure for drowsiness detection is as follows. First, we extract significant CNN features from the video frames. Then, features representing the sequence of the action (Alert or Drowsy Driver) over a certain time interval (a fixed number of frames) are fed to the LSTM as input. Finally, a softmax layer is used to predict the drowsiness or alertness of the entire video sequence [15]. Figure 1 shows the flow of our model.
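The sequence-building and final classification steps of this flow can be illustrated with a minimal plain-Python sketch. It is not the implementation used in this work: the function names (make_sequences, softmax, classify) are hypothetical, the windows are assumed to be non-overlapping, and the LSTM itself is left out, with its output represented by two raw scores, one per class.

```python
from math import exp
from typing import List, Sequence, Tuple

FeatureVector = List[float]


def make_sequences(frame_features: Sequence[FeatureVector],
                   seq_len: int) -> List[List[FeatureVector]]:
    """Group per-frame CNN feature vectors into fixed-length sequences.

    Assumption: windows do not overlap, and trailing frames that do not
    fill a whole window are dropped.
    """
    if seq_len <= 0:
        raise ValueError("seq_len must be positive")
    n_full = len(frame_features) // seq_len
    return [list(frame_features[i * seq_len:(i + 1) * seq_len])
            for i in range(n_full)]


def softmax(scores: Sequence[float]) -> List[float]:
    """Turn raw class scores into probabilities that sum to 1."""
    top = max(scores)  # subtract the maximum for numerical stability
    exps = [exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def classify(scores: Sequence[float],
             labels: Tuple[str, ...] = ("Alert", "Drowsy")) -> Tuple[str, float]:
    """Return the most probable label and its softmax probability."""
    probs = softmax(scores)
    best = max(range(len(probs)), key=probs.__getitem__)
    return labels[best], probs[best]


if __name__ == "__main__":
    # 60 dummy frames with 4 features each stand in for real CNN features.
    frames = [[float(i)] * 4 for i in range(60)]
    sequences = make_sequences(frames, seq_len=26)
    print(len(sequences), len(sequences[0]))
    # Hypothetical scores for one sequence, as a sequential model might emit.
    print(classify([0.0, 2.0]))
```

Each returned sequence is a list of seq_len feature vectors, which matches the (sequences x frames x features) layout that a sequential model expects. The probability returned by classify corresponds to the per-sequence confidence of the predicted class.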
Fig. 1. Flow Diagram for Conv-LSTM

A. Dataset Collection

Videos of eight subjects (6 males and 2 females) imitating signs of alertness and drowsiness were recorded under ambient recording conditions. During recording, the subjects were asked to perform certain actions to imitate drowsiness, such as slow eyelid closure and droopy eyes followed by a quick recovery of head posture to imitate micro-sleep. To imitate alertness, the subjects were asked to gaze in different directions with or without head movement.

The dataset consists of 16 training and 3 testing videos, both containing the classes Alert-Eyes and Drowsy-Eyes. Videos were recorded with a CMOS front web camera at 1280x720 resolution and 30 fps, with flicker reduction set to 50 Hz.

B. Face ROI Detection and Eye Detection module

We use Viola-Jones Haar-feature-based cascade classifiers [6] for face detection. To avoid false positives, we first detect the face Region of Interest (fROI) and then apply eye detection on this ROI to obtain a rectangular localized patch containing a pair of eyes. After detecting the face and eyes in the first frame, we track them using CAMShift (Continuously Adaptive Mean-shift). The figures below demonstrate detection of open as well as closed eyes.

Fig. 2. Alert-Eye detection

Fig. 3. Drowsy-Eye detection

C. Convolutional Neural Network (Inception-v3) module

We manually created an image dataset for feature extraction, with two classes of approximately 120 images each: Alert-Eyes and Drowsy-Eyes. To extract significant visual features from these images, we use Convolutional Neural Networks (CNNs), which are state-of-the-art for image classification and feature extraction. We adapted a pre-trained model, Inception-v3 [12], which is trained on the ImageNet dataset comprising 1000 classes for the Large Scale Visual Recognition Challenge (2012) [10]. Using transfer learning, we retrained the final layer of this model on our dataset with TensorFlow [11].

At the completion of 4000 training steps, our model reported an accuracy of 96.5% on the validation set. We then ran each frame (image) of every video through the Inception model and saved the output of the final pooling layer (pool_3:0). This results in a 2048-dimensional feature vector, which we passed to the sequential neural models. Finally, we grouped these per-frame feature vectors into sequences.

D. Long Short Term Memory Units (LSTM)

Long Short Term Memory networks are a special kind of Recurrent Neural Network, capable of learning long-term dependencies while avoiding the vanishing and exploding gradient problems. Each block contains one or more recurrently connected memory cells and three multiplicative units (the input, output and forget gates), which control the information flow inside the memory block.

The LSTM framework enables prediction (for example, textual description) for visual time-series problems. In Drowsy Driver Detection, the stitched features (16 videos x 26 frames x 1024 feature vectors) are used to train the sequential model. We used a single 4096-wide LSTM layer, followed by a 1024-unit Dense layer, with some dropout in between. We trained the model for 10 epochs with a batch size of 4, using Keras with TensorFlow as the back-end [13]. We used the Adam optimizer, configured with a learning rate of 0.00005, to train and optimize our network weights. Figure 4 below shows the architecture of our LSTM model.

Fig. 4. LSTM Architecture

IV. RESULTS OBTAINED

We tried and tested our model with various parameters. Inception-v3 retrained on our dataset of eye patches obtained an approximate training accuracy of 96.5%. Our testing accuracy was 87.5% after 10 epochs for the LSTM model. The model was able to correctly classify sequences of consecutive frames from unseen videos: it detected a drowsy person with 93.65% confidence and an alert driver with
99.63% confidence in most of our test runs. To visualize the loss function, we trained for 30 epochs, which resulted in the graph shown in Figure 5. We performed hyperparameter tuning on the learning rate with the Adam and SGD optimizers. Results obtained with the Adam optimizer were significantly better than those obtained with SGD.

Fig. 5. Loss, Validation Loss for 30 epochs

V. CHALLENGES FACED

The unavailability of an apt dataset led us to create our own video dataset for driver drowsiness detection. This was quite time-consuming and tedious.

Figuring out the exact procedure for reshaping the stitched sequence of frames to connect the output layer of the CNN Inception-v3 model to the LSTM model was challenging for us.

Taking care of corner cases, such as not predicting the driver as drowsy for normal eye-blinks, proved to be demanding.

VI. CONCLUSION

Thus our model warns drowsy drivers with an alarm, after successful eye detection and tracking, using computer vision and deep learning techniques (CNN and LSTM models) with an accuracy of 87.5%.

VII. FUTURE SCOPE

Our model can be improved in the following ways. It should learn to detect faces and eyes in varied lighting conditions, such as at night with infrared lights. In addition, the model should be able to recognize drowsy eyes behind sunglasses.

With some modification, this system can be used in combination with real-time cameras to alert a driver while driving. This will, however, require exhaustive testing on a larger dataset.

ACKNOWLEDGMENT

We thank Professor Roy Shilkrot for his constant guidance and support.

We would also like to thank our fellow batch-mates Noopur Maheshwari, Rahul Rane, Bhushan Sonawane, Nishant Borude, Mihir Chakradeo and Dhanashree Patil, all graduate students at Stony Brook University, for helping us create our video dataset.

REFERENCES

[1] Center for Disease Control and Prevention. https://www.cdc.gov/features/dsdrowsydriving/index.html
[2] Optimised Co-modal Passenger Transport for Reducing Carbon Emissions. COMPASS Handbook of ICT Solutions.
[3] B. Alshaqaqi, A. S. Baquhaizel, M. E. Amine Ouis, M. Boumehed, A. Ouamri and M. Keche, "Driver drowsiness detection system," 2013 8th International Workshop on Systems, Signal Processing and their Applications (WoSSPA), Algiers, 2013, pp. 151-155. doi: 10.1109/WoSSPA.2013.6602353
[4] R. Grace et al., "A drowsy driver detection system for heavy vehicles," 17th DASC. AIAA/IEEE/SAE. Digital Avionics Systems Conference. Proceedings (Cat. No.98CH36267), Bellevue, WA, 1998, pp. I36/1-I36/8 vol. 2. doi: 10.1109/DASC.1998.739878
[5] A. M. Malla, P. R. Davidson, P. J. Bones, R. Green and R. D. Jones, "Automated video-based measurement of eye closure for detecting behavioral microsleep," 2010 Annual International Conference of the IEEE Engineering in Medicine and Biology, Buenos Aires, 2010, pp. 6741-6744. doi: 10.1109/IEMBS.2010.5626013
[6] P. Viola and M. Jones, "Rapid object detection using a boosted cascade of simple features," Proceedings of the 2001 IEEE Computer Society Conference on Computer Vision and Pattern Recognition. CVPR 2001, 2001, pp. I-511-I-518 vol. 1. doi: 10.1109/CVPR.2001.990517
[7] OpenCV. Open Source Computer Vision Library Reference Manual, 2001.
[8] François Chollet, Keras, 2015, GitHub. https://github.com/fchollet/keras
[9] Martín Abadi et al., TensorFlow: Large-scale machine learning on heterogeneous systems, 2015. Software available from tensorflow.org.
[10] J. Deng, A. Berg, S. Satheesh, H. Su, A. Khosla, and L. Fei-Fei. ILSVRC-2012, 2012. http://www.image-net.org/challenges/LSVRC/2012/
[11] Retraining Inception's final layer for New Categories. https://www.tensorflow.org/tutorials/image_retraining
[12] C. Szegedy, V. Vanhoucke, S. Ioffe, J. Shlens, and Z. Wojna, "Rethinking the inception architecture for computer vision," in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2016, pp. 2818-2826.
[13] Video classification methods. https://blog.coast.ai/five-video-classification-methods-implemented-in-keras-and-tensorflow-99cad29cc0b5
[14] CNN-LSTMs. https://machinelearningmastery.com/cnn-long-short-term-memory-networks/
[15] J. Donahue, L. A. Hendricks, S. Guadarrama, M. Rohrbach, S. Venugopalan, K. Saenko, and T. Darrell, "Long-term recurrent convolutional networks for visual recognition and description," in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2015, pp. 2625-2634.
