Drowsy Driver Detection
Features
Nisha Gandhi, Tejas Naik, Aditya Yele
Abstract - Around 100,000 accidents per year are caused by driver drowsiness. To add to the seriousness of the matter, no test exists to determine sleepiness as there is for intoxication detection. Detection of driver drowsiness is gaining importance in the field of Computer Vision and Machine Learning. Recurrent Neural Networks (RNNs) and Long Short-Term Memory units (LSTMs) have been very successful in processing sequential multimedia data. In this project, we propose a novel driver drowsiness detection method that uses Convolutional Neural Networks (CNNs) to extract information from images and feeds a sequence of such information to LSTMs for prediction.

Index Terms - Computer Vision, Machine Learning, Deep Learning, Drowsy Driver Detection, Face Detection, Eye Tracking, Convolutional Neural Network, Recurrent Neural Networks, Long Short Term Memory.

I. INTRODUCTION

Accidents caused by drowsy driving are a major problem in the United States. The National Highway Traffic Safety Administration estimates that drowsy driving was responsible for 72,000 crashes, 44,000 injuries, and 800 deaths in 2013 [1]. Drowsiness detection technologies attempt to prevent such incidents by predicting, from various inputs, whether a driver is falling asleep. These technologies can be classified into three main categories [2].

The first category involves measuring cerebral and muscular signals and cardiovascular activity. These techniques are invasive and not commercially viable.

The second category includes techniques that measure overall driver behavior from vehicle patterns. Examples include monitoring the vehicle's position in a lane and monitoring steering patterns. These measurements need to take into account many parameters, such as vehicle type, driver experience, and condition of the road [2]. Measuring most of these parameters requires a significant amount of time and user data. These techniques also do not work with microsleeps, in which the driver falls asleep for a few seconds without causing any significant change in the driving pattern.

The third category consists of using Computer Vision techniques as a non-invasive way to monitor the driver's sleepiness. We present a system in the third category that detects drowsiness using CNNs and LSTMs. After face detection using the Viola-Jones face detector, we track the eyes. These are fed to a pre-trained CNN. The sequences of features extracted by the CNN are then given to an LSTM for detecting drowsiness.

II. RELATED WORKS

Efforts reported in the literature have focused on all three categories of drowsiness detection systems. Here we present a survey of literature on non-intrusive detection using computer vision.

Alshaqaqi et al. [3] have presented a detection system based on edge detection and on exploiting the symmetry of facial features to extract the eyes. The state of the eyes is determined as open or closed by taking the Hough transform for circles and comparing the intersection of the Hough transform and the edge image with a threshold. The state of drowsiness is then determined using Percentage of Eyelid Closure (PERCLOS), a scientifically supported measure of drowsiness associated with slow eye closure.

Grace et al. [4] have presented two drowsiness detection methods. In the first method they develop a camera that exploits the fact that the retina reflects different amounts of infrared light at different frequencies. Two images of the driver's face are taken at fixed wavelengths. The difference of these images is used to measure percentage eye closure. The second method, although in its infancy, uses a neural network to predict PERCLOS by finding the right combinations of driver performance variables.

Malla et al. [5] have built a system for detecting microsleep. The system uses a remotely placed camera with near-infrared illumination to acquire the video. The Haar object detection algorithm is used to detect a face. The eye region of interest is detected using anthropometric parameters. Eye closure is detected by taking the ratio of the closed portion of the eye to the average height of the open portion.

In light of the above, methods for drowsiness detection have involved detection of the face, the eyes, and/or facial features.

III. PROPOSED APPROACH: CONV-LSTM

The difficulty in detecting drowsiness is that a single frame does not reveal whether the person is blinking or falling asleep. To overcome this problem, we introduce our method, Conv-LSTM, which comprises two sub-models: a CNN model for feature extraction and an LSTM for interpreting the features across consecutive frames. The procedure for drowsiness detection is as follows. First, we extract significant CNN features from the video frames. Then, features representing the sequence of the action (Alert or Drowsy Driver) over a certain time interval (a fixed number of frames) are fed to the LSTM as input. Finally, a softmax layer is used to predict drowsiness or alertness for the entire video sequence [15]. Figure 1 below explains the flow of our model.
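The last two steps of the procedure above, grouping per-frame features into fixed-length sequences and turning two class scores into a prediction, can be illustrated with a minimal pure-Python sketch. This is not the implementation used in this work: the helper names (`make_sequences`, `softmax`, `predict_label`), the non-overlapping windowing, and the label order are assumptions made only for illustration.

```python
import math
from typing import List, Sequence

# Assumed label order for the two classes; hypothetical, not taken from the paper.
LABELS = ("Alert", "Drowsy")


def make_sequences(frame_features: Sequence[Sequence[float]],
                   seq_len: int) -> List[List[Sequence[float]]]:
    """Group per-frame feature vectors into non-overlapping windows of seq_len frames.

    Trailing frames that do not fill a complete window are dropped (an assumption).
    """
    if seq_len <= 0:
        raise ValueError("seq_len must be positive")
    usable = len(frame_features) - len(frame_features) % seq_len
    return [list(frame_features[i:i + seq_len]) for i in range(0, usable, seq_len)]


def softmax(logits: Sequence[float]) -> List[float]:
    """Numerically stable softmax over a list of class scores."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def predict_label(logits: Sequence[float]) -> str:
    """Return the label whose softmax probability is highest."""
    probs = softmax(logits)
    return LABELS[probs.index(max(probs))]
```

In a full pipeline, each window returned by `make_sequences` would be passed to the sequential model, and the model's two output scores would be passed to `predict_label`.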
Fig. 1. Flow Diagram for Conv-LSTM

A. Dataset Collection

Videos of eight subjects (6 males and 2 females) imitating signs of alertness and drowsiness were recorded under ambient recording conditions. During the recording, the subjects were asked to perform certain actions to imitate drowsiness, such as slow eyelid closure, and droopy eyes followed by a quick recovery of head posture to imitate micro-sleep. To imitate alertness, the subjects were asked to gaze in different directions with and without head movement.

The dataset consists of 16 training and 3 testing videos, both containing the classes Alert-Eyes and Drowsy-Eyes. Videos were recorded with a CMOS front web-camera at 1280x720p and 30 fps with a flicker reduction of 50 Hz.

B. Face ROI Detection and Eye Detection module

We use Viola-Jones Haar-feature-based cascade classifiers [6] for face detection. To avoid false positives, we first detect the face Region of Interest (fROI) and then apply eye detection on this ROI to obtain a rectangular localized patch containing a pair of eyes. After detecting the face and eyes in the first frame, we track them using CAMShift (Continuously Adaptive Mean-shift). The figures below demonstrate detection of closed as well as open eyes.

[...] Convolutional Neural Networks (CNNs), which are state-of-the-art for image classification and feature extraction. We adapted a pre-trained model, Inception-v3 [12], which is trained on the ImageNet dataset comprising 1000 classes for the Large Scale Visual Recognition Challenge (2012) [10]. Using transfer learning, we retrain the final layer of this model on our dataset with TensorFlow [11].

At the completion of 4000 training steps, our model reported an accuracy of 96.5% on the validation set. Then, we ran each frame (image) of every video through the Inception model and saved the output from the final pooling layer (pool-3:0). This results in a 2048-dimensional vector of features, which we passed to the sequential neural models. Finally, we group these extracted features into sequences.

D. Long Short Term Memory Units (LSTM)

Long Short Term Memory networks are a special kind of Recurrent Neural Network, capable of learning long-term dependencies while avoiding the vanishing and exploding gradient problems. Each block contains one or more recurrently connected memory cells and three multiplicative units, the input, output and forget gates, which control the information flow inside the memory block.

The LSTM framework enables prediction (textual description) for visual time series problems. In Drowsy Driver Detection, the stitched features (16 videos x 26 frames x 1024 feature vectors) are used to train the sequential model. We used a single, 4096-wide LSTM layer, followed by a 1024 Dense layer, with some dropout in between. We trained the model for 10 epochs, with a batch size of 4, using Keras with TensorFlow as the back-end [13]. We used the Adam optimizer configured with a learning rate of 0.00005 to train and optimize our network weights. Figure 4 below shows the architecture of our LSTM model.
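To make the role of the input, forget and output gates in Section D concrete, the following is a minimal sketch of one forward step of a scalar LSTM cell in pure Python. It is a toy illustration under stated assumptions, not the 4096-wide Keras layer used in this work: the function name `lstm_step`, the weight layout (one `(w_x, w_h, bias)` triple per gate), and the scalar state are all hypothetical.

```python
import math
from typing import Dict, Tuple

# Hypothetical weight layout: gate name -> (input weight, recurrent weight, bias).
Weights = Dict[str, Tuple[float, float, float]]


def sigmoid(x: float) -> float:
    """Logistic function, used for the three multiplicative gates."""
    return 1.0 / (1.0 + math.exp(-x))


def lstm_step(x: float, h_prev: float, c_prev: float,
              w: Weights) -> Tuple[float, float]:
    """Run one time step of a scalar LSTM cell and return (h, c)."""

    def pre_activation(gate: str) -> float:
        w_x, w_h, bias = w[gate]
        return w_x * x + w_h * h_prev + bias

    i = sigmoid(pre_activation("input"))        # how much new information to write
    f = sigmoid(pre_activation("forget"))       # how much of the old cell state to keep
    o = sigmoid(pre_activation("output"))       # how much of the cell state to expose
    g = math.tanh(pre_activation("candidate"))  # candidate value for the cell state

    c = f * c_prev + i * g                      # updated memory cell
    h = o * math.tanh(c)                        # updated hidden state
    return h, c
```

Applying `lstm_step` once per frame, carrying `h` and `c` forward, is how a sequence of per-frame features would be summarized before the final classification layer. A practical model uses vector-valued states and learned weights.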