
Digital Object Identifier 10.1109/ACCESS.2019.2936663

Real-Time Driver-Drowsiness Detection System Using Facial Features

WANGHUA DENG¹ AND RUOXUE WU¹,²
¹Beijing Engineering Research Center for IoT Software and Systems, Beijing University of Technology, China
²School of Software, Yunnan University, China
Corresponding author: Ruoxue Wu (e-mail: [email protected]).

ABSTRACT The face, an important part of the body, conveys a lot of information. When a driver is in a
state of fatigue, the facial expressions, e.g., the frequency of blinking and yawning, are different from those
in the normal state. In this paper, we propose a system called DriCare, which detects the drivers’ fatigue
status, such as yawning, blinking, and duration of eye closure, using video images, without equipping their
bodies with devices. Owing to the shortcomings of previous algorithms, we introduce a new face-tracking
algorithm to improve the tracking accuracy. Further, we design a new detection method for facial regions
based on 68 key points. Then we use these facial regions to evaluate the drivers’ state. By combining the
features of the eyes and mouth, DriCare can alert the driver using a fatigue warning. The experimental
results showed that DriCare achieved around 92% accuracy.

INDEX TERMS convolutional neural network, fatigue detection, feature location, face tracking

I. INTRODUCTION
In recent years, an increase in the demand for modern transportation has necessitated faster car-parc growth. At present, the automobile is an essential mode of transportation for people. In 2017, a total of 97 million vehicles were sold globally, which was 0.3% more than in 2016 [1]. In 2018, the global number of vehicles in use was estimated at more than 1 billion [2]. Although the automobile has changed people's lifestyle and improved the convenience of conducting daily activities, it is also associated with numerous negative effects, such as traffic accidents. A report by the National Highway Traffic Safety Administration [3] showed that a total of 7,277,000 traffic accidents occurred in the United States in 2016, resulting in 37,461 deaths and 3,144,000 injuries. Fatigued driving caused approximately 20%–30% of these accidents. Thus, fatigued driving is a significant and latent danger in traffic.

In recent years, fatigue-driving detection has become a hot research topic. The detection methods are categorized as subjective and objective. In the subjective detection method, a driver must participate in the evaluation, which relies on the driver's subjective perceptions, through steps such as self-questioning, evaluation, and filling in questionnaires. These data are then used to estimate which vehicles are being driven by tired drivers, assisting the drivers in planning their schedules accordingly. However, drivers' feedback is not required in the objective detection method, as it monitors the driver's physiological state and driving-behavior characteristics in real time [4]. The collected data are used to evaluate the driver's level of fatigue. Furthermore, objective detection is categorized into two types: contact and non-contact. Compared with the contact method, the non-contact method is cheaper and more convenient, because a system that requires neither on-body devices nor sophisticated cameras can be used in more cars.

Owing to its easy installation and low cost, the non-contact method has been widely used for fatigue-driving detection. For instance, Attention Technologies [5] and SmartEye [6] employ the movement of the driver's eyes and the position of the driver's head to determine the level of fatigue.

In this study, we propose a non-contact method called DriCare to detect the level of the driver's fatigue. Our method employs only the vehicle-mounted camera, making it unnecessary for the driver to carry any on-body or in-body devices. Our design uses each frame of the video to analyze and detect the driver's state.

Technically, DriCare addresses three critical challenges. First, as drivers' heights differ, the positions of their faces in the video differ. In addition, when the driver is driving, his or her head may be moving; hence, tracking the trajectory of the head in time is important once the position of the head changes.


To monitor and warn the driver in real time, the use of the kernelized correlation filter (KCF) algorithm [7] is preferred based on our system's evaluation. However, the KCF algorithm uses only a single Felzenszwalb histogram of oriented gradients feature [8], which gives poor face-tracking accuracy in a complex environment. Moreover, the KCF algorithm uses a manual method to mark the tracked target in the first frame. Consequently, the KCF tracker cannot immediately retrieve the target, and it cannot track the human face once the target leaves the detection area and then returns.

Second, the driver's eyes and mouth play a vital role in tracking. Thus, identifying the key facial features of the driver is important for judging driving fatigue. A previous study [9] proposed a deep convolutional neural network for detecting key points. Although some traditional models [9]–[11] can detect the positions of several facial key points, they cannot determine the regions of the driver's eyes and mouth.

Third, defining the driver's level of drowsiness is crucial for our system. When people are tired, drowsiness is evident on their faces. According to Wierwille [12], the rate of the driver's eye closure is associated with the degree of drowsiness. Based on this principle, Grace et al. [13] proposed PERCLOS (percentage of eyelid closure over the pupil over time) and introduced Copilot to measure the level of the driver's fatigue. Constant yawning is a sign of drowsiness, which may provoke the driver to fall asleep. Li [14], Abtahi [15], and Fan [16] used this feature to estimate the level of the driver's fatigue. However, in practice, the driver may have different and complex facial expressions, which may distort the identification of these features.

Our core contributions are as follows:
• First, we propose a new face-tracking algorithm named Multiple Convolutional Neural Networks (CNN)-KCF (MC-KCF), which optimizes the KCF algorithm. We combine CNN with the KCF algorithm [17] to improve the performance of the latter in a complex environment, such as low light. Furthermore, we introduce the multitask convolutional neural network (MTCNN) [18] to compensate for the inability of the KCF algorithm to mark the target in the first frame and to prevent losing the target.
• Second, we use a CNN to assess the state of the eye. To improve the accuracy of the CNN, DriCare measures the opening angle of the eye to determine whether the eye is closed. To detect yawning, DriCare assesses the duration of the mouth opening. Besides, DriCare proposes three different criteria to evaluate the degree of the driver's drowsiness: the blinking frequency, the duration of eye closure, and yawning. If the results surpass the threshold, DriCare alerts the driver of drowsiness.

The remainder of this paper is organized as follows. We review the related research in Section II and present the DriCare overview in Section III. Section IV presents the principle of human face tracking based on the MC-KCF algorithm. In Section V, we present the evaluation method for the driver's degree of drowsiness. We describe the DriCare implementation method and present the results of the experiment in Section VI. Section VII presents the conclusions of this study.

II. RELATED WORK
In this section, we categorize the related work into three parts: work related to visual object tracking algorithms, to facial landmark recognition algorithms, and to methods of driver-drowsiness detection.

Visual object tracking. Visual object tracking is a crucial problem in computer vision. It has a wide range of applications in fields such as human-computer interaction, behavior recognition, robotics, and surveillance. Visual object tracking estimates the target position in each frame of an image sequence, given the initial state of the target in the previous frame. Lucas [19] proposed that tracking of a moving target can be realized using the pixel relationship between adjacent frames of the video sequence and the displacement changes of the pixels. However, this algorithm can only detect a medium-sized target that shifts between two frames. With the recent advances of the correlation filter in computer vision [7], [20]–[22], Bolme [20] proposed the Minimum Output Sum of Squared Error (MOSSE) filter, which can produce stable correlation filters to track the object. Although MOSSE's computational efficiency is high, its precision is low, and it can only process the gray information of a single channel.

Based on the correlation filter, Li [22] utilized HOG, color-naming features and a scale-adaptive scheme to boost object tracking. Danelljan [23] used HOG and the discriminative correlation filter to track the object. SAMF and DSST solve the problem of deformation or change in scale when the tracking target is rotating. Further, they address the tracker's inability to adapt to the object and its low operation speed. With the development of deep-learning algorithms, some scholars combine deep learning and the correlation filter to track the moving target [24]–[28].


Although these algorithms have better precision than the tracking algorithms based on the correlation filter, their training is time-consuming. Hence, these algorithms cannot track the object in real time in a real environment. In this study, we propose the MC-KCF algorithm based on the correlation filter and deep learning. This algorithm uses CNN and MTCNN to offset the KCF's limitations and uses the KCF to track objects. Thus, the algorithm can track the driver's face in real time in our system.

Facial landmark recognition. The purpose of facial key-point recognition is to obtain the locations of the eyebrows, eyes, lips, and nose in the face. With the development of deep learning, Sun [9] was the first to introduce a deep convolutional neural network (DCNN) to detect facial key points. This algorithm recognizes only 5 facial key points, albeit very quickly. To obtain higher precision in facial key-point recognition, Zhou [11] employed FACE++, which optimizes the DCNN and can recognize 68 facial key points; however, this approach involves too large a model, and its operation is very complicated. Wu [29] proposed the Tweaked Convolutional Neural Network (TCNN), which uses a Gaussian Mixture Model (GMM) to improve different layers of the CNN. However, the robustness of TCNN depends excessively on the data. Kowalski [30] introduced the Deep Alignment Network (DAN) to recognize facial key points, which performs better than the other algorithms. Unfortunately, DAN needs large models and calculations based on complicated functions. Therefore, in order to meet the real-time performance requirement, DriCare uses Dlib [31] to recognize facial key points.

Driver drowsiness detection. Driver drowsiness detection can be divided into two types: contact approaches [32]–[34] and non-contact approaches [5], [6], [35], [36]. In contact approaches, drivers wear or touch devices that acquire physiological parameters for detecting the level of their fatigue. Warwick [32] implemented the BioHarness 3 on the driver's body to collect data and measure drowsiness. Li [33] used a smartwatch to detect driver drowsiness based on the electroencephalographic (EEG) signal. Jung [34] redesigned the steering wheel and embedded a sensor to monitor the electrocardiogram (ECG) signal of the driver. However, owing to the price and installation requirements of contact approaches, there are limitations that prevent them from being deployed ubiquitously. The other type employs a tag-free approach to detect driver drowsiness, where the measuring equipment does not need to contact the driver. For example, Omidyeganeh et al. [35] used the driver's facial appearance captured by a camera to detect driver drowsiness, but this method does not run in real time. Zhang [37] used fatigue facial expression recognition based on Local Binary Pattern (LBP) features and Support Vector Machines (SVM) to estimate driver fatigue, but the complexity of this algorithm is greater than that of ours. Moreover, Picot [38] proposed a method that uses the electrooculogram (EOG) signal and blinking features for drowsiness detection. Akrout [39] and Oyini Mbouna [40] used a fusion system for drowsiness detection based on eye state and head position. Different from these methods, we employ simple formulae and evaluations, which make the results easier to measure.

III. DRICARE OVERVIEW
The proposed system, DriCare, is built using a commercial in-vehicle camera, a cloud server that processes the video data, and a commercial cellphone that stores the result. Figure 1 shows the structure of the DriCare system.

FIGURE 1: The architecture of DriCare.

While driving, the automobile's camera captures the driver's portrait and uploads the video stream to the cloud server in real time. Then, the cloud server analyzes the video and detects the driver's degree of drowsiness. In this stage, three main parts are analyzed: the driver's face tracking, facial key-region recognition, and the driver's fatigue state. To meet the real-time performance requirement of the system, we use the MC-KCF algorithm to track the driver's face and recognize the facial key regions based on key-point detection. Then, the cloud server estimates the driver's state from the changes in the states of the eyes and mouth. The workflow is shown in Figure 2. Finally, the cloud server transmits the result to the driver's cellphone and other apps, through which a warning tone is played if the driver is observed to be drowsy.

FIGURE 2: System workflow.

IV. DRIVER FACE TRACKING BY MC-KCF
In this section, we illustrate the principle of driver face tracking in DriCare. Owing to the complexity of the real environment, each frame of the video data requires preprocessing to meet the tracking requirements.

A. PRE-PROCESS
During the detection process, the quality of the images is affected and the features of the human face become unclear if the illumination intensity within the cab changes during driving.


This usually occurs in the case of an overcast sky, rain, and driving at night. For detection accuracy, we use an illumination enhancement method to preprocess the images before tracking the driver's face. Specifically, we use the histogram equalization (HE) algorithm [41] to improve the brightness of the image frame.

To determine whether light enhancement is required for an image frame, DriCare evaluates the brightness of the image. Therefore, we convert the RGB image into a YCbCr image, because in the YCbCr color space Y represents the luminance value. We use Eq. (1) to compute the mean value M of Y around the driver's face in the image:

\[ M = \frac{\sum_{i}^{n} L}{n - i} \tag{1} \]

where L denotes the luminance value of each pixel in the YCbCr space, and n and i represent the last and first serial numbers of the driver's facial pixels in the image, respectively; n − i is the total number of the driver's facial pixels. If M is lower than the threshold, the illumination of the image is enhanced using the HE algorithm; otherwise, the image is retained. Based on statistics over a large number of samples, we set the threshold to 60. Figure 3 shows the result of the illumination enhancement.

FIGURE 3: The result of histogram equalization. (a) Original image. (b) The image after illumination enhancement.
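As a concrete illustration of this preprocessing step, the sketch below checks the mean luminance of the face region in the YCbCr space and applies OpenCV's histogram equalization to the luminance channel when the mean falls below the threshold of 60. The face rectangle, function name, and channel handling are assumptions for illustration; only the threshold and the overall rule come from the text.

```python
import cv2
import numpy as np

BRIGHTNESS_THRESHOLD = 60  # threshold stated in the text, derived from large samples

def enhance_if_dark(frame_bgr, face_box):
    """Sketch of the brightness check of Eq. (1) plus HE enhancement.
    face_box is an assumed (x, y, w, h) rectangle around the driver's face."""
    x, y, w, h = face_box
    ycrcb = cv2.cvtColor(frame_bgr, cv2.COLOR_BGR2YCrCb)
    face_luma = ycrcb[y:y + h, x:x + w, 0]          # Y channel of the face region
    mean_luma = float(np.mean(face_luma))           # M in Eq. (1)
    if mean_luma < BRIGHTNESS_THRESHOLD:
        ycrcb[:, :, 0] = cv2.equalizeHist(ycrcb[:, :, 0])  # HE on the luminance channel
        return cv2.cvtColor(ycrcb, cv2.COLOR_YCrCb2BGR)
    return frame_bgr  # bright enough, keep the original frame
```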
B. THE PRINCIPLE OF MC-KCF
As previously discussed, the KCF algorithm is based on the FHOG feature [7]. Therefore, in a complex environment and during long-term operation, the tracking window will drift, as shown in Figure 4(b). We propose the MC-KCF algorithm instead of the original KCF algorithm to track the human face. In the MC-KCF algorithm, a new representation combines the FHOG feature used by the KCF algorithm with CNN features. We explain below the principle of the FHOG and CNN feature extraction and how these features are integrated.

FIGURE 4: The performance of the original KCF and MC-KCF algorithms. (a) The correct tracking result. (b) The tracking window drifting. (c) The tracking result of the MC-KCF algorithm.

1) FHOG feature extraction
In our algorithm, the FHOG feature [8] is a key factor for human-face tracking. To extract the FHOG feature, and for ease of calculation, the image is first converted to grayscale. Then, we calculate the gradient G and the gradient orientation α at each pixel in the image, as shown in Eq. (2):

\[
\begin{cases}
G_x(x, y) = H(x+1, y) - H(x-1, y) \\
G_y(x, y) = H(x, y+1) - H(x, y-1) \\
G = \sqrt{G_x(x, y)^2 + G_y(x, y)^2} \\
\alpha = \arctan\!\left(\dfrac{G_x(x, y)}{G_y(x, y)}\right), \quad \alpha \in (0^\circ, 360^\circ)
\end{cases}
\tag{2}
\]

where H(x, y), G_x(x, y), and G_y(x, y) represent the pixel value, horizontal gradient, and vertical gradient at (x, y), respectively.

Then, we segment the image into n × n cells. According to [8], the gradient orientation is categorized into either 9 bins of contrast-insensitive orientations or 18 bins of contrast-sensitive orientations. If a pixel in a cell belongs to the corresponding orientation, the value of that orientation bin increases by 1. Finally, each cell has a 9-dimensional and an 18-dimensional histogram.

The gradient of each cell is related to its internal pixels and the 4 cells around it. After calculating the gradient histogram, we use Eqs. (3) and (4) for normalization and truncation:

\[ N_{a,b}(i, j) = \left( C(i, j)^2 + C(i+a, j)^2 + C(i+a, j+b)^2 + C(i, j+b)^2 \right)^{\frac{1}{2}} \tag{3} \]

\[ H(i, j) = \begin{pmatrix} T_\alpha\big(C(i, j) / N_{-1,-1}(i, j)\big) \\ T_\alpha\big(C(i, j) / N_{+1,-1}(i, j)\big) \\ T_\alpha\big(C(i, j) / N_{+1,+1}(i, j)\big) \\ T_\alpha\big(C(i, j) / N_{-1,+1}(i, j)\big) \end{pmatrix} \tag{4} \]

In Eq. (3), C(i, j) denotes the 9- or 18-dimensional eigenvector of the cell at (i, j), N_{a,b}(i, j) represents the normalization factor, and a, b index the different normalization factors, a, b ∈ {−1, 1}. In Eq. (4), H(i, j) is a feature vector, and T_α(x) denotes the truncation function: if a value in x is bigger than α, it is assigned the value α.

After normalization and truncation, the 9-dimensional feature vector becomes a 36-dimensional feature vector, and the 18-dimensional eigenvector becomes a 72-dimensional feature vector; in total, there is a 108-dimensional feature vector. Then, we arrange this eigenvector as a 4 × 27 matrix. Finally, we obtain the 31-dimensional HOG feature, named the FHOG feature, using matrix addition.
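To make the first step of this extraction concrete, the sketch below computes the dense gradients of Eq. (2) and a single cell's contrast-insensitive orientation histogram with plain NumPy. It is only an illustration: the block normalization and truncation of Eqs. (3)–(4) and the final reduction to 31 dimensions are omitted, and the helper names are assumptions.

```python
import numpy as np

def gradient_and_orientation(gray):
    """Gradient magnitude G and orientation alpha of Eq. (2) for a 2-D grayscale array."""
    gray = gray.astype(np.float64)
    gx = np.zeros_like(gray)
    gy = np.zeros_like(gray)
    gx[:, 1:-1] = gray[:, 2:] - gray[:, :-2]      # H(x+1, y) - H(x-1, y)
    gy[1:-1, :] = gray[2:, :] - gray[:-2, :]      # H(x, y+1) - H(x, y-1)
    magnitude = np.sqrt(gx ** 2 + gy ** 2)        # G in Eq. (2)
    orientation = np.degrees(np.arctan2(gx, gy)) % 360.0  # quadrant-aware arctan(Gx/Gy)
    return magnitude, orientation

def orientation_histogram(magnitude, orientation, bins=9):
    """Magnitude-weighted, contrast-insensitive orientation histogram for one cell."""
    folded = orientation % 180.0                  # fold opposite directions together
    idx = (folded / (180.0 / bins)).astype(int) % bins
    hist = np.zeros(bins)
    np.add.at(hist, idx.ravel(), magnitude.ravel())
    return hist
```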


2) CNN feature extracted by SqueezeNet 1.1
SqueezeNet is a small CNN architecture [17] with very fast operation. Figure 5 shows the architecture of SqueezeNet 1.1, which includes a standalone convolution layer (conv1), 3 max-pooling layers, 8 fire modules (Fire2–9), a final convolution layer (conv10), and one global average pooling layer.

FIGURE 5: The architecture of SqueezeNet 1.1.

SqueezeNet uses the fire module instead of the traditional convolution layer to reduce the network parameters and improve accuracy. The fire module comprises a squeeze convolution layer of 1 × 1 filters feeding into an expand layer with a mix of 1 × 1 and 3 × 3 convolution filters, as shown in Figure 6. The three tunable dimensions are S1, e1, and e3. A feature map sized H × W × M becomes H × W × S1 after the squeeze layer [17]; after the expand layer [17], we obtain a feature map sized H × W × (e1 + e3).

FIGURE 6: The architecture of the Fire module.

3) Multi-feature fusion
To avoid large and redundant CNN features, DriCare uses the CNN feature obtained from the 1 × 1 convolution filter in the expand layer of Fire 5 and Fire 9 of the SqueezeNet model. After conducting the feature extraction of a D1 × G1 original image using the MC-KCF algorithm, we obtain an FHOG feature sized D2 × G2 and two CNN features sized D3 × G3 and D4 × G4. Obviously, the sizes of the three features differ. Therefore, we adjust them so that they have the same size. The adjustment equation is written as follows:

\[
\begin{cases}
D = D_a \times \theta \\
G = G_a \times \varphi
\end{cases}
\tag{5}
\]

where D and G denote the standard length and width, respectively, D_a and G_a represent the original length and width of the three features, respectively, and θ and φ are the scaling factors.

Similar to the structure of the KCF algorithm, in the MC-KCF algorithm we use each feature to train its classifier separately using kernel ridge regression, which is written as follows [7]:

\[ \vec{\alpha} = (K + \lambda I)^{-1}\vec{y}, \qquad \hat{\vec{\alpha}} = \frac{\hat{\vec{y}}}{\hat{k}^{xx} + \lambda} \tag{6} \]

where K is the kernel matrix, I denotes an identity matrix, α⃗ represents the vector of coefficients α_i, λ is a hyperparameter, and y⃗ is the vector of regression targets. Moreover, a hat ˆ denotes the DFT of a vector, and k^{xx} is the first row of the kernel matrix K.

After the training, we use each classifier to evaluate the regression function f(z) for every image sample z. The biggest value of f(z) is the forecasted position of the target for each feature. The equation is as follows:

\[ f(z) = \mathcal{F}^{-1}\!\left(\hat{k}^{xz} \odot \hat{\vec{\alpha}}\right) \tag{7} \]

where F^{−1} denotes the inverse DFT. Thus, we obtain three tracking results. To obtain the final result of the MC-KCF algorithm, we set different weights for the results of the three features: δ1, δ2, and δ3. We calculate the entire response value F of the MC-KCF algorithm using the weights and the prediction positions based on the FHOG and CNN features. The formula is as follows:

\[ F = \delta_1 f(z_{\mathrm{fhog}}) + \delta_2 f(z_{\mathrm{fire5}}) + \delta_3 f(z_{\mathrm{fire9}}) \tag{8} \]

In Eq. (8), the codomain of δ is [0, 1]. When one of the δ values is 0, the response of the corresponding feature does not contribute to the final result; when a δ value is 1, the response of the corresponding feature is the entire response value. From the response value, we obtain the position of the driver's face. The different weights of the three features can influence the tracking accuracy. Thus, we evaluated 1000 weight ratios of the three features; Figure 7 shows a representative selection. When the ratio of δ1 : δ2 : δ3 is 0.57 : 0.14 : 0.29, the performance is optimal. In our system, the ratio is 0.57 : 0.14 : 0.29.

FIGURE 7: The face tracking result for different weights of features.

Therefore, the total dimension of the CNN features is 384, bigger than the dimension of the original FHOG feature, which is 31-dimensional. Since the modified object is small in some frames, we update the model every N frames to increase the computing speed of the model and improve the real-time performance of the system. We set the N value as 3. The whole process is shown in Figure 8.

FIGURE 8: The process of the MC-KCF algorithm.
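Once the three per-feature response maps are available and have been rescaled to the common size of Eq. (5), the weighted fusion of Eq. (8) reduces to a few lines. The sketch below uses the reported weights; the function and variable names are assumptions.

```python
import numpy as np

# Weights reported in the text for the FHOG, Fire5 and Fire9 responses (Eq. 8)
DELTA = (0.57, 0.14, 0.29)

def fuse_responses(resp_fhog, resp_fire5, resp_fire9, weights=DELTA):
    """Combine the three correlation-filter response maps of the MC-KCF tracker
    and return the fused map together with the predicted target position."""
    fused = (weights[0] * resp_fhog
             + weights[1] * resp_fire5
             + weights[2] * resp_fire9)                        # F in Eq. (8)
    row, col = np.unravel_index(np.argmax(fused), fused.shape)  # peak of the fused response
    return fused, (row, col)

# Example with random maps of the same (already rescaled) size:
# maps = [np.random.rand(50, 50) for _ in range(3)]
# _, position = fuse_responses(*maps)
```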


4) Calibration of MC-KCF
As discussed above, the original KCF algorithm is unable to obtain the tracking target of the first video frame automatically. In addition, in the original KCF algorithm the object may go out of the camera's sight; we therefore calibrate the MC-KCF algorithm whenever it is unable to track the driver's face. As described in Section III, the MTCNN algorithm uses a bounding box to precisely determine the human face. Thus, we use MTCNN to periodically calibrate the MC-KCF algorithm. After preprocessing the video frame, the cloud server judges whether the current image is the first frame. If it is, the cloud server uses the MTCNN algorithm to locate the human face in the image; otherwise, the cloud server continues to judge whether the span of the tracking time surpasses 10 s. If the answer is yes, the cloud server uses the MTCNN algorithm to relocate the human face and resets the tracking time. If the current image is not the first frame and the duration of the tracking time is less than 10 s, DriCare uses the MC-KCF algorithm to track the driver's face and uses the result to update the scope of the search for the driver's face in the next frame. We summarize the MC-KCF calibration process in Algorithm 1.

Algorithm 1 Calibration of the MC-KCF algorithm
Input: frame of the video fram, the span of the tracking time t, the count number of frames cnt, the number of all frames frams
Output: the result of human face tracking res_img
  Load fram, set t = 0 and cnt = 0
  while t ≤ 10 s and cnt ≤ frams do
    if the current frame is the first frame then
      Use the MTCNN algorithm to detect a human face
    else if the MC-KCF algorithm cannot detect a human face then
      Use the MTCNN algorithm to detect a human face
    else
      Use the MC-KCF algorithm to detect a human face
      t++
      cnt++
      Output the result res_img
      Update the scope of the detection
      Read the next frame
    end if
  end while
  if t == 10 s then
    Use the MTCNN algorithm to calibrate the MC-KCF algorithm (detect the human face with MTCNN) in the current frame
    Output the result res_img
    Update the scope of the detection
    t = 0
    cnt++
  end if
  Read the next frame
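The calibration logic of Algorithm 1 can be sketched compactly as follows. Here mtcnn_detect and mckcf_track are assumed callables wrapping the two detectors, and the 10 s interval is expressed as a frame count through an assumed frame rate; only the decision rule follows Algorithm 1.

```python
RECALIBRATION_PERIOD_S = 10  # MTCNN re-detection interval from Algorithm 1

def track_with_calibration(frames, mtcnn_detect, mckcf_track, fps=25):
    """Detect the face with MTCNN on the first frame, whenever MC-KCF loses the
    target, and every 10 seconds of video; otherwise track with MC-KCF.
    mtcnn_detect(frame) and mckcf_track(frame) return a face box or None."""
    results = []
    frames_since_calibration = None
    period_frames = int(fps * RECALIBRATION_PERIOD_S)
    for frame in frames:
        needs_calibration = (
            frames_since_calibration is None                 # first frame
            or frames_since_calibration >= period_frames     # 10 s of video elapsed
        )
        box = None if needs_calibration else mckcf_track(frame)
        if box is None:                                      # calibrate or recover with MTCNN
            box = mtcnn_detect(frame)
            frames_since_calibration = 0
        else:
            frames_since_calibration += 1
        results.append(box)
    return results
```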
V. EVALUATION OF THE DRIVER'S FATIGUE STATE
In this section, we discuss the method of analyzing the driver's face via the DriCare system in case of drowsiness. Further, we discuss methods to locate the regions of the eyes and mouth on the driver's face. A change in the state of the eyes and mouth is a crucial indicator of drowsiness. Additionally, we discuss a new algorithm to detect the driver's fatigue.

A. DETERMINATION OF EYES AND MOUTH REGIONS
In Section IV, we recognize and track the driver's face in each video frame. Then, we use Dlib [31] to locate 68 facial key points on the driver's face. The result is shown in Figure 9.

FIGURE 9: The numbering of facial key points. (a) The location of the facial key points. (b) The number of the facial key points.

After obtaining the key points, we set the coordinate of each key point as (x_i, y_i) and use the key points to locate the regions of the eyes and mouth on the driver's face.

1) The region of the eyes
First, we offer the solution for locating the eyes' regions. From Figure 9, one eye has six key points. However, these points are near the eyeball. If these points are used to detect the region of an eye, the region will not include the upper and lower eyelids, thereby influencing the result of the subsequent evaluation. Therefore, we use the key points of the eyebrow and nose to define the scope of the eye and eye socket. The equation is as follows:

\[
\begin{cases}
le_x = \dfrac{x_i + x_j}{2} \\[4pt]
le_y = y_m + \dfrac{y_n - y_m}{4}
\end{cases}
\tag{9}
\]

In Eq. (9), x_i and x_j represent the X coordinates of the ith and jth key points, respectively.


y_n and y_m represent the Y coordinates of the nth and mth key points, respectively. le_x and le_y denote the vertex coordinates of the rectangular region of the eye. In our system, according to Fig. 9(b), when i is Point 18, j is Point 37; m is the point number with the minimum Y coordinate between Points 18 and 22, and n is the point number with the minimum Y coordinate between Points 38 and 39. As shown in Figure 10, we thereby obtain the vertex A of the left eye.

FIGURE 10: The ratio of mouth's width and height in different states.

After we obtain the coordinates of the upper-left vertex A and the lower-right vertex D of the region, we determine the eye socket region on the driver's face based on rectangular symmetry. Table 1 shows the calculation parameters for the positions of the two eyes.

TABLE 1: The calculation parameters of the position of the two eyes.
  Name       Position                          i   j   m               n
  Left Eye   Upper left vertex of the region   18  37  min(y18, y22)   min(y38, y39)
             Lower right vertex of the region  22  40  30              min(y41, y42)
  Right Eye  Upper left vertex of the region   23  43  min(y23, y27)   min(y44, y45)
             Lower right vertex of the region  27  46  30              min(y47, y48)
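As an illustration of Eq. (9) and Table 1, the sketch below computes the two corners of the left-eye rectangle from a (68, 2) array of Dlib landmarks. The point numbers follow Figure 9(b) (1-based), while the Python indices are 0-based; applying Eq. (9) to both vertices with the Table 1 parameters is an assumption made for the sketch.

```python
import numpy as np

def left_eye_region(pts):
    """Upper-left and lower-right corners of the left-eye rectangle (sketch of
    Eq. (9)/Table 1). pts is a (68, 2) array of (x, y) landmark coordinates."""
    def vertex(i, j, m_candidates, n_candidates):
        # x: midpoint of points i and j; y: Eq. (9) using the lowest-y candidates
        x = (pts[i - 1, 0] + pts[j - 1, 0]) / 2.0
        y_m = min(pts[p - 1, 1] for p in m_candidates)
        y_n = min(pts[p - 1, 1] for p in n_candidates)
        y = y_m + (y_n - y_m) / 4.0
        return np.array([x, y])

    upper_left = vertex(18, 37, m_candidates=(18, 22), n_candidates=(38, 39))
    lower_right = vertex(22, 40, m_candidates=(30,), n_candidates=(41, 42))
    return upper_left, lower_right
```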
B. EVALUATION OF THE DRIVER'S FATIGUE STATE
In this section, we discuss the principle of evaluating the driver's fatigue state. As shown in Figure 2, DriCare uses two factors to evaluate the state of the driver's fatigue: the states of the eyes and mouth. Unlike other methods, we propose a new assessment of the eye state to achieve higher accuracy: in addition to a CNN, we use the angle of the eye to evaluate the eye state. Moreover, we use the state of the single eye nearer the camera to assess the state of both eyes. Besides, DriCare also measures the state of the mouth to judge whether the driver is yawning. After these assessments, DriCare merges the results and evaluates the driver's degree of drowsiness.

1) Eye status recognition
a: Recognition based on CNN
We build an eight-layer CNN to recognize the eye state. Figure 11 shows the CNN architecture. We use two convolutional layers and maximum pooling layers to extract features from the eye region. The features are integrated by two fully connected layers. Finally, the output is used to judge whether the eye is open. The number of neurons in the output layer is 1, and the activation value is obtained using the sigmoid function, shown in Eq. (10):

\[ S(x) = \frac{1}{1 + e^{-x}} \tag{10} \]

In Eq. (10), the range of the result is [0, 1]. During training, the value of an open eye is 1, representing positive samples, and the value of a closed eye is 0, representing negative samples. A predicted value greater than 0.5 at the sigmoid output represents an open eye; otherwise, it represents a closed eye.

FIGURE 11: The architecture of CNN.
b: Recognition based on angle
Owing to the CNN's drawback that its accuracy for eye-closure recognition is poor, we use the angle of the eye to compensate for this limitation. After the CNN indicates that the driver's eye is open, we use the angle of the eye to validate the result. A blink is the process of the eye closing and opening. As discussed in the previous section, we identify the eye region in the video frame. As shown in Fig. 12(a), we use the key points in the eye region to assess the angle of the eye. The equation is as follows:


\[
\begin{cases}
d_{ij} = \sqrt{(y_j - y_i)^2 + (x_j - x_i)^2} \\[4pt]
A = \left(\arccos\dfrac{d_{ab}^2 + d_{ac}^2 - d_{bc}^2}{2\, d_{ab}\, d_{ac}}\right) \cdot \dfrac{180^\circ}{\pi}
\end{cases}
\tag{11}
\]

In Eq. (11), d_ij is the distance between Points i and j, and (x_i, y_i) and (x_j, y_j) represent the coordinates of Points i and j in the frame, respectively. In our system, (x_i, y_i) and (x_j, y_j) are the two averaged coordinates of the two eyelid points, and A is the angle of the eye. When we obtain the result, if it is bigger than the threshold, DriCare considers the eye to be open, and vice versa. We analyzed a large number of samples: in the eye-closed state, the angle of the eye is lower than 20°. Therefore, we set the threshold at 20°.

We assess the driver's degree of fatigue from three perspectives based on the angle of the eye: (1) the proportion of closed-eye frames among the total number of frames in 1 minute, (2) the continuous time of eye closure, and (3) the frequency of blinking. According to [42], [43], when the driver is awake, the proportion of closed-eye frames is less than 30%. Moreover, a single eye closure is shorter when the driver is awake, so when the driver's single eye-closure time exceeds 2 s, the driver is considered fatigued. Besides, when people are awake, they blink an average of 10 to 15 times per minute. However, when people are mildly tired, the number of blinks increases; in case of severe fatigue, the number of blinks will be lower because the eyes are closed most of the time. To detect fatigue based on the frequency of blinking, it is necessary to count the blinking frequency within 1 minute. If the blinking frequency is greater than 25 times/min or lower than 5 times/min, fatigue is indicated.

FIGURE 12: The state recognition for eye and mouth. (a) The angle of eye opening. (b) The width and height of the mouth.
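The eye-opening angle of Eq. (11) is the angle at the eye corner obtained with the law of cosines from three landmark points. A small sketch, assuming point a is the eye corner and points b and c are the averaged upper- and lower-eyelid points mentioned above:

```python
import math

EYE_ANGLE_THRESHOLD_DEG = 20.0  # below this angle the eye is treated as closed

def eye_angle(a, b, c):
    """Angle A at vertex a (Eq. 11); each argument is an (x, y) tuple."""
    def dist(p, q):
        return math.hypot(p[0] - q[0], p[1] - q[1])   # d_ij of Eq. (11)
    d_ab, d_ac, d_bc = dist(a, b), dist(a, c), dist(b, c)
    cos_a = (d_ab ** 2 + d_ac ** 2 - d_bc ** 2) / (2.0 * d_ab * d_ac)
    cos_a = max(-1.0, min(1.0, cos_a))                # guard against rounding error
    return math.degrees(math.acos(cos_a))

def eye_is_open(a, b, c):
    return eye_angle(a, b, c) > EYE_ANGLE_THRESHOLD_DEG
```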

2) Mouth status recognition
For the detection of fatigued driving, the features of the mouth are important because when a driver is drowsy, continuous yawns occur. Therefore, DriCare also uses these features in the evaluation. In Section IV we obtained the key points of the mouth; we use them to calculate the ratio of the mouth's height to its width. The equation is written as follows:

\[
\begin{cases}
H = \sqrt{(y_r - y_e)^2 + (x_r - x_e)^2} \\[2pt]
W = \sqrt{(y_u - y_v)^2 + (x_u - x_v)^2} \\[2pt]
f = \dfrac{H}{W}
\end{cases}
\tag{12}
\]

In Eq. (12), f is the ratio of the mouth's height to its width in one image frame. H represents the height of the mouth, and W denotes the width of the mouth. (x_r, y_r) is the coordinate of the vermilion tubercle, and (x_e, y_e) is the lowest point of the lower lip. (x_u, y_u) and (x_v, y_v) are the two coordinates of the angulus oris (mouth corners). If f is larger than the threshold, DriCare considers that the driver is opening the mouth, and vice versa. Based on measurements over a large number of samples, we set the threshold to 0.6 in this paper.

In practice, the opening of the mouth may resemble other situations, such as singing, speaking, and laughing, which present the same result. To reduce errors, we plot the curve of the width-height ratio of the mouth obtained in each frame, as shown in Figure 13. From this illustration, when the driver is yawning, the mouth opens continuously for a longer time and the wave peaks are wider; when the driver is speaking, the mouth opens continuously for a shorter time and the wave peaks are narrower. Hence, we use Eq. (13) to calculate the duration time ratio R of the opening mouth, which can discriminate between actions such as yawning and speaking. The equation is written as follows:

\[ R = \frac{n}{m} \times 100\% \tag{13} \]

where m represents the number of frames over a period of time and n is the number of frames in which f exceeds the threshold. According to [44], [45], the whole yawning process generally lasts about 7 s, and the video frames whose f is higher than the threshold span approximately 3–4 s. Therefore, we set R to 50%. When judging whether yawning occurs, we count, within a window of 7 s, the number of frames whose mouth height-to-width ratio is higher than the threshold.

FIGURE 13: The ratio of mouth's width and height in different states.


The proportion of these frames among the total number of detections in the window should be greater than 50%. If this is established, we consider the driver to be yawning. We count the number of yawns in one minute, and if the number of yawns is more than two per minute, the driver is said to be drowsy.
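The mouth measurements of Eqs. (12)–(13) translate directly into code. The sketch below computes the height-to-width ratio f for one frame and the duration ratio R over a sliding window, using the thresholds of 0.6 and 50% stated above; the 7 s window is expressed in frames via an assumed frame rate, and the function names are assumptions.

```python
import math

MOUTH_RATIO_THRESHOLD = 0.6      # f above this value counts as an open mouth
DURATION_RATIO_THRESHOLD = 0.5   # R >= 50% within the window indicates a yawn

def mouth_ratio(upper_lip, lower_lip, left_corner, right_corner):
    """f = H / W of Eq. (12); each argument is an (x, y) landmark coordinate."""
    height = math.hypot(upper_lip[0] - lower_lip[0], upper_lip[1] - lower_lip[1])
    width = math.hypot(left_corner[0] - right_corner[0], left_corner[1] - right_corner[1])
    return height / width

def is_yawning(ratios, fps=25, window_seconds=7):
    """Eq. (13): within the last `window_seconds` of frames, count the frames whose
    ratio exceeds the mouth threshold and compare R = n/m with 50%."""
    window = ratios[-int(fps * window_seconds):]     # the m most recent frames
    if not window:
        return False
    n = sum(1 for f in window if f > MOUTH_RATIO_THRESHOLD)
    return n / len(window) >= DURATION_RATIO_THRESHOLD
```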
However, according to [46], when the driver is bored or engaged in tedious activities, the yawning frequency also increases. Hence, in order to eliminate this error, we set weights for these features separately and then compute the total weight; if the total weight is higher than the threshold, DriCare considers the driver to be drowsy. We summarize the entire detection process for DriCare in Algorithm 2.

Algorithm 2 Fatigue detection algorithm for DriCare
Input: frames of the video
Output: evaluation of the degree of driver fatigue
  Load the frames of the video
  Assess the states of the eye and mouth
  Calculate r, the ratio of eye-closure frames in 1 minute, and t, the duration of eye closure
  Calculate b, the frequency of blinking, and y, the number of yawns in 1 minute
  if r > 30% then
    Wr = 1
  end if
  if t > 2 s and the driver is not yawning then
    Wt = 1
  end if
  if b > 25 or b < 5 then
    Wb = 1
  end if
  if y > 2 then
    Wy = 1
  end if
  Calculate T, the total value of these weights (T = Wr + Wt + Wb + Wy)
  if T > 2 then
    The driver is drowsy
  else
    The driver is awake
  end if
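Algorithm 2 amounts to summing four indicator weights and comparing the total with a threshold of 2; a direct sketch (input names are assumptions, thresholds are the ones stated above):

```python
def evaluate_fatigue(closed_ratio, closure_time_s, blinks_per_min, yawns_per_min,
                     is_yawning_now=False):
    """Sketch of Algorithm 2: each criterion contributes a weight of 1 when it
    exceeds its threshold; the driver is judged drowsy when the total exceeds 2."""
    w_r = 1 if closed_ratio > 0.30 else 0                           # closed-eye frame ratio
    w_t = 1 if (closure_time_s > 2 and not is_yawning_now) else 0   # continuous eye closure
    w_b = 1 if (blinks_per_min > 25 or blinks_per_min < 5) else 0   # blink frequency
    w_y = 1 if yawns_per_min > 2 else 0                             # yawning frequency
    total = w_r + w_t + w_b + w_y
    return total > 2   # "drowsy" when T > 2, as in Algorithm 2
```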
VI. EXPERIMENTS
A. DATASETS AND SETUP
Figure 14(a) shows a prototype of DriCare comprising a commercial Qihu 360 camera and an Intel Core i7 MacBook laptop at 2.5 GHz with 16 GB of memory to simulate the cloud server. Figure 14(b) shows the system interface. We used 10 volunteers to collect the video data captured by the vehicle camera. Each volunteer simulated the drowsy and alert driving states, and each video is 1 h long. For the evaluation of drowsiness, we use the CelebA [47] and YawDD [48] datasets and the volunteer video data to assess the performance of DriCare. We used Python 3.6, OpenCV 3.4, TensorFlow 1.8, and Tkinter 8.6 to build the software environment required for our experiments.

FIGURE 14: Experimental environment and interface of the system. (a) The hardware of the implemented system. (b) The main interface of the system.

B. EXPERIMENTAL EVALUATION
We tested the DriCare performance and compared it with other methods under the same conditions.

1) Performance of MC-KCF
The Euclidean distance between the predicted and real values of the target border is used to evaluate the performance of the tracking algorithms. We compare the MC-KCF algorithm with the other tracking algorithms in different scenarios. The main test scenarios are fast motion, the target disappearing from the field of vision, and target rotation. The average test results in each scenario are counted as the final experimental results, as shown in Figure 15(a).

From Figure 15(a), the MC-KCF algorithm demonstrates the best tracking accuracy. In a complex environment, the accuracy of MC-KCF is nearly 90%. Face tracking in the driving environment is simpler than in other environments because the driver's face moves less and at moderate speed; moreover, the face remains visible in the field of vision. Figure 15(b) shows the results of the MC-KCF performance test against the other tracking algorithms, revealing that the MC-KCF algorithm produces the best performance, with the accuracy reaching approximately 95% when the Euclidean distance is within 20 px.

FIGURE 15: The accuracy of face tracking in different environments. (a) In the complex environment. (b) In the driving-cab environment.

As shown in Table 2, we further compare the different methods in terms of speed. Although the KCF algorithm offers the highest speed, its accuracy is worse than those of MC-KCF and KCF+CNN. The MC-KCF algorithm has the best accuracy for face tracking; its accuracy is nearly 20% higher than that of the Struck algorithm, although its speed is slightly lower than that of KCF. The MC-KCF algorithm can process 25 video frames per second, which meets the requirement of our system. Thus, we consider that the MC-KCF algorithm performs better and satisfies the practical requirements for speed and accuracy.

TABLE 2: Comparison of other performances (video frame size 1280 × 720, human face size 240 × 300)
  Method    Accuracy   Frames per second
  MTCNN     93.2%      3 fps
  KCF       91%        188 fps
  DSST      85%        6 fps
  Struck    76%        8 fps
  KCF+CNN   93%        26 fps
  MC-KCF    95%        25 fps

2) Performance of detection methods
To test the performance of our evaluation algorithm, we compare our method for evaluating the state of the eye with other methods.


Figure 16 shows the result, indicating that the accuracy of the eye-opening-angle method is 95.2%, the highest among the evaluated methods. Additionally, its closed-eye recognition accuracy is the highest, at 93.5%. The success rate of identifying a closed eye is significantly improved by our method; it is 10% higher than that of HoughCircle.

FIGURE 16: The comparison of eye state recognition methods.

Figures 17(a) and (b) show the recognition results for the states of the eye and mouth during wakefulness and drowsiness, respectively. The horizontal axis represents the number of video frames; the left vertical axis represents the opening of the eyes, wherein 1 represents the eye opening and 0 represents the eye closing; the right vertical axis represents the ratio of the height and width of the mouth. The experimental results show that when the driver is awake, the blinking frequency and eye-closing time are low. However, when the driver is tired, the blinking frequency and eye-closing time are high, and sometimes the driver will be yawning.

FIGURE 17: The recognition result of eye and mouth in different states. (a) In the awake state. (b) In the drowsiness state.

3) Performance of DriCare
To test the performance of our system, we measure the system in different experimental environments. The result is shown in Table 3.

TABLE 3: The performance in different environments
  Number   Driving environment     Detection rate   Frames per second
  1        Bright & Glasses        92%              18 fps
  2        Bright & No glasses     92.6%            18 fps
  3        Darkness & Glasses      91.1%            16 fps
  4        Darkness & No glasses   91.6%            16 fps

From Table 3, our system provides the best accuracy when the cab is bright and the driver wears no glasses. If the driver wears glasses and the driving environment is slightly dim, the accuracy of fatigue-driving detection is reduced. Regardless of the environmental condition, the average accuracy of our method is approximately 92%. The average processing speed is 18 fps when the environment is bright; when the environment is dark, the speed is 16 fps.

For now, there is no image-based public driver-drowsiness recognition dataset that can be used to estimate the efficiency of our method. Therefore, we cannot compare the effectiveness of DriCare with the methods of [37]–[39] on identical data, so we compare our method with other methods whose results were obtained from video-based datasets. The results are shown in Table 4.

TABLE 4: Comparison of results between DriCare and other state-of-the-art methods
  Research      Methodology                   Accuracy (%)
  Zhang [37]    boost-LBP + SVM               85.9
  Picot [38]    blinking feature + EOG        82.1
  Akrout [39]   blinking + pose estimation    90.2
  DriCare       MC-KCF + blinking + yawning   93.6

Table 4 reveals that, compared with existing methods such as Zhang [37], Picot [38], and Akrout [39], the average accuracy of DriCare is better; in particular, the accuracy of DriCare is 11% higher than that of Picot [38]. Thus, DriCare can meet our requirements in terms of estimation accuracy.


VII. CONCLUSION
We propose a novel system for evaluating the driver's level of fatigue based on face tracking and facial key point detection. We design a new algorithm, MC-KCF, to track the driver's face, using CNN and MTCNN to improve the original KCF algorithm. We define the facial regions of detection based on facial key points. Moreover, we introduce a new evaluation method for drowsiness based on the states of the eyes and mouth. DriCare is almost a real-time system, as it has a high operation speed. From the experimental results, DriCare is applicable to different circumstances and can offer stable performance.

REFERENCES
[1] International Organization of Motor Vehicle Manufacturers, "Provisional registrations or sales of new vehicles," http://www.oica.net/wp-content/uploads/, 2018.
[2] Wards Intelligence, "World vehicles in operation by country, 2013-2017," http://subscribers.wardsintelligence.com/data-browse-world, 2018.
[3] National Highway Traffic Safety Administration, "Traffic safety facts 2016," https://crashstats.nhtsa.dot.gov, 2018.
[4] G. Borghini, L. Astolfi, et al., "Measuring neurophysiological signals in aircraft pilots and car drivers for the assessment of mental workload, fatigue and drowsiness," Neuroscience and Biobehavioral Reviews, 2014.
[5] Attention Technologies, "S.a.m.g-3-steering attention monitor," www.zzzzalert.com, 1999.
[6] Smart Eye, "Smarteye," https://smarteye.se/, 2018.
[7] J. F. Henriques, R. Caseiro, P. Martins, et al., "High-speed tracking with kernelized correlation filters," IEEE Transactions on Pattern Analysis and Machine Intelligence, 2015.
[8] P. F. Felzenszwalb, R. B. Girshick, D. McAllester, and D. Ramanan, "Object detection with discriminatively trained part-based models," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 32, no. 9, pp. 1627–1645, Sept. 2010.
[9] Y. Sun, X. Wang, and X. Tang, "Deep convolutional network cascade for facial point detection," in Proc. IEEE Conf. Computer Vision and Pattern Recognition, June 2013, pp. 3476–3483.
[10] S. Ren, X. Cao, Y. Wei, and J. Sun, "Face alignment at 3000 FPS via regressing local binary features," in Proc. IEEE Conf. Computer Vision and Pattern Recognition, June 2014, pp. 1685–1692.
[11] E. Zhou, H. Fan, Z. Cao, Y. Jiang, and Q. Yin, "Extensive facial landmark localization with coarse-to-fine convolutional network cascade," in Proc. IEEE Int. Conf. Computer Vision Workshops, Dec. 2013, pp. 386–391.
[12] W. Wierwille, "Overview of research on driver drowsiness definition and driver drowsiness detection," Proceedings: International Technical Conference on the Enhanced Safety of Vehicles, vol. 1995, pp. 462–468, 1995.
[13] R. Grace, V. E. Byrne, D. M. Bierman, J. Legrand, D. Gricourt, B. K. Davis, J. J. Staszewski, and B. Carnahan, "A drowsy driver detection system for heavy vehicles," in Proc. 17th DASC AIAA/IEEE/SAE Digital Avionics Systems Conf. (Cat. No.98CH36267), Oct. 1998, vol. 2, pp. I36/1–I36/8.
[14] L. Li, Y. Chen, and Z. Li, "Yawning detection for monitoring driver fatigue based on two cameras," in Proc. 12th Int. IEEE Conf. Intelligent Transportation Systems, Oct. 2009, pp. 1–6.
[15] S. Abtahi, B. Hariri, and S. Shirmohammadi, "Driver drowsiness monitoring based on yawning detection," in Proc. IEEE Int. Instrumentation and Measurement Technology Conf., May 2011, pp. 1–4.
[16] X. Fan, B. Yin, and Y. Sun, "Yawning detection for monitoring driver fatigue," in Proc. Int. Conf. Machine Learning and Cybernetics, Aug. 2007, vol. 2, pp. 664–668.
[17] F. N. Iandola, S. Han, et al., "SqueezeNet: AlexNet-level accuracy with 50x fewer parameters and <0.5MB model size," ICLR, 2017.
[18] K. Zhang, Z. Zhang, Z. Li, et al., "Joint face detection and alignment using multitask cascaded convolutional networks," IEEE Signal Processing Letters, 2016.
[19] B. D. Lucas and T. Kanade, "An iterative image registration technique with an application to stereo vision," in Proceedings of the 7th International Joint Conference on Artificial Intelligence - Volume 2, San Francisco, CA, USA, 1981, IJCAI'81, pp. 674–679, Morgan Kaufmann Publishers Inc.
[20] D. S. Bolme, J. R. Beveridge, B. A. Draper, and Y. M. Lui, "Visual object tracking using adaptive correlation filters," in Proc. IEEE Computer Society Conf. Computer Vision and Pattern Recognition, June 2010, pp. 2544–2550.
[21] J. F. Henriques, R. Caseiro, P. Martins, and J. Batista, "Exploiting the circulant structure of tracking-by-detection with kernels," in Proceedings of the 12th European Conference on Computer Vision - Volume Part IV, Berlin, Heidelberg, 2012, ECCV'12, pp. 702–715, Springer-Verlag.
[22] Y. Li and J. Zhu, "A scale adaptive kernel correlation filter tracker with feature integration," in Computer Vision - ECCV 2014 Workshops, L. Agapito, M. M. Bronstein, and C. Rother, Eds., Cham, 2015, pp. 254–265, Springer International Publishing.


[23] M. Danelljan, G. Häger, F. S. Khan, and M. Felsberg, "Discriminative scale space tracking," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 39, no. 8, pp. 1561–1575, Aug. 2017.
[24] N. Wang and D.-Y. Yeung, "Learning a deep compact image representation for visual tracking," in Proceedings of the 26th International Conference on Neural Information Processing Systems - Volume 1, USA, 2013, NIPS'13, pp. 809–817, Curran Associates Inc.
[25] M. Danelljan, G. Häger, F. S. Khan, and M. Felsberg, "Learning spatially regularized correlation filters for visual tracking," in Proc. IEEE Int. Conf. Computer Vision (ICCV), Dec. 2015, pp. 4310–4318.
[26] C. Ma, J. Huang, X. Yang, and M. Yang, "Robust visual tracking via hierarchical convolutional features," IEEE Transactions on Pattern Analysis and Machine Intelligence, p. 1, 2018.
[27] M. Danelljan, G. Häger, F. S. Khan, and M. Felsberg, "Convolutional features for correlation filter based visual tracking," in Proc. IEEE Int. Conf. Computer Vision Workshop (ICCVW), Dec. 2015, pp. 621–629.
[28] J. Valmadre, L. Bertinetto, J. Henriques, A. Vedaldi, and P. H. S. Torr, "End-to-end representation learning for correlation filter based tracking," in Proc. IEEE Conf. Computer Vision and Pattern Recognition (CVPR), July 2017, pp. 5000–5008.
[29] Y. Wu, T. Hassner, K. Kim, G. Medioni, and P. Natarajan, "Facial landmark detection with tweaked convolutional neural networks," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 40, no. 12, pp. 3067–3074, Dec. 2018.
[30] M. Kowalski, J. Naruniec, and T. Trzcinski, "Deep alignment network: A convolutional neural network for robust face alignment," in Proc. IEEE Conf. Computer Vision and Pattern Recognition Workshops (CVPRW), July 2017, pp. 2034–2043.
[31] D. E. King, "Dlib-ml: A machine learning toolkit," J. Mach. Learn. Res., vol. 10, pp. 1755–1758, Dec. 2009.
[32] B. Warwick, N. Symons, X. Chen, and K. Xiong, "Detecting driver drowsiness using wireless wearables," in Proc. IEEE 12th Int. Conf. Mobile Ad Hoc and Sensor Systems, Oct. 2015, pp. 585–588.
[33] G. Li, B. Lee, and W. Chung, "Smartwatch-based wearable EEG system for driver drowsiness detection," IEEE Sensors Journal, vol. 15, no. 12, pp. 7169–7180, Dec. 2015.
[34] S. Jung, H. Shin, and W. Chung, "Driver fatigue and drowsiness monitoring system with embedded electrocardiogram sensor on steering wheel," IET Intelligent Transport Systems, vol. 8, no. 1, pp. 43–50, Feb. 2014.
[35] M. Omidyeganeh, A. Javadtalab, and S. Shirmohammadi, "Intelligent driver drowsiness detection through fusion of yawning and eye closure," in Proc. IEEE Int. Conf. Virtual Environments Human-Computer Interfaces and Measurement Systems, Sept. 2011, pp. 1–6.
[36] A. Dasgupta, D. Rahman, and A. Routray, "A smartphone-based drowsiness detection and warning system for automotive drivers," IEEE Transactions on Intelligent Transportation Systems, pp. 1–10, 2018.
[37] Y. Zhang and C. Hua, "Driver fatigue recognition based on facial expression analysis using local binary patterns," Optik, vol. 126, pp. 4501–4505, 2015.
[38] A. Picot, S. Charbonnier, A. Caplier, and N.-S. Vu, "Using retina modelling to characterize blinking: comparison between EOG and video analysis," Machine Vision and Applications, vol. 23, no. 6, pp. 1195–1208, Nov. 2012.
[39] B. Akrout and W. Mahdi, "Spatio-temporal features for the automatic control of driver drowsiness state and lack of concentration," Machine Vision and Applications, vol. 26, no. 1, pp. 1–13, Jan. 2015.
[40] R. Oyini Mbouna, S. G. Kong, and M. Chun, "Visual analysis of eye state and head pose for driver alertness monitoring," IEEE Transactions on Intelligent Transportation Systems, vol. 14, no. 3, pp. 1462–1469, Sept. 2013.
[41] M. S. Kamel and L. Guan, "Histogram equalization utilizing spatial correlation for image enhancement," Proceedings of SPIE - The International Society for Optical Engineering, 1989.
[42] S. Kaplan, M. A. Guvensan, A. G. Yavuz, and Y. Karalurt, "Driver behavior analysis for safe driving: A survey," IEEE Transactions on Intelligent Transportation Systems, vol. 16, no. 6, pp. 3017–3032, Dec. 2015.
[43] R. K. Satzoda and M. M. Trivedi, "Drive analysis using vehicle dynamics and vision-based lane semantics," IEEE Transactions on Intelligent Transportation Systems, vol. 16, no. 1, pp. 9–18, Feb. 2015.
[44] J. Barbizet, "Yawning," J. Neurol. Neurosurg. Psychiatry, vol. 21, pp. 203–209, 1958.
[45] J. R. Eguibar, C. A. Uribe, C. Cortes, A. Bautista, and A. C. Gallup, "Yawning reduces facial temperature in the high-yawning subline of Sprague-Dawley rats," BMC Neuroscience, vol. 18, no. 1, p. 3, Jan. 2017.
[46] A. Franzen, S. Mader, and F. Winter, "Contagious yawning, empathy, and their relation to prosocial behavior," Journal of Experimental Psychology: General, May 2018.
[47] Z. Liu, P. Luo, X. Wang, and X. Tang, "Deep learning face attributes in the wild," in Proc. IEEE Int. Conf. Computer Vision (ICCV), Dec. 2015, pp. 3730–3738.
[48] S. Abtahi, M. Omidyeganeh, S. Shirmohammadi, and B. Hariri, "YawDD: A yawning detection dataset," in Proceedings of the 5th ACM Multimedia Systems Conference, New York, NY, USA, 2014, MMSys '14, pp. 24–28, ACM.