Real-Time Driver-Drowsiness Detection System Using Facial Features
ABSTRACT The face, an important part of the body, conveys a great deal of information. When a driver is fatigued, facial expressions such as the frequency of blinking and yawning differ from those in the normal state. In this paper, we propose a system called DriCare, which detects the driver's fatigue status, including yawning, blinking, and the duration of eye closure, from video images, without requiring any device to be worn on the body. To overcome the shortcomings of previous algorithms, we introduce a new face-tracking algorithm that improves tracking accuracy. Further, we design a new detection method for facial regions based on 68 key points, and we use these regions to evaluate the driver's state. By combining the features of the eyes and mouth, DriCare can alert the driver with a fatigue warning. The experimental results show that DriCare achieves around 92% accuracy.
INDEX TERMS convolutional neural network, fatigue detection, feature location, face tracking
track algorithms based on the correlation filter, their training is time-consuming. Hence, these algorithms cannot track the object in real time in a real environment. In this study, we propose the MC-KCF algorithm, which is based on the correlation filter and deep learning. It uses CNN and MTCNN to offset the limitations of KCF and uses KCF to track objects. Thus, the algorithm can track the driver's face in real time in our system.

Facial landmark recognition. The purpose of facial keypoint recognition is to obtain the locations of the eyebrows, eyes, lips, and nose in the face. With the development of deep learning, Sun et al. [9] were the first to introduce DCNN, based on CNN, to detect human facial keypoints. This algorithm recognizes only 5 facial keypoints, although it is very fast. To obtain higher precision, Zhou et al. [11] employed FACE++, which optimizes DCNN and can recognize 68 facial keypoints; however, the model is very large and its operation is complicated. Wu et al. [29] proposed the Tweaked Convolutional Neural Network (TCNN), which uses a Gaussian Mixture Model (GMM) to improve different layers of the CNN. However, the robustness of TCNN depends excessively on the data. Kowalski et al. [30] introduced the Deep Alignment Network (DAN) to recognize facial keypoints, which performs better than other algorithms. Unfortunately, DAN requires large models and computation based on complicated functions. Therefore, to meet the real-time requirement, DriCare uses Dlib [31] to recognize facial keypoints.

Driver drowsiness detection. Driver drowsiness detection can be divided into two types: contact approaches [32]–[34] and non-contact approaches [5], [6], [35], [36]. In contact approaches, drivers wear or touch devices that acquire physiological parameters for detecting their level of fatigue. Warwick et al. [32] placed the BioHarness 3 on the driver's body to collect data and measure drowsiness. Li et al. [33] used a smartwatch to detect driver drowsiness based on electroencephalographic (EEG) signals. Jung et al. [34] modified the steering wheel and embedded a sensor to monitor the driver's electrocardiogram (ECG) signal. However, owing to the cost and installation requirements of contact approaches, they cannot be deployed ubiquitously. The other category employs contact-free approaches, in which the measuring device does not need to touch the driver. For example, Omidyeganeh et al. [35] used the driver's facial appearance captured by a camera to detect drowsiness, but this method does not run in real time. Zhang et al. [37] used fatigue facial-expression recognition based on Local Binary Pattern (LBP) features and Support Vector Machines (SVM) to estimate driver fatigue, but the complexity of this algorithm is higher than that of ours. Moreover, Picot et al. [38] proposed a method that uses the electrooculogram (EOG) signal and blinking features for drowsiness detection. Akrout et al. [39] and Oyini Mbouna et al. [40] used fusion systems for drowsiness detection based on eye state and head position. Different from these methods, we employ simple formulae and evaluations, which make the results easier to measure.

III. DRICARE OVERVIEW
The proposed system, DriCare, is built from a commercial camera mounted in the automobile, a cloud server that processes the video data, and a commercial cellphone that stores the result. Figure 1 shows the structure of the DriCare system. While driving, the automobile's camera captures the driver's portrait and uploads the video stream to the cloud server in real time. The cloud server then analyzes the video and detects the driver's degree of drowsiness. In this stage, three main parts are analyzed: the driver's face tracking, facial key-region recognition, and the driver's fatigue state. To meet the real-time requirement of the system, we use the MC-KCF algorithm to track the driver's face and recognize the facial key regions based on key-point detection. Then, the cloud server estimates the driver's state from the changes in the states of the eyes and mouth. The workflow is shown in Figure 2. Finally, the cloud server transmits the result to the driver's cellphone and other apps, through which a warning tone is issued if the driver is observed to be drowsy.

FIGURE 2: System workflow.
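As a rough illustration of this workflow, the sketch below drives the per-frame loop in Python (the language used for the prototype in Section VI). The five callables are placeholders standing in for the preprocessing, MC-KCF tracking, key-region detection, fatigue evaluation, and alerting components described in the following sections; they are assumptions for illustration, not the authors' implementation.

import cv2

def run_dricare(source, preprocess, track_face, locate_regions, estimate_fatigue, warn):
    # source: camera index or path of the uploaded video stream; the five
    # callables stand in for the components described in Sections IV-VI
    cap = cv2.VideoCapture(source)
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        frame = preprocess(frame)                 # illumination enhancement (Sec. IV-A)
        box = track_face(frame)                   # MC-KCF face tracking (Sec. IV)
        if box is None:
            continue
        eyes, mouth = locate_regions(frame, box)  # key-point based eye/mouth regions
        if estimate_fatigue(eyes, mouth):         # eye and mouth state evaluation
            warn()                                # e.g. push a warning tone to the phone app
    cap.release()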
IV. DRIVER FACE TRACKING BY MC-KCF
In this section, we describe the principle of driver face tracking in DriCare. Owing to the complexity of the real environment, each frame of the video requires preprocessing to meet the tracking requirements.

A. PRE-PROCESS
During the detection process, the quality of the images is degraded and the features of the human face become unclear if the illumination intensity within the cab changes during driving.
This usually occurs under overcast skies, in rain, and at night. For detection accuracy, we use an illumination enhancement method to preprocess images before tracking the driver's face: the histogram equalization (HE) algorithm [41] is applied to improve the brightness of the image frame.

To determine whether light enhancement is required for an image frame, DriCare evaluates the brightness of the image. We convert the RGB image into the YCbCr color space, in which Y represents the luminance value, and use Eq. (1) to compute the mean luminance M around the driver's face:

M = ( Σ_{i}^{n} L ) / (n − i)                                                  (1)

where L denotes the luminance value of each pixel in the YCbCr space, and i and n represent the first and last serial numbers of the driver's facial pixels in the image, respectively, so that n − i is the total number of facial pixels. If M is lower than a threshold, the illumination of the image is enhanced using the HE algorithm; otherwise, the image is kept unchanged. After measuring a large number of samples, we set the threshold to 60. Figure 3 shows the result of the illumination enhancement.

FIGURE 3: The result of histogram equalization. (a) Original image. (b) The image after illumination enhancement.
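The sketch below shows one way to implement this brightness check and enhancement with OpenCV. The face bounding box argument and the use of cv2.equalizeHist on the luminance channel are assumptions for illustration; the threshold of 60 is the value stated above.

import cv2
import numpy as np

BRIGHTNESS_THRESHOLD = 60   # threshold on M chosen in the text after sampling

def enhance_if_dark(frame_bgr, face_box):
    # face_box: (x, y, w, h) bounding box of the tracked face
    x, y, w, h = face_box
    ycrcb = cv2.cvtColor(frame_bgr, cv2.COLOR_BGR2YCrCb)
    face_luma = ycrcb[y:y + h, x:x + w, 0]
    m = float(np.mean(face_luma))            # Eq. (1): mean luminance over the face pixels
    if m < BRIGHTNESS_THRESHOLD:
        ycrcb[:, :, 0] = cv2.equalizeHist(ycrcb[:, :, 0])   # HE on the Y channel only
        return cv2.cvtColor(ycrcb, cv2.COLOR_YCrCb2BGR)
    return frame_bgr                          # bright enough, keep the frame as is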
FIGURE 4: The performance of the original KCF and the MC-KCF algorithm. (a) The correct tracking result. (b) The tracking window drifting. (c) The tracking result of the MC-KCF algorithm.

B. THE PRINCIPLE OF MC-KCF
As previously discussed, the KCF algorithm relies on the FHOG feature [7]. Therefore, in a complex environment and during long-term operation, the tracking window will drift, as shown in Figure 4(b). We propose the MC-KCF algorithm instead of the original KCF algorithm to track the human face. In the MC-KCF algorithm, a new feature set comprises the FHOG feature, used as in the KCF algorithm, together with CNN features. Below, we explain how the FHOG and CNN features are extracted and how they are integrated.

1) FHOG feature extraction
In our algorithm, the FHOG feature [8] is a key factor for human-face tracking. To extract the FHOG feature and to simplify the calculation, the image is first converted to grayscale. Then, we calculate the gradient G and the gradient orientation α at each pixel, as shown in Eq. (2):

Gx(x, y) = H(x + 1, y) − H(x − 1, y)
Gy(x, y) = H(x, y + 1) − H(x, y − 1)
G = sqrt( Gx(x, y)^2 + Gy(x, y)^2 )                                            (2)
α = arctan( Gx(x, y) / Gy(x, y) ),   α ∈ (0°, 360°)

where H(x, y), Gx(x, y), and Gy(x, y) represent the pixel value, horizontal gradient, and vertical gradient at (x, y), respectively.

Then, we segment the image into n × n cells. According to [8], the gradient orientation is categorized into either 9 bins of contrast-sensitive orientations or 18 bins of contrast-insensitive orientations. If a pixel in a cell belongs to a given orientation, the value of that orientation bin is increased by 1. Thus, each cell has a 9-dimensional and an 18-dimensional histogram.

The gradient of each cell is related to its internal pixels and the 4 cells around it. After calculating the gradient histograms, we use Eqs. (3) and (4) for normalization and truncation:

N_{a,b}(i, j) = ( C(i, j)^2 + C(i + a, j)^2 + C(i + a, j + b)^2 + C(i, j + b)^2 )^{1/2}        (3)

H(i, j) = [ Tα( C(i, j) / N_{−1,−1}(i, j) ),
            Tα( C(i, j) / N_{+1,−1}(i, j) ),
            Tα( C(i, j) / N_{+1,+1}(i, j) ),
            Tα( C(i, j) / N_{−1,+1}(i, j) ) ]                                  (4)

In Eq. (3), C(i, j) denotes the 9- or 18-dimensional eigenvector of the cell at (i, j), and N_{a,b}(i, j) represents a normalization factor, where a, b ∈ {−1, 1} index the four different normalization factors. In Eq. (4), H(i, j) is a feature vector, and Tα(x) denotes the truncation function: if a value in x is larger than α, it is set to α.

After normalization and truncation, the 9-dimensional feature vector becomes a 36-dimensional feature vector, and the 18-dimensional eigenvector becomes a 72-dimensional feature vector; in total, there is a 108-dimensional feature vector per cell. We then arrange this eigenvector as a 4 × 27 matrix and finally obtain the 31-dimensional HOG feature, named the FHOG feature, using matrix addition.
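A minimal NumPy sketch of the per-pixel gradient computation of Eq. (2) and the per-cell orientation histogram is given below. It uses the conventional arctan2(Gy, Gx) form for the orientation; the normalization, truncation, and reduction to 31 dimensions described above follow [8] and are omitted here.

import numpy as np

def gradient_and_orientation(gray):
    # gray: 2-D array of the grayscale frame
    h = gray.astype(np.float64)
    gx = np.zeros_like(h)
    gy = np.zeros_like(h)
    gx[:, 1:-1] = h[:, 2:] - h[:, :-2]            # Gx(x, y) = H(x+1, y) - H(x-1, y)
    gy[1:-1, :] = h[2:, :] - h[:-2, :]            # Gy(x, y) = H(x, y+1) - H(x, y-1)
    mag = np.sqrt(gx ** 2 + gy ** 2)              # gradient magnitude G
    ang = np.degrees(np.arctan2(gy, gx)) % 360.0  # orientation in [0, 360)
    return mag, ang

def cell_histogram(ang_cell, bins=18):
    # One 18-bin orientation histogram for a single n x n cell: each pixel
    # adds 1 to the bin of its orientation, as described in the text.
    idx = (ang_cell // (360.0 / bins)).astype(int) % bins
    hist = np.zeros(bins)
    np.add.at(hist, idx.ravel(), 1.0)
    return hist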
2) CNN feature extracted by SqueezeNet 1.1
SqueezeNet is a small CNN architecture [17] with very fast operation. Figure 5 shows the architecture of SqueezeNet 1.1, which includes a standalone convolution layer (conv1), 3 max-pooling layers, 8 fire modules (Fire2–9), a final convolution layer (conv10), and one global average-pooling layer.

SqueezeNet uses the fire module instead of the traditional convolution layer to reduce the number of network parameters and improve accuracy. The fire module comprises a squeeze convolution layer of 1 × 1 filters feeding into an expand layer with a mix of 1 × 1 and 3 × 3 convolution filters, as shown in Figure 6. Three tunable dimensions (the numbers of 1 × 1 squeeze filters and of 1 × 1 and 3 × 3 expand filters) control the size of each fire module.
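For reference, a fire module of this form can be written in a few lines with tf.keras; the filter counts below follow the public SqueezeNet 1.1 definition [17] and are illustrative rather than taken from the DriCare implementation.

import tensorflow as tf
from tensorflow.keras import layers

def fire_module(x, squeeze_filters, expand_filters):
    s = layers.Conv2D(squeeze_filters, 1, activation="relu")(x)                  # squeeze 1x1
    e1 = layers.Conv2D(expand_filters, 1, activation="relu")(s)                  # expand 1x1
    e3 = layers.Conv2D(expand_filters, 3, padding="same", activation="relu")(s)  # expand 3x3
    return layers.Concatenate()([e1, e3])                                        # channel-wise concat

# Example: conv1 and Fire2 of SqueezeNet 1.1 (16 squeeze filters, 64 + 64 expand filters).
inputs = tf.keras.Input(shape=(224, 224, 3))
x = layers.Conv2D(64, 3, strides=2, activation="relu")(inputs)                   # conv1
x = layers.MaxPooling2D(3, strides=2)(x)
x = fire_module(x, 16, 64)                                                       # Fire2
model = tf.keras.Model(inputs, x)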
3) Multi-feature fusion
To avoid large and redundant CNN features, DriCare uses the CNN features obtained from the 1 × 1 convolution filters in the expand layers of Fire 5 and Fire 9 of the SqueezeNet model. After feature extraction from a D1 × G1 original image in the MC-KCF algorithm, we obtain an FHOG feature of size D2 × G2 and two CNN features of sizes D3 × G3 and D4 × G4. Obviously, the sizes of the three features differ, so we adjust them to the same size. The adjustment is written as follows:

D = Da × θ
G = Ga × ϕ                                                                     (5)

where D and G denote the standard length and width, respectively, Da and Ga represent the original length and width of each of the three features, and θ and ϕ are the scaling factors.

FIGURE 7: The face-tracking result for different feature weights.

Therefore, the total dimension of the CNN features is 384, which is larger than the dimension of the original FHOG feature (a 31-dimensional HOG feature). Similar to the structure of the KCF algorithm, in the MC-KCF algorithm we use each feature to train its own classifier using kernel ridge regression, which is written as follows [7]:

α̂ = y (K + λI)^{−1} = ŷ / (k̂^{xx} + λ)                                       (6)

Since the change of the tracked object over a few frames is small, we update the model only every N frames to increase the computing speed of the model and improve the real-time performance of the system; we set N to 3. The whole process is shown in Figure 8.
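The sketch below illustrates these two steps under simplifying assumptions: each feature map is rescaled to the common size of Eq. (5) with bilinear interpolation, and the per-feature classifier of Eq. (6) is trained in the Fourier domain for a linear kernel. The Gaussian kernel and the multi-channel handling of the full tracker are omitted; this is an illustrative reimplementation, not the authors' code.

import numpy as np
import cv2

def resize_feature(feat, std_size):
    # feat: H x W (x C) feature map; std_size: (width, height) target size from Eq. (5)
    return cv2.resize(feat, std_size, interpolation=cv2.INTER_LINEAR)

def train_kcf_classifier(x, y, lam=1e-4):
    # x: 2-D feature patch, y: Gaussian-shaped regression target of the same size;
    # returns alpha_hat of Eq. (6) for a linear kernel
    x_hat = np.fft.fft2(x)
    kxx_hat = (x_hat * np.conj(x_hat)) / x.size   # Fourier transform of the linear kernel correlation
    return np.fft.fft2(y) / (kxx_hat + lam)       # alpha_hat = y_hat / (k_hat^xx + lambda)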
The eye angle A is calculated from the eye key points as follows:

d_ij = sqrt( (y_j − y_i)^2 + (x_j − x_i)^2 )
A = ( arccos( (d_ab^2 + d_ac^2 − d_bc^2) / (2 × d_ab × d_ac) ) / π ) × 180°    (11)

In Eq. (11), d_ij is the distance between points i and j, and (x_i, y_i) and (x_j, y_j) represent the coordinates of points i and j in the frame, respectively. In our system, (x_i, y_i) and (x_j, y_j) are the two average coordinate values of points on the eyelids, and A is the angle of the eye. If the resulting angle is larger than a threshold, DriCare considers the eye to be open, and vice versa. We analyzed a large number of samples and found that, in the eye-closed state, the angle of the eye is lower than 20°; therefore, we set the threshold at 20°.

We assess the driver's degree of fatigue from three perspectives based on the angle of the eye: (1) the proportion of closed-eye frames to the total number of frames in 1 minute, (2) the continuous duration of eye closure, and (3) the frequency of blinking. According to [42], [43], when the driver is awake, the proportion of closed-eye frames is less than 30%. Moreover, the duration of a single eye closure is short when the driver is awake, so when a single eye closure exceeds 2 s, the driver is considered fatigued. Besides, when people are awake, they blink an average of 10 to 15 times per minute. When people are mildly tired, the number of blinks increases; in case of severe fatigue, the number of blinks becomes lower because the eyes are closed most of the time. To detect fatigue based on the frequency of blinking, it is necessary to count the blinking frequency within 1 minute: if the blinking frequency is greater than 25 times/min or lower than 5 times/min, fatigue is indicated.
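A sketch of these eye-state rules is given below. The choice of the three landmark points and the frame rate are assumptions for illustration, while the 20° angle, the 30% closed-frame proportion, the 2 s closure time, and the 25/5 blinks-per-minute bounds are the thresholds stated above.

import math

EYE_ANGLE_THRESHOLD = 20.0   # degrees; below this the eye is treated as closed

def dist(p, q):
    return math.hypot(p[0] - q[0], p[1] - q[1])

def eye_angle(a, b, c):
    # a, b, c: (x, y) landmark points; returns the angle A of Eq. (11) at vertex a
    d_ab, d_ac, d_bc = dist(a, b), dist(a, c), dist(b, c)
    cos_a = (d_ab ** 2 + d_ac ** 2 - d_bc ** 2) / (2 * d_ab * d_ac)
    return math.degrees(math.acos(max(-1.0, min(1.0, cos_a))))

def eye_fatigue_flags(closed_flags, fps=25):
    # closed_flags: one boolean per frame over one minute (True = eye closed)
    total = len(closed_flags)
    closed_ratio = sum(closed_flags) / float(total)   # criterion (1): > 30 %
    longest_run = run = blinks = 0
    prev = False
    for c in closed_flags:
        run = run + 1 if c else 0
        longest_run = max(longest_run, run)
        if c and not prev:
            blinks += 1                               # each new closure counts as one blink
        prev = c
    longest_closure_s = longest_run / float(fps)      # criterion (2): > 2 s
    return (closed_ratio > 0.30,
            longest_closure_s > 2.0,
            blinks > 25 or blinks < 5)                # criterion (3): abnormal blink rate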
FIGURE 12: The state recognition for the eye and mouth. (a) The angle of eye opening. (b) The width and height of the mouth.

2) Mouth status recognition
For the detection of fatigued driving, the features of the mouth are important because a drowsy driver yawns continuously. Therefore, DriCare uses these features in the fatigue evaluation. From the key points of the mouth obtained in Section 4, we calculate the ratio of the mouth's height to its width. The equation is written as follows:

H = sqrt( (y_r − y_e)^2 + (x_r − x_e)^2 )
W = sqrt( (y_u − y_v)^2 + (x_u − x_v)^2 )                                      (12)
f = H / W

In Eq. (12), f is the ratio of the mouth's height to its width in one image frame, H represents the height of the mouth, and W denotes the width of the mouth. (x_r, y_r) is the coordinate of the vermilion tubercle, and (x_e, y_e) is the lowest point of the lower lip. (x_u, y_u) and (x_v, y_v) are the coordinates of the two mouth corners (angulus oris). If f is larger than a threshold, DriCare considers that the driver has opened the mouth, and vice versa. Based on measurements over a large number of samples, we set the threshold to 0.6.

In practice, opening the mouth may also occur in other situations, such as singing, speaking, and laughing, which produce similar results. To reduce errors, we plot the curve of the mouth height-to-width ratio obtained in each frame, as shown in Figure 13. When the driver is yawning, the mouth stays open for a longer time and the wave peaks are wider; when the driver is speaking, the mouth opens for a shorter time and the wave peaks are narrower. Hence, we use Eq. (13) to calculate the duration ratio R of the open mouth, which discriminates actions such as yawning from speaking:

R = (n / m) × 100%                                                             (13)

where m is the number of frames in a given period of time and n is the number of those frames in which f exceeds the threshold. According to [44], [45], the whole yawning process generally lasts about 7 s, and the frames in which f is higher than the threshold cover approximately 3–4 s. Therefore, we set R to 50%.
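A corresponding sketch for the mouth state is shown below. The four landmark points are assumptions for illustration, while the 0.6 opening threshold, the 50% duration ratio, and the roughly 7 s window come from the text above.

import math

MOUTH_OPEN_THRESHOLD = 0.6    # f above this value means the mouth is open
YAWN_RATIO_THRESHOLD = 0.5    # R over the window above this value means a yawn

def mouth_ratio(upper, lower, left_corner, right_corner):
    # upper: vermilion tubercle, lower: lowest point of the lower lip,
    # left_corner/right_corner: the two mouth corners (angulus oris)
    height = math.hypot(upper[0] - lower[0], upper[1] - lower[1])        # H in Eq. (12)
    width = math.hypot(left_corner[0] - right_corner[0],
                       left_corner[1] - right_corner[1])                 # W in Eq. (12)
    return height / width                                                # f = H / W

def is_yawning(f_values):
    # f_values: the ratio f for each frame of a roughly 7 s window (Eq. (13))
    n = sum(1 for f in f_values if f > MOUTH_OPEN_THRESHOLD)
    return n / float(len(f_values)) > YAWN_RATIO_THRESHOLD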
When judging whether yawning occurs, we count the frames within this roughly 7 s window in which the mouth height-to-width ratio is higher than the threshold; if the proportion of such frames is greater than 50%, we consider the driver to be yawning. We then count the number of yawns in one minute, and if the driver yawns more than twice per minute, the driver is considered drowsy.

However, according to [46], boring or tedious activities also increase the yawning frequency. Hence, to eliminate this error, we set a weight for each feature separately and then sum the weights; if the total weight is higher than a threshold, DriCare considers the driver drowsy.

We summarize the entire detection process of DriCare in Algorithm 2.

Algorithm 2 Fatigue Detection algorithm for DriCare
Input: frames of the video
Output: evaluation of the degree of driver fatigue
  Load the frames of the video
  Assess the states of the eye and mouth
  Calculate r, the ratio of eye-closure frames in 1 minute, and t, the duration of continuous eye closure
  Calculate b, the frequency of blinking, and y, the number of yawns in 1 minute
  if r > 30% then
      Wr = 1
  end if
  if t > 2 s and the driver is not yawning then
      Wt = 1
  end if
  if b > 25 or b < 5 then
      Wb = 1
  end if
  if y > 2 then
      Wy = 1
  end if
  Calculate T, the total of these weights (T = Wr + Wt + Wb + Wy)
  if T > 2 then
      The driver is drowsy
  else
      The driver is awake
  end if
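The pseudocode above translates directly into Python; the helper below is a sketch of that decision rule, where r, t, b, and y are the per-minute statistics defined in Algorithm 2.

def fatigue_decision(r, t, b, y, yawning_now=False):
    # r: ratio of eye-closure frames in one minute, t: longest continuous
    # eye-closure time in seconds, b: blinks per minute, y: yawns per minute
    w_r = 1 if r > 0.30 else 0
    w_t = 1 if (t > 2.0 and not yawning_now) else 0
    w_b = 1 if (b > 25 or b < 5) else 0
    w_y = 1 if y > 2 else 0
    return "drowsy" if (w_r + w_t + w_b + w_y) > 2 else "awake"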
VI. EXPERIMENTS
A. DATASETS AND SETUP
Figure 14(a) shows a prototype of DriCare comprising a commercial Qihu 360 camera and a MacBook laptop with a 2.5 GHz Intel Core i7 CPU and 16 GB of memory that simulates the cloud server. Figure 14(b) shows the system interface. We use 10 volunteers to collect the video data captured by the vehicle camera; each volunteer simulates both the drowsy and the awake driving state, and each video is 1 h long. For the evaluation of drowsiness, we use the CelebA [47] and YawDD [48] datasets together with the volunteer videos to assess the performance of DriCare. We used Python 3.6, OpenCV 3.4, TensorFlow 1.8, and Tkinter 8.6 to build the software environment required for our experiments.

FIGURE 14: Experimental environment and interface of the system. (a) The hardware of the implemented system. (b) The main interface of the system.

B. EXPERIMENTAL EVALUATION
We tested the performance of DriCare and compared it with other methods under the same conditions.

1) Performance of MC-KCF
The Euclidean distance between the predicted and real values of the target border is used to evaluate the performance of the tracking algorithms. We compare the MC-KCF algorithm with other tracking algorithms in different scenarios. The main test scenarios are fast motion, the target disappearing from the field of vision, and target rotation. The average test results over these scenarios are taken as the final experimental results, as shown in Figure 15(a).

From Figure 15(a), the MC-KCF algorithm demonstrates the best tracking accuracy. In a complex environment, the accuracy of MC-KCF is nearly 90%. Face tracking in the driving environment is simpler than in other environments because the driver's face moves less and at moderate speed, and the face remains visible in the field of vision. Figure 15(b) shows the performance of MC-KCF and the other tracking algorithms in this setting, revealing that the MC-KCF algorithm produces the best performance, with the accuracy reaching approximately 95% when the Euclidean distance is within 20 px.

As shown in Table 2, we further compare the different methods in terms of speed. Although the KCF algorithm offers the highest speed, its accuracy is worse than those of MC-KCF and KCF + CNN. The MC-KCF algorithm has the best accuracy for face tracking; its accuracy is nearly 20% higher than that of the Struck algorithm, although its speed is slightly lower than that of KCF. The MC-KCF algorithm can process 25 video frames per second, which meets the requirement of our system. Thus, we consider that the MC-KCF algorithm performs better overall and satisfies the practical requirements for speed and accuracy.

2) Performance of detection methods
To test the performance of our evaluation algorithm, we compare our method for evaluating the state of the eye with other eye-state recognition methods; the comparison is shown in Figure 16.
FIGURE 15: The accuracy of face tracking in different environments. (a) In the complex environment. (b) In the driving-cab environment.

FIGURE 16: The comparison of eye-state recognition methods.

3) Performance of DriCare
To test the performance of the whole system, we measured it in different experimental environments. The results are shown in Table 3.

TABLE 3: The performance in different environments

Number   Driving environment      Detection rate   Frames per second
1        Bright & glasses         92%              18 fps
2        Bright & no glasses      92.6%            18 fps
3        Darkness & glasses       91.1%            16 fps
4        Darkness & no glasses    91.6%            16 fps
FIGURE 17: The recognition results of the eye and mouth in different states. (a) In the awake state. (b) In the drowsiness state.

TABLE 4: Comparison of results between DriCare and other state-of-the-art methods

Research      Methodology                     Accuracy, %
Zhang [37]    boost-LBP + SVM                 85.9
Picot [38]    blinking feature + EOG          82.1
Akrout [39]   blinking + pose estimation      90.2
DriCare       MC-KCF + blinking + yawning     93.6

VII. CONCLUSION
We propose a novel system for evaluating the driver's level of fatigue based on face tracking and facial key-point detection. We design a new algorithm, MC-KCF, to track the driver's face, using CNN and MTCNN to improve the original KCF algorithm. We define the facial regions for detection based on facial key points. Moreover, we introduce a new evaluation method for drowsiness based on the states of the eyes and mouth. DriCare is almost a real-time system, as it has a high operation speed. The experimental results show that DriCare is applicable to different circumstances and offers stable performance.

REFERENCES
[1] International Organization of Motor Vehicle Manufacturers, "Provisional registrations or sales of new vehicles," http://www.oica.net/wp-content/uploads/, 2018.
[2] Wards Intelligence, "World vehicles in operation by country, 2013-2017," http://subscribers.wardsintelligence.com/data-browse-world, 2018.
[3] National Highway Traffic Safety Administration, "Traffic safety facts 2016," https://crashstats.nhtsa.dot.gov., 2018.
[4] G. Borghini, L. Astolfi, et al., "Measuring neurophysiological signals in aircraft pilots and car drivers for the assessment of mental workload, fatigue and drowsiness," Neuroscience and Biobehavioral Reviews, 2014.
[5] Attention Technologies, "S.a.m.g-3-steering attention monitor," www.zzzzalert.com, 1999.
[6] Smart Eye, "Smarteye," https://smarteye.se/, 2018.
[7] J. F. Henriques, R. Caseiro, P. Martins, et al., "High-speed tracking with kernelized correlation filters," IEEE Transactions on Pattern Analysis and Machine Intelligence, 2015.
[8] P. F. Felzenszwalb, R. B. Girshick, D. McAllester, and D. Ramanan, "Object detection with discriminatively trained part-based models," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 32, no. 9, pp. 1627–1645, Sept. 2010.
[9] Y. Sun, X. Wang, and X. Tang, "Deep convolutional network cascade for facial point detection," in Proc. IEEE Conf. Computer Vision and Pattern Recognition, June 2013, pp. 3476–3483.
[10] S. Ren, X. Cao, Y. Wei, and J. Sun, "Face alignment at 3000 fps via regressing local binary features," in Proc. IEEE Conf. Computer Vision and Pattern Recognition, June 2014, pp. 1685–1692.
[11] E. Zhou, H. Fan, Z. Cao, Y. Jiang, and Q. Yin, "Extensive facial landmark localization with coarse-to-fine convolutional network cascade," in Proc. IEEE Int. Conf. Computer Vision Workshops, Dec. 2013, pp. 386–391.
[12] W. Wierwille, "Overview of research on driver drowsiness definition and driver drowsiness detection," Proceedings: International Technical Conference on the Enhanced Safety of Vehicles, vol. 1995, pp. 462–468, 1995.
[13] R. Grace, V. E. Byrne, D. M. Bierman, J. Legrand, D. Gricourt, B. K. Davis, J. J. Staszewski, and B. Carnahan, "A drowsy driver detection system for heavy vehicles," in Proc. 17th DASC AIAA/IEEE/SAE Digital Avionics Systems Conf., Oct. 1998, vol. 2, pp. I36/1–I36/8.
[14] L. Li, Y. Chen, and Z. Li, "Yawning detection for monitoring driver fatigue based on two cameras," in Proc. 12th Int. IEEE Conf. Intelligent Transportation Systems, Oct. 2009, pp. 1–6.
[15] S. Abtahi, B. Hariri, and S. Shirmohammadi, "Driver drowsiness monitoring based on yawning detection," in Proc. IEEE Int. Instrumentation and Measurement Technology Conf., May 2011, pp. 1–4.
[16] X. Fan, B. Yin, and Y. Sun, "Yawning detection for monitoring driver fatigue," in Proc. Int. Conf. Machine Learning and Cybernetics, Aug. 2007, vol. 2, pp. 664–668.
[17] F. N. Iandola, S. Han, et al., "SqueezeNet: AlexNet-level accuracy with 50x fewer parameters and <0.5 MB model size," ICLR, 2017.
[18] K. Zhang, Z. Zhang, Z. Li, et al., "Joint face detection and alignment using multitask cascaded convolutional networks," IEEE Signal Processing Letters, 2016.
[19] B. D. Lucas and T. Kanade, "An iterative image registration technique with an application to stereo vision," in Proceedings of the 7th International Joint Conference on Artificial Intelligence - Volume 2, San Francisco, CA, USA, 1981, IJCAI'81, pp. 674–679, Morgan Kaufmann Publishers Inc.
[20] D. S. Bolme, J. R. Beveridge, B. A. Draper, and Y. M. Lui, "Visual object tracking using adaptive correlation filters," in Proc. IEEE Computer Society Conf. Computer Vision and Pattern Recognition, June 2010, pp. 2544–2550.
[21] J. F. Henriques, R. Caseiro, P. Martins, and J. Batista, "Exploiting the circulant structure of tracking-by-detection with kernels," in Proceedings of the 12th European Conference on Computer Vision - Volume Part IV, Berlin, Heidelberg, 2012, ECCV'12, pp. 702–715, Springer-Verlag.
[22] Y. Li and J. Zhu, "A scale adaptive kernel correlation filter tracker with feature integration," in Computer Vision - ECCV 2014 Workshops, L. Agapito, M. M. Bronstein, and C. Rother, Eds., Cham, 2015, pp. 254–265, Springer International Publishing.
[23] M. Danelljan, G. Häger, F. S. Khan, and M. Felsberg, "Discriminative scale space tracking," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 39, no. 8, pp. 1561–1575, Aug. 2017.
[24] N. Wang and D.-Y. Yeung, "Learning a deep compact image representation for visual tracking," in Proceedings of the 26th International Conference on Neural Information Processing Systems - Volume 1, USA, 2013, NIPS'13, pp. 809–817, Curran Associates Inc.
[25] M. Danelljan, G. Häger, F. S. Khan, and M. Felsberg, "Learning spatially regularized correlation filters for visual tracking," in Proc. IEEE Int. Conf. Computer Vision (ICCV), Dec. 2015, pp. 4310–4318.
[26] C. Ma, J. Huang, X. Yang, and M. Yang, "Robust visual tracking via hierarchical convolutional features," IEEE Transactions on Pattern
Analysis and Machine Intelligence, p. 1, 2018.
[27] M. Danelljan, G. Häger, F. S. Khan, and M. Felsberg, "Convolutional
features for correlation filter based visual tracking,” in Proc. IEEE Int.
Conf. Computer Vision Workshop (ICCVW), Dec. 2015, pp. 621–629.
[28] J. Valmadre, L. Bertinetto, J. Henriques, A. Vedaldi, and P. H. S. Torr,
“End-to-end representation learning for correlation filter based tracking,”
in Proc. IEEE Conf. Computer Vision and Pattern Recognition (CVPR),
July 2017, pp. 5000–5008.
[29] Y. Wu, T. Hassner, K. Kim, G. Medioni, and P. Natarajan, “Facial
landmark detection with tweaked convolutional neural networks,” IEEE
Transactions on Pattern Analysis and Machine Intelligence, vol. 40, no.
12, pp. 3067–3074, Dec. 2018.
[30] M. Kowalski, J. Naruniec, and T. Trzcinski, “Deep alignment network: A
convolutional neural network for robust face alignment,” in Proc. IEEE
Conf. Computer Vision and Pattern Recognition Workshops (CVPRW),
July 2017, pp. 2034–2043.
[31] Davis E. King, “Dlib-ml: A machine learning toolkit,” J. Mach. Learn.
Res., vol. 10, pp. 1755–1758, Dec. 2009.
[32] B. Warwick, N. Symons, X. Chen, and K. Xiong, “Detecting driver
drowsiness using wireless wearables,” in Proc. IEEE 12th Int. Conf.
Mobile Ad Hoc and Sensor Systems, Oct. 2015, pp. 585–588.
[33] G. Li, B. Lee, and W. Chung, “Smartwatch-based wearable EEG system
for driver drowsiness detection,” IEEE Sensors Journal, vol. 15, no. 12,
pp. 7169–7180, Dec. 2015.
[34] S. Jung, H. Shin, and W. Chung, “Driver fatigue and drowsiness monitor-
ing system with embedded electrocardiogram sensor on steering wheel,”
IET Intelligent Transport Systems, vol. 8, no. 1, pp. 43–50, Feb. 2014.
[35] M. Omidyeganeh, A. Javadtalab, and S. Shirmohammadi, “Intelligent
driver drowsiness detection through fusion of yawning and eye closure,” in
Proc. IEEE Int. Conf. Virtual Environments Human-Computer Interfaces
and Measurement Systems, Sept. 2011, pp. 1–6.
[36] A. Dasgupta, D. Rahman, and A. Routray, “A smartphone-based drowsi-
ness detection and warning system for automotive drivers,” IEEE Trans-
actions on Intelligent Transportation Systems, pp. 1–10, 2018.
[37] Yan Zhang and Caijian Hua, “Driver fatigue recognition based on facial
expression analysis using local binary patterns,” Optik, vol. 126, pp. 4501–
4505, 2015.
[38] Antoine Picot, Sylvie Charbonnier, Alice Caplier, and Ngoc-Son Vu,
“Using retina modelling to characterize blinking: comparison between eog
and video analysis,” Machine Vision and Applications, vol. 23, no. 6, pp.
1195–1208, Nov 2012.
[39] Belhassen Akrout and Walid Mahdi, “Spatio-temporal features for the
automatic control of driver drowsiness state and lack of concentration,”
Machine Vision and Applications, vol. 26, no. 1, pp. 1–13, Jan 2015.
[40] R. Oyini Mbouna, S. G. Kong, and M. Chun, “Visual analysis of eye
state and head pose for driver alertness monitoring,” IEEE Transactions
on Intelligent Transportation Systems, vol. 14, no. 3, pp. 1462–1469, Sept.
2013.
[41] Mohamed S. Kamel and Lian Guan, “Histogram equalization utilizing
spatial correlation for image enhancement,” Proceedings of SPIE - The
International Society for Optical Engineering, 1989.
[42] S. Kaplan, M. A. Guvensan, A. G. Yavuz, and Y. Karalurt, “Driver
behavior analysis for safe driving: A survey,” IEEE Transactions on
Intelligent Transportation Systems, vol. 16, no. 6, pp. 3017–3032, Dec.
2015.
[43] R. K. Satzoda and M. M. Trivedi, “Drive analysis using vehicle dynamics
and vision-based lane semantics,” IEEE Transactions on Intelligent
Transportation Systems, vol. 16, no. 1, pp. 9–18, Feb. 2015.
[44] J. Barbizet, “Yawning,” J Neurol Neurosurg Psychiatry, vol. 21, pp. 203–
209, 1958.
[45] Jose R. Eguibar, Carlos A. Uribe, Carmen Cortes, Amando Bautista, and
Andrew C. Gallup, "Yawning reduces facial temperature in the high-yawning subline of Sprague-Dawley rats," BMC Neuroscience, vol. 18, no. 1, p. 3, Jan. 2017.
[46] A. Franzen, S. Mader, and F. Winter, "Contagious yawning, empathy, and their relation to prosocial behavior," Journal of Experimental Psychology: General, May 2018.
[47] Z. Liu, P. Luo, X. Wang, and X. Tang, "Deep learning face attributes in the wild," in Proc. IEEE Int. Conf. Computer Vision (ICCV), Dec. 2015, pp. 3730–3738.
[48] S. Abtahi, M. Omidyeganeh, S. Shirmohammadi, and B. Hariri, "YawDD: A yawning detection dataset," in Proceedings of the 5th ACM Multimedia Systems Conference, New York, NY, USA, 2014, MMSys '14, pp. 24–28, ACM.