A Fatigue Driving Detection Algorithm Based On Facial Multi-Feature Fusion
Received May 16, 2020, accepted May 26, 2020, date of publication June 1, 2020, date of current version June 10, 2020.
Digital Object Identifier 10.1109/ACCESS.2020.2998363
ABSTRACT Research on machine vision-based driver fatigue detection algorithms has improved traffic safety significantly. However, many algorithms do not analyze the driving state in light of individual driver characteristics, which introduces inaccuracy. This paper proposes a fatigue driving detection algorithm based on facial multi-feature fusion that incorporates driver characteristics. First, we introduce an improved YOLOv3-tiny convolutional neural network to capture facial regions under complex driving conditions, eliminating the inaccuracy and interference caused by artificial feature extraction. Second, on the basis of the Dlib toolkit, we introduce the Eye Feature Vector (EFV) and Mouth Feature Vector (MFV) as evaluation parameters of the driver's eye state and mouth state, respectively. Then, a driver identity information library is constructed by offline training, comprising a driver eye state classifier library, a driver mouth state classifier library, and a driver biometric library. Finally, we construct the driver identity verification model and the driver fatigue assessment model for online assessment. After the driver passes identity verification, the system calculates the driver's eye-closure time, blink frequency, and yawn frequency to evaluate the fatigue state. In simulated driving applications, our algorithm detects the fatigue state at over 20 fps with an accuracy of 95.10%.
INDEX TERMS Traffic safety and environment, fatigue driving detection, machine vision, convolutional
neural network.
The Road Traffic Safety Law of China clearly states that a driver who drives for more than 4 hours without rest is considered to be driving while fatigued [5], [6]. If a driver repeatedly drives while excessively fatigued, the traffic control department can impose penalties and deduct points from the driving license. Although this regulation can reduce excessive fatigue driving to a certain extent, issuing fatigue warnings at critical moments can greatly reduce the occurrence of traffic accidents caused by fatigue driving. This is especially true for drivers engaged in long-distance passenger and freight transportation, who must drive motor vehicles continuously for long periods due to work requirements. However, it is difficult to maintain a high level of alertness all the time while driving. Therefore, real-time detection of, and alarms for, the fatigue state are all the more important.

At present, fatigue driving detection methods are mainly divided into subjective and objective methods [7], [8]. The subjective method is based on questionnaire surveys. Well-known questionnaires include the Stanford Sleepiness Scale, the Pearson Fatigue Scale, the Driver Record Form, and the Cooper-Harper Evaluation Questionnaire, which cover subjective load assessment, sleep habits, and so on. In a questionnaire survey, the driver answers the questions according to his or her subjective impressions. Because of this strong subjectivity, questionnaires cannot serve as a standard method for detecting fatigued driving.

The objective method uses auxiliary tools to detect the driver's physiological characteristics or to monitor vehicle information, and judges fatigue driving on that basis [9], [10]. It is mainly divided into three categories:

Fatigue detection based on physiological characteristics. Studies have shown that as the degree of fatigue increases, the physiological indicators of the human body gradually deviate from their normal values [11]. Therefore, fatigue can be judged according to changes in the driver's physiological characteristics. Common features include EEG, ECG, and EMG. Among them, EEG is regarded as the "gold standard" for detecting fatigue [12].

Fatigue detection based on vehicle behavior characteristics. This type of method collects and analyzes information about the vehicle itself during driving to determine whether the driver is fatigued [13], [14].

Fatigue detection based on facial features. When the human body is in a fatigue state rather than a non-fatigue state, some body parts behave very differently [15], [16]. For example, in the fatigue state, the human body may exhibit head droop, body tilt, increased blink frequency, and yawning.

1. Using head position to judge fatigue: when the driver is tired, the head may tilt or sway, so the driver's head movement can be detected to determine whether he or she is fatigued.

2. Using eye state to judge fatigue: fatigue is detected through eye characteristics such as blink frequency and eye-closing time. After a large number of experiments and verifications, the Carnegie Mellon Institute in the United States proposed the PERCLOS measure of fatigue, which is considered the most reliable and effective fatigue determination method at present. The driver's eye closure can be obtained from this parameter to judge the fatigue state [17].

3. Using the degree of mouth opening and closing to judge fatigue: fatigue is judged from the different behavior of the driver's mouth when speaking normally and when yawning. First, an image of the mouth is obtained through a video capture device; then the opening and closing features of the mouth are fed into a neural network system, which judges fatigue based on the duration of mouth opening and closing [18].

Judging fatigue from the driver's facial features is a non-contact method [19], [20]. It causes no interference with the driver while driving, and has the advantages of high speed and strong operability. Therefore, compared with the other two categories of fatigue driving detection methods, this method currently receives the most attention and is the most widely used.

Universities, research institutes, and enterprises have conducted long-term and in-depth research on fatigue driving detection. Abe et al. [21] obtained the best fatigue detection results by studying the relationship between eye state and fatigue; the eye characteristics they studied mainly include eye opening and closing, eye movement, and the pupil. Devi and Bajaj [22] proposed a fatigue detection model in which the system first locates the face in the driver image, then extracts the mouth and eye features in the localized face area, and finally processes these features comprehensively through the fatigue detection system to determine whether the driver is fatigued. By studying the positioning and tracking of the eyes under infrared light at night, Singh et al. [23] analyzed the relationship between eye state and fatigue. Haro et al. [24] proposed using infrared light for eye positioning and Kalman filtering for eye tracking, which is highly robust. Coetzer and Hancke [25] found that the Adaboost algorithm has certain advantages in some aspects of face detection, after comparing it with ANN and SVM approaches.

Although fatigue detection technology has made good progress, it still needs improvement. Detection methods based on physiology and behavior usually require the driver to wear, or the vehicle to be fitted with, additional physiological monitoring devices, which affects the comfort of normal driving. Moreover, the equipment that collects physiological information is often expensive and fragile, which hinders the popularization of fatigue driving detection systems.

Vision-based detection methods usually use the Adaboost classifier algorithm for face localization [26], [27]. However, when the driver wears glasses or sunglasses, when the lighting changes, or when the face is partially occluded, Adaboost cannot accurately locate the face and thus cannot promptly warn of fatigue driving.
At present, common algorithms judge fatigue by the state of the driver's eyes and mouth. However, these algorithms do not take the driver's individual characteristics into account: they use a fixed threshold to determine the state of the eyes and mouth, and consequently have a high misjudgment rate.

As the literature discussed above shows, existing driving fatigue detection methods suffer from high intrusiveness, low robustness, and low reliability. Therefore, we propose a new algorithm. Its innovations are as follows:

We design a driver face detection architecture based on an improved YOLOv3-tiny convolutional neural network, and train the network with the open-source dataset WIDER FACE [28]. Compared with other deep learning algorithms, such as YOLOv3 [29] and MTCNN [30], the algorithm based on the improved YOLOv3-tiny network improves face recognition accuracy, simplifies the network structure, and reduces the amount of computation, which makes it easier to port to mobile devices.

Most existing algorithms are based on PERCLOS, which uses the driver's eye state as the feature for judging fatigue. In fact, when the driver's eyes are small, such algorithms misjudge frequently. Similarly, algorithms based on yawn frequency are affected by the size of the driver's mouth [31], [32]. Therefore, we design eye and mouth SVM classifiers that take driver characteristics into account. They judge fatigue based on the actual size of the driver's eyes and mouth and therefore achieve high accuracy.

Existing machine learning algorithms that consider individual characteristics often train classifiers through an initialization step before the system starts, which requires re-initialization every time the driver changes. Not only does this waste time, it also does not guarantee that every initialization works well. Therefore, we construct a driver identity information library. The system stores three types of driver identity information: driver biometrics, the driver eye state classifier, and the driver mouth state classifier. We train the classifiers in advance and store them in the driver identity information library. Then, through identity verification, the driver's own classifiers are retrieved before system startup. This not only simplifies initialization, but also avoids the inaccuracies of entering the identity manually.
This paper is divided into the following four parts.

The first chapter is the introduction, which presents the background and research significance of our fatigue driving detection system, briefly reviews the state of fatigue driving detection research at home and abroad, proposes a new algorithm that addresses the shortcomings of current research, and introduces the innovations of our algorithm.

The second chapter introduces the algorithm. Firstly, we use the improved YOLOv3-tiny network for face detection. Secondly, we describe how the Dlib toolkit is combined with it to extract facial feature parameters. Then we explain how the driver identity information library is established. Finally, we describe how the driver identity verification model and the fatigue assessment model are constructed, and how the models are used to judge the fatigue state.

The third chapter is the experimental analysis. Firstly, the experimental environment and data sets are introduced. Then we use qualitative description and quantitative evaluation to measure face detection and feature point location. Finally, we evaluate our fatigue driving detection algorithm along two dimensions: accuracy and real-time performance.

The fourth chapter is the conclusion, which summarizes the main work of this paper, analyzes the shortcomings of the system and the aspects that need improvement, and proposes future optimization directions and prospects for the algorithm.

II. METHODOLOGY
As shown in Figure 1, our algorithm includes the following three modules.

Identity Entry: Firstly, we use the camera to collect driver biometric images, eye classification images, and mouth classification images. Based on deep learning theory, we apply the improved YOLOv3-tiny network to locate suspected face regions in complex backgrounds. Secondly, according to the coordinates of the driver's face region, the Dlib toolkit is used to extract facial feature point coordinates, from which we calculate the 128-dimensional feature vector, Eye Feature Vector (EFV), and Mouth Feature Vector (MFV) of the driver's face in the image. Then, to obtain an eye state classifier that takes driver characteristics into account, a support vector machine (SVM) is trained with the eye feature vectors of open-eye and closed-eye images; the same is done for the mouth. Finally, the driver's biometric, eye state classifier, and mouth state classifier are stored in the driver identity information library.

Identity Verification: Firstly, we use the camera to collect images containing the driver's biometrics and, based on deep learning theory, apply the improved YOLOv3-tiny network to locate suspected face regions in complex backgrounds. Secondly, according to the coordinates of the driver's face region, the Dlib toolkit is used to extract facial feature point coordinates, from which we calculate the 128-dimensional feature vector of the driver's face in the image. Then this vector is compared with all the driver biometrics stored in the driver identity information library. Finally, according to the comparison result, the eye classifier and mouth classifier of the corresponding driver are retrieved for online recognition.

Online Recognition: The original data source is the real-time camera video. Firstly, based on deep learning theory, we apply the improved YOLOv3-tiny network to extract suspected face regions from complex backgrounds. Secondly, according to the coordinates of the driver's face region, we use the Dlib toolkit to extract the driver's eye and mouth coordinates, from which we calculate the driver's EFV and MFV in real time during driving. Then, according to the EFV, we use the eye state classifier obtained through identity verification to judge the driver's eye state; the same is done for the mouth. Finally, based on the eye state and mouth state of each picture detected over a period of time, we calculate the driver's PERCLOS, blink frequency, and yawn frequency to judge the driver's fatigue state.
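To make the interaction of these modules concrete, the following Python sketch outlines the online-recognition loop under stated assumptions: the face locator, landmark locator, and EFV/MFV extractors are hypothetical stand-ins passed in as callables, and the classifiers are assumed to expose a scikit-learn-style predict; this is an illustrative sketch, not the authors' implementation.

```python
# Illustrative sketch of the online-recognition loop; locate_face,
# locate_landmarks, efv, and mfv are hypothetical stand-ins for the
# components described above.
from dataclasses import dataclass
from typing import Any, Callable, Iterable

@dataclass
class DriverProfile:
    biometric: Any   # stored 128-dimensional face feature vector
    eye_clf: Any     # per-driver eye state SVM from the identity library
    mouth_clf: Any   # per-driver mouth state SVM from the identity library

def online_recognition(frames: Iterable,
                       profile: DriverProfile,
                       locate_face: Callable,
                       locate_landmarks: Callable,
                       efv: Callable,
                       mfv: Callable,
                       window: int = 1000):
    """Collect per-frame eye and mouth states over a fixed frame window;
    the PERCLOS, blink, and yawn rules of Section II-E consume these."""
    eye_states, mouth_states = [], []
    for frame in frames:
        face = locate_face(frame)              # improved YOLOv3-tiny detector
        if face is None:                       # no face: counted as closed-eye frame
            eye_states.append("closed")
            mouth_states.append("closed")
        else:
            pts = locate_landmarks(frame, face)   # Dlib 68-point landmarks
            eye_states.append(profile.eye_clf.predict([efv(pts)])[0])
            mouth_states.append(profile.mouth_clf.predict([mfv(pts)])[0])
        if len(eye_states) >= window:
            break
    return eye_states, mouth_states
```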
A. FACE DETECTION BASED ON THE IMPROVED YOLOv3-TINY NETWORK
The correctness of face detection directly affects the performance of the driving fatigue detection algorithm, so accurate and rapid face detection is its fundamental task. Among traditional algorithms, Viola and Jones [33] proposed that Haar-like [34] features of the input image can be extracted and, based on these features, the AdaBoost algorithm is used to train multiple weak classifiers; finally, the weak classifiers are cascaded into a strong classifier, which is the final face detector. This method effectively improved face detection performance and is still used and refined to this day.

Recently, the continuous development and application of deep learning has provided new methods for face detection and segmentation [35]. They can be divided into two categories: multi-stage detection algorithms based on proposal regions, and target detection algorithms based on anchor boxes. Representative algorithms of the former are Faster R-CNN [36] and MTCNN [30]; representative algorithms of the latter are S3FD [37] and SSH [38]. Compared with traditional methods [39], face detection based on convolutional neural networks (CNNs) avoids the artificial extraction of features, and with the support of large data sets, face detection performance has been greatly improved.

YOLO [40] stands for You Only Look Once, meaning that the network only needs to look at the picture once to obtain the target information. YOLO treats target detection as a regression problem and uses an end-to-end convolutional neural network to extract the characteristics of the input image, obtaining the position, size, and category information of the targets in the image.

YOLOv3 is an improved version of the YOLO algorithm and one of the best algorithms in the field of target detection. Building on YOLO, YOLOv3 draws on many excellent research results. When detecting 320 × 320 images, its detection accuracy matches the SSD algorithm, but it is three times faster.

YOLOv3-tiny is a lightweight target detection model based on YOLOv3. When detecting images on a Pascal Titan X, its detection speed can reach 220 FPS, far higher than that of the general network. The YOLOv3-tiny algorithm has the following advantages: 1) fast detection speed: the detection result is obtained by running the neural network once per test image, so it can be used for real-time detection; 2) global understanding of the image: information around the target is learned during training, and the background error rate is less than half that of the Fast R-CNN algorithm; 3) YOLOv3-tiny can run on devices with low computing power, such as embedded devices, although its detection accuracy is greatly reduced compared with YOLOv3. Its network structure is obtained by simplifying that of YOLOv3.

As noted above, the YOLO [40] model is a fast target detection model based on deep learning [41], [42]: it is a single end-to-end network that turns target detection into a regression problem. More specifically, regression and a CNN [43], [44] replace the sliding window of traditional target detection to realize the feature extraction of the driver's face. This kind of feature extraction is less affected by the external environment and extracts target features quickly.

YOLOv3-tiny has a 23-layer network, including 13 convolution layers, 6 max pooling layers, 1 upsampling layer, 1 fully connected dense layer, and 2 output layers. To simplify the network and reduce computation, we transform the regression of multiple targets into single-target regression, following the regression idea of the YOLO model, and improve the YOLOv3-tiny network to locate suspected face regions. The improved network structure is shown in Figure 2.
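As an illustration of this structure, the sketch below builds a tiny two-scale, single-class detector in PyTorch (the paper's experiments use PaddlePaddle). The layer widths and the 416 × 416 input are our assumptions for illustration; only the overall shape — a convolution/max-pooling backbone with 13 × 13 and 26 × 26 prediction grids for a single "face" class — follows the description above.

```python
# Schematic sketch in the spirit of the improved YOLOv3-tiny described above.
# Channel widths are illustrative assumptions, not the paper's exact network.
import torch
import torch.nn as nn

def conv_block(c_in, c_out):
    return nn.Sequential(
        nn.Conv2d(c_in, c_out, 3, padding=1, bias=False),
        nn.BatchNorm2d(c_out),
        nn.LeakyReLU(0.1),
    )

class TinyFaceDetector(nn.Module):
    def __init__(self, boxes_per_cell=3):
        super().__init__()
        out_ch = boxes_per_cell * (4 + 1 + 1)  # box (4) + confidence + face class
        self.stage1 = nn.Sequential(           # 416 -> 26 via four max-pool steps
            conv_block(3, 16), nn.MaxPool2d(2),
            conv_block(16, 32), nn.MaxPool2d(2),
            conv_block(32, 64), nn.MaxPool2d(2),
            conv_block(64, 128), nn.MaxPool2d(2),
        )
        self.stage2 = nn.Sequential(            # 26 -> 13
            conv_block(128, 256), nn.MaxPool2d(2),
            conv_block(256, 512),
        )
        self.head13 = nn.Conv2d(512, out_ch, 1)        # coarse 13x13 grid
        self.up = nn.Upsample(scale_factor=2)
        self.head26 = nn.Conv2d(512 + 128, out_ch, 1)  # fine 26x26 grid

    def forward(self, x):                       # x: (N, 3, 416, 416)
        f26 = self.stage1(x)
        f13 = self.stage2(f26)
        p13 = self.head13(f13)
        f_up = torch.cat([self.up(f13), f26], dim=1)
        p26 = self.head26(f_up)
        return p13, p26                          # predictions at the two scales
```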
In the YOLOv3-tiny network training phase, we use the WIDER FACE (Face Detection Data Set and Benchmark, http://wider-challenge.org/2019.html) [28] data set as the training data. The WIDER FACE data set includes 32,203 images and 393,703 marked faces, and is one of the most widely used face databases. It covers different scales, poses, occlusions, expressions, makeup, and lighting, as shown in Figure 3.

The WIDER FACE data set has the following features:
• The image resolution is generally high, and all images are color images.
• Each image has a large number of faces, containing on average 12.2 faces, with many dense small faces.
• The data set is divided into a training set, a test set, and a verification set, which account for 40%, 50%, and 10% of the data set, respectively.

Firstly, based on the YOLOv3-tiny network, each picture of the WIDER FACE data set is resized to 10 different sizes, and grids of 13 × 13 and 26 × 26 cells are arranged over the adjusted pictures. Then we locate the driver's face on the non-overlapping grid cells and classify it. For each grid cell, the network outputs B bounding boxes, their corresponding confidences, and the conditional probability of the driver's face. Finally, non-maximum suppression removes redundant bounding boxes. The confidence score is given by Equation (1):

score = Pr(Object) × IOU^truth_pred  (1)

where Pr(Object) is the probability of the driver's face: if a face is included, Pr(Object) = 1; otherwise Pr(Object) = 0. IOU^truth_pred is the intersection over union (IOU) of the predicted bounding box with the ground-truth box.
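The suppression step can be sketched as follows in plain NumPy; the corner-format boxes and the 0.5 overlap threshold are illustrative assumptions, not values given in the paper.

```python
# Plain-NumPy sketch of confidence-based non-maximum suppression.
import numpy as np

def iou(box, boxes):
    """IoU of one box against many; boxes are (x1, y1, x2, y2)."""
    x1 = np.maximum(box[0], boxes[:, 0]); y1 = np.maximum(box[1], boxes[:, 1])
    x2 = np.minimum(box[2], boxes[:, 2]); y2 = np.minimum(box[3], boxes[:, 3])
    inter = np.clip(x2 - x1, 0, None) * np.clip(y2 - y1, 0, None)
    area = lambda b: (b[..., 2] - b[..., 0]) * (b[..., 3] - b[..., 1])
    return inter / (area(np.asarray(box)) + area(boxes) - inter)

def nms(boxes, scores, thresh=0.5):
    """Keep the highest-scoring box, drop boxes overlapping it, repeat."""
    order = np.argsort(scores)[::-1]
    keep = []
    while order.size:
        i = order[0]
        keep.append(i)
        rest = order[1:]
        order = rest[iou(boxes[i], boxes[rest]) < thresh]
    return keep
```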
The YOLOv3-tiny network loss function consists of the center error term of the bounding box, the width and height error term of the bounding box, the prediction confidence error term, and the prediction category error term.

Based on the YOLOv3-tiny network trained offline, we locate the driver's suspected face area and provide an accurate driver face image for the subsequent algorithm.

B. DRIVER'S FACIAL MOTION FEATURE EXTRACTION
1) FACE FEATURE LOCATION AND 128-DIMENSIONAL FEATURE VECTOR EXTRACTION BASED ON THE DLIB TOOLKIT
On the driver's face area located by the improved YOLOv3-tiny network, the face keypoint detection model of the Dlib [45] library (shown in Figure 4(a)) is used to extract the fine-grained features of the driver's face. The Dlib model provides 68 face key points and uses cascaded shape regression to query the key points of each face component.

Dlib is a modern C++ toolbox that contains machine learning algorithms and tools designed in C++ and used to solve practical problems.

For face key point detection, Dlib adopts the method of [46]–[48] and provides a model trained on millions of faces. This method uses an ensemble of regression trees [49]–[51] to estimate the positions of facial key points directly from a sparse subset of pixel intensities, with high detection accuracy and very little time consumption. This method is used in the face feature extraction proposed in this paper.
FIGURE 4. Driver's face feature point acquisition based on Dlib. (a) Dlib face features; (b) face feature point positioning effect.

When the driver's face is detected, the feature points of the face are obtained in real time by the above algorithm, as shown in Figure 4(b).

After the 68 feature points are extracted with the Dlib toolkit, they can be used to encode the face information into a 128-dimensional feature vector [52]–[54]. In this vector space, the Euclidean distance between images of the same face is smaller than that between images of different faces. Therefore, 128-dimensional feature vectors extracted with the Dlib toolkit can be used as driver biometrics for identity verification.
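A minimal sketch of this extraction with the Dlib Python API is shown below; the two model files are Dlib's publicly released pretrained models, and in this paper the face rectangle would come from the improved YOLOv3-tiny detector rather than from Dlib's own face detector.

```python
# Sketch of Dlib-based landmark and 128-D descriptor extraction.
import dlib
import numpy as np

predictor = dlib.shape_predictor("shape_predictor_68_face_landmarks.dat")
facerec = dlib.face_recognition_model_v1("dlib_face_recognition_resnet_model_v1.dat")

def face_features(rgb_image, face_box):
    """face_box = (left, top, right, bottom) from the face detector."""
    rect = dlib.rectangle(*face_box)
    shape = predictor(rgb_image, rect)                 # 68 landmarks
    points = np.array([(p.x, p.y) for p in shape.parts()])
    descriptor = np.array(facerec.compute_face_descriptor(rgb_image, shape))
    return points, descriptor                          # (68, 2) and (128,)
```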
2) EYE STATE PARAMETERS EXTRACTION BASED ON EFV
As discussed above, fatigue detection based on either the traditional PERCLOS or the blink frequency depends on judging the eye state. Existing methods mostly use the P80 standard to extract the eye state parameters: firstly, the eye image is pre-processed; secondly, the contour of the driver's eyes is fitted by ellipse fitting; finally, the ratio of the major axis to the minor axis of the ellipse is used as the parameter characterizing the eye state. This approach relies on the quality of the eye image preprocessing, and in a real scenario it may have low accuracy because of constantly changing lighting conditions and changes in the driver's head posture during driving.

To this end, based on Dlib facial feature point localization, this paper proposes a new parameter, the Eye Feature Vector (EFV), which can be used to evaluate the driver's eye state. EFV is the output of the driver eye extraction module and yields a parameter that can indicate the driver's fatigue status. In this module, ellipse fitting is applied to obtain the shape of the driver's pupils, and the eye state (open or closed) can be decided from the relationship between the long and short axes of the ellipse; the driver's fatigue status is then evaluated by PERCLOS. According to the Dlib eye feature points, EFV is defined as

EFV = ( ||p2 − p6|| / ||p1 − p4||, ||p3 − p5|| / ||p1 − p4|| )  (2)

where pi, i = 1, 2, . . . , 6, are the eye feature point coordinates.

FIGURE 5. Eye state and EFV difference (orange means closed, and blue means open).

As shown in Figure 5, when the driver's eyes are in different states, the eye feature points differ significantly. As seen in the plane scatter plot (where blue is the EFV of open-eye pictures and orange is the EFV of closed-eye pictures), the EFV also differs significantly between eye states, in line with the eye feature points. Therefore, EFV can be used as a parameter characterizing the eye state in driver fatigue detection algorithms.
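Equation (2) translates directly into code. In the sketch below, the mapping of p1–p6 to six Dlib eye landmarks (e.g., indices 36–41, 0-based, for one eye) is our assumption.

```python
# EFV from Equation (2); eye_points holds p1..p6 in the order used above.
import numpy as np

def efv(eye_points):
    """eye_points: (6, 2) array of eye landmark coordinates p1..p6."""
    p1, p2, p3, p4, p5, p6 = np.asarray(eye_points, dtype=float)
    width = np.linalg.norm(p1 - p4)             # horizontal eye extent
    return (np.linalg.norm(p2 - p6) / width,
            np.linalg.norm(p3 - p5) / width)
```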
3) MOUTH STATE PARAMETERS EXTRACTION BASED ON MFV
Like PERCLOS and the blink frequency, the yawn frequency is an important index for evaluating fatigue. However, it is inaccurate to judge fatigue from the degree of mouth opening alone: during normal driving, speaking also changes the degree of mouth opening, which greatly interferes with fatigue judgments based on it. After analyzing the yawning process, we find that when speaking, the mouth opens to a small extent and for a short duration, whereas in the yawning state, the mouth opens to a greater extent and for a longer duration.

To distinguish yawning from speaking, this paper divides the mouth state into three types: closed mouth, small (slightly open) mouth, and big (wide-open) mouth. Similar to the eye state parameters, based on Dlib facial feature point localization, the paper proposes a new parameter, the Mouth Feature Vector (MFV), which can be used to evaluate the driver's mouth state. According to the Dlib mouth feature points, MFV is defined as

MFV = ( ||M2 − M8|| / ||M1 − M5||, ||M3 − M7|| / ||M1 − M5||, ||M4 − M6|| / ||M1 − M5|| )  (3)

where Mi, i = 1, 2, . . . , 8, are the mouth feature point coordinates.

FIGURE 6. Mouth state and MFV difference.

As shown in Figure 6, when the driver's mouth is in different states, the mouth feature points differ significantly. As seen in the plane scatter plot, the MFV also differs significantly between mouth states, in line with the mouth feature points. Therefore, MFV can be used as a parameter characterizing the mouth state in driver fatigue detection algorithms.
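Equation (3) can be computed the same way; the assumption here is that M1–M8 are the eight inner-mouth Dlib landmarks (indices 60–67, 0-based), ordered as in the definition above.

```python
# MFV from Equation (3); mouth_points holds M1..M8 in order.
import numpy as np

def mfv(mouth_points):
    """mouth_points: (8, 2) array of mouth landmark coordinates M1..M8."""
    m = np.asarray(mouth_points, dtype=float)
    width = np.linalg.norm(m[0] - m[4])          # ||M1 - M5||
    return tuple(np.linalg.norm(m[i] - m[j]) / width
                 for i, j in ((1, 7), (2, 6), (3, 5)))  # (M2,M8), (M3,M7), (M4,M6)
```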
C. DRIVER IDENTITY INFORMATION LIBRARY
1) DRIVER EYE STATE CLASSIFIER LIBRARY
As mentioned above, traditional driver fatigue detection algorithms are mostly based on the P80 criterion, which uses a

driver and compare it with the driver biometric information library. If the comparison succeeds, the driver's eye state classifier and mouth state classifier are retrieved for online recognition; if it fails, the driver is reminded to enter his or her identity information into the driver identity information library.
D. DRIVER'S IDENTITY VERIFICATION MODEL
According to the driver identity information library, before the fatigue assessment the driver must complete identity verification. First, we use the camera to collect the biometric information of the current driver, that is, the 128-dimensional feature vector of the driver's face. Then we calculate the Euclidean distance between it and every 128-dimensional feature vector in the driver biometric library:

d = sqrt( Σ_{i=1}^{128} (x_i − y_i)^2 )  (6)

where x_i is the i-th dimension of the currently collected 128-dimensional feature vector, and y_i is the i-th dimension of a 128-dimensional feature vector in the driver biometric library.

We use 0.6 as the system's decision threshold. When all d values are greater than 0.6, the current driver is judged not to be in the driver biometric library, and verification fails. When some d value is less than 0.6, the current driver is judged to be in the driver biometric library, and verification passes. The identity corresponding to the minimum d value is then taken as the result of the identity verification, and that driver's eye and mouth state classifiers are retrieved for detection before the system starts.
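A compact sketch of this decision rule, using the paper's 0.6 threshold, might look as follows; `library` is a hypothetical mapping from driver IDs to stored 128-dimensional vectors.

```python
# Identity verification around Equation (6): nearest stored biometric under
# Euclidean distance, accepted only below the 0.6 threshold.
import numpy as np

def verify_identity(descriptor, library, threshold=0.6):
    """Return the matching driver ID, or None if verification fails."""
    if not library:
        return None
    ids = list(library)
    dists = [np.linalg.norm(np.asarray(descriptor) - np.asarray(library[k]))
             for k in ids]
    best = int(np.argmin(dists))
    return ids[best] if dists[best] < threshold else None
```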
E. DRIVER FATIGUE ASSESSMENT MODEL
1) FATIGUE JUDGMENT BASED ON PERCLOS
Driver fatigue is a description of a state, and the corresponding fatigue level is a dynamically changing process. Wierwille at the Carnegie Mellon Research Center proposed the Percentage of Eye Closure (PERCLOS), which has been widely accepted and adopted by many researchers as an effective indicator of fatigue driving. PERCLOS [56] is a physical quantity measuring the state of human fatigue (drowsiness), defined as the proportion of time per unit time during which the eyes are closed. The U.S. Federal Highway Administration and the National Highway Traffic Safety Administration verified the effectiveness of PERCLOS in characterizing driver fatigue through simulated driving in a laboratory. PERCLOS is defined as

PERCLOS = N_close / N_total × 100%  (7)

where N_close is the number of closed-eye images in a specific time, and N_total is the total number of images in that time.

To obtain an eye classifier that takes driver characteristics into account, the driver's identity must first be verified, yielding the eye classifier corresponding to the driver's identity. After identity verification is completed, online recognition of the driver's fatigue state based on PERCLOS is performed. First, during driving, we use an ordinary car camera to obtain the driver's face image in real time. Next, the improved YOLOv3-tiny network detects the driver's face. If the face is detected, the facial area is used as the input image and the facial feature points are located with the Dlib toolkit. To reduce the false detection rate, the paper adds the driver's head posture information as an auxiliary discrimination parameter: when the improved YOLOv3-tiny network fails to detect a face or to locate the facial feature points, the driver is judged to be in an abnormal head posture, and the frame is counted as a closed-eye frame. After the face feature points are located, the EFV is calculated from the coordinates of the eye feature points, and the driver's eye state classifier obtained through identity verification determines the driver's eye state in the image. Finally, the number of closed-eye images is counted over a specific number of frames (1000 frames in this article), and the PERCLOS value is calculated. If PERCLOS > Th_PERCLOS (where Th_PERCLOS is the driver's fatigue determination threshold, taken as 0.4 in this article), the driver is judged to be fatigued; otherwise, non-fatigued.
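The window-based rule translates into a few lines; the 1000-frame window and 0.4 threshold are the paper's values, while the "closed" label is assumed to be the eye classifier's output for a closed-eye frame.

```python
# PERCLOS rule of Equation (7) over a fixed frame window (as a 0-1 ratio).
def perclos_fatigue(eye_states, window=1000, th_perclos=0.4):
    recent = eye_states[-window:]
    perclos = sum(s == "closed" for s in recent) / len(recent)
    return perclos, perclos > th_perclos   # (value, fatigued?)
```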
defined as: frequency will be performed. Different from section 2.5.1,
the number of blink images of the driver is counted in a
Nclose
PERCLOS = × 100% (7) specific number of frames (1000 frames are set in the article),
Ntotal and then we calculate the blink frequency. If FBlink > ThBlink
where Nclose is the number of closed eyes images in a specific (ThBlink is the driver’s fatigue state determination threshold
time, and Ntotal is the total number of images in a specific (the article takes 20)), it is determined that the driver is in
time. fatigue, otherwise, it is in non-fatigue.
3) FATIGUE JUDGMENT BASED ON YAWN FREQUENCY
Like PERCLOS and the blink frequency, the yawn frequency is an important indicator for evaluating fatigue. A yawn is a deep-breathing activity that often occurs during laziness, tiredness, and lack of rest: more oxygen is inhaled by enlarging the lungs, stimulating the central nervous system to boost the spirit. It is a conditioned reflex under fatigue, and this reflex provides an intuitive evaluation index of the fatigue level. To exclude non-yawning mouth activities such as speaking, this article judges yawning from both the degree of mouth opening and the opening time. The article stipulates that when the system detects the state sequence Close-Small-Big-Small-Close and the duration of the opening exceeds 2 seconds, it counts as one yawn. The yawn frequency is

F_Yawn = N_Yawn / T  (9)

where T is the time in minutes and N_Yawn is the number of yawns in T minutes.

According to related research, the yawn frequency increases significantly when a person is fatigued. Based on this, the paper studies driver fatigue based on yawn frequency. To obtain a mouth classifier that takes driver characteristics into account, the driver's identity must first be verified, yielding the mouth classifier corresponding to the driver's identity. After identity verification is completed, online recognition of the driver's fatigue state based on yawn frequency is performed. First, during driving, we use an ordinary car camera to obtain the driver's face image in real time. Next, the improved YOLOv3-tiny network detects the driver's face. After the face feature points are located, the MFV is calculated from the coordinates of the mouth feature points, and the driver's mouth state classifier obtained through identity verification determines the driver's mouth state in the image. Finally, the number of yawns is counted over a specific number of frames (1000 frames in this article), and the yawn frequency is calculated. If F_Yawn > Th_Yawn (where Th_Yawn is the driver's fatigue determination threshold, taken as 3 in this article), the driver is judged to be fatigued; otherwise, non-fatigued.

In summary, the flowchart of the driver fatigue assessment model based on multi-feature fusion is shown in Figure 9.
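The yawn rule combines a state sequence with a duration test. The sketch below is one possible reading of it: a Close-Small-Big-Small-Close pass only counts as a yawn if the opening lasted more than 2 seconds, with an assumed frame rate (here 20 fps, matching the reported detection speed) used to convert frames to seconds.

```python
# Yawn counting per the Close-Small-Big-Small-Close rule above; the state
# machine is an illustrative interpretation, not the authors' code.
def yawn_fatigue(mouth_states, minutes, fps=20, th_yawn=3):
    pattern = ["closed", "small", "big", "small", "closed"]
    yawns, stage, open_frames = 0, 0, 0
    for state in mouth_states:
        if stage in (1, 2, 3):                    # mouth currently opening/open
            open_frames += 1
        if state == pattern[min(stage + 1, 4)]:   # advance through the pattern
            stage += 1
        elif state == "closed" and stage > 0:     # sequence broken: reset
            stage, open_frames = 0, 0
        if stage == 4:                            # full pattern observed
            if open_frames / fps > 2.0:           # opening lasted more than 2 s
                yawns += 1
            stage, open_frames = 0, 0
    f_yawn = yawns / minutes
    return f_yawn, f_yawn > th_yawn
```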
III. EXPERIMENTS
To verify the validity of the algorithm, we evaluated the performance of the improved YOLOv3-tiny network on the self-built data set DSD and the public data set WIDER FACE. On this basis, comparison experiments were designed to verify the correctness of the fatigue driving detection algorithm based on facial multi-feature fusion.

A. EXPERIMENTAL ENVIRONMENT AND DATA SET
The experimental platform is an Intel Core i5-8400 with x86 architecture and a CPU clock speed of 2.80 GHz. The graphics card is a GTX 1060 with Pascal architecture (CUDA 9.2; cuDNN 7.2), the RAM is 8 GB DDR4, and the OpenCV 3.4.6 image library is used. The deep learning computing framework is PaddlePaddle 1.5, and the program runs in Python 3.6. The hardware configuration is shown in Table 1.

The data sets used in the experiments are the self-built data set DSD and the public data set WIDER FACE. The public WIDER FACE data set includes 32,203 pictures and 393,703 marked faces and is used to train the YOLOv3-tiny face network. However, the WIDER FACE data set contains only marked face images and does not provide any information about the driver's fatigue status, so it cannot be used to analyze driver fatigue. To this end, the driving state dataset (DSD) is established in this paper; it contains data collected from 50 test drivers sitting on a driving simulator (as shown in Figure 10). The data set of each test driver is shown in the table below.
B. FACE DETECTION
The improved YOLOv3-tiny network provides face landmarks for fatigue driving detection, and its performance directly determines the quality of the fatigue driving detection algorithm. Therefore, we quantitatively evaluate the performance of the improved YOLOv3-tiny network on the WIDER FACE data sets.

In this paper, accuracy is selected as the evaluation indicator. It is an intuitive evaluation index of model performance, as shown in Equation (10):

accuracy = N_d / N_t  (10)

where N_d is the number of correctly detected images, and N_t is the total number of images.

In the process of training and verifying the improved YOLOv3-tiny network, the intersection-over-union parameter (IOU) [57] is introduced to measure the similarity between the detected face area and the marked ground-truth area. In Figure 11, face_d is the face area detected by the model and face is the marked ground-truth area; the calculation is given in Equation (11):

IoU = S(face_d ∩ face) / S(face_d ∪ face)  (11)

where S(face_d ∩ face) is the area of face_d ∩ face, and S(face_d ∪ face) is the area of face_d ∪ face.
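Both evaluation measures are straightforward to compute; the sketch below assumes corner-format boxes (x1, y1, x2, y2) for the IoU of Equation (11).

```python
# Detection accuracy from Equation (10) and single-box IoU from Equation (11).
def accuracy(n_detected, n_total):
    return n_detected / n_total

def box_iou(face_d, face):
    ix1, iy1 = max(face_d[0], face[0]), max(face_d[1], face[1])
    ix2, iy2 = min(face_d[2], face[2]), min(face_d[3], face[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    area = lambda b: (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area(face_d) + area(face) - inter)
```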
The intersection ratio indicates the degree of overlap between the model's predicted area and the real area: as can be seen from Figure 11, the higher the value, the higher the detection accuracy. When IOU = 1, the prediction box coincides with the real box. Generally speaking, in the task of

TABLE 5. The time spent on fatigue status judgment.

Our algorithm shows that the system has good accuracy and high-speed performance under various conditions and can accurately judge the fatigue state of the driver. Compared with the Adaboost+CNN and MTCNN+LRCN algorithms [58], [59], our method improves the accuracy of fatigue driving detection. It also has better real-time performance, which meets the requirements of a fatigue driving detection system. The comparative results are shown in Table 5.

IV. CONCLUSION
Fatigue driving seriously impairs driving skills and seriously threatens drivers and other traffic participants. At present, fatigue driving detection has achieved good research results, but it still needs improvement in areas such as high intrusiveness, poor detection performance in complex environments, and overly simple evaluation indicators. Therefore, we propose a new fatigue driving detection algorithm based on facial multi-feature fusion. The main contributions are as follows.

We designed a driver face detection architecture based on the improved YOLOv3-tiny convolutional neural network and trained the network with the open-source dataset WIDER FACE [28].

We designed eye and mouth SVM classifiers that take driver characteristics into account, which judge fatigue based on the actual size of the driver's eyes and mouth and achieve high accuracy.

We constructed the driver identity information library. The system stores three types of driver identity information: driver biometrics, the driver eye state classifier, and the driver mouth state classifier. We train the classifiers in advance and store them in the driver identity information library. Then, through identity verification, the driver's classifiers are called before system startup. This not only simplifies initialization, but also avoids inaccuracies due to entering the identity manually.

ACKNOWLEDGMENT
Thanks to the open-source dataset WIDER_FACE for providing data. WIDER_FACE: http://shuoyang1213.me/WIDERFACE/
[42] S. Kido, Y. Hirano, and N. Hashimoto, "Detection and classification of lung abnormalities by use of convolutional neural network (CNN) and regions with CNN features (R-CNN)," in Proc. Int. Workshop Adv. Image Technol. (IWAIT), Chiang Mai, Thailand, Jan. 2018, pp. 1–4.
[43] Y. Tian, Y. Du, Q. Zhang, J. Cheng, and Z. Yang, "Depth estimation for advancing intelligent transport systems based on self-improving pyramid stereo network," IET Intell. Transp. Syst., vol. 14, no. 5, pp. 338–345, May 2020.
[44] Y. Tian, Q. Zhang, Z. Ren, F. Wu, P. Hao, and J. Hu, "Multi-scale dilated convolution network based depth estimation in intelligent transportation systems," IEEE Access, vol. 7, pp. 185179–185188, 2019.
[45] D. E. King, "Dlib-ml: A machine learning toolkit," J. Mach. Learn. Res., vol. 10, pp. 1755–1758, Jul. 2009.
[46] J. H. Friedman, "Greedy function approximation: A gradient boosting machine," Ann. Statist., vol. 29, pp. 1189–1232, Oct. 2001.
[47] A. Krizhevsky, I. Sutskever, and G. E. Hinton, "ImageNet classification with deep convolutional neural networks," Commun. ACM, vol. 60, no. 6, pp. 84–90, May 2017.
[48] Q. Zhang and S.-I. Kamata, "Improved color barycenter model and its separation for road sign detection," IEICE Trans. Inf. Syst., vol. E96.D, no. 12, pp. 2839–2849, Dec. 2013.
[49] X. Cao, Y. Wei, F. Wen, and J. Sun, "Face alignment by explicit shape regression," Int. J. Comput. Vis., vol. 107, no. 2, pp. 177–190, Apr. 2014.
[50] P. Dollár, P. Welinder, and P. Perona, "Cascaded pose regression," in Proc. IEEE Comput. Soc. Conf. Comput. Vis. Pattern Recognit., San Francisco, CA, USA, Jun. 2010, pp. 1078–1085.
[51] Q. Zhang and S.-I. Kamata, "A novel color descriptor for road-sign detection," IEICE Trans. Fundam. Electron., Commun. Comput. Sci., vol. E96-A, no. 5, pp. 971–979, May 2013.
[52] X. Kong, X. Liu, B. Jedari, M. Li, L. Wan, and F. Xia, "Mobile crowdsourcing in smart cities: Technologies, applications, and future challenges," IEEE Internet Things J., vol. 6, no. 5, pp. 8095–8113, Oct. 2019.
[53] D. Ma, X. Song, and P. Li, "Daily traffic flow forecasting through a contextual convolutional recurrent neural network modeling inter- and intra-day traffic patterns," IEEE Trans. Intell. Transp. Syst., early access, Feb. 24, 2020, doi: 10.1109/TITS.2020.2973279.
[54] D. Ma, J. Xiao, X. Song, X. Ma, and S. Jin, "A back-pressure-based model with fixed phase sequences for traffic signal optimization under oversaturated networks," IEEE Trans. Intell. Transp. Syst., early access, Apr. 29, 2020, doi: 10.1109/TITS.2020.2987917.
[55] Z. You, Y. Gao, J. Zhang, H. Zhang, M. Zhou, and C. Wu, "A study on driver fatigue recognition based on SVM method," in Proc. 4th Int. Conf. Transp. Inf. Saf. (ICTIS), Banff, AB, Canada, Aug. 2017, pp. 693–697.
[56] J.-J. Yan, H.-H. Kuo, Y.-F. Lin, and T.-L. Liao, "Real-time driver drowsiness detection system based on PERCLOS and grayscale image processing," in Proc. Int. Symp. Comput., Consum. Control (IS3C), Xi'an, China, Jul. 2016, pp. 243–246.
[57] L. Tychsen-Smith and L. Petersson, "Improving object localization with fitness NMS and bounded IoU loss," in Proc. IEEE/CVF Conf. Comput. Vis. Pattern Recognit. (CVPR), Salt Lake City, UT, USA, Jun. 2018, pp. 6877–6885.
[58] G. Lei, X. Liang, Z. Xiao, and Y. Li, "Real-time driver fatigue detection based on morphology infrared features and deep learning," Infr. Laser Eng., vol. 47, no. 2, 2018, Art. no. 203009.
[59] L. Tychsen-Smith and L. Petersson, "Improving object localization with fitness NMS and bounded IoU loss," 2017, arXiv:1711.00164. [Online]. Available: https://arxiv.org/abs/1711.00164

KENING LI received the B.E. degree in communications and transportation from Henan Agricultural University, Henan, China, in 2004, the M.S. degree in vehicle operation engineering from Jilin University, Jilin, in 2007, and the Ph.D. degree in vehicle engineering from Shanghai Jiao Tong University, Shanghai, in 2014. He is currently a Lecturer with the School of Traffic and Environment, Shenzhen Institute of Information Technology, Shenzhen, China. His research interests include vehicle system dynamics and control, intelligent vehicle systems, and artificial intelligence.

YUNBO GONG was born in Tieling, Liaoning, in 1995. He received the B.E. degree from the South China University of Technology, Guangzhou, China, in 2018, where he is currently pursuing the master's degree in traffic information engineering and control. His research interests include intelligent vehicles, computer vision, and 3D laser radar.

ZILIANG REN (Member, IEEE) received the Ph.D. degree from the South China University of Technology, Guangzhou, China, in 2017. He is currently with the Guangdong Provincial Key Laboratory of Robotics and Intelligent System, Shenzhen Institutes of Advanced Technology, Chinese Academy of Sciences, Shenzhen, China, as a Postdoctoral Researcher and an Assistant Researcher. His current research interests include computer vision and machine learning.