
Journal of Physics: Conference Series

PAPER • OPEN ACCESS

To cite this article: Yonghui Wang and Roufei Qu 2021 J. Phys.: Conf. Ser. 1744 042242


MACE 2020 IOP Publishing
Journal of Physics: Conference Series 1744 (2021) 042242 doi:10.1088/1742-6596/1744/4/042242

Research on Driver Fatigue State Detection Method Based on Deep Learning

Yonghui Wang* and Roufei Qu


School of Information and Control Engineering, Shenyang Jianzhu University,
Liaoning, China

*Corresponding author e-mail: [email protected]

Abstract. Fatigue driving detection is essential to ensure the safety of society and
drivers. At present, most fatigue detection methods are relatively traditional and rely on
a single feature, with complex algorithms, low accuracy, and low fault tolerance. This
paper achieves precise positioning of facial feature points based on an improved
Multi-task Cascaded Convolutional Network (MTCNN), and combines it with a
Res-SE-net model to extract the eye and mouth regions and classify their states. The
model is trained, and driver fatigue is finally judged by the PERCLOS rule combined
with the OMR rule of mouth opening and closing frequency. Experimental results show
that this method can effectively extract fatigue features, has high detection accuracy,
meets real-time requirements, and is robust to complex environments.

Keywords: Fatigue Detection, Deep Learning, Facial Feature Points, Convolutional Neural Network

1. Introduction
According to statistics, 48% of traffic accidents in China are caused by driver fatigue, with direct
economic losses of hundreds of thousands of dollars. With the development of expressways and the
increase of vehicle speed, driver fatigue detection has become an important part of driving safety
research. The rise of deep neural networks has driven a leap in machine vision algorithms, providing a
large number of excellent solutions for target detection and state classification problems. With the
advent of Convolutional Neural Networks (CNN) [1] in 2012, deep learning has become the most
common method in the imaging field. Taigman et al. [2] proposed the DeepFace model in 2014 to
improve the accuracy of face recognition. In 2016, Zhang et al. [3] proposed the Multi-task Cascaded
Convolutional Network (MTCNN), which can detect multiple faces quickly and accurately. Zhang F et
al. [4] proposed an eye state recognition method based on blinking frequency for driver fatigue
detection, but it relies on a single fatigue parameter and its accuracy is low. Judging fatigue from facial
features rests mainly on eye closure and yawning. Wierwille et al. [5] established the theoretical model
of PERCLOS, the proportion of time the eyes are closed per unit time; PERCLOS is now regarded as a
very effective indicator of fatigue. Dai Shiqi et al. [6] realized face detection and feature point location
based on HOG features and the ERT algorithm, and then used a convolutional neural network to
identify the states of the eyes and mouth.
Gu Wanghuan et al. [7] proposed MSP-Net, a multi-scale pooling convolutional neural network model.
Geng Lei et al. [8] first used AdaBoost and KCF (kernelized correlation filter) for face detection and
tracking, and then used a classic network structure to recognize the states of the eyes and mouth. Zhao
Xuepeng et al. [9] used a cascaded network to locate the eyes and detect their state to determine fatigue.
This paper uses the improved MTCNN detection model to locate the driver's face and facial feature
points, extracts the eye and mouth regions, feeds them into Res-SE-net to classify their states, and
finally combines the PERCLOS rule with the Open Mouth Rate (OMR) rule to judge fatigue. The
algorithm flow is shown in Figure 1.

Figure 1. Fatigue testing process framework

2. Face Detection and Feature Point Positioning


This paper uses the MTCNN model for face detection and feature point positioning. The algorithm
flow is shown in Figure 2; MTCNN has high accuracy, and its network structure is shown in Figure 3.
First, an image pyramid is obtained by resizing the original image; P-Net detects face candidate boxes,
refines them by bounding box regression, and merges overlapping candidates by Non-Maximum
Suppression (NMS). Next, R-Net further adjusts the candidate boxes and removes false-positive
regions, again through bounding box regression and NMS; this network has one more fully connected
layer than P-Net, so it achieves better suppression. Finally, O-Net, which is one layer deeper than
R-Net, performs the same filtering as R-Net but under stronger supervision, so its result is more
accurate, and it additionally outputs the positions of five facial landmarks.
The MTCNN algorithm can still detect faces accurately under various light sources, face deviations,
and partial occlusion. However, the time consumption between MTCNN layers is relatively large, so
the algorithm is optimized in the face detection stage. Following the method proposed by Shi Ruipeng
[10], reducing the size of the original image and increasing the minimum face size improves the
detection accuracy and greatly increases the detection speed. Face detection and feature point
positioning are shown in Figure 4.
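For illustration, a minimal sketch of this detection stage follows. It assumes the open-source `mtcnn` package (ipazc/mtcnn) rather than the improved model trained in this paper, and the `scale` and `min_face_size` values are illustrative:

```python
# Sketch of MTCNN face and landmark detection with the two speed optimizations
# described above: downscaling the input frame and raising the minimum face size.
# Assumes the open-source `mtcnn` package (pip install mtcnn); parameter values
# are illustrative, not the paper's exact configuration.
import cv2
from mtcnn import MTCNN

# A larger min_face_size prunes the lower levels of the image pyramid,
# where most of the P-Net time is spent.
detector = MTCNN(min_face_size=40)

def detect_face(frame_bgr, scale=0.5):
    """Downscale the frame, detect the most confident face, and map the
    bounding box and five landmarks back to original-image coordinates."""
    small = cv2.resize(frame_bgr, None, fx=scale, fy=scale)
    rgb = cv2.cvtColor(small, cv2.COLOR_BGR2RGB)
    results = detector.detect_faces(rgb)
    if not results:
        return None
    best = max(results, key=lambda r: r['confidence'])
    box = tuple(int(v / scale) for v in best['box'])  # (x, y, w, h)
    # keypoints: left_eye, right_eye, nose, mouth_left, mouth_right
    landmarks = {name: (int(px / scale), int(py / scale))
                 for name, (px, py) in best['keypoints'].items()}
    return box, landmarks
```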


Figure 2. MTCNN algorithm flow
Figure 3. MTCNN network structure

Figure 4. Face detection and feature point positioning

3. Image Extraction of Target Area of Eyes and Mouth


In fatigue detection, the states of the driver's eyes and mouth are used to judge whether the driver is
fatigued. The eye and mouth regions are determined by geometric relations. Let A(XA, YA), B(XB, YB),
C(XC, YC), and D(XD, YD) be the feature points of the driver's eyes and mouth corners. An example of
extracting the eye and mouth regions is shown in Figure 5. Taking the driver's left eye as an example:
d1 is the distance between the two eyes, h1 is the height of the extracted eye region, w1 is the width of
the extracted eye region, and O is the upper-left vertex of the left-eye region; d2 is the distance between
the mouth corners, h2 is the height of the extracted mouth region, w2 is the width of the extracted
mouth region, and M is the upper-left vertex of the mouth region. Each region is solved by formula (1):
d1 = XB − XA          d2 = XD − XC
w1 = d1 × 0.65        w2 = d2 × 1.2
h1 = w1 × 0.85        h2 = w2
XO = XA − w1 × 0.85   XM = XC − w2 × 0.8
YO = YA − h1 × 0.5    YM = YC − h2 × 0.8          (1)
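As a worked illustration of formula (1), the sketch below computes the left-eye and mouth crop boxes from the landmark points. It assumes A, B are the left/right eye points and C, D the left/right mouth corners (e.g., as returned by MTCNN); the helper name is hypothetical:

```python
# Sketch of the region extraction in formula (1). A, B are the left/right eye
# feature points and C, D are the left/right mouth corners, as (x, y) tuples.
def eye_mouth_boxes(A, B, C, D):
    d1 = B[0] - A[0]                  # distance between the two eyes
    d2 = D[0] - C[0]                  # distance between the mouth corners
    w1, w2 = d1 * 0.65, d2 * 1.2      # widths of the eye / mouth regions
    h1, h2 = w1 * 0.85, w2            # heights of the eye / mouth regions
    # O and M are the upper-left vertices of the left-eye and mouth regions.
    xo, yo = A[0] - w1 * 0.85, A[1] - h1 * 0.5
    xm, ym = C[0] - w2 * 0.8,  C[1] - h2 * 0.8
    eye_box = (int(xo), int(yo), int(w1), int(h1))    # (x, y, w, h)
    mouth_box = (int(xm), int(ym), int(w2), int(h2))
    return eye_box, mouth_box

# Usage: crop each region from the frame with  frame[y:y+h, x:x+w].
```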


(a) Feature point location  (b) Extracted image


Figure 5. Image extraction of the target area of eyes and mouth

4. Recognition and Classification of Eye and Mouth Status


In recent years, convolutional neural networks such as Alex-Net [11], Res-Net [12], and SE-Net [13]
have led the ILSVRC competition and represent the state of the art in computer vision. The accuracy
of convolutional neural network models keeps approaching the limit, but their depth and computational
complexity also increase significantly, as do the demands on hardware performance. The size of the
input image and the complexity of the network structure determine the computation and time overhead
of image classification. This paper realizes eye and mouth state recognition by fusing Res-Net with the
sub-network SE-Net (Squeeze-and-Excitation Network): inserting the SE block into the Res-Net
classification network greatly reduces the error rate of a single Res-Net model while keeping the
complexity low and the amount of computation small.

Figure 6. The SE module is embedded in the Res-Net structure


The structure of Res-SE-Net is shown in Figure 6; the dimension information next to each box
represents the output of that layer. Global average pooling is used as the Squeeze operation. Two Fully
Connected layers form a bottleneck structure that models the correlation between channels and outputs
the same number of weights as input features: the feature dimension is first reduced to 1/16 of the
input, activated by ReLU, and then restored to the original dimension through the second Fully
Connected layer. This fits the complex correlation between channels better and greatly reduces the
number of parameters and computations. A Sigmoid gate then produces normalized weights between
0 and 1, which are applied to the features of each channel through the Scale operation.
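The following TensorFlow/Keras sketch shows one residual block with the SE module embedded, following the description above (global average pooling as Squeeze, a Fully Connected bottleneck with reduction ratio 16, ReLU, a Sigmoid gate, and a channel-wise Scale). The layer sizes and the toy classifier are illustrative, not the exact configuration used in the paper:

```python
# Sketch of a Res-SE block: an SE module (Squeeze -> FC bottleneck -> Sigmoid
# -> Scale) inserted into a residual block, as described above.
import tensorflow as tf
from tensorflow.keras import layers

def se_block(x, reduction=16):
    c = int(x.shape[-1])
    s = layers.GlobalAveragePooling2D()(x)                   # Squeeze: HxWxC -> C
    s = layers.Dense(c // reduction, activation='relu')(s)   # bottleneck FC (1/16)
    s = layers.Dense(c, activation='sigmoid')(s)             # weights in (0, 1)
    s = layers.Reshape((1, 1, c))(s)
    return layers.Multiply()([x, s])                         # Scale: reweight channels

def res_se_block(x, filters):
    shortcut = x
    y = layers.Conv2D(filters, 3, padding='same', activation='relu')(x)
    y = layers.Conv2D(filters, 3, padding='same')(y)
    y = se_block(y)                                          # SE before the addition
    if int(shortcut.shape[-1]) != filters:                   # match channels if needed
        shortcut = layers.Conv2D(filters, 1, padding='same')(shortcut)
    return layers.Activation('relu')(layers.Add()([y, shortcut]))

# Toy eye-state classifier over 24x24 grayscale crops (shapes are illustrative).
inp = tf.keras.Input(shape=(24, 24, 1))
h = layers.Conv2D(16, 3, padding='same', activation='relu')(inp)
h = res_se_block(h, 32)
out = layers.Dense(2, activation='softmax')(layers.GlobalAveragePooling2D()(h))
model = tf.keras.Model(inp, out)
```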
The images extracted by MTCNN are sent to Res-SE-Net for eye and mouth state recognition.
Convolution kernels of different sizes are set; each convolution layer traverses multiple input
two-dimensional feature maps and convolves in three-dimensional space to obtain output feature maps,
which are then passed on to subsequent layers. Each convolutional layer trains multiple learnable
convolution kernels by setting multiple output feature maps. As the number of iterations increases, the
convolution kernels are continuously updated and the feature extraction capability is continuously
strengthened.

5. Fatigue Judgment
When a driver becomes tired, he exhibits a series of physiological reactions such as prolonged eye
closure and yawning. Based on these reactions and the detected eye and mouth states, the degree of
driver fatigue is determined by calculating PERCLOS and the mouth opening rate (OMR).

5.1. PERCLOS Judgment Rule


PERCLOS is the percentage of eye-closure time per unit time. P80 (the proportion of the total time per
unit time in which the eyes are more than 80% closed) has become a general rule for judging fatigue.
The principle is shown in Figure 7, and the calculation formula is shown in formula (2), where
Te = (t4 − t2) is the number of frames in which the eyes are closed during the detection time, and
TE = (t5 − t1) is the total number of frames in the eye state detection time.
Perclos = Te / TE × 100% (2)

Figure 7. PERCLOS principle diagram under P80 standard
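A minimal sketch of formula (2) over a sliding window of per-frame eye states follows (1 = closed under the P80 criterion, 0 = open); the window length is an assumption for illustration:

```python
# Sketch of the PERCLOS computation in formula (2): the percentage of frames
# within the detection window in which the eyes are closed (P80 criterion).
from collections import deque

class PerclosMeter:
    def __init__(self, window_frames=750):     # e.g. 30 s at 25 fps (illustrative)
        self.states = deque(maxlen=window_frames)

    def update(self, eye_closed: bool) -> float:
        """Record one frame's eye state and return Perclos = Te / TE * 100%."""
        self.states.append(1 if eye_closed else 0)
        return 100.0 * sum(self.states) / len(self.states)
```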

5.2. OMR Judgment Rule


The OMR is similar to the PERCLOS criterion for the eyes, and its calculation formula is shown in
formula (3), where Tm is the number of frames in which the mouth is open during the detection time,
and TM is the total number of frames within the mouth state detection time.
OMR = Tm / TM × 100% (3)

5.3. Fatigue Judgment


When the driver is awake, Perclos is not greater than 0.15; when the driver is fatigued, Perclos is
greater than 0.4. Studies have found that a person's yawn lasts about 3 to 5 seconds: generally, an
ordinary yawn lasts 2.5~3 seconds, and a deep yawn lasts more than 3 seconds. As fatigue increases,
a person usually yawns repeatedly, so counting the number of yawns within a certain period further
reflects the driver's fatigue. This paper uses 2 minutes as the detection duration and counts the number
of yawns ny within it. When ny ≥ 3, the driver is judged to be in a state of increased fatigue and a
higher-level warning must be given. The greater the values of PERCLOS and OMR, the greater the
degree of fatigue; the final fatigue state detection combines the two.
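A sketch of the combined judgment follows. The PERCLOS thresholds (0.15 awake, 0.4 fatigued) and the yawn-count rule (ny ≥ 3 within 2 minutes) come from the text; the OMR threshold and the exact fusion order are assumptions for illustration:

```python
# Sketch of the combined fatigue judgment. perclos and omr are fractions in
# [0, 1] over the detection window; yawns_2min is the yawn count ny within
# the 2-minute detection duration. The OMR threshold (0.5) is assumed.
def judge_fatigue(perclos: float, omr: float, yawns_2min: int) -> str:
    if yawns_2min >= 3:
        return "increased fatigue"     # repeated yawning: higher-level warning
    if perclos > 0.4 or omr > 0.5:     # 0.5 is an illustrative OMR threshold
        return "fatigued"
    if perclos <= 0.15:
        return "awake"
    return "possible fatigue"          # intermediate band: keep monitoring

# A yawn can be counted when the mouth stays open continuously for at least
# ~2.5 s (the typical yawn duration cited above).
```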

6. Experimental Results and Analysis

6.1. Experimental Environment and Data Set


The experimental environment is based on Python 3.7 and TensorFlow 1.15, with a 3.20 GHz CPU and
4 GB of memory. The improved MTCNN face detection model is trained on the benchmark face data
set WIDER FACE, which contains 32203 images and 159424 faces. For training, the data set is
divided into three subsets according to the difficulty of image recognition: 60% as the test set, 30% for
model training, and 10% as the validation set. The eye and mouth data sets were built by collecting a
large number of closed-eye and open-mouth samples from the ZJU blinking video data set and the
YawDD fatigue driving video data set. For further verification experiments, data from 15 volunteers
were collected: 17532 eye samples in total, including 8562 open-eye samples and 8970 closed-eye
samples, and 19431 mouth samples, including 10356 open-mouth samples and 9075 closed-mouth
samples.

6.2. Experimental Results and Analysis

6.2.1. Eye and mouth status detection. To test the detection and recognition performance of the
proposed method under real conditions, a continuous video stream captured by a camera at 640×480
pixels is used as the test object. First, MTCNN performs face detection and feature point extraction on
each frame of the video; then the Res-SE-net model judges the state of the eyes and mouth; finally, the
Perclos and OMR rules are combined to detect the fatigue state. This paper compares the Res-net
embedded with SE-net against the Alex-Net proposed in [11]: Res-SE-net and Alex-Net are each used
for model training, and after training the eye and mouth state classification models, the classification
accuracy and classification time overhead of the two network structures were tested. The results are
shown in Tables 1 and 2.
Table 1. Test results of eye and mouth state classification model

| Category | Number of samples | Res-SE-net average accuracy of positive and negative samples % | Alex-Net average accuracy of positive and negative samples % |
| Left eye | 1500 | 95.1 | 93.8 |
| Right eye | 1500 | 95.7 | 93.6 |
| Mouth | 1500 | 96.8 | 94.1 |

Table 2. Average time overhead of single image classification /ms

| Task | Res-SE-net average time overhead | Alex-Net average time overhead |
| Left eye | 3.461 | 6.957 |
| Right eye | 3.516 | 6.847 |
| Mouth | 3.376 | 6.942 |
The test results show that the proposed Res-SE-net is slightly better than Alex-Net in both time cost
and accuracy. Next, the method proposed in this paper is compared with the methods proposed in the
literature [6,7,8]: 2000 sample images containing the entire face are selected and normalized to
640×480 for testing. The results are shown in Table 3, from which it can be seen that both the
computation speed and the accuracy of Res-SE-net are better than the methods of [6,7,8].
Table 3. Test results of left eye state classification by four methods

| Method | Average accuracy of positive and negative samples % | Average time overhead /ms |
| Hog+ERT+CNN [6] | 92.5 | 130.642 |
| MTCNN+MSP-Net [7] | 94.5 | 178.965 |
| AdaBoost+Cascade regression+SR-Net [8] | 91.3 | 27.972 |
| MTCNN+Res-SE-net | 96.5 | 8.743 |
6.2.2. Fatigue status detection. To verify the feasibility of the algorithm, this paper randomly selects
videos of 5 female and 5 male drivers from the YawDD data set, plus 1 video recorded by the research
group, counts the actual number of fatigue events in these 11 videos (752 s in total), and compares it
with the number recognized by the algorithm. Through comparison experiments with traditional
fatigue detection methods, the recall and precision of the algorithm are evaluated; the experimental
data are shown in Table 4. The precision is p = (N − Nt) / (N − Nt + Ne) and the recall is
q = (N − Nt) / N, where N is the number of real fatigue events, Nt the number of missed detections,
and Ne the number of false detections. The experimental results show that the precision and recall of
the proposed algorithm are better than those of the traditional algorithms. False and missed detections
occur mainly in low light; under normal light, the recall and precision rates are higher.
Table 4. Experimental results of each algorithm

| Method | Fatigue events /times | False detections /times | Missed detections /times | Precision % | Recall % |
| Hog+ERT+CNN [6] | 48 | 7 | 5 | 86 | 89.583 |
| MTCNN+MSP-Net [7] | 48 | 4 | 7 | 91.111 | 85.417 |
| AdaBoost+Cascade regression+SR-Net [8] | 48 | 6 | 4 | 88 | 91.667 |
| MTCNN+Res-SE-net | 48 | 2 | 1 | 95.918 | 97.917 |
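As a quick arithmetic check, applying the precision and recall formulas above to the counts in Table 4 reproduces the reported percentages (the function name is illustrative):

```python
# Worked check of p = (N - Nt) / (N - Nt + Ne) and q = (N - Nt) / N against
# Table 4, where N = real fatigue events, Nt = missed, Ne = false detections.
def precision_recall(N, Nt, Ne):
    detected = N - Nt
    return 100.0 * detected / (detected + Ne), 100.0 * detected / N

print(precision_recall(48, 5, 7))  # Hog+ERT+CNN [6]   -> (86.0, 89.58...)
print(precision_recall(48, 1, 2))  # MTCNN+Res-SE-net  -> (95.91..., 97.91...)
```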

7. Conclusion
This paper first detects the driver's face through MTCNN and extracts the key positions of the eyes
and mouth; next, the eye and mouth images are sent to a Res-Net fused with SE-Net for state
classification, and the algorithm is accelerated and optimized so that training speed improves without
sacrificing accuracy; finally, the fatigue state is jointly judged by PERCLOS and OMR. Experiments
show that the algorithm proposed in this paper has a high detection accuracy and achieves real-time
detection with good robustness.

References
[1] Krizhevsky A, Sutskever I, Hinton G E. ImageNet classification with deep convolutional neural
networks. Communications of the ACM, 2017, 60(6): 84-90.
[2] Taigman Y, Yang Ming, Ranzato M, et al. DeepFace: closing the gap to human-level
performance in face verification // Proc of IEEE Conference on Computer Vision and Pattern
Recognition. Washington DC: IEEE Computer Society, 2014: 1701-1708.
[3] Zhang Kaipeng, Zhang Zhanpeng, Li Zhifeng, et al. Joint face detection and alignment using
multitask cascaded convolutional networks. IEEE Signal Processing Letters, 2016, 23(10):
1499-1503.
[4] Zhang F, Su JJ, Geng L, et al. Driver fatigue detection based on eye state recognition.
International Conference on Machine Vision and Information Technology. Singapore. 2017.
105-110.
[5] Wierwille W W, Ellsworth L A. Evaluation of driver drowsiness by trained raters. Accident
Analysis & Prevention, 1994, 26(5): 571-581.
[6] Dai Shiqi, Zeng Zhiyong. Fatigue driving monitoring based on BP neural network. Computer
Systems & Applications, 2018, 27 (7): 113-120.
[7] Gu Wanghuan, Zhu Yu, Chen Xudong, et al. Driver’s fatigue detection system based on
multi-scale pooling convolutional neural networks. Application Research of Computers,
2019, 36 (11): 3471-3475


[8] Geng Lei, Yuan Fei, Xiao Zhitao, et al. Driver fatigue detection method based on facial
behavior analysis. Computer Engineering, 2018, 44(1): 274-279.
[9] Zhao Xuepeng, Meng Chunning, Feng Mingkui, et al. Fatigue detection based on cascade
convolutional neural network. Journal of Optoelectronics Laser, 2017, 28(5): 497-502.
[10] Shi Ruipeng, Qian Yi, Jiang Danni. Fatigue driving detection method based on convolutional
neural network. Application Research of Computers.
https://doi.org/10.19734/j.issn.1001-3695.2019.07.0313
[11] Krizhevsky A, Sutskever I, Hinton G E. ImageNet classification with deep convolutional neural
networks // Advances in Neural Information Processing Systems. New York: Curran
Associates, 2012: 1097-1105.
[12] He Kaiming, Zhang Xiangyu, Ren Shaoqing, et al. Deep residual learning for image
Recognition // Proc of the IEEE conference on Computer Vision and Pattern Recognition.
Piscataway: IEEE Press, 2016: 770-778.
[13] Hu Jie, Shen Li, Sun Gang. Squeeze-and-excitation networks // Proc of the IEEE conference on
Computer Vision and Pattern Recognition. Piscataway: IEEE Press, 2018: 7132-7141.
