A Deep Learning Approach For Face Detection and Lo
Abstract. Face detection and location has been an active research direction in recent years.
In particular, driver face detection on highways remains a challenging problem that matters for
public safety. This paper proposes a novel algorithm based on an improved Multi-task
Cascaded Convolutional Network (MTCNN) and a Support Vector Machine (SVM) to achieve
accurate face region detection and facial feature location for drivers on highways, predicting
the face and feature locations in a coarse-to-fine manner. The proposed algorithm is verified
under various complex highway conditions. Experimental results show that the proposed model
performs satisfactorily compared with other state-of-the-art techniques for driver face
detection and alignment, and remains robust to occlusion, varying pose and extreme
illumination on highways.
Keywords. Driver face detection; Face alignment; Convolutional Networks; Support Vector
Machine
1. Introduction
Face detection and alignment technology has been widely used in various practical fields, especially in
driver face detection and alignment, which concerns public security and traffic order. With the
fast development of digital image processing technology, various detection and alignment techniques
have been proposed [1]. However, in real applications, varying illumination, occlusion and pose
affect the detection and alignment performance.
Face detection has been an active research direction in recent years. In 2004, Viola and Jones [2] put
forward a cascade detection method that uses AdaBoost with Haar-like features to train cascaded
classifiers. Unfortunately, later studies [3, 4] indicate that it does not remain competitive in
real applications with large visual variations of faces. Afterwards, deep CNNs were applied to
face detection. Yang et al. [5] proposed deep neural networks for facial feature recognition.
However, this algorithm is time-consuming in practice. Later, Zhang et al. designed the
MTCNN model [6], which consists of three stages of carefully designed deep convolutional networks.
Nevertheless, because of its fixed training samples and limited network structure, it performs
unsatisfactorily in real conditions such as driver face detection.
At the same time, face alignment has also attracted wide research interest. Work in this area
can be roughly divided into two categories [7-12]: regression-based techniques and template fitting
methods. Nevertheless, previous face detection and alignment methods overlooked the
inherent relationship between these two tasks.
Content from this work may be used under the terms of the Creative Commons Attribution 3.0 licence. Any further distribution
of this work must maintain attribution to the author(s) and the title of the work, journal citation and DOI.
Published under licence by IOP Publishing Ltd
AIAAT 2018 IOP Publishing
IOP Conf. Series: Materials Science and Engineering 435 (2018) 012004 doi:10.1088/1757-899X/435/1/012004
In this paper, we propose a driver face detection and alignment model based on an improved MTCNN
and SVM [13], called IMT-SVM, which forms an adaptive detection and alignment model. We train this
model on our own traffic driver face database, which was constructed by the Public Security Department
of Jiangsu Province, and verify its performance on the testing dataset. According to the experimental
results, the IMT-SVM model shows a high detection rate as well as a low false detection rate compared
with other state-of-the-art methods.
2. Basic Theory
In this paper, we improve the structure of MTCNN and fuse it with SVM to improve the performance
of the original algorithm. This section introduces the basic theory of face detection and
alignment used in our model.
2.1. MTCNN
MTCNN first resizes the face image to different scales to construct an image pyramid, which is the
input of the cascade. The three stages of MTCNN, each a convolutional network, are described below.
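The image pyramid can be sketched as follows. This is a minimal illustration, not the configuration used in this paper: the 12-pixel network input, the minimum face size of 20 pixels and the scale factor 0.709 are assumed values that are common in public MTCNN implementations, and `pyramid_scales` is a hypothetical helper name.

```python
def pyramid_scales(height, width, min_face_size=20, net_input=12, factor=0.709):
    """Return the scale factors used to resize an image into a pyramid.

    All default values are assumptions for illustration only.
    """
    scales = []
    # The first scale maps the smallest face of interest to the network input size.
    scale = net_input / min_face_size
    min_side = min(height, width) * scale
    # Keep shrinking until the shorter image side no longer fits one input window.
    while min_side >= net_input:
        scales.append(scale)
        scale *= factor
        min_side *= factor
    return scales


scales = pyramid_scales(480, 640)
# Resized (height, width) of every pyramid level for a 480 x 640 image.
sizes = [(round(480 * s), round(640 * s)) for s in scales]
```

Each entry of `sizes` is one pyramid level; the first-stage network would be applied to every level so that faces of different sizes fall into its fixed-size window.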
2.1.1. Proposal Network (P-Net). A fully convolutional network is used to obtain rough candidate
facial windows as well as the corresponding bounding box regression vectors. Face classification
can be formulated as a two-class classification problem, which is solved with the cross-entropy loss:
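$$L_i^{det} = -\left( y_i^{det} \log(p_i) + (1 - y_i^{det}) \log(1 - p_i) \right) \tag{1}$$
where $p_i$ is the probability produced by the network that the i-th sample is a face and
$y_i^{det} \in \{0, 1\}$ is the ground-truth label.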
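As a small numerical illustration of the face/non-face loss, the sketch below evaluates the standard binary cross-entropy for a single sample. The function name `face_cross_entropy` and the clipping constant `eps` are assumptions added for illustration and numerical safety; they are not part of the paper.

```python
import math


def face_cross_entropy(p, y, eps=1e-12):
    """Binary cross-entropy for one sample.

    p: predicted probability that the window is a face.
    y: ground-truth label, 1 for face and 0 for non-face.
    """
    # Clip the probability so that log() never receives 0.
    p = min(max(p, eps), 1.0 - eps)
    return -(y * math.log(p) + (1 - y) * math.log(1.0 - p))


confident_face = face_cross_entropy(0.9, 1)  # small loss: correct and confident
missed_face = face_cross_entropy(0.1, 1)     # large loss: a face scored as non-face
```

The loss grows as the predicted probability moves away from the true label, which is what drives the classifier to separate faces from background.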
2.1.2. Refine Network (R-Net). This network calibrates the candidates using bounding box regression
and non-maximum suppression (NMS), aiming to reject most of the false candidate windows. Bounding
box calibration is formulated as a regression problem and solved with the Euclidean loss:
$$L_i^{box} = \left\| \hat{y}_i^{box} - y_i^{box} \right\|_2^2 \tag{3}$$
In Eq. (3), $\hat{y}_i^{box}$ and $y_i^{box}$ represent the regression target calculated by the network
and the corresponding real coordinates, respectively.
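The NMS step mentioned for this stage can be sketched as a greedy procedure: keep the highest-scoring window and drop any remaining window that overlaps a kept one too strongly. The box format `(x1, y1, x2, y2)` and the overlap threshold of 0.5 are assumptions for illustration; the paper does not state the values it uses.

```python
def iou(a, b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def nms(boxes, scores, threshold=0.5):
    """Greedy non-maximum suppression; returns the indices of the kept boxes."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    for i in order:
        # Keep a box only if it does not overlap any already kept box too much.
        if all(iou(boxes[i], boxes[j]) <= threshold for j in keep):
            keep.append(i)
    return keep


boxes = [(10, 10, 50, 50), (12, 12, 52, 52), (100, 100, 140, 140)]
scores = [0.9, 0.8, 0.7]
kept = nms(boxes, scores)
```

In this toy input the second box overlaps the first almost completely and is suppressed, while the distant third box survives.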
2.1.3. Output Network (O-Net). This stage uses more supervision to mark the face region. Most
importantly, it outputs the coordinates of five facial features. Facial feature detection is a
regression problem, solved by minimizing the Euclidean loss:
$$L_i^{landmark} = \left\| \hat{y}_i^{landmark} - y_i^{landmark} \right\|_2^2 \tag{4}$$
In Eq. (4), $\hat{y}_i^{landmark}$ and $y_i^{landmark}$ are the facial feature coordinates predicted by the
trained network and the real coordinates for the i-th input image, respectively. The facial features
consist of five points: left eye, right eye, nose, left mouth corner and right mouth corner.
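The Euclidean loss used for both box and landmark regression is simply a squared L2 distance between two coordinate vectors. The sketch below evaluates it for five landmarks flattened into ten values; the coordinates are made-up numbers normalized to the face window, used only to show the arithmetic.

```python
def euclidean_loss(predicted, target):
    """Squared L2 distance between two equally long coordinate vectors."""
    if len(predicted) != len(target):
        raise ValueError("vectors must have the same length")
    return sum((p - t) ** 2 for p, t in zip(predicted, target))


# Five landmarks flattened as (x, y) pairs, in the order
# left eye, right eye, nose, left mouth corner, right mouth corner.
# The values are hypothetical and only illustrate the computation.
target = [0.30, 0.35, 0.70, 0.35, 0.50, 0.55, 0.35, 0.75, 0.65, 0.75]
predicted = [0.32, 0.35, 0.70, 0.36, 0.50, 0.55, 0.35, 0.75, 0.65, 0.72]
loss = euclidean_loss(predicted, target)
```

Only three of the ten coordinates differ here, so the loss is the sum of three squared offsets; a perfect prediction gives zero.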
2.2. SVM
SVM is a classical classification method in the field of pattern recognition. SVM maps the pixel
data into a higher-dimensional space, which helps to construct the optimal separating hyperplane.
Its training is a convex quadratic programming problem, which avoids the local minima issue. In our
model, we use SVM for two-class classification, deciding whether a region is a face or not,
realizing multiple classification judgments. The combination of SVM and MTCNN can be applied in
complex conditions.
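To make the face/non-face judgment concrete, the following sketch trains a linear SVM on toy two-dimensional feature vectors. It is an illustration under stated assumptions, not the authors' classifier: it uses a hinge-loss SVM fitted by stochastic subgradient descent rather than the least squares SVM cited in [13], and the feature vectors, learning rate and regularization weight are hypothetical.

```python
def train_linear_svm(samples, labels, lr=0.1, lam=0.01, epochs=50):
    """Fit w and b by stochastic subgradient descent on the regularized hinge loss."""
    dim = len(samples[0])
    w, b = [0.0] * dim, 0.0
    for _ in range(epochs):
        for x, y in zip(samples, labels):
            margin = y * (sum(wi * xi for wi, xi in zip(w, x)) + b)
            # The L2 regularizer always shrinks w slightly.
            w = [wi * (1.0 - lr * lam) for wi in w]
            if margin < 1.0:
                # The sample lies inside the margin: move the hyperplane away from it.
                w = [wi + lr * y * xi for wi, xi in zip(w, x)]
                b += lr * y
    return w, b


def is_face(x, w, b):
    """Two-class decision: True means the region is judged to be a face."""
    return sum(wi * xi for wi, xi in zip(w, x)) + b > 0


# Hypothetical feature vectors: label +1 for face regions, -1 for non-face regions.
samples = [(2, 2), (3, 3), (2, 3), (-2, -2), (-3, -3), (-2, -3)]
labels = [1, 1, 1, -1, -1, -1]
w, b = train_linear_svm(samples, labels)
```

In a cascade, such a decision function could act as an extra check on candidate windows, rejecting those that the convolutional stages accepted by mistake.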
[Figure: block diagram with the components Proposal Network (P-Net), Refine Network (R-Net), manual hard sample mining, SVM, and Output Network (O-Net)]
mining is conducted in MTCNN. However, in real applications, especially in highway conditions, the
environment is complex, so it is necessary to manually generate representative negative samples
to strengthen the detector during training. Experimental results indicate that adding manual
hard samples improves performance, as shown in Section 4.
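For comparison with the manual strategy, automatic hard sample mining can be sketched as keeping only the samples with the largest losses in a mini-batch. The keep ratio of 0.7 follows the value reported in the original MTCNN paper [6] and is an assumption here; `select_hard_samples` is a hypothetical helper, and the loss values are made up.

```python
def select_hard_samples(losses, keep_ratio=0.7):
    """Return the indices of the samples with the largest losses."""
    n_keep = max(1, round(len(losses) * keep_ratio))
    # Sort sample indices from the largest loss to the smallest.
    order = sorted(range(len(losses)), key=lambda i: losses[i], reverse=True)
    return sorted(order[:n_keep])


# Hypothetical per-sample losses of one mini-batch.
losses = [0.05, 1.20, 0.40, 0.01, 0.90, 0.02, 0.75, 0.03, 0.60, 0.10]
hard = select_hard_samples(losses)
```

Only the selected samples would contribute gradients in that step. Manually generated negatives, as described above, would instead be added to the training set itself.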
4. Experimental Results
In this section, we use our own traffic driver face database, which was constructed by the Public
Security Department of Jiangsu Province, as the dataset. It contains approximately 1500 driver face
images in different traffic conditions. We randomly choose 1000 images for training the IMT-SVM
model and use the remaining 500 images to verify the model's performance. A dual-core CPU is used
to train the networks. The performance of our method on face detection and alignment is shown below.
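A random 1000/500 split of this kind can be reproduced with a seeded shuffle. The sketch below is only an illustration: the integer image identifiers and the seed are hypothetical, and the paper does not describe how its split was drawn.

```python
import random


def split_dataset(image_ids, n_train, seed=0):
    """Shuffle the identifiers with a fixed seed and split them into two disjoint parts."""
    rng = random.Random(seed)
    shuffled = list(image_ids)
    rng.shuffle(shuffled)
    return shuffled[:n_train], shuffled[n_train:]


image_ids = list(range(1500))  # hypothetical identifiers for the 1500 images
train_ids, test_ids = split_dataset(image_ids, n_train=1000)
```

Fixing the seed keeps the training and testing sets identical across runs, which makes the comparison between methods repeatable.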
(a) Original image (b) Result picture (c) Magnified labelled face
Figure 3. The detection and alignment result of the IMT-SVM model in a dark environment
(a) Original image (b) Result picture (c) Magnified labelled face
Figure 4. The detection and alignment result of the IMT-SVM model in the daytime
Table 1. Detection performance of our method and other comparison techniques
Figure 5. The alignment results of our model and other comparison techniques
5. Conclusions
This paper proposes a driver face detection and alignment model based on an improved MTCNN and
SVM. Because face detection with the convolutional networks in MTCNN alone shows an unsatisfactory
false detection rate, SVM is used for multiple classification judgments. In addition, the improved
MTCNN is proposed to achieve precise facial point detection in complex highway environments.
Experimental results show that the IMT-SVM model is effective and remains robust to occlusion,
varying pose and extreme illumination on highways.
However, our work has some limitations under complex illumination and with shadowed
face images. Thus, in future work, denoising methods [16][17] are a direction worth studying and
improving [18][19].
Acknowledgments
This work was supported by the National Natural Science Foundation of China (No. 61871123), the Key
Research and Development Program in Jiangsu Province (No. BE2016739) and a Project Funded by
the Priority Academic Program Development of Jiangsu Higher Education Institutions.
References
[1] B. Yang, J. Yan, Z. Lei, and S. Z. Li, “Aggregate channel features for multi-view face detection,”
in IEEE International Joint Conference on Biometrics, 2014, pp. 1-8.
[2] P. Viola and M. J. Jones, “Robust real-time face detection,” International Journal of
Computer Vision, vol. 57, no. 2, pp. 137-154, 2004.
[3] M. T. Pham, Y. Gao, V. D. D. Hoang, and T. J. Cham, “Fast polygonal integration and its
application in extending Haar-like features to improve object detection,” in IEEE Conference on
Computer Vision and Pattern Recognition, 2010, pp. 942-949.
[4] Q. Zhu, M. C. Yeh, K. T. Cheng, and S. Avidan, “Fast human detection using a cascade of
histograms of oriented gradients,” in IEEE Computer Society Conference on Computer Vision and
Pattern Recognition, 2006, pp. 1491-1498.
[5] S. Yang, P. Luo, C. C. Loy, and X. Tang, “From facial parts responses to face detection: A deep
learning approach,” in IEEE International Conference on Computer Vision, 2015, pp. 3676-3684.
[6] K. Zhang, Z. Zhang, Z. Li, et al., “Joint face detection and alignment using multitask cascaded
convolutional networks,” IEEE Signal Processing Letters, vol. 23, no. 10, pp. 1499-1503, 2016.
[7] Y. Sun, X. Wang, and X. Tang, “Deep convolutional network cascade for facial point
detection,” in IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2013.
[8] B. Amberg and T. Vetter, “Optimal landmark detection using shape models and branch and bound,”
in Proc. ICCV, 2011.
[9] P. N. Belhumeur, D. W. Jacobs, D. J. Kriegman, and N. Kumar, “Localizing parts of faces using
a consensus of exemplars,” in Proc. CVPR, 2011.
[10] J. Zhang, S. Shan, M. Kan, and X. Chen, “Coarse-to-fine auto-encoder networks (CFAN) for
real-time face alignment,” in European Conference on Computer Vision, 2014, pp. 1-16.
[11] X. Zhu and D. Ramanan, “Face detection, pose estimation, and landmark localization in the wild,”
in Proc. CVPR, 2012.
[12] X. Cao, Y. Wei, F. Wen, and J. Sun, “Face alignment by explicit shape regression,” in Proc.
CVPR, 2012.
[13] J. A. K. Suykens and J. Vandewalle, “Least squares support vector machine classifiers,” Neural
Processing Letters, vol. 9, no. 3, pp. 293-300, 1999.
[14] H. Li, Z. Lin, X. Shen, J. Brandt, and G. Hua, “A convolutional neural network cascade for face
detection,” in IEEE Conference on Computer Vision and Pattern Recognition, 2015, pp. 5325-5334.
[15] K. He, X. Zhang, S. Ren, and J. Sun, “Delving deep into rectifiers: Surpassing human-level
performance on ImageNet classification,” in IEEE International Conference on Computer Vision,
2015, pp. 1026-1034.
[16] D. Shi, “Face image information processing and recognition,” Electronic Industry Press, 2010.
[17] C. Hu, X. Lu, M. Ye, and W. Zeng, “Singular value decomposition and local near neighbors
for face recognition under varying illumination,” Pattern Recognition, vol. 64, pp. 60-83, 2017.
[18] W. Zeng and X. Lu, “Region-based nonlocal means algorithm for noise removal,” Electronics
Letters, vol. 47, pp. 1125-1127, 2011.
[19] H. Li, Z. Lin, X. Shen, J. Brandt, and G. Hua, “A convolutional neural network cascade for face
detection,” in IEEE Conference on Computer Vision and Pattern Recognition, 2015, pp. 5325-5334.