Eye and mouth state detection algorithm based on contour feature extraction

Yingyu Ji, Shigang Wang, Yang Lu, Jian Wei, Yan Zhao
Yingyu Ji, Shigang Wang, Yang Lu, Jian Wei, Yan Zhao, “Eye and mouth state detection
algorithm based on contour feature extraction,” J. Electron. Imaging 27(5),
051205 (2018), doi: 10.1117/1.JEI.27.5.051205.
Journal of Electronic Imaging 27(5), 051205 (Sep∕Oct 2018)
Abstract. Eye and mouth state analysis is an important step in fatigue detection. An algorithm that analyzes the state of the eye and mouth by extracting contour features is proposed. First, the face area is detected in the acquired image database. Then, the eyes are located by an EyeMap algorithm, a clustering method is used to extract the sclera and fit the eye contour, and the contour aspect ratio is calculated. In addition, an effective algorithm is proposed to solve the problem of contour fitting when the human eye is affected by strabismus. Meanwhile, the value of chromatism s is defined in the RGB space, and the mouth is accurately located through lip segmentation. Based on the color differences among the lips, the skin, and the inside of the mouth, the internal mouth contour can be fitted to analyze the opening state of the mouth; at the same time, a unique and effective yawning judgment mechanism is used to determine whether the driver is tired. The performance of the proposed algorithm is evaluated on three different databases; the algorithm requires no training and is computationally efficient. © The Authors. Published by SPIE under a Creative Commons Attribution 3.0 Unported License. Distribution or reproduction of this work in whole or in part requires full attribution of the original publication, including its DOI. [DOI: 10.1117/1.JEI.27.5.051205]
Keywords: eye state; mouth state; contour feature extraction; fatigue detection.
Paper 170761SS received Sep. 7, 2017; accepted for publication Jan. 16, 2018; published online Feb. 9, 2018.
Fig. 1 Block diagram of the proposed eye and mouth state analysis algorithm.
We use the algorithm of Ref. 21 for color compensation. The facial area is extracted from the database images to obtain the image of the eye and mouth region. Thus, we use the Viola-Jones face detection algorithm.22 The Viola-Jones face detection algorithm is a method based on an integral image, a cascade classifier, and the Adaboost algorithm, which greatly improves the speed and accuracy of face detection.

2.2 Eye Detection

There is a fixed connection among facial features. For example, the eyes are set in the upper part of the face and the mouth is located in the lower part of the face. In order to improve the accuracy and speed of detection, our algorithm determines the region of interest (ROI) of the eyes and mouth and then detects the target within the ROI. After obtaining the facial image, the upper half of the image is extracted and recorded as image I1; the upper one-eighth of image I1 is removed, and the lower seven-eighths of image I1 is kept and set as the eye ROI, as shown in Fig. 2(a). In this ROI, we use the EyeMap algorithm23 to locate the eye region. This method builds two EyeMaps, a chrominance map EyeMapC and a luminance map EyeMapL, where the luminance map is computed with the grayscale dilation (⊕) and erosion (Θ) operations using a ball structuring element g(x, y). Then, EyeMapC is multiplied by EyeMapL to obtain EyeMap:

EyeMap = EyeMapC × EyeMapL.  (3)

The EyeMap of a typical image (from the California Polytechnic University color face database) is constructed as shown in Fig. 2. The original eye ROI is shown in Fig. 2(a), and EyeMapC, EyeMapL, and EyeMap are shown in Figs. 2(b)-2(d), respectively.

In order to accurately locate the eye region, the optimal threshold T is obtained with the OTSU algorithm to convert the EyeMap gray image into a binary image, as shown in Fig. 3(a). We analyze the aspect ratio, position, and other characteristics of every connected component (white part) to exclude non-eye regions, and finally consider a pair of connected components as the eye region, as shown in Fig. 3(b). If there is no such pair of connected components, the threshold is reduced below the optimal value T and detection is repeated. Experiments demonstrate that the eye length is approximately half of the distance between the centers of the eyes, and the eye height is approximately half of the eye length.
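For concreteness, a minimal Python/OpenCV sketch of this step (combining the two maps as in Eq. (3), binarizing with Otsu's method, and screening connected components) is given below. The function name, the normalization, and the specific area and aspect-ratio limits are illustrative assumptions on our part, not the paper's exact implementation.

```python
import cv2
import numpy as np

def locate_eye_candidates(eyemap_c, eyemap_l):
    """Combine chrominance/luminance eye maps and keep eye-like blobs.

    eyemap_c, eyemap_l: float32 arrays (assumed precomputed, same size).
    Returns bounding boxes (x, y, w, h) of the retained connected components.
    """
    # Eq. (3): EyeMap = EyeMapC x EyeMapL, rescaled to 8 bit for thresholding.
    eyemap = cv2.normalize(eyemap_c * eyemap_l, None, 0, 255,
                           cv2.NORM_MINMAX).astype(np.uint8)

    # Otsu's method supplies the optimal threshold T; this is Fig. 3(a).
    _, binary = cv2.threshold(eyemap, 0, 255,
                              cv2.THRESH_BINARY + cv2.THRESH_OTSU)

    # Analyze every white connected component and keep plausible eye shapes.
    n, _, stats, _ = cv2.connectedComponentsWithStats(binary, connectivity=8)
    boxes = []
    for i in range(1, n):  # label 0 is the background
        x, y, w, h, area = stats[i]
        aspect = w / float(h)
        if area > 20 and 1.0 < aspect < 4.0:  # illustrative shape constraints
            boxes.append((x, y, w, h))
    return boxes
```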
Fig. 2 Illustration of EyeMap construction: (a) the original eye ROI region. (b) EyeMapC. (c) EyeMapL.
(d) EyeMap.
Fig. 3 Different steps of our eye detection method: (a) EyeMap image after binarization. (b) A pair of
connected components. (c) Calibrate eye region with a rectangular box.
Fig. 4 Different steps of our mouth detection method: (a) Initial mouth ROI. (b) Mouth opens widely.
(c) Final mouth ROI. (d) Selected 20% pixels.
Therefore, we locate the regions of the left and right eyes and calibrate them with rectangular boxes, as shown in Fig. 3(c).

2.3 Mouth Detection

To improve the speed and accuracy of mouth detection, we set the ROI based on the characteristics of the mouth distribution in the face region. Saeed and Dugelay26 proposed that the mouth ROI is the lowest one-third of the detected face region. The lower one-third of the face image is extracted and recorded as image I2, and the middle half of image I2 is extracted and set as the mouth ROI, as shown by the green box in Fig. 4(a). However, when the mouth opens widely (yawning), we cannot obtain a complete mouth region, as shown in Fig. 4(b). When the height of the facial region is expanded one-fifth downward, we obtain the complete mouth region, as shown in Fig. 4(c). However, when the mouth opens only narrowly, this ROI is too large and affects the extraction of the mouth internal contour; thus, it is necessary to locate the mouth accurately. Based on the difference between the colors of the lips and the skin, the mouth region is precisely positioned through lip segmentation, and we segment the lips according to the value of chromatism s in the RGB space.27 The value of chromatism s is defined as follows:

s = 2 × tan⁻¹((R − G)/R) / π.  (4)

3 Contour Feature Extraction

After locating the eye and mouth regions, we judge the state of the eyes as open or closed by extracting the eye contour, and we analyze the open state of the mouth by extracting the mouth internal contour.

3.1 Eye Contour Extraction

3.1.1 Sclera extraction

The sclera is the white part of the eye. Based on the saturation difference between the sclera and the skin (the red and blue components differ greatly in the skin but only slightly in the sclera), the sclera region is segmented by a K-means clustering method. First, we exclude the impact of the iris29,30 and eyelashes; eyebrows are included in certain instances. Given that the gray value of the iris and eyelash region is the smallest and the scleral gray value is larger than that of the skin region, we obtain the best segmentation threshold T via the OTSU algorithm, perform threshold segmentation of the image on the basis of T, and divide the eye into two parts, the first being the iris and eyelash region shown as the blue region in Fig. 5(a). In the rest (the sclera and skin region), we use the difference between the red component R and the blue component B (R − B) of the sclera and the skin and cluster the pixels into two parts with K-means clustering. The final eye region is thus divided into three parts, as shown in Fig. 5(b).
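A rough sketch of this three-part eye segmentation (Otsu thresholding to peel off the dark iris and eyelash pixels, then two-cluster K-means on the R − B difference to split sclera from skin) is shown below in Python with OpenCV; the function name, the label encoding, and the K-means settings are our own assumptions rather than the paper's implementation.

```python
import cv2
import numpy as np

def segment_eye_region(rgb):
    """Split an eye ROI into iris/eyelash, sclera, and skin pixels.

    rgb: HxWx3 uint8 eye ROI in R, G, B channel order. Returns a label map:
    0 = iris/eyelash, 1 = sclera, 2 = skin (label meanings are ours).
    """
    gray = cv2.cvtColor(rgb, cv2.COLOR_RGB2GRAY)
    # The darkest pixels (iris, eyelashes) are separated first with Otsu's threshold.
    t, _ = cv2.threshold(gray, 0, 255, cv2.THRESH_BINARY + cv2.THRESH_OTSU)
    dark = gray < t

    # Remaining pixels are clustered on R - B: the difference is large for skin
    # and small for the sclera, so two K-means clusters separate them.
    rb = rgb[..., 0].astype(np.float32) - rgb[..., 2].astype(np.float32)
    samples = rb[~dark].reshape(-1, 1)
    criteria = (cv2.TERM_CRITERIA_EPS + cv2.TERM_CRITERIA_MAX_ITER, 20, 1.0)
    _, cluster_ids, centers = cv2.kmeans(samples, 2, None, criteria, 5,
                                         cv2.KMEANS_PP_CENTERS)
    sclera_cluster = int(np.argmin(centers.ravel()))  # smaller R - B => sclera

    labels = np.full(gray.shape, 2, dtype=np.uint8)   # default label: skin
    labels[dark] = 0
    rows, cols = np.where(~dark)
    is_sclera = cluster_ids.ravel() == sclera_cluster
    labels[rows[is_sclera], cols[is_sclera]] = 1
    return labels
```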
Fig. 6 Illustration of contour fitting: (a) Upper and lower eyelids after contour fitting. (b) Minimum circumscribed rectangle.

The minimum circumscribed rectangle of the eye contour is calculated, as shown in Fig. 6(b), and the aspect ratio of the rectangle is used to determine whether the eye is open or closed. The details are described in the next section.

3.1.3 Special circumstances: eye strabismus

There is another special case: when the human eye is strabismic, as shown in Fig. 7(a), the sclera on the iris side is affected, which can yield a poor fit of the eye contour with the aforementioned method. To fit the contour of the eye by extracting the boundary points of the sclera and the iris, we first estimate the center of the iris.

3.2 Mouth Internal Contour Extraction

In the RGB space, compared with the skin, the difference between the red component R and the green component G of the lips is larger.31 When the mouth is open, particularly when a person yawns, the RGB values of the internal part of the mouth are roughly balanced, even if the teeth are exposed. The relationship between the R − G values of the lip, skin, and internal mouth pixels is as follows: lips > skin > mouth internal. Because the difference between the R and G components of the lips is the largest, we can effectively separate the internal part of the mouth from the skin. We set the adaptive threshold T according to the following formula and apply threshold segmentation to obtain the binary image, as shown in Fig. 8(a):

h(x, y) = 255 if R − G ≤ T, and h(x, y) = 0 if R − G > T.  (5)
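Equation (5) amounts to a per-pixel threshold on R − G; a minimal sketch (assuming a uint8 RGB mouth ROI and a given threshold T) is:

```python
import numpy as np

def mouth_interior_mask(rgb, T):
    """Binarize a mouth ROI with Eq. (5): pixels whose R - G value is at most T
    (the inside of the mouth) become white (255); all other pixels become 0.

    rgb: HxWx3 uint8 array in R, G, B channel order; T: adaptive threshold.
    """
    diff = rgb[..., 0].astype(np.int16) - rgb[..., 1].astype(np.int16)  # R - G
    return np.where(diff <= T, 255, 0).astype(np.uint8)
```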
Fig. 7 Different steps of contour fitting when the eye is strabismic: (a) Original image. (b) Iris center and sclera detection. (c) Image after intercepting part of the iris. (d) Binarized image. (e) Contour fitting.
Fig. 8 Illustration of mouth internal contour extraction steps: (a) Binary image after threshold segmen-
tation. (b) Mouth inner area. (c) Mouth contour fitting. (d) Minimum circumscribed rectangle of the mouth
contour.
3. Determine whether the skin value is greater than 1/20th of the total number of pixels of the image; if so, T is the maximum value of R − G among the selected pixels. Otherwise, proceed to the next step.

4. Select n pixels with a small R − G value from the remaining pixels, where n is 1/15th of the total number of pixels of the image, merge them into the previously selected pixels, and calculate the skin value according to the method in step 2. Then, return to step 3.

The mouth image we obtain is located in the center of the region; therefore, if the center of mass of a connected component (white part) is near the center of the image, the connected component is regarded as the inner area of the mouth. The internal area of the mouth is obtained by position analysis of each connected component, as shown in Fig. 8(b). The extracted external contour of this connected component is the internal contour of the mouth, as shown in Fig. 8(c). In addition, the minimum circumscribed rectangle of the mouth contour is calculated, as shown in Fig. 8(d).
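The centroid-based selection of the inner mouth area described above can be sketched as follows; the helper name and the use of OpenCV's connected-component statistics are our own assumptions, not the paper's code.

```python
import cv2
import numpy as np

def inner_mouth_component(binary):
    """Keep the white connected component whose centroid is closest to the
    image center; that component is taken as the mouth interior.

    binary: uint8 mask from Eq. (5) (255 = candidate mouth-interior pixels).
    Returns a mask containing only the selected component, or None if empty.
    """
    n, labels, stats, centroids = cv2.connectedComponentsWithStats(
        binary, connectivity=8)
    if n <= 1:
        return None
    h, w = binary.shape
    center = np.array([w / 2.0, h / 2.0])                   # centroids are (x, y)
    dists = np.linalg.norm(centroids[1:] - center, axis=1)  # skip the background
    best = 1 + int(np.argmin(dists))
    return np.where(labels == best, 255, 0).astype(np.uint8)
```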
4.1.1 Eye state analysis

For the determination of the threshold T, we use the P80 standard of the PERCLOS parameter: when the degree of eye closure is more than 80%, the eye is considered closed. In order to evaluate the eyelid aspect ratio when the eyes are open normally, we performed related experiments. First, we collected eye images from different people whose eyes were open normally, and then calculated the eyelid aspect ratio for each image. The experimental results show that the eyelid aspect ratio is ∼16:9 when the eye is open normally. According to the P80 standard, the eyelid aspect ratio is less than 0.1125 when the eyes are closed, i.e., the T value should be 0.1125. However, the experimental results show that when M < 0.15, i.e., T = 0.15, the eye is in a closed state; with this value, the judgment of the eye state is more accurate, with an accuracy rate of 98.67%. In addition, when the eye is completely closed, the eye region is divided into only two parts by the OTSU algorithm, and in this case we cannot detect the sclera region. Therefore, when the sclera region is not detected or the eyelid closure value M < T, the eye is considered to be in a closed state.

4.1.2 Mouth state analysis

According to the internal contour of the mouth, the mouth opening degree N is defined as the aspect ratio of its external rectangle:

N = H1 / L1.  (8)
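As a compact summary of the two decision rules above, a hedged Python sketch follows: the eye is reported closed when the sclera is not found or the eyelid closure value M falls below T = 0.15, and the mouth opening degree N is simply H1/L1 from Eq. (8). How N is finally thresholded for the yawning decision is not reproduced here, since that value is not given in this excerpt.

```python
def eye_state(m, sclera_detected, t=0.15):
    """Paper's rule: closed if no sclera is found or the eyelid closure value M
    (height / length of the eye contour's bounding rectangle) is below T.

    The P80 standard alone gives T = 0.1125; T = 0.15 is the empirically
    better value reported in the text.
    """
    return "closed" if (not sclera_detected or m < t) else "open"


def mouth_opening_degree(h1, l1):
    """Eq. (8): N = H1 / L1, the aspect ratio of the external rectangle of the
    mouth's internal contour (H1 taken as height, L1 as length)."""
    return h1 / float(l1)
```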
The actual eye and mouth states in the test images are compared with the states obtained from the algorithm. The experimental results demonstrate that the algorithm can fit the contour of the eyes in different opening states, as shown by the experimental results in Table 1 and the six fitting effect diagrams in Fig. 9, which cover different lighting conditions, different groups, different eye-opening sizes, and glasses.

In the CIT and FERET databases, there are many pictures in which the teeth are exposed or the subject has a beard. In this situation, the gray-projection algorithm for detecting the mouth state may lose accuracy;14 however, the algorithm discussed in this paper obtains a better result for the internal contour of the mouth, as shown by the experimental results in Table 2. We also provide six fitting effect diagrams of the mouth contour in Fig. 10, covering different lighting conditions, beards, different opening sizes, and exposed teeth.

In fact, the pictures in the databases were obtained under different lighting conditions. Reference 14 uses the projection method to determine the eye and mouth state; that is, the image is projected horizontally and vertically to calculate the sums of the gray values of the pixels in the horizontal and vertical directions. This method is greatly affected by the light intensity, and its detection performance decreases significantly under uneven illumination. The algorithm proposed in this paper performs illumination compensation on the image before face detection, which reduces the impact of highlights and shadows on the experiment. Thus, the proposed algorithm exhibits robustness against illumination changes, although its performance is reduced for images obtained under dramatic lighting changes. In addition, there are several mouth pictures in which the teeth are exposed or the subject has a beard. Owing to the large differences between the gray values of teeth and beards and the skin color, the projection method does not work well in these cases. Our algorithm takes into account the color difference among teeth, beard, and skin and determines the degree of mouth opening by extracting the internal contour of the mouth. The experimental results show that the detection effect is significantly improved; the performance comparison is shown in Tables 3 and 4.
Table 3 Performance comparison (eye state recognition accuracy).

Number of images    Proposed method    Projection method in Ref. 14
254                 97.64%             92.52%

Table 4 Performance comparison (mouth state recognition accuracy).

Number of images    Proposed method    Projection method in Ref. 14
378                 96.56%             91.27%

5 Conclusion

In this paper, we proposed a method for detecting the eye and mouth state by extracting contour features. In each step, we presented new algorithms and modifications to achieve better results. The eye contour is fitted by extracting the sclera border points, and the eyelid closure value M is defined according to the smallest circumscribed rectangle of the eye contour to determine whether the eye is open or closed. When the mouth is open, the internal contour of the mouth is extracted, and its external rectangle is used to define the mouth opening degree N for analyzing the mouth open state. The experimental results demonstrate that the proposed algorithm has a high accuracy rate in different environments, and it does not require training data, which improves computational efficiency. The proposed algorithm can be used in future energy vehicles,32 intelligent vehicles, and advanced driver assistance systems, where the state of the eyes and mouth can be further analyzed to determine whether a driver is fatigued. When driver fatigue is detected, the system can issue a warning to remind the driver to pay attention to driving in order to reduce traffic accidents. Our future work is to build a fatigue model, which can provide a warning signal when the driver appears to be demonstrating fatigue symptoms.

Acknowledgments

This work was supported by the State Key Program of National Natural Science Foundation of China (61631009) and the Jilin province science and technology development project of China (20150204006GX).

References

1. P. R. Tabrizi and R. A. Zoroofi, “Open/closed eye analysis for drowsiness detection,” in 1st Workshops on Image Processing Theory, Tools and Applications, pp. 1–7, IEEE, Sousse, Tunisia (2008).
2. A. Jain et al., “Fatigue detection and estimation using auto-regression analysis in EEG,” in Int. Conf. on Advances in Computing, Communications and Informatics, pp. 1092–1095, IEEE, Jaipur (2016).
3. D. G. K. Madusanka et al., “Hybrid vision based reach-to-grasp task planning method for trans-humeral prostheses,” IEEE Access 5(99), 16149–16161 (2017).
4. X. Q. Huo, W. L. Zheng, and B. L. Lu, “Driving fatigue detection with fusion of EEG and forehead EOG,” in Int. Joint Conf. on Neural Networks (IJCNN), pp. 897–904, IEEE, Canada (2016).
5. X. Song et al., “The anti-fatigue driving system design based on the eye blink detect,” Proc. SPIE 10322, 103221R (2017).
6. M. Ramasamy and V. K. Varadan, “Real-time monitoring of drowsiness through wireless nanosensor systems,” Proc. SPIE 9802, 98021G (2016).
7. Z. Wang et al., “The effect of a haptic guidance steering system on fatigue-related driver behavior,” IEEE Trans. Hum. Mach. Syst. 47(5), 741–748 (2017).
8. P. N. Bhujbal and S. P. Narote, “Lane departure warning system based on Hough transform and Euclidean distance,” in Third Int. Conf. on Image Information Processing (ICIIP), pp. 370–373, IEEE, Waknaghat (2015).
9. R. H. Zhang et al., “Vehicle detection method for intelligent vehicle at night time based on video and laser information,” Int. J. Pattern Recognit. Artif. Intell. 32(4) (2017).
10. R. H. Zhang et al., “Study on self-tuning tyre friction control for developing main-servo loop integrated chassis control system,” IEEE Access 5(99), 6649–6660 (2017).
11. E. J. Delp, “Video-based real-time surveillance of vehicles,” J. Electron. Imaging 22(4), 041103 (2016).
12. F. You et al., “Monitoring drivers’ sleepy status at night based on machine vision,” Multimedia Tools Appl. 76(13), 14869–14886 (2017).
13. Y. F. Lu and C. L. Li, “Recognition of driver eyes’ states based on variance projections function,” in 3rd Int. Congress on Image and Signal Processing, pp. 1919–1922, IEEE, China (2010).
14. M. Omidyeganeh, A. Javadtalab, and S. Shirmohammadi, “Intelligent driver drowsiness detection through fusion of yawning and eye closure,” in IEEE Int. Conf. Virtual Environments, Human-Computer Interfaces and Measurement Systems Proc., pp. 1–6, IEEE, Ottawa (2011).
15. L. N. Jia et al., “Smartphone-based fatigue detection system using progressive locating method,” Inst. Eng. Technol. 10(3), 148–156 (2016).
16. B. Mandal et al., “Towards detection of bus driver fatigue based on robust visual analysis of eye state,” IEEE Trans. Intell. Transp. Syst. 18(3), 545–557 (2017).
17. A. Punitha, M. K. Geetha, and A. Sivaprakash, “Driver fatigue monitoring system based on eye state analysis,” in Int. Conf. Circuits, Power and Computing Technologies, pp. 1405–1408, IEEE, India (2014).
18. C. Du and S. Gao, “Image segmentation-based multi-focus image fusion through multi-scale convolutional neural network,” IEEE Access 5(99), 15750–15761 (2017).
19. L. Zhao et al., “Human fatigue expression recognition through image-based dynamic multi-information and bimodal deep learning,” J. Electron. Imaging 25(5), 053024 (2016).
20. R. Kavi et al., “Multiview fusion for activity recognition using deep neural networks,” J. Electron. Imaging 25(4), 043010 (2016).
21. Y. Y. Liu et al., “Fast robust face detection under a skin color model with geometry constraints,” in Int. Conf. on Computational Intelligence and Security, pp. 515–519, IEEE, Beijing (2009).
22. P. Viola and M. J. Jones, “Robust real-time face detection,” in Proc. of Eighth IEEE Int. Conf. on Computer Vision, pp. 747–747, IEEE, Canada (2001).
23. R. L. Hsu, M. Abdel-Mottaleb, and A. K. Jain, “Face detection in color images,” IEEE Trans. Pattern Anal. Mach. Intell. 24(5), 696–706 (2002).
24. V. H. Jimenez-Arredondo, J. Cepeda-Negrete, and R. E. Sanchez-Yanez, “Multilevel color transfer on images for providing an artistic sight of the world,” IEEE Access 5, 15390–15399 (2017).
25. H. Kalbkhani, M. G. Shayesteh, and S. M. Mousavi, “Efficient algorithms for detection of face, eye and eye state,” IET Comput. Vision 7(3), 184–200 (2013).
26. U. Saeed and J. L. Dugelay, “Combining edge detection and region segmentation for lip contour extraction,” Lect. Notes Comput. Sci. 6169, 11–20 (2010).
27. J. Pan, Y. Guan, and S. Wang, “A new color transformation based fast outer lip contour extraction,” J. Inf. Comput. Sci. 9(9), 2505–2514 (2012).
28. H. Talea and K. Yaghmaie, “Automatic combined lip segmentation in color images,” in IEEE 3rd Int. Conf. on Communication Software and Networks, pp. 109–112, IEEE, China (2011).
29. T. N. Tan, Z. F. He, and Z. N. Sun, “Efficient and robust segmentation of noisy iris images for non-cooperative iris recognition,” Image Vision Comput. 28(2), 223–230 (2010).
30. A. Mollahosseini, A. D. Anjos, and H. R. Shahbazkia, “Accurate extraction of iris and eye corners for facial biometrics,” in Int. Conf. on Environment and BioScience, pp. 1–5, IACSIT Press, Singapore (2011).
31. V. E. C. Ghaleh and A. Behrad, “Lip contour extraction using RGB color space and fuzzy c-means clustering,” in IEEE 9th Int. Conf. on Cybernetic Intelligent Systems, pp. 1–4, IEEE, United Kingdom (2010).
32. R. H. Zhang et al., “Exploring to direct the reaction pathway for hydrogenation of levulinic acid into gamma-valerolactone for future clean-energy vehicles over a magnetic Cu-Ni catalyst,” Int. J. Hydrogen Energy 42(40), 25185–25194 (2017).

Yingyu Ji received his BS degree from the School of Information Technology, Hebei University of Economics and Business, Hebei, China, in 2016. He is currently pursuing his MS degree with the College of Communication Engineering of Jilin University. His major research interests include pattern recognition and image processing.

Shigang Wang received his BS degree from Northeastern University in 1983, his MS degree in communication and electronics from Jilin University of Technology in 1998, and his PhD in communication and information system from Jilin University in 2001. Currently, he is a professor of communication engineering. His research interests include multidimensional signal processing, stereoscopic and multiview video coding, and so on.

Yang Lu received her BS degree from the College of Communication Engineering, Jilin University, Jilin, China, in 2014. Currently, she is pursuing her PhD in the College of Communication Engineering of Jilin University. Her major research interests include pattern recognition and image processing.

Jian Wei received his BS degree from Jilin University in 2008, his MS degree in communication and information systems from Jilin University in 2011, and his PhD in informatics from Tuebingen University in 2016. Currently, he is working in the Department of Communication Engineering, Jilin University. His research interests include multiview stereo and 3D display technology.

Yan Zhao received her BS degree in communication engineering from Changchun Institute of Posts and Telecommunications in 1993, her MS degree in communication and electronics from Jilin University of Technology in 1999, and her PhD in communication and information system from Jilin University in 2003. Currently, she is a professor of communication engineering. Her research interests include image and video processing, image and video coding, and so on.