Biomedical Signal Processing and Control 18 (2015) 360–369
Article history:
Received 11 August 2014
Received in revised form 10 December 2014
Accepted 9 February 2015
Available online 6 March 2015

Abstract

One of the most common signs of tiredness or fatigue is yawning. Naturally, identification of fatigued individuals would be helped if yawning is detected. Existing techniques for yawn detection are centred on measuring the mouth opening. This approach, however, may fail if the mouth is occluded by the hand, as is frequently the case. The work presented in this paper focuses on a technique to detect yawning whilst also allowing for cases of occlusion. For measuring the mouth opening, a new technique which applies an adaptive colour region is introduced. For detecting yawning whilst the mouth is occluded, local binary pattern (LBP) features are used to also identify facial distortions during yawning. In this research, the Strathclyde Facial Fatigue (SFF) database, which contains genuine video footage of fatigued individuals, is used for training, testing and evaluation of the system.

© 2015 Elsevier Ltd. All rights reserved.

∗ Corresponding author. Tel.: +44 (0) 141 548 4458. E-mail address: [email protected] (G. Di Caterina).
1. Introduction

Fatigue is usually perceived as the feeling of tiredness or drowsiness. In a work related context, fatigue is a state of mental and/or physical exhaustion that reduces the ability to perform work safely and effectively. Fatigue can result from a number of factors including inadequate rest, excessive physical or mental activity, sleep pattern disturbances, or excessive stress. Many conditions which affect individuals are considered directly related to fatigue, such as visual impairment, reduced hand–eye coordination, low motivation, poor concentration, slow reflexes or sluggish responses. Fatigue is considered the largest contributor to road accidents, leading to loss of lives. Data for 2010 from the Department for Transport, UK [1], point to 1850 people killed and 22,660 seriously injured, with fatigue contributing to 20% of the total number of accidents [2]. US National Highway Traffic Safety Administration estimates suggest that approximately 100,000 accidents each year are caused by fatigue [3].

Fatigue is identifiable from human physiology, such as eye and mouth observations, brain activity, and electrocardiogram (ECG) measurements of, for example, heart rate variability. Physical activity and human behaviour may also be used to identify fatigue [4–6]. In this paper yawning, as an indicator of fatigue, is detected and analysed from video sequences. Yawning is an involuntary action where the mouth opens wide and, for this reason, yawning detection research focuses on measuring and classifying this mouth opening. Frequently, however, this approach is thwarted by the common human reaction of covering the mouth with the hand during yawning. In this paper, we introduce a new approach to detect yawning by combining mouth opening measurements with a facial distortion (wrinkles) detection method. For mouth opening measurements, a new adaptive threshold for segmenting the mouth region is introduced. For yawning detection with the mouth covered, local binary pattern (LBP) features and a learning machine classifier are employed. In order to detect the wrinkles, the Sobel edge detector [30] is used. Differently from [32], where the mouth covered detection technique is applied to static images only, the yawn analysis approach proposed in this paper describes a complete system for yawn detection in video sequences.

In this research, genuine yawning fatigue video data is used for training, testing and evaluation. The data come from the Strathclyde Facial Fatigue (SFF) video database, which was explicitly created to aid this research and which contains a series of facial videos of sleep deprived, fatigued individuals. The ethically approved sleep deprivation experiments were conducted at the University of Strathclyde with the aid of twenty volunteers under controlled conditions. Each of the 20 volunteers was sleep deprived for periods of 0, 3, 5 and 8 h on separate occasions. During each session the participants' faces were recorded while they were carrying out a series of cognitive tasks.

The remainder of this paper is organised as follows. Section 2 discusses related work, while Section 3 describes the SFF database. Section 4 presents the overall system, while analysis and results are reported in Sections 5 and 6, respectively. Conclusions are provided in Section 7.
2. Related work

Yawning is a symptom indicated visually by a mouth open state. Yawning detection research focuses mostly on the measurement of the mouth opening [7–9]. The algorithms used must therefore be able to differentiate between normal mouth opening and yawning. Yawn detection can be generally categorised into three approaches: feature based, appearance based, and model based.

In the feature based method, the mouth opening can be measured by applying several features, such as colour, edges and texture, which are able to describe the activities of the mouth. Commonly, there are two approaches to measure the mouth opening: by tracking the lips movement and by quantifying the width of the mouth. Since the mouth is the widest facial component, colour is one of the prominent features used to distinguish the mouth region. Yao et al. [10] transform the RGB colour space to the Lab colour space in order to use the 'a' component, which is found acceptable for segmenting the lips. The lips represent the boundary of the mouth opening that is to be measured. The YCbCr colour space was chosen by Omidyeganeh [11] for their yawning detection. In that work, the claim is that this colour space is able to indicate the mouth opening area, which is the darkest colour region, by setting a certain threshold. Yawning is detected by computing the ratio of the horizontal and vertical intensity projections of the mouth region; the mouth opening is classified as a yawn when this ratio is above a set threshold. Implementation using colour properties is easy but, to make the algorithm robust, several challenges must be addressed. The lip colour differs from person to person and, due to varying lighting conditions, it is necessary to ensure that the threshold value adapts to the changes.

Edges are another mouth feature able to represent the shape of the mouth opening. Alioua et al. [12] extracted the edges from the differences between the lips region and the darkest region of the mouth. The width of the mouth is measured in consecutive frames, and a yawn is detected when the mouth continuously opens widely more than a set number of times. When using edges to determine a yawn, the challenge is that the changes in the edges are proportional to the illumination changes, making it difficult to set the parameters of an edge detector. In order to locate the mouth boundary accurately, the active contour algorithm can be used, as implemented in [8]; however, the computational cost of this approach is very high. The mouth corners have also been employed by Xiao et al. [9] to detect yawns: the corners serve as reference points to track the mouth region, and yawning is classified based on the change of texture in the corners of the mouth while the mouth is opened widely. Moreover, as in the case implemented for measuring eye activities, Jiménez-Pinto et al. [13] have detected yawns based upon salient points in the mouth region. These salient points, introduced by Shi and Tomasi [14], are able to describe the motion of the lips, and a yawn is determined by examining the motion in the mouth region. In [7,11] a yawn is detected based on a horizontal profile projection, where the height of the mouth opening in the profile projection represents the mouth yawn. However, this approach is inefficient for beard and moustache bearing individuals.

In the appearance based method, statistical learning machine algorithms are applied, and distinctive features need to be extracted in order to train the algorithms. Lirong et al. [15] combined Haar-like features and variance values in order to train their system to classify the lips using a support vector machine (SVM). With the additional integration of a Kalman filter, their algorithm is also able to track the lip region. Lingling et al. [16] also applied Haar-like features to detect the mouth region; the yawn is classified using colour properties to measure the breadth of the mouth opening. Elsewhere, Gabor wavelet features that represent texture have been used by Xiao et al. [9] for evaluating the degree of mouth opening. The texture features are extracted from the corner regions of the mouth, where the texture is different for mouth opening and closing, and the yawn is classified using linear discriminant analysis (LDA). Yang et al. [17] employed a back propagation neural network for yawn classification into three states: mouth closed, normal open and wide open. The mouth region features are extracted based on the RGB colour properties. For this approach, a very large image dataset is required in order to obtain adequate results.

The model based approach requires that the mouth or lips are modelled first. An active shape model (ASM), for instance, requires a set of training images in which the important points representing the structure of the mouth are marked in every image. The marked set of images is then used to train a statistical algorithm that establishes the shape model. García et al. [18] and Anumas et al. [19] train the structure using an ASM, and the yawn is measured based on the lips opening. Hachisuka et al. [20] employ an active appearance model (AAM), which annotates each essential facial point; for yawning detection, three points are defined to represent the opening of the mouth. This approach requires a large number of images, since every person has a different facial structure.

3. Strathclyde Facial Fatigue (SFF) database

The Strathclyde Facial Fatigue database was developed in order to obtain genuine facial signs of fatigue. Most fatigue related researchers have used their own datasets to assess their developed systems. For example, Vural et al. [21] developed their fatigue database by using video footage from a simulated driving environment: the participants had to drive for over 3 h and their faces were recorded during that time. Fan et al. [22] recruited forty participants and recorded their faces for several hours in order to obtain facial signs of fatigue. Some researchers only test their algorithms on video footage of persons who pretend to experience fatigue symptoms. For example, researchers in [8,11,12] detected yawns based on the width of the mouth opening, and the algorithms were not tested on subjects experiencing natural yawning. The SFF database provides genuine facial fatigue video footage from sleep deprivation experiments which involved twenty participants.

The sleep deprived volunteers were 10 male and 10 female, with ages ranging between 20 and 40 years. Each volunteer had to perform a series of cognitive tasks during four separate experimental sessions. Prior to each session, the participants were sleep deprived for 0 h (no sleep deprivation), 3, 5 or 8 h under controlled conditions. The cognitive tasks were designed to test (a) simple attention, (b) sustained attention and (c) the working memory of the participants. These tasks are associated with sensitivity to sleep loss and are designed to accelerate fatigue signs. Fig. 1 shows some example video footage from the SFF database.

4. System overview

A block diagram of the new yawn analysis system is shown in Fig. 2. The system begins with an initialisation operation carried out on the first frame of the video, consisting of face acquisition and region initialisation algorithms. During face acquisition, the face, eye and mouth detection algorithms are performed sequentially before a region of interest initialisation algorithm is executed. Two regions of interest are initialised: the focused mouth region (FMR) and the focused distortion region (FDR). These two regions are tracked in all the following frames. The FMR is used in measuring the mouth opening and in detecting when the mouth is covered, while the FDR is used in measuring facial distortions during yawning.
Fig. 2. Yawn analysis system overview.

Fig. 3. (a) Anthropometric measurement; (b) focused mouth region (FMR); (c) focused distortion region (FDR).

4.1. Region of interest evaluation

As indicated in [32], a facial region of interest needs to be established first. Fig. 3 depicts the face acquisition operation, where the eyes and mouth are sequentially detected using the Viola–Jones technique [23,24]. Training is performed through a cascade classifier applied to every sub-window in the input image, in order to increase detection performance and reduce computation time. The SFF video footage provided the large amount of face, eye and mouth data needed to train the Haar-like features [25,26]. In the present method, the face region is detected first, and the eye and mouth areas are then identified within it. The acquisition of the distinctive distances between these facial components is effected through anthropometric measurements, thus accounting for human facial variability. This information is then used in the formation of the two following regions of interest.
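As an illustration of this acquisition step, the following sketch uses OpenCV's stock pretrained Viola–Jones cascades in place of the cascades trained on the SFF footage. It is a minimal approximation of the face acquisition described above, not the authors' implementation; a mouth cascade (available in opencv_contrib, not the core package) would be applied in the same way.

```python
# Minimal face/eye acquisition sketch using OpenCV's stock Viola-Jones cascades.
# The paper trains its own Haar cascades on the SFF footage; the pretrained
# OpenCV models merely stand in for them here.

import cv2

face_cascade = cv2.CascadeClassifier(
    cv2.data.haarcascades + "haarcascade_frontalface_default.xml")
eye_cascade = cv2.CascadeClassifier(
    cv2.data.haarcascades + "haarcascade_eye.xml")

def acquire_face(frame_bgr):
    """Detect the face first, then the eyes inside it, as in Section 4.1."""
    gray = cv2.cvtColor(frame_bgr, cv2.COLOR_BGR2GRAY)
    faces = face_cascade.detectMultiScale(gray, scaleFactor=1.1, minNeighbors=5)
    if len(faces) == 0:
        return None
    x, y, w, h = faces[0]          # assumption: take the first detection
    roi = gray[y:y + h, x:x + w]
    eyes = eye_cascade.detectMultiScale(roi, scaleFactor=1.1, minNeighbors=5)
    # Eye centres in full-frame coordinates, needed later for Eqs. (2)-(5).
    centres = [(x + ex + ew // 2, y + ey + eh // 2) for ex, ey, ew, eh in eyes]
    return (x, y, w, h), centres
```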
4.1.1. Focused mouth region (FMR)

This region is formed based upon the detected locations of the eyes and mouth. The distance between the centres of the eyes (ED) and the distance between the centre of the mouth and the mid-point between the eyes (EMD) are obtained (Fig. 3(a)), and the ratio of these distances, R_EM, is computed (Eq. (1)). From this information, the coordinates of the FMR (Fig. 3(b)) are empirically defined as:

(x_1, y_1) = (x_R, y_R + 0.75 EMD)    (2)

(x_2, y_2) = (x_L, y_L + 0.75 EMD + 0.8 ED)    (3)

where (x_R, y_R) and (x_L, y_L) are the centre points of the right and left eye irises, respectively. The FMR depends on the location of, and the distance between, the eyes. When the face moves forward the eye distance increases and so does the FMR; conversely, the FMR shrinks following a backward movement.

4.1.2. Focused distortion region (FDR)

The focused distortion region (FDR) is the facial region where changes of facial distortion occur. The region most likely to undergo distortions during yawning was identified through experiments conducted on the SFF database. The coordinates of the FDR (Fig. 3(c)) are empirically defined as:

(x_1, y_1) = (x_R, y_R − 0.75 EMD)    (4)

(x_2, y_2) = (x_L, y_L + 0.75 EMD)    (5)

The values 0.75 and 0.8 used in Eqs. (2)–(5) are based on observation tests undertaken on the SFF database.
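The sketch below transcribes Eqs. (2)–(5) directly. The function names and the example coordinates are illustrative only; EMD is assumed to be supplied by the mouth detection step, while ED is recomputed from the two iris centres.

```python
# Direct transcription of Eqs. (2)-(5): FMR and FDR corner coordinates from the
# iris centres (x_R, y_R), (x_L, y_L) and the eye-to-mouth distance EMD.

from math import hypot

def fmr_corners(right_eye, left_eye, emd):
    """Top-left/bottom-right corners of the focused mouth region, Eqs. (2)-(3)."""
    xr, yr = right_eye
    xl, yl = left_eye
    ed = hypot(xl - xr, yl - yr)            # eye distance ED (Fig. 3(a))
    p1 = (xr, yr + 0.75 * emd)              # Eq. (2)
    p2 = (xl, yl + 0.75 * emd + 0.8 * ed)   # Eq. (3)
    return p1, p2

def fdr_corners(right_eye, left_eye, emd):
    """Corners of the focused distortion region, Eqs. (4)-(5)."""
    xr, yr = right_eye
    xl, yl = left_eye
    return (xr, yr - 0.75 * emd), (xl, yl + 0.75 * emd)

# Example: irises 60 px apart, mouth centre 80 px below the eye mid-point (EMD).
print(fmr_corners((100, 120), (160, 120), emd=80))   # ((100, 180.0), (160, 228.0))
print(fdr_corners((100, 120), (160, 120), emd=80))   # ((100, 60.0), (160, 180.0))
```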
4.2. Yawn analysis operations

Fig. 4. A linear transform that remaps the intensity levels of the input image between G_min and G_max.

The cumulative histogram H_i gives the total number of indexed pixels in the histogram bins between 0 and 255, as illustrated in Fig. 5(b):

H_i = Σ_{j=1}^{i} h_j    (7)

where h_j is the number of pixels in intensity bin j.

Fig. 5. (a) Enhanced image histogram; (b) cumulative histogram.
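A minimal sketch of the two operations recoverable here, the linear intensity remapping of Fig. 4 and the cumulative histogram of Eq. (7), follows. The paper's rule for reading the adaptive mouth-segmentation threshold off the cumulative histogram is not reproduced in this excerpt, so the simple percentile rule in adaptive_threshold is an illustrative assumption only.

```python
# Contrast stretching (Fig. 4) and cumulative histogram (Eq. (7)) sketch.

import numpy as np

def stretch(gray, g_min=0, g_max=255):
    """Linearly remap intensities so the darkest/brightest pixels hit g_min/g_max."""
    lo, hi = int(gray.min()), int(gray.max())
    if hi == lo:
        return np.full_like(gray, g_min)
    out = (gray.astype(np.float32) - lo) * (g_max - g_min) / (hi - lo) + g_min
    return out.astype(np.uint8)

def cumulative_histogram(gray):
    """H_i = sum of h_j for j <= i over the 256 intensity bins, Eq. (7)."""
    h, _ = np.histogram(gray, bins=256, range=(0, 256))
    return np.cumsum(h)

def adaptive_threshold(gray, fraction=0.1):
    """Illustrative assumption: first intensity below which `fraction` of pixels fall."""
    H = cumulative_histogram(gray)
    return int(np.searchsorted(H, fraction * gray.size))
```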
Fig. 8. The measurement of the height of the mouth opening.

Fig. 11. Example of the extended LBP operator: (a) (8,1); (b) (16,2); (c) (24,3) neighbourhoods.

The LBP code is computed as LBP_{P,R} = Σ_{p=0}^{P−1} s(i_p − i_c) 2^p [29], where i_c and i_p denote the gray level values of the central pixel and of the P surrounding pixels in the circular neighbourhood of radius R. The function s(x) is defined as:

s(x) = { 1, if x ≥ 0; 0, if x < 0 }    (12)

Rotation invariance is achieved by circularly bit-wise right rotating (ROR) the LBP code and keeping the minimum value [29]:

LBP^{ri}_{P,R} = min{ ROR(LBP_{P,R}, i) | i = 0, 1, ..., P − 1 }    (13)
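A sketch of the LBP feature extraction follows, using scikit-image's implementation of the operator of [29] rather than hand-rolled code. Note that scikit-image's method='uniform' implements the rotation-invariant uniform variant (riu2 in Fig. 18), with P + 2 histogram bins; the u2 variant chosen in Section 6 corresponds to method='nri_uniform', which yields P(P−1) + 3 bins. The (16, 2) neighbourhood of Fig. 11 is assumed here.

```python
# Normalised LBP histogram over a gray-scale region (cf. Fig. 12), as a feature
# vector for the covered-mouth classifier. scikit-image's operator stands in
# for the hand-rolled LBP of Eqs. (12)-(13).

import numpy as np
from skimage.feature import local_binary_pattern

def lbp_histogram(gray_region, P=16, R=2):
    """Histogram of rotation-invariant uniform (riu2) LBP codes over the FMR."""
    codes = local_binary_pattern(gray_region, P, R, method="uniform")
    # method='uniform' yields P + 2 labels: P + 1 uniform codes plus one bin
    # collecting all non-uniform patterns.
    hist, _ = np.histogram(codes, bins=P + 2, range=(0, P + 2))
    return hist / hist.sum()
```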
Fig. 12. LBP histograms for the non-covered mouth and covered mouth regions.

Fig. 13. (a) Focused distortion region (FDR); (b) normal condition in the FDR; (c) yawning in the FDR.

Fig. 14. FDR with input image and edge-detected image: (a) normal; (b) yawn.

The distortion within the FDR is quantified by the sum of absolute differences (SAD) between two frames I_1 and I_2 of the region, normalised by the region size:

FDR_SAD = Σ_{i,j} |I_1(i, j) − I_2(x + i, y + j)|    (16)

Normalised FDR_SAD = FDR_SAD / (255 × W × H)    (17)

where W and H are the width and height of the FDR.
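The sketch below combines the Sobel edge detection used for the wrinkles (Fig. 14) with the normalised SAD of Eqs. (16)–(17). Pairing the two in exactly this way, and fixing the displacement (x, y) of Eq. (16) to zero (i.e. aligned crops), are assumptions made for illustration.

```python
# FDR distortion sketch: Sobel edge maps [30] compared between a reference and
# the current FDR crop via the normalised sum of absolute differences.

import cv2
import numpy as np

def sobel_edges(gray):
    """Gradient magnitude via the Sobel operator [30]."""
    gx = cv2.Sobel(gray, cv2.CV_32F, 1, 0, ksize=3)
    gy = cv2.Sobel(gray, cv2.CV_32F, 0, 1, ksize=3)
    mag = cv2.magnitude(gx, gy)
    return np.uint8(np.clip(mag, 0, 255))

def normalised_fdr_sad(fdr_ref, fdr_cur):
    """Eqs. (16)-(17): SAD between two equally sized FDR crops, scaled to [0, 1]."""
    e1 = sobel_edges(fdr_ref).astype(np.int32)
    e2 = sobel_edges(fdr_cur).astype(np.int32)
    sad = np.abs(e1 - e2).sum()
    h, w = e1.shape
    return sad / (255.0 * w * h)
```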
5. Yawn analysis
[Figure: flow chart of the yawn analysis — the mouth opening (mo) is measured; if mo exceeds the threshold, a yawn is declared; otherwise covered mouth detection is triggered.]
Fig. 18. ROC curves for the LBP operator: (a) rotation invariant (ri); (b) rotation invariant with uniform pattern (riu2); (c) uniform pattern (u2), with a zoom-in insert of the upper left corner.

6.3. Yawn analysis

Yawn analysis is a combination of mouth opening measurement, mouth covered detection and distortion detection. These operations are examined every 30 s. For mouth opening measurements, the best threshold value representing yawning is 0.5. Based on the results obtained and shown in Fig. 19, for mouth covered classification the LBP uniform pattern (u2) with radius 16 and a neural network classifier with 5 neurons in the hidden layer are chosen to determine the status of the FMR.
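A hypothetical training sketch for this covered-mouth classifier is given below, with scikit-learn's MLPClassifier standing in for the paper's neural network and randomly generated placeholder data in place of the SFF-derived LBP histograms (18 features, i.e. P + 2 bins of the riu2 sketch above; the u2 variant would yield a longer feature vector).

```python
# Hypothetical covered-mouth classifier: LBP histograms fed to a single-hidden-
# layer neural network with 5 neurons, matching the configuration chosen here.

import numpy as np
from sklearn.neural_network import MLPClassifier

rng = np.random.default_rng(0)
X = rng.random((200, 18))       # placeholder LBP histograms (18 bins for P = 16)
y = rng.integers(0, 2, 200)     # placeholder labels: 1 = covered, 0 = not covered

clf = MLPClassifier(hidden_layer_sizes=(5,), max_iter=1000, random_state=0)
clf.fit(X, y)
print(clf.predict(X[:5]))
```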
Genuine yawning image sequences from 30 video clips of the SFF database are used for the evaluation of the algorithm's performance. These videos contain 10 yawning sequences with non-covered mouth scenes, 14 with covered mouth scenes and 6 scenes where both situations are present. Examples of the detection results for each situation are shown in Figs. 19–21.

When the mouth is not occluded (Fig. 19), yawning is detected when the height of the mouth opening is equal to or greater than the threshold value. The yawning period is measured from the beginning of the intersection point between the threshold value and R_Y, the ratio of the mouth opening height to the FMR height. When the R_Y value is below the threshold (Fig. 20), this could be due to the mouth being covered during yawning. In this case, the mouth covered detection part of the system is triggered. If mouth occlusion is detected, the facial distortion during the period of occlusion is checked. Yawning is assumed to occur when the distortion increases substantially within that period.

When the mouth is opened wide over a short period of time before it is occluded (Fig. 21), the value of R_Y exceeds the threshold for a short period before it drops quickly. In this case, mouth occlusion detection is triggered when the R_Y value drops below the threshold. If occlusion is detected, then the period of yawning is measured from the beginning of the first intersection of R_Y with the threshold value until the end of the mouth occlusion detection.

From the 30 video scenes of yawning, 28 were detected successfully, with the period measured accurately. However, the algorithm failed to detect two yawning scenes where the mouth is completely covered; this is because the FDR did not indicate any distortion.
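The decision logic described above can be summarised in a short, self-contained sketch operating on precomputed per-frame measurements. The distortion threshold t_dist is an illustrative assumption, since the text only requires that distortion increases "substantially" during occlusion.

```python
# Per-frame yawn decision sketch (Section 6.3): r_y is the mouth-opening ratio,
# covered the covered-mouth classifier output, distortion the normalised FDR SAD.

def detect_yawns(r_y, covered, distortion, t_open=0.5, t_dist=0.1):
    """Return (start, end) frame indices of detected yawns."""
    yawns, start = [], None
    for i, (r, c, d) in enumerate(zip(r_y, covered, distortion)):
        if start is None:
            if r >= t_open:             # opening crosses the threshold
                start = i
        else:
            if r >= t_open:
                continue                # yawn still in progress
            if c and d >= t_dist:
                continue                # mouth covered but face distorted:
                                        # yawn assumed to continue
            yawns.append((start, i))    # opening ended, occlusion cleared
            start = None
    if start is not None:
        yawns.append((start, len(r_y) - 1))
    return yawns

# Example: mouth open for 3 frames, then hand-covered with distortion for 2 frames.
print(detect_yawns(
    r_y=[0.1, 0.6, 0.7, 0.2, 0.1, 0.1],
    covered=[False, False, False, True, True, False],
    distortion=[0.0, 0.0, 0.0, 0.3, 0.25, 0.0]))   # -> [(1, 5)]
```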
7. Conclusion
Acknowledgements
References
[13] J. Jiménez-Pinto, M. Torres-Torriti, Driver alert state and fatigue detection by salient points analysis, in: IEEE International Conference on Systems, Man and Cybernetics (SMC 2009), 2009, pp. 455–461.
[14] J. Shi, C. Tomasi, Good features to track, in: IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'94), 1994, pp. 593–600.
[15] W. Lirong, W. Xiaoli, X. Jing, Lip detection and tracking using variance based Haar-like features and Kalman filter, in: Fifth International Conference on Frontier of Computer Science and Technology (FCST), 2010, pp. 608–612.
[16] L. Lingling, C. Yangzhou, L. Zhenlong, Yawning detection for monitoring driver fatigue based on two cameras, in: 12th International IEEE Conference on Intelligent Transportation Systems (ITSC'09), 2009, pp. 1–6.
[17] Y. Yang, J. Sheng, W. Zhou, The monitoring method of driver's fatigue based on neural network, in: International Conference on Mechatronics and Automation (ICMA 2007), 2007, pp. 3555–3559.
[18] H. García, A. Salazar, D. Alvarez, Á. Orozco, Driving fatigue detection using active shape models, Adv. Visual Comput. 6455 (2010) 171–180.
[19] S. Anumas, S.-C. Kim, Driver fatigue monitoring system using video face images & physiological information, in: Biomedical Engineering International Conference (BMEiCON), 2012, pp. 125–130.
[20] S. Hachisuka, K. Ishida, T. Enya, M. Kamijo, Facial expression measurement for detecting driver drowsiness, Eng. Psychol. Cogn. Ergon. 6781 (2011) 135–144.
[21] E. Vural, M. Cetin, A. Ercil, G. Littlewort, M. Bartlett, J. Movellan, Drowsy driver detection through facial movement analysis, in: Human–Computer Interaction, 2007, pp. 6–18.
[22] X. Fan, B.-C. Yin, Y.-F. Sun, Dynamic human fatigue detection using feature-level fusion, in: Image and Signal Processing, Springer, 2008, pp. 94–102.
[23] P. Viola, M. Jones, Robust real-time face detection, in: Eighth IEEE International Conference on Computer Vision (ICCV 2001), 2001, p. 747.
[24] P. Viola, M.J. Jones, Robust real-time face detection, Int. J. Comput. Vision 57 (2) (2004) 137–154.
[25] P.I. Wilson, J. Fernandez, Facial feature detection using Haar classifiers, J. Comput. Sci. Coll. 21 (2006) 127–133.
[26] M. Castrillón, O. Déniz, C. Guerra, M. Hernández, ENCARA2: real-time detection of multiple faces at different resolutions in video streams, J. Visual Commun. Image Represent. 18 (2007) 130–140.
[27] E.R. Davies, Machine Vision: Theory, Algorithms, Practicalities, Morgan Kaufmann, San Francisco, 2004.
[28] G. Zhao, M. Pietikainen, Dynamic texture recognition using local binary patterns with an application to facial expressions, IEEE Trans. Pattern Anal. Mach. Intell. 29 (2007) 915–928.
[29] T. Ojala, M. Pietikainen, T. Maenpaa, Multiresolution gray-scale and rotation invariant texture classification with local binary patterns, IEEE Trans. Pattern Anal. Mach. Intell. 24 (2002) 971–987.
[30] R.O. Duda, P.E. Hart, Pattern Classification and Scene Analysis, Wiley, New York, NY, 1973.
[31] J.-Y. Zhang, Y. Chen, X.-X. Huang, Edge detection of images based on improved Sobel operator and genetic algorithms, in: International Conference on Image Analysis and Signal Processing (IASP 2009), 2009, pp. 31–35.
[32] M. Mat Ibrahim, J. Soraghan, L. Petropoulakis, Mouth covered detection for yawn, in: IEEE International Conference on Signal and Image Processing Applications (ICSIPA), 2013, pp. 89–94.