Depth-Based Human Fall Detection via Shape Features and Improved Extreme Learning Machine
Depth-Based Human Fall Detection via Shape Features and Improved Extreme Learning Machine
Abstract—Falls are one of the major causes leading to injury of elderly people. Using wearable devices for fall detection has a high cost and may cause inconvenience to the daily lives of the elderly. In this paper, we present an automated fall detection approach that requires only a low-cost depth camera. Our approach combines two computer vision techniques—shape-based fall characterization and a learning-based classifier—to distinguish falls from other daily actions. Given a fall video clip, we extract curvature scale space (CSS) features of human silhouettes at each frame and represent the action by a bag of CSS words (BoCSS). Then, we utilize the extreme learning machine (ELM) classifier to identify the BoCSS representation of a fall from those of other actions. In order to eliminate the sensitivity of ELM to its hyperparameters, we present a variable-length particle swarm optimization algorithm to optimize the number of hidden neurons, corresponding input weights, and biases of ELM. Using a low-cost Kinect depth camera, we build an action dataset that consists of six types of actions (falling, bending, sitting, squatting, walking, and lying) from ten subjects. Experimenting with the dataset shows that our approach can achieve up to 91.15% sensitivity, 77.14% specificity, and 86.83% accuracy. On a public dataset, our approach performs comparably to state-of-the-art fall detection methods that need multiple cameras.

Index Terms—Curvature scale space (CSS), extreme learning machine (ELM), fall detection, particle swarm optimization, shape contour.

I. INTRODUCTION

AGING has become a worldwide issue that significantly raises healthcare expenditure. As the World Health Organization has reported [1], about 28–35% of people who are 65 and older fall each year, and the percentage goes up to 32–42% for people aged 70 and above. Since more and more elderly people are living alone, automated fall detection becomes a useful technique for prompt fall alarm.

There have existed many device-based solutions to automatic fall detection [2], [3], such as those using wearable accelerometers or gyroscopes embedded in garments [4], [5]. Although a wearable device is insensitive to the environment, wearing it for a long time will cause inconvenience to one's daily activities. The smart home solution [6], [7] attempted to unobtrusively detect falls without using wearable sensors. It installs ambient devices, such as vibration and sound sensors, infrared motion detectors, and pressure sensors, at multiple positions in a room to record daily activities. By fusing the information of these sensors, falls can be detected and alarmed with high accuracy [8]–[11]. However, using multiple sensors significantly raises the cost of the solution. Moreover, putting many sensors in a room may have side effects on the health of the elderly.

Recent advances in computer vision have made it possible to automatically detect human falls with a commodity camera [12]. Such a camera has a low price and, when installed remotely, would not disturb the normal life of the monitored person. Compared to other sensing modalities, a camera can provide richer semantic information about the person as well as his/her surrounding environment. Therefore, it is possible to simultaneously extract multiple visual cues, including human location, gait pattern, walking speed, and posture, from a single camera. Vision-based fall detection requires a distinctive feature representation to characterize the fall action and a classification model to distinguish falls from other daily activities. Although there have been many attempts, vision-based fall detection remains an open problem.

Human shape is the first clue one would explore for fall detection. Human falls can be identified by analyzing how the human shape changes over a short period. Existing shape-based fall detectors approximate human silhouettes by a regular bounding box or an ellipse, and extract geometric attributes such as aspect ratio, orientation [13], [14], or edge points [15] as representative fall attributes. However, the accuracy of these methods is limited by the approximate nature of the shape attributes. A fall lasts a relatively short time compared to other actions [16]. Therefore, motion analysis has also been used for fall detection, such as methods encoding motion contours in the motion energy image [17] or in the integrated spatiotemporal energy map [18]. However, falls of elderly people might last longer than those of young people. Therefore, different motion models are often required to study the fall behaviors of young and elderly people.

3-D shape information is more robust to viewpoint change and partial occlusion than a 2-D silhouette. Using a multicamera system, 3-D volume distributions are extracted for fall detection in [19]. If a majority of the distribution goes close to the floor within a short time, a fall alarm is triggered. Then in [20], the centroid and orientation of the person's voxels are computed for fall detection. However, reconstructing 3-D human shape with a multicamera

Manuscript received July 13, 2013; revised November 22, 2013 and January 25, 2014; accepted January 29, 2014. Date of publication February 3, 2014; date of current version November 3, 2014. This work was supported in part by the National Natural Science Foundation of China under Grants 61240052, 61203279, and 61233014, in part by the Natural Science Foundation of Shandong Province, China (No. ZR2012FM036), and in part by the Independent Innovation Foundation of Shandong University, China (No. 2012JC005). X. Ma and H. Wang contributed equally to this work. X. Ma and Y. Li are the corresponding authors of this paper.

The authors are with the School of Control Science and Engineering, Shandong University, Jinan, Shandong, China (e-mail: [email protected]; [email protected]; [email protected]; [email protected]; [email protected]; [email protected]).

Color versions of one or more of the figures in this paper are available online at http://ieeexplore.ieee.org.

Digital Object Identifier 10.1109/JBHI.2014.2304357

2168-2194 © 2014 IEEE. Translations and content mining are permitted for academic research only. Personal use is also permitted, but republication/redistribution requires IEEE permission. See http://www.ieee.org/publications_standards/publications/rights/index.html for more information.
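As an illustration of the shape-based detectors surveyed above (those in the spirit of [13], [14], not this paper's CSS-based method), a minimal sketch of bounding-box aspect-ratio fall detection follows. The function names and thresholds are hypothetical choices for this toy example, not values taken from the cited works.

```python
import numpy as np

def bbox_aspect_ratio(mask):
    """Height/width aspect ratio of the bounding box of a binary silhouette."""
    ys, xs = np.nonzero(mask)
    height = ys.max() - ys.min() + 1
    width = xs.max() - xs.min() + 1
    return height / width

def looks_like_fall(ratios, upright_thresh=1.5, fallen_thresh=0.8):
    """Flag a fall when the silhouette goes from clearly taller-than-wide
    (upright) to clearly wider-than-tall (fallen) within the window."""
    was_upright = any(r > upright_thresh for r in ratios)
    ends_fallen = ratios[-1] < fallen_thresh
    return bool(was_upright and ends_fallen)

# Toy sequence: a standing silhouette (tall box) followed by a lying one (wide box).
standing = np.zeros((120, 160), dtype=bool); standing[10:110, 70:90] = True   # 100 x 20
lying    = np.zeros((120, 160), dtype=bool); lying[90:110, 20:140]  = True    # 20 x 120
ratios = [bbox_aspect_ratio(standing), bbox_aspect_ratio(lying)]
print(looks_like_fall(ratios))  # True for this toy sequence
```

The coarseness of such a two-number summary of the whole silhouette is exactly the limitation the paper attributes to these methods: lying down slowly produces the same aspect-ratio trajectory as falling.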
1916 IEEE JOURNAL OF BIOMEDICAL AND HEALTH INFORMATICS, VOL. 18, NO. 6, NOVEMBER 2014
Fig. 4. (a) SDUFall dataset. Shown here are six action categories (rows) from eight subjects (columns): falling, lying, walking, sitting, squatting, and bending. (b) The insensitivity of silhouette extraction to illumination. Whether the light is turned off (a) or on (d), the human silhouette [(b) and (e)] is clearly visible since Kinect is insensitive to visible light. Therefore, as (c) and (f) show, silhouettes can be successfully extracted in both situations. (c) The choice of K for K-means clustering. The highest fall detection accuracy is obtained when K equals 40.
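The BoCSS representation behind this K-means choice can be sketched as follows: per-frame CSS descriptors from a clip are quantized against a K-word codebook (K = 40 in the paper's experiments; a small K is used in this toy) and pooled into a normalized histogram. This is a plain Lloyd's K-means sketch under assumed descriptor shapes, not the paper's implementation; `bocss_histogram` is a hypothetical helper name.

```python
import numpy as np

def kmeans(X, k, iters=20, seed=0):
    """Plain Lloyd's K-means; returns the k codeword centers."""
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(iters):
        # assign each descriptor to its nearest center
        dists = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        for j in range(k):
            if np.any(labels == j):  # skip empty clusters
                centers[j] = X[labels == j].mean(axis=0)
    return centers

def bocss_histogram(frame_descriptors, centers):
    """Quantize per-frame descriptors against the codebook and return
    a normalized bag-of-words histogram for the whole clip."""
    dists = np.linalg.norm(frame_descriptors[:, None, :] - centers[None, :, :], axis=2)
    words = dists.argmin(axis=1)
    hist = np.bincount(words, minlength=len(centers)).astype(float)
    return hist / hist.sum()

# Example: 30 random 6-D "CSS descriptors" from one clip, quantized into 5 words.
descs = np.random.default_rng(2).normal(size=(30, 6))
codebook = kmeans(descs, k=5)
h = bocss_histogram(descs, codebook)
print(h.shape)
```

Normalizing the histogram makes clips of different lengths comparable before they reach the classifier.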
Fig. 6. Confusion matrices of action classification with One-versus-All SVM, ELM, PSO–ELM, and VPSO–ELM, respectively. VPSO–ELM obtains the highest classification accuracies. While One-versus-All SVM is better than VPSO–ELM in fall detection accuracy, it has significantly higher false negatives. Lying is the activity most easily confused with falling, as can also be seen in Fig. 5. Please see the corresponding text for more analysis. (a) One-versus-All SVM. (b) ELM. (c) PSO–ELM. (d) VPSO–ELM.
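For readers unfamiliar with the ELM baseline compared here, a minimal sketch follows: input weights and biases are drawn at random and never trained, and only the output weights are solved in closed form via the Moore-Penrose pseudoinverse. This fixes the hyperparameters by hand, whereas the paper's VPSO–ELM optimizes the number of hidden neurons, input weights, and biases; all names and values below are illustrative.

```python
import numpy as np

def train_elm(X, y, n_hidden=40, seed=0):
    """Basic ELM: random sigmoid hidden layer, least-squares output weights."""
    rng = np.random.default_rng(seed)
    W = rng.normal(size=(X.shape[1], n_hidden))  # random input weights, never trained
    b = rng.normal(size=n_hidden)                # random biases, never trained
    H = 1.0 / (1.0 + np.exp(-(X @ W + b)))       # hidden-layer activations
    T = np.eye(y.max() + 1)[y]                   # one-hot class targets
    beta = np.linalg.pinv(H) @ T                 # closed-form output weights
    return W, b, beta

def predict_elm(X, W, b, beta):
    H = 1.0 / (1.0 + np.exp(-(X @ W + b)))
    return (H @ beta).argmax(axis=1)

# Toy data: two well-separated clusters standing in for BoCSS histograms.
rng = np.random.default_rng(1)
X = np.vstack([rng.normal(0, 0.3, (20, 8)), rng.normal(3, 0.3, (20, 8))])
y = np.array([0] * 20 + [1] * 20)
W, b, beta = train_elm(X, y)
acc = (predict_elm(X, W, b, beta) == y).mean()
print(acc)  # near-perfect training accuracy expected on this separable toy set
```

Because only `beta` is solved for, training is a single linear-algebra step, which is what makes ELM fast but sensitive to the random hidden layer; that sensitivity is what the paper's variable-length PSO is meant to remove.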
[32] F. Mokhtarian, "Silhouette-based isolated object recognition through curvature scale space," IEEE Trans. Pattern Anal. Mach. Intell., vol. 17, no. 5, pp. 539–544, May 1995.
[33] L. Fei-Fei and P. Perona, "A Bayesian hierarchical model for learning natural scene categories," in Proc. IEEE Comput. Soc. Conf. Comput. Vision Pattern Recogn., 2005, vol. 2, pp. 524–531.
[34] C. Stauffer and W. E. L. Grimson, "Adaptive background mixture models for real-time tracking," in Proc. IEEE Comput. Soc. Conf. Comput. Vis. Pattern Recogn., Fort Collins, CO, USA, 1999, vol. 2.
[35] L. Ding and A. Goshtasby, "On the Canny edge detector," Pattern Recogn., vol. 34, no. 3, pp. 721–725, 2001.
[36] J. Kennedy and R. Eberhart, "Particle swarm optimization," in Proc. IEEE Int. Conf. Neural Netw., 1995, vol. 4, pp. 1942–1948.
[37] R. Rifkin and A. Klautau, "In defense of one-versus-all classification," J. Mach. Learn. Res., vol. 5, pp. 101–141, 2004.

Xin Ma (M'10) received the B.S. degree in industrial automation and the M.S. degree in automation from Shandong Polytech University (now Shandong University), Shandong, China, in 1991 and 1994, respectively. She received the Ph.D. degree in aircraft control, guidance, and simulation from Nanjing University of Aeronautics and Astronautics, Nanjing, China, in 1998. She is currently a Professor at Shandong University. Her current research interests include artificial intelligence, machine vision, human–robot interaction, and mobile robots.

Haibo Wang received the B.S. degree in logistics engineering from the School of Control Science and Engineering, Shandong University, Shandong, China. He was a cotutelle M.S.–Ph.D. student in the LIAMA and Laboratoire d'Informatique Fondamentale de Lille (LIFL) labs from 2005 to 2010. He received the Ph.D. degree in computer science from LIFL/INRIA Lille-Nord-Europe, Université des Sciences et Technologies de Lille, Lille, France, and the Ph.D. degree in pattern recognition and intelligent systems from LIAMA/National Laboratory of Pattern Recognition, Institute of Automation, Chinese Academy of Sciences, Beijing, China, in 2010. He is currently a Lecturer at Shandong University. His research interests include computer vision, pattern recognition, and their applications to virtual reality.

Mingang Zhou received the B.S. degree in automation and the Master's degree in control science and engineering from Shandong University, Shandong, China, in 2010 and 2013, respectively. His research interests include machine vision and pattern recognition.

Bing Ji received the B.S. degree in electronic engineering and the Master's degree in signal and information processing from Xidian University, Xi'an, China, in 2007 and 2009, respectively, and the Ph.D. degree in medical engineering from the University of Hull, Hull, U.K., in 2012. He is currently a Lecturer at Shandong University, Shandong, China. His research interests include cloud robotics and computational simulation of biological systems.

Yibin Li (M'10) received the B.S. degree in automation from Tianjin University, Tianjin, China, in 1982, the Master's degree in electrical automation from Shandong University of Science and Technology, Shandong, China, in 1990, and the Ph.D. degree in automation from Tianjin University, China, in 2008. From 1982 to 2003, he worked with Shandong University of Science and Technology, China. Since 2003, he has been the Director of the Center for Robotics, Shandong University. His research interests include robotics, intelligent control theories, and computer control systems.