
This article has been accepted for publication in a future issue of IEEE Sensors Journal, but has not been fully edited. Content may change prior to final publication. Citation information: DOI 10.1109/JSEN.2020.3032728.

Multi-occupancy Fall Detection using Non-Invasive Thermal Vision Sensor
Cankun Zhong, Wing W. Y. Ng*, Shuai Zhang, Chris Nugent, Colin Shewell, Javier Medina-Quero
*Corresponding Author

Cankun Zhong and Wing W. Y. Ng (corresponding author) are with the Guangdong Provincial Key Laboratory of Computational Intelligence and Cyberspace Information, School of Computer Science and Engineering, South China University of Technology, Guangzhou 510006, China (e-mail: [email protected] and [email protected]). Shuai Zhang, Chris Nugent and Colin Shewell are with the School of Computing, Ulster University, Northern Ireland, UK (e-mail: [email protected], [email protected] and [email protected]). Javier Medina Quero is with the Department of Computer Science, University of Jaen, Jaen, Spain (e-mail: [email protected]).

This work was supported in part by the 2020 R&D Program in Key Areas of Guangdong Province (No. 2020B010166002), the National Natural Science Foundation of China under Grant 61876066, and the Guangdong Province Science and Technology Plan Project (Collaborative Innovation and Platform Environment Construction) 2019A050510006. In addition, this work was partially supported by the REMIND project, which has received funding from the European Union's Horizon 2020 research and innovation programme under the Marie Skłodowska-Curie grant agreement No. 734355.

Abstract—Falling is a common issue within the aging population. The immediate detection of a fall is key to guaranteeing early attention and to avoiding further immobility risks and prolonged recovery times. Video-based approaches for fall monitoring, although highly accurate, are largely perceived as intrusive when deployed within living environments. As an alternative, thermal vision-based methods can be deployed to offer a more acceptable level of privacy. To date, thermal vision-based fall detection methods have largely focused on single-occupancy scenarios, which are not fully representative of real living environments with multi-occupancy. This work proposes a non-invasive thermal vision-based approach to multi-occupancy fall detection (MoT-LoGNN) which discriminates between a fall and a no-fall. The approach consists of four major components: i) a multi-occupancy decomposer, ii) a sensitivity-based sample selector, iii) the T-LoGNN for single-occupancy fall detection, and iv) a fine-tuning mechanism. The T-LoGNN consists of a robust neural network minimizing a Localized Generalization Error (L-GEM) and thermal image features extracted by a Convolutional Neural Network (CNN). Compared to other methods, the MoT-LoGNN achieved the highest average accuracy of 98.39% within the context of a multi-occupancy fall detection experiment.

Index Terms—Multi-occupancy Fall Detection, Thermal Vision Sensor, MoT-LoGNN, Smart Environments, Neural Networks

I. INTRODUCTION introduction of a new health-care delivery paradigm [3].


Besides, low-cost sensing solutions, whose wireless services
A S the population continues to grow on a global scale [1],
increasing pressure is being placed on health and care
services to meet the demands of increased numbers of
coupled with rapid advances in data analysis, have provided
the next generation of products to be deployed within living
environments. These have the potential to improve the
persons requiring care provision. In such scenarios, those
manner where remote health-care support can be provided
suffering from long term chronic conditions [2] or the aging
and are slowly gaining increased acceptance by both users
population have the potential to benefit the most from the
and health-care professionals [4].
From the multitude of health scenarios to consider,
This work was supported in part by 2020 R&D Program in Key detecting falls within the living environment is a relevant
Areas of Guangdong Province (No. 2020B010166002), the National challenge with a high impact in terms of both security and
Natural Science Foundation of China under Grant 61876066, and safety. Accidental falls can cause serious injury to at-risk
Guangdong Province Science and Technology Plan Project individuals, especially for the aging [5]. Within this cohort,
(Collaborative Innovation and Platform Environment Construction)
2019A050510006. In addition, this work was partially supported by
falls are the leading cause of hospitalization, injury-related
the REMIND project, which has received funding from the European deaths and loss of independence. However, it has been
Union’s Horizon 2020 research and innovation programme under the demonstrated that detecting and rapidly responding to falls
Marie Skłodowska-Curie grant agreement No 734355. can reduce the long-term risks associated with falls.
Cankun Zhong and Wing W. Y. Ng (corresponding author) are with Although efforts have been directed towards supporting
the Guangdong Provincial Key Laboratory of Computational
the detection and management of falls within living
Intelligence and Cyberspace Information, School of Computer
Science and Engineering, South China University of Technology, environments, a range of issues still exist. From a usability
Guangzhou 510006, China (e-mail: [email protected] and perspective, challenges are faced by the costs of the solution
[email protected]). and the perceived issue of intrusiveness when video based
Shuai Zhang, Chris Nugent and Colin Shewell are with School of cameras are used. From a technical perspective, challenges
Computing, Ulster University, Northern Ireland, UK (e-mail: are faced by levels of accuracy levels and a desire to reduce
[email protected], [email protected] and cp.shewell
@ulster.ac.uk). the numbers of false positives given the implications that
Javier Medina Quero is with Department of Computer Science, these have from a health-care provision perspective.
University of Jaen, Jaen, Spain (e-mail: [email protected]). In addition, the studies of fall detection are mainly
XXXX-XXXX © XXXX IEEE. Personal use is permitted, but republication/redistribution requires IEEE permission.
See http://www.ieee.org/publications_standards/publications/rights/index.html for more information.

1530-437X (c) 2020 IEEE. Personal use is permitted, but republication/redistribution requires IEEE permission. See http://www.ieee.org/publications_standards/publications/rights/index.html for more information.
Authorized licensed use limited to: Carleton University. Downloaded on November 06,2020 at 13:53:21 UTC from IEEE Xplore. Restrictions apply.
This article has been accepted for publication in a future issue of this journal, but has not been fully edited. Content may change prior to final publication. Citation information: DOI 10.1109/JSEN.2020.3032728, IEEE
Sensors Journal

2 IEEE SENSORS JOURNAL, VOL. XX, NO. XX, MONTH X, XXXX

In addition, studies of fall detection have mainly focused on the single-occupancy scenario, on the assumption that in a multi-occupancy scenario the occupants who remain standing can help the fallen person. However, bystanders may not be able to provide timely help, and when an accident affects everyone present they can hardly protect themselves, let alone assist others. A machine that actively detects the occurrence of a fall can instead immediately contact the most suitable medical staff in the vicinity.

In this paper, we introduce a novel end-to-end solution for the remote management of falls within a multi-occupancy living environment. A low-cost sensing solution is presented, which has been developed based on the use of low-resolution thermal sensors. This configuration of sensors enables capturing activity in an unobtrusive manner and integrating data into a scalable sensor platform where an innovative approach for thermal image processing is deployed. The classification of fall or non-fall is computed in real time using image decomposition and classification with a neural network (NN), which is trained via minimization of the Localized Generalization Error with features extracted by Convolutional Neural Networks (CNN). The developed approach has been deployed within two smart lab environments in the UK and Spain and has been evaluated by means of the collection and analysis of labeled data sets.

The remainder of this paper is organized as follows. Section II provides a brief review of sensors which have been used for fall detection and of approaches which address fall detection problems using thermal cameras. Section III describes the thermal camera used in this work and the proposed fall detection method. Section IV discusses the experimental settings in the smart labs and the experimental results. Finally, conclusions are drawn and an outlook for future work is presented in Section V.

II. RELATED WORK

A. Sensors for Fall Detection

A number of approaches have been implemented in an attempt to improve the process of detecting falls. From a sensing perspective, these have been centered around either wearable sensors or environmental sensing approaches [6].

From a wearable sensing perspective, efforts have been directed towards the processing of data gleaned from sensors such as accelerometers, gyroscopes, and barometers [7]. To further improve detection accuracy, multi-sensor data can also be utilized jointly [8]. More recently, the sensing platforms within smart phones [9] or smart shoes [10] have also been leveraged to detect falls. Although accuracy levels in excess of 90% have been reported in some studies, these solutions have the major disadvantage that they must be worn to offer their functionality. To a certain extent, this requirement can be viewed as both an inconvenience and an intrusion for the user, and in some instances the device may be forgotten or, in the worst case, not used at all.

Approaches based on environmental sensors rely on the technology being deployed at mostly fixed locations. A wide variety of sensors have been proposed to describe the fall of a person, such as vibration detection sensors, video cameras, pressure sensors, and thermal sensors [11]. All of these approaches have their advantages and disadvantages. Acoustic sensors can be used to detect the atypical noise produced by a fall event. These are, however, compromised in noisy environments where background noise interferes with the underlying sound of the fall [12]. Vibration sensors can be tuned to detect a sudden impact which can be representative of a fall. They are, however, subject to false positives due to activities such as heavy walking in the environment. Video cameras provide what may be the only definitive solution to record what has happened in an environment [13]. Nevertheless, they suffer from the significant issue of perceived intrusiveness when deployed to monitor the daily activities of users in real homes. A potential alternative to video-based sensing is the use of thermal cameras. Due to the low resolution of thermal images, the intrusiveness issue is overcome whilst the heat of the human body still provides sufficient information to capture a fall event.

B. Fall Detection Methods based on Thermal Cameras

Many methods have been applied in an attempt to improve the performance of automated fall detection based on thermal cameras. W. K. Wong et al. [14] utilized the width-height ratio of the rectangle bounding the human as the feature and set up hand-crafted rules to detect falls. The x- and y-axis histograms were utilized as input features for an SVM (Support Vector Machine) model to detect falls of patients [15]. The FallSense method proposed in [17] adopts a fuzzy inference system based on acceleration, infrared, and ultrasonic sensors to detect falls. Experimental results show that FallSense achieves an overall improvement of 16% on average in comparison with competing methods. P. Mazurek et al. [18] applied traditional machine learning classifiers (i.e., support vector machine, artificial neural network, and naïve Bayes classifier) for fall detection using kinematic and mel-cepstrum-related features extracted from the thermal images. Experimental results show that the accuracy of the proposed method is more than 90% on two data sets.

A number of studies placed the thermal sensors on the ceiling as an alternative to a wall-mounted solution in an effort to provide a broader view and to reduce occlusions. In [18], the thermal pixels of occupants were identified through a certain temperature range, and the thermal pixel count of the occupant was then used to detect the fall. Focusing only on the number of pixels made this method ignore the shape and edges of a detected person, thus affecting the fall recognition accuracy. In [19], the authors separated the foreground from the background based on temperature values. Manually designed features based on temperature differences and temporal information were proposed and evaluated with several classifiers to detect falls. Experimental evaluation demonstrated that the system achieved real-time operation and over a 94% fall recognition rate at room temperatures of up to 24ºC.


As an alternative to feature extraction developed by statistical and/or knowledge-based approaches, more recent work has focused on deep learning-based methods that automatically extract features from thermal videos for detecting falls. Studies [20]-[21] utilized manually designed CNNs, and study [22] chose the widely used Inception-v3 model pre-trained on the ImageNet database to detect falls from thermal images; the classification accuracies were all higher than 80% in the single-occupancy scenario. The fall detection problem was treated as an anomaly detection problem in [23]-[24], where autoencoders were proposed to learn spatio-temporal features automatically from thermal videos. A fall was identified as an anomaly based on the reconstruction error.

The majority of current studies have, however, focused on single-occupancy scenarios. Fall detection solutions for multi-occupancy scenarios have rarely been presented, except for the recently conducted work [20]. Given the prevalence of such events, this area requires further attention. Fall detection solutions using thermal sensors that consider only single-occupancy scenarios may not only generate higher false alarm rates, but may also fail to detect falls precisely in the context of a multi-occupancy scenario, given that the feature distribution of the training (single-occupancy) data is quite different from that of the testing (multi-occupancy) data.

Thermal images collected from low-resolution thermal sensors are usually noisy and blurred, and multi-occupancy thermal images are more difficult to recognize than single-occupancy thermal images. This is largely because additional people in the scene develop shapes similar to the person who has actually fallen when they are in close proximity to each other. In this research, we propose a multi-occupancy fall detection method, MoT-LoGNN, using thermal vision sensors. The major contributions of this work are summarized as follows:

1) To support multi-occupancy scenarios, we propose a robust fall detection method (MoT-LoGNN). Experimental results have demonstrated that the MoT-LoGNN yields the best performance in comparison to benchmarking techniques.

2) A stochastic sensitivity measure (SSM) is applied to both sample selection and validation for fine-tuning the T-LoGNN, where the T-LoGNN consists of a robust neural network minimizing a Localized Generalization Error (L-GEM) and thermal image features extracted by a CNN. The SSM is a key component of the aforementioned L-GEM which measures the classifier's sensitivity with respect to perturbations of the input features. The minimization of the SSM and the L-GEM enhances the robustness of the MoT-LoGNN.

3) A Radial Basis Function Neural Network trained via minimization of the L-GEM is proposed as the classifier for fall detection to offer robust classification within the context of noisy and blurred thermal images.

III. MOT-LOGNN

In this study, the proposed MoT-LoGNN consists of four components: the T-LoGNN, a fine-tuning mechanism, a multi-occupancy decomposer (MOD), and a sensitivity-based sample selector (SSS). The combination of the CNN and the LG-RBFNN is referred to as the T-LoGNN, where the CNN is used as a feature extractor and a robust Radial Basis Function Neural Network trained via minimization of the Localized Generalization Error (LG-RBFNN) is used as a classifier. The T-LoGNN is initially trained using the labeled single-occupancy training set. The MOD decomposes multi-occupancy thermal images into one or more single-occupancy thermal sub-images, each containing only a single person. Sub-images misclassified by the T-LoGNN are selected by the SSS based on the sample sensitivity value to update the single-occupancy training set. The updated single-occupancy training set is then used for fine-tuning the T-LoGNN. It should be noted that if at least one sub-image decomposed from a multi-occupancy image is classified as fallen by the T-LoGNN, then the multi-occupancy thermal image is classified as fallen as well.

The flow of processes for the MoT-LoGNN training is presented in Fig. 1. Further details are provided in the ensuing sub-sections.

For testing purposes, a multi-occupancy thermal image is input to the MoT-LoGNN. Firstly, the thermal image is decomposed into one or more (see further details in Section III-B) single-occupancy thermal sub-images by the MOD. Following this, the T-LoGNN classifies all thermal sub-images. A multi-occupancy thermal image is classified as a fallen event in the instance that at least one of its thermal sub-images is classified as fallen.

Fig. 1. Flow of processes for the MoT-LoGNN training.


In the following sub-sections, the thermal vision sensor hardware is firstly introduced. Following this, the four major components of the MoT-LoGNN (the MOD, the T-LoGNN, the fine-tuning mechanism, and the SSS) are presented in Sections III-B, III-C-1, III-C-2, and III-D, respectively. Finally, the time complexity analysis of the MoT-LoGNN in the testing phase is presented in Section III-E.

A. Low Cost and Non-invasive Thermal Sensor

Among the range of thermal vision sensors [25], both high-resolution [26] and low-resolution devices are used in smart environments [27]. In the case of fall detection, a comparison of thermal sensor devices [28] has shown that non-invasive, low-resolution thermal sensors offer better performance and reduced learning time.

Based on the encouraging fall detection results of previous works, in this work we select the Heimann HTPA 32x31 thermal sensor [20], a suitable device with an operating temperature range of -20 to 85 °C, powered by a 3.3 Volt supply. The thermal sensor generates a 32×31 matrix, where each value defines a heat point of temperature. The data are collected in real time by means of an Ethernet crossover cable which is connected to the local area network. The middleware [29] collects and recovers the data from the sensor in real time within a Web Service in JSON format.

As suggested in [20][28], the thermal sensor was affixed to the ceiling of the Smart Lab in Ulster University to provide a zenith view of the space to be monitored. It was deployed at a height of 2.5 meters in a removable plaster ceiling, where the Ethernet connection and power supply remain hidden by the ceiling. It provides a viewable area of approximately 6 meters by 5.6 meters, which makes it possible to monitor multiple individuals at the same time. The field of view of the sensor is 86° by 83°. A picture of the sensor deployed in the ceiling is provided in Figure 2(a), together with the operating range in Figure 2(b) and an example of an actual recording in Figure 2(c).

Fig. 2. (a) The sensor deployed in the ceiling. (b) The operating range of the sensor. (c) An example of a thermal image presented in a web interface.

Frames are sampled from the sensor through an I2C interface at a rate of 6 Hz and processed by a listener which communicates via WiFi directly with endpoints on the SensorCentral platform [30]. Once captured by SensorCentral, image processing techniques are invoked.

In this study, the aim is to identify whether a fall has occurred in the thermal images rather than to consider what has happened to each individual in the thermal images. Therefore, the fallen-or-not labels were applied to the thermal frame images during the data collection, rather than a bounding-box label for each individual in each thermal image.

B. Multi-occupancy Image Decomposer (MOD)

The MOD consists of three steps: 1) Image Binarization, 2) Contour Detection, and 3) Single-occupancy Thermal Sub-image Generation.

1) Image Binarization: The image binarization process aims to distinguish the human heat points from the floor heat points using two pixel values, 0 and 255, for the floor and the human shape, respectively. The binarization threshold influences both the decomposition result and the detection accuracy of the T-LoGNN. In this study, the binarization threshold is set to 201, which is determined using a validation set. More details are presented in Section IV.

2) Contour Detection: The border-following algorithm in [31] is applied to find the contour of each person in the binarized thermal images. Owing to noise and blur in the thermal images, very small contours are created which are likely caused by high-temperature floor regions and are subsequently removed. For the 28×28 thermal images, which are cropped from the center of the original 32×31 images to contain the most relevant visual information, small contours whose areas are less than 4 pixels are removed, since these areas would be too small to represent a human at the intended sensor deployment height.

3) Single-occupancy Thermal Sub-image Generation: For each multi-occupancy thermal image, k single-occupancy sub-images are generated if k contours are found (k>1). If either zero or one contour is found, the entire thermal image is treated as a single thermal sub-image. For each contour, pixels located outside it are set to 0 while the others are set to 255. In this way, a single-occupancy thermal sub-image is created in which the detected person (i.e. contour) appears at its original location within the entire multi-occupancy thermal image. In addition, the fallen or not-fallen class label of a thermal sub-image is inherited from its original multi-occupancy thermal image.
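The three MOD steps map onto standard image operations. The following is a minimal sketch using OpenCV (the library choice is ours; the paper only cites the border-following algorithm of [31], which cv2.findContours implements). The threshold of 201 and the 4-pixel minimum area follow the text above:

```python
import cv2
import numpy as np

def decompose(image, threshold=201, min_area=4):
    """Sketch of the MOD: binarize, find contours, emit one sub-image per person.

    `image` is a 28x28 uint8 array cropped from the 32x31 sensor frame.
    """
    # Step 1: binarization -- floor heat points -> 0, human heat points -> 255.
    _, binary = cv2.threshold(image, threshold, 255, cv2.THRESH_BINARY)

    # Step 2: contour detection (border-following, OpenCV 4.x API); drop
    # contours too small to be a person at the deployment height.
    contours, _ = cv2.findContours(binary, cv2.RETR_EXTERNAL,
                                   cv2.CHAIN_APPROX_SIMPLE)
    contours = [c for c in contours if cv2.contourArea(c) >= min_area]

    # Zero or one contour: the whole image is the single sub-image.
    if len(contours) <= 1:
        return [binary]

    # Step 3: one sub-image per contour; the person is kept at its original
    # location (255 inside the contour) and everything outside is set to 0.
    sub_images = []
    for c in contours:
        mask = np.zeros_like(binary)
        cv2.drawContours(mask, [c], -1, 255, thickness=cv2.FILLED)
        sub_images.append(mask)
    return sub_images
```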
C. T-LoGNN

The T-LoGNN consists of a robust LG-RBFNN and thermal image features extracted by a CNN. The LG-RBFNN and the fine-tuning mechanism [32] for the T-LoGNN are introduced in Sections III-C-1 and III-C-2, respectively.

The CNN is adopted in the T-LoGNN for feature extraction from both binarized thermal sub-images and single-occupancy images. One of the major contributions of this work is the use of the LG-RBFNN as the classifier with features extracted from the CNN. In contrast to the Softmax classifier, the LG-RBFNN is expected to yield a higher generalization capability to future unseen samples since it is trained via minimizing the generalization error estimated by the L-GEM. Furthermore, the RBFNN is used here because it is a nonlinear classifier [33] with fast convergence.

A class-balanced weighting trick is utilized in the training of the CNN when the model performance is hindered by the class imbalance problem. The class weight of each class is the reciprocal of its number of samples multiplied by a constant.
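As a concrete illustration of the weighting rule above (a sketch; the value of the constant and the surrounding training framework are our choice, not specified in the paper):

```python
import numpy as np

def class_weights(labels, constant=1.0):
    """Class weight = constant / (number of samples of that class),
    as described above; used to scale the per-sample training loss."""
    classes, counts = np.unique(labels, return_counts=True)
    return {int(c): constant / n for c, n in zip(classes, counts)}

# e.g. 159 non-fallen (0) vs 186 fallen (1) single-occupancy samples
weights = class_weights(np.array([0] * 159 + [1] * 186))
```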


1) LG-RBFNN: The purpose of RBFNN training is to find a network structure and connection weights minimizing the generalization error, such that the network is more robust to the effects of noise and blurred areas in thermal images. Once the number of hidden neurons is determined, the centers and widths of the hidden neurons can be obtained by k-means clustering. After fixing both the centers and the widths, the connection weights can be calculated by a pseudo-inverse technique. Therefore, the objective of RBFNN training can be simplified to finding the optimal number of hidden neurons which minimizes the generalization error. We cannot, however, directly estimate the generalization error. In this study, the L-GEM model proposed in [34] is used to find an upper bound on the generalization error of the RBFNN.

The L-GEM defines the generalization error of unseen samples located near training samples only, i.e. within the Q-union ($S_Q$). The concept is that generalization errors of unseen samples with a large difference from the training samples are expected to be large because we have no knowledge of such unseen samples. Therefore, estimating generalization errors for unseen samples far away from the training samples may be counterproductive and misleading [35].

$S_Q$ is the union of the Q-neighborhoods of all training samples, and a Q-neighborhood ($S_Q(x_f)$) of a training sample $x_f$ (a feature vector extracted by the CNN) is a local input space which includes all unseen samples located near $x_f$. $S_Q(x_f)$ is defined as follows:

$$S_Q(x_f) = \{x \mid x = x_f + \Delta x,\ |\Delta x_i| \le Q,\ i = 1,2,\dots,d\} \tag{1}$$

where $\Delta x = (\Delta x_1,\dots,\Delta x_d)'$, $\Delta x_i$, and $d$ denote the stochastic perturbation, the stochastic perturbation on the ith input feature, and the number of input features, respectively.

According to [28], for a given Q value, the upper bound of the L-GEM ($R^*_{SM}(Q)$) is estimated by using Hoeffding's inequality with a probability of $1-\eta$. The definition of $R^*_{SM}(Q)$ is as follows:

$$R^*_{SM}(Q) = \left(\sqrt{R_{emp}} + \sqrt{E_{S_Q}((\Delta y)^2)} + A\right)^2 + \varepsilon \tag{2}$$

where $\varepsilon = B\sqrt{\ln\eta/(-2N)}$, and $N$, $A$, $B$, $R_{emp}$, and $E_{S_Q}((\Delta y)^2)$ denote the number of training samples, the difference between the maximum and minimum output values, the maximum possible value of the mean square error, the training mean square error, and the SSM of output differences, respectively.

Note that both $\varepsilon$ and $A$ are constants for a given training dataset. Let $g(\cdot)$ be the classifier; the SSM is then defined as the expectation of the squared classifier output perturbations ($\Delta y = g(x_f + \Delta x) - g(x_f)$) between training samples and unseen samples in $S_Q$:

$$E_{S_Q}((\Delta y)^2) = \frac{1}{N}\sum_{f=1}^{N} E\left[\left(g(x_f + \Delta x) - g(x_f)\right)^2\right] \tag{3}$$

In general, we do not have any prior knowledge about the distribution of unseen samples in $S_Q$, thus we adopt a quasi-Monte Carlo (QMC) based method as in [35] to estimate the SSM value as follows:

$$E_{S_Q}((\Delta y)^2) \approx \frac{1}{N}\sum_{f=1}^{N} SSM(x_f, g) = \frac{1}{N}\sum_{f=1}^{N}\frac{1}{H}\sum_{h=1}^{H}\left(g(x_f + \Delta x_h) - g(x_f)\right)^2 \tag{4}$$

where $SSM(x_f, g)$, $H$, and $\Delta x_h$ denote the sensitivity measure of the classifier $g$ for the training sample $x_f$, the number of Halton points, and a Halton point whose coordinates each range within $[-Q, Q]$, respectively.

By fixing Q, the optimal RBFNN is found by searching for the number of hidden neurons which yields the minimum generalization error. The procedure for finding the optimal RBFNN is presented in Algorithm 1.

Algorithm 1 Finding the optimal RBFNN
Input: x, Q
Output: optimal RBFNN
1: for M ← number of classes to N − 1 do
2:   Train an RBFNN with M hidden neurons using x.
3:   Compute R*_SM(Q) for the current RBFNN.
4: end for
5: return the RBFNN yielding the minimum R*_SM(Q)
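A compact sketch of the QMC estimate of Eq. (4) and the search of Algorithm 1 is given below. It uses SciPy's Halton sequence generator and treats the RBFNN training routine and the bound of Eq. (2) as placeholder callables (train_rbfnn, r_sm, our names); g must accept a batch of feature vectors:

```python
import numpy as np
from scipy.stats import qmc

def ssm(g, x, Q=1.0, H=100):
    """QMC estimate of SSM(x, g) in Eq. (4): average squared output change
    of classifier g over H Halton perturbations drawn from [-Q, Q]^d."""
    d = x.shape[0]
    halton = qmc.Halton(d=d, scramble=False).random(H)  # points in [0, 1)^d
    deltas = qmc.scale(halton, -Q, Q)                   # rescale to [-Q, Q]^d
    return np.mean((g(x + deltas) - g(x[None, :])) ** 2)

def find_optimal_rbfnn(train, n_classes, train_rbfnn, r_sm):
    """Algorithm 1: sweep the number of hidden neurons M and keep the
    RBFNN with the smallest L-GEM bound R*_SM(Q) of Eq. (2)."""
    best, best_bound = None, np.inf
    for M in range(n_classes, len(train)):   # M = #classes .. N-1
        net = train_rbfnn(train, M)          # k-means centers, pseudo-inverse weights
        bound = r_sm(net, train)             # upper bound of Eq. (2)
        if bound < best_bound:
            best, best_bound = net, bound
    return best
```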
Given that the temperature of the floor across the entire monitoring area may not be uniform, white (high temperature) dots may in some cases be added to the background as noise. Such perturbations introduced to the region of interest (i.e. the fallen person) may increase the difficulty of fall detection, since they cause the activation values of neurons in the CNN to become larger or smaller and thus affect the prediction result. In the T-LoGNN, the SSM measures fluctuations of the classifier outputs with respect to perturbations of the input (feature vector). Therefore, the RBFNN trained via minimizing both the SSM value and the training error (the T-LoGNN) is more robust to the noise present within the thermal images.

2) Fine-tuning Mechanism of T-LoGNN: The T-LoGNN is pre-trained using a set of binarized single-occupancy thermal images for classifying whether or not the person in the image has fallen. Prior to training the MoT-LoGNN, the set of multi-occupancy thermal images available for training is divided into 80% constituting a multi-occupancy training set and the remaining 20% forming a multi-occupancy validation set. These training images are subsequently decomposed into single-occupancy thermal sub-images by the MOD. Training begins with the T-LoGNN classifying each thermal sub-image generated from the training set. Those misclassified or possibly misclassified thermal sub-images are selected by the SSS and added to the single-occupancy training set for fine-tuning the T-LoGNN. The process iterates until the limit of iterations (n) is reached. In our experiments, it was found that with n less than or equal to 5, the classification accuracy on the thermal images of the training set can be guaranteed to be 100%. Therefore, in this experiment, n is set to 5.


To prevent overfitting, the most robust T-LoGNN among the n iterations is selected as the final T-LoGNN to be used in the MoT-LoGNN. A validation robustness measure (VRM) is proposed to measure the robustness of the T-LoGNN as follows:

$$VRM = \frac{1}{V}\sum_{v=1}^{V}\left(y_v - F(X_v)\right) + \frac{1}{W}\sum_{w=1}^{W} SSM(X_w, f) \tag{5}$$

where $F(\cdot)$, $f(\cdot)$, $V$, $W$, $X_v$, $y_v$, and $X_w$ denote the MoT-LoGNN, the T-LoGNN, the number of samples in the multi-occupancy validation set, the number of single-occupancy sub-images generated from the multi-occupancy validation set, a multi-occupancy image from the validation set, the label of $X_v$, and a single-occupancy sub-image generated from the multi-occupancy validation set, respectively. The best T-LoGNN, yielding the minimum VRM, needs to achieve both a low classification error on the validation set and a low sensitivity to small perturbations of the thermal sub-images. This follows the idea of the minimization of the L-GEM in [33]-[34] and aligns well with the multi-occupancy fall classification problem in this work.

D. Sensitivity-based Sample Selector (SSS)

An SSS approach is proposed to select useful samples for fine-tuning the T-LoGNN. The main difficulty is to identify misclassified single-occupancy thermal sub-images, since the class label is assigned to the entire multi-occupancy thermal image only and we do not have the real labels of the sub-images.

The following three possible cases are considered by the SSS to select useful sub-images for fine-tuning the T-LoGNN, as presented in Fig. 3:

1) For non-fallen multi-occupancy thermal images (i.e. all individuals in the image are standing, or no one is in the image) misclassified as fallen, the sub-images which have been classified as fallen are incorrect and are selected;

2) For fallen thermal images misclassified as non-fallen, the sub-image yielding the largest SSM value among the sub-images decomposed from the same multi-occupancy thermal image is selected;

3) For correctly classified fallen multi-occupancy thermal images with more than one sub-image classified as fallen, the thermal sub-image yielding the largest stochastic sensitivity measure value among the sub-images classified as fallen and decomposed from the same thermal image is selected.

Both Cases 1 and 2 identify misclassified single-occupancy scenarios, whilst Case 3 provides additional samples of people who have not fallen. In Case 2, multiple sub-images are classified as non-fallen, but at least one of the sub-images should have been classified as fallen. In Case 3, multiple sub-images are classified as fallen, but the classification of multiple fallen sub-images from the same multi-occupancy thermal image has a high chance of being mistaken, because two fallen events existing at the same time is quite rare. However, we do not have a bounding-box label for each participant in the dataset, so it is not possible to know which sub-image is misclassified.

In the L-GEM framework, a sample yielding a large SSM value is informative for Neural Network training [36] because it has a higher chance of being misclassified by the Neural Network. Therefore, for Cases 2 and 3, the sub-image yielding the largest SSM ($SSM(x_f, g)$ in Equation (4)) is selected. Selected samples are labeled with the opposite label with respect to their corresponding classification results from the T-LoGNN. These samples are then added to the ensuing round of fine-tuning of the T-LoGNN.

Fig. 3. Informative sub-images selected by SSS.
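The three cases reduce to a small amount of selection logic. A sketch follows (model returns the sub-image class, 1 for fallen; ssm computes $SSM(x, g)$ of Eq. (4); both are placeholder callables):

```python
def select_informative(model, ssm, sub_images, label):
    """SSS sketch: return (sub_image, new_label) pairs for fine-tuning;
    the new label is the opposite of the T-LoGNN's prediction."""
    preds = [model(s) for s in sub_images]       # 1 = fallen, 0 = non-fallen
    pred_image = int(any(preds))                 # image-level decision
    fallen = [s for s, p in zip(sub_images, preds) if p == 1]
    if label == 0 and pred_image == 1:           # Case 1: false alarm
        return [(s, 0) for s in fallen]
    if label == 1 and pred_image == 0:           # Case 2: missed fall
        s = max(sub_images, key=lambda s: ssm(model, s))
        return [(s, 1)]
    if label == 1 and len(fallen) > 1:           # Case 3: multiple "fallen"
        s = max(fallen, key=lambda s: ssm(model, s))
        return [(s, 0)]
    return []
```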


E. Time Complexity Analysis of the MoT-LoGNN in the Testing Phase

In the testing phase, only two components of the MoT-LoGNN (the MOD and the T-LoGNN) are involved. Their time complexities are given as follows.

For the MOD, consisting of three steps, the time complexities of all steps are O(ij), where i and j denote the width and the height of a thermal image, respectively. Therefore, the total time complexity of the MOD is $T_{MOD} = O(ij)$.

For the T-LoGNN, the total time complexity of all convolutional layers in the CNN is

$$T_{CNN} = O\left(\sum_{l=1}^{L} c_{l-1}\, c_l\, s_l^2\, m_l^2\right)$$

where $L$, $l$, $s_l$, $m_l$, $c_{l-1}$, and $c_l$ denote the number of convolutional layers, the index of a convolutional layer, the size of the convolutional kernel, the size of a single-channel feature map in the lth layer, the number of input channels, and the number of output channels of the lth layer, respectively. The pooling layers and fully connected layers of a CNN typically take 5% to 10% of the computational time [37]. Furthermore, the time complexity of the LG-RBFNN is $T_{LG\text{-}RBFNN} = O(dMo)$, where $d$, $M$, and $o$ denote the dimension of the feature vector extracted by the CNN, the number of hidden neurons, and the number of outputs, respectively.

Therefore, the time complexity of the MoT-LoGNN for classifying a thermal image is $T_{MOD} + T_{CNN} + T_{LG\text{-}RBFNN}$.
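For a concrete feel for the dominant convolutional term, it can be evaluated for a given layer configuration (a sketch; the layer sizes below are hypothetical, as the paper does not list them):

```python
def cnn_ops(layers):
    """Dominant multiply count of the convolutional layers:
    sum over l of c_{l-1} * c_l * s_l^2 * m_l^2 (see T_CNN above)."""
    return sum(c_in * c_out * s ** 2 * m ** 2 for c_in, c_out, s, m in layers)

# Hypothetical 3-layer configuration on 28x28 inputs, for illustration only.
layers = [(1, 32, 3, 28), (32, 64, 3, 14), (64, 128, 3, 7)]
print(cnn_ops(layers))  # an operation count, not wall-clock time
```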
IV. EXPERIMENTS AND RESULTS

The experimental setup is given in Section IV-A. The comparison of the MoT-LoGNN with other thermal fall detection studies is presented in Section IV-B. Discussions of the details of the MoT-LoGNN are presented in Section IV-C.

A. Experimental Setup

1) Collected Data: The data for this experiment were collected from the Smart Lab in Ulster University. The thermal vision sensor was placed on the ceiling of the room to collect a zenithal view of the occupants, which provided a clear view of falls and also reduced the potential for occlusion in instances of multi-occupancy, compared with a camera installed in the vertical plane. Three participants aged between 25 and 35 (one woman and two men) with heights of 1.68, 1.72, and 1.83 meters, respectively, assisted in collecting the data. When collecting data, each participant walked around the area within the field of view of the thermal vision sensor for the walking scenarios, and lay on the floor changing orientation, rotating, moving, and bending the joints to simulate different scenarios of people who have fallen. An external observer collected frames from the thermal sensor in real time and labeled each frame (fallen or non-fallen) manually during the development of the scenes.

Each collected thermal image is pre-processed as a 28×28 matrix where each pixel defines a heat point with a value between 0 and 255. There are two sets of experimental data: single-occupancy and multi-occupancy. The single-occupancy dataset consists of three categories: (i) empty room, (ii) standing/walking alone, and (iii) fallen alone. For the multi-occupancy dataset two further scenarios were added: (iv) 2-3 people standing/walking, and (v) one person fallen and one standing/walking. In the experiments, thermal images of both datasets in categories (iii) and (v) are labeled as fallen while the remainder are labeled as non-fallen.

For the collected single-occupancy dataset, the numbers of samples labeled as fallen and non-fallen are 186 and 159, respectively. For the collected multi-occupancy dataset, the numbers of samples labeled as fallen and non-fallen are 528 and 431, respectively. Collecting data from different inhabitants and different cases of falls takes a great deal of effort, so data augmentation techniques (i.e. translation, flipping, and rotation) are utilized in this study to enlarge the number of samples by a factor of 100, which also helps to overcome the overfitting problem to some extent.
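A minimal sketch of the augmentation step follows (the perturbation ranges are our choice; the paper only names translation, flipping, and rotation):

```python
import numpy as np

def augment(image, rng):
    """One random augmentation from the set used above: translation,
    flipping, or rotation."""
    op = rng.integers(3)
    if op == 0:                                    # translate by up to 3 pixels
        dy, dx = rng.integers(-3, 4, size=2)
        return np.roll(image, (int(dy), int(dx)), axis=(0, 1))
    if op == 1:                                    # horizontal or vertical flip
        return np.flip(image, axis=int(rng.integers(2)))
    return np.rot90(image, k=int(rng.integers(1, 4)))  # rotate 90/180/270 degrees

rng = np.random.default_rng(0)
# Enlarging the dataset 100x, given a list `images` of 28x28 arrays:
# augmented = [augment(img, rng) for img in images for _ in range(100)]
```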
2) Implementation Details: In the experiments, a carefully designed 3-layer CNN was employed for feature extraction from the thermal images, providing 1024-dimensional feature vectors. This CNN feature extractor is denoted by cnn and its architecture is presented in Fig. 4. It is optimized using the adaptive moment estimation (Adam) method with learning rate = 0.001, β1 = 0.9, β2 = 0.999, and mini-batch size = 32. Both a Softmax classifier (softmax) and an RBFNN trained without the L-GEM (rbfnn) are used in the experiments to validate the robustness of the LG-RBFNN. The number of hidden neurons of the rbfnn was determined by minimizing the training classification error instead of the localized generalization error used in the LG-RBFNN. The learned features from the CNN lie in the range [0, 3.85]. The values Q = 1 and H = 100 were selected by trial and error, determined by the trade-off between estimation accuracy and computational time, respectively.

Fig. 4. The architecture of the CNN. Conv-k@n*n means there are k n*n convolution filters. The kernel size of every Max Pooling layer is 2*2. FC-1024 refers to a fully connected layer with 1024 neurons.
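A sketch of such a feature extractor is shown below in PyTorch (an assumption; the paper does not name a framework, and the filter counts are placeholders since only the 3-layer structure, the 2*2 max pooling, and the 1024-dimensional output are given):

```python
import torch
import torch.nn as nn

class ThermalCNN(nn.Module):
    """Sketch of the 3-layer CNN feature extractor of Fig. 4; only the
    1024-d output and pooling sizes come from the text."""
    def __init__(self):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(1, 32, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),   # 28 -> 14
            nn.Conv2d(32, 64, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),  # 14 -> 7
            nn.Conv2d(64, 128, 3, padding=1), nn.ReLU(),
        )
        self.fc = nn.Linear(128 * 7 * 7, 1024)   # FC-1024 feature layer

    def forward(self, x):                         # x: (batch, 1, 28, 28)
        return self.fc(self.features(x).flatten(1))

# Optimizer settings from the paper: Adam, lr = 0.001, betas = (0.9, 0.999).
model = ThermalCNN()
opt = torch.optim.Adam(model.parameters(), lr=0.001, betas=(0.9, 0.999))
```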


For both the multi-occupancy and single-occupancy datasets, ten independent runs were performed for all experiments, with 80% of the thermal images randomly selected for training and the remaining 20% used for testing.

3) Evaluation Metrics: The performance of the different models is validated on the single-occupancy dataset and the multi-occupancy dataset. The mean and standard deviation values of the different metrics are calculated for each model, and the symbol "*" denotes a statistically significant difference between the MoT-LoGNN and the corresponding method by Student's t-test with 95% confidence. We regard fallen as the positive class and non-fallen as the negative class. Five performance metrics are used. The Accuracy measures the overall performance of the different models. The False Rejection Rate (FRR) and the False Acceptance Rate (FAR) measure the missed-detection rate of falls and the false alarm rate, respectively. The F1 score and the Gmean provide summary measures that account for class imbalance. These performance metrics are defined as follows:

$$\text{Accuracy} = \frac{TP + TN}{TP + FN + TN + FP} \tag{6}$$

$$\text{FRR} = \frac{FN}{FN + TP} \tag{7}$$

$$\text{FAR} = \frac{FP}{FP + TN} \tag{8}$$

$$F1 = \frac{2 \cdot \text{Precision} \cdot \text{Recall}}{\text{Precision} + \text{Recall}} \tag{9}$$

$$\text{Gmean} = \sqrt{\text{Recall} \cdot \text{Specificity}} \tag{10}$$

where TP, TN, FP, and FN denote the True Positives, True Negatives, False Positives, and False Negatives, respectively, and Precision = TP/(TP+FP), Recall = TP/(TP+FN), and Specificity = TN/(TN+FP).
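Equations (6)-(10) translate directly from the confusion matrix counts:

```python
def metrics(tp, tn, fp, fn):
    """Eqs. (6)-(10), with fallen as the positive class."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)            # equals 1 - FRR
    specificity = tn / (tn + fp)       # equals 1 - FAR
    return {
        "accuracy": (tp + tn) / (tp + tn + fp + fn),
        "FRR": fn / (fn + tp),
        "FAR": fp / (fp + tn),
        "F1": 2 * precision * recall / (precision + recall),
        "Gmean": (recall * specificity) ** 0.5,
    }
```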
B. Comparison with Other Fall Detection Studies using Thermal Sensors

In this section, the proposed MoT-LoGNN is compared with two hand-crafted feature extraction based methods (RectFall [14], using the width-height ratio of the rectangle bounding the human as the feature, and HistFall [15], using the x-axis and y-axis histograms as the features) and two deep learning based methods (the manually designed CNN [21] and the popular Inception-v3 [22] model pre-trained on the ImageNet database).

1) Performance on the single-occupancy dataset: It can be seen from Table I that in the single-occupancy scenario, compared with the other thermal sensor based fall detection methods, the proposed MoT-LoGNN achieves the best performance in terms of all metrics except the FRR. As to the FRR, the MoT-LoGNN reaches the second lowest value, while the RectFall, which has the lowest FRR, has a high FAR of 40.62%, meaning that the system readily raises false alarms and wastes resources. Therefore, the MoT-LoGNN is clearly superior to the other comparison methods in the single-occupancy scenario.

TABLE I
MEAN (± STDEV) OF PERFORMANCE METRICS OF DIFFERENT METHODS ON THE SINGLE-OCCUPANCY DATA

Model               Accuracy (%)    FAR (%)         FRR (%)        F1 (%)          Gmean (%)
RectFall [14]       86.43 (±3.69)   40.62 (±5.63)   0.00 (±0.00)   74.35 (±6.53)   76.97 (±3.81)
HistFall [15]       92.95 (±0.22)    6.27 (±1.46)   7.29 (±0.35)   89.45 (±1.76)   93.22 (±0.56)
CNN [21]            91.80 (±2.95)   13.15 (±4.31)   4.37 (±1.98)   87.53 (±3.09)   90.15 (±3.44)
Inception-v3 [22]   95.63 (±1.47)    5.45 (±2.23)   2.50 (±1.88)   94.26 (±2.15)   96.01 (±2.74)
MoT-LoGNN           97.31 (±1.33)    0.91 (±0.48)   3.50 (±2.00)   95.63 (±2.75)   97.93 (±0.79)

2) Performance on the multi-occupancy dataset: Since the RectFall and HistFall methods were proposed for single-occupancy scenarios, they cannot be directly applied to fall detection problems in multi-occupancy scenarios. In this experiment, the MOD proposed in this paper is combined with these two methods so that they can be applied to multi-occupancy scenarios. As shown in Table II, in multi-occupancy scenarios the performance of the deep learning based fall detection methods is clearly better than that of the RectFall and the HistFall. Moreover, the FARs of the RectFall and HistFall methods are significantly higher than those of the other methods. This is mainly due to a characteristic of the MOD framework: as long as one thermal sub-image is classified as a fall, the whole image is considered to contain a fall. Therefore, the disadvantages of the RectFall and the HistFall, which already have high FARs in the single-occupancy scenario, are further amplified under the multi-occupancy scenario and the MOD. Overall, the MoT-LoGNN has the highest average classification accuracy in multi-occupancy scenarios, the lowest average FAR, and the second lowest average FRR, while its FAR is dozens of times lower than that of the RectFall, which has the lowest FRR.

TABLE II
MEAN (± STDEV) OF PERFORMANCE METRICS OF DIFFERENT METHODS ON THE MULTI-OCCUPANCY DATA

Model               Accuracy (%)    FAR (%)         FRR (%)         F1 (%)          Gmean (%)
RectFall [14]       64.46 (±1.36)   79.38 (±1.47)    0.12 (±0.09)   34.13 (±2.04)   45.36 (±1.62)
HistFall [15]       75.72 (±2.32)   48.60 (±7.00)    4.56 (±1.91)   65.10 (±5.64)   69.84 (±4.50)
CNN [21]            85.46 (±0.77)   13.15 (±4.31)   15.42 (±4.53)   84.19 (±0.99)   85.60 (±4.48)
Inception-v3 [22]   92.12 (±0.69)    8.35 (±3.91)    7.33 (±3.65)   91.65 (±1.12)   92.16 (±3.88)
MoT-LoGNN           95.89 (±0.50)    4.12 (±1.32)    3.89 (±1.07)   95.42 (±0.55)   95.92 (±0.68)


C. Discussion of the MoT-LoGNN

1) Sensitivity Analysis of the MoT-LoGNN with Different Amounts of Training Samples: The MoT-LoGNN extracts features using the CNN model, and it is well known that the amount of training data influences the performance of a CNN. In this section, 10% and 50% of the training samples are randomly selected to train the MoT-LoGNN to show its sensitivity with respect to the number of training samples. Tables III and IV show the performance of the MoT-LoGNN with different amounts of training data under the single-occupancy scenario and the multi-occupancy scenario, respectively. When the number of training samples increases, the performance of the MoT-LoGNN improves in both the single-occupancy and multi-occupancy scenarios. The performance of models on the multi-occupancy dataset is of particular interest because it covers more possible scenarios than the single-occupancy dataset and is more representative of real living environments with multi-occupancy. As seen from Table IV, the performance of the MoT-LoGNN using all (100%) of the training data is only slightly better than that using 50% of the training data on the multi-occupancy dataset in terms of the comprehensive metrics (no more than a 0.6% improvement in terms of the Accuracy, F1, and Gmean metrics). This shows that once the number of training samples is large enough, the performance of the MoT-LoGNN is not sensitive to the amount of training samples.

TABLE III
SENSITIVITY ANALYSIS OF MOT-LOGNN TRAINED WITH DIFFERENT AMOUNTS OF TRAINING DATA, SINGLE-OCCUPANCY SCENARIO

Training Data   Accuracy (%)    FAR (%)         FRR (%)        F1 (%)          Gmean (%)
10%             91.72 (±2.44)   11.60 (±6.20)   6.00 (±2.41)   87.62 (±2.43)   91.09 (±3.27)
50%             96.40 (±1.51)    1.06 (±1.10)   4.63 (±1.92)   94.26 (±3.21)   97.14 (±1.02)
100%            97.31 (±1.33)    0.91 (±0.48)   3.50 (±2.00)   95.63 (±2.75)   97.93 (±0.79)

TABLE IV
SENSITIVITY ANALYSIS OF MOT-LOGNN TRAINED WITH DIFFERENT AMOUNTS OF TRAINING DATA, MULTI-OCCUPANCY SCENARIO

Training Data   Accuracy (%)    FAR (%)          FRR (%)        F1 (%)           Gmean (%)
10%             83.83 (±7.90)   24.04 (±20.53)   8.71 (±3.75)   79.49 (±12.01)   82.16 (±10.43)
50%             95.44 (±0.29)    4.26 (±1.19)    4.68 (±1.51)   94.92 (±3.21)    95.52 (±1.96)
100%            95.89 (±0.50)    4.12 (±1.32)    3.89 (±1.07)   95.42 (±0.55)    95.92 (±0.68)
2) Non-linearity of the Thermal Image Data: In this study, the CNN is utilized to extract features of the thermal images and the LG-RBFNN serves as a non-linear classifier. To show the non-linearity of the thermal image data, the data are transformed into a new feature space by the CNN and then reduced to a two-dimensional space using t-Distributed Stochastic Neighbor Embedding (tSNE), as shown in Fig. 5. The figure shows that the thermal image data in the feature space constructed by the CNN are not linearly separable in either the single-occupancy or the multi-occupancy scenario. It is difficult to find a linear decision plane that separates fallen from non-fallen completely. The geometry of the optimal decision function in this two-dimensional feature space should be a curve instead of a straight line.

Fig. 5. The data distribution of thermal images: (a) single-occupancy, (b) multi-occupancy. The green points denote the non-fallen class and the blue points denote the fallen class.
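The visualization of Fig. 5 can be reproduced along the following lines (a sketch with stand-in data; feats would be the 1024-dimensional CNN features and labels the fallen/non-fallen labels):

```python
import numpy as np
import matplotlib.pyplot as plt
from sklearn.manifold import TSNE

# Random stand-ins for the 1024-d CNN features and the class labels.
rng = np.random.default_rng(0)
feats = rng.normal(size=(200, 1024))
labels = rng.integers(2, size=200)        # 1 = fallen, 0 = non-fallen

emb = TSNE(n_components=2).fit_transform(feats)
for cls, color, name in [(0, "green", "non-fallen"), (1, "blue", "fallen")]:
    pts = emb[labels == cls]
    plt.scatter(pts[:, 0], pts[:, 1], c=color, label=name)
plt.legend()
plt.show()
```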


Due to this non-linear characteristic of the thermal sensor data, a non-linear classifier is preferred. Tables V and VI summarize the performance of different classifiers under the single-occupancy and multi-occupancy scenarios, respectively, where [cnn+softmax], [cnn+dt], and [cnn+svm] denote the models using the softmax, decision tree, and support vector machine as the classifier with features extracted by the CNN, respectively. In both cases, the T-LoGNN yields the best performance, which shows the superiority of the LG-RBFNN for dealing with the thermal sensor fall detection problem using CNN features.

TABLE V
COMPARISON OF DIFFERENT CLASSIFIERS USING CNN FEATURES ON SINGLE-OCCUPANCY DATA

Model           Accuracy (%)    FAR (%)         FRR (%)        F1 (%)          Gmean (%)
[cnn+softmax]   92.80 (±0.76)   10.72 (±5.90)   4.71 (±2.61)   88.91 (±0.65)   92.14 (±1.97)
[cnn+dt]        94.02 (±1.96)   11.63 (±4.86)   2.73 (±0.69)   90.88 (±1.91)   92.68 (±2.54)
[cnn+svm]       93.71 (±0.89)   12.04 (±5.15)   2.82 (±1.66)   90.12 (±0.08)   92.39 (±1.88)
T-LoGNN         95.10 (±1.88)    9.92 (±5.22)   1.91 (±0.41)   92.58 (±1.88)   93.95 (±2.57)

TABLE VI
COMPARISON OF DIFFERENT CLASSIFIERS USING CNN FEATURES ON MULTI-OCCUPANCY DATA

Model           Accuracy (%)    FAR (%)         FRR (%)        F1 (%)          Gmean (%)
[cnn+softmax]   88.77 (±1.49)   13.31 (±2.40)   9.42 (±1.65)   87.37 (±1.40)   88.60 (±1.48)
[cnn+dt]        90.16 (±0.83)   11.85 (±2.29)   8.09 (±0.98)   88.90 (±0.68)   90.00 (±0.90)
[cnn+svm]       89.98 (±1.42)   11.59 (±1.39)   8.59 (±2.82)   88.74 (±1.65)   89.88 (±1.33)
T-LoGNN         90.81 (±0.43)   11.55 (±2.21)   7.07 (±2.83)   89.57 (±0.62)   90.63 (±0.26)
3) Determination of the Binarization Threshold: As one of the components of the MoT-LoGNN, the MOD is responsible for decomposing multi-occupancy images into single-occupancy sub-images, and the chosen binarization threshold directly influences the effectiveness of the MOD. Intuitively, the optimal binarization threshold should separate the person and the background clearly, which helps the model to distinguish fallen from non-fallen samples. In this study, a validation set (20% of the data, randomly selected from the single-occupancy training set) is utilized to determine the binarization threshold, with the T-LoGNN used as the fall detection model. The optimal threshold is the one with which the T-LoGNN reaches the best performance in terms of the Gmean metric.

Fig. 6. The performance of the T-LoGNN using MODs with different binarization thresholds.

In the experiments, the candidate binarization thresholds range from 195 to 210 according to prior experience. Fig. 6 shows that 201 is the optimal binarization threshold because it yields the highest Gmean. Therefore, the binarization threshold for the MOD is set to 201 in this study.
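The threshold selection amounts to a one-dimensional search over the candidates (a sketch; evaluate is a placeholder that runs the MOD with a given threshold and returns the validation Gmean of the T-LoGNN):

```python
import numpy as np

def select_threshold(candidates, evaluate):
    """Pick the binarization threshold whose MOD yields the best
    validation Gmean for the T-LoGNN."""
    gmeans = [evaluate(t) for t in candidates]
    return candidates[int(np.argmax(gmeans))]

# Candidates 195..210 as in the experiments; 201 was selected in this study.
# best = select_threshold(list(range(195, 211)), evaluate)
```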
4) Ablation Study: The major contributions of this work (i.e. the robust T-LoGNN for single-occupancy fall detection, the MOD for simplifying multi-occupancy fall detection into the simpler single-occupancy scenario, and the SSS for enhancing the performance of the proposed T-LoGNN) have been evaluated by experiments. The results are shown in Tables VII and VIII.

TABLE VII
MEAN (± STDEV) OF PERFORMANCE METRICS OF DIFFERENT METHODS ON THE SINGLE-OCCUPANCY DATA

Model        Accuracy (%)    FAR (%)        FRR (%)        F1 (%)          Gmean (%)
T-LoGNN      95.10 (±1.88)   9.92 (±5.22)   1.91 (±0.41)   92.58 (±1.88)   93.95 (±2.57)
MOD-LoGNN    97.09 (±0.28)   1.70 (±2.00)   3.66 (±2.08)   95.24 (±2.81)   97.50 (±0.63)
MoT-LoGNN    97.31 (±1.33)   0.91 (±0.48)   3.50 (±2.00)   95.63 (±2.75)   97.93 (±0.79)

TABLE VIII
MEAN (± STDEV) OF PERFORMANCE METRICS OF DIFFERENT METHODS ON THE MULTI-OCCUPANCY DATA

Model        Accuracy (%)    FAR (%)         FRR (%)        F1 (%)          Gmean (%)
T-LoGNN      90.81 (±0.43)   11.55 (±2.21)   7.07 (±2.83)   89.57 (±0.62)   90.63 (±0.26)
MOD-LoGNN    92.57 (±1.10)   12.40 (±1.43)   0.85 (±0.62)   91.08 (±1.06)   91.57 (±1.06)
MoT-LoGNN    95.89 (±0.50)    4.12 (±1.32)   3.89 (±1.07)   95.42 (±0.55)   95.92 (±0.68)

From Tables VII and VIII, combining the MOD with the T-LoGNN improves the overall performance (i.e. Accuracy, F1, and Gmean) compared with the T-LoGNN alone. The main reason is that the MOD performs an additional preprocessing of image binarization, which can be regarded as a data denoising process. However, the FRR of the MOD-LoGNN is higher than that of the T-LoGNN on the single-occupancy data, which is mainly caused by the static binarization setting in this study: the edge details of a fallen person with a curled body may be filtered out by the image binarization. In contrast, the MOD-LoGNN yields a higher FAR than the T-LoGNN on the multi-occupancy dataset. This is one of the drawbacks of the proposed method, in which a thermal image is claimed to be fallen as long as any one of its sub-images is classified as fallen.

The proposed MoT-LoGNN yields the highest Accuracy, F1, and Gmean on both the single-occupancy and multi-occupancy datasets, which confirms the effectiveness of using the SSS to select informative samples for fine-tuning the T-LoGNN. However, after applying the fine-tuning mechanism (from MOD-LoGNN to MoT-LoGNN), the MoT-LoGNN yields a higher FRR. This may be caused by too many non-fallen samples being selected by the SSS to fine-tune the T-LoGNN, while the class-balanced weighting technique fails to completely solve the data imbalance problem; this remains for further research.


5) Mis-classified Cases of the MoT-LoGNN: The examples shown in Fig. 7 are representative samples misclassified by the proposed MoT-LoGNN; the subtitle of each sample gives its actual label. It can be seen that these samples are extremely difficult to distinguish. Sample (a) is mis-classified mainly because the fallen person is not completely within the monitored area. In sample (b), a fallen person with a curled body is easily mis-classified as a standing person. Sample (c) shows a case in which the MoT-LoGNN mis-classifies a standing person as a fallen person because the background temperature is sufficiently high. When two standing persons are close enough to each other, the MoT-LoGNN may wrongly judge them as a fallen person, as shown in sample (d).

[Fig. 7 panels: (a) fallen, (b) fallen, (c) non-fallen, (d) non-fallen]
Fig. 7. Mis-classified samples of the MoT-LoGNN.

V. CONCLUSIONS AND FUTURE WORKS

In this study, we have proposed a robust approach that distinguishes fall from non-fall shapes in low-resolution images captured by low-cost and non-invasive thermal vision sensors. The device provides a zenithal point of view from the ceiling on which it is mounted, allowing falls to be detected in instances of both single and multi-occupancy. We propose the use of Convolutional Neural Networks to extract features from the thermal images, and then use a Radial Basis Function Neural Network trained via minimization of the Localized Generalization Error bound as the classifier (T-LoGNN), in order to reduce the effect of thermal images containing strong noise and blurred areas on the classification results. In addition, we propose the multi-occupancy fall detection method MoT-LoGNN in response to the decrease in classification accuracy caused by the increased complexity of thermal images in multi-occupancy scenarios. Experimental results demonstrated that the MoT-LoGNN achieved the best performance in both single- and multi-occupancy scenarios.

However, the proposed MoT-LoGNN still has some limitations. The binarization threshold is set for a given dataset in this work; therefore, its fall detection performance may decline if the MoT-LoGNN is applied directly to other environments. Meanwhile, the proposed MoT-LoGNN conducts fall detection using only a single thermal image, so the decision may be wrong owing to noise and blur in that image. A decision made over a sequence of successive thermal images, rather than a single one, would therefore be preferable.

In the future, we will verify the proposed MoT-LoGNN in more realistic application scenarios. Research on image decomposition algorithms that adapt to different environments is a meaningful direction for optimizing the MoT-LoGNN. In addition, given the uncertainty of the thermal images acquired by this low-cost device, temporal information will be added to improve the robustness of the recognition system.
REFERENCES
[1] World Health Organization, "World report on ageing and health," World Health Organization, 2015.
[2] Z. He, et al., "Prevalence of multiple chronic conditions among older adults in Florida and the United States: Comparative analysis of the OneFlorida data trust and National Inpatient Sample," Journal of Medical Internet Research, vol. 20, no. 4, p. e137, 2018.
[3] D. Marikyan, et al., "A systematic review of the smart home literature: A user perspective," Technological Forecasting and Social Change, vol. 138, pp. 139–154, 2019.
[4] M. L. Shuwandy, et al., "Sensor-based mHealth authentication for real-time remote healthcare monitoring system: A multilayer systematic review," Journal of Medical Systems, vol. 43, no. 2, p. 33, 2019.
[5] E. A. Kramarow, et al., "Deaths from unintentional injury among adults aged 65 and over, United States, 2000–2013," US Department of Health and Human Services, Centers for Disease Control and Prevention, National Center for Health Statistics, 2015.
[6] P. Vallabh and R. Malekian, "Fall detection monitoring systems: A comprehensive review," Journal of Ambient Intelligence and Humanized Computing, vol. 9, no. 6, pp. 1809–1833, 2018.
[7] S. S. Kendri, et al., "Development and monitoring of a fall detection system through wearable sensor belt," Development, vol. 6, no. 12, 2019.
[8] L. Wang, et al., "Pre-impact fall detection based on multi-source CNN ensemble," IEEE Sensors Journal, vol. 20, no. 10, pp. 5442–5451, 2020.
[9] J.-S. Lee and H.-H. Tseng, "Development of an enhanced threshold-based fall detection system using smartphones with built-in accelerometers," IEEE Sensors Journal, vol. 19, no. 18, pp. 8293–8302, 2019.
[10] L. Montanini, et al., "A footwear-based methodology for fall detection," IEEE Sensors Journal, vol. 18, no. 3, pp. 1233–1242, 2018.
[11] T. Xu, et al., "New advances and challenges of fall detection systems: A survey," Applied Sciences, vol. 8, no. 3, p. 418, 2018.
[12] S. M. Adnan, et al., "Fall detection through acoustic local ternary patterns," Applied Acoustics, vol. 140, pp. 296–300, 2018.
[13] E. Cippitelli, et al., "Radar and RGB-depth sensors for fall detection: A review," IEEE Sensors Journal, vol. 17, no. 12, pp. 3585–3604, 2017.
[14] W. K. Wong, et al., "Home alone faint detection surveillance system using thermal camera," in 2010 Second International Conference on Computer Research and Development. IEEE, 2010, pp. 747–751.
[15] K.-S. Song, et al., "Histogram based fall prediction of patients using a thermal imagery camera," in 2017 14th International Conference on Ubiquitous Robots and Ambient Intelligence (URAI). IEEE, 2017, pp. 161–164.
[16] S. Moulik and S. Majumdar, "FallSense: An automatic fall detection and alarm generation system in IoT-enabled environment," IEEE Sensors Journal, vol. 19, no. 19, pp. 8452–8459, 2018.
[17] P. Mazurek, et al., "Use of kinematic and mel-cepstrum-related features for fall detection based on data from infrared depth sensors," Biomedical Signal Processing and Control, vol. 40, pp. 102–110, 2018.

[18] J. Rafferty, et al., "Fall detection through thermal vision sensing," in Ubiquitous Computing and Ambient Intelligence. Springer, 2016, pp. 84–90.
[19] A. Hayashida, et al., "The use of thermal IR array sensor for indoor fall detection," in 2017 IEEE International Conference on Systems, Man, and Cybernetics (SMC). IEEE, 2017, pp. 594–599.
[20] J. Medina-Quero, et al., "Detection of falls from non-invasive thermal vision sensors using convolutional neural networks," in Multidisciplinary Digital Publishing Institute Proceedings, vol. 2, no. 19, 2018, p. 1236.
[21] A. Akula, et al., "Deep learning approach for human action recognition in infrared images," Cognitive Systems Research, vol. 50, pp. 146–154, 2018.
[22] J. Adolf, et al., "Deep neural network based body posture recognitions and fall detection from low resolution infrared array sensor," in 2018 IEEE International Conference on Bioinformatics and Biomedicine (BIBM). IEEE, 2018, pp. 2394–2399.
[23] J. Nogas, et al., "Fall detection from thermal camera using convolutional LSTM autoencoder," in Proceedings of the 2nd Workshop on Aging, Rehabilitation and Independent Assisted Living, IJCAI Workshop, 2018.
[24] J. Nogas, et al., "DeepFall: Non-invasive fall detection with deep spatio-temporal convolutional autoencoders," Journal of Healthcare Informatics Research, vol. 4, no. 1, pp. 50–70, 2020.
[25] R. Gade and T. B. Moeslund, "Thermal cameras and applications: A survey," Machine Vision and Applications, vol. 25, no. 1, pp. 245–262, 2014.
[26] M. E. Jaspers, et al., "The FLIR ONE thermal imager for the assessment of burn wounds: Reliability and validity study," Burns, vol. 43, no. 7, pp. 1516–1523, 2017.
[27] S. Mashiyama, J. Hong, and T. Ohtsuki, "Activity recognition using low resolution infrared array sensor," in 2015 IEEE International Conference on Communications (ICC). IEEE, 2015, pp. 495–500.
[28] M. A. López-Medina, et al., "Evaluation of convolutional neural networks for the classification of falls from heterogeneous thermal vision sensors," International Journal of Distributed Sensor Networks, vol. 16, no. 5, p. 1550147720920485, 2020.
[29] J. Rafferty, et al., "Sensor Central: A research oriented, device agnostic, sensor data platform," in Ubiquitous Computing and Ambient Intelligence, S. F. Ochoa, P. Singh, and J. Bravo, Eds. Cham: Springer International Publishing, 2017, pp. 97–108.
[30] J. Rafferty, et al., "A scalable, research oriented, generic, sensor data platform," IEEE Access, vol. 6, pp. 45473–45484, 2018.
[31] S. Suzuki and K. Abe, "Topological structural analysis of digitized binary images by border following," Computer Vision, Graphics, and Image Processing, vol. 30, no. 1, pp. 32–46, 1985.
[32] Z. Zhou, et al., "Fine-tuning convolutional neural networks for biomedical image analysis: Actively and incrementally," in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2017, pp. 4761–4772.
[33] J. Park and I. W. Sandberg, "Universal approximation using radial-basis-function networks," Neural Computation, vol. 3, no. 2, pp. 246–257, 1991.
[34] D. S. Yeung, et al., "Localized generalization error model and its application to architecture selection for radial basis function neural network," IEEE Transactions on Neural Networks, vol. 18, no. 5, pp. 1294–1305, 2007.
[35] D. S. Yeung, et al., "MLPNN training via a multiobjective optimization of training error and stochastic sensitivity," IEEE Transactions on Neural Networks and Learning Systems, vol. 27, no. 5, pp. 978–992, 2016.
[36] W. W. Y. Ng, et al., "Diversified sensitivity-based undersampling for imbalance classification problems," IEEE Transactions on Cybernetics, vol. 45, no. 11, pp. 2402–2412, 2015.
[37] K. He and J. Sun, "Convolutional neural networks at constrained time cost," in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2015, pp. 5353–5360.
