Applied Sciences
Article
Fall Detection System Based on Simple Threshold Method and
Long Short-Term Memory: Comparison with Hidden Markov
Model and Extraction of Optimal Parameters
Seung Su Jeong 1 , Nam Ho Kim 2 and Yun Seop Yu 1, *

1 ICT & Robotics Engineering, Semiconductor Convergence Engineering, AISPC Laboratory, IITC,
Hankyong National University, 327 Jungang-ro, Anseong-si 17579, Gyenggi-do, Korea
2 Department of Embedded System, Bundang Convergence Technology Campus of Korea Polytechnic, 5,
Hwangsaeul-ro 329 beon-gil, Sengnam 13590, Gyeonggi-do, Korea
* Correspondence: [email protected]

Abstract: In an aging global society, a few complex problems have been occurring due to falls among the increasing elderly population. Therefore, falls are detected using a pendant-type sensor that can be worn comfortably. The sensed data are processed by the embedded environment and classified by a long short-term memory (LSTM). A fall detection system that combines a simple threshold method (STM) and LSTM, the STM-LSTM-based fall detection system, is introduced. In terms of training data accuracy, the proposed STM-LSTM-based fall detection system is compared with the previously reported STM-hidden Markov model (HMM)-based fall detection system. The training accuracy of the STM-LSTM fall detection system is 100%, while the highest training accuracy of the STM-HMM-based one is 99.5%, which is 0.5% lower than the best of the STM-LSTM-based system. However, the STM-LSTM-based system may be overfitted because all data are used for training without separating any validation data. In order to resolve the possible overfitting issue, training and validation data are evaluated separately at a ratio of 4:1, and then, in terms of validation data accuracy of the STM-LSTM-based fall detection system, optimal values of the parameters in LSTM and normalization method are found as follows: best accuracy of 98.21% at no-normalization, no-sampling, 128 hidden layer nodes, and regularization rate of 0.015. It is also observed that as the number of hidden layer nodes or sampling interval increases, the regularization rate at the highest value of accuracy increases. This means that overfitting can be suppressed by increasing the regularization, and thus an appropriate number of hidden layer nodes and a regularization rate must be selected to improve the fall detection efficiency.

Keywords: fall detection; the elderly; long short-term memory (LSTM); overfitting; regularization

Citation: Jeong, S.S.; Kim, N.H.; Yu, Y.S. Fall Detection System Based on Simple Threshold Method and Long Short-Term Memory: Comparison with Hidden Markov Model and Extraction of Optimal Parameters. Appl. Sci. 2022, 12, 11031. https://doi.org/10.3390/app122111031

Academic Editor: Hanatsu Nagano
Received: 30 August 2022
Accepted: 26 October 2022
Published: 31 October 2022

Publisher's Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Copyright: © 2022 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Appl. Sci. 2022, 12, 11031. https://doi.org/10.3390/app122111031 https://www.mdpi.com/journal/applsci

1. Introduction
It was reported by the World Health Organization (WHO) that 30% of the population aged 65 or older suffer from falls more than once a year and this percentage increases to 50% in the population aged 80 or older [1,2]. The fall-related consequences are serious injuries (e.g., femoral neck fracture, brain damage, skin burns), which in most cases lead to physical and cognitive disability or death in cases of undetected falls [2–4]. To reduce these risks, such as death rate and severity of injuries, by detecting falls as quickly as possible, researchers are now focusing on developing automatic fall detection systems that generate alerts when events occur [5–7].
Studies on fall detection of the elderly with wearable devices [8–10] or smartphones [11–15] using a 3-axial accelerometer have been reported. The devices with a 3-axial accelerometer are attached to the body to distinguish between falls and activities of daily living (ADL). Among the existing studies, a simple threshold method (STM) and machine learning methods for fall detection of the elderly were reported [16–23]. The STM is weak against noise and has difficulty distinguishing falls from patterns relatively similar to falls, such as
lying or sitting. Machine learning methods, such as hidden Markov model (HMM) [17,18],
layered hidden Markov model (LHMM) [19], decision tree (DT) [20], K-nearest neighbor
(KNN) [21,22], and support vector machine (SVM) [22,23], provided a higher accuracy in
fall detection than the STM. In the case of HMM, there are many problems in obtaining a
temporally aligned sequence and ensuring that the data satisfy a fixed distribution [24].
SVM has the disadvantage of increasing resource consumption and slowing down as the
number of classifications increases [25]. To compensate for these shortcomings, research using deep neural networks was needed. Accordingly, there is an increasing number of reports on fall detection systems that apply deep neural networks to wearable devices [26–31]. Gated recurrent unit (GRU) [27], recurrent neural network (RNN), long short-term memory (LSTM) [26–31], and LSTM combined with convolutional neural network (LSTM-CNN) [30,31] have been studied using 3-axis accelerometer data for fall detection in the elderly. In most studies using LSTM and GRU, attempts have been
made to adjust the LSTM network, combine various networks, and change the type of
sensors [32–34]. In LSTM-CNN, methods, such as 1-dimensional (1D) or 2-dimensional (2D)
convolution depth and layer addition, were used [34–37]. RNNs suffer from the vanishing gradient problem [38], and LSTM and GRU are variants of the traditional RNN that were proposed to solve this problem. As GRU does not contain many trainable parameters, its accuracy is relatively lower and its computation speed relatively faster than those of LSTM [27]. As the network of LSTM-CNN is relatively complicated, the computation speed can be slower than that of LSTM and GRU [32–35]. Therefore, because LSTM is a simple model to predict time-series fall data in terms of accuracy and computation speed, LSTM-based fall detection systems have been reported [26–31]. Although rapid
detection is required due to the nature of fall, several LSTM fall detection models [26–31]
have relatively slow fall detection. Moreover, several LSTM-based fall detection systems
have not performed any normalization [27–31]. In addition, there are concerns about actual operation without any validation process [26–29,31], and thus there is a possibility that the models overfit the training data. Most LSTM-based fall detection systems applied raw data to LSTM, not feature parameters, and further topics, e.g., z-score normalization following a Gaussian distribution, regularizations to prevent overfitting, and sampling methods of rescaling data for the best fall detection, have not been investigated [26–31].
Power efficiency is very important for wearable devices, which are embedded devices
with limited resources. To increase the power efficiency of wearable devices to which a
fall detection system using deep neural networks is applied, a fall detection algorithm
with good power efficiency is required. A study has been published to increase power efficiency by combining a fall detection method using a simple threshold and HMM, one of the machine learning methods (STM-HMM) [17]. HMM predicts the next event with respect to the previous event; therefore, it may be unsuitable for multiple continuous data such as fall data [24]. In this STM-HMM, replacing HMM with a deep neural network may improve both fall detection accuracy and power efficiency. Accordingly, it is necessary to study a fall detection system that applies LSTM, a deep learning method, only to the data exceeding the threshold of the 3-axis acceleration sensor data or its pre-processed data. Because the training data and the validation data were not tested separately in the fall detection using the existing HMM [17], it was not confirmed whether the model was overfitted to the training data. In a fall detection system using STM-HMM, it is necessary to divide the data into training and validation data to determine overfitting and prevent it. When applying LSTM to a fall detection system, it is also necessary to investigate the optimal conditions to increase the accuracy.
In this study, a fall detection system combining the STM with LSTM (STM-LSTM) is proposed. Several parameters are calculated from the 3-axis accelerometer data, and then the STM is applied. For this fall detection system, the LSTM is applied only when the parameters exceed the thresholds to detect the fall events. Section 2 describes the materials used in this study. Section 3 explains the proposed STM-LSTM for fall detection. In Section 4, the performance of fall detection for the STM-LSTM is compared against that for the STM-HMM, and in Section 5, optimal parameters in the proposed STM-LSTM are investigated. Finally, we present the conclusion of this study.
2. Materials
2.1. Edge Device
A fall-detection system which can classify falls and ADLs of a person by applying a parameterized dataset to STM-HMM has recently been reported [17]. The dataset was acquired from a self-developed embedded edge device that was attached as a pendant to the participant's neck and was used as a sensor node [17]. The device consists of a ±8 g 3-axial accelerometer (BMA150, Bosch) and a Zigbee wireless communication module (CC2430, Texas Instruments). A gateway was used to collect data from multiple wireless sensor nodes. A server was used to classify falls and ADLs by applying the parameters calculated from the 3-axial acceleration data to the proposed fall-detection algorithm. Figure 1 shows a photo of a participant wearing the real device for implementing the proposed fall detection system. A detailed description of this device is given in Ref. [17].

Figure 1. Photo of a person wearing a real edge device as a pendant to participant's neck for implementing the proposed fall detection system.

2.2. Dataset
The experiment was performed by six healthy subjects, consisting of four men and two women aged 20–50, 160–185 cm tall, and 50–85 kg in weight. To distinguish between falls and activities of daily living (ADL), the subjects performed four types of ADLs and three types of falls, as shown in Table 1.

Table 1. Description of four types of ADL and three types of fall.

Activities          Description
ADLs     ADL-a      Walking
         ADL-b      Lying
         ADL-c      Running
         ADL-d      Jumping
Falls    Fall-a     Falling forward
         Fall-b     Falling sideway
         Fall-c     Falling backward

The three types of falls were performed using a mattress with a thickness of 20 cm for the safety of the subjects. A total of 560 data consisting of 320 ADLs and 240 falls was used for testing. Subjects A, B, C, and D (age: 20) performed each activity 15 times, and subjects E (age: 50) and F (age: 40) performed each activity 10 times. Figure 2 shows photos of the 3 types of fall being performed and their parameters, sum vector magnitude (SVM) and angle (θ), calculated from the measured 3-axial acceleration data. It shows a sudden increase in SVM and angle at the moment of falling.

Figure 2. Photos performing 3 types of fall and their measured parameters, sum vector magnitude (SVM) and angle (Theta; θ). (a) Falling forward, (b) Falling sideway, (c) Falling backward.
3. Method
3.1. Algorithm of Fall Detection System
Figure 3a,b show the flow charts of training and test modes of the proposed fall detection system, respectively. In both training and test modes, the data acquired from the edge device are converted into various parameters (see Section 3.2). In the training mode, the converted parameters are normalized, and then all parameters are learned using LSTM. Using the results learned by the LSTM, 3 types of falls and 4 types of ADLs are distinguished. In the test mode, a possible fall is first determined through the STM. In the STM, an event is determined as a fall event when SVM and θ exceed the threshold of SVM (SVMth) of 2.5 g and the threshold of θ (θth) of 46.42°, respectively [17]. Then, only the data first determined as falls are classified into 3 types of falls and 4 types of ADLs using the LSTM learned in training mode.

Figure 3. Flow charts of the proposed fall detection system. (a) Training mode and (b) test mode.
The reason for using the STM in the test mode is to handle power constraints more efficiently in edge device environments with limited resources. Because the power consumption of the edge device, including the Zigbee wireless communication module, is dominated by transmission events, the amount of transmission must be reduced. Because the edge devices are not suitable for running a deep neural network such as LSTM due to their limited computation resources, the converted parameters must be transmitted to the server. If the STM is used, only the data first determined as falls need to be transmitted to the server instead of all data. In general, fall-like situations that exceed the thresholds of SVM and θ have a very low frequency, and thus better computational and power efficiency is expected.
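The test-mode gating described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names and the sample values are hypothetical, the comparison is assumed to be a strict "greater than", and only the two thresholds (2.5 g and 46.42°) come from the text.

```python
SVM_TH_G = 2.5        # SVM threshold in g, as quoted in Section 3.1
THETA_TH_DEG = 46.42  # angle threshold in degrees, as quoted in Section 3.1

def stm_candidate(svm_g, theta_deg):
    """Return True when both SVM and theta exceed their thresholds.

    A strict '>' comparison is an assumption; the text only says 'exceed'.
    """
    return svm_g > SVM_TH_G and theta_deg > THETA_TH_DEG

def select_for_transmission(samples):
    """Keep only the (svm, theta) pairs that pass the STM.

    Hypothetical helper: only these fall candidates would be sent to the
    server for LSTM classification, everything else is dropped on the device.
    """
    return [s for s in samples if stm_candidate(*s)]
```

With this kind of gate, ordinary activities that never cross both thresholds cause no transmission at all, which is the source of the expected power saving.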

3.2. Parameters
The data measured from the edge device are converted into 5 types of parameters, which are θ, SVM, differential SVM (DSVM), gravity-weighted SVM (GSVM), and gravity-weighted DSVM (GDSVM). They are calculated as follows [17]:

$$\theta(i) = \tan^{-1}\!\left(\frac{\sqrt{m_y^2(i) + m_z^2(i)}}{m_x(i)}\right) \times \frac{180}{\pi}, \tag{1}$$

$$SVM(i) = \sqrt{m_x^2(i) + m_y^2(i) + m_z^2(i)}, \tag{2}$$

$$DSVM(i) = \sqrt{\left(m_x(i) - m_x(i-1)\right)^2 + \left(m_y(i) - m_y(i-1)\right)^2 + \left(m_z(i) - m_z(i-1)\right)^2}, \tag{3}$$

$$GSVM(i) = \frac{\theta(i)}{90} \times SVM(i), \tag{4}$$

$$GDSVM(i) = \frac{\theta(i)}{90} \times DSVM(i), \tag{5}$$

where i represents the sampling number, and mx(i), my(i), and mz(i) represent the x-axial, y-axial, and z-axial acceleration of the ith sampling, respectively. For STM, SVM and θ are used. The parameters are finally used as average values obtained with the sliding window method [39], in which the modified nth parameter is calculated as the average of the parameters from n−69 to n. Taking θ as an example:

$$\bar{\theta}(n) = \frac{1}{70} \sum_{i=n-69}^{n} \theta(i). \tag{6}$$

For the input data of LSTM in the fall detection system, single and multiple parameters are used. The single parameters, Pθ, PS, PD, PG, and PGD, represent the average values of 70 θs, 70 SVMs, 70 DSVMs, 70 GSVMs, and 70 GDSVMs, respectively, as expressed in Equation (6). The multiple parameters, PθS, PθD, PθG, PθGD, and PALL, represent the combination of Pθ and PS; Pθ and PD; Pθ and PG; Pθ and PGD; and Pθ, PS, PD, PG, and PGD, respectively, as shown in Table 2.

Table 2. Combination of single parameters for double and multiple parameters.

Parameters                      Combination
Double parameters     PθS       Pθ + PS
                      PθD       Pθ + PD
                      PθG       Pθ + PG
                      PθGD      Pθ + PGD
Multiple parameters   PALL      Pθ + PS + PD + PG + PGD
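The parameter definitions in Equations (1)–(6) can be sketched in a few lines of standard-library Python. This is an illustrative sketch, not the code used in the study: all function names are hypothetical, and `atan2` is used in place of a plain arctangent so that m_x = 0 does not cause a division by zero (for m_x > 0 the result equals Equation (1); for m_x ≤ 0 it is an assumption).

```python
import math

def theta_deg(mx, my, mz):
    """Equation (1): angle in degrees (atan2 variant, see lead-in)."""
    return math.degrees(math.atan2(math.hypot(my, mz), mx))

def svm(mx, my, mz):
    """Equation (2): sum vector magnitude of one acceleration sample."""
    return math.sqrt(mx * mx + my * my + mz * mz)

def dsvm(cur, prev):
    """Equation (3): magnitude of the difference between consecutive samples."""
    return math.sqrt(sum((c - p) ** 2 for c, p in zip(cur, prev)))

def gravity_weighted(theta, magnitude):
    """Equations (4) and (5): weight SVM or DSVM by theta / 90."""
    return theta / 90.0 * magnitude

def sliding_mean(values, window=70):
    """Equation (6): mean of the samples n-window+1 .. n, for each valid n."""
    return [sum(values[n - window + 1:n + 1]) / window
            for n in range(window - 1, len(values))]
```

For example, `sliding_mean(thetas)` with the default window of 70 yields the averaged θ series of Equation (6), and the same helper applies to the SVM, DSVM, GSVM, and GDSVM series.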

3.3. Normalizations
Using the data acquired from the 3-axial accelerometer, the parameterized data are calculated, and a normalization process such as the Min-Max or Z-score normalization [16] is performed to rescale the parameterized data. In the Min-Max normalization, the parameterized data are rescaled to the range [0, 1], which is given by

$$X_{Min\text{-}Max} = \frac{x - x_{min}}{x_{max} - x_{min}}, \tag{7}$$

where x is the original parameterized data, XMin-Max is the Min-Max normalized data, and xmin and xmax are the smallest and largest numbers in each data set, respectively. The Z-score normalization makes the parameterized data have zero mean and unit variance, which is given by

$$X_{Z\text{-}score} = \frac{x - x_{mean}}{x_{std}}, \tag{8}$$

where XZ-score is the Z-score normalized data, and xmean and xstd are the mean and standard deviation in each data set, respectively. Each normalization is processed before training on the LSTM.
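As a small illustration of Equations (7) and (8), the two normalizations can be written as below. This is a sketch under assumptions: the function names are hypothetical, the population standard deviation is assumed for x_std (the text does not say whether the population or sample form is used), and a constant input, which would make the denominator zero, is not handled.

```python
import statistics

def min_max(xs):
    """Equation (7): rescale values to the range [0, 1]."""
    lo, hi = min(xs), max(xs)
    return [(x - lo) / (hi - lo) for x in xs]

def z_score(xs):
    """Equation (8): shift to zero mean and scale to unit variance."""
    mean = statistics.fmean(xs)
    std = statistics.pstdev(xs)  # population standard deviation (assumption)
    return [(x - mean) / std for x in xs]
```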

3.4. Proposed LSTM Network


3.4.1. Overview of LSTM
To overcome the difficulties in training the RNN model due to vanishing gradients [38] and exploding errors, the LSTM, which replaces the nonlinear units of conventional RNNs, was proposed. Figure 4 shows the typical structure of an LSTM cell. An LSTM cell contains a self-connected memory cell and three gates, namely the forget-gate, the input-gate, and the output-gate [40]. The three gate units are the essential components for learning long-term patterns by protecting memory contents from irrelevant inputs and outputs. The input-gate controls the flow of input activations into the cell-memory, the forget-gate controls how much information will flow from the cell-memory, and the output-gate controls the output flow of cell activations into the rest of the network. The cell computes the cell output ht and the updated cell-memory output ct from the previous cell output (ht−1), the previous cell-memory output (ct−1), and the sequential input xt at time step t as follows:

$$f_t = \sigma_g(W_{xf} x_t + W_{hf} h_{t-1} + W_{cf} c_{t-1} + b_f), \tag{9}$$

$$i_t = \sigma_g(W_{xi} x_t + W_{hi} h_{t-1} + W_{ci} c_{t-1} + b_i), \tag{10}$$

$$o_t = \sigma_g(W_{xo} x_t + W_{ho} h_{t-1} + W_{co} c_{t-1} + b_o), \tag{11}$$

$$c_t = f_t \odot c_{t-1} + i_t \odot \sigma_h(W_{xc} x_t + W_{hc} h_{t-1} + b_c), \tag{12}$$

$$h_t = o_t \odot \sigma_h(c_t), \tag{13}$$

where Wxf, Whf, Wxi, Whi, Wxo, and Who are the weight matrices from the input to the forget-gate, from the previous output to the forget-gate, from the input to the input-gate, from the previous output to the input-gate, from the input to the output-gate, and from the previous output to the output-gate, respectively; Wcf, Wci, and Wco are the diagonal weight matrices for peephole connections; and bf, bi, and bo are the bias vectors of the forget-gate, the input-gate, and the output-gate, respectively. Likewise, Wxc, Whc, and bc in Equation (12) are the weight matrices and bias vector of the cell-memory update. ⊙ denotes the Hadamard product, and σh and σg denote the hyperbolic tangent and sigmoid functions, respectively. it, ft, and ot are the input-gate, forget-gate, and output-gate, respectively.

Figure 4. The typical structure of long-short term memory (LSTM). σh and σg denote the hyperbolic tangent and sigmoid functions, respectively.
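To make the data flow of Equations (9)–(13) concrete, the sketch below evaluates one time step for a single hidden unit, so that every weight is a scalar and the Hadamard product reduces to ordinary multiplication. It is an illustrative toy, not the network used in this study: the function names and the dictionary layout of the weights are hypothetical, and the output gate reads c(t−1) exactly as Equation (11) is written.

```python
import math

def sigmoid(z):
    """Gate activation sigma_g."""
    return 1.0 / (1.0 + math.exp(-z))

def lstm_step(x_t, h_prev, c_prev, w):
    """One step of Equations (9)-(13) for one hidden unit.

    `w` maps hypothetical keys such as "xf" (for W_xf) or "bf" (for b_f)
    to scalar weights and biases.
    """
    f_t = sigmoid(w["xf"] * x_t + w["hf"] * h_prev + w["cf"] * c_prev + w["bf"])
    i_t = sigmoid(w["xi"] * x_t + w["hi"] * h_prev + w["ci"] * c_prev + w["bi"])
    o_t = sigmoid(w["xo"] * x_t + w["ho"] * h_prev + w["co"] * c_prev + w["bo"])
    # Cell-memory update: keep part of the old memory, add gated new content.
    c_t = f_t * c_prev + i_t * math.tanh(w["xc"] * x_t + w["hc"] * h_prev + w["bc"])
    h_t = o_t * math.tanh(c_t)
    return h_t, c_t
```

With all weights set to zero, every gate evaluates to 0.5, so the cell simply halves its previous memory at each step, which is a convenient sanity check.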

3.4.2. LSTM Network
Figure 5 shows the proposed LSTM network for fall detection. It consists of one input layer, two hidden layers, and a dense layer. The input size of the input layer represents batch × length × dimension of data. In the input layer, batch means the number of all input data including falls and ADLs. Because the 3-axial accelerometer acquired accelerations at 100 Hz for 5 s, the length of data is 500 by default, and the length can be shortened to 500/sn when sampling with sampling interval sn. In addition, the data dimension is 1, 2, and 5 for single, double, and multiple parameters, respectively. One hidden layer consists of n LSTMs, and the number of hidden nodes (n) in each hidden layer can be changed for performance optimization. The dense layer includes the softmax and optimization, and the output is displayed as 7 result values in the dense layer. The result value is represented using the one-hot encoding method.

Figure 5. Fall detection model architecture for LSTM.

3.4.3. Regularization in Loss Function
There are several types of loss functions: mean squared error (MSE), root mean squared error (RMSE), binary cross entropy (BCE), and categorical cross entropy (CCE). Among them, the CCE with softmax, which is commonly used as a loss function for multi-class classification [41], is selected. One requirement for using CCE is that the labels of the output should follow the one-hot encoding method. The CCE is expressed as [42]

$$L_{CCE} = -\frac{1}{N} \sum_{i=1}^{N} \sum_{j=1}^{C} p_{ij} \log q_{ij}, \tag{14}$$

where N is the size of the data set, C is the number of classes, pij is the true probability that the ith training pattern belongs to the jth category, and qij is the predicted probability that the ith observation belongs to class j. L2 regularization [43] is added to the CCE loss function to address overfitting issues. The regularized cost function, J(W), is expressed as [44,45]

$$J(W) = L_{CCE} + \frac{\lambda}{2} \lVert W \rVert^2, \tag{15}$$

where W is the connection weight between all layers, λ is the regularization rate, and ‖W‖² represents the squared L2 norm. Parameter λ determines a trade-off between the training error and the generalization ability [44].
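Equations (14) and (15) can be evaluated directly for small inputs, as in the sketch below. It is a plain-Python illustration rather than the Keras loss used for training: the function names are hypothetical, the weights are passed as one flat list, and the small constant `eps` that guards against log(0) is an added assumption.

```python
import math

def cce(p, q, eps=1e-12):
    """Equation (14): p holds one-hot targets, q predicted probabilities (N x C)."""
    n = len(p)
    total = sum(p_ij * math.log(max(q_ij, eps))
                for p_i, q_i in zip(p, q)
                for p_ij, q_ij in zip(p_i, q_i))
    return -total / n

def regularized_cost(p, q, weights, lam):
    """Equation (15): CCE plus (lambda / 2) times the squared L2 norm of the weights."""
    return cce(p, q) + 0.5 * lam * sum(w * w for w in weights)
```

Increasing `lam` raises the cost of large weights, which is the mechanism by which the regularization rate trades training error against generalization.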

3.5. Classification
Accuracy (ACC), sensitivity (SEN), and specificity (SPE) are used as common evaluation indicators for classification [46] as follows:

$$ACC = \frac{TP + TN}{TP + TN + FP + FN} \times 100\%, \tag{16}$$

$$SEN = \frac{TP}{TP + FN} \times 100\%, \tag{17}$$

$$SPE = \frac{TN}{TN + FP} \times 100\%, \tag{18}$$

where true positive (TP), false negative (FN), true negative (TN), and false positive (FP) are the numbers of correctly classified falls, falls classified as ADLs, correctly classified ADLs, and ADLs classified as falls, respectively. Accuracy, sensitivity, and specificity refer to the probabilities of correctly predicting all ADLs and falls, falls among all falls, and ADLs among all ADLs, respectively.
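A direct transcription of Equations (16)–(18) is shown below as a worked example. The function name and the counts in the usage note are hypothetical and are not results from this study.

```python
def fall_metrics(tp, fn, tn, fp):
    """Return (ACC, SEN, SPE) in percent, following Equations (16)-(18)."""
    acc = 100.0 * (tp + tn) / (tp + tn + fp + fn)
    sen = 100.0 * tp / (tp + fn)
    spe = 100.0 * tn / (tn + fp)
    return acc, sen, spe
```

For example, with 8 detected falls, 2 missed falls, 9 correctly recognized ADLs, and 1 ADL mistaken for a fall, `fall_metrics(8, 2, 9, 1)` returns an accuracy of 85%, a sensitivity of 80%, and a specificity of 90%.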

3.6. Experiment Environment


The first platform for training the LSTM is a Jupyter Notebook in Anaconda on Windows 10, with Keras 2.8.0 and TensorFlow 2.8.0 as the software environment and an NVIDIA RTX3080 GPU as the hardware. As a second platform for training the LSTM, a Jupyter Notebook on CentOS Linux 7 was used for software, and an NVIDIA Quadro RTX6000 GPU was used for hardware.

4. Comparison of HMM and LSTM


4.1. Setting of LSTM
An STM-HMM-based fall detection system has been reported to perform well at ACC of 99.5% [17]. The proposed STM-LSTM-based fall detection system shown in Figure 3 is almost the same as the STM-HMM-based fall detection system except that LSTM is used instead of HMM for training and testing all data. All 560 data are trained with both HMM and LSTM, and then all of them are tested without separating any validation data. Therefore, they may be overfitted. In this section, the validity of the proposed STM-LSTM-based fall detection system is investigated through a comparison with the STM-HMM-based fall detection system, although both systems may be overfitted. In Section 5, the proposed STM-LSTM-based fall detection system is evaluated by separating training data and test data at a ratio of 4:1, and different normalizations, regularization rates, numbers of hidden nodes, and sampling numbers in the LSTM are compared to obtain the optimal conditions for fall detection.
In test mode of the proposed STM-LSTM-based fall detection system, an event is first determined as a fall event when SVM and θ exceed SVMth of 2.5 g and θth of 46.42°, respectively, as shown in Figure 3b. Then, the data first determined as falls are classified into 3 types of falls and 4 types of ADLs using the LSTM model learned in training mode. Table 3 shows the parameters for the LSTM training model of the proposed STM-LSTM-based fall detection system. No normalization is applied, and in the input layer, the batch and length are 560 and 500, respectively, and the dimension of data is 1 (for single parameters), 2 (for double parameters), or 5 (for the multiple parameter). The numbers of hidden layer and output layer nodes are 13 and 7, respectively, and the learning rate and regularization rate are 0.0025 and 0.00015, respectively.

Table 3. LSTM training model parameters and methods.

Parameters                             Values
Learning rate                          0.0025
Input layer nodes        Batch         560
                         Length        500
                         Dimension     1, 2, 5
Number of hidden layer nodes           13
Regularization rate                    0.00015
Number of output layer nodes           7

4.2. Comparison of STM-HMM and STM-LSTM-Based Fall Detection System


Figure 6 shows the confusion matrix applying five types of single parameter to the
proposed STM-LSTM-based fall detection system. The reason why there is no walking
pattern in the confusion matrix except for the single parameter PS is that the LSTM inference
is not applied if the threshold value is not reached, as shown in the algorithm in Figure 3b.
Applying the single parameter PG achieves the best accuracy, sensitivity, and specificity of
100%. In Table 4 and Figure 7, applying the single parameters, the training data performance
of the proposed STM-LSTM-based fall detection system is compared with that of the
previously reported STM-HMM-based fall detection systems. In Figure 7, the solid and
dashed lines denote training data performances of the STM-LSTM and STM-HMM-based
fall detection systems, respectively, and the squares, circles, and triangles denote accuracy,
sensitivity, and specificity, respectively. This shows the training data performance calculated
using the network parameters obtained from the training data (i.e., all 560 data). The best
performance of the STM-HMM-based fall detection system is when applying the single
parameter Pθ , while that of the STM-LSTM-based fall detection system is when applying the
single parameter PG . For all single parameters except for the case of the single parameter Pθ ,
the performance of the STM-LSTM-based fall detection system is relatively better than that
of the STM-HMM-based fall detection system, and the best performance of fall detection is
achieved in the STM-LSTM-based fall detection system.
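
The three reported metrics can be computed from predictions as in the sketch below. It assumes that the seven classes are collapsed into a binary fall/ADL decision before scoring, which the text does not state explicitly; the class labels are hypothetical placeholders, and both falls and ADLs are assumed to be present in the data.

```python
# Hypothetical labels for the three fall types; every other label counts as an ADL.
FALL_CLASSES = {"fall_forward", "fall_backward", "fall_sideways"}

def binary_metrics(true_labels, predicted_labels):
    """Return (accuracy, sensitivity, specificity) in percent for fall vs. ADL."""
    tp = tn = fp = fn = 0
    for truth, pred in zip(true_labels, predicted_labels):
        truth_fall = truth in FALL_CLASSES
        pred_fall = pred in FALL_CLASSES
        if truth_fall and pred_fall:
            tp += 1
        elif truth_fall:
            fn += 1
        elif pred_fall:
            fp += 1
        else:
            tn += 1
    accuracy = 100.0 * (tp + tn) / (tp + tn + fp + fn)
    sensitivity = 100.0 * tp / (tp + fn)  # share of falls that were detected
    specificity = 100.0 * tn / (tn + fp)  # share of ADLs that were not flagged
    return accuracy, sensitivity, specificity
```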
Figure 8 shows the training data performance of the proposed STM-LSTM-based fall
detection system, applying the multiple parameters. Applying the multiple parameter PALL
achieves the best accuracy, sensitivity, and specificity of 100%. The squares, circles, and
triangles denote accuracy, sensitivity, and specificity, respectively. The average performance
with the multiple parameters shows relatively higher accuracy than that with the single
parameters. All 560 data are used to train both the HMM and the LSTM with the algorithm in Figure 3a, and all of them are then tested with the algorithm in Figure 3b without holding out any validation data. Therefore, the models may be overfitted to the extent that an accuracy of 100% is obtained for the single parameter PG and the multiple parameter PALL. To address this possible overfitting, training and validation data should be separated in a 4:1 ratio and evaluated separately.
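
A 4:1 split of the 560 recordings can be sketched as below. This is an illustrative, non-stratified random split with a hypothetical function name; the paper does not say how the validation samples were drawn.

```python
import random

def split_indices(n_samples, train_parts=4, val_parts=1, seed=0):
    """Shuffle sample indices and split them train:validation = train_parts:val_parts."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    n_val = n_samples * val_parts // (train_parts + val_parts)
    return indices[n_val:], indices[:n_val]

train_idx, val_idx = split_indices(560)
print(len(train_idx), len(val_idx))
```

With 560 samples this yields 448 training and 112 validation samples, so a single misclassified validation sample changes the validation accuracy by roughly 0.89 percentage points.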

Figure 6. Confusion matrix applying 5-types of single parameter to the proposed LSTM-based fall detection system. (a) Pθ, (b) PS, (c) PD, (d) PG, (e) PGD.

Table 4. Fall detection results of the proposed STM-LSTM-based and the previously reported STM-HMM-based fall detection systems, applying the single parameters.

             Accuracy [%]      Sensitivity [%]    Specificity [%]
Parameter    LSTM     HMM      LSTM     HMM       LSTM     HMM
Pθ           98.57    99.5     99.06    99.17     97.91    99.96
PS           98.21    96.43    99.31    97.5      96.66    95.63
PD           99.46    98.57    99.68    99.6      99.16    97.81
PG           100      97.86    100      99.17     100      96.88
PGD          98.92    98.21    99.37    99.17     98.33    97.5

Figure 7. Training data performances of the STM-LSTM and STM-HMM-based fall detection systems, applying the single parameters.

Figure 8. Training data performance of the STM-LSTM-based fall detection system, applying the multiple parameters.

5. Optimization of LSTM for Fall Detection System
The previously published STM-HMM-based fall detection system [17] was not verified against untrained data. It should be verified that the data obtained from the experiment can be used for actual fall detection. Therefore, it is necessary to use a part of the training data as data for verification. Training data and validation data follow a ratio of 4 to 1, and validation data are not included in training. The validation data performance, calculated from the validation data using the network parameters obtained from the training data, is investigated. Parameter values are as shown in Table 3, and the single parameter Pθ, no additional sampling, and no normalization are set by default. Since Pθ showed the best performance as a single parameter in the STM-HMM-based fall detection system [17], the optimal values of the parameters used in LSTM and the normalization method are investigated for Pθ. The investigating methods and parameter ranges are shown in Table 5.

Table 5. Model configuration.

Model Configuration                   Parameter Ranges/Methods
Normalization                         No-normalization, Min-Max, Z-score
Regularization rate, λ                0.00015 to 0.65
Sampling interval, sn                 1, 3, 5, 7, 9
Multiple data dimension               Single parameter Pθ
Number of hidden layer nodes, n       6, 13, 32, 64, 128, 256

5.1. Normalization
In this section, the impact of different normalization processes applied to the calculated parameters, including No-normalization, Min-Max normalization, and Z-score normalization, is examined. Figure 9 shows the validation data performance of the STM-LSTM-based fall detection system when applying No-normalization, Min-Max normalization, and Z-score normalization. Accuracies applying No-normalization, Min-Max normalization, and Z-score normalization are 95.93%, 92.85%, and 86.6%, respectively. The best performance is shown when no normalization is applied, and the best accuracy, sensitivity, and specificity are 95.93%, 93.75%, and 97.91%, respectively. It showed the worst accuracy when the Min-Max normalization is applied. To investigate the reason why accuracies of the Min-Max or Z-score normalizations are lower than that of No-normalization, time series data of a running pattern and a forward fall pattern before and after the Min-Max or Z-score normalizations are investigated, as shown in Figure 10. In Figure 10, black and red lines denote running and forward fall patterns, respectively. After the Min-Max or Z-score normalizations, the running and forward fall patterns are so similar to each other that it is difficult to distinguish them, and thus the accuracies with the Min-Max or Z-score normalizations can be lower than that of No-normalization.
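
The effect described above can be reproduced with a small sketch. It assumes that normalization is applied per sequence, which the paper does not state, and it uses two hypothetical angle traces that share a shape but differ in amplitude.

```python
import statistics

def min_max(series):
    """Rescale a sequence to the range [0, 1]."""
    lo, hi = min(series), max(series)
    return [(x - lo) / (hi - lo) for x in series]

def z_score(series):
    """Shift a sequence to zero mean and scale it to unit standard deviation."""
    mean = statistics.fmean(series)
    std = statistics.pstdev(series)
    return [(x - mean) / std for x in series]

# Hypothetical traces in degrees: same shape, but the second swings four times as far.
small = [10.0, 12.0, 20.0, 12.0, 10.0]
large = [20.0, 28.0, 60.0, 28.0, 20.0]

print(min_max(small), min_max(large))
print(z_score(small), z_score(large))
```

Both normalizations remove offset and scale, so the two traces map to the same normalized curve, and the amplitude difference that separates them in the raw data is no longer available to the classifier.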

Figure 9. Validation data performances of the proposed STM-LSTM-based fall detection system, applying normalization methods.

Figure 10. Comparison of data of running and forward fall before and after normalization. (a) Min-Max normalization, (b) Z-score normalization, (c) No-normalization.

5.2. Sampling and Regularization Rate


In this section, the impact of different sampling intervals of the input parameter and different regularization rates in the STM-LSTM-based fall detection system is examined. Figure 11 shows validation data accuracy vs. regularization rate in the STM-LSTM-based fall detection system with respect to the different sampling intervals of the input parameter. The regularization rates are 0.00015, 0.00065, 0.0015, 0.0065, 0.015, 0.065, and 0.15. The black squares, red circles, blue up-triangles, green down-triangles, and purple diamonds denote sampling intervals of 1, 3, 5, 7, and 9, respectively. Peak values of accuracy for sampling intervals 1, 3, 5, 7, and 9 are 96.42% at λ = 0.00065, 96.42% at λ = 0.0015, 95.53% at λ = 0.0065, 94.64% at λ = 0.0065, and 94.64% at λ = 0.0065, respectively. As the sampling interval increases to 5, the regularization rate at the peak of accuracy increases, but for intervals of 5 or more it stays the same. When the sampling interval increases, the peak values of accuracy decrease. As the sampling interval increases, the length of the input data decreases, so the noise can be slightly reduced and the complexity of the input data can be reduced. At the same time, overfitting can occur due to the small number of samples [47]. An excessive increase of the sampling interval depends more on the simplification of data than on noise reduction, resulting in a decrease in accuracy rather than an increase in overfitting.
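
How the sampling interval shortens the input can be sketched as follows. The sketch assumes that a sampling interval sn means keeping every sn-th sample of the 500-sample window, starting with the first; the paper does not spell out the exact scheme, and the function name is hypothetical.

```python
def downsample(series, interval):
    """Keep every `interval`-th sample, starting from the first one."""
    return series[::interval]

window = list(range(500))  # stand-in for one 500-sample input sequence
lengths = {sn: len(downsample(window, sn)) for sn in (1, 3, 5, 7, 9)}
print(lengths)
```

Under this assumption the sequence length drops from 500 to 167, 100, 72, and 56 samples for intervals of 3, 5, 7, and 9, respectively.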

Figure 11. Validation data accuracy vs. regularization rate of the STM-LSTM-based fall detection system with respect to different sampling intervals.

5.3. Hidden Layer Node and Regularization Rate
In this section, the impact of different hidden layer nodes and different regularization rates in the STM-LSTM-based fall detection system is examined. Figure 12 shows the validation data accuracy vs. regularization rate of the STM-LSTM-based fall detection system with respect to the different numbers of hidden layer nodes. The regularization rates are 0.00065, 0.0015, 0.0065, 0.015, 0.065, 0.15, and 0.65. The black squares, red circles, blue up-triangles, green down-triangles, purple diamonds, and gold left-triangles denote different numbers of the hidden layer nodes of 6, 13, 32, 64, 128, and 256, respectively. Peak values of accuracy of the numbers of the hidden layer nodes, 6, 13, 32, 64, 128, and 256 are 94.64% at λ = 0.00015, 96.42% at λ = 0.00065, 97.32% at λ = 0.0015, 97.32% at λ = 0.0065, 98.21% at λ = 0.015, and 98.21% at λ = 0.065, respectively. As the number of the hidden layer nodes increases, the regularization rate at the peak of accuracy increases and the peak values of accuracy increase. This means that as the number of hidden layer nodes increases, both accuracy and overfitting can increase due to the increase of network complexity [26,47], and thus the overfitting can be suppressed by increasing the regularization rate [48].
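
The role of the regularization rate can be illustrated with a weight penalty added to the training loss. The sketch below assumes an L2 penalty of the form λ·Σw², which is a common choice; the paper does not state which form of regularization was used, and the function names are hypothetical.

```python
def l2_penalty(weights, lam):
    """Regularization term: lam times the sum of squared weights."""
    return lam * sum(w * w for w in weights)

def regularized_loss(data_loss, weights, lam):
    """Total training loss = data loss (e.g. cross-entropy) + L2 penalty."""
    return data_loss + l2_penalty(weights, lam)

weights = [0.5, -1.0, 2.0]  # hypothetical weight values
for lam in (0.00015, 0.015, 0.65):
    print(lam, regularized_loss(0.4, weights, lam))
```

A larger λ makes large weights more costly, which restrains a network with many hidden nodes from fitting the training data too closely.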

Figure 12. Validation data accuracy vs. regularization rate of the STM-LSTM-based fall detection system with respect to different number of hidden layer nodes.

Meanwhile, as the network complexity increases, the training time also increases, and thus it is necessary to efficiently select the number of hidden layer nodes suitable for both higher accuracy and shorter training time. Figure 13 shows the highest validation data accuracy and computation time of the STM-LSTM-based fall detection system with respect to different number of hidden layer nodes. As the number of hidden layer nodes increases, the highest validation data accuracy increases, and computation time increases rapidly at 256 nodes. It showed the same best accuracy in the number of hidden layer nodes of 128 and 256. Accordingly, it can be determined that the number of hidden layer nodes of 128 is most appropriate in terms of accuracy and time.

Figure 13. Highest validation data accuracy and computation time of the STM-LSTM-based fall detection system with respect to different number of hidden layer nodes.

5.4. Summary
In Section 4, the training data accuracy of the STM-LSTM-based fall detection system showed 100% for the single parameter PG and the multiple parameter PALL, and thus it may be overfitted in practice. To resolve the overfitting issue and find optimal values of the parameters used in LSTM, the validation data performance calculated from the validation data, using the network parameters obtained from the training data, was investigated with respect to normalization method, sampling, regularization rate, and hidden layer node, as follows:
1. As some human activity patterns after the Min-Max or the Z-score normalizations are
so similar to each other and are difficult to distinguish, the accuracy of distinguishing
falls tends to decrease, and this leads to the conclusion that a normalization is not
suitable for this fall detection.
2. When the sampling interval increases, the peak values of accuracy decrease due to overfitting caused by the reduced input data. Therefore, larger sampling intervals require a higher regularization rate to reduce overfitting.
3. As the number of hidden layer nodes increases, both accuracy and overfitting can increase due to the increase of network complexity. Therefore, more hidden layer nodes raise the accuracy, and a correspondingly higher regularization rate suppresses the overfitting.
4. In terms of higher accuracy and shorter computation time, the optimal LSTM parameters and normalization method are as follows: no normalization and no sampling, with 128 hidden layer nodes and a regularization rate of 0.015. This configuration gives the best accuracy of 98.21% with a relatively short computation time.

6. Conclusions
In this paper, the STM-LSTM-based fall detection system that combines the simple threshold method and the LSTM was proposed. The proposed system was based on the single and multiple parameters calculated from the three-axial acceleration data. In training mode, the parameters were normalized and then learned using LSTM. In test mode, a possible fall was first determined when both PS and Pθ exceeded the thresholds of 2.5 g and 46.42°, respectively. The possible fall data were then classified using the LSTM learned in training mode. To examine the validity of the proposed STM-LSTM fall detection system, it was compared with the previously reported STM-HMM-based fall detection system. The best training accuracy of the STM-LSTM-based fall detection system is 100% for PG, 0.5% higher than that of the STM-HMM-based system. However, since the accuracy in this comparison is obtained from training data only, without validation data, there is a risk of overfitting. To solve the overfitting problem and find optimal values of the parameters used in LSTM, the validation data performance, calculated from the validation data using the network parameters obtained from the training data, was investigated with respect to normalization method, sampling, regularization rate, and hidden layer node. For normalization and sampling, the best performance was obtained with no normalization and no sampling. The computation time with 128 hidden layer nodes was significantly shorter than that with 256 nodes, although both reached the same best accuracy of 98.21%. As the number of hidden layer nodes or the sampling interval increases, the regularization rate at the highest accuracy increases. This can be interpreted as suppressing overfitting by increasing the regularization rate. Therefore, the fall detection efficiency can be improved by selecting an appropriate number of hidden layer nodes and regularization rate.
However, this paper still leaves several directions for improvement. The proposed STM-LSTM fall detection system was compared with the previously reported STM-HMM fall detection system using the data set of that system, which has the following limitations: First, its reliability needs to be verified through comparison between the data set acquired from the self-developed embedded edge device and that from a commercial device. Second, it is very simple because it consists only of four types of ADL and three types of fall gestures. Third, as the subject population consists of four men and two women, it has a gender and age imbalance. To solve this dataset problem, it is necessary to apply the proposed STM-LSTM fall detection system to public datasets verified for fall detection. When public datasets are applied to the proposed system in the future, they can be pre-processed into single, double, and multiple parameters, and optimal network parameters can be readily found using the optimal network parameters and optimization method introduced in this paper.

Author Contributions: Conceptualization, S.S.J., N.H.K., and Y.S.Y.; methodology, S.S.J., N.H.K., and
Y.S.Y.; investigation, S.S.J. and Y.S.Y.; data curation, S.S.J.; writing—original draft preparation, S.S.J.
and Y.S.Y.; writing—review and editing, S.S.J. and Y.S.Y.; supervision, Y.S.Y.; project administration,
Y.S.Y.; funding acquisition, Y.S.Y. All authors have read and agreed to the published version of
the manuscript.
Funding: This research was supported by the Basic Science Research Program through NRF of Korea
funded by the Ministry of Education (NRF-2019R1F1A1060383).
Institutional Review Board Statement: Not applicable.
Informed Consent Statement: Not applicable.
Data Availability Statement: Not applicable.
Conflicts of Interest: The authors declare no conflict of interest.

References
1. Alshammari, S.A.; Alhassan, A.M.; Aldawsari, M.A.; Bazuhair, F.O.; Alotaibi, F.K.; Aldakhil, A.A.; Abdulfattah, F.W. Falls among
elderly and its relation with their health problems and surrounding environmental factors in Riyadh. J. Fam. Community Med.
2018, 25, 29–34. [CrossRef]
2. World Health Organization. Available online: https://www.who.int/zh/news-room/fact-sheets/detail/falls (accessed on 9 March 2020).
3. National Health Administration, Ministry of Health and Welfare. Available online: https://www.hpa.gov.tw/Pages/Detail.aspx?nodeid=807&pid=4326 (accessed on 9 March 2020).
4. Igual, R.; Medrano, C.; Plaza, I. Challenges, issues and trends in fall detection systems. BioMed. Eng. Online 2013, 12, 66.
[CrossRef]
5. Wang, X.; Ellul, J.; Azzopardi, G. Elderly fall detection systems: A literature survey. Frontiers 2020, 7, 71. [CrossRef]
6. Ramachandran, A.; Karuppiah, A. A survey on recent advances in wearable fall detection systems. BioMed Res. Int. 2020,
2020, 2167160. [CrossRef]
7. Ren, L.; Peng, Y. Research of fall detection and fall prevention technologies: A systematic review. IEEE Access 2019, 7, 77702–77722.
[CrossRef]
8. Taylor, R.M.; Marc, E.C.; Vangeli, S.M.; Anne, H.H.N.; Coralys, C.R. SmartFall: A smartwatch-based fall detection system using
deep learning. Sensors 2018, 18, 10. [CrossRef]
9. Vilarinho, T.; Farshchian, B.; Bajer, D.G.; Dahl, O.H.; Egge, I.; Hegdal, S.S.; Lones, A.; Slettevold, J.N. A combined smartphone
and smartwatch fall detection system. In Proceedings of the 2015 IEEE International Conference on Computer and Information
Technology, Ubiquitous Computing and Communications, Dependable, Autonomic and Secure Computing, Pervasive Intelligence
and Computing, Liverpool, UK, 26–28 October 2015; pp. 1443–1448. [CrossRef]
10. Casilari, E.; Oviedo-Jiménez, M.A. Automatic fall detection system based on the combined use of a smartphone and a smartwatch.
PLoS ONE 2015, 10, 11. [CrossRef]
11. Habib, M.A.; Mohktar, M.S.; Kamaruzzaman, S.B.; Lim, K.S.; Pin, T.M.; Ibrahim, F. Smartphone-based solutions for fall detection
and prevention: Challenges and open issues. Sensors 2014, 14, 4. [CrossRef]
12. Rakhman, A.Z.; Nugroho, L.E. Fall detection system using accelerometer and gyroscope based on smartphone. In Proceedings
of the 1st International Conference on Information Technology, Computer, and Electrical Engineering, Semarang, Indonesia,
8 November 2014; pp. 99–104. [CrossRef]
13. Yavu, G.; Kocak, M.; Ergun, G.; Alemdar, H.O.; Yalcin, H.; Incel, O.D.; Ersoy, C. A smartphone based fall detector with online
location support. In Proceedings of the International Workshop on Sensing for App Phones (ACM), Zurich, Switzerland,
2 November 2010; pp. 31–35.
14. He, Y.; Li, Y.; Bao, S.D. Fall detection by built-in tri-accelerometer of smartphone. In Proceedings of the 2012 IEEE-EMBS
International Conference on Biomedical and Health Informatics, Hong Kong, China, 5–7 January 2012; pp. 184–187. [CrossRef]
15. Stefano, A.; Marco, A.; Francesco, B.; Guglielmo, C.; Paolo, C.; Alessio, V. A smartphone-based fall detection system. Pervasive
Mob. Comput. 2012, 8, 6. [CrossRef]
16. Yi, Y.J.; Yu, Y.S. Emergency-monitoring system based on newly-developed fall detection algorithm. J. Inf. Commun. Converg. Eng.
2013, 11, 3. [CrossRef]
17. Lim, D.H.; Park, C.H.; Kim, N.H. Fall-detection algorithm using 3-axis acceleration: Combination with simple threshold and
hidden Markov model. J. Appl. Math. 2014, 2014, 896030. [CrossRef]
18. Jiang, M.; Chen, Y.; Zhao, Y.; Cai, A. A real-time fall detection system based on HMM and RVM. In Proceedings of the 2013 Visual
Communications and Image Processing (VCIP), Kuching, Malaysia, 17–20 November 2013; pp. 1–6. [CrossRef]
19. Thome, N.; Miguet, S.; Ambellouis, S. A real-time, multiview fall detection system: A LHMM-based approach. IEEE Trans.
Circuits Syst. Video Technol. 2008, 18, 1522–1532. [CrossRef]
20. Mistikoglu, G.; Gerek, I.H.; Erdis, E.; Usmen, P.M.; Cakan, H.; Kazan, E.E. Decision tree analysis of construction fall accidents
involving roofers. Expert Syst. Appl. 2015, 42, 2256–2263. [CrossRef]
21. Liu, C.L.; Lee, C.H.; Lin, P.M. A fall detection system using k-nearest neighbor classifier. Expert Syst. Appl. 2010, 37, 7174–7181.
[CrossRef]
22. Liu, L.; Popescu, M.; Skubic, M.; Rantz, M.; Yardibi, T.; Cuddihy, P. Automatic fall detection based on Doppler radar motion
signature. In Proceedings of the 2011 5th International Conference on Pervasive Computing Technologies for Healthcare
(PervasiveHealth), Dublin, Ireland, 23–26 May 2011; pp. 222–225. [CrossRef]
23. Zhang, T.; Wang, J.; Xu, L.; Liu, P. Fall Detection by Wearable Sensor and One-Class SVM Algorithm. In Intelligent Computing in
Signal Processing and Pattern Recognition; Springer: Berlin/Heidelberg, Germany, 2006; pp. 858–863. [CrossRef]
24. Ghahramani, Z. An introduction to hidden Markov models and Bayesian networks. Int. J. Patt. Recogn. Artif. Intell. 2001, 15, 9–42.
[CrossRef]
25. Anguita, D.; Ghio, A.; Greco, N.; Oneto, L.; Ridella, S. Model selection for support vector machines: Advantages and disadvantages
of the machine learning theory. In Proceedings of the 2010 International Joint Conference on Neural Networks (IJCNN), Barcelona,
Spain, 18–23 July 2010; pp. 1–8. [CrossRef]
26. Lin, C.B.; Dong, Z.; Kuan, W.K.; Huang, Y.F. A framework for fall detection based on OpenPose skeleton and LSTM/GRU models.
Appl. Sci. 2020, 11, 1. [CrossRef]
27. Chen, W.; Jiang, Z.; Guo, H.; Ni, X. Fall detection based on key points of human-skeleton using OpenPose. Symmetry 2020, 12, 744.
[CrossRef]
28. Ajerla, D.; Mahfuz, S.; Zulkernine, F. A real-time patient monitoring framework for fall detection. Wirel. Commun. Mob. Comput.
2019, 2019, 9507938. [CrossRef]
29. Queralta, J.P.; Gia, T.N.; Tenhunen, H.; Westerlund, T. Edge-AI in LoRa-based health monitoring: Fall detection system with fog
computing and LSTM recurrent neural networks. In Proceedings of the 42nd International Conference on Telecommunications
and Signal Processing (TSP), IEEE, Budapest, Hungary, 1–3 July 2019; pp. 601–604.
30. Lu, N.; Wu, Y.; Feng, L.; Song, J. Deep learning for fall detection: Three-dimensional CNN combined with LSTM on video
kinematic data. IEEE J. Biomed. Health Inform. 2018, 23, 314–323. [CrossRef]
31. Santos, G.L.; Endo, P.T.; Monteiro, K.H.D.C.; Rocha, E.D.S.; Silva, I.; Lynn, T. Accelerometer-based human fall detection using
convolutional neural networks. Sensors 2019, 19, 7. [CrossRef]
32. Kwolek, B.; Kepski, M. Improving fall detection by the use of depth sensor and accelerometer. Neurocomputing 2015, 168, 637–645.
[CrossRef]
33. Gasparrini, S.; Cippitelli, E.; Spinsante, S.; Gambi, E. A depth-based fall detection system using a Kinect® sensor. Sensors 2014, 14,
2756–2775. [CrossRef] [PubMed]
34. Maitre, J.; Bouchard, K.; Gaboury, S. Fall detection with UWB radars and CNN-LSTM architecture. IEEE J. Biomed. Health Inform.
2020, 25, 1273–1283. [CrossRef] [PubMed]
35. Galvão, Y.M.; Ferreira, J.; Albuquerque, V.A.; Barros, P.; Fernandes, B.J. A multimodal approach using deep learning for fall
detection. Expert Syst. Appl. 2021, 168, 114226. [CrossRef]
36. Adhikari, K.; Bouchachia, H.; Nait-Charif, H. Activity recognition for indoor fall detection using convolutional neural network.
In Proceedings of the 2017 Fifteenth IAPR International Conference on Machine Vision Applications (MVA), Nagoya, Japan,
8–12 May 2017; pp. 81–84. [CrossRef]
37. Casilari, E.; Lora-Rivera, R.; García-Lagos, F. A study on the application of convolutional neural networks to fall detection
evaluated with multiple public datasets. Sensors 2020, 20, 1466. [CrossRef] [PubMed]
38. Hochreiter, S. The vanishing gradient problem during learning recurrent neural nets and problem solutions. Int. J. Uncertain.
Fuzziness Knowl. Based Syst. 1998, 6, 107–116. [CrossRef]
39. Kim, N.H.; Yu, Y.S. Fall recognition algorithm using gravity-weighted 3-axis accelerometer data. J. Inst. Electron. Eng. Korea 2012,
50, 1570–1575. [CrossRef]
40. Hochreiter, S.; Schmidhuber, J. Long short-term memory. Neural Comput. 1997, 9, 1735–1780. [CrossRef]
41. Zhang, Z.; Sabuncu, M. Generalized cross entropy loss for training deep neural networks with noisy labels. In Proceedings
of the 32nd Conference on Neural Information Processing Systems (NeurIPS 2018), Montréal, Canada, 3–8 December 2018;
pp. 8792–8802. [CrossRef]
42. Rusiecki, A. Trimmed categorical cross-entropy for deep learning with label noise. Electron. Lett. 2019, 55, 319–320. [CrossRef]
43. Burden, F.; Winkler, D. Bayesian regularization of neural networks. Artif. Neural Netw. 2008, 458, 23–42. [CrossRef]
44. Park, P.; Marco, P.D.; Shin, H.; Bang, J. Fault detection and diagnosis using combined autoencoder and long short-term memory
network. Sensors 2019, 19, 4612. [CrossRef] [PubMed]
45. Li, K.; Zhao, X.; Bian, J.; Tan, M. Sequential learning for multimodal 3D human activity recognition with long-short term
memory. In Proceedings of the 2017 IEEE International Conference on Mechatronics and Automation (ICMA), Takamatsu, Japan,
6–9 August 2017; pp. 1556–1561. [CrossRef]
46. Parikh, R.; Mathai, A.; Parikh, S.; Sekhar, G.C.; Thomas, R. Understanding and using sensitivity, specificity and predictive values.
Indian J. Ophthalmol. 2008, 56, 45. [CrossRef] [PubMed]
47. Bejani, M.M.; Ghatee, M. A systematic review on overfitting control in shallow and deep neural networks. Artif. Intell. Rev. 2021,
54, 6391–6438. [CrossRef]
48. Wang, G.; Lee, K.-C.; Shin, S.-Y. Novel image classification method based on few-shot learning in monkey species. J. Inf. Commun.
Converg. Eng. 2021, 19, 79–83. [CrossRef]
