
EAI Endorsed Transactions on Industrial Networks and Intelligent Systems

Research Article

Human Activity Recognition System For Moderate Performance Microcontroller Using Accelerometer Data And Random Forest Algorithm
To-Hieu Dao1,2, Hoang Thi Hai Yen3, Van-Nhat Hoang1, Duc-Tan Tran1,2, Duc-Nghia Tran4,∗

1 Faculty of Electrical and Electronic Engineering, Phenikaa University, Yen Nghia, Ha Noi, 12116, Viet Nam
2 PHENIKAA Research and Technology Institute (PRATI), A&A Green Phoenix Group JSC, No.167 Hoang Ngan, Trung Hoa, Cau Giay, Ha Noi, 11313, Viet Nam
3 Thai Nguyen University, University of Information and Communication Technology, Thai Nguyen, Viet Nam
4 Institute of Information Technology, Vietnam Academy of Science and Technology, Ha Noi, Viet Nam

∗ Corresponding author. Email: [email protected]

Abstract

There has been increasing interest in the application of artificial intelligence technologies to improve the
quality of support services in healthcare. Some constraints, such as space, infrastructure, and environmental
conditions, present challenges with assistive devices for humans. This paper proposed a wearable-based real-
time human activity recognition system to monitor daily activities. The classification was done directly on the
device, and the results could be checked over the internet. The accelerometer data collection application was
developed on the device with a sampling frequency of 20Hz, and the random forest algorithm was embedded
in the hardware. To improve the accuracy of the recognition system, a feature vector of 31 dimensions was
calculated and used as an input per time window. Besides, the dynamic window method applied by the
proposed model allowed us to change the data sampling time (1-3 seconds) and increase the performance
of activity classification. The experiment results showed that the proposed system could classify 13 activities
with a high accuracy of 99.4%. The rate of correctly classified activities was 96.1%. This work is promising for
healthcare because of the convenience and simplicity of wearables.

Received on 19 August 2022; accepted on 14 October 2022; published on 09 November 2022


Keywords: Classification, recognition, activity, real-time, wearable, microcontroller, moderate performance
Copyright © 2022 To-Hieu Dao et al., licensed to EAI. This is an open access article distributed under the terms of the CC
BY-NC-SA 4.0, which permits copying, redistributing, remixing, transformation, and building upon the material in any
medium so long as the original work is properly cited.
doi:10.4108/eetinis.v9i4.2571

1. Introduction

In recent years, the digital transformation in healthcare has taken place strongly, and four prominent approaches have emerged: i) Using robots to interact with patients, provide personalized services, and provide advice [1]; ii) Optimizing the use of personal electronic wearable devices to monitor health status and prevent diseases [2, 3]; iii) Using computer vision in healthcare [4]; iv) Actively using artificial intelligence and big data in the treatment of serious diseases [5]. The above approaches require complex data warehouses and digital platforms, with high construction and implementation costs. Thus, they were difficult to apply with real-time activity data and required the establishment of spatial and infrastructure conditions in advance. Wearable sensor technologies aimed to develop into a system that allowed doctors and nurses to monitor patients' health remotely and promptly give them useful advice. Wearable sensors [6] are sensors located directly or indirectly on the body to measure raw signals, such as accelerations, angular rates, angles, etc., from human activities. For the monitoring of activities, inertial sensors such as accelerometers and gyroscopes have been extensively employed. Gyroscopes detect changes in orientation, such as rotational displacement, velocity, and acceleration, whereas accelerometer sensors measure changes in velocity and displacement.


Fig 1. Wearable sensor locations on the body.

Fig 2. Machine learning algorithms were fed a set of relevant features for analysis, while deep learning algorithms worked with raw data and decided on their own the relevant features.

Consequently, many researchers have been interested in building wearable devices based on data obtained from inertial sensors to monitor daily activities [7–12].
Internet of things (IoT) applications [13–15] in the field of predicting human activities were growing strongly. Many works on human activity recognition (HAR) [16, 17] have attracted increasing interest due to their availability in practical applications, such as sports tracking systems [18] and systems for monitoring and preventing sedentary conditions at work in the office [19]. Moreover, sensors integrated with wearable devices have many advantages, such as cost savings, fast response time, and low energy consumption, making them suitable for wearable applications [7, 12, 20–22].
Sensor placement (Fig 1) greatly affected the performance of the recognition model [6, 11, 12, 15, 19, 20, 23, 24]. Sensors must be attached at an appropriate location on the body. This was determined by the activities that required monitoring. Smart watches are usually worn on the left wrist, which is often less related to activities that do not use the hand, such as sitting, standing, and lying down. Besides, sensors attached to the waist help with monitoring overall body movement and major body movements. But data collected from the waist is often less sensitive to activities involving the use of the hands, such as writing, holding, grasping, etc. Therefore, recent works [25–28] have combined many sensors in different locations to increase the amount of information collected about activities. However, carrying many wearable devices on the body could cause discomfort. In addition, wearable devices needed to be compact in size and light in weight to create a comfortable feeling for the users, especially for the elderly or those recovering from an accident. These devices were worn on the body during treatment to monitor the progress of daily activities and detect abnormal activities, for example a sudden change in posture, such as lying down while walking. Not only that, healthy people could use wearable devices to monitor sports cycles, monitor living habits, and determine energy consumption for activities.
The performance of the recognition model improved substantially when machine learning was utilized to identify human activities [3, 6, 7, 10–12, 15, 19, 24, 29]. While the devices' integrated processors were low-power microcontrollers, they had memory and processing speed limitations. Besides, ML recognition models must be optimised and compiled into the C/C++ language before being embedded in the microcontroller. Thus, choosing an embeddable machine learning algorithm for microcontrollers was a big challenge. Embedded devices typically have limited memory, processing power, power capacity, and more. Since embedded systems were designed for specific uses, there were limited resources left for machine learning models. To solve this problem, we built a human activity recognition model based on a machine learning algorithm with low complexity, small size, and embedding ability in microcontrollers.
Machine learning algorithms (Fig 2) rely on past experience, observations, and data to predict future corresponding work instead of just following established, pre-programmed rules. In general, there were four main approaches in the field of machine learning, including: 1) Supervised learning [30, 31]; 2) Unsupervised learning [32]; 3) Reinforcement learning [33]; 4) Semi-supervised learning [34]. With HAR, supervised learning algorithms would predict the label (activity) of a new descriptive dataset based on the correlation between that label and previously known descriptive data. In mathematics, a set of input data had the form χ = {x0, x1, x2, ..., xn} and a set of labels had the form γ = {y0, y1, y2, ..., yn}, where xi was a data descriptor vector for the label yi with i = 1, 2, ..., N. Data pairs of the form (xi, yi) ∈ χ × γ were divided into training and testing datasets.


On the training dataset, a correlation function was calculated to map the elements of the set χ to a corresponding element of the set γ. From there, when new data x was available, this function f would predict the label y corresponding to y = f(x). However, this function did not exist in practice, so the function f needed to be built well enough for the best classification performance, so that y ≈ f(x). Building a good HAR model was influenced by many factors, including the quality of the data describing the action, the size of the data window, the features used, the machine learning algorithm applied, and the complexity of activities.
In addition, limited memory on wearables made it difficult to process large amounts of data and an increased number of activities. By using the dynamic windowing approach, our study has concentrated on enhancing data quality to capture the timeframe of state change operations. In this study, a device was designed like a belt, which can be worn on the waist in order to classify many activities and cause less discomfort for users. First, we collected accelerometer data for each activity and extracted a set of 31 features (time domain) per data window. Next, the sets of features were divided into training and testing sets at the rate of 75/25. The random forest algorithm was applied to classify 13 routine activities. Finally, the appropriate model would be installed on a microcontroller with moderate performance (ESP32). Volunteers supported us in completing research to assess the usefulness of the dynamic window approach. Each device was able to transmit activity data to the data server via wifi or the internet. Through an application called "Human Activity Classification" that we provide on the Google Play store, users can track the activities of volunteers. The activity information was logged on the data server and backed up on the integrated memory card of each device. Each volunteer would wear a belt-mounted device. Since then, an internet-based remote activity monitoring system has been developed. With this method, we could evaluate the classification accuracy of wearable devices in a real-time environment. As a result, the experimental result was evaluated and compared with a number of related works on the classification of real-time HAR in the papers [12, 22, 35].

2. Related works

Various machine learning algorithms have been applied to build activity recognition models in many recent studies and have achieved impressive results. Research by Mannini et al. [25] suggested using support vector machine (SVM) classifiers and activity data collected from sensors located at the ankle and wrist. The obtained results showed that the data collected at the ankle was better (10%) than the data collected at the wrist. Unlike them, Bali and co-authors [29] combined information including footsteps, gyroscope acceleration, and heart rate to recognize human activities. The features were extracted using the principal component analysis (PCA) method. The classification algorithms used were C4.5, random forest (RF), the K nearest neighbours algorithm (KNN), and SVM. Both the k-nearest neighbors and Naïve Bayes classifiers were compared to the use of accelerometers and gyroscopes separately by Aiguo et al. [36]. We know from their experiments that using both gyroscopes and accelerometers together improves categorization accuracy. Naïve Bayes achieved a better overall accuracy (90.1% and 87.8%) than KNN. A large number of sensors could improve accuracy, but it would be impractical to have to wear them all the time. Adding more sensors would also make the system more expensive. Another study by Biagetti et al. [8] proposed a human activity recognition system consisting of wireless sensor network nodes (biological and accelerometer) whose data were transmitted to a computer for analysis. Results when applying the KNN classifier achieved an overall accuracy of 85.7%. In the study [35], Yang and Zhang propose a wearable activity-classification system that resembles a wristwatch and is worn on the hand. The time and frequency domain characteristics of the accelerometer data are extracted, followed by the use of the decision tree method. On the STM32L low-power microcontroller, their model can be executed in real time. Despite the small number of activities (walking, sitting, jumping, bicycling, and jogging), the accuracy is below 90%. Five biaxial accelerometers were worn in a variety of positions by the study team of Bao et al. [37], including the hip, wrist, arm, ankle, and thigh. 20 activities were categorized with an accuracy of 84% by the decision tree classifier. The increased number of sensors, moving data, and an accelerometer set to a designated orientation were the restrictions, though. Many scientists in the last few years have looked into the possibility of employing a single accelerometer to collect the signal necessary for activity recognition [38]. Piyush Gupta et al. [38] used a belt-worn 3-axis accelerometer to construct an activity identification and feature selection system. Using Naïve Bayes and KNN, they observed that wrapper-based feature selection was superior to filter-based selection. Data collection was limited to seven volunteers, all of whom were young (22–28). Thu and co-authors [12] built a real-time recognition system for six activities on low-performance microcontrollers. Their system used two features (mean and standard deviation) in a combined decision tree algorithm. The result achieved, above 92%, was quite good, but this result was lower than the result achieved on their collected data (99%). The difference came from the data itself and the limited number of activities.


For example, when switching from a static state (lying) to a dynamic state (sitting up), or from a dynamic state (sitting down) to a stationary state (sitting), these activities could be confused with the activities in a state of walking or jogging.

3. System model

3.1. Wearable device

Frequently, the posture and velocity of human movements alter. Different activities produced different 3-axis acceleration values according to where the accelerometer was placed on the body [12, 23, 35]. In this research, accelerometer signals from the MPU6050 sensor were collected. Fig 3 shows the block diagram of the proposed system. The inertial sensor used was the MPU6050, capable of measuring 6 axes, including 3 axes of acceleration and 3 axes of gyroscope. The ESP32 was a power-saving microcontroller circuit that integrated Wi-Fi and Bluetooth. Additionally, the ESP32 was equipped with 16MB of flash memory and was widely used in IoT applications. Besides, the ESP32 also supported embedding mini machine learning algorithms through integrated tools such as TensorFlow and MicroPython. Therefore, it was the ideal choice for integrating machine learning algorithms such as random forest, support vector machine, et cetera. In this work, the central processor (ESP32) used I2C communication (inter-integrated circuit) with the MPU6050 and DS1307 (time integrated circuit) to collect acceleration data over time at a sampling rate of 20 Hz (20 samples per second).

Fig 3. Structure of the wirelessly connected wearable device.

A human activity usually lasts from 2 seconds to 3 seconds, and this time is longer in the elderly or people who have just recovered from an accident. Besides, a sampling frequency of 50 Hz or more did not give a better result [36], and the performance of the device depended on the accuracy when combining the classification algorithm with the data features. Acceleration values were calculated according to formula (1). The wearable device was powered by a 3.7 V, 2000 mAh lithium battery, so the battery energy it could provide was 3.7 V × 2000 mAh = 7400 mWh.

$A_i = \frac{(Sam_i / 1024) \times R - O_i}{S_i}$   (1)

In that, Ai was the acceleration value in direction i; Sami was the sampled value on the i-axis; R was the reference suspension resistor; Oi was the compensation and Si was the sensitivity.

3.2. Energy consumption

The average power consumption of the device in each working hour was about 60 mWh. The device could work for up to 7400 : 60 ≈ 123.33 hours, which is equivalent to 5.14 days.

Fig 4. A voltage divider circuit diagram for battery power calculation.

In order to limit power depletion during operation, the device needed to be charged after a period of continuous operation. The alarm information was sent to users via the mobile application (Fig 5 left), and an alarm sounds from the device. Because the battery voltage (VBAT) was between 3.7 V and 4.2 V but the microcontroller's maximum withstand voltage was 3.3 V, the battery voltage signal was passed through the voltage divider circuit (Fig 4). Next, the capacity of the battery was calculated by formula (2) as a percentage, in which adc is the ADC reading, Vref is the ADC reference voltage, re is the ADC resolution in bits, and R4 and R5 are the divider resistors in Fig 4. As a result, the battery reached 100% when the battery voltage was 4.2 V. The device stopped working when the battery voltage dropped to 3.7 V, or 0%. On the phone application, the capacity warning icon indicated that the device needed to be charged when the battery capacity was below 10%.

$Per_{pin} = \left(\frac{adc \times V_{ref}}{2^{re} - 1} \times \frac{R4 + R5}{R5} - 3.7\right) \times \frac{100\%}{0.5}$   (2)
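As a rough illustration of formulas (1) and (2), the sketch below converts a raw accelerometer sample and an ADC reading of the divided battery voltage into physical values. It is a minimal Python sketch of the arithmetic only; the constants (10-bit ADC, Vref, R4, R5, offset and sensitivity values) are assumptions for illustration, not necessarily the values used on the actual device, where this logic runs in C/C++.

```python
# Sketch of formulas (1) and (2); the constants below are illustrative assumptions.
R_REF = 4.0            # reference range R in formula (1), assumed +/-2 g scale
OFFSET = 2.0           # O_i, assumed mid-scale compensation
SENS = 1.0             # S_i, assumed sensitivity
V_REF = 3.3            # ADC reference voltage (V)
ADC_BITS = 10          # the 1024 in formula (1) suggests a 10-bit ADC
R4, R5 = 100e3, 100e3  # divider resistors from Fig 4 (values assumed)

def acceleration(sample: int) -> float:
    """Formula (1): A_i = ((Sam_i / 1024) * R - O_i) / S_i."""
    return (sample / 1024 * R_REF - OFFSET) / SENS

def battery_percent(adc: int) -> float:
    """Formula (2): map the 3.7-4.2 V battery range to 0-100 %."""
    v_pin = adc * V_REF / (2 ** ADC_BITS - 1)   # voltage at the ADC pin
    v_bat = v_pin * (R4 + R5) / R5              # undo the voltage divider
    return max(0.0, min(100.0, (v_bat - 3.7) * 100.0 / 0.5))

print(acceleration(512), battery_percent(620))
```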


3.3. Mobile monitoring application

To collect activity data, the wearable device was configured as a mini server, and users could connect to the device via wifi thanks to an interactive app called "Accelerometer Gyrometer Logger", which is publicly shared on the Google Play store. This data was stored on 2GB of expandable local memory, and this memory communicated with the ESP32 via a standard serial peripheral interface (SPI). Each collecting and monitoring device was a node with its own address code and communicated with the data server by using the HTTP/GET protocol. Activity data was saved in a MySQL database.

Fig 5. Low battery warning on the app (left) and data logger application interface (right).

The application interface (Fig 5 right) provided users with functions including: setting the sampling frequency; labels corresponding to the activity to be collected; the 3-axis accelerometer and 3-axis gyroscope; a function to delete the collected data when users made a mistake in the operation or the data contained errors; battery capacity notifications; and the logging function for the collected data, controlled directly via the START/STOP function button. Besides interacting with the software, users could enable or stop the data logging function by using a physical button on the device. In addition, pressing the button for 5 seconds could delete the recorded data. After that, the device would delete the file and reboot.

4. Research methods

4.1. Sampling and pre-processing

Public dataset. The activities of daily living (ADLs) dataset introduced in work [37] described 36 movements, including 20 falls and 16 daily living activities. Data of activities were collected by 17 volunteers (10 males and 7 females). Each volunteer performed each activity in turn, and each action consisted of 5 or 6 tests between 30 seconds and 60 seconds. When performing the activities, each volunteer would wear 6 devices at 6 corresponding positions: head, chest, waist, right wrist, right, and right ankle. The collection device included three sensors: an accelerometer, a gyroscope, and a magnetometer with a sampling frequency of 25 Hz. Researchers [23, 38] investigated the fall state based on this dataset and showed that the sensor data collected at the waist position gave the best result. However, the data included a number of similar activities, so it was not necessary to distinguish the details, for example, sit-down on the bed, on a chair, in the air, or on the sofa. Thus, we would combine these activities into a common action. For example: walking forward and backward into walking; sit-down on different surfaces such as sit-air, sit-bed, sit-sofa, sit-chair into sit-down; tripping and sliding but controlled states into tripover; coughing-sneezing into standing.

Fig 6. Six tests with the activity of rising.

The data collected in the public dataset needed to be processed to improve classification accuracy. Fig 6 shows the rising activity after 6 tests over a period of more than 250 seconds, with acc_x, acc_y, and acc_z corresponding to acceleration on the x, y, and z axes. With a sampling frequency of 25 Hz, each test contained approximately 430 samples. In the first 150 data samples, the volunteer was horizontal, barely moving (static state), then he sat up for the next 2-3 seconds, and finally sat motionless (static state).
Signal statistics in these tests showed that the information of the activity was concentrated in signal regions on the three x, y or z axes that had a range greater than 0.5 m/s². Therefore, we determined whether the activity state was static or dynamic on each signal segment of size 1 second (time windows). After reducing the activity data in the static state, the descriptive information of the actions was more concentrated (Fig 7) than before processing. Besides, non-normal values would be replaced by interpolated values, which were determined by averaging the adjacent signal magnitudes before and after those values. The above process was applied similarly to the remaining activities.
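The replacement of non-normal values by the average of the neighbouring samples can be sketched as below; the outlier criterion (deviation of more than a few standard deviations from the window mean) is an illustrative assumption, not the exact rule used by the authors.

```python
import numpy as np

def interpolate_outliers(signal, threshold=4.0):
    """Replace suspicious samples with the mean of their two neighbours.

    A sample is treated as non-normal here when it deviates from the mean of
    the segment by more than `threshold` standard deviations (assumed rule).
    """
    x = np.asarray(signal, dtype=float).copy()
    mu, sigma = x.mean(), x.std()
    for i in range(1, len(x) - 1):
        if sigma > 0 and abs(x[i] - mu) > threshold * sigma:
            x[i] = 0.5 * (x[i - 1] + x[i + 1])  # average of the adjacent magnitudes
    return x
```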


Fig 7. Rising after pre-processing.

The public dataset contained data for activities that follow the same process, static-dynamic-static, but included both repetitive and non-repeating activities. For example, lying down (lie-down) was a non-repetitive activity: volunteers had to rise and then do it again; this was similar to activities like sit-down, rising, tripover, and squatting. In contrast, activities such as walking, jogging, and limp could be performed continuously. Therefore, in order to improve the classification performance, we classified three activities in a static state, including sitting, standing, and lying. The data for these activities was the data areas separated from the data processing of the non-repeating activities. For example, the descriptive data for the two activities sitting and lying were the data areas in the static state before and after the lie-down activity was performed. Similarly, the data of standing was the signal area before the sit-down activity took place. However, the timing of the activities was not exactly the same. For example, the squatting activity had a duration of up to 5 seconds, while transitions such as lie-down, rising, and sit-down had a duration of 3 seconds. Unlike them, activities that take place when a person moves, such as jogging, walking, tripover, and limp, only need 2 seconds to be classified. Therefore, we proposed using different sized windows for each activity in order to provide better activity information. The 13 activities are presented in Tab 1.

Tab 1. Description of activities.

Activity  | Description                                                               | Duration time (s)
Walking   | Walking normally, moving slowly, walking forward and backward.            | 2
Jogging   | Jogging and running.                                                      | 2
Squatting | Squatting down then standing up.                                          | 5
Bending   | Bending about 90 degrees.                                                 | 1
Bendp     | Bending to pick up an object on the floor.                                | 4
Limp      | Walking with a limp.                                                      | 2
Tripover  | Tripping or stumbling but then standing up and walking normally.          | 2
Sit-down  | Certain acceleration onto a chair, a sofa, air or bed.                    | 3
Lie-down  | Sitting to lying on the bed.                                              | 3
Rising    | From lying to sitting.                                                    | 4
Lying     | Lying horizontally, can turn around horizontally.                         | 1
Standing  | Standing up and standing straight, coughing or sneezing while standing.   | 1
Sitting   | Sitting motionless, back against the chair and body slightly tilted back. | 1

Fig 8. The wearable device (red) was attached to the waist and secured by an elasticated strap. The x-axis of the accelerometer was parallel to the earth's gravity.

Private dataset. The private dataset was built based on a group of volunteers wearing waist data collection devices (Fig 8) and performing the following activities: walking, jogging, squatting, bending, bendp, limp, tripover, sit-down, lie-down, rising, lying, standing, and sitting. This group consisted of 20 students, including 12 males and 8 females. The recorded data had the following format: activity, timestamp, sensor, values of x, y, and z, frequency.
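One line of such a logger file could be parsed as in the short sketch below; the comma-separated layout and field order are assumptions based on the format listed above, not a documented file specification.

```python
from typing import NamedTuple

class LogRecord(NamedTuple):
    activity: str
    timestamp: int
    sensor: str
    x: float
    y: float
    z: float
    frequency: int

def parse_line(line: str) -> LogRecord:
    # Assumed layout: activity,timestamp,sensor,x,y,z,frequency
    a, t, s, x, y, z, f = line.strip().split(",")
    return LogRecord(a, int(t), s, float(x), float(y), float(z), int(f))

rec = parse_line("walking,1660900000,acc,0.12,-0.98,0.05,20")
```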


Activity logger files of type TXT were named with the activity and the time when the file was created. When performing data collection, activities such as sit-down, lie-down, bendp, squatting, and rising were performed according to an unwritten rule: nothing before starting for a short time, doing the activity, nothing after finishing. For example, rising included lying-rising-standing; lie-down included sitting-lying down to bed-lying.
The data collection process was applied with different collection times. We set the survey time to 3 seconds for non-repetitive activities (squatting, bendp, tripover, lie-down, sit-down, rising). Each activity took place over a period of 3 seconds and was repeated in subsequent discrete intervals. This helped our data to be centralised and avoided loss when the activity data was divided into time windows. The remaining activities, such as bending, limp, sitting, standing, lying, walking, and jogging, were surveyed at continuous intervals (30 seconds to 50 seconds). The collected data would be segmented into 3-second fixed-size windows because this dataset was sampled for limited periods of time.
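A minimal sketch of this fixed (3-second) windowing is shown below: at the 20 Hz sampling rate, each window holds 60 samples. The array shapes and names are assumptions for illustration.

```python
import numpy as np

def fixed_windows(samples, fs=20, window_s=3):
    """Split an (N, 3) array of x, y, z accelerations into 3 s windows."""
    data = np.asarray(samples, dtype=float)
    size = fs * window_s                        # 60 samples per window at 20 Hz
    n = len(data) // size                       # drop the incomplete tail
    return data[: n * size].reshape(n, size, 3)

windows = fixed_windows(np.zeros((450, 3)))     # -> 7 windows of 60 samples each
```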
4.2. Data transformation

Data segmentation was one of the stages of the receiving process. This phase allowed us to understand the impact of signals by dividing the activity data after preprocessing. The data windowing had two approaches, dynamic and static windows.

Static window size. With this windowing approach, the data of activities were segmented into equally sized data windows, as shown in Fig 9. The size of the windows was an important parameter for gathering a lot of information in this approach. As the size of the window changed, the amount of information changed accordingly. If the window size was too large, the processor would take too long to process the information, and there was a chance that information from other activities might appear. In addition, choosing a window with a short size caused information loss or misinformation for transitions occurring in a short time. Therefore, this approach was suitable for the classification of activities in a stationary state (sitting, standing, lying, bending) and activities repeated over a long time (jogging, walking, limp).

Fig 9. The static window method would divide an activity's data into sliding windows of the same size over time.

The datasets used, after preprocessing and removing noise, would be segmented and labelled. The data segmentation was done on the computer, so the static windowing method was applied at this stage. However, the size of the window would depend on how long each activity takes place. The window size applied to each activity in the public dataset was shown in Tab 1. For example, lie-down and sit-down were 3 seconds, squatting was 5 seconds, and rising was 4 seconds. Slightly different from the public data, activity data in the private data was segmented into fixed-sized windows of 3 seconds. The number of observations after data segmentation is presented in Tab 2.

Tab 2. Statistics of data windows for each activity on the dataset (number of windows).

Activity  | Public dataset | Private dataset
Walking   | 1010           | 132
Jogging   | 321            | 76
Squatting | 82             | 101
Bending   | 538            | 58
Bendp     | 91             | 106
Limp      | 439            | 121
Tripover  | 193            | 138
Sit-down  | 370            | 129
Lie-down  | 125            | 74
Rising    | 114            | 89
Lying     | 971            | 115
Standing  | 882            | 113
Sitting   | 1046           | 118

Dynamic window size. With this approach, the window size was not constant for an activity. In Fig 10, the data of squatting (squat down then stand up) was split into different sized windows to detect when this activity took place. In practice, activities took place over different time intervals, and applying dynamic windowing was necessary to improve activity predictability. Therefore, we applied the dynamic window method to the proposed model in conducting experiments.
When conducting experiments, we used windows of size 1 second to determine the static state or the dynamic state. Static and dynamic windows were determined by calculating the range (H) by formula (9) and the average resultant acceleration (ARA) by formula (12), where ARA was the average of the square root of the sum of the squares of the signal strength on the three axes. Research in [39] collected breathing data when a person was in a stationary state. This state was determined by X. Sun by calculating the average resultant acceleration for the 3-axis acceleration measured on the smartwatch.
person was not moving, the calculation results were


Fig 10. Data correlation window.

Accordingly, when a person was not moving, the calculation results were approximately 10 m/s². Therefore, a time window was defined as a static state if two conditions were satisfied: H runs from 0 to 1 m/s² and ARA runs from 9.6 to 10 m/s².
Changing the window size from 1-3 seconds had a great influence on the classification results, as shown in the study [40]. Therefore, in this study, we limited the maximum amount of data to 3 seconds to identify an activity. This meant that the data window size would be automatically extended for each activity until the next window was in a static state, with the time between two consecutive activities not exceeding 3 seconds. This algorithm is depicted in Fig 11.

Fig 11. After analyzing every second of data for signal changes, the algorithm identified dynamic windows after 1–3 seconds.
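A sketch of the 1-second static/dynamic decision and the growing window described above, using the range H (formula (9)) and the average resultant acceleration ARA (formula (12)) with the thresholds quoted in the text. Taking H as the largest per-axis range, and the function and variable names, are assumptions.

```python
import numpy as np

def is_static(window):
    """1 s window (20x3 array in m/s^2): static if H in [0, 1] and ARA in [9.6, 10]."""
    h = (window.max(axis=0) - window.min(axis=0)).max()   # range H (largest axis, assumed)
    ara = np.sqrt((window ** 2).sum(axis=1)).mean()       # average resultant acceleration
    return h <= 1.0 and 9.6 <= ara <= 10.0

def dynamic_window(one_second_windows, max_seconds=3):
    """Grow the window until a static second appears or 3 s are reached."""
    collected = []
    for second in one_second_windows:
        collected.append(second)
        if is_static(second) or len(collected) == max_seconds:
            break
    return np.concatenate(collected)   # 1-3 s of samples passed on to classification
```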
4.3. Feature engineering

After converting to time windows, the data was still difficult to apply directly to the machine learning algorithm, and not the entire sample of activity data was needed for the classification process. Therefore, the data of activities was transformed into a new dataset consisting of selected features in the time domain compatible with the classification algorithms. These features reflected the correlation between the original data and each activity and helped to improve the accuracy of the classification model. This work considered 12 statistical features including: mean (µ), standard deviation (σ), median (Med), average absolute difference (AAD), maximum value (Max), minimum value (Min), range, root mean square (RMS), correlation coefficient (Corr), average resultant acceleration (ARA), and the correlation between the average of the original signal series (µstart) and the average of the final signal series (µstop) over a time window (CAOSS). These features were extracted on the three axes (x, y, z) based on the respective formulas (3)-(13).

$\mu_{X_j} = \frac{1}{N_W - 1} \sum_{i=0}^{N_W - 1} x_i$   (3)

$\sigma_{X_j} = \sqrt{\frac{1}{N_W - 1} \sum_{i=0}^{N_W - 1} (x_i - \mu_{X_j})^2}$   (4)

$Med_{X_j} = x_{N_W/2}$ with $X_j$ sorted   (5)

$AAD_{X_j} = \frac{1}{N_W - 1} \sum_{i=0}^{N_W - 1} |x_i - \mu_{X_j}|$   (6)

$Min_{X_j} = \mathrm{Minimum}(x[i])$   (7)

$Max_{X_j} = \mathrm{Maximum}(x[i])$   (8)

$Range_{X_j} = Max_{X_j} - Min_{X_j}$   (9)

$RMS_{X_j} = \sqrt{\frac{1}{N_W - 1} \sum_{i=0}^{N_W - 1} |x_i|^2}$   (10)

$Corr(X_j, Y_j) = \frac{\frac{1}{N_W - 1} \sum_{i=0}^{N_W - 1} (x_i - \mu_{X_j})(y_i - \mu_{Y_j})}{\sigma_{X_j} \sigma_{Y_j}}$   (11)

$ARA_{X_j} = \sqrt{\frac{1}{N_W - 1} \sum_{i=0}^{N_W - 1} (x_i^2 + y_i^2 + z_i^2)}$   (12)

$CAOSS = \mu_{stop} - \mu_{start}$   (13)

In that, Xj was the jth data segment; xi, yi, zi were the ith acceleration values on the three x, y, and z axes of Xj; NW was the number of samples in the window Xj. The correlation coefficient (Corr(Aj, Bj)) was a function that calculated the correlation between pairs of axes: (x, y), (y, z) and (x, z). In CAOSS, µstart and µstop were the average values of the signals on each x, y, and z axis in the first and last second, respectively.
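The sketch below computes one plausible composition of the 31-dimensional feature vector from a single window: nine per-axis statistics (formulas (3)-(10) and (13)) on x, y, z, plus the three pairwise correlations of formula (11) and the ARA of formula (12). The exact grouping used on the device is not spelled out item by item in the paper, so treat this composition, and the use of plain sample statistics, as assumptions.

```python
import numpy as np

def window_features(w, fs=20):
    """w: (N, 3) array of x, y, z accelerations for one time window."""
    feats = []
    for a in range(3):                                  # per-axis statistics
        s = w[:, a]
        mean, std = s.mean(), s.std(ddof=1)
        aad = np.abs(s - mean).mean()                   # average absolute difference
        rms = np.sqrt((s ** 2).mean())
        caoss = s[-fs:].mean() - s[:fs].mean()          # last-second mean minus first-second mean
        feats += [mean, std, np.median(s), aad, s.min(), s.max(),
                  s.max() - s.min(), rms, caoss]
    for i, j in [(0, 1), (1, 2), (0, 2)]:               # pairwise axis correlations
        feats.append(np.corrcoef(w[:, i], w[:, j])[0, 1])
    feats.append(np.sqrt((w ** 2).sum(axis=1)).mean())  # average resultant acceleration
    return np.asarray(feats)                            # 9*3 + 3 + 1 = 31 values

assert window_features(np.random.randn(60, 3)).shape == (31,)
```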


Fig 12. Activities might be easily recognized by the use of features. There was some confusion between the various activities in the moving state, though.

The T-distributed Stochastic Neighbour Embedding (t-SNE) [3, 7, 12] tool made it simple to see how these features affected the system (Fig 12). t-SNE is a technique for visualizing high-dimensional data. By converting similarity between data points into joint probabilities, it attempted to reduce the Kullback-Leibler divergence between the joint probabilities of the low-dimensional embedding and the high-dimensional data. Although there were up to 13 activities that needed to be classified, the number of activities that were confused was minimal. The activities in the moving state were where the confusion was most prevalent.
A feature vector with 31 dimensions was calculated from 13 measurements on the x, y, and z axes of the MPU6050 sensor (acceleration). These features were extracted from each data segment (time window) of each activity. Then, the features were merged into a feature vector representing the corresponding data segment. These feature vectors would be divided into training and testing sets at the rate of 75/25.

4.4. Recognition model

Random forest. While most data processing workloads with machine learning algorithms run in the cloud, there is another trend towards on-device machine learning (on-device ML). On-device ML means deploying machine learning models on embedded devices for direct inference at the real-time data source. Because the prediction and classification of activities take place directly, these devices did not need an external network, and this ensured that sensitive data was not revealed on the public internet. Cloud-based ML requires embedded devices to send data to the cloud for inference. The cloud-based computational model is virtually unlimited, but it causes latency: there is always a delay when transferring data to and from embedded devices. Embedded systems are often deployed in locations where connectivity is limited, so it is not always practical to operate ML for human activity recognition systems, nor to rely on the reliability of data in wireless transmission. Not all machine learning algorithms could be applied to microcontrollers; they needed to be miniaturized and optimized to run on low-power devices without too much loss of accuracy. Machine learning algorithms of suitable complexity and size could be embedded on microcontrollers, including the decision tree (DT) and random forest (RF). The decision tree [41] was an algorithm that could imitate human thinking. This algorithm was based on the correlation between data to understand the logic between input and output data (Fig 13). Each tree node (rhombus) represented a data feature threshold, each branch represented a rule, and each leaf (ellipse) represented an activity or prediction label.

Fig 13. The decision tree would build relationships between features and activities.

Decision tree models often suffer from the overfitting problem, which leads to false predictions. To fit the data, the tree keeps creating new nodes, and eventually it becomes too complex. Hence, it works greatly on the training data but starts making a lot of errors on the testing data.
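To make the overfitting argument concrete, the sketch below compares a single unconstrained decision tree with a 50-tree ensemble on the same feature matrix. The data is synthetic and the scores are illustrative only; this is not the paper's experiment.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=2000, n_features=31, n_informative=12,
                           n_classes=13, n_clusters_per_class=1, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.25, random_state=0)

tree = DecisionTreeClassifier(random_state=0).fit(X_tr, y_tr)
forest = RandomForestClassifier(n_estimators=50, random_state=0).fit(X_tr, y_tr)

# The deep single tree typically fits the training set almost perfectly but drops
# on the held-out set, while the bagged ensemble generalizes better.
print("tree   train/test:", tree.score(X_tr, y_tr), tree.score(X_te, y_te))
print("forest train/test:", forest.score(X_tr, y_tr), forest.score(X_te, y_te))
```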


These algorithms are also often unstable when a new layer of data is added and are not suitable for large datasets. Thus, it was difficult to apply them to microcontrollers when the number of output predictions was large. Furthermore, not all of the chosen features are appropriate for every activity: a feature might be effective for one activity but not for another.
Multiple decision trees were gathered using the random forest technique to create a more robust model (Fig 14). This algorithm uses the bootstrap technique [42] to randomly select data from the dataset (random sampling with replacement). As a result, each new dataset might contain duplicates, and a new decision tree would be built on this dataset. The process of building each decision tree had a random element, so the decision trees in the RF algorithm might be different. The output prediction results aggregated from many independent decision trees will give optimal results.

Fig 14. Many single decision trees are built that will work with random data from a dataset. Thus, the results of object prediction were aggregated based on the majority rule from these trees.

Features are involved in and have an impact on the outcome of activity prediction with RF. In this research, the recognition model using the random forest algorithm was built on the training data. The testing data yielded the best recognition model, which would then be converted into C/C++ and integrated into the ESP32.

Human activity recognition using a random forest algorithm for wearable devices. The feature set obtained after feature extraction would be randomly divided into a training set and a test set at a ratio of 75/25. The recognition model was trained on a labeled training dataset corresponding to each feature vector. Then, the classification result based on the test data would help to evaluate the built model. For the random forest algorithm, we used the publicly shared sklearn library to build the model. Besides, we used the parameter n_estimators to set the number of single decision trees, and each branch of a decision tree would work with 1 feature. In this work, this parameter was set to 50. Fig 15 shows the process of building a classification algorithm on a wearable device.

Fig 15. Construction diagram of the recognition system.

To build the best recognition model and achieve high accuracy, the model was trained on the computer, and the best model was transformed to be embedded in the ESP32 microcontroller. The computer that had been used to train the model and operate the algorithm had the following configuration: Dell G515, 16GB RAM, Intel(R) Core i7-9750H processor, 256GB SSD, 1TB HDD. The random forest model applied to the ESP32 was illustrated in Fig 16. Activity 1 or activity 2 represented 1 of the 13 daily activities. The final prediction result for an activity was based on the prediction results of a majority of the individual decision trees.

Fig 16. Random forest was implemented on the wearable.

The recognition model that gave the best results on the test dataset was compiled into a library named "RF_acc.h". This was included in the embedded executable on the ESP32 processor. Accordingly, the ESP32 integrated into the wearable would read the acceleration signal from the MPU6050. After collecting enough data samples, the ESP32 would extract a vector of 31 features from each window and perform classification based on this feature vector. This process was performed every 1 second, corresponding to 20 data samples. If a 1-second window was static, the system would classify the activity immediately and continue to classify every 1 second, following the same rule. Conversely, if the current window was in a dynamic state, the processor would save the 1 second of previous data and continue to consider the data windows in the next 1 second of data. The human activity recognition system would produce a result when the data window reached the maximum interval of 3 s or the next 1-second window was in a static state. Thanks to the soft change of the data sample size (dynamic window size), activities were detected faster with less delay. At the same time, our system could understand the start time and end time of a transition activity (lie-down, sit-down, rising, bendp, tripover).
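A condensed sketch of the training step described above: a 75/25 split, a 50-tree sklearn random forest, and conversion of the chosen model into a C header for the ESP32. The `features`/`labels` arrays and the use of the micromlgen package for the export are assumptions; the paper only states that the best model was compiled into a library named "RF_acc.h".

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

# features: (n_windows, 31) matrix from the feature-engineering step,
# labels: activity index 0-12 per window (assumed to be prepared beforehand).
features = np.load("features.npy")
labels = np.load("labels.npy")

X_tr, X_te, y_tr, y_te = train_test_split(features, labels,
                                          test_size=0.25, random_state=42)
model = RandomForestClassifier(n_estimators=50, random_state=42)
model.fit(X_tr, y_tr)
print("test accuracy:", accuracy_score(y_te, model.predict(X_te)))

# Possible export path (assumption): micromlgen turns the sklearn model into
# plain C that can be saved as RF_acc.h and compiled into the ESP32 firmware.
try:
    from micromlgen import port
    with open("RF_acc.h", "w") as f:
        f.write(port(model))
except ImportError:
    pass  # the export tooling is optional for this sketch
```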


4.5. Evaluation indicators

Confusion matrix. The recognition models in this study were evaluated based on the following indicators: accuracy (ACC), sensitivity (SEN), negative predictive value (NPV), and positive predictive value (PPV). These indexes were calculated based on the parameters true positive (TP), true negative (TN), false positive (FP), and false negative (FN) according to the corresponding formulas (14)-(17).

$ACC = \frac{TP + TN}{TP + FP + FN + TN}$   (14)

$SEN = \frac{TP}{TP + FN}$   (15)

$PPV = \frac{TP}{TP + FP}$   (16)

$NPV = \frac{TN}{TN + FN}$   (17)

Considering the activity of sitting, TP was the number of these activities that were correctly classified as compared to the actual label, FN was the number of these activities that were misclassified, FP was the number of other activities that were mistakenly classified as sitting, and TN was the number of activities other than sitting that were correctly classified. A specific example of the definition of the parameters TP, TN, FP, and FN is presented in Tab 3. In this example, Sitting has TP = 100, FN = 3, FP = 7, and TN = 120; Standing has TP = 50, FN = 6, FP = 2, and TN = 170.

Tab 3. Example of evaluation parameters with sitting, standing, and bending.

                       Human activity
Predicted   | Sitting | Standing | Bending
Sitting     | 100     | 2        | 1
Standing    | 4       | 50       | 2
Bending     | 3       | 0        | 70

Cross validation. The input data must be large to achieve a good recognition model. However, this was unlikely because it could not be determined how much data was needed. K-fold cross validation was applied to provide a lot of data to train and to validate the model with multiple tests simultaneously. This method was applied to the proposed model and to other machine learning models such as the decision tree (DT), support vector machine (SVM), K nearest neighbours (KNN), and gradient boosted decision tree (GBDT). With this method, the data was divided into k equal parts; (k − 1) parts were used as training data and the rest was testing data. The model output was evaluated based on the results, including the mean (µ) and standard deviation (σ) over the k data divisions. This research applied cross-validation with k = 5 in the evaluation process.

5. Results

5.1. Model performance

Performance evaluation on the public dataset. The classification result of the proposed model was presented in the form of a confusion matrix (Fig 17). The result of applying the best recognition model on the test data (public dataset) reached 96.5%. Activities were properly classified with maximum ratios, like bending (134/134), bendp (22/22), lie-down (31/31), standing (220/220), sitting (261/261), and lying (242/242). Slightly lower results were obtained for the activities of limp (91/109), tripover (38/47), rising (27/28), sit-down (88/91), walking (244/252), and jogging (77/80), and the lowest was squatting (13/20).

Fig 17. Classification result for RF algorithm on public data.

Activities were misclassified often due to too fast or too long execution time or similarity to other activities. For example, when a person was walking on crutches with an injured leg, the movement speed was so slow that it could be mistaken for standing. Squatting was a combination of sitting down, sitting, standing up, and standing for a period of time. This usually occurs when a person exercises (gym). However, if this took a long time, squatting could be mistaken for sit-down. With the public dataset, sit-down activity (6/20) was mistaken for squatting. In addition, the activities of limp and walking were confused with each other due to the similarity in posture, or in people with minor injuries to the leg that hinder movement less. Some other activities were mistakenly classified as walking, but the number of these activities was insignificant. For example, 2/80 jogging, 3/47 tripover, 1/91 sit-down, and 1/28 rising were mistaken for walking.

Tab 4. Evaluation of applying the RF algorithm on the public dataset.

Activity  | ACC(%) | SPE(%) | PPV(%) | NPV(%)
Walking   | 98.1   | 96.8   | 92.1   | 99.4
Jogging   | 99.4   | 96.2   | 92.8   | 99.8
Squatting | 99.5   | 65     | 100    | 99.5
Bending   | 100    | 100    | 100    | 100
Bendp     | 100    | 100    | 100    | 100
Limp      | 98.2   | 83.5   | 90.1   | 98.7
Tripover  | 99.1   | 80.9   | 90.5   | 99.4
Sit-down  | 99.4   | 96.7   | 93.6   | 99.8
Lie-down  | 100    | 100    | 100    | 100
Rising    | 99.9   | 96.4   | 100    | 99.9
Standing  | 99.9   | 100    | 99.1   | 100
Sitting   | 100    | 100    | 100    | 100
Lying     | 100    | 100    | 100    | 100
All       | 99.5   | 93.5   | 96.8   | 99.7

Details of the performance evaluation of the proposed recognition model on the public dataset are described in Tab 4. The two indexes, ACC and NPV, of all activities were above 98%. In particular, activities such as sitting, bending, bendp, and lie-down had ACC, SPE, PPV, and NPV indexes of 100%. This was followed by the figures for the squatting, limp, and tripover activities, with ACC, PPV, and NPV over 90%. However, the difference lies in the SPE index. The activities of limp and tripover had an SPE index of over 80%, while squatting was only 65%. This was consistent with our analysis of how long these activities take. Overall, the proposed model had ACC, SPE, PPV, and NPV indexes over 93%, which was good.

Tab 5. Evaluation of recognition models with 5-fold cross-validation.

Dataset         | Metric | RF(%) | DT(%) | GBDT(%) | SVM(%) | KNN(%)
Public dataset  | µ      | 96.0  | 92.6  | 95.3    | 94.2   | 94.9
Public dataset  | σ      | 0.5   | 0.7   | 0.4     | 0.6    | 0.6
Private dataset | µ      | 99.1  | 97.0  | 98.0    | 98.4   | 97.4
Private dataset | σ      | 0.6   | 0.8   | 0.8     | 0.5    | 0.8

The result of the classification of activities when applying 5-fold cross validation on the public dataset was good (Tab 5). The proposed model had the best result, at µ = 96%. This was greater than the results for the GBDT, KNN, and SVM algorithms, at 95.3%, 94.9%, and 94.2%, respectively. The result for DT was the lowest, at 92.6%. Besides, the recognition model using the GBDT algorithm gave the lowest standard deviation, σ = 0.4%. The classification results had a low standard deviation of 0.4%-0.7%. This showed that the data from activities was less scattered and had high reliability.

Fig 18. Classification result for RF algorithm on the private dataset.


Tab 6. Result of applying the RF on the private dataset.

Activity  | ACC(%) | SPE(%) | PPV(%) | NPV(%)
Walking   | 100    | 100    | 100    | 100
Jogging   | 100    | 100    | 100    | 100
Squatting | 99.7   | 96.0   | 100    | 99.7
Bending   | 100    | 100    | 100    | 100
Bendp     | 100    | 100    | 100    | 100
Limp      | 100    | 100    | 100    | 100
Tripover  | 100    | 100    | 100    | 100
Sit-down  | 100    | 100    | 100    | 100
Lie-down  | 100    | 100    | 100    | 100
Rising    | 100    | 100    | 100    | 100
Standing  | 100    | 100    | 100    | 100
Sitting   | 100    | 100    | 100    | 100
Lying     | 99.7   | 100    | 96.6   | 100
All       | 100    | 99.7   | 99.7   | 100

Tab 7. Evaluation of the experimental result.

Activity  | ACC(%) | SPE(%) | PPV(%) | NPV(%)
Walking   | 98.8   | 98.4   | 89.6   | 99.8
Jogging   | 99.3   | 95.1   | 96.7   | 99.5
Squatting | 99.4   | 93.4   | 100    | 99.4
Bending   | 99.6   | 100    | 94.3   | 100
Bendp     | 99.3   | 93.0   | 98.1   | 99.4
Limp      | 99.4   | 96.6   | 96.6   | 99.7
Tripover  | 99.0   | 90.6   | 96.0   | 99.2
Sit-down  | 99.8   | 100    | 98.0   | 100
Lie-down  | 99.7   | 96.2   | 100    | 99.7
Rising    | 99.9   | 100    | 97.9   | 100
Standing  | 99.3   | 94.5   | 96.3   | 99.5
Sitting   | 99.1   | 96.5   | 93.2   | 99.7
Lying     | 99.6   | 100    | 94.4   | 100
All       | 99.4   | 96.5   | 96.2   | 99.7

Performance evaluation on the private dataset. The private dataset gave a result of 99.7%. True to our previous analysis, the limited-time (3 seconds) method of collecting activity data helped our data to be highly reliable. The negative impacts on classification results were minimised. These negative impacts had three causes: 1) another activity interfering when an activity took place over a long period of time; 2) the time difference of an activity between volunteers; 3) a data sampling process that had not been well controlled, leading to excess or missing activity data. These issues especially affect transitions such as rising, lie-down, sit-down, tripover, bending to pick up (bendp), and squatting. If the timing of these activities could not be known, the data describing them threatened to degrade the performance of the recognition models. Fig 18 shows that most of the actions were distinguished with a 100% accuracy rate, except for squatting (24/25). However, the classification results obtained were impressive. The problems encountered when classifying on the public dataset had almost been solved.
The classification results on the private dataset showed the strong performance of the proposed model. The evaluation indexes ACC, SPE, PPV, and NPV all reached over 96% (Tab 6). Except for squatting and lying, all activities had indicators reaching 100%. This was greater than the result for squatting, with ACC = 99.7%, SPE = 96%, PPV = 100%, NPV = 99.7%, and for lying, with ACC = 99.7%, SPE = 100%, PPV = 96.6%, NPV = 100%. However, the results of the recognition model evaluation on this dataset were impressive, with ACC = 100% and NPV = 100%.
The results when applying 5-fold cross validation on this dataset were also gradually different from those obtained on the public dataset. The proposed model got the best result with 99.1%, but the standard deviation increased to 0.6%. Other recognition models such as GBDT, SVM, KNN, and DT gave results of 98%, 98.4%, 97.4%, and 97%, respectively. Similar to the proposed model, the recognition models applying the DT, GBDT, and KNN algorithms had their standard deviation increase to 0.8%. Unlike them, the model applying the SVM algorithm had its standard deviation reduced to 0.5%. However, the resulting difference of 0.5%-0.8% was not significant. Overall, the results of the proposed model evaluation on both datasets were good.

5.2. Experimental evaluation

The experimental process was conducted on volunteers for a period of 30 seconds to 60 seconds, and the sampling frequency was 20 Hz. Volunteers wore the proposed wearables and performed all 13 activities according to a predefined scenario. Transitions such as sit-down, lie-down, rising, tripover, and bendp were limited to a maximum execution time of 3 seconds. The remaining activities took place in 10-15 seconds. A portion of the experimental procedure for finding mixed activity sequences is shown in Fig 19. Since we concentrated on recognizing when activities started in a dynamic state, activities could be discriminated more quickly. When tested with the real sequence of activities, the device was still able to detect the activity with high accuracy even though the sampling method for the activities was uniform and no additional activities were present.
Overall, the proposed model reached 96.1%. The results of classification versus actual observation were presented as a confusion matrix, as shown in Fig 20.


1/49 rising was mistaken for lying or sitting. This exhaustive set of training data from all types of activity.
was similar to squatting, where 1/61 squatting was As a result, the samples of test data would be different
mistaken for rising or lie-down. However, the rate from those of training data.
of occurrence of these errors was not significant. In The recognition model was unable to perform well
general, the proposed model gave good classification if the training dataset was insufficient. Therefore,
results. The experimental result helped to evaluate the we tried to collect activity data for a limited time
proposed model’s performance when the input was and surveyed many volunteers. In addition, it was
real-time data (Tab 7). The classification accuracy with a significant task to classify 13 activities. Previous
activities (ACC) and the correct prediction rate of non- studies investigated some repetitive activities such
occurrence actions (N P V ) were both above 99%. The as sitting, lying, standing, walking, walking upstairs
lowest correct prediction rate for actions (P P V ) was and walking downstairs as in [43]; walking, jogging,
89.6% for walking activity, while this reached 100% for upstairs, downstairs, sitting, standing in [20, 44]. With
squatting and lie-down activities. A sensitivity (SP E) of these activities, the application of static windows gave
over 90% showed that it was feasible to apply dynamic good results [41, 45, 46]. However, with state transition
windowing methods to detect activities, especially with activities, the static window had a major drawback: it
state transition activities. For example, bendp and lie- was not able to determine the activity time because the
down activities had corresponding sensitivity indexes time of these activities was not the same.
of SP E = 93% and SP E = 96.2%. Mover, rising and sit- Many related works in the field of HAR have been
down activities were both SP E = 100%. In general, the interested in real-time classification capabilities and the
proposed model had good evaluation indicators with application of different classification algorithms (Tab 8).
ACC = 99.4%, SP E = 96.5%, P P V = 96.2%, N P V = Thu et al. [12] applied 2 features (mean and standard
99.7% in reality. These overall evaluation indicators deviation) to a 3-axis accelerometer on each time
had a negligible difference with those calculated when window of size 6 seconds and combined it with
evaluating our model on public and private datasets. a decision tree algorithm. The result when applied
In addition, the proposed model, when conducting experimentally was 92%, and the accuracy was 95.2%.
experiments, has achieved relatively uniform indexes of Their device classified 6 activities, including sitting,
over 90%. standing, lying, walking, and jogging, in real-time.
The classification performance of our model improved significantly between the public dataset and real-time data. For example, squatting and tripover increased from 65% and 80.9% to 93.4% and 90.6%, respectively. This result was possible thanks to the improved sampling process, which increased the quality of the activity data, and to the dynamic window method, which captured the whole course of each activity.

Fig 19. The actual sequencing of activities in an experiment.

Fig 20. The confusion matrix synthesized from the experimental results with the proposed recognition model embedded in a wearable device.

6. Discussion

This work attempted to develop a recognition model for a real-time application that recognizes human activity. The dynamic window technique was merged and optimized with the chosen algorithm (random forest). As a result, the human activity recognition system is much more accurate. The results showed that the ability to classify real-time activities on wearables was good, at 96.1%, although slightly lower than the evaluation results on the public and private data. The first reason was the dispersion among feature vectors in real-time data. Meanwhile, with the public and private datasets, activities were closely monitored and feature quality was enhanced. Additionally, because activities occur in sequential order in daily life, the acquired data may be more homogeneous than real-time data. In reality, training and testing datasets derived from real-world scenarios may differ significantly. The existence of so many emergent scenarios made it hard to gather an exhaustive set of training data covering all types of activity. As a result, the samples of test data would differ from those of the training data.

The recognition model was unable to perform well if the training dataset was insufficient. Therefore, we tried to collect activity data within a limited time and surveyed many volunteers. In addition, classifying 13 activities was a demanding task. Previous studies investigated repetitive activities such as sitting, lying, standing, walking, walking upstairs, and walking downstairs in [43], and walking, jogging, upstairs, downstairs, sitting, and standing in [20, 44]. For these activities, the application of static windows gave good results [41, 45, 46]. However, with state transition activities, the static window had a major drawback: it could not determine the activity duration, because the duration of these activities is not the same.
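To make the contrast with static windows concrete, the sketch below outlines dynamic windowing in the spirit described above: at 20 Hz the classifier first looks at 1 second of data and only extends the window (up to 3 seconds) when it is not yet confident. The helper names (extract_features, forest.predict_proba), the 0.5-second growth step, and the confidence threshold are illustrative assumptions rather than the exact procedure used on the device.

```python
import numpy as np

FS = 20                                   # sampling frequency (Hz) used by the device
MIN_WIN, MAX_WIN, STEP = 1.0, 3.0, 0.5    # window bounds in seconds (assumed step)
CONF_THRESHOLD = 0.8                      # assumed confidence needed to accept a label early

def classify_dynamic(buffer, forest, extract_features):
    """Grow the window from 1 s to 3 s and stop as soon as the random forest
    is confident, so static postures are labelled quickly while longer state
    transition activities get a longer observation window.

    `buffer` holds the most recent 3 s of (N, 3) accelerometer samples,
    `forest` is a trained classifier exposing predict_proba(), and
    `extract_features` maps a window to the fixed-length feature vector.
    """
    best_label, best_conf = None, 0.0
    win = MIN_WIN
    while win <= MAX_WIN:
        segment = buffer[-int(win * FS):]                     # newest `win` seconds
        proba = forest.predict_proba([extract_features(segment)])[0]
        label, conf = int(np.argmax(proba)), float(np.max(proba))
        if conf >= CONF_THRESHOLD:
            return label, conf, win                           # confident enough: stop early
        if conf > best_conf:
            best_label, best_conf = label, conf
        win += STEP
    return best_label, best_conf, MAX_WIN                     # fall back to the best guess at 3 s
```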


Many related works in the field of HAR have been interested in real-time classification capabilities and in the application of different classification algorithms (Tab 8). Thu et al. [12] applied 2 features (mean and standard deviation) of the 3-axis accelerometer to each 6-second time window and combined them with a decision tree algorithm. The result when applied experimentally was 92%, and the accuracy was 95.2%. Their device classified 6 activities, including sitting, standing, lying, walking, and jogging, in real time. A three-level decision tree (DT) algorithm was built as a recognition model suitable for low-performance microcontrollers, but the classified activities were repetitive over time and had low complexity. Besides, the 6-second window for each classification was too long, leading to a large delay when activities changed. As the number of activities increased and more complex ones were added, it was difficult for their model to achieve high accuracy because of memory limitations and the features used. Similarly, in Yang's study [35], an STM32 microcontroller was used to embed a real-time recognition model based on the decision tree algorithm (C4.5). They used up to 16 features derived from 6 measures, including the mean, the magnitude of the acceleration of the three axes, the variance, the cumulative value, the skewness, and the coefficients. The classification accuracy for five activities, including sitting, walking, jumping, jogging, and cycling, was 90% on average. This established that the DT algorithm could not provide perfect accuracy, as their idea was that data acquired from several people would be trained and analyzed jointly. Consequently, it is possible that their algorithm is not optimal for the individual.

Embedding a recognition model on low-performance microcontrollers required optimization of the machine learning algorithm, the number of features, and the time window size, so the results were not good [12, 35]. In particular, the static window method in those works showed limitations in the quality of the features as well as in the number of activities that could be classified.
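For reference, the kind of per-window time-domain statistics discussed above can be computed as in the sketch below (a possible extract_features helper for the windowing sketch earlier). The particular combination shown happens to produce a 31-dimensional vector, but it should be read only as an illustration of such features (per-axis and magnitude statistics plus axis correlations), not as the paper's exact feature list.

```python
import numpy as np
from scipy.stats import skew, kurtosis

def extract_features(window):
    """Illustrative time-domain features for one (N, 3) accelerometer window.

    Not the paper's exact 31-dimensional vector, only the kind of per-axis
    and magnitude statistics such vectors are usually built from.
    """
    window = np.asarray(window, dtype=float)
    mag = np.linalg.norm(window, axis=1)            # acceleration magnitude per sample
    feats = []
    for signal in (window[:, 0], window[:, 1], window[:, 2], mag):
        feats += [
            signal.mean(),
            signal.var(),
            signal.min(),
            signal.max(),
            np.sqrt(np.mean(signal ** 2)),          # RMS
            skew(signal),
            kurtosis(signal),
        ]
    # Pairwise axis correlations capture posture/orientation information.
    feats += [np.corrcoef(window[:, i], window[:, j])[0, 1]
              for i, j in ((0, 1), (0, 2), (1, 2))]
    return np.array(feats)                           # 4*7 + 3 = 31 values here

# Example: 2 seconds of synthetic data at 20 Hz.
rng = np.random.default_rng(0)
print(extract_features(rng.normal(size=(40, 3))).shape)   # (31,)
```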


Another direction of real-time human activity recognition was presented in the paper [22]. Suto et al. compared offline and online classification results using 15 features extracted from accelerometer and gyroscope sensors. The wearable used was a phone running the Android operating system. The device was attached to the right ankle, and the test was performed by 3 volunteers (2 men and 1 woman). The study investigated 3 methods: artificial neural networks (ANN), convolutional neural networks (CNN), and 1NN (similar to KNN). Their results showed that the ANN model was highly accurate. CNN was not a good choice for the real-time classification problem because the training time was too large. Applying a time window of 3.88 seconds with 50% overlap achieved 88.8% when performing real-time classification of 7 activities, including sitting, standing, walking, jogging, running, and lying. The accuracy was low for three activities, jogging (8.1%), cycling (13.5%), and sitting (16.2%), when the first volunteer performed the test. This showed that data collected from the ankle position was not the best choice. Results improved for the next 2 volunteers, reaching over 80% for these actions. However, the recognition model using the ANN algorithm was difficult to apply as a real-time classifier directly on a microcontroller platform because of its large training time and high complexity. The usability of this model on phones was high, but the large phone size would cause discomfort when placed at the ankle. Besides, the battery power of the phone was not suitable for applications that must operate continuously for a long time.

Tab 8. Comparison with several related studies.

Study       Accuracy (%)   Real-time result (%)   Model   Features
Thu [12]    95.2           92                      DT      6
Yang [35]   90.1           90.0                    DT      16
Suto [22]   -              88.8                    ANN     15
Ours        99.4           96.1                    RF      31

The application of real-time human activity recognition on wearable devices is a problem with many challenges in terms of microcontroller memory usage, algorithm complexity, sampling rate, and the time needed to survey an activity. We used up to 31 time-domain features and a highly complex random forest algorithm to solve the classification problem for 13 routine actions. The results reached 96.1% with 99.4% accuracy, showing high applicability in practice. In addition, our model was aimed at complex problems such as analyzing the movements and activities of many different subjects, such as firefighters, searchers and rescuers, and people recovering from accidents. Their state of activity is closely related to their health status. Especially for firefighters, the environment often has influencing factors such as smoke, fire, and high temperatures. Activities are often intense and constantly changing. Monitoring the activities of firefighters will help the commander provide flexible and timely support. In addition, the fire environment is often unpredictable, so wearable-based monitoring is a viable option.

7. Conclusion

This work presented a real-time activity recognition system integrated into wearable devices. The dynamic window method and the random forest algorithm were combined to create the system using a novel methodology. We created a private dataset comprising 13 activities, in addition to the public dataset, including walking, jogging, squatting, bending, bendp, limp, tripover, sit-down, lie-down, rising, standing, and sitting. With the assistance of volunteers, this data was gathered via the ESP32-based wearable and the MPU6050 accelerometer sensor. The suggested recognition model was evaluated on these two datasets using both the confusion matrix and 5-fold cross-validation.
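As a sketch of this evaluation step, the snippet below trains a random forest on window-level feature vectors and derives a confusion matrix from 5-fold cross-validation with scikit-learn. The data here are random placeholders so the code runs on its own, and the forest hyperparameters are illustrative choices, not the values used for the deployed model.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import StratifiedKFold, cross_val_predict
from sklearn.metrics import accuracy_score, confusion_matrix

# X: (n_windows, 31) feature matrix, y: activity labels 0..12.
# Shapes and labels are assumptions; random placeholders keep the snippet runnable.
rng = np.random.default_rng(1)
X = rng.normal(size=(1300, 31))
y = rng.integers(0, 13, size=1300)

forest = RandomForestClassifier(
    n_estimators=50,      # illustrative; a small forest keeps the model MCU-friendly
    max_depth=10,
    random_state=1,
)

cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=1)
y_pred = cross_val_predict(forest, X, y, cv=cv)     # 5-fold cross-validated predictions

print("5-fold accuracy:", accuracy_score(y, y_pred))
print(confusion_matrix(y, y_pred))                  # 13x13 matrix, as in Fig 20
```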
The recognition model building process consists of two steps. First, we extracted feature vectors with 31 dimensions per time window of variable size. These vectors were labeled and combined with the random forest algorithm to build a recognition model for 13 activities. Next, this model was embedded on the ESP32 (a high-speed microcontroller) and tested with real-time data. Experimental results showed that, when applying the dynamic window method, the features were highly correlated with the activities to be classified. Static activities (sitting, standing, bending, lying) were quickly detectable after just 1 second of data collection. Transition activities (rising, sit-down, lie-down, squatting, tripover) were classified with over 99% accuracy. The classification results with real-time data reached 96.1%, and the accuracy of 99.4% showed that the proposed model had high performance. The research results are shared at https://github.com/daohieuictu/HAR-realtime-random-forest. To increase the recognition model's accuracy, we will combine different algorithms in future development. Besides, we intend to study a support system for firefighters with complex behaviors (rolling, crawling) and survival states (falling, unconscious).
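The excerpt does not detail how the trained forest is packed into the firmware; one common approach, sketched below, is to flatten each fitted decision tree into plain C if/else code (or threshold arrays) that the microcontroller toolchain can compile, and to majority-vote the per-tree outputs on the device. This generator is an illustration of that general technique under those assumptions, not the exporter used for this system.

```python
from sklearn.tree import _tree

def tree_to_c(decision_tree, fn_name):
    """Emit a C function `int fn_name(const float *f)` that replays one
    fitted scikit-learn decision tree as nested if/else statements."""
    t = decision_tree.tree_
    lines = [f"int {fn_name}(const float *f) {{"]

    def emit(node, indent):
        pad = "    " * indent
        if t.feature[node] == _tree.TREE_UNDEFINED:       # leaf: return majority class
            label = int(t.value[node][0].argmax())
            lines.append(f"{pad}return {label};")
            return
        feat, thr = t.feature[node], t.threshold[node]
        lines.append(f"{pad}if (f[{feat}] <= {thr:.6f}f) {{")
        emit(t.children_left[node], indent + 1)
        lines.append(f"{pad}}} else {{")
        emit(t.children_right[node], indent + 1)
        lines.append(f"{pad}}}")

    emit(0, 1)
    lines.append("}")
    return "\n".join(lines)

# Usage: after fitting `forest` (see the cross-validation sketch above), emit
# one function per tree and majority-vote their outputs in the firmware.
# for i, est in enumerate(forest.estimators_):
#     print(tree_to_c(est, f"tree_{i}"))
```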
8. Ethical Approval

Research volunteers are students and lecturers from our university. All agreed to participate in the experiment, and their information was kept confidential.

9. Conflict of interest

On behalf of all authors, the corresponding author states that there is no conflict of interest.

Acknowledgement. This research was the product of the scientific project [T2022-07-13] titled “Research and design a real-time activity recognition system to support human health monitoring”, sponsored by Thai Nguyen University, University of Information and Communication Technology (ICTU).

References

[1] F. Lanza, V. Seidita, and A. Chella, “Agents and robots for collaborating and supporting physicians in healthcare scenarios,” Journal of Biomedical Informatics, vol. 108, p. 103483, 2020. [Online]. Available: https://doi.org/10.1016/j.jbi.2020.103483
[2] M. M. Rodgers, V. M. Pai, and R. S. Conroy, “Recent advances in wearable sensors for health monitoring,” IEEE Sensors Journal, vol. 15, no. 6, pp. 3119–3126, 2015.
[3] N. C. Minh, T. H. Dao, N. Q. Huy, D. N. Tran, N. T. Thu, and D. T. Tran, “Evaluation of Smartphone and Smartwatch Accelerometer Data in Activity Classification,” in 2021 8th NAFOSTED Conference on Information and Computer Science. IEEE, 2021, pp. 33–38.
[4] L. Mo, F. Li, Y. Zhu, and A. Huang, “Human physical activity recognition based on computer vision with deep learning model,” in Conference Record - IEEE Instrumentation and Measurement Technology Conference, vol. 2016-July, 2016.
[5] N. Zhu, J. Cao, K. Shen, X. Chen, and S. Zhu, “A decision support system with intelligent recommendation for multi-disciplinary medical treatment,” ACM Transactions on Multimedia Computing, Communications and Applications, vol. 16, no. 1s, pp. 1–23, 2020.
[6] S. Chandra Mukhopadhyay, “Wearable Sensors for Human Activity Monitoring: A Review,” IEEE Sensors Journal, vol. 15, no. 3, pp. 1321–1330, 2015.
[7] T. H. Dao, V. C. Ngo, Q. H. Nguyen, D. N. Tran, and D. T. Tran, “Building Human Activity Recognition System using Accelerometers and Machine Learning Methods on Low Performance Microcontrollers,” Research and Development on Information and Communication Technology, vol. 12/2021, no. 2, pp. 69–76, 2021.
the accuracy of 99.4% showed that the proposed using Accelerometers and Machine Learning Methods on


[8] G. Biagetti, P. Crippa, L. Falaschetti, S. Orcioni, and C. Turchetti, “Human activity monitoring system based on wearable sEMG and accelerometer wireless sensor nodes,” BioMedical Engineering Online, vol. 17, no. S1, pp. 1–18, 2018. [Online]. Available: https://doi.org/10.1186/s12938-018-0567-4
[9] S. Chung, J. Lim, K. J. Noh, G. Kim, and H. Jeong, “Sensor data acquisition and multimodal sensor fusion for human activity recognition using deep learning,” Sensors (Switzerland), vol. 19, no. 7, 2019.
[10] G. Şengül, M. Karakaya, S. Misra, O. O. Abayomi-Alli, and R. Damaševičius, “Deep learning based fall detection using smartwatches for healthcare applications,” Biomedical Signal Processing and Control, vol. 71, p. 103242, 2022.
[11] Y. Zhao, R. Yang, G. Chevalier, X. Xu, and Z. Zhang, “Deep Residual Bidir-LSTM for Human Activity Recognition Using Wearable Sensors,” Mathematical Problems in Engineering, vol. 2018, 2018.
[12] N. T. Thu, T.-H. Dao, B. Q. Bao, D.-N. Tran, P. V. Thanh, and D.-T. Tran, “Real-Time Wearable-Device Based Activity Recognition Using Machine Learning Methods,” International Journal of Computing and Digital Systems, vol. 12, no. 1, pp. 321–333, 2022. [Online]. Available: https://dx.doi.org/10.12785/ijcds/120126
[13] D. N. Tran, T. N. Nguyen, P. C. P. Khanh, and D. T. Tran, “An IoT-based Design Using Accelerometers in Animal Behavior Recognition Systems,” IEEE Sensors Journal, vol. 12, no. 18, pp. 17515–17528, 2021.
[14] P. C. P. Khanh, D.-T. Tran, V. T. Duong, N. H. Thinh, and D.-N. Tran, “The new design of cows’ behavior classifier based on acceleration data and proposed feature set,” Mathematical Biosciences and Engineering, vol. 17, no. 4, pp. 2760–2780, 2020. [Online]. Available: https://www.aimspress.com/article/doi/10.3934/mbe.2020151
[15] V. Bianchi, M. Bassoli, G. Lombardo, P. Fornacciari, M. Mordonini, and I. De Munari, “IoT Wearable Sensor and Deep Learning: An Integrated Approach for Personalized Human Activity Recognition in a Smart Home Environment,” IEEE Internet of Things Journal, vol. 6, no. 5, pp. 8553–8562, 2019.
[16] N. Damodaran, E. Haruni, M. Kokhkharova, and J. Schäfer, “Device free human activity and fall recognition using WiFi channel state information (CSI),” CCF Transactions on Pervasive Computing and Interaction, vol. 2, no. 1, pp. 1–17, 2020. [Online]. Available: https://doi.org/10.1007/s42486-020-00027-1
[17] P. Kumar and S. Chauhan, “RETRACTED ARTICLE: Human activity recognition with deep learning: overview, challenges and possibilities,” CCF Transactions on Pervasive Computing and Interaction, vol. 3, no. 3, p. 339, 2021. [Online]. Available: https://doi.org/10.1007/s42486-021-00063-5
[18] J. Qi, P. Yang, M. Hanneghan, S. Tang, and B. Zhou, “A hybrid hierarchical framework for gym physical activity recognition and measurement using wearable sensors,” IEEE Internet of Things Journal, vol. 6, no. 2, pp. 1384–1393, 2019.
[19] P. Casale, O. Pujol, and P. Radeva, “Human activity recognition from accelerometer data using a wearable device,” in Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics), vol. 6669 LNCS, 2011, pp. 289–296.
[20] M. Milenkoski, K. Trivodaliev, S. Kalajdziski, M. Jovanov, and B. R. Stojkoska, “Real time human activity recognition on smartphones using LSTM networks,” in 2018 41st International Convention on Information and Communication Technology, Electronics and Microelectronics, MIPRO 2018 - Proceedings, pp. 1126–1131, 2018.
[21] P. Van Thanh, D. T. Tran, D. C. Nguyen, N. Duc Anh, D. Nhu Dinh, S. El-Rabaie, and K. Sandrasegaran, “Development of a Real-Time, Simple and High-Accuracy Fall Detection System for Elderly Using 3-DOF Accelerometers,” Arabian Journal for Science and Engineering, vol. 44, no. 4, pp. 3329–3342, 2019. [Online]. Available: https://doi.org/10.1007/s13369-018-3496-4
[22] J. Suto, S. Oniga, C. Lung, and I. Orha, “Comparison of offline and real-time human activity recognition results using machine learning techniques,” Neural Computing and Applications, vol. 32, no. 20, pp. 15673–15686, 2020. [Online]. Available: https://doi.org/10.1007/s00521-018-3437-x
[23] A. T. Özdemir and B. Barshan, “Detecting Falls with Wearable Sensors Using Machine Learning Techniques,” Sensors, vol. 14, pp. 10691–10708, 2014.
[24] T. H. Dao, M. H. Le, D. N. Tran, and D. T. Tran, “Xay dung mang giam sat hanh vi trong toa nha su dung cong nghe wifi [Building an in-building behavior monitoring network using WiFi technology],” in REV-ECIT 2021 (ISBN 978-604-80-5958-3), 2021, pp. 48–53.
[25] A. Mannini, S. S. Intille, M. Rosenberger, A. M. Sabatini, and W. Haskell, “Activity recognition using a single accelerometer placed at the wrist or ankle,” Medicine and Science in Sports and Exercise, vol. 45, no. 11, pp. 2193–2203, 2013.
[26] C. Torres-Huitzil and M. Nuno-Maganda, “Robust smartphone-based human activity recognition using a tri-axial accelerometer,” in 2015 IEEE 6th Latin American Symposium on Circuits and Systems, LASCAS 2015 - Conference Proceedings, 2015, pp. 2–5.
[27] D. Rodriguez-Martin, A. Samà, C. Perez-Lopez, A. Català, J. Cabestany, and A. Rodriguez-Molinero, “SVM-based posture identification with a single waist-located triaxial accelerometer,” Expert Systems with Applications, vol. 40, no. 18, pp. 7203–7211, 2013. [Online]. Available: http://dx.doi.org/10.1016/j.eswa.2013.07.028
[28] D. Naranjo-Hernández, L. M. Roa, J. Reina-Tosina, and M. Á. Estudillo-Valderrama, “SoM: A smart sensor for human activity monitoring and assisted healthy ageing,” IEEE Transactions on Biomedical Engineering, vol. 59, no. 12, pp. 3177–3184, 2012.
[29] S. Balli, E. A. Sağbaş, and M. Peker, “Human activity recognition from smart watch sensor data using a hybrid of principal component analysis and random forest algorithm,” Measurement and Control (United Kingdom), vol. 52, no. 1-2, pp. 37–45, 2019.


[30] R. Caruana and A. Niculescu-Mizil, “An empirical comparison of supervised learning algorithms,” in ACM International Conference Proceeding Series, vol. 148, 2006, pp. 161–168.
[31] A. Mannini and A. M. Sabatini, “Machine learning methods for classifying human physical activity from on-body accelerometers,” Sensors, vol. 10, no. 2, pp. 1154–1175, 2010.
[32] Q. V. Le, “Building high-level features using large scale unsupervised learning,” in ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings, 2013, pp. 8595–8598.
[33] J. Bassen, B. Balaji, M. Schaarschmidt, C. Thille, J. Painter, D. Zimmaro, A. Games, E. Fast, and J. C. Mitchell, “Reinforcement Learning for the Adaptive Scheduling of Educational Activities,” in Conference on Human Factors in Computing Systems - Proceedings, 2020, pp. 1–12.
[34] D. Guan, W. Yuan, Y. K. Lee, A. Gavrilov, and S. Lee, “Activity recognition based on semi-supervised learning,” in Proceedings - 13th IEEE International Conference on Embedded and Real-Time Computing Systems and Applications, RTCSA 2007, 2007, pp. 469–475.
[35] F. Yang and L. Zhang, “Real-time human activity classification by accelerometer embedded wearable devices,” in 2017 4th International Conference on Systems and Informatics, ICSAI 2017, 2017, pp. 469–473.
[36] A. Wang, G. Chen, J. Yang, S. Zhao, and C.-Y. Chang, “A Comparative Study on Human Activity Recognition Using Inertial Sensors in a Smartphone,” IEEE Sensors Journal, vol. 16, no. 11, pp. 4566–4578, 2016.
[37] L. Bao and S. S. Intille, “Activity Recognition from User-Annotated Acceleration Data,” in Pervasive Computing, A. Ferscha and F. Mattern, Eds. Berlin, Heidelberg: Springer Berlin Heidelberg, 2004, pp. 1–17.
[38] W. Xiao and Y. Lu, “Daily Human Physical Activity Recognition Based on Kernel Discriminant Analysis and Extreme Learning Machine,” Mathematical Problems in Engineering, vol. 2015, p. 790412, 2015. [Online]. Available: https://doi.org/10.1155/2015/790412
[39] R. Igual, C. Medrano, and I. Plaza, “A comparison of public datasets for acceleration-based fall detection,” Medical Engineering and Physics, vol. 37, no. 9, pp. 870–878, 2015. [Online]. Available: http://dx.doi.org/10.1016/j.medengphy.2015.06.009
[40] S. Abbate, M. Avvenuti, P. Corsini, J. Light, and A. Vecchio, “Monitoring of Human Movements for Fall Detection and Activities Recognition in Elderly Care Using Wireless Sensor Network: a Survey,” Wireless Sensor Networks: Application-Centric Design, pp. 1–22, 2010.
[41] A. T. Özdemir, “An analysis on sensor locations of the human body for wearable fall detection devices: Principles and practice,” Sensors (Switzerland), vol. 16, no. 8, 2016.
[42] X. Sun, L. Qiu, Y. Wu, Y. Tang, and G. Cao, “Sleepmonitor: monitoring respiratory rate and body position during sleep using smartwatch,” in Proceedings of the ACM on Interactive, Mobile, Wearable and Ubiquitous Technologies, vol. 1, no. 3, 2017, pp. 1–22.
[43] B. Fida, I. Bernabucci, D. Bibbo, S. Conforto, and M. Schmid, “Varying behavior of different window sizes on the classification of static and dynamic physical activities from a single accelerometer,” Medical Engineering and Physics, vol. 37, no. 7, pp. 705–711, 2015. [Online]. Available: http://dx.doi.org/10.1016/j.medengphy.2015.04.005
[44] K. Maswadi, N. A. Ghani, S. Hamid, and M. B. Rasheed, “Human activity classification using Decision Tree and Naïve Bayes classifiers,” Multimedia Tools and Applications, vol. 80, no. 14, pp. 21709–21726, 2021.
[45] T. H. Lee, A. Ullah, and R. Wang, “Bootstrap Aggregating and Random Forest,” Advanced Studies in Theoretical and Applied Econometrics, vol. 52, pp. 389–429, 2020.
[46] D. Anguita, A. Ghio, L. Oneto, X. Parra, and J. L. Reyes-Ortiz, “A Public Domain Dataset for Human Activity Recognition Using Smartphones,” in Proceedings of the 21st International European Symposium on Artificial Neural Networks, Computational Intelligence and Machine Learning, 2013, pp. 437–442.
