Article
Sensor Data Acquisition and Multimodal Sensor
Fusion for Human Activity Recognition Using
Deep Learning
Seungeun Chung *, Jiyoun Lim, Kyoung Ju Noh, Gague Kim and Hyuntae Jeong
SW · Contents Basic Technology Research Group, Electronics and Telecommunications Research Institute,
Daejeon 34129, Korea; [email protected] (J.L.); [email protected] (K.J.N.); [email protected] (G.K.);
[email protected] (H.J.)
* Correspondence: [email protected]; Tel.: +82-42-860-1623
Received: 4 March 2019; Accepted: 8 April 2019; Published: 10 April 2019
Abstract: In this paper, we present a systematic study of on-body sensor positioning and data acquisition details for Human Activity Recognition (HAR) systems. We build a testbed that consists of eight body-worn Inertial Measurement Unit (IMU) sensors and an Android mobile device for activity data collection. We develop a Long Short-Term Memory (LSTM) network framework to support training of a deep learning model on human activity data acquired in both real-world and controlled environments. From the experiment results, we identify that activity data sampled at rates as low as 10 Hz from four sensors placed on both wrists, the right ankle, and the waist is sufficient for recognizing Activities of Daily Living (ADLs), including eating and driving. We adopt a two-level ensemble model to combine the class-probabilities of multiple sensor modalities, and demonstrate that a classifier-level sensor fusion technique can improve the classification performance. By analyzing the accuracy of each sensor on different types of activity, we derive custom weights for multimodal sensor fusion that reflect the characteristics of individual activities.
Keywords: mobile sensing; sensor position; human activity recognition; multimodal sensor fusion;
classifier-level ensemble; Long Short-Term Memory network; deep learning
1. Introduction
As wearable devices become widely used, Human Activity Recognition (HAR) has become an emerging research area in mobile and wearable computing. Recognizing human activity leads to a deep understanding of individuals’ activity patterns and long-term habits, which contributes to developing a wide range of user-centric applications such as human-computer interaction [1,2], surveillance [3,4], video streaming [5,6], AR/VR [7], and healthcare systems [8,9]. Although activity recognition has been investigated to a great extent in the computer vision area [10–12], its application is limited to scenarios with pre-installed cameras that offer sufficient resolution and a guaranteed angle of view. The wearable sensor approach, on the other hand, allows continuous sensing during daily activities without spatio-temporal limitations [13,14], since the devices are ubiquitous and do not require infrastructural support.
Although many commercial fitness trackers and smartwatches adopt a wrist-mountable form factor for ease of access, no consensus has been reached regarding the positioning of sensors and their data acquisition details. For instance, many public datasets collect data from Inertial Measurement Units (IMUs) attached to different parts of the human body, such as the ear, chest, arm, wrist, waist, knee, and ankle, with various parametric settings [15–18]. Data acquisition details may vary depending on the type of activities investigated. For example, a sensor on the waist with a low sampling rate may be sufficient for recognizing simple (i.e., coarse-grained) activities such as walking and sitting. To detect combinatorial activities of finer granularity, including eating and driving, however, a single sensor attached to the waist may not give satisfactory performance. In this work, we focus on Activities of Daily Living (ADLs), including the eating and driving activities, and limit the scope of recognition to the basic human activities that constitute ADLs. We expect to obtain a more explainable and structured understanding of daily routines and personal lifestyles through analysis of these primary units of activity.
Previous work in the HAR literature relies on predictive modeling approaches from statistics and stochastic processes [19,20] (e.g., decision trees, kNN, SVM). These approaches require an expert-level understanding of human activities, in terms of medical and social science, for the data analysis and feature extraction that reflect the characteristics of an activity dataset. In this paper, we adopt a deep neural network architecture to recognize human activity from raw data input, which has shown promising results for interpreting human activities without complex data pre-processing for feature extraction [21–23].
This paper presents a preliminary study investigating the optimal position of sensors and their data acquisition details [24] by adopting a deep learning algorithm for HAR. Our goal is to find the optimal position and combination of on-body sensors, with an effective data sampling frequency, that impose minimal burden on the data collection procedure. The activity data is collected in both real-world and controlled environments through our testbed system, which includes eight IMU sensors attached to different parts of the human body. The data is then used to train and evaluate a Long Short-Term Memory (LSTM) [25] neural network classifier.
We apply a classifier-level sensor fusion technique to multimodal sensor data by taking the gyroscope and magnetometer into account in addition to the accelerometer. After the data from each sensor type is trained on the LSTM network, the prediction results from the base-learners are combined to generate stacked meta-features. Leveraging a Random Forest as the aggregator model, the meta-learner is trained on these meta-features. Finally, this two-level stacking and voting ensemble model makes predictions on new multimodal sensor data. To reflect the different characteristics of the multiple modalities in detecting ADLs, we analyze the recognition accuracy of each sensor modality on different types of activity and compute customized weights that best fit individual activities.
The key contributions of this research are:
• Build a testbed system and collect activity data, in both real-world and lab environments, containing high variations in the signal patterns.
• Analyze the activity data to empirically study data acquisition details regarding sensor position and parametric settings, and investigate the characteristics of each sensor modality in detecting different types of ADLs.
• Perform sensor fusion on multimodal sensor data by implementing a two-level ensemble model that combines single-modality results from the deep learning activity classifier.
The rest of this paper is organized as follows. Section 2 summarizes previous work closely related to activity recognition and sensor fusion research. Section 3 describes the testbed and experiment protocols used in the study. Section 4 discusses the details of our deep learning framework and the sensor fusion model developed for data analysis, and Section 5 presents the evaluation results and findings. Section 6 discusses current limitations and future work. Finally, Section 7 concludes the paper.
2. Related Work
Human activity recognition has been investigated using various sensor modalities [26].
Besides body-worn sensors, object sensors such as Radio Frequency Identification (RFID) tags are widely used to infer human activities based on the movements detected in smart home environments [27]. In addition, ambient sensors that capture changes in the environment, including light, sound, temperature, pressure, and humidity, are combined to detect activity events [28].
This section introduces literature related to on-body sensor-based HAR, in particular work investigating the placement of sensors, mobile sensing and public datasets, and multimodal HAR and its sensor fusion techniques.
sampling rate may fail to capture the intrinsic attributes of each activity that differentiate one activity from another. To the best of our knowledge, no consensus has been reached regarding the sampling frequency of motion-reactive sensors; previous work adopts rates commonly used in the literature for activity recognition from data acquired through smartphones. We therefore aim to investigate the optimal sampling frequency of body-worn sensors attached at different positions on the human body, so that activity data can be collected with minimal burden.
addition to the chest and waist of a test subject. Medical Velcro straps are used to fasten the devices to the body and reduce signal noise.
We implement a relay module that operates as a data collection hub between the IMU devices. As a large amount of activity data accumulates at each IMU device, we choose the Wi-Fi protocol to ensure fast and stable data transmission over high-bandwidth network channels. The relay module first exchanges operation commands with the IMU devices over the RF protocol to initiate time-synchronized one-to-many communication channels. After establishing a Wi-Fi ad-hoc network through the soft-AP setting, the relay module can then retrieve activity data from each IMU device. When data gathering is complete, the relay module transmits the collated data to an Android mobile device over the Wi-Fi network. The overall communication protocol is illustrated in Figure 2.
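To make the sequence concrete, the following is a minimal sketch of the relay-side control flow corresponding to steps (1)–(5) of Figure 2; all class and method names are illustrative stand-ins of our own, not the actual firmware interfaces:

```python
from dataclasses import dataclass, field
from typing import Dict, List

@dataclass
class ImuNode:
    node_id: str
    buffer: List[bytes] = field(default_factory=list)  # locally logged samples

class RelayModule:
    def __init__(self, nodes: List[ImuNode]):
        self.nodes = nodes
        self.collated: Dict[str, List[bytes]] = {}

    def run_session(self) -> None:
        # (1)-(3): low-bandwidth RF round to request the soft-AP setup and
        # confirm time-synchronized connections with every IMU node.
        for node in self.nodes:
            print(f"RF -> {node.node_id}: join soft-AP, sync clock")
        # (4): bulk retrieval of buffered activity data over the Wi-Fi
        # ad-hoc network established via the soft-AP setting.
        for node in self.nodes:
            self.collated[node.node_id] = list(node.buffer)
        # (5): forward the collated data to the Android mobile device.
        total = sum(len(v) for v in self.collated.values())
        print(f"Wi-Fi -> mobile: {total} packets")

RelayModule([ImuNode("wrist_L"), ImuNode("wrist_R"),
             ImuNode("ankle_R"), ImuNode("waist")]).run_session()
```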
Figure 1. Testbed configuration. (a) A body-worn Inertial Measurement Unit (IMU) device consists of (1) a 9-axis IMU sensor (i.e., a 3-axis accelerometer, gyroscope, and magnetometer), (2) a Micro Controller Unit (MCU), and (3) a micro-USB rechargeable Li-Po battery module; (b) Among the eight body-worn sensors (in circles), only four sensors, at the left and right wrists, waist, and ankle (in red circles), are adopted for the data analysis.
Figure 2. Testbed system consists of a mobile device, a relay module, and multiple IMU sensor devices.
(1) Request for soft-AP setting, (2) Radio Frequency (RF) connection request, (3) Response on connection
establishment, (4)–(5) Activity data transmission.
3.2. ADLs
Statistics Canada annually publishes the time-use survey of the General Social Survey (GSS) [37], which covers respondents’ time spent on a wide variety of day-to-day activities in Canada. In addition, Statistics Korea publishes Koreans’ average time use [38], categorized into mandatory, personal, and leisure activities. The authors of [18] surveyed ADLs and their occurrence in public HAR datasets, and we find that six primitive activity labels (walking, standing, sitting, lying, walking upstairs, and walking downstairs) can cover the daily activities listed in the time-use survey results. In addition to the primitive activities, we consider three combinatorial activities: eating, driving a car, and moving in a car as a passenger. For example, eating is an integration of the sitting activity (i.e., a primitive) and actions using cutlery during a meal (i.e., a secondary activity). Similarly, driving a car can be interpreted as a combination of sitting and actions related to handling the steering wheel, and moving in a car as a passenger can include additional actions such as using a smartphone while sitting. Table 1 specifies the nine activity labels used in our experiments.
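For downstream training code, the nine labels of Table 1 can be expressed as a simple label-to-index mapping; the exact label strings below are our own illustrative choices:

```python
# Nine activity labels used in the experiments (Table 1); string names are
# illustrative encodings, not the paper's exact wording.
ACTIVITY_LABELS = [
    "walking", "standing", "sitting", "lying",
    "walking_upstairs", "walking_downstairs",           # primitive activities
    "eating", "driving", "moving_in_car_as_passenger",  # combinatorial activities
]
LABEL_TO_INDEX = {label: i for i, label in enumerate(ACTIVITY_LABELS)}
```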
4. Data Analysis
We extract the activity data and the corresponding labels, and segment them using a fixed-width sliding window with 50% overlap. The sliding window size is the base unit applied throughout the data analysis, and can be interpreted as the resolution of activity. With a long window size, we can treat a stream of activity data as one phase and observe long-term activity patterns or trends, but we may fail to extract the distinguishing characteristics of each activity if discrete activities are contained in one segment. A short window size, on the other hand, may be suitable when targeting finer primitive actions that occur over a very short period of time, especially low-level hand gestures such as reach, grasp, and release.
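As a concrete illustration, the following is a minimal sketch of fixed-width sliding-window segmentation with 50% overlap, assuming per-sample labels that are reduced to a single window label by majority vote (the function name and the majority-vote rule are our own illustrative choices):

```python
import numpy as np

def segment(signal: np.ndarray, labels: np.ndarray,
            window: int, overlap: float = 0.5):
    """Fixed-width sliding-window segmentation with the given overlap.

    `signal` has shape (timesteps, channels); each window is labeled with
    the majority label of its samples (an assumption for this sketch).
    """
    step = max(1, int(window * (1.0 - overlap)))
    windows, window_labels = [], []
    for start in range(0, len(signal) - window + 1, step):
        windows.append(signal[start:start + window])
        values, counts = np.unique(labels[start:start + window],
                                   return_counts=True)
        window_labels.append(values[np.argmax(counts)])
    return np.stack(windows), np.array(window_labels)

# Example: a 2.5 s window at 10 Hz gives 25 samples per segment.
X, y = segment(np.random.randn(1000, 3), np.zeros(1000, dtype=int), window=25)
print(X.shape, y.shape)  # (82, 25, 3) (82,)
```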
We aim to perceive coarser-grained activities related to modes of locomotion, which usually last for several seconds. Previous work in the HAR literature empirically adopted a 5.12 s window size [35,39]. Based on the average free-living cadence (i.e., 76 ± 6 steps per minute [40]), we take the duration of one gait cycle, which ranges from 1.46 to 1.71 s, as a baseline. We then evaluate the recognition accuracy with sliding window sizes ranging from 1.4 to 5 s. The results in Figure 3 indicate that a 2.5 s sliding window outperforms the other parameter settings on our dataset.
Deep learning classification models can give better inference results from raw sensor data than from engineered features, since pre-processing transformations usually decrease the classification accuracy [21]. Thus, raw sensor data is fed directly into the LSTM cells for offline activity classification, with no additional feature engineering [22,23].
Figure 4. Many-to-one Long Short-Term Memory (LSTM) network architecture used for activity
classification with nine classes. n stands for the number of samples included in a 2.5 s window.
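For illustration, a minimal many-to-one LSTM classifier in the spirit of Figure 4 can be sketched as follows, assuming 25 timesteps (a 2.5 s window at 10 Hz), three input channels (one tri-axial sensor), and nine output classes; the hidden size of 64 and the optimizer settings are our own illustrative choices, not the paper’s exact hyperparameters:

```python
import tensorflow as tf

n_timesteps, n_channels, n_classes = 25, 3, 9  # 2.5 s at 10 Hz, one tri-axial sensor

model = tf.keras.Sequential([
    # Many-to-one: return_sequences defaults to False, so only the final
    # hidden state feeds the softmax output layer.
    tf.keras.layers.LSTM(64, input_shape=(n_timesteps, n_channels)),
    tf.keras.layers.Dense(n_classes, activation="softmax"),
])
model.compile(optimizer="adam",
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])
model.summary()
```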
from multimodal data as shown in Figure 5. For each sensor modality (accelerometer, gyroscope, and magnetometer), the prediction results of the LSTM network, which indicate the classification probabilities of each activity class, are used as meta-features. As a meta-learner, we first adopt a stacking ensemble with 10-fold cross-validation, which trains the aggregator model on the stacked class-probabilities. We also apply a simple soft-voting ensemble based on the weighted average of the class-probabilities, where the weights are determined by the prediction accuracy of the base-learners. Finally, the ensemble model makes combined predictions for new multimodal input data.
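A compact sketch of this classifier-level fusion is shown below. The class-probabilities are randomly generated placeholders for the LSTM base-learner outputs, and the soft-voting weights are proportional to the single-modality accuracies reported later in Figure 9; note that, for brevity, the meta-learner here is fit and evaluated on the same data, whereas the paper derives its meta-features via 10-fold cross-validation:

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

def stack_meta_features(probas):
    # probas: list of (n_samples, n_classes) class-probability arrays,
    # one per modality; stacking concatenates them into meta-features.
    return np.hstack(probas)

def soft_vote(probas, weights):
    # Weighted average of class-probabilities across modalities.
    avg = sum(w * p for w, p in zip(weights, probas))
    return np.argmax(avg, axis=1)

# Placeholder base-learner outputs for accelerometer, gyroscope, and
# magnetometer on 100 windows of nine-class data.
rng = np.random.default_rng(0)
probas = [rng.dirichlet(np.ones(9), size=100) for _ in range(3)]
y = rng.integers(0, 9, size=100)

# Stacking: a Random Forest meta-learner trained on stacked probabilities.
meta = RandomForestClassifier(n_estimators=100, random_state=0)
meta.fit(stack_meta_features(probas), y)
pred_stacking = meta.predict(stack_meta_features(probas))

# Soft voting with accuracy-proportional weights (cf. Figure 9).
pred_voting = soft_vote(probas, weights=[0.36, 0.31, 0.33])
```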
Figure 5. Classifier-level ensemble for multimodal sensor fusion. Prediction results from each sensor
modality are used as meta-features for the meta-learner.
5. Evaluation
In this section, we first evaluate the activity recognition performance under different configuration parameters using the accelerometer data. We empirically investigate the optimal sampling rate of the body-worn IMU sensors and their position on the body. Then, we evaluate the sensor fusion technique on the multimodal sensor data.
In classification problems, class imbalance is common because most datasets do not contain an exactly equal number of instances in each class, and this also holds for human activities in daily life. The F1-score is a commonly used measure in class-imbalanced settings [41]. In the multi-class classification domain, however, the micro-averaged F1-score is equivalent to accuracy, which measures the ratio of correctly predicted observations to total instances. Micro-F1 reflects the unbalanced distribution of samples across classes, placing more weight on the dominant classes. The macro-averaged F1-score, on the other hand, averages the per-class harmonic means of precision and recall, and thereby takes both false positives and false negatives into account on a per-class basis [42].
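The distinction can be checked numerically: in the single-label multi-class setting, the micro-averaged F1 coincides with accuracy, while the macro-averaged F1 is pulled down by poorly recognized minority classes. The toy labels below are purely illustrative:

```python
from sklearn.metrics import accuracy_score, f1_score

y_true = [0, 0, 0, 0, 1, 1, 2, 2]
y_pred = [0, 0, 0, 1, 1, 1, 0, 2]
print(accuracy_score(y_true, y_pred))             # 0.75
print(f1_score(y_true, y_pred, average="micro"))  # 0.75, identical to accuracy
print(f1_score(y_true, y_pred, average="macro"))  # ~0.74, per-class average
```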
We apply 10-fold cross-validation to the data so that the results generalize to an independent dataset. The dataset is randomly partitioned into 10 equal-sized subsamples; a single subsample (i.e., 10% of the total activity data) is held out as test data, while the remaining nine subsamples (i.e., 90% of the data) are used as training data. In this way, the system is tested on unseen data. We also perform leave-one-subject-out (LOSO) cross-validation for reference by leveraging data augmentation [43–46], which is widely adopted in deep learning research that requires large datasets for model training. We apply the jittering method to simulate additive sensor noise with random parameters, as described in [47], for label-preserving augmentation. In this way, we handle limited data availability and secure a large amount of training data to increase the robustness of our deep learning model. The mean performance over the cross-validation folds is reported in the remainder of this paper.
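A minimal sketch of the jittering augmentation, assuming zero-mean Gaussian noise, is shown below; the noise scale sigma is an illustrative value, since [47] draws its noise parameters randomly:

```python
import numpy as np

def jitter(window: np.ndarray, sigma: float = 0.05) -> np.ndarray:
    """Label-preserving jittering: add Gaussian noise to simulate sensor noise.

    Follows the augmentation scheme of Um et al. [47]; sigma = 0.05 is an
    illustrative choice, not the paper's exact parameter.
    """
    return window + np.random.normal(loc=0.0, scale=sigma, size=window.shape)

augmented = jitter(np.random.randn(25, 3))  # one 2.5 s tri-axial window at 10 Hz
```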
Figure 6. Recognition accuracy and execution time according to different sampling rates of the sensor.
Our findings are in line with the policies of many health trackers on the market, which usually support low-frequency sampling rates. For example, the Empatica E4 [48], a motion and physiological signal tracker, samples accelerometer data at 32 Hz. In addition, a motion sensor manufacturer [49] provides a calculation tool that estimates the maximum data acquisition time depending on the sampling rate of a sensor. According to this calculation, a MetaMotionC IMU sensor can operate for up to 11.6 h at 12.5 Hz, while the battery lasts only 5.8 h at a 25 Hz sampling frequency. As HAR applications in-the-wild require continuous and seamless data collection with day-long battery life, it is worthwhile to identify that low-frequency activity data is still effective for recognizing daily activities.
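The quoted MetaMotionC figures imply that logging time scales roughly inversely with the sampling rate; a back-of-the-envelope estimate (illustrative only, since real battery life also depends on radio usage and sensor configuration) is:

```python
# Extrapolate logging time from the quoted 11.6 h at 12.5 Hz, assuming an
# inverse relationship between sampling rate and acquisition time.
hours_at_12_5_hz = 11.6
for rate in (10, 12.5, 25, 50, 100):
    print(f"{rate:>5} Hz -> ~{hours_at_12_5_hz * 12.5 / rate:.1f} h")
# 25 Hz yields ~5.8 h, matching the manufacturer's figure above.
```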
learning model as shown in Figure 7b. With only two sensors, positioning one on the upper half (e.g., a wrist) and the other on the lower half (e.g., the ankle or waist) of the body, as in experiment no.6, performs better than placing both sensors on the upper half (experiment no.11) or the lower half (experiment no.9) of the body. Comparing the results of experiments no.2 and no.5 with no.11 shows the performance improvement gained by appending an additional sensor on the lower half of the body to sensor combinations on the upper half. Similarly, we observe a performance gain when attaching a sensor on the upper half of the body, especially on the right wrist (experiment no.3), to sensor combinations on the lower half (experiment no.9). From the perspective of the wrist sensor, we found that an additional single sensor on the ankle, the waist, or the opposite wrist, in that sequence, can influence the performance when two sensor locations are adopted, as shown in the experiment results [no.6, no.8, no.11] and [no.7, no.10, no.11].
Figure 7. Recognition performance according to different numbers and combinations of body-worn
sensors. Both accuracy and F1-score are adopted as the performance measure. (a) 10-fold cross-validation
result; (b) Leave-one-subject-out cross-validation result.
no.4], [no.6, no.7], and [no.8, no.10] pairs, we observed that a sensor attached to the right wrist tends to perform better than one on the left wrist. This is because all five test subjects were right-handed, and the eating activity is included in the experiment. Next, we explored the impact of sensors on the lower half of the body by comparing the results of the experiment pairs [no.2, no.5], [no.6, no.8], and [no.7, no.10], and noticed that the sensor on the right ankle plays a more critical role in activity recognition than the sensor on the waist.
Taking the trade-off between accuracy and feasibility into account, we conclude that placing sensors on the right wrist and right ankle gives reasonable performance when two sampling positions are adopted, and that an additional sensor may be placed on the waist to achieve better recognition accuracy.
Figure 8. Performance and execution time of meta-learner models used in stacking ensemble.
First, we evaluate the performance of each single sensor modality in activity recognition. Figure 9 illustrates the recognition accuracy of each sensor’s data applied to the LSTM base-learner model. The accelerometer shows the best performance (92.18%) among the IMU sensors, followed by the magnetometer (85.22%) and the gyroscope (78.33%). As each sensor measures a different physical quantity, its contribution to recognizing ADLs differs according to the characteristics of the activities. For example, an accelerometer is useful for measuring changes in velocity and position, and consequently for detecting small movements. By leveraging multiple axes, an accelerometer is also widely used as an absolute orientation sensor in the up-down plane. A gyroscope measures changes in orientation and rotational velocity, but requires calibration from a known orientation to achieve accuracy, owing to its high drift. A magnetometer is useful for determining absolute orientation relative to magnetic north with low drift over time, but shows poor accuracy for fast movements. Therefore, these multimodal sensors are usually combined to compensate for each other’s weaknesses.
Figure 9. Recognition accuracy and F1-score measure of each sensor modality using the LSTM network.
To investigate the contribution of each sensor to recognizing different activities, we evaluate the recognition accuracy of each sensor modality on individual activities, as shown in Figure 10. The activities illustrated in Figure 10a are a group of combinatorial activities based on sitting: driving, moving in a car as a passenger, and eating. As a vehicle in operation generates consistent vibration and periodic acceleration, both moving in a car as a passenger and driving are well distinguished from sitting in other environments. Additionally, eating includes repetitive hand movements while sitting, which incur various yaw, pitch, and roll rotations that can be effectively captured by the gyroscope. Although driving, moving in a car, and eating can be considered semi-static because they occur while sitting, each individual sensor modality shows high recognition accuracy independently, owing to these additional gestures. Both sensor fusion combinations (i.e., accelerometer + gyroscope and accelerometer + magnetometer) contribute similarly to the performance improvement.
Figure 10b presents the recognition accuracy for a group of static activities: sitting, standing, and lying. In a static position, the Earth’s gravity may be the dominant component of the accelerometer reading, which accordingly reflects the tilt angle of the body posture in the signal. The gyroscope produces no remarkable measurements, while the magnetometer measures the heading of the sensor. Therefore, the accelerometer shows outstanding performance in detecting the different static postures. In fact, fusing additional sensors with the accelerometer data even degrades the accuracy of detecting the sitting and standing activities.
Figure 10c summarizes the results for a group of activities that involve walking, including walking upstairs and downstairs. As walking involves highly dynamic physical movements, it is detected with high accuracy by each sensor modality, and sensor fusion also contributes to the recognition performance. However, distinguishing stair walking from walking on level ground produced equivocal results. As shown in Figure 11, both walking upstairs and walking downstairs are incorrectly predicted as walking with high probability, where the x-axis of the confusion matrix stands for the ground truth and the y-axis denotes the predicted labels. In our experiments, fusing the accelerometer and gyroscope data through the stacking ensemble led to a substantial improvement in detecting the stair walking activities.
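Note that the axis convention of Figure 11 differs from the scikit-learn default, in which ground truth occupies the rows; a transpose recovers the paper’s orientation. The toy labels below are illustrative:

```python
import numpy as np
from sklearn.metrics import confusion_matrix

y_true = np.array([0, 0, 0, 1, 1, 2, 2])  # 0: walking, 1: upstairs, 2: downstairs
y_pred = np.array([0, 0, 0, 0, 1, 0, 2])  # stair walking often mistaken for walking
# Transpose so ground truth lies on the x-axis and predictions on the y-axis,
# matching the convention described for Figure 11.
cm = confusion_matrix(y_true, y_pred).T
print(cm)
```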
Figure 10. Recognition accuracy of groups of activities using multiple sensor modalities. (a) A group of combinatorial activities based on sitting, including driving, moving in a car (as a passenger), and eating; (b) A group of static activities including sitting, standing, and lying; (c) A group of activities that involve walking, including walking upstairs and walking downstairs.
In the soft-voting ensemble, sensor modalities are weighted by their demonstrated accuracy shown in Figure 9, which covers all activity types. The resulting weight proportions of the accelerometer, gyroscope, and magnetometer are illustrated in Figure 12.
Figure 11. Confusion matrix for each activity. The x-axis stands for the ground truth and the y-axis
denotes predicted labels through voting ensemble.
Figure 12. Proportion of weights of each sensor is computed based on the recognition accuracy.
Figure 13. Recognition accuracy increases when applying weights that are customized to each activity.
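A sketch of how such activity-specific weights could be computed and applied is shown below, assuming a per-class accuracy matrix; the values are randomly generated around the overall single-modality accuracies and are illustrative, not the paper’s measured numbers:

```python
import numpy as np

rng = np.random.default_rng(1)
# Illustrative per-class accuracies (rows: accelerometer, gyroscope,
# magnetometer; columns: the nine activities), sampled around the overall
# single-modality accuracies; NOT the paper's measured values.
per_class_acc = np.clip(
    rng.normal([[0.92], [0.78], [0.85]], 0.05, size=(3, 9)), 0.0, 1.0)

# Per activity, a modality's weight is its per-class accuracy normalized
# across modalities, so each column of `weights` sums to 1.
weights = per_class_acc / per_class_acc.sum(axis=0)

def weighted_soft_vote(probas):
    # probas: list of three (n_samples, 9) class-probability arrays, one
    # per modality; each weight row is applied per class column.
    combined = sum(w[np.newaxis, :] * p for w, p in zip(weights, probas))
    return np.argmax(combined, axis=1)

probas = [rng.dirichlet(np.ones(9), size=4) for _ in range(3)]
print(weighted_soft_vote(probas))  # fused predictions for four sample windows
```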
Finally, Figure 14 summarizes the sensor fusion results using both the stacking and voting ensemble techniques. Overall, the contribution of the gyroscope to the recognition accuracy is higher than that of the magnetometer when combined with the accelerometer data. Although the gyroscope data is used without a calibration procedure, the results indicate that adding sensor data from additional modalities can improve the recognition accuracy of the nine activities to up to 93.07%. In addition, soft-voting achieves higher overall performance (94.47%) than hard-voting (93.48%), because it can give more weight to highly confident votes. By applying multimodal sensor fusion, we achieve a 2.48% performance improvement over single-modality activity recognition. Even considering the balance between the precision and recall measures, our model performs well and appears sufficiently competitive under the uneven class distribution.
Figure 14. Recognition accuracy and F1-score measure of stacking ensemble using various combinations
of sensors, and two voting ensemble techniques adopting all sensor modalities.
7. Conclusions
In this paper, we perform an empirical study of on-body sensor positioning and data acquisition details for HAR systems. We train on raw activity data collected in both real-world and controlled environments using a deep learning framework, which automatically learns features through neural networks without applying heuristic domain knowledge.
By leveraging an LSTM network that captures the temporal dependencies in the time-series activity data, we identify that a sampling rate as low as 10 Hz is sufficient for activity recognition. As a low sampling frequency reduces the system burden by preserving battery and storage capacity, it allows prolonged data collection for HAR applications, which usually operate on resource-constrained mobile devices. Our experiment results indicate that only two sensors, attached to the right wrist and right ankle, can provide reasonable performance in recognizing ADLs including eating and driving (either as a driver or as a passenger). In other words, placing one sensor on the upper half and the other on the lower half of the body is recommended as a practical sensor-position setting for further HAR research.
Additionally, we investigate the impact of sensor fusion by applying a two-level ensemble technique, comprising stacking and voting, to the multimodal sensor data, and demonstrate the resulting performance improvement. By analyzing the recognition accuracy of each sensor on different types of activity, we derive custom weights for the sensor modalities that reflect the characteristics of individual activities.
Author Contributions: Conceptualization, S.C., J.L., K.J.N., G.K. and H.J.; methodology, S.C., J.L., K.J.N., G.K.
and H.J.; software, S.C.; validation, S.C.; formal analysis, S.C., J.L. and H.J.; investigation, S.C. and J.L.; resources,
H.J.; data curation, S.C. and J.L.; writing–original draft preparation, S.C.; writing–review and editing, S.C., J.L.,
K.J.N., G.K. and H.J.; visualization, S.C.; supervision, H.J.; project administration, H.J.; funding acquisition, H.J.
Funding: This work was supported by Electronics and Telecommunications Research Institute (ETRI) grant
funded by the Korean government. [19ZS1130, Core Technology Research for Self-Improving Artificial
Intelligence System].
Conflicts of Interest: The authors declare no conflict of interest.
References
1. Ilbeygi, M.; Kangavari, M.R. Comprehensive architecture for intelligent adaptive interface in the field of
single-human multiple-robot interaction. ETRI J. 2018, 40, 483–498. [CrossRef]
2. Kim, D.; Rodriguez, S.; Matson, E.T.; Kim, G.J. Special issue on smart interactions in cyber-physical systems:
Humans, agents, robots, machines, and sensors. ETRI J. 2018, 40, 417–420. [CrossRef]
3. Song, Y.; Tang, J.; Liu, F.; Yan, S. Body Surface Context: A New Robust Feature for Action Recognition From
Depth Videos. IEEE Trans. Circuits Syst. Video Technol. 2014, 24, 952–964. [CrossRef]
4. Dharmalingam, S.; Palanisamy, A. Vector space based augmented structural kinematic feature descriptor for
human activity recognition in videos. ETRI J. 2018, 40, 499–510. [CrossRef]
5. Moon, J.; Jin, J.; Kwon, Y.; Kang, K.; Park, J.; Park, K. Extensible Hierarchical Method of Detecting Interactive
Actions for Video Understanding. ETRI J. 2017, 39, 502–513. [CrossRef]
6. Ji, Y.; Kim, S.; Kim, Y.J.; Lee, K.B. Human-like sign-language learning method using deep learning. ETRI J.
2018, 40, 435–445. [CrossRef]
7. Wen, R.; Nguyen, B.P.; Chng, C.B.; Chui, C.K. In Situ Spatial AR Surgical Planning Using projector-Kinect
System. In Proceedings of the Fourth Symposium on Information and Communication Technology
(SoICT ’13), Danang, Vietnam, 5–6 December 2013; ACM: New York, NY, USA, 2013; pp. 164–171. [CrossRef]
8. Jalal, A.; Kamal, S.; Kim, D. A Depth Video Sensor-Based Life-Logging Human Activity Recognition System
for Elderly Care in Smart Indoor Environments. Sensors 2014, 14, 11735–11759. [CrossRef] [PubMed]
9. Zheng, Y.; Ding, X.; Poon, C.C.Y.; Lo, B.P.L.; Zhang, H.; Zhou, X.; Yang, G.; Zhao, N.; Zhang, Y. Unobtrusive
Sensing and Wearable Devices for Health Informatics. IEEE Trans. Biomed. Eng. 2014, 61, 1538–1554.
[CrossRef] [PubMed]
10. Puwein, J.; Ballan, L.; Ziegler, R.; Pollefeys, M. Joint Camera Pose Estimation and 3D Human Pose Estimation
in a Multi-camera Setup. In Proceedings of the ACCV 2014, Singapore, 1–5 November 2014.
11. Kim, Y.; Baek, S.; Bae, B.C. Motion Capture of the Human Body Using Multiple Depth Sensors. ETRI J. 2017,
39, 181–190. [CrossRef]
12. Jalal, A.; Kim, Y.H.; Kim, Y.J.; Kamal, S.; Kim, D. Robust human activity recognition from depth video using
spatiotemporal multi-fused features. Pattern Recognit. 2017, 61, 295–308. [CrossRef]
13. Zhu, C.; Sheng, W.; Liu, M. Wearable Sensor-Based Behavioral Anomaly Detection in Smart Assisted Living
Systems. IEEE Trans. Autom. Sci. Eng. 2015, 12, 1225–1234. [CrossRef]
14. Bruno, B.; Mastrogiovanni, F.; Sgorbissa, A.; Vernazza, T.; Zaccaria, R. Analysis of human behavior
recognition algorithms based on acceleration data. In Proceedings of the 2013 IEEE International Conference
on Robotics and Automation, Karlsruhe, Germany, 6–10 May 2013; pp. 1602–1607. [CrossRef]
15. Vaizman, Y.; Ellis, K.; Lanckriet, G. Recognizing Detailed Human Context in the Wild from Smartphones
and Smartwatches. IEEE Pervasive Comput. 2017, 16, 62–74. [CrossRef]
16. Vaizman, Y.; Ellis, K.; Lanckriet, G.; Weibel, N. ExtraSensory App: Data Collection In-the-Wild with Rich
User Interface to Self-Report Behavior. In Proceedings of the 2018 CHI Conference on Human Factors in
Computing Systems (CHI ’18), Montreal, QC, Canada, 21–26 April 2018; ACM: New York, NY, USA, 2018;
p. 554. [CrossRef]
17. Anguita, D.; Ghio, A.; Oneto, L.; Parra, X.; Reyes-Ortiz, J.L. A Public Domain Dataset for Human
Activity Recognition Using Smartphones. In Proceedings of the 21st European Symposium on Artificial
Neural Networks, Computational Intelligence and Machine Learning (ESANN 2013), Bruges, Belgium,
24–26 April 2013.
18. Micucci, D.; Mobilio, M.; Napoletano, P. UniMiB SHAR: A Dataset for Human Activity Recognition Using
Acceleration Data from Smartphones. Appl. Sci. 2017, 7, 1101. [CrossRef]
19. Atallah, L.; Lo, B.; King, R.; Yang, G.Z. Sensor Positioning for Activity Recognition Using Wearable
Accelerometers. IEEE Trans. Biomed. Circuits Syst. 2011, 5, 320–329. [CrossRef]
20. Cleland, I.; Kikhia, B.; Nugent, C.; Boytsov, A.; Hallberg, J.; Synnes, K.; McClean, S.; Finlay, D. Optimal
Placement of Accelerometers for the Detection of Everyday Activities. Sensors 2013, 13, 9183–9200. [CrossRef]
21. Radu, V.; Lane, N.D.; Bhattacharya, S.; Mascolo, C.; Marina, M.K.; Kawsar, F. Towards Multimodal Deep
Learning for Activity Recognition on Mobile Devices. In Proceedings of the 2016 ACM International Joint
Conference on Pervasive and Ubiquitous Computing: Adjunct (UbiComp ’16), Heidelberg, Germany, 12–16
September 2016; ACM: New York, NY, USA, 2016; pp. 185–188. [CrossRef]
22. Chen, Y.; Xue, Y. A Deep Learning Approach to Human Activity Recognition Based on Single Accelerometer.
In Proceedings of the 2015 IEEE International Conference on Systems, Man, and Cybernetics, Kowloon,
China, 9–12 October 2015; pp. 1488–1492. [CrossRef]
23. Jiang, W.; Yin, Z. Human Activity Recognition Using Wearable Sensors by Deep Convolutional Neural
Networks. In Proceedings of the 23rd ACM International Conference on Multimedia (MM ’15), Brisbane,
Australia, 26–30 October 2015; ACM: New York, NY, USA, 2015; pp. 1307–1310. [CrossRef]
24. Chung, S.; Lim, J.; Noh, K.J.; Kim, G.; Jeong, H. Sensor Positioning and Data Acquisition for Activity
Recognition using Deep Learning. In Proceedings of the International Conference on Information and
Communication Technology Convergence (ICTC), Jeju Island, Korea, 17–19 October 2018; pp. 154–159.
[CrossRef]
25. Hochreiter, S.; Schmidhuber, J. Long Short-Term Memory. Neural Comput. 1997, 9, 1735–1780. [CrossRef]
26. Chavarriaga, R.; Sagha, H.; Calatroni, A.; Digumarti, S.T.; Tröster, G.; del R. Millán, J.; Roggen, D.
The Opportunity challenge: A benchmark database for on-body sensor-based activity recognition.
Pattern Recognit. Lett. 2013, 34, 2033–2042. [CrossRef]
27. Wang, J.; Chen, Y.; Hao, S.; Peng, X.; Hu, L. Deep learning for sensor-based activity recognition: A survey.
Pattern Recognit. Lett. 2019, 119, 3–11. [CrossRef]
28. Laput, G.; Zhang, Y.; Harrison, C. Synthetic Sensors: Towards General-Purpose Sensing. In Proceedings of
the 2017 CHI Conference on Human Factors in Computing Systems (CHI ’17), Denver, CO, USA, 6–11 May
2017; ACM: New York, NY, USA, 2017; pp. 3986–3999. [CrossRef]
29. Gupta, P.; Dallas, T. Feature Selection and Activity Recognition System Using a Single Triaxial Accelerometer.
IEEE Trans. Biomed. Eng. 2014, 61, 1780–1786. [CrossRef]
30. Wang, Z.; Wu, D.; Gravina, R.; Fortino, G.; Jiang, Y.; Tang, K. Kernel fusion based extreme learning machine
for cross-location activity recognition. Inf. Fusion 2017, 37, 1–9. [CrossRef]
31. Ding, R.; Li, X.; Nie, L.; Li, J.; Si, X.; Chu, D.; Liu, G.; Zhan, D. Empirical Study and Improvement on Deep
Transfer Learning for Human Activity Recognition. Sensors 2019, 19, 57. [CrossRef]
32. Ordóñez, F.; Roggen, D. Deep Convolutional and LSTM Recurrent Neural Networks for Multimodal
Wearable Activity Recognition. Sensors 2016, 16, 115. [CrossRef]
33. Murad, A.; Pyun, J.Y. Deep Recurrent Neural Networks for Human Activity Recognition. Sensors 2017,
17, 2556. [CrossRef]
34. Li, F.; Shirahama, K.; Nisar, M.A.; Köping, L.; Grzegorzek, M. Comparison of Feature Learning Methods for
Human Activity Recognition Using Wearable Sensors. Sensors 2018, 18, 679. [CrossRef]
35. Guo, H.; Chen, L.; Peng, L.; Chen, G. Wearable Sensor Based Multimodal Human Activity Recognition
Exploiting the Diversity of Classifier Ensemble. In Proceedings of the 2016 ACM International Joint
Conference on Pervasive and Ubiquitous Computing (UbiComp ’16), Heidelberg, Germany, 12–16 September
2016; ACM: New York, NY, USA, 2016; pp. 1112–1123. [CrossRef]
36. Peng, L.; Chen, L.; Wu, M.; Chen, G. Complex Activity Recognition using Acceleration, Vital Sign, and
Location Data. IEEE Trans. Mob. Comput. 2018. [CrossRef]
37. The General Social Survey—Statistics Canada. Available online: https://www150.statcan.gc.ca/n1/pub/
89f0115x/89f0115x2013001-eng.htm (accessed on 9 April 2019).
38. The time use Survey—Statistics Korea. Available online: http://kostat.go.kr/portal/eng/pressReleases/11/
6/index.board (accessed on 9 April 2019).
39. Reiss, A.; Stricker, D. Introducing a New Benchmarked Dataset for Activity Monitoring. In Proceedings of the
2012 16th International Symposium on Wearable Computers, Newcastle, UK, 18–22 June 2012; pp. 108–109.
[CrossRef]
40. Dall, P.; McCrorie, P.; Granat, M.; Stansfield, B. Step accumulation per minute epoch is not the same as cadence for
free-living adults. Med. Sci. Sports Exerc. 2013, 45, 1995–2001. [CrossRef]
41. Narasimhan, H.; Pan, W.; Kar, P.; Protopapas, P.; Ramaswamy, H.G. Optimizing the Multiclass F-Measure
via Biconcave Programming. In Proceedings of the 2016 IEEE 16th International Conference on Data Mining
(ICDM), Barcelona, Spain, 12–15 December 2016; pp. 1101–1106. [CrossRef]
42. Sokolova, M.; Lapalme, G. A systematic analysis of performance measures for classification tasks. Inf. Process.
Manag. 2009, 45, 427–437. [CrossRef]
43. Tran, T.; Pham, T.; Carneiro, G.; Palmer, L.; Reid, I. A Bayesian Data Augmentation Approach for Learning
Deep Models. In Advances in Neural Information Processing Systems; Curran Associates, Inc.: New York, NY,
USA, 2017; Volume 30, pp. 2797–2806.
44. Rogez, G.; Schmid, C. MoCap-guided Data Augmentation for 3D Pose Estimation in the Wild. In Advances
in Neural Information Processing Systems; Curran Associates, Inc.: New York, NY, USA, 2016; Volume 29,
pp. 3108–3116.
45. Cui, X.; Goel, V.; Kingsbury, B. Data Augmentation for Deep Neural Network Acoustic Modeling. IEEE/ACM
Trans. Audio Speech Lang. Proc. 2015, 23, 1469–1477. [CrossRef]
46. Mathur, A.; Zhang, T.; Bhattacharya, S.; Veličković, P.; Joffe, L.; Lane, N.D.; Kawsar, F.; Lió, P. Using Deep Data
Augmentation Training to Address Software and Hardware Heterogeneities in Wearable and Smartphone
Sensing Devices. In Proceedings of the 17th ACM/IEEE International Conference on Information Processing
in Sensor Networks (IPSN ’18), Porto, Portugal, 11–13 April 2018; IEEE Press: Piscataway, NJ, USA, 2018;
pp. 200–211. [CrossRef]
47. Um, T.T.; Pfister, F.M.J.; Pichler, D.; Endo, S.; Lang, M.; Hirche, S.; Fietzek, U.; Kulić, D. Data Augmentation
of Wearable Sensor Data for Parkinson’s Disease Monitoring Using Convolutional Neural Networks.
In Proceedings of the 19th ACM International Conference on Multimodal Interaction (ICMI ’17), Glasgow,
UK, 13–17 November 2017; ACM: New York, NY, USA, 2017; pp. 216–220. [CrossRef]
48. Empatica. Real-Time Physiological Signals E4 EDA/GSR Sensor. Available online: https://www.empatica.
com/get-started-e4 (accessed on 9 April 2019).
49. MbientLab. Smart Wireless Sensors and a Machine Learning Cloud for Motion Recognition. Available
online: https://mbientlab.com/ (accessed on 9 April 2019).
50. Nguyen, B.P.; Tay, W.; Chui, C. Robust Biometric Recognition From Palm Depth Images for Gloved Hands.
IEEE Trans. Hum.-Mach. Syst. 2015, 45, 799–804. [CrossRef]
© 2019 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access
article distributed under the terms and conditions of the Creative Commons Attribution
(CC BY) license (http://creativecommons.org/licenses/by/4.0/).