LSTM Networks For Mobile Human Activity Recognition: Yuwen Chen, Kunhua Zhong, Ju Zhang, Qilong Sun and Xueliang Zhao
Abstract: Many real-life mobile sensing applications are becoming available. These applications use the sensors embedded in smartphones to recognize human activities and so gain a better understanding of human behavior. In this paper, we propose an LSTM-based feature extraction approach to recognize human activities from tri-axial accelerometer data. Experimental results on the public WISDM Lab dataset indicate that our LSTM-based approach is practical and achieves 92.1% accuracy.

Keywords: activity recognition, deep learning, long short-term memory network

I. INTRODUCTION

Although human activity recognition (HAR) has been studied extensively in the past decade, HAR on smartphones is a relatively new area. HAR is a classical multivariate time series or sequence analysis problem, in which the task is to detect and classify those contiguous portions of sensor data streams that cover activities of interest for the target application. The predominant approach to HAR is based on a sliding window procedure, where a fixed-length analysis window is shifted along the signal sequence for frame extraction. Preprocessing then transforms the raw signal data into feature vectors, which are passed to statistical classifiers that eventually provide activity hypotheses.

Activity recognition has a wide range of uses in mobile applications, from fitness and health tracking to context-based advertising and employee monitoring. Context-aware applications can customize their behaviour based on the current activity. For example, [1, 2, 3, 4] have used the smartphone accelerometer to recognize movements such as walking and running. Advances in mobile sensing enable users to quantify their sleep and exercise patterns [5], monitor personal commute behaviors [6], track their emotional state [7], or even measure how long they spend queuing in retail stores [8].

II. RELATED WORK

Extensive work has been done in the area of HAR using smartphone sensors, which is summarized in [9].

Feature extraction for activity recognition (AR) is an important task. Statistical features such as mean, standard deviation, entropy and correlation coefficients are the most widely used hand-crafted features in AR [10]. Fourier transform and wavelet transform [11] are another two commonly used hand-crafted features, while the discrete cosine transform (DCT) has also been applied with promising results [12], as have auto-regressive model coefficients [13]. Recently, time-delay embedding [14] has been applied to activity recognition. It adopts nonlinear time series analysis to extract features from time series and shows a significant improvement in the recognition of periodic activities. However, the features from time-delay embedding are less suitable for non-periodic activities. The authors of [15] first introduced feature learning methods to the area of activity recognition; they used Deep Belief Networks (DBN) and PCA to learn features for activity recognition in ubiquitous computing. The authors of [16], following the work of [15], applied a shift-invariant sparse coding technique. The authors of [17] also used sparse coding to learn features.

The features used in most research on HAR are selected by hand. Designing hand-crafted features for a specific application requires domain knowledge [18] and may result in loss of information after feature extraction. This problem is not unique to activity recognition. It has been well studied in other research areas such as image recognition [19], where different types of features need to be extracted when trying to recognize handwriting as opposed to recognizing faces. In recent years, owing to advances in processing capabilities, a large number of Deep Learning (DL) techniques have been developed and successfully applied in recognition tasks [20, 21]. These techniques allow automatic extraction of features without any domain knowledge.

In this work, we propose an approach based on Long Short-Term Memory (LSTM) [22] to recognize activities in various application domains.

III. LSTM-BASED ACTIVITY RECOGNITION

Long short-term memory (LSTM) is a recurrent neural network (RNN) architecture published [22] in 1997 by Sepp Hochreiter and Jürgen Schmidhuber. Unlike traditional RNNs, an LSTM network is well suited to learn from experience to classify, process and predict time series when there are very long time lags of unknown size between important events. The LSTM model introduces a new structure called a memory cell (see Figure 1). A memory cell is composed of four main elements: an input gate, a neuron with a self-recurrent connection, a forget gate and an output gate. The self-recurrent connection has a weight of 1.0 and ensures that, barring any outside interference, the state of a memory cell can remain constant from one time step to another. The gates serve to modulate the interactions between the memory cell itself
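
To make the role of the unit-weight self-recurrent connection concrete, the following is a minimal sketch of one memory-cell update for a single scalar cell. It assumes the commonly used gated formulation, in which the new state is the forget gate times the old state plus the input gate times a candidate value, and the output is the output gate times tanh of the state. The function names `cell_update` and `lstm_step` and the layout of `params` are illustrative assumptions, not the exact model trained in this paper.

```python
import math


def sigmoid(x):
    """Logistic function used for the gate activations."""
    return 1.0 / (1.0 + math.exp(-x))


def cell_update(c_prev, i_gate, f_gate, o_gate, candidate):
    """Apply one memory-cell update given gate activations in [0, 1].

    The previous state enters through the self-recurrent connection
    (weight 1.0), scaled only by the forget gate.
    """
    c_new = f_gate * c_prev + i_gate * candidate
    h_new = o_gate * math.tanh(c_new)
    return c_new, h_new


def lstm_step(x, h_prev, c_prev, params):
    """One time step of a scalar LSTM cell.

    `params` maps each of "i", "f", "o", "g" to a hypothetical
    (input weight, recurrent weight, bias) triple.
    """
    def pre(name):
        w_x, w_h, b = params[name]
        return w_x * x + w_h * h_prev + b

    i_gate = sigmoid(pre("i"))
    f_gate = sigmoid(pre("f"))
    o_gate = sigmoid(pre("o"))
    candidate = math.tanh(pre("g"))
    return cell_update(c_prev, i_gate, f_gate, o_gate, candidate)


# Input gate closed and forget gate fully open: nothing interferes,
# so the cell state is carried over unchanged.
c, h = cell_update(c_prev=0.7, i_gate=0.0, f_gate=1.0, o_gate=1.0, candidate=0.3)
assert c == 0.7
```

In a trained network the gate activations come from `lstm_step` and are rarely exactly 0 or 1, so the state drifts slowly instead of staying perfectly constant.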
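
The sliding-window procedure outlined in the Introduction can be sketched as follows. This is a minimal illustration, assuming a recording is a list of (x, y, z) accelerometer samples. The helper names `sliding_windows` and `mean_feature`, the window length and the step are hypothetical choices, not settings reported in this paper.

```python
def sliding_windows(samples, window_size, step):
    """Cut a sequence of (x, y, z) accelerometer samples into fixed-length frames.

    Consecutive frames start `step` samples apart; a trailing remainder
    shorter than `window_size` is dropped.
    """
    if window_size <= 0 or step <= 0:
        raise ValueError("window_size and step must be positive")
    frames = []
    for start in range(0, len(samples) - window_size + 1, step):
        frames.append(samples[start:start + window_size])
    return frames


def mean_feature(frame):
    """Per-axis mean of one frame, one of the common hand-crafted statistics."""
    n = len(frame)
    return tuple(sum(sample[axis] for sample in frame) / n for axis in range(3))


# Toy recording: 6 tri-axial samples (values are made up for illustration).
recording = [(0.0, 9.8, 0.1), (0.2, 9.6, 0.0), (0.4, 9.9, 0.2),
             (0.6, 9.7, 0.1), (0.8, 9.8, 0.3), (1.0, 9.5, 0.2)]

# Window of 4 samples shifted by 2 samples, i.e. 50% overlap.
frames = sliding_windows(recording, window_size=4, step=2)
features = [mean_feature(f) for f in frames]
```

With hand-crafted features, each frame is reduced to statistics such as `mean_feature` before classification. A sequence model can instead take the raw frames as input, which is the motivation for learning features with an LSTM.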
FIGURE III. CLASS DISTRIBUTION FOR DATASETS

We train the LSTM model described above on the training data. The resulting accuracy is shown in Figure 4; the LSTM-based model achieves a classification accuracy of 95.1%. Our training metrics are not smooth and fluctuate because we use a small dataset and the distribution of our training data is not uniform. With a larger dataset we would expect a smoother blue line. The results suggest that we need more data, stronger regularization, or fewer model parameters.

FIGURE IV. ACCURACY

To analyze the results in more detail, we show the confusion matrix for the validation dataset. The confusion matrix indicates that many of the prediction errors are due to confusion between two activities: “Jogging” and “Upstairs”. This is because these two activities are relatively similar.

FIGURE V. CONFUSION MATRIX

V. CONCLUSION

In this paper, we have proposed an LSTM-based feature extraction approach. The experimental results have shown that the LSTM-based approach is practical and achieves a best accuracy of 92.1%.

Experiments with larger datasets are needed to further study the robustness of the proposed technique. Further improvements may be achieved by using more data and regularization.

REFERENCES

[1] M. Fahim, I. Fatima, S. Lee, and Y. T. Park, “EFM: evolutionary fuzzy model for dynamic activities recognition using a smartphone accelerometer,” Applied Intelligence, pp. 1–14, 2013.
[2] O. D. Lara and M. A. Labrador, “A mobile platform for real-time human activity recognition,” in Proceedings of the IEEE Consumer Communications and Networking Conference (CCNC '12), pp. 667–671, IEEE, 2012.
[3] J. R. Kwapisz, G. M. Weiss, and S. A. Moore, “Activity recognition using cell phone accelerometers,” ACM SIGKDD Explorations Newsletter, vol. 12, no. 2, pp. 74–82, 2011.
[4] L. Sun, D. Zhang, B. Li, B. Guo, and S. Li, “Activity recognition on an accelerometer embedded mobile phone with varying positions and orientations,” in Proceedings of the 7th International Conference on Ubiquitous Intelligence and Computing (UIC '10), pp. 548–562, Springer, Berlin, Germany.
[5] S. Consolvo et al., “Activity sensing in the wild: a field trial of UbiFit Garden,” in CHI ’08.
[6] S. Reddy et al., “Using mobile phones to determine transportation modes,” ACM Trans. Sen. Netw., vol. 6, no. 2, pp. 13:1–13:27, Mar. 2010.
[7] K. K. Rachuri et al., “EmotionSense: a mobile phones based adaptive platform for experimental social psychology research,” in UbiComp ’10.
[8] Y. Wang et al., “Tracking human queues using single-point signal monitoring,” in MobiSys ’14.
[9] O. D. Incel, M. Kose, and C. Ersoy, “A review and taxonomy of activity recognition on mobile phones,” BioNanoScience, vol. 3, no. 2, pp. 145–171, 2013.
[10] D. Figo, P. C. Diniz, D. R. Ferreira, and J. M. Cardoso, “Preprocessing techniques for context recognition from accelerometer data,” Personal and Ubiquitous Computing, vol. 14, no. 7, pp. 645–662, 2010.
[11] Z. He and L. Jin, “Activity recognition from acceleration data based on discrete cosine transform and SVM,” in Systems, Man and Cybernetics, SMC 2009, IEEE International Conference on, pp. 5041–5044, IEEE, 2009.
[12] T. Tamura, M. Sekine, M. Ogawa, T. Togawa, and Y. Fukui, “Classification of acceleration waveforms during walking by wavelet transform,” Methods of Information in Medicine, vol. 36, no. 4-5, pp. 356–359, 1997.
[13] Z.-Y. He and L.-W. Jin, “Activity recognition from acceleration data using AR model representation and SVM,” in Machine Learning and Cybernetics, 2008 International Conference on, vol. 4, pp. 2245–2250, IEEE, 2008.
[14] J. Frank, S. Mannor, and D. Precup, “Activity and gait recognition with time-delay embeddings,” in AAAI, 2010.
[15] T. Plötz, N. Y. Hammerla, and P. Olivier, “Feature learning for activity recognition in ubiquitous computing,” in Proceedings of the Twenty-Second International Joint Conference on Artificial Intelligence, vol. 2, pp. 1729–1734, AAAI Press, 2011.
[16] C. Vollmer, H.-M. Gross, and J. P. Eggert, “Learning features for activity recognition with shift-invariant sparse coding,” in V. Mladenov, P. Koprinkova-Hristova, G. Palm, A. E. P. Villa, B. Appollini, and N. Kasabov (eds.), ICANN 2013, LNCS, vol. 8131, pp. 367–374, Springer, Heidelberg, 2013.
[17] S. Bhattacharya, P. Nurmi, N. Hammerla, and T. Plötz, “Using unlabeled data in a sparse-coding framework for human activity recognition,” arXiv preprint arXiv:1312.6995, 2013.
[18] T. Plötz, N. Y. Hammerla, and P. Olivier, “Feature learning for activity recognition in ubiquitous computing,” in Proceedings of the Twenty-Second IJCAI, Volume Two, pp. 1729–1734, AAAI Press, 2011.
[19] D. G. Lowe, “Object recognition from local scale-invariant features,” in Computer Vision, 1999, The Proceedings of the Seventh IEEE International Conference on, vol. 2, pp. 1150–1157, IEEE, 1999.
[20] U. Bagci and L. Bai, “A comparison of Daubechies and Gabor wavelets for classification of MR images,” in Signal Processing and Communications, 2007, ICSPC 2007, IEEE International Conference on, pp. 676–679, IEEE, 2007.
[21] Y. Tang, R. Salakhutdinov, and G. Hinton, “Robust Boltzmann machines for recognition and denoising,” in Computer Vision and Pattern Recognition (CVPR), 2012 IEEE Conference on, pp. 2264–2271, IEEE, 2012.
[22] S. Hochreiter and J. Schmidhuber, “Long short-term memory,” Neural Computation, vol. 9, no. 8, pp. 1735–1780, 1997.
[23] G. M. Weiss, J. R. Kwapisz, and S. Moore, “Activity recognition using cell phone accelerometers,” Proceedings of the Fourth International Workshop on Knowledge Discovery from Sensor Data, vol. 12, no. 12, pp. 74–82, 2010.