Human Activity Recognition by Machine Learning Methods
Ole M. Brastein 1), Roland Olsson 2), Nils-Olav Skeie 1) and Thomas Lindblad 3)
1) Høgskolen I Sørøst Norge, Porsgrunn
2) Høgskolen I Østfold
3) Kungliga Tekniska Högskolan, Stockholm
Abstract
Recognition of human activity from sensor data is a research field of great potential.
Giving autonomous systems the ability to identify what a human subject is doing at
a given time is highly useful in many industries, particularly in health care and
security monitoring. Our results, using a public domain dataset, show that the state-
of-the-art decision tree ensemble algorithm XGBoost gives an accuracy of 94.6%
validated on an independent test set. Previously published results using support
vector machines (SVM) gave an accuracy of 90.2%. As far as we know, our result is
the new state of the art for this data set. Recognition of human activity carries
potential privacy concerns, which to some degree constrain the choice of sensor
technology. Therefore, systems such as ours, which can identify activities from simple inertial sensors such as accelerometers and gyroscopes, are of particular interest.
Data from such inertial sensors are difficult to interpret using mechanistic models;
hence the field of Machine Learning is particularly interesting for this application.
1 Introduction
Human Activity Recognition (HAR) is the process of identifying what a person is doing
based on sensor readings. Activities are generally divided into classes, and the goal of
HAR is to identify which class of activity is performed. There are many applications of
HAR. For example, a HAR system could detect abnormal activity in a crowd of people
and allow identification of possible threatening situations, or detect that a person is in
need of assistance. Another situation where identifying human activity can be very
useful is the monitoring of elderly people in need of care [1].
Several types of sensor systems can be used to classify human activities. One
popular approach is computer vision systems. Another applicable sensor system is the
use of inertial sensors, e.g. accelerometers and gyroscopes. Most modern smartphones
have such sensors built into them. Modern silicon fabrication technology has allowed
micro-scale sensors, often referred to as Micro-Electro-Mechanical Systems (MEMS)
such as accelerometers and gyroscopes to be produced cheaply and in small packages.
This makes the identification of human activity from inertial data especially interesting,
as such sensor systems could be applied without any practical inconvenience to the
wearer of the sensors.
The data set used in this project was taken from J. L. Reyes-Ortiz et al. [2], and has been used in several projects [3-5]. The authors use machine learning methods to analyze the data. Support vector machines (SVM) are used in [3] to classify six different activities: walking, walking upstairs, walking downstairs, standing, sitting and laying. Classification results are on the order of 70-90%, with the laying activity having a particularly high success rate of 100% correct classifications.
Based on the promising results in [3] it is of interest to evaluate other
classification algorithms and compare their performance with the SVM results. In the
present work, the same data set as used in [3] is analyzed with three decision tree
classification methods, namely C5.0, Random Forest and XGBoost [6].
2 Methods and algorithms
XGBoost is generally the most accurate machine learning (ML) method in use today, and it has been used to win around 50% of Kaggle competitions. XGBoost is a state-of-the-art tree ensemble method, which contains an extensive set of regularization mechanisms to prevent overfitting. Random Forest (RF) is by far the most used tree ensemble method; hence it is of interest to compare results from Random Forest and XGBoost. The C5.0 algorithm with boosting is included here only for comparison with the more modern methods, XGBoost and RF.
2.3 XGBoost
XGBoost is an acronym for eXtreme Gradient Boosting [6]. The algorithm is an
implementation of a gradient boosting machine [8]. Gradient boosting machines are
based on an ensemble of decision trees, as discussed in section 2.1, where multiple
weak learner trees are used in combination as a collective to give better predictions than
individual trees can do [7]. In comparison with older gradient boosting algorithms,
XGBoost has superior regularization and better handling of missing values, as well as
much improved efficiency [6].
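As a brief sketch of the regularization referred to above (notation roughly following [6]; this is a summary, not a reproduction of the full derivation), XGBoost minimises a loss that adds an explicit complexity penalty for every tree f_k, where T is the number of leaves in a tree and w is its vector of leaf weights:

L(\phi) = \sum_i l(\hat{y}_i, y_i) + \sum_k \Omega(f_k), \qquad \Omega(f) = \gamma T + \tfrac{1}{2}\lambda \lVert w \rVert^2

The \gamma term penalises the number of leaves and the \lambda term the magnitude of the leaf weights, which is what discourages overly complex trees and thereby reduces overfitting.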
2.4 Boosting
The concept of boosting is to use an ensemble of weak learners to iteratively improve
on the results of the previous model. The AdaBoost algorithm, as discussed in [7],
consists of a sequence of weak classifiers. In each iteration, the best classifier is
identified based on the current sample weights. Any classification errors in the current
iteration receive more weight in the next iteration, while the inverse is true for correctly
classified samples. Each iteration focuses on a different aspect of the data, such that
regions of data that are difficult to classify are treated separately. In the final stage, an
ensemble is created from the combined overall sequence of classifiers. This ensemble is
likely to have a better prediction performance than any individual classifier [7].
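As a concrete sketch of this scheme for the two-class case (the standard AdaBoost.M1 formulation; the notation is ours and is not taken verbatim from [7]), with sample weights w_i and weak classifier G_m at iteration m:

\mathrm{err}_m = \frac{\sum_i w_i\, I(y_i \neq G_m(x_i))}{\sum_i w_i}, \qquad \alpha_m = \log\frac{1-\mathrm{err}_m}{\mathrm{err}_m}, \qquad w_i \leftarrow w_i \exp\!\big(\alpha_m\, I(y_i \neq G_m(x_i))\big)

Misclassified samples thus receive larger weights in the next iteration, and the final ensemble predicts G(x) = \mathrm{sign}\big(\sum_m \alpha_m G_m(x)\big).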
Figure 1 - This plot shows the correlation matrix for all 561 predictors. Such a large matrix is difficult to
analyze manually. The R package corrplot is used to organize the predictors such that predictors with
high correlation are plotted together in groups. Dark color signifies high correlation while low correlation
is plotted in white. While detailed analysis of such a plot is difficult when the number of predictors is large, it is useful for investigating the overall structure of the correlation matrix.
If multiple predictors contain essentially the same information, i.e. the same relevance to the classification process, tree algorithms can suffer degraded prediction performance, because the choice of which predictor gives the optimum split at each tree node becomes somewhat arbitrary [7]. To highlight the problem, a correlation analysis of the data is performed using the R function cor, which calculates the inter-predictor correlations of a data matrix; the corrplot function from the corrplot package is then used to plot the results.
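A minimal sketch of this analysis in R is given below; the object name train_x for the 561-predictor matrix is our assumption, and the hclust ordering option is one way to obtain the grouping of correlated predictors seen in Figure 1.

library(corrplot)

# Inter-predictor correlation matrix for the (assumed) predictor matrix train_x
C <- cor(train_x)

# Group highly correlated predictors via hierarchical clustering; suppress the
# 561 text labels, which would be unreadable at this size
corrplot(C, order = "hclust", method = "color", tl.pos = "n")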
The correlation matrix plot in Figure 1 shows that the correlation between
variables, or predictors, is extensive. The plot shows essentially three groups of
correlated predictors and it may be concluded that there are a large number of predictors
giving essentially redundant information.
Figure 2 - Score plot from PCA, including a color legend for each of the six activities. The plot shows that by using PC-1 and PC-2 it is possible to separate the static and dynamic activities.
In Figure 2, as shown by the legend, each of the six response classes is highlighted by gray-tone and marker shape. Dynamic activities are marked with dots while stationary activities are marked with crosses. The horizontal axis shows PC-1, the most dominant
PC, which accounts for 51% of the total variation in the data set. By inspection of this
plot, it is clear that there are two groups of samples. The three classes that contain
samples from subjects that are walking are clearly separated from the other three classes
where the subject is stationary. Furthermore, in the PC-2 direction, containing 7% of the
total variation, the three walking activities are further separated but with significant
overlap. This analysis shows that it is reasonable to continue with classification, since there is clearly information in the predictors that correlates with the activity class, i.e. there is some structure in the data that the algorithms can use to build classification models.
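A sketch of how such a score plot can be produced in R is shown below; the object names train_x and train_y, and the exact activity label strings, are illustrative assumptions rather than details taken from [2].

# PCA on centred and scaled predictors
pca <- prcomp(train_x, center = TRUE, scale. = TRUE)

# Proportion of total variation captured by the first two PCs
summary(pca)$importance["Proportion of Variance", 1:2]

# Score plot: dots for the dynamic (walking) classes, crosses for the static ones
dynamic <- grepl("WALKING", train_y)    # assumed label naming
plot(pca$x[, 1], pca$x[, 2],
     pch = ifelse(dynamic, 16, 3),
     col = gray.colors(6)[as.integer(factor(train_y))],
     xlab = "PC-1", ylab = "PC-2")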
Figure 4 - Tuning results from the Random Forest method. The plot shows prediction accuracy using 5-fold CV with 25 repeats against the tuning parameter mtry.
As shown in the initial test of RF in Figure 3, the performance is not drastically affected by the number of randomly selected variables mtry; however, the accuracy deteriorates if mtry is above 15 predictors. This value corresponds approximately to Breiman's suggestion [10] of setting mtry to the square root of P (~23). The lack of improvement with higher mtry, i.e. including more data, could be related to the high degree of correlation between the predictors in this data set. However, according to [7], this lack of sensitivity to mtry is a commonly seen result.
A more extensive test of RF, using a higher number of CV folds and repeats, is
shown in Figure 4. The optimal number of predictors in each iteration appears to be 15.
The difference in accuracy between the tested choices of tuning parameter is not large.
It is important to note again, as discussed in section 3.1, that the CV-based accuracy estimates are only intended for parameter tuning, and must not be taken as estimates of method accuracy.
Kuhn et al. recommend using at least 1000 trees in RF, but that would take roughly 10 days to compute with a reasonable amount of cross-validation (CV), say 25 repeats of 5-fold CV, for this data set; hence only 100 trees are used here.
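A sketch of the corresponding caret call is shown below; the object names and the exact mtry grid are illustrative assumptions, while ntree = 100 and the 25 repeats of 5-fold CV follow the settings described above.

library(caret)

ctrl <- trainControl(method = "repeatedcv", number = 5, repeats = 25)

# Random Forest with 100 trees; mtry tuned over a small grid around sqrt(561) ~ 23
rf_fit <- train(x = train_x, y = train_y,
                method    = "rf",
                ntree     = 100,          # passed through to randomForest
                tuneGrid  = data.frame(mtry = c(2, 5, 10, 15, 23, 50)),
                trControl = ctrl)

rf_fit$bestTune    # selected value of mtry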
4.4 XGBoost
The next method is XGBoost. The R implementation of this method is taken from [13]. This implementation is efficiently parallelized, which provides fast computation. The gradual stepping up in complexity is therefore not necessary; instead, a full run with a default tuning grid is executed immediately.
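A sketch of such a run is given below. The grid mirrors the parameter ranges visible in Figure 5 (50-150 boosting rounds, tree depths 1-3, two eta and two colsample_bytree values); note that, depending on the caret version, xgbTree may require additional grid columns, which are fixed here at illustrative defaults.

library(caret)

ctrl <- trainControl(method = "repeatedcv", number = 5, repeats = 25)

xgb_grid <- expand.grid(nrounds          = c(50, 100, 150),
                        max_depth        = 1:3,
                        eta              = c(0.3, 0.4),
                        colsample_bytree = c(0.6, 0.8),
                        gamma            = 0,          # fixed, illustrative defaults
                        min_child_weight = 1,
                        subsample        = 1)

xgb_fit <- train(x = train_x, y = train_y,
                 method    = "xgbTree",
                 tuneGrid  = xgb_grid,
                 trControl = ctrl)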
Figure 5 - Tuning results for XGBoost. For this method there are four tuning parameters. The figure consists of four sub-plots, each representing a particular setting of the eta and colsample parameters. The sub-plots show the maximum allowed tree depth on the abscissa and the prediction accuracy calculated from 5-fold CV repeated 25 times on the ordinate. Each sub-plot has three graphs, one for each of the three tested settings of boosting iterations (50, 100 and 150).
Despite using a higher number of tuning parameters, resulting in several times more runs than the previous algorithms, XGBoost finishes in ~4 hours. A higher number of boosting iterations is also used here, from 50 to 150. The efficiency of this implementation is further evidenced by the CPU load holding steady at 90% throughout the computation.
As shown in Figure 5, the tuning grid includes variations over the typical training parameters for XGBoost. The eta and colsample parameters have little effect on the model accuracy. The eta setting controls the learning rate of the algorithm by scaling the contribution of each new tree. The colsample_bytree setting determines the fraction of predictors that are sampled for each newly generated tree. For a tree depth of 1 there is some observable improvement when increasing eta from 0.3 to 0.4, but for higher tree depths there is no difference for either of these two parameters.
The maximum tree depth and the number of boosting iterations are more important. In all cases, the accuracy is in the range 95-99%. These results are found using 5-fold cross-validation repeated 25 times. As discussed in section 3.1, CV does not give independent sub-sets of the data used here. Hence the accuracy estimates can only be used for parameter tuning and do not represent accurate predictions of the method accuracy.
4.5 C5.0
C5.0 results are only used as a reference method for Random Forest and XGBoost. As
such, detailed discussion of the tuning for C5.0 is not included here. However, the
method was tuned using a similar approach to that presented for Random Forest and
XGBoost. The results of tuning the hyperparameters show that 30 boosting iterations is a good choice. Further, there is no significant improvement from applying winnowing.
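For completeness, a sketch of the corresponding tuning call is shown below; apart from the 30 boosting iterations and the winnowing on/off comparison mentioned above, the grid values are illustrative assumptions.

library(caret)

c50_grid <- expand.grid(trials = c(1, 10, 20, 30, 40),   # boosting iterations
                        model  = c("tree", "rules"),
                        winnow = c(TRUE, FALSE),
                        stringsAsFactors = FALSE)

c50_fit <- train(x = train_x, y = train_y,
                 method    = "C5.0",
                 tuneGrid  = c50_grid,
                 trControl = trainControl(method = "repeatedcv",
                                          number = 5, repeats = 25))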
Table 2 - Sensitivity and specificity for Random Forest on the test set
The classification accuracy on the test set is found to be 93.7% for the Random Forest model. The confusion matrix is shown in Table 1 and the sensitivity and specificity are shown in Table 2.
The classification accuracy on the test set is found to be 94.6% for the XGBoost model, which is a good result. The confusion matrix is shown in Table 3 and the sensitivity and specificity are shown in Table 4.
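The test-set figures above can be obtained from a fitted caret model with a few lines; the sketch below assumes the held-out test data are stored in test_x and test_y.

library(caret)

# Predict on the independent test set and summarise the result
pred <- predict(xgb_fit, newdata = test_x)
cm   <- confusionMatrix(pred, test_y)

cm$overall["Accuracy"]                          # overall test-set accuracy
cm$table                                        # confusion matrix
cm$byClass[, c("Sensitivity", "Specificity")]   # per-class measures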
From Table 5 we may conclude that the results are similar to those of [3]. The overall
accuracy is computed as a weighted sum over all test examples. As mentioned
previously, the XGBoost method performs slightly better than Random Forest and the rule-based C5.0.
An interesting observation from e.g. Table 3 is that there are three disjoint groups of activities with no misclassifications between the groups. The first group consists of the three walking activities, which are separate from the other three stationary activities. This is expected based on the results from the PCA, where the first PC clearly shows that these two types of activities, i.e. walking/dynamic vs. stationary, can be separated by only one PC. Further, the activity laying also has no misclassified samples. This can likely be traced back to the angle-of-gravitation predictor, since the laying activity is distinctly different in this respect compared to the other five classes of activities.
8 Bibliography