Abstract—With the advent of the Internet of Things (IoT), there have been significant advancements in the area of human activity recognition (HAR) in recent years. HAR is applicable to a wide range of applications such as elderly care, anomalous behaviour detection and surveillance systems. Several machine learning algorithms have been employed to predict the activities performed by a human in an environment. However, traditional machine learning approaches rely heavily on feature engineering methods to select an optimal set of features. On the contrary, it is known that deep learning models such as Convolutional Neural Networks (CNNs) can extract features and reduce the computational cost automatically. In this paper, we use CNN models to predict human activities from the Weizmann dataset. Specifically, we employ transfer learning to obtain deep image features and train machine learning classifiers. Our experimental results showed an accuracy of 96.95% using VGG-16. Our experimental results also confirmed the high performance of VGG-16 compared to the rest of the applied CNN models.

Index Terms—Activity recognition, deep learning, convolutional neural network.

I. INTRODUCTION

Human activity recognition (HAR) is an active research area because of its applications in elderly care, automated homes and surveillance systems. Several studies have been done on human activity recognition in the past. Existing work is either wearable based [1] or non-wearable based [2] [3]. Wearable-based HAR systems make use of wearable sensors that are attached to the human body and are therefore intrusive in nature. Non-wearable-based HAR systems do not require any sensors to be attached to the human or any device to be carried for activity recognition. Non-wearable approaches can be further categorised into sensor-based [2] and vision-based HAR systems [3]. Sensor-based technology uses RF signals from sensors such as RFID, PIR sensors and Wi-Fi to detect human activities. Vision-based technology uses videos or image frames from depth cameras or IR cameras to classify human activities. Sensor-based HAR systems are non-intrusive in nature but may not provide high accuracy. Therefore, vision-based human activity recognition systems have gained significant interest in recent times. Recognising human activities from streaming video is challenging. Video-based human activity recognition can be categorised as marker-based or vision-based according to the motion features used [4]. Marker-based methods make use of an optic wearable marker-based motion capture (MoCap) framework. They can accurately capture complex human motions, but the approach has some disadvantages: it requires optical sensors to be attached to the human body and demands multiple camera settings. In contrast, vision-based methods make use of RGB or depth images and do not require the user to carry any devices or to have any sensors attached. Therefore, this methodology is receiving more consideration nowadays, making the HAR framework simple and easy to deploy in many applications.

Most of the vision-based HAR systems proposed in the literature used traditional machine learning algorithms for activity recognition. However, traditional machine learning methods have been outperformed by deep learning methods in recent times [5]. The most common type of deep learning method is the Convolutional Neural Network (CNN). CNNs are widely applied in areas related to computer vision. A CNN consists of a series of convolution layers through which images are passed for processing. In this paper, we use CNNs to recognise human activities from the Weizmann dataset. We first extracted the frames for each activity from the videos. Specifically, we use transfer learning to obtain deep image features and train machine learning classifiers (a code sketch of this pipeline is given at the end of this section). We applied three different CNN models to classify activities and compared our results with existing works on the same dataset. In summary, the main contributions of our work are as follows:

1) We applied three different CNN models to classify human activities and achieved an accuracy of 96.95% using VGG-16.

2) We used transfer learning to leverage the knowledge gained from a large-scale dataset such as ImageNet [6] for the human activity recognition dataset.

The rest of the paper is organised as follows: Section II provides an overview of related work on video-based HAR systems. We provide an overview of transfer learning in Section III. Section IV outlines the research methodology, the sources of data and the research approach, and discusses the experimental results. Conclusions and future work are presented in Section V.
Video-based human activity recognition using deep learning models has gained a lot of interest in recent years [5]. Zhu et al. [4] proposed an action classification method by adding a mixed-norm regularization function to a deep LSTM

[Figure: a Convolutional Neural Network (CNN) maps input images through learned features to class scores such as Dog, Cat and Hand]
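The figure illustrates the CNN idea discussed earlier: a stack of convolution layers transforms raw images into learned features, which a final layer maps to class scores (Dog, Cat and Hand in the figure). A minimal, generic Keras sketch of such a stack is given below purely for illustration; it is not the architecture used in this paper.

# Generic CNN sketch (illustrative only; not the paper's architecture).
from tensorflow.keras import layers, models

model = models.Sequential([
    layers.Input(shape=(224, 224, 3)),         # RGB input frame
    layers.Conv2D(32, 3, activation="relu"),   # convolution layers learn visual features
    layers.MaxPooling2D(),
    layers.Conv2D(64, 3, activation="relu"),
    layers.MaxPooling2D(),
    layers.GlobalAveragePooling2D(),           # learned feature vector
    layers.Dense(3, activation="softmax"),     # class scores, e.g. Dog / Cat / Hand
])
model.compile(optimizer="adam",
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])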
activity. Table I shows the total number of frames per activity, based on the frames extracted for all 9 people. The entire dataset is divided into training (70%), validation (10%), and testing (20%) sets (a code sketch of this split follows Table III).

TABLE I
DATASET STATISTICS IN TERMS OF NUMBER OF FRAMES PER ACTIVITY

  Activity   Number of Frames
  Bend       639
  Jack       729
  Jump       538
  Run        346
  Side       444
  Skip       378
  Walk       566
  Wave1      653
  Wave2      624
  Total      4917

TABLE II
RESULTS ON ACTIVITY RECOGNITION BASED ON DIFFERENT CNN MODELS IN TERMS OF ACCURACY SCORE, PRECISION, RECALL, AND F1-SCORE

  Model          Accuracy (%)   Precision (%)   Recall (%)   F1-score (%)
  VGG-16         96.95          97.00           97.00        97.00
  VGG-19         96.54          97.00           97.00        96.00
  Inception-v3   95.63          96.00           96.00        96.00

TABLE III
PERFORMANCE COMPARISON USING WEIZMANN DATASET

  Model               Accuracy (%)
  VGG-16              96.95
  Cai et al. [19]     95.70
  Kumar et al. [20]   95.69
  Feng et al. [21]    94.10
  Han et al. [22]     90.00
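As a reference point, the 70/10/20 split described above and the metrics reported in Table II can be computed with standard tooling. The sketch below assumes scikit-learn; frame_features and labels are hypothetical arrays built from the 4917 extracted frames, clf is a classifier trained as in the earlier pipeline sketch, and weighted averaging of precision/recall/F1 is our assumption rather than the authors' stated choice.

# Sketch of the 70/10/20 split and Table II metrics (scikit-learn assumed).
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score, precision_recall_fscore_support

# Carve out 20% for testing, then 12.5% of the remaining 80% (= 10% overall)
# for validation, stratified so every activity keeps its class proportions.
X_train, X_test, y_train, y_test = train_test_split(
    frame_features, labels, test_size=0.20, stratify=labels, random_state=0)
X_train, X_val, y_train, y_val = train_test_split(
    X_train, y_train, test_size=0.125, stratify=y_train, random_state=0)

y_pred = clf.predict(X_test)  # clf: trained classifier from the pipeline sketch
precision, recall, f1, _ = precision_recall_fscore_support(
    y_test, y_pred, average="weighted")
print(f"accuracy={accuracy_score(y_test, y_pred):.4f} "
      f"precision={precision:.4f} recall={recall:.4f} f1={f1:.4f}")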