0% found this document useful (0 votes)

19 views10 pages

Buffelli 2021

This document presents a novel attention-based deep learning framework, TrASenD, for human activity recognition (HAR) that significantly outperforms existing models by over 7% on the F1 score across multiple datasets. The framework addresses the limitations of recurrent neural networks (RNNs) in capturing long-term dependencies and introduces a transfer learning technique for user adaptation, improving prediction accuracy by an average of 6% for specific users. The extensive experimental evaluation demonstrates the superior capabilities of TrASenD compared to current state-of-the-art methods in HAR.

Uploaded by

ramya

We take content rights seriously. If you suspect this is your content, claim it here.

Available Formats

Download as PDF, TXT or read online on Scribd

0% found this document useful (0 votes)

19 views10 pages

Buffelli 2021

Uploaded by

ramya

We take content rights seriously. If you suspect this is your content, claim it here.

Available Formats

Download as PDF, TXT or read online on Scribd

You are on page 1/ 10

13474 IEEE SENSORS JOURNAL, VOL. 21, NO.

12, JUNE 15, 2021

Attention-Based Deep Learning Framework

for Human Activity Recognition With
User Adaptation
Davide Buffelli and Fabio Vandin

Abstract —Sensor-based human activity recognition (HAR)

requires to predict the action of a person based on sensor-
generated time series data. HAR has attracted major interest
in the past few years, thanks to the large number of appli-
cations enabled by modern ubiquitous computing devices.
While several techniques based on hand-crafted feature engi-
neering have been proposed, the current state-of-the-art is
represented by deep learning architectures that automati-
cally obtain high level representations and that use recurrent
neural networks (RNNs) to extract temporal dependencies in
the input. RNNs have several limitations, in particular in deal-
ing with long-term dependencies. We propose a novel deep
learning framework, TrASenD, based on a purely attention-
based mechanism, that overcomes the limitations of the state-of-the-art. We show that our proposed attention-based
architecture is considerably more powerful than previous approaches, with an average increment, of more than 7% on
the F1 score over the previous best performing model. Furthermore, we consider the problem of personalizing HAR
deep learning models, which is of great importance in several applications. We propose a simple and effective transfer-
learning based strategy to adapt a model to a specific user, providing an average increment of 6% on the F1 score on
the predictions for that user. Our extensive experimental evaluation proves the significantly superior capabilities of our
proposed framework over the current state-of-the-art and the effectiveness of our user adaptation technique.
Index Terms — Activity recognition, deep learning, multimodal sensors, pattern recognition.

I. I NTRODUCTION based on trial-and-error, require a lot of human effort, and

therefore time, and are not guaranteed to generalize well
S ENSOR-BASED human activity recognition (HAR) is a
time series classification task that involves predicting the
movement or action of a person (e.g. walking, running, etc.)
to unseen subjects. Deep learning enables automatic feature
extraction and can hierarchically compose features to obtain
based on sensor data. HAR has many practical applications, high level representations, which have more discriminative
such as fitness tracking, video surveillance, and gesture recog- power than handcrafted features based on human expertise.
nition. Despite being a well-studied and mature problem, HAR These properties allow deep learning models to be more robust
has been a very active research area in recent years, due and with higher generalization properties, and have shown
to the rise of ubiquitous computing enabled by smartphones, great results in HAR [32], [53]. In particular, the state-of-
wearables, and Internet-of-Things devices [5], [9], [32], [57]. the-art is given by the DeepSense framework [55], with an
Several previously proposed approaches tackled the problem architecture based on convolutional neural networks (CNNs)
by hand-crafting features [15], [47]. These kind of approaches, and recurrent neural networks (RNNs).
RNNs have been used in several domains to capture
Manuscript received December 4, 2020; accepted March 6, 2021. Date
of publication March 22, 2021; date of current version June 14, 2021. sequential relationships, but present some shortcomings when
This work was supported in part by the Italian Ministry of Education, learning from long input sequences [11], [19]. An attractive
University and Research (MIUR), through the Research Projects of strategy to enhance, or replace, RNNs is provided by attention
National Interest (PRIN) initiative under Grant 20174LF3T8 “Efficient
Algorithms for HArnessing Networked Data (AHeAD)”, and the initiative models. The main idea behind attention mechanisms is to
Departments of Excellence under Law 232/2016, and in part by the act as a memory-access mechanism that allows the model
University of Padova under Grant "STARS 2017: Algorithms for Inferential to selectively access the most important parts of the input
Data Mining” and Grant “SID 2020: RATED-X”. The associate editor
coordinating the review of this article and approving it for publication was sequence based on the current context. Attention models alle-
Prof. Aime Lay-Ekuakille. (Corresponding author: Fabio Vandin.) viate RNNs difficulties in learning from long input sequences,
The authors are with the Department of Information Engineering, Uni- and successive developments have led to NLP models based
versity of Padova, 35131 Padova, Italy (e-mail: [email protected];
[email protected]). solely on attention mechanisms [51]. To the best of our
Digital Object Identifier 10.1109/JSEN.2021.3067690 knowledge, the use of pure attention models in deep learning

1558-1748 © 2021 IEEE. Personal use is permitted, but republication/redistribution requires IEEE permission.
See https://www.ieee.org/publications/rights/index.html for more information.

Authorized licensed use limited to: Univ of Calif Santa Barbara. Downloaded on June 25,2021 at 01:06:24 UTC from IEEE Xplore. Restrictions apply.
BUFFELLI AND VANDIN: ATTENTION-BASED DEEP LEARNING FRAMEWORK FOR HAR WITH USER ADAPTATION 13475

architectures to extract temporal dependencies in multimodal example of HAR is given by the fall detection functionality:
data, such as multi-sensor HAR data, has not been explored. given the 3D time series data extracted by an accelerometer,
The human activity recognition task is highly “personal”, detect if the person has fallen and needs assistance.
in the sense that a single smartphone or smartwatch is usually In HAR, sensors usually collect multi-dimensional time
used by just one person, and the style of walking, running series data, which presents important challenges:
or climbing stairs is peculiar to each individual. It is then • Noise: data coming from sensors is usually noisy.
desirable to have deep learning techniques that can be adapted • Heterogenous sensing rates: different sensors may have
to a specific user. However, the exploration of personalized different sensing rates.
deep learning models for HAR has been hitherto ignored. • User generalization and adaptation: every person has
a specific style of walking, running, jumping, etc. It
A. Our Contribution is then important to create systems that are capable of
We expand the deep learning approaches for HAR with a generalizing to new users, but at the same time with the
new purely attention-based framework, T R AS EN D, that builds possibility of adapting to the specific style of a given
upon the state-of-the-art while significantly outperforming it person.
on three different HAR datasets. T R AS EN D builds on the The approach proposed in this paper addresses these chal-
observation that RNNs do not provide the best way to capture lenges by: (1) using data augmentation to train models that
the temporal relationships in the data, and uses a purely are robust to noise, (2) preprocessing data to eliminate depen-
attention-based strategy. We also consider other variants of dencies on sensing rates, and (3) taking advantage of the
DeepSense, designed by replacing RNNs with more powerful generalization capabilities of deep learning models, and further
attention enhanced RNNs mechanisms to capture temporal proposing an effective user adaptation procedure.
dependencies, and we show that while they do perform bet-
ter than DeepSense, they are still less performing than our
III. R ELATED W ORK
purely attention based T R AS EN D. In addition, we propose a
personalization framework to adapt the model to a specific We divide the previous work related to our contribu-
user over time, increasing the accuracy of the predictions for tions in three sections: deep learning approaches for HAR
the user. To achieve this result we use a lightweight transfer (Section III-A), attention mechanisms (Section III-B), and
learning approach that continues the training of only a small transfer learning and personalization for HAR (Section III-C).
portion of the model with data acquired from the user. We
empirically show that this approach significantly improves the A. Deep Learning for HAR
performance of the model on a specific user.
Following the taxonomy defined in recent surveys
Our contributions can be summarized as follows:
[32], [53], deep learning techniques for sensor-based HAR
• We make use of a purely attention-based mechanism to fall into three main categories. The first category includes
develop a novel deep learning framework, T R AS EN D, for architectures composed of RNNs only (e.g., [4], [16], [20],
multimodal temporal data. [23]). The second category includes architectures based on
• We extensively evaluate T R AS EN D against the current
CNNs only, and can be further divided in two subcategories
state-of-the-art and some of its variants that we design. of models: Data Driven and Model Driven [53]. Data Driven
We show that T R AS EN D significantly outperforms other models (e.g., [18], [34], [42]) use CNNs directly on the raw
methods on 3 different HAR datasets, with an average data coming from the sensors (each dimension of the data
increment of more than 7% on the F1 score over the is seen as a channel). Model Driven approaches (e.g., [25],
previous best performing model. We also test the impact [36], [45], [48], [58]) first preprocess the data to get a grid-
of data augmentation, showing that it plays an important like structure, and then use CNNs. Recent work in the latter
role on the generalization capabilities of the models. category focuses on hybrid models: [39] combines multiple
• We propose a new transfer learning technique to adapt a
CNN models with a fusion layer, that merges the features
model to specific user, in order to exploit the “personal” extracted by the different models, while [2] uses a CNN to
nature of the HAR task. extract information from sensors, which is then combined with
• We empirically prove the effectiveness of our person-
an image segmentation model to produce spinal cord injury
alization technique, showing that it leads to an average predictions. The third category is represented by those models
increment of 6% on the F1 score on the predictions for a that use both CNNs and RNNs [28], [33], [45], [50], [55],
specific user. We further show that it is effective on every [56]. Finally, other deep learning techniques used for HAR are
model we analyze, and on each dataset. autoencoders [3], [52], and Restricted Boltzmann Machines
[17], [24], [35].
II. S ENSING FOR HAR DeepSense [55] is a deep learning framework for HAR that
Wearable sensors have now become a common tool for both belongs to the third category, and constitutes the state-of-the-
professional and commercial applications [30]. In fact, modern art for HAR. DeepSense is composed of CNNs to extract
smartphones and smartwatches are equipped with sensors features from intervals of data obtained from different sensors,
that allow the monitoring of physiological parameters, and and RNNs (Gated Recurrent Unit (GRU) in particular) to
the prediction and tracking of physical activities. A practical learn temporal dependencies between different time intervals.

Authorized licensed use limited to: Univ of Calif Santa Barbara. Downloaded on June 25,2021 at 01:06:24 UTC from IEEE Xplore. Restrictions apply.
13476 IEEE SENSORS JOURNAL, VOL. 21, NO. 12, JUNE 15, 2021

A final layer is then easily customizable to adapt the frame- on some classifiers that however were not based on deep
work for classification, regression or segmentation tasks. learning, and with Hidden Unit Contributions [29], a small
The authors of DeepSense recently proposed a new version layer inserted in between CNNs and learned from user data.
of the framework, SADeepSense [56], where they introduce a In our approach we use transfer learning to train a small
self-attention mechanism that automatically balances the con- portion of the neural network architecture on data provided
tributions of multiple sensor inputs. SADeepSense maintains by a specific user. We show empirically that this simple and
the same architecture of the original DeepSense framework, easy to implement technique is in fact capable of adapting
and adds an attention module to balance the contribution of the framework to the user. Some preliminary work in this
different sensors based on their sensing quality. Addition- direction can be found in Rokni et al. [40]. We greatly expand
ally, in the RNN layer, another attention module is used on it by: providing quantitative results on the improvements
to selectively attend to the most meaningful timesteps. This given by this personalization process; comparing with state-of-
approach differs significantly from ours as the self-attention the-art techniques; and applying the personalization procedure
module of SADeepSense is used to address the issue of to multiple, different, deep learning architectures. We also
heterogeneity in the sensing quality from multiple sensors, and present an empirical evaluation of the learning capabilities of
to select the most relevant timesteps for the final prediction, the proposed transfer learning technique.
while T R AS EN D employs a purely attention-based mecha-
nism directly as a mean to extract temporal dependencies IV. DATA P REPROCESSING
in the data. Furthermore, SADeepSense retains the stacked In this section we present the preprocessing of the sensor
GRU layer of the original DeepSense framework, while our measurements that is performed for T R AS EN D.1 For each
approach replaces the GRU layer entirely. Another recently sensor S (i) , i ∈ {1, . . . , k}, let matrix V (i) describe its
proposed architecture based on the DeepSense framework, measurements, and vector u(i) define the timestamp of each
which adopts a similar attention strategy to SADeepSense is measurement. V (i) has size d (i) × n (i) , where d (i) is the
AttnSense [28]. number of dimensions for each measurement from sensor
S (i) (e.g., 3 for both accelerometer and gyroscope as they
B. Attention Models measure data along the x, y, and z axes) and n (i) is the
Attention models were first introduced in encoder-decoder number of measurements. u(i) has size n (i) . For each sensor
neural networks in the context of NLP [7]. The main idea S (i) , i ∈ {1, . . . , k}, the preprocessing procedure is defined as
behind attention mechanisms is to allow the decoder to selec- follows:
tively access the most important parts of the input sequence • Split the input measurements V (i) and u(i) along time to
based on the current context. This technique serves as a generate a series of non-overlapping intervals with width
memory-access mechanism, and overcomes RNNs difficulties τ . These intervals define the set W(i) = {(V (i) (i)
t , ut )},
in learning from long input sequences. Attention has then been (i)
where |W | = T and t ∈ 1, . . . , T .
used for image captioning in an architecture that made use • For each pair belonging to W(i) apply the Fourier trans-
of both CNNs and RNNs [54]. Since then, attention models form and stack the inputs into a d (i) × 2 f × T tensor
have become very popular in the deep learning community as X(i) , where f is the dimension of the frequency domain
an effective and powerful tool to enhance the capabilities of containing f magnitude and phase pairs.
RNNs (e.g., [10], [27], [49]). Furthermore, Vaswani et al. [51] Finally, we group all the tensors in the set X = {X(i) },
introduced the Transformer architecture, which is the current i ∈ 1, . . . , k, which is then the input to our T R AS EN D
state-of-the-art for NLP, and completely removes RNNs with framework.
an attention-only mechanism to model temporal relationships. In practice, we first divide the measurements into samples
In HAR, attention models have only been used in addition with a length of 5 seconds (with no overlap), and then apply
to a RNN (as described in Section III-A), and not as a mean to the procedure with τ = 0.25 seconds and f = 10. From
directly capture temporal dependencies, which is the approach now on, with the term timestep we refer to a given τ -length
we propose in T R AS EN D. interval. In order to deal with uneven sampling intervals that
might appear in the data we first interpolate the measurements
C. Transfer Learning and Personalization in HAR in each τ -length interval, sample f evenly separated points,
Transfer learning is not new to HAR. In particular transfer and then apply the Fourier transform to those points. The
learning has been leveraged to compensate for the amount of interpolation is done with a linear interpolation along each
labeled data when training a model for activity recognition in measurement axis. The measurements in a 5 seconds sample
different environments/circumstances [14], [26]. of each sensor are passed to the architecture as a matrix of size
A previous (non-deep learning) transfer learning approach T × features dimension, where T = 20 and features dimension
for personalized HAR, was proposed by Saeedi et al. [41], = d (i) × 2 f (each training and evaluation example is fed to
and used the Locally Linear Embedding (LLE) algorithm to the network with one matrix per sensor). Notice that applying
construct activity manifolds, which are used to assign labels a convolution operation with filters having a receptive field
to unlabeled data that can be used to develop a personalized 1 DeepSense [55] applies a similar procedure, however, we report some
model for the target user. Other different approaches to person- additional details, like the interpolation of the measurements, and the exact
alized HAR have been made with incremental learning [44] values of the parameters, that were not specified in [55].

1×6d (i) with a stride of (1, d (i) ×2) .3 The second and the third
individual convolutional layers have filters with dimension
1 × 3. The convolutions in all three layers are applied without
padding and are followed by batch normalization [21], and
a ReLu activation. Furthermore dropout [46] is applied in
between the layers, with probability 0.2. The output of the
individual layers are then concatenated, obtaining a tensor with
dimension T × number of sensors × f eatur es × channels
(where features depends of the dimension of filters at the
previous layers and channels is equal to the number of
filters of the last individual convolutional layers), and passed
to the merge convolutional subnetwork. This subnetwork is
composed of three convolutional layers with 64 filters each.
For each layer the dimensions of the filters are respec-
tively 1 × number of sensors × 8, 1 × number of sensors × 6,
Fig. 1. Scheme of the DeepSense framework [55]. Individual con- 1×number of sensors×4, this time with padding. Again, after
volutional subnetworks and the merge convolutional subnetwork share
weights across timesteps. each layer, batch normalization and a ReLu activation are per-
formed, with dropout in between layers (with probability 0.2).
The recurrent layers are composed of two stacked GRU [12]
that spans a single row is like extracting features from each layers with 120 cells each. Dropout (with probability 0.5) and
τ -length interval separately. recurrent batch normalization [13] are performed between the
Data Augmentation: Similarly to Yao et al. [55], for each two layers. Then the mean of the outputs at each time step is
training example we added other 9 artificial examples obtained taken, and passed to the output layer.
by adding noise (with a normal distribution with zero mean Finally, the output layer is a simple dense layer with a
and variance of 0.5 for the accelerometer and of 0.2 for the number of units equal to the number of activities to predict.
gyroscope). The idea behind this procedure is that the data The softmax activation is used to get a probability distribution
generated by the sensors are already noisy, so having more between the activities, and cross-entropy is used as loss
samples with slightly different noise should make the network function:
more robust to it. We analyze the impact of data augmentation
N
C
(true) ( pred)
in our experimental section. L= − yi,c log( yi,c )
i c

where N is the number of training examples, C is the number

V. A RCHITECTURE (true)
of different classes, yi,c is the c-th element of the one-
In this section we present our framework T R AS EN D. We
hot encoded ground truth for the i -th training example, and
start with a description of the architectural template defined by ( pred)
yi,c is the c-th element of the output of the architecture
the DeepSense framework [55].2 We then present the unique
(after softmax) for the i -th training example.
characteristics of T R AS EN D and its redesigned temporal
extraction strategy that is based purely on attention. Finally,
we present two additional variants of T R AS EN D with the goal B. TrASenD
of studying different temporal extraction strategies not based Recurrent Neural Networks (RNNs) present several prob-
purely on attention, but still more advanced than the stacked lems, from the difficulty to learn long-term dependencies [11],
GRU layer of DeepSense. [19], to their low computational efficiency. We propose a new
framework, building on the architectural template defined in
Section V-A, that replaces the stacked-GRU recurrent layer
A. DeepSense with an attention-based technique that better exploits temporal
DeepSense’s architecture (Fig. 1) can be divided in three dependencies in the data.
parts: convolutional layers, recurrent layers, and output layer. We first introduce the attention operator, which is at the core
The convolutional layers can be further divided into two of our attention-based technique for the extraction of temporal
subnetworks: an individual convolutional subnetwork for each dependencies, and then present in more detail the architecture
sensor and a unique merge convolutional subnetwork. Each of our proposed framework. Fig. 2 (b) shows a scheme of the
individual convolutional subnetwork (one per sensor) takes architecture of our temporal dependencies extractor.
as input a matrix with dimension T × features dimension 1) Attention Operator: An attention operator takes as input
(see Section IV) and is composed of three convolutional layers three matrices: a Query matrix Q, a Key matrix K , and a Value
with 64 filters each. The first layer has filters with dimension matrix V , where each row of the matrices indicates the query,
key, or value vector of a specific item (where item usually
2 In [55] the authors do not specify several architectural parameters (filter
dimensions, strides, presence of padding, dropout probability, training opti- 3 Intuitively, the filters have a receptive field that covers three measurement
mizer, learning rate, etc.). We refer to the parameters that can be found on the points, and have a stride of one measurement point (after the Fourier transform
author’s implementation available at https://github.com/yscacaca/DeepSense. each point is represented by two numbers: magnitude and phase).

Authorized licensed use limited to: Univ of Calif Santa Barbara. Downloaded on June 25,2021 at 01:06:24 UTC from IEEE Xplore. Restrictions apply.
13478 IEEE SENSORS JOURNAL, VOL. 21, NO. 12, JUNE 15, 2021

Fig. 2. (a) Flowchart of our method. (b) Scheme of TrASenD’s temporal information extraction block. Notice how temporal information (coming from
the Merge Convolutional Subnetwork) is analyzed in a feed-forward manner, without the use of any RNN. (c) Scheme of the attention mechanism
for TrASenD-CA. At a given timestep the high level features extracted from the merge convolutional subnetwork are first flattened and concatenated.
The attention mechanism, considering the current state of the GRU layer, generates an attention weight for each feature which is then used to scale
them. The sum of the scaled features represents the context vector which is concatenated to the original features and passed as input to the GRU.

refers to a feature vector). The attention operator attends every We start by applying the positional embedding described
query to every key and obtains a similarity score (also called by Vaswani et al. [51] to introduce a notion of relative order
attention score) which is used to obtain weights for all the between the features extracted at different timesteps. Then,
value vectors (rows of the Value matrix). Following [51], we for each head, we first multiply the input with 3 different
obtain the similarity score using the scaled dot-product, and learnable matrices to obtain the query, key, value matrices
then the attention weights by applying softmax. Finally, the Q, K , V (each row of these matrices represents query, key,
values are scaled with their respective attention weight. The and value vectors for each timestep). We then obtain the
whole process can be written as: attention score using the scaled dot-product, where we used
d k = 64 and set the dimension of the values to be the
QKT same. The attention weights obtained from each head are
attention( Q, K , V ) = softmax √ V then concatenated and multiplied by a learnable matrix to
dk
return to a matrix with dimension T × features. This matrix
where d k is the dimension of query and key vectors. The is then summed with the original inputs (creating a residual
weights are such that, for every query, the values related to the connection), and Layer Normalization [6] is applied. The data
keys with the highest similarity score are given a higher weight in each timestep is passed through a position-wise dense layer4
(i.e., more importance). In other words, the weights are used with ReLu activation. Finally another residual connection with
to give more attention to the values that are more pertinent Layer Normalization is applied to obtain the output of the
to the given query. We talk about self-attention when Query, temporal information extraction block which is then passed
Key, and Value matrices are all referring to items of the same to the feedforward output layer. A scheme of the temporal
sequence. A multi-headed mechanism is such that, for each information extraction block can be found in Fig. 2 (b).
item, different multiple Query, Key, and Value matrices are
created and the attention operator is applied to all of them.
C. Other Architectural Variants
The outputs of all the heads are then combined together. We now present two variants of T R AS EN D where we
2) Architecture: T R AS EN D follows the feature extraction replace the purely attention based temporal information extrac-
procedure and the feed-forward output layer of DeepSense, tion block, with other (simpler, but more advanced than regular
but completely replaces the recurrent layers. In fact, we only RNNs) techniques to capture temporal dependencies in the
use attention to extract temporal dependencies in the data, input.
with a temporal information extractor layer inspired by the a) T R AS EN D-BD: The first variant substitutes the pure
Transformer [51]. In more detail, we create a temporal infor- attention temporal block with a bidirectional-RNN (BRNN)
mation extractor using a 8-headed self-attention mechanism. [43]. A BRNN generalizes the concept of RNNs by connecting
To pass the data to the temporal layer, we reshape the output two hidden layers of opposite directions to the same output (we
of the merge convolutional subnetwork to have dimension continue using GRUs as forward and backward hidden layers).
T × features (where features depends from the size and the This allows the network to get information from past and
number of filters in the merge convolutional subnetwork). The future inputs simultaneously. At each timestep we now get the
features at different timesteps will be the input of the self- state of both forward and backward cells, so we concatenate
attention mechanism. Every sublayer of the temporal block 4 The same feedforward network is used for each timestep. It is equivalent
has output with size T ×features to allow residual connections. to a one-dimensional convolutional layer over timesteps with kernel size 1.

them, and finally take the average of the concatenated outputs TABLE I
at each timestep and pass them to the output layer. S UMMARY OF THE M ULTI -M ODAL HAR D ATASETS
U SED FOR O UR T ESTS
b) T R AS EN D-CA: Inspired by the work by Xu et al. [54],
we use a GRU layer (we keep it with 120 cells) with an
attention mechanism over the output features of the merge con-
volutional subnetwork. We first average the features extracted
from the first τ -length interval (first timestep) and pass it
through a dense layer to obtain the initial state for the GRU
layer. We then use the following attention mechanism: at each the survey by Wang et al. [53]: we consider the datasets that
timestep, we pass the features extracted by the CNN layers have data from at least 9 subjects (to better test generalization
and the current state of the GRU through two different dense properties), with at least 2 different sensing modalities (to test
layers without applying any activation function. We then sum the various methods on multimodal data), and then take the
the two outputs and apply tanh before passing it to softmax datasets with the largest number of samples. A summary of
to obtain the attention weights. Finally, the features are scaled the chosen datasets can be found in Table I.
c) HHAR [47]: The Heterogeneity Activity Recognition
with their attention weights. The sum of the scaled feature
vectors forms the context vector which is then concatenated Data Set contains data from accelerometer and gyroscope of
to the original features for the current timestep and passed as 12 different devices (8 smartphones and 4 smartwatches) used
input to the GRU. A scheme of this attention mechanism can by 9 different subjects while performing 6 activities. We only
be found in Fig. 2 (c). The rest of the architecture remains considered data coming from smartphones.
d) PAMAP2 [37], [38]: The Physical Activity Monitoring
unchanged.
dataset contains data of 12 different physical activities, per-
D. Transfer Learning Personalization formed by 9 subjects wearing 3 inertial measurement units
and a heart rate monitor. We only considered data coming from
To make the system capable of adapting to a specific user
the inertial measurement units (IMU), which were positioned
over time, we propose a simple transfer learning strategy
in three different body areas (hand, chest, ankle) during the
(Figure 2 (a)). Transfer learning is a method where a model
measurements. From each IMU we considered data measured
developed for a task is reused as the starting point to learn a
by the first accelerometer, the gyroscope and the magnetome-
model on a second task. The typical scenario in a transfer
ter. This provides a scenario with data coming from 9 input
learning setting is to have a trained base network, which
sensors.
is repurposed by training on a target dataset. The idea is
e) USC-HAD [59]: The University of Southern California
that the pre-trained weights in the base network can ease
Human Activity Dataset uses high precision specialised hard-
the training on the target dataset. We slightly depart from
ware, and has a focus on the diversity of subjects, balancing
this scenario by extracting the output layer from a trained
the participants based on gender, age, height and weight. The
T R AS EN D model (and other proposed variants); that is, we
dataset contains measurements from accelerometer and gyro-
are using transfer learning only on the output layer. More in
scope obtained from 14 different subjects while performing 12
detail, the data coming from the sensors will be passed to the
activities.
T R AS EN D architecture, up to the end of the temporal layer.
The output layer becomes a separate network that receives
the output of the temporal layer as input, and will be trained B. Baselines
with the data generated by the user. This can be implemented We choose an extensive collection of deep learning, and
in a practical scenario by first using a model trained on non-deep learning methods to compare to T R AS EN D and its
one of the datasets, and after each prediction, asking the variants. For all considered models, we use the implementation
user to manually insert the activity he was performing. We provided by the authors when available. Unless otherwise
then use these new data samples to retrain only the output specified we use the model hyperparameters defined by the
layer, which is a single layer dense network that can easily authors.
be trained on-device. This procedure allows the architecture f) Deep Learning Baselines: We test our algorithm against
to take advantage of the complex general feature extracting all the DeepSense-based architectures, and additional deep
mechanism that reduces multimodal time series to a fixed learning techniques. In particular for the DeepSense-based
size vector, and to successively learn user-specific feature architectures we test against the original DeepSense [55],
characteristics. and the two latest attention enhanced versions: SADeepSense
[56], and AttnSense [28]. We then consider DeepConvLSTM
VI. E XPERIMENTAL E VALUATION [33] which is a CNN+LSTM approach, and its new attentive
We present here the datasets and the procedure used to version proposed in [31] that we call DeepConvLSTM-Att.
evaluate the performance of T R AS EN D, and the effectiveness All the attention models considered thus far add an attention
of the proposed personalization process. module to a RNN layer, while we remember that our algorithm
T R AS EN D completely removes RNNs in favour of a purely
A. Datasets attention-based temporal information extraction technique. We
We present below the three HAR datasets used in our tests. also provide some results for a basic LSTM based architecture
Our choices were based on the statistics shown in Table III of (we implement it with 2 LSTM layers, each with 256 cells,

Authorized licensed use limited to: Univ of Calif Santa Barbara. Downloaded on June 25,2021 at 01:06:24 UTC from IEEE Xplore. Restrictions apply.
13480 IEEE SENSORS JOURNAL, VOL. 21, NO. 12, JUNE 15, 2021

TABLE II
F1 S CORE R ESULTS ON D IFFERENT HAR D ATASETS

Fig. 3. Personalization increases the average True Positive rate of the

followed by a fully connected layer that outputs the predicted deep learning models.
class). Finally, to take into consideration also other deep
learning techniques we consider MultiRBM [35], where a evaluation procedure on the HHAR dataset, with learning rate
Restricted Boltzman Machine (RBM) is used for each sensor, ∈ {10−2 , 10−3 , 10−4 }. We then considered the setting that gave
and a single final RBM is used to then merge all the outputs the highest F1 score on the user’s data and used it for all
for the sensors and obtain the predicted class. datasets (no optimization for each different dataset). In the
g) Non-Deep Learning Baselines: As non-deep learning training procedure we trained for 30 epochs for each user and
baselines we considered a Random Forest (RF) classifier (one took the model of the epoch with the highest performance.
of the most used and most effective shallow classifiers for All T R AS EN D based models were trained using the Adam
HAR [47]) on the same raw frequency domain features fed Optimizer [22]. The other methods were trained with the
to the deep learning approaches (denoted with RF-FF), and optimization technique suggested by the authors. For the
then on the most used handcrafted frequency domain features personalization process, we retrain the output layer for 1 epoch
(DC Component, Spectral Energy, and Information Entropy; (per each new data point separately) with TensorFlow’s default
denoted with RF-HC). Adam optimizer parameters: α = 0.001, β1 = 0.5, β2 = 0.9,
and = 10−8 .
C. Experimental Setup
D. Results
For all tests we performed leave-one-user-out cross valida-
tion: we train on data from all subjects except one, and we Table II summarizes the F1 score results for T R AS EN D and
use the data from the excluded subject as test set. We perform the other methods we considered, on the three datasets. We
this procedure for each subject and then average the results. can observe that T R AS EN D and its variants present higher
This validation procedure follows the common practices in F1 score than DeepSense on all the three datasets. Further-
the field, and ensures that the model is not overfitting to the more we notice that T R AS EN D always achieves the highest
training data. performance with a big margin. In fact, T R AS EN D shows an
To evaluate the personalization process we divide the data F1 score that is, on average, 7% higher than the previous best
of each activity of the excluded user into two equal time- performing model. These results confirm that our attention-
contiguous parts. One part is used to personalize the output based technique (without RNNs) is highly capable of extract-
layer after the model has been learned on all other users, and ing temporal dependencies. Most notably we can see that
the other is used as test set. We also make sure to feed the data, T R AS EN D significantly outperforms the newer SADeepSense
both for training and validation, in time-contiguous samples and AttnSense, whose performance are comparable to the ones
(simulating the real-world personalization procedure described of T R AS EN D-BD and T R AS EN D-CA, which are far from
in Section V-D). T R AS EN D’s. In Figure 3 we show how the average True
Due to the imbalance in the number of samples per-class we Positive rate is affected by the personalization process on the
use the F1 score as the measure to quantify the performance of HHAR dataset. We notice an average 5% increase, further
the models. All T R AS EN D models were implemented5 using confirming the ability to adapt to a specific user. We remark
TensorFlow [1]. that all the results come from a cross validation procedure
To ensure a fair comparison and to avoid “hyperpara- where the test data is coming from a user that was not seen
meter hacking” we kept all the values for the architecture during training, hence showing that the model is not simply
hyperparameters (filters size, dropout probability, number of overfitting the training data.
filters, number of GRU units, etc.: see Section V.) equal for Table III presents the F1 score of the DeepSense-based
each DeepSense-based model. Furthermore, for all models, models when evaluated on the datasets with and without
the only optimized hyperparameter was the learning rate. applying personalization. The results confirm the effectiveness
To do so we took out 1 user and tried the training and of our transfer learning personalization process giving an
average ≈ 6.2% increase on the F1 score independently of
5 Code is available at: https://github.com/DavideBuffelli/TrASenD dataset and base architecture.

TABLE III
F1 S CORE R ESULTS OF THE D EEP L EARNING M ODELS W ITH (P) AND
W ITHOUT (NP) P ERSONALIZATION

TABLE IV
P ERFORMANCE ON HHAR W ITH (A) AND W ITHOUT (NA) Fig. 4. Performance of the deep learning models on HHAR when trained
D ATA AUGMENTATION with different number of augmented samples.

result, confirming their superior generalization properties, with

T R AS EN D achieving a higher overall F1 score.

VII. C ONCLUSION
In this paper we presented T R AS EN D, a new deep learning
framework for multimodal time series, and also proposed a
transfer learning procedure to personalize the model to a spe-
cific user for the human activity recognition tasks. T R AS EN D
These results confirm that restricting the transfer learning
is designed to improve the extraction of temporal dependen-
to the last layer of the network allows the model to retain the
cies in the data by replacing RNNs with a purely attention
generalization capabilities in the extraction of useful feature
based temporal information extraction block. Our extensive
(hence confirming the robustness to overfitting), while allow-
experimental evaluation shows that T R AS EN D significantly
ing the last layer to adapt to a specific user.
outperforms the state-of-the-art and that, in general, replac-
1) Validating the Personalization Process: To prove that the
ing RNNs with attention-based strategies leads to significant
training of the output layer alone can significantly impact on
improvements. In particular, we obtain an average increment
the performance of the network we first train the full model of
of more than 7% on the F1 score over the previous best
Section V-A on the HHAR dataset with randomly permuted
performing model. We also show the effectiveness of our
labels, and then we perform the personalization process on
simple personalization process, which is capable of an average
correctly labeled data. The resulting F1 scores (on the test
6% increment on the F1 score on data from a specific user,
set) are 0.166 and 0.523, respectively. We can notice that the
and the impact of data augmentation.
model trained on data with randomly permuted labels has the
The personalization procedure we propose may impact the
performance of a uniform random classifier, as one would
user experience while using an application that implements
expect, and the personalization process is capable of signif-
our technique. In fact, asking too many times for feedback
icantly boosting the performance of the model. This result
about the model’s predictions may not be feasible. Future
shows that in fact the re-training of the output layer alone can
research directions include the optimization of the personal-
largely affect the outcome of the model.
ization process to minimize the feedback required from the
2) Impact of Data Augmentation: To assess the benefits of the user, for example by using data augmentation or curriculum
data augmentation procedure, we evaluate all the deep learning training techniques [8].
models based on the DeepSense framework on HHAR with
and without augmented data. The results, shown in Table IV, R EFERENCES
confirm that data augmentation is important to train a model
[1] M. Abadi et al., “TensorFlow: A system for large-scale machine learn-
that is more robust to noise, and in fact we can see a significant ing,” in Proc. 12th USENIX Symp. Operating Syst. Design Implement.,
increase in the F1 score. Fig. 4 shows how the performance 2016, pp. 265–283.
of the analyzed DeepSense variants change when trained with [2] S. H. Ahammad, V. Rajesh, M. Z. U. Rahman, and A. Lay-Ekuakille,
“A hybrid CNN-based segmentation and boosting classifier for real
different number of augmented samples. It’s interesting to time sensor spinal cord injury data,” IEEE Sensors J., vol. 20, no. 17,
see that using 4 augmented samples for each real sample, pp. 10092–10101, Sep. 2020.
already provides an important performance gain. We also [3] B. Almaslukh, J. Almuhtadi, and A. Artoli, “An effective deep autoen-
coder approach for online smartphone-based human activity recogni-
notice that T R AS EN D is always superior to the other archi- tion,” Int. J. Comput. Sci. Netw. Secur., vol. 17, no. 4, pp. 160–165,
tectures, and performs significantly better than the others even 2017.
when trained without augmented samples. Furthermore, we see [4] S. Ashry, T. Ogawa, and W. Gomaa, “CHARM-deep: Continuous human
activity recognition model based on deep neural network using IMU
that SADeepSense and T R AS EN D are the two architectures sensors of smartwatch,” IEEE Sensors J., vol. 20, no. 15, pp. 8757–8770,
showing the smallest gap between highest and lowest F1 score Aug. 2020.

Authorized licensed use limited to: Univ of Calif Santa Barbara. Downloaded on June 25,2021 at 01:06:24 UTC from IEEE Xplore. Restrictions apply.
13482 IEEE SENSORS JOURNAL, VOL. 21, NO. 12, JUNE 15, 2021

[5] Y. Asim, M. A. Azam, M. Ehatisham-ul-Haq, U. Naeem, and [28] H. Ma, W. Li, X. Zhang, S. Gao, and S. Lu, “AttnSense: Multi-
A. Khalid, “Context-aware human activity recognition (CAHAR) in-the- level attention mechanism for multimodal human activity recog-
Wild using smartphone accelerometer,” IEEE Sensors J., vol. 20, no. 8, nition,” in Proc. 28th Int. Joint Conf. Artif. Intell., Aug. 2019,
pp. 4361–4371, Apr. 2020. pp. 3109–3115.
[6] J. L. Ba, J. R. Kiros, and G. E. Hinton, “Layer normalization,” 2016, [29] S. Matsui, N. Inoue, Y. Akagi, G. Nagino, and K. Shinoda, “User
arXiv:1607.06450. [Online]. Available: http://arxiv.org/abs/1607.06450 adaptation of convolutional neural network for human activity recogni-
[7] D. Bahdanau, K. Cho, and Y. Bengio, “Neural machine translation by tion,” in Proc. 25th Eur. Signal Process. Conf. (EUSIPCO), Aug. 2017,
jointly learning to align and translate,” in Proc. 3rd Int. Conf. Learn. pp. 753–757.
Represent., (ICLR), San Diego, CA, USA, May 2015, pp. 265–283. [30] S. C. Mukhopadhyay, “Wearable sensors for human activity moni-
[8] Y. Bengio, J. Louradour, R. Collobert, and J. Weston, “Curriculum toring: A review,” IEEE Sensors J., vol. 15, no. 3, pp. 1321–1330,
learning,” Proc. 26th Annu. Int. Conf. Mach. Learn. (ICML), 2009, Mar. 2015.
pp. 41–48. [31] V. S. Murahari and T. Plötz, “On attention models for human activity
[9] V. Bianchi, M. Bassoli, G. Lombardo, P. Fornacciari, M. Mordonini, and recognition,” in Proc. ACM Int. Symp. Wearable Comput., Oct. 2018,
I. De Munari, “IoT wearable sensor and deep learning: An integrated pp. 100–103.
approach for personalized human activity recognition in a smart home [32] H. F. Nweke, Y. W. Teh, M. A. Al-garadi, and U. R. Alo, “Deep learning
environment,” IEEE Internet Things J., vol. 6, no. 5, pp. 8553–8562, algorithms for human activity recognition using mobile and wearable
Oct. 2019. sensor networks: State of the art and research challenges,” Expert Syst.
[10] S. Chaudhari, V. Mithal, G. Polatkan, and R. Ramanath, “An attentive Appl., vol. 105, pp. 233–261, Sep. 2018.
survey of attention models,” 2019, arXiv:1904.02874. [Online]. Avail- [33] F. Ordóñez and D. Roggen, “Deep convolutional and LSTM recurrent
able: http://arxiv.org/abs/1904.02874 neural networks for multimodal wearable activity recognition,” Sensors,
[11] K. Cho, B. van Merrienboer, D. Bahdanau, and Y. Bengio, “On the vol. 16, no. 1, p. 115, Jan. 2016.
properties of neural machine translation: Encoder–Decoder approaches,” [34] B. Pourbabaee, M. J. Roshtkhari, and K. Khorasani, “Deep convolutional
in Proc. 8th Workshop Syntax, Semantics Struct. Stat. Transl. (SSST), neural networks and learning ECG features for screening paroxysmal
2014, pp. 112–176. atrial fibrillation patients,” IEEE Trans. Syst., Man, Cybern. Syst.,
[12] J. Chung, C. Gulcehre, K. Cho, and Y. Bengio, “Empirical evaluation of vol. 48, no. 12, pp. 2095–2104, Dec. 2018.
gated recurrent neural networks on sequence modeling,” in Proc. NIPS [35] V. Radu, N. D. Lane, S. Bhattacharya, C. Mascolo, M. K. Marina, and
Workshop Deep Learn., 2014, pp. 2–10. F. Kawsar, “Towards multimodal deep learning for activity recognition
[13] T. Cooijmans, N. Ballas, C. Laurent, and C. A. Courville, “Recurrent on mobile devices,” in Proc. ACM Int. Joint Conf. Pervasive Ubiquitous
batch normalization,” in Proc. Int. Conf. Learn. Represent., 2017, Comput., Adjunct, Sep. 2016, pp. 185–188.
pp. 1–13. [36] D. Ravi, C. Wong, B. Lo, and G.-Z. Yang, “Deep learning for human
[14] D. Cook, K. D. Feuz, and N. C. Krishnan, “Transfer learning for activity activity recognition: A resource efficient implementation on low-power
recognition: A survey,” Knowl. Inf. Syst., vol. 36, no. 3, pp. 537–556, devices,” in Proc. IEEE 13th Int. Conf. Wearable Implant. Body Sensor
Jun. 2013. Netw. (BSN), Jun. 2016, pp. 71–76.
[15] D. Figo, P. C. Diniz, D. R. Ferreira, and J. M. P. Cardoso, “Preprocessing [37] A. Reiss and D. Stricker, “Creating and benchmarking a new dataset for
techniques for context recognition from accelerometer data,” Pers. physical activity monitoring,” in Proc. 5th Int. Conf. Pervasive Technol.
Ubiquitous Comput., vol. 14, no. 7, pp. 645–662, Oct. 2010. Rel. Assistive Environ. PETRA, 2012, pp. 1–8
[16] Y. Guan and T. Plötz, “Ensembles of deep LSTM learners for activity [38] A. Reiss and D. Stricker, “Introducing a new benchmarked dataset
recognition using wearables,” Proc. ACM Interact., Mobile, Wearable for activity monitoring,” in Proc. 16th Int. Symp. Wearable Comput.,
Ubiquitous Technol., vol. 1, no. 2, pp. 1–28, Jun. 2017. Jun. 2012, pp. 108–109.
[17] Y. N. Hammerla, J. Fisher, P. Andras, L. Rochester, R. Walker, and [39] S. Richoz, L. Wang, P. Birch, and D. Roggen, “Transporta-
T. Plötz, “PD disease state assessment in naturalistic environments using tion mode recognition fusing wearable motion, sound and vision
deep learning,” in Proc. AAAI, 2015, pp. 1–7. sensors,” IEEE Sensors J., vol. 20, no. 16, pp. 9314–9328,
[18] Y. Nils Hammerla, S. Halloran, and T. Plötz, “Deep, convolutional, and Aug. 2020.
recurrent models for human activity recognition using wearables,” in [40] S. A. Rokni, M. Nourollahi, and H. Ghasemzadeh, “Personalized human
Proc. IJCAI, 2016, pp. 1–8. activity recognition using convolutional neural networks,” in Proc. AAAI,
[19] S. Hochreiter, Y. Bengio, P. Frasconi, and J. Schmidhuber, “Gradient 2018, pp. 1–3.
flow in recurrent nets: The difficulty of learning long-term depen- [41] R. Saeedi, K. Sasani, S. Norgaard, and A. H. Gebremedhin, “Personal-
dencies,” in A Field Guide to Dynamical Recurrent Neural Networks, ized human activity recognition using wearables: A manifold learning-
S. C. Kremer and J. F. Kolen, Eds. Piscataway, NJ, USA: IEEE Press, based knowledge transfer,” in Proc. 40th Annu. Int. Conf. IEEE Eng.
2001. Med. Biol. Soc. (EMBC), Jul. 2018, pp. 1193–1196.
[20] M. Inoue, S. Inoue, and T. Nishida, “Deep recurrent neural network [42] A. Sathyanarayana et al., “Impact of physical activity on sleep: A
for mobile human activity recognition with high throughput,” Artif. Life deep learning based exploration,” 2016, arXiv:1607.07034. [Online].
Robot., vol. 23, no. 2, pp. 173–185, Dec. 2017. Available: https://arxiv.org/abs/1607.07034
[21] S. Ioffe and C. Szegedy, “Batch normalization: Accelerating deep [43] M. Schuster and K. K. Paliwal, “Bidirectional recurrent neural net-
network training by reducing internal covariate shift,” in Proc. Int. Conf. works,” IEEE Trans. Signal Process., vol. 45, no. 11, pp. 2673–2681,
Mach. Learn., vol. 37, Jul. 2015, pp. 448–456. Nov. 1997.
[22] D. Kingma and J. Ba, “Adam: A method for stochastic optimization,” [44] P. Siirtola, H. Koskimäki, and J. Röning, “Personalizing human activity
in Proc. Int. Conf. Learn. Represent., Dec. 2014, pp. 1–15. recognition models using incremental learning,” in Proc. 26th Eur.
[23] H. Li, A. Shrestha, H. Heidari, J. Le Kernec, and F. Fioranelli, “Bi- Symp. Artif. Neural Netw., Comput. Intell. Mach. Learn., Apr. 2018,
LSTM network for multimodal continuous human activity recognition pp. 1–6.
and fall detection,” IEEE Sensors J., vol. 20, no. 3, pp. 1191–1201, [45] M. S. Singh, V. Pondenkandath, B. Zhou, P. Lukowicz, and M. Liwickit,
Feb. 2020. “Transforming sensor data to the image domain for deep learning—An
[24] X. Li, Y. Zhang, M. Li, I. Marsic, J. Yang, and R. S. Burd, application to footstep detection,” in Proc. Int. Joint Conf. Neural Netw.
“Deep neural network for RFID-based activity recognition,” in Proc. (IJCNN), May 2017, pp. 2665–2672.
8th Wireless Students, Students, Students Workshop, Oct. 2016, [46] N. Srivastava, G. Hinton, A. Krizhevsky, I. Sutskever, and
pp. 24–26. R. Salakhutdinov, “Dropout: A simple way to prevent neural networks
[25] X. Li, Y. Zhang, I. Marsic, A. Sarcevic, and R. S. Burd, from overfitting,” J. Mach. Learn. Res., vol. 15, no. 1, pp. 1929–1958,
“Deep learning for RFID-based activity recognition,” in Proc. 14th 2014.
ACM Conf. Embedded Netw. Sensor Syst. (CD-ROM), Nov. 2016, [47] A. Stisen et al., “Smart devices are different: Assessing and Mit-
pp. 164–175. igatingMobile sensing heterogeneities for activity recognition,” in
[26] A. P. Lopes, E. Santos, E. Valle, J. Almeida, and A. Araujo, “Transfer Proc. 13th ACM Conf. Embedded Netw. Sensor Syst., Nov. 2015,
learning for human action recognition,” in Proc. 24th SIBGRAPI Conf. pp. 127–140.
Graph., Patterns Images, Aug. 2011, pp. 352–359. [48] Q. Teng, K. Wang, L. Zhang, and J. He, “The layer-wise training
[27] T. Luong, H. Pham, and C. D. Manning, “Effective approaches to convolutional neural networks using local loss for sensor-based human
attention-based neural machine translation,” in Proc. Conf. Empirical activity recognition,” IEEE Sensors J., vol. 20, no. 13, pp. 7265–7274,
Methods Natural Lang. Process., 2015, pp. 1412–1421. Jul. 2020.

[49] M. Toshevska and S. Kalajdziski, “Exploring the attention mechanism Davide Buffelli was born in Verona, Italy, in
in deep models: A case study on sentiment analysis,” in Proc. Int. Conf. 1994. He received the B.S. degree in informa-
ICT Innov., 2019, pp. 202–211. tion engineering and the M.S. degree in com-
[50] N. Tufek, M. Yalcin, M. Altintas, F. Kalaoglu, Y. Li, and S. K. Bahadir, puter engineering from the University of Padova,
“Human action recognition using deep learning methods on limited sen- Padova, Italy, in 2016 and 2019, respectively,
sory data,” IEEE Sensors J., vol. 20, no. 6, pp. 3101–3112, Mar. 2020. where he is currently pursuing the Ph.D. degree
[51] A. Vaswani et al., “Attention is all you need,” in Proc. NIPS, 2017, in information engineering.
pp. 1–15. From June to December 2018, he was a Data
[52] A. Wang, G. Chen, C. Shang, M. Zhang, and L. Liu, “Human activity Science Intern with Philips Digital and Computa-
recognition in a smart home environment with stacked denoising autoen- tional Pathology. From April 2019 to September
coders,” in Web-Age Information Management. Cham, Switzerland: 2019, he was a Graduate Research Fellow with
Springer, 2016, pp. 29–40. the University of Padova. His research interests lie in the area of
[53] J. Wang, Y. Chen, S. Hao, X. Peng, and L. Hu, “Deep learning for sensor- deep learning, with a focus on techniques for temporal data and graph
based activity recognition: A survey,” Pattern Recognit. Lett., vol. 119, structured data.
pp. 3–11, Mar. 2019.
[54] K. Xu et al., “Show, attend and tell: Neural image caption generation
with visual attention,” in Proc. 32nd Int. Conf. Mach. Learn., vol. 37.
Lille, France, Jul. 2015, pp. 2048–2057. Fabio Vandin was born in Soave, Italy, in 1982.
[55] S. Yao, S. Hu, Y. Zhao, A. Zhang, and T. Abdelzaher, “DeepSense: A He received the B.S. and M.S. degrees in com-
unified deep learning framework for time-series mobile sensing data puter engineering, and the Ph.D. degree in
processing,” in Proc. 26th Int. Conf. World Wide Web, Apr. 2017, information engineering from the University of
pp. 351–360. Padova, Italy, in 2004, 2006, and 2010, respec-
[56] S. Yao et al., “SADeepSense: Self-attention deep learning framework tively.
for heterogeneous on-device sensors in Internet of Things applica- In 2016, he was a Research Fellow with the
tions,” in Proc. IEEE INFOCOM Conf. Comput. Commun., Apr. 2019, Simons Institute for the Theory of Computing,
pp. 1243–1251. UC Berkeley, USA. He has been an Assistant
[57] S. Yao et al., “Deep learning for the Internet of Things,” Computer, Professor Research with Brown University, RI,
vol. 51, no. 5, pp. 32–41, May 2018. USA; an Assistant Professor with the University
[58] X. Yao, X. Shi, and F. Zhou, “Human activities classification based on of Southern Denmark, Odense, Denmark; and an Associate Professor
complex-value convolutional neural network,” IEEE Sensors J., vol. 20, with the University of Padova. Since 2020, he has been a Professor
no. 13, pp. 7169–7180, Jul. 2020. with the Department of Information Engineering, University of Padova.
[59] M. Zhang and A. A. Sawchuk, “USC-HAD: A daily activ- He has authored more than 60 papers in international peer-reviewed
ity dataset for ubiquitous activity recognition using wearable sen- conferences and journals. His main research interests are in the area
sors,” in Proc. ACM Conf. Ubiquitous Comput. UbiComp, 2012, of algorithms for data mining and machine learning and applications to
pp. 1036–1043. biomedicine, molecular biology, and e-health.

Authorized licensed use limited to: Univ of Calif Santa Barbara. Downloaded on June 25,2021 at 01:06:24 UTC from IEEE Xplore. Restrictions apply.

Triple Cross-Domain Attention On Human Activity Recognition Using Wearable Sensors
No ratings yet
Triple Cross-Domain Attention On Human Activity Recognition Using Wearable Sensors
10 pages
1 s2.0 S2667096821000392 Main
No ratings yet
1 s2.0 S2667096821000392 Main
18 pages
Human Activity Reco
No ratings yet
Human Activity Reco
17 pages
Basic Activity Recognition From Wearable
No ratings yet
Basic Activity Recognition From Wearable
20 pages
SLR Zainab Saba
No ratings yet
SLR Zainab Saba
21 pages
A New Framework For Smartphone Sensor Based Human Activity Recognition Using Graph Neural Network
No ratings yet
A New Framework For Smartphone Sensor Based Human Activity Recognition Using Graph Neural Network
8 pages
An Adaptive Batch Size-Based-CNN-LSTM Framework For Human Activity Recognition in Uncontrolled Environment
No ratings yet
An Adaptive Batch Size-Based-CNN-LSTM Framework For Human Activity Recognition in Uncontrolled Environment
9 pages
Research Highlights (Required) : /item
No ratings yet
Research Highlights (Required) : /item
11 pages
Sensors: Deep Convolutional and LSTM Recurrent Neural Networks For Multimodal Wearable Activity Recognition
No ratings yet
Sensors: Deep Convolutional and LSTM Recurrent Neural Networks For Multimodal Wearable Activity Recognition
25 pages
Smartwatch-Based Human Activity Recognition Using Hybrid LSTM Network
No ratings yet
Smartwatch-Based Human Activity Recognition Using Hybrid LSTM Network
4 pages
1717 - Final - Paper 2
No ratings yet
1717 - Final - Paper 2
5 pages
HumanActivity Recognition Deep Learning
No ratings yet
HumanActivity Recognition Deep Learning
6 pages
Deep Learning For Sensor-Based Activity Recognition: A Survey
No ratings yet
Deep Learning For Sensor-Based Activity Recognition: A Survey
10 pages
Multi-STMT Multi-Level Network For Human Activity Recognition Based On Wearable Sensors
No ratings yet
Multi-STMT Multi-Level Network For Human Activity Recognition Based On Wearable Sensors
12 pages
LSTM Networks Using Smartphone Data For Sensor-Based Human Activity Recognition in Smart Homes - Enhanced Reader
No ratings yet
LSTM Networks Using Smartphone Data For Sensor-Based Human Activity Recognition in Smart Homes - Enhanced Reader
25 pages
Multi Input CNN GRU Based Human Activity Recognition
No ratings yet
Multi Input CNN GRU Based Human Activity Recognition
18 pages
Enhanced Human Activity Recognition in Medical Emergencies Using A Hybrid Deep CNN and Bi-Directional LSTM Model With Wearable Sensors
No ratings yet
Enhanced Human Activity Recognition in Medical Emergencies Using A Hybrid Deep CNN and Bi-Directional LSTM Model With Wearable Sensors
24 pages
Optimizing Physical Activity Recognition Using LSTM Network
No ratings yet
Optimizing Physical Activity Recognition Using LSTM Network
14 pages
Sensors 22 01476 v2
No ratings yet
Sensors 22 01476 v2
43 pages
Deep Learning Models For Real-Time Human Activity Recognition
No ratings yet
Deep Learning Models For Real-Time Human Activity Recognition
13 pages
LSTM-CNN Architecture For Human Activity Recognition
No ratings yet
LSTM-CNN Architecture For Human Activity Recognition
12 pages
Dilated Causal Convolution With Multi-Head Self at
No ratings yet
Dilated Causal Convolution With Multi-Head Self at
19 pages
Informatics 09 00056
No ratings yet
Informatics 09 00056
13 pages
Batch 7
No ratings yet
Batch 7
21 pages
DEEP-LEARNING-ENHANCED HUMAN ACTIVITY RECOGNITION FOR INTERNET OF HEALTHCARE THINGS Zhou Et Al 2020
No ratings yet
DEEP-LEARNING-ENHANCED HUMAN ACTIVITY RECOGNITION FOR INTERNET OF HEALTHCARE THINGS Zhou Et Al 2020
10 pages
TII Deep Learning PA Accepted
No ratings yet
TII Deep Learning PA Accepted
12 pages
Deep, Convolutional, and Recurrent Models For Human Activity Recognition Using Wearables
No ratings yet
Deep, Convolutional, and Recurrent Models For Human Activity Recognition Using Wearables
7 pages
Deep Learning Enhanced Human Activity Recognition For Internet of Healthcare Things
No ratings yet
Deep Learning Enhanced Human Activity Recognition For Internet of Healthcare Things
10 pages
Ensembled Transfer Learning Based Multichannel Attention Networks For Human Activity Recognition in Still Images
No ratings yet
Ensembled Transfer Learning Based Multichannel Attention Networks For Human Activity Recognition in Still Images
12 pages
A Human Activity Recognition Method Based On Lightweight Feature Extraction Combined With Pruned and Quantized CNN For Wearable Device
No ratings yet
A Human Activity Recognition Method Based On Lightweight Feature Extraction Combined With Pruned and Quantized CNN For Wearable Device
14 pages
Deep2019 3
No ratings yet
Deep2019 3
6 pages
Ensembles of Deep LSTM Learners For Activity Recognition Using Wearables
No ratings yet
Ensembles of Deep LSTM Learners For Activity Recognition Using Wearables
28 pages
1 s2.0 S0167739X22003089 Main
No ratings yet
1 s2.0 S0167739X22003089 Main
14 pages
Application of HAR - A Review
No ratings yet
Application of HAR - A Review
30 pages
Human Activity Recognition Using Machine Learning: Bachelor of Technology
No ratings yet
Human Activity Recognition Using Machine Learning: Bachelor of Technology
19 pages
A Novel Semisupervised Deep Learning Method For Human Activity Recognition PDF
No ratings yet
A Novel Semisupervised Deep Learning Method For Human Activity Recognition PDF
10 pages
Electronics 10030308
No ratings yet
Electronics 10030308
21 pages
Human Activity Recognition Using CNN & LSTM: A. WISDM Dataset
No ratings yet
Human Activity Recognition Using CNN & LSTM: A. WISDM Dataset
6 pages
Presented By: Dewan Imdadul Islam
No ratings yet
Presented By: Dewan Imdadul Islam
13 pages
A Hybrid Deep Approach To Recognizing Student Activity and Monitoring Health Physique Based On Accelerometer Data From Smartphones
No ratings yet
A Hybrid Deep Approach To Recognizing Student Activity and Monitoring Health Physique Based On Accelerometer Data From Smartphones
18 pages
Deep Learning: Fundamentals and Applications
From Everand
Deep Learning: Fundamentals and Applications
Fouad Sabry
No ratings yet
HARcnn
No ratings yet
HARcnn
7 pages
Chapter 2
No ratings yet
Chapter 2
3 pages
Foul Legacy
No ratings yet
Foul Legacy
17 pages
Ensemble of Deep Learning Techniques To Human Activity Recognition Using Smart Phone Signals
No ratings yet
Ensemble of Deep Learning Techniques To Human Activity Recognition Using Smart Phone Signals
30 pages
GRU-based Attention Mechanism For Human Activity Recognition
No ratings yet
GRU-based Attention Mechanism For Human Activity Recognition
6 pages
Intelligent Recognition of Multimodal Human Activities For Personal Healthcare
No ratings yet
Intelligent Recognition of Multimodal Human Activities For Personal Healthcare
11 pages
Deep2019 2
No ratings yet
Deep2019 2
4 pages
Harmonic Loss Function For Sensor-Based Human Acti
No ratings yet
Harmonic Loss Function For Sensor-Based Human Acti
11 pages
International Journal of Cognitive Computing in Engineering
No ratings yet
International Journal of Cognitive Computing in Engineering
10 pages
Human Activity Detection Using Deep - 2-1
No ratings yet
Human Activity Detection Using Deep - 2-1
8 pages
Fpga Paper For Human Activity Recognition
No ratings yet
Fpga Paper For Human Activity Recognition
11 pages
Es2013 11 PDF
No ratings yet
Es2013 11 PDF
10 pages
HUMAN ACTIVITY RECOGNITION - IEEE ConferencePaper
No ratings yet
HUMAN ACTIVITY RECOGNITION - IEEE ConferencePaper
8 pages
Hassan 2018
No ratings yet
Hassan 2018
7 pages
Sensors 22 06018
No ratings yet
Sensors 22 06018
19 pages
An Active Semi-Supervised Deep Learning Model For Human Activity Recognition
No ratings yet
An Active Semi-Supervised Deep Learning Model For Human Activity Recognition
17 pages
Transfer Learning Enhanced Vision-Based Human Activity Recognition
No ratings yet
Transfer Learning Enhanced Vision-Based Human Activity Recognition
15 pages
Detection of Abnormal Human Behavior Using Deep Learning
No ratings yet
Detection of Abnormal Human Behavior Using Deep Learning
10 pages
REFERENCES
No ratings yet
REFERENCES
3 pages
Computer Organization& Architercture-Syllabus
No ratings yet
Computer Organization& Architercture-Syllabus
3 pages
CD Jntuk
No ratings yet
CD Jntuk
2 pages
Ethical Hacking Syllabus
No ratings yet
Ethical Hacking Syllabus
2 pages
Soft Computing Techniques (PE)
No ratings yet
Soft Computing Techniques (PE)
2 pages
Document (AutoRecovered)
No ratings yet
Document (AutoRecovered)
38 pages
Introduction To C# - Unit 1
No ratings yet
Introduction To C# - Unit 1
28 pages
Fsad LM 3
No ratings yet
Fsad LM 3
56 pages
XG Boost
No ratings yet
XG Boost
5 pages
VinayKumar - Resume - QA Lead - 14yrs
No ratings yet
VinayKumar - Resume - QA Lead - 14yrs
4 pages
5710 NM Tutorial 2
No ratings yet
5710 NM Tutorial 2
8 pages
Purchase Order Management
No ratings yet
Purchase Order Management
51 pages
06 Finite Elements Catalogs Options
No ratings yet
06 Finite Elements Catalogs Options
28 pages
Computer Networks: 7 Application
No ratings yet
Computer Networks: 7 Application
46 pages
Lesson 1 - Introduction To ICT
No ratings yet
Lesson 1 - Introduction To ICT
52 pages
Algo Trading
No ratings yet
Algo Trading
64 pages
3 Regression Diagnostics
100% (1)
3 Regression Diagnostics
53 pages
UK Tuberculosis Detection Programme
No ratings yet
UK Tuberculosis Detection Programme
1 page
Upload A Document - Scribd
No ratings yet
Upload A Document - Scribd
4 pages
ETA Floor Mounted PKG Unit
No ratings yet
ETA Floor Mounted PKG Unit
16 pages
Accomplishment Report Format
No ratings yet
Accomplishment Report Format
6 pages
MECHANICAL PADS - CIVIL - Construction - Division 23
No ratings yet
MECHANICAL PADS - CIVIL - Construction - Division 23
23 pages
Practice English Literacy Questions With Answer Keys and Discussion
No ratings yet
Practice English Literacy Questions With Answer Keys and Discussion
10 pages
Records: Archives: Management: Preservation
No ratings yet
Records: Archives: Management: Preservation
113 pages
Levels of Programming Languages
No ratings yet
Levels of Programming Languages
4 pages
Megaproject
No ratings yet
Megaproject
6 pages
Buildroot Image With Qt5 OPEGN GLS 2.0 Mesa VC4 Driver in 32 Bit
No ratings yet
Buildroot Image With Qt5 OPEGN GLS 2.0 Mesa VC4 Driver in 32 Bit
22 pages
Protolabs Investor Presentation - November
No ratings yet
Protolabs Investor Presentation - November
22 pages
Firewall Forward Info
No ratings yet
Firewall Forward Info
2 pages
What Are The Benefits of A User Manual
No ratings yet
What Are The Benefits of A User Manual
3 pages
New Product Forecasting Using Structure Analogies
No ratings yet
New Product Forecasting Using Structure Analogies
11 pages
DTC B1615/14 Front Airbag Sensor LH Circuit Malfunction: Description
No ratings yet
DTC B1615/14 Front Airbag Sensor LH Circuit Malfunction: Description
2 pages
The Grapevine, April 2, 2014
No ratings yet
The Grapevine, April 2, 2014
28 pages
Unit I Iot
No ratings yet
Unit I Iot
4 pages
Neural Network Based DPD
No ratings yet
Neural Network Based DPD
21 pages
An005 Lua BACNET Client Operations
No ratings yet
An005 Lua BACNET Client Operations
8 pages
1.9.4 Test (TST) - Foundations of Geometry (Test)
No ratings yet
1.9.4 Test (TST) - Foundations of Geometry (Test)
11 pages
Infineon-Presentation 2kW ZVS Demoboard description-AP-v01 00-EN
No ratings yet
Infineon-Presentation 2kW ZVS Demoboard description-AP-v01 00-EN
16 pages

Buffelli 2021

Uploaded by

Buffelli 2021

Uploaded by

13474 IEEE SENSORS JOURNAL, VOL. 21, NO.

12, JUNE 15, 2021

Attention-Based Deep Learning Framework

Abstract —Sensor-based human activity recognition (HAR)

I. I NTRODUCTION based on trial-and-error, require a lot of human effort, and

where N is the number of training examples, C is the number

Fig. 3. Personalization increases the average True Positive rate of the

result, confirming their superior generalization properties, with

You might also like