

Adversarial Attacks on Multivariate Time Series


Samuel Harford, Fazle Karim, and Houshang Darabi

S. Harford, F. Karim and H. Darabi are with the Department of Mechanical and Industrial Engineering, University of Illinois at Chicago, 842 West Taylor Street, Chicago, IL 60607, United States. H. Darabi is the corresponding author. E-mail: sharfo2, karim1, [email protected]
Manuscript received MONTH XX, 2020; revised MONTH XX, 2020.
arXiv:2004.00410v1 [cs.LG] 31 Mar 2020

Abstract—Classification models for multivariate time series have gained significant importance in the research community, but little research has been done on generating adversarial samples for these models. Such adversarial samples could become a security concern. In this paper, we propose applying an adversarial transformation network (ATN) to a distilled model in order to attack various multivariate time series classification models. The proposed attack utilizes a distilled model as a surrogate that mimics the behavior of the attacked classical multivariate time series classification models. The proposed methodology is tested on 1-Nearest Neighbor Dynamic Time Warping (1-NN DTW) and a Fully Convolutional Network (FCN), both of which are trained on 18 University of East Anglia (UEA) and University of California Riverside (UCR) datasets. We show that both models are susceptible to attacks on all 18 datasets. To the best of our knowledge, adversarial attacks have previously only been conducted in the domain of univariate time series; such an attack on multivariate time series classification models has never been done before. Additionally, we recommend that future researchers who develop time series classification models incorporate adversarial data samples into their training data sets to improve resilience to adversarial samples, and that they consider model robustness as an evaluative metric.

Index Terms—Multivariate Time Series, Adversarial Machine Learning, Perturbation Methods, Deep Learning

I. INTRODUCTION

The past decade has seen numerous areas of research and society impacted by machine learning and deep learning [1]. These areas include medical imaging [2], speech recognition [3], and manufacturing systems [4]. With the rise of smart sensors, vast-scale developments in data collection and storage, and the ease of data analytics and predictive modeling, multivariate time series data received from collections of sensors can be analyzed to identify regular patterns that can be interpreted and exploited. Many researchers have been interested in the classification of both univariate [5]–[8] and multivariate time series [9]–[11]. Time series classification models are used in healthcare, where multiple-lead ECG data are used to diagnose cardiac ischemia; in gesture recognition, where posture-level data are used to classify human actions; and in manufacturing, where sensor data are used to identify product defects. The combination of multi-channel sensor data that tracks resources and safety systems, along with real-time analytics, creates the possibility of automated responses to undesired operational activities. An effective time series classification model can capture and generalize patterns of time series signals, so it can classify unseen data. Similarly, classification models in the field of computer vision take advantage of the underlying spatial structure in images. However, studies have shown that computer vision models incorrectly classify images that seem obvious to the human eye; this is referred to as an adversarial attack [12]. Complex models can be tricked into incorrectly classifying data from a wide array of fields using several types of adversarial attacks. This is a serious security issue in machine learning, especially for Deep Neural Networks (DNNs), which are widely used for vision-based tasks, where adding minor disruptions or carefully crafted noise to an input image may mislead the image classification algorithm into making inaccurate predictions with a high degree of confidence [13], [14]. Although DNNs are state-of-the-art models across several fields for a number of classification tasks, including time series classification [11], [15], [16], these vulnerabilities have a harmful impact on real-world applicability in domains where secure and reliable predictions are of paramount importance [17]. Compounding the severity of this issue, Papernot et al.'s work has shown that it is easy to transfer adversarial attacks on a particular computer vision classifier to other similar classifiers [18]. The focus of attacks has only recently shifted to time series classification models based on deep neural networks and traditional models [19].

Many adversarial sample creation strategies have been suggested to trick various DNN image classification models, the state of the art in computer vision. Most of these techniques target the DNN's gradient information, which makes such models vulnerable to these attacks [20]–[22]. Research into generating adversarial samples for time series classification models has been limited to univariate time series [23]. Speech recognition systems that translate speech to text face a similar security issue: Carlini and Wagner [24] demonstrate how speech-to-text classifiers can be attacked, providing multiple audio clips whose speech is not correctly identified by the speech-to-text classifier DeepSpeech. Security concerns may also arise in healthcare systems that use time series classification algorithms, where a classifier can be fooled into misdiagnosing patients, influencing their disease diagnosis. Time series classification algorithms used to detect and monitor seismic activity can be manipulated to create fear and hysteria in our society. Wearables that use time series data to classify the activity of the wearer can be fooled into convincing users they are performing other actions. Most of the current state-of-the-art multivariate time series classification algorithms are traditional approaches, such as 1-Nearest Neighbor Dynamic Time Warping (1-NN DTW) [25], WEASEL+MUSE [10] and the Hidden-Unit Logistic Model [9]. However, due to their simplicity and effectiveness, DNNs are quickly becoming excellent time series classifiers. Traditional time series classification models are more difficult to attack, as they can be considered black-box models with an internal computation that is not differentiable.

As such, it is impossible to exploit any gradient knowledge. However, because their gradient knowledge can easily be exploited, DNN models are more vulnerable to white-box attacks. A white-box attack is one where the opponent is "given access to all elements of the training procedure" [22], including the training data set, the training algorithm, the model's parameters and weights, and the model architecture itself. In comparison, a black-box attack only has access to the training process and architecture of the target models [22].

This study proposes a proxy attack strategy on a target classifier via a student model, trained using standard model distillation techniques to mimic the behavior of the target multivariate time series classification models. The student network is a neural network distilled from another time series classification model, called the teacher model, that learns to approximate the output of the teacher model. Once the student model has been trained, our adversarial transformation network (ATN) is trained to attack this student model. Our methodologies are applied to 1-NN DTW and a Fully Convolutional Network (FCN), which are trained on 18 multivariate time series benchmarks from the University of East Anglia (UEA) and the University of California, Riverside (UCR) [26]. To the best of our knowledge, the result of such an attack on multivariate time series classification models has never been studied before. Finally, we recommend that researchers who develop time series classification models consider model robustness as an evaluative metric and incorporate adversarial data samples into their training data sets in order to further improve resilience to adversarial attacks.

The remainder of this paper is structured as follows: Section II provides a background on the utilized multivariate time series classification models and information on adversarial crafting techniques used on computer vision problems. Section III details our proposed methodologies. Section IV presents and explains the results of our proposed methodologies on a set of multivariate time series classification models. Section V concludes the paper and proposes future work.

II. BACKGROUND AND RELATED WORKS

A. Time Series Classifiers

1) Multivariate 1-Nearest Neighbor Dynamic Time Warping: The equations below for 1-Nearest Neighbor Dynamic Time Warping (1-NN DTW) are derived from [8], [27]. Dynamic Time Warping is a distance metric used to non-linearly align two time series. DTW outputs a matrix of the distance path and the shortest distance between series [28]. For two time series X = x_1, x_2, ..., x_n and Y = y_1, y_2, ..., y_m of lengths n and m respectively, a DTW matrix of size n × m can be calculated. Each cell of the DTW matrix represents an alignment between two points of the corresponding time series. Matrix calculations must satisfy the following conditions. Boundary Condition: the paths start at the beginning of each time series (x_1, y_1) and finish at the last point of each time series (x_n, y_m). Continuity Condition: the paths have no jumps in steps; the points to consider for distance at point (i, j) are (i-1, j), (i, j-1), and (i-1, j-1). Monotonicity Condition: the warping paths can only go forward in time. P is defined as a continuous path of cells in the matrix from (x_1, y_1) to (x_n, y_m). The sth element of P is defined as p_s = d(i, j)_s, where d(i, j) = (x_i − y_j)^2 and S is the length of P = p_1, p_2, ..., p_S. The DTW distance is equal to sqrt(D(n, m)), where the distance of each cell is computed as in Equation (1):

D(i, j) = (x_i − y_j)^2 + min[D(i−1, j−1), D(i−1, j), D(i, j−1)]   (1)

initialized with the following conditions:

D(1, 1) = 0;  D(1, 2...m) = ∞;  D(2...n, 1) = ∞   (2)

In the case of multivariate time series, a DTW distance matrix is calculated between each channel of the two multivariate time series. The summation of the DTW distances over all channels is then used as the similarity metric between the two multivariate time series.
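To make Equations (1) and (2) and the channel-wise summation concrete, the following Python sketch computes the DTW distance per channel and sums the per-channel distances for two multivariate series. This is a minimal reference implementation for illustration only (assuming equal-length series stored as (channels, length) arrays, with no warping window); it is not the authors' code.

import numpy as np

def dtw_distance(x, y):
    # DTW between two univariate series, following Equations (1)-(2).
    n, m = len(x), len(y)
    D = np.full((n + 1, m + 1), np.inf)
    D[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = (x[i - 1] - y[j - 1]) ** 2
            D[i, j] = cost + min(D[i - 1, j - 1], D[i - 1, j], D[i, j - 1])
    return np.sqrt(D[n, m])

def multivariate_dtw(X, Y):
    # Sum of per-channel DTW distances; X and Y have shape (channels, length).
    return sum(dtw_distance(X[c], Y[c]) for c in range(X.shape[0]))

# Toy usage: two 3-channel series of length 50.
a, b = np.random.randn(3, 50), np.random.randn(3, 50)
print(multivariate_dtw(a, b))

A 1-NN DTW classifier then simply assigns a test series the label of the training series with the smallest summed distance.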
2) Multi-Fully Convolutional Network: The Multivariate Fully Convolutional Network (Multi-FCN) is one of the first deep learning networks used for the task of time series classification [29]. The Multi-FCN architecture is an extension of the original FCN model, which takes a univariate time series as input. Multi-FCN consists of three 2D convolutional layers, with convolution kernels of size 8, 5 and 3 respectively, that emit 128, 256 and 128 filters respectively. Each convolution layer is followed by a batch normalization layer [30] with a ReLU activation layer. A global average pooling layer is applied after the final ReLU activation layer, and its output is passed to a softmax layer to determine the class probability vector.

B. Adversarial Transformation Network

Multiple approaches for generating adversarial samples have been proposed to attack neural networks. These methods have focused on the task of generating adversaries for computer vision tasks. Most of them either use the gradient with respect to the image pixels of these neural networks or explicitly solve an optimization problem over the image pixels. Baluja and Fischer [31] propose Adversarial Transformation Networks (ATNs) to efficiently generate adversarial samples by first training a feed-forward neural network in a self-supervised manner. Given the original input sample, ATNs modify the classifier outputs slightly to match the adversarial target. An ATN can be parametrized as a neural network g_f(x) : x → x̂, where f is the target model (a time series classifier) which outputs either a class probability vector or a sparse class label, and x̂ ≈ x but argmax f(x) ≠ argmax f(x̂). To find g_f, the following loss function is minimized:

L = β · L_x(g_f(x_i), x_i) + L_y(f(g_f(x_i)), f(x_i))   (3)

where L_x is a loss function on the input space (e.g., the L2 loss), L_y is a specially constructed loss function on the output space of f that avoids learning the identity function, x_i is the ith sample in the dataset, and β is the weighing term between the two loss functions. The loss function L_y on the output space must be selected carefully to successfully avoid learning the identity function. Baluja and Fischer [31] define the loss function L_y as L_y(y′, y) = L2(y′, r(y, t)), where y = f(x), y′ = f(g_f(x)), and r(·) is a reranking function that modifies y such that y_k < y_t, ∀k ≠ t. This reranking function r(y, t) can either be a simple one-hot encoding function or can be formulated to take advantage of the already present y to encourage better reconstruction. The reranking function proposed by Baluja and Fischer [31] can be formulated as:

r_α(y, t) = norm({ α · max(y) if k = t; y_k otherwise }_{k ∈ y})   (4)

where α > 1 is an additional hyperparameter which defines how much larger y_t should be than the current maximum classification, and norm is a normalizing function that rescales its input to be a valid probability distribution.
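The reranking of Equation (4) is small enough to state directly in code. The NumPy sketch below is an illustration under the assumption that norm(·) simply rescales its input to sum to one; it is not taken from the authors' implementation.

import numpy as np

def rerank(y, t, alpha=1.5):
    # Equation (4): push the score of target class t above the current maximum.
    r = y.copy()
    r[t] = alpha * np.max(y)
    return r / r.sum()        # norm(): rescale to a valid probability distribution

y = np.array([0.7, 0.2, 0.1])
print(rerank(y, t=2, alpha=1.5))   # class 2 becomes the arg max

With alpha = 1.5, the value used in the experiments of Section IV, the target class receives 1.5 times the current maximum score before renormalization.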

C. Transferability Property

Papernot et al. [18] propose a black-box attack by training a local substitute network, s, to replicate or approximate the target deep neural network (DNN) model, f. The local substitute model is trained using synthetically generated samples whose labels are the outputs of f. The local substitute network is then used to generate adversarial samples that are misclassifications. Generating adversarial samples for s is much easier than generating adversaries from f, as its full knowledge and parameters are available, making it susceptible to various attacks. The key criterion for successfully generating adversarial samples of f is the transferability property, whereby adversarial samples that misclassify s will also misclassify f.

D. Knowledge Distillation

Knowledge distillation, first proposed by Buciluǎ et al. [32], is a model compression technique where a small student model, s, is trained to mimic a pretrained teacher model, f. This process is also known as model distillation training. The knowledge is distilled from f to s by minimizing a loss function, where the objective of s is to imitate the probability class vector output by the model f. Hinton et al. [33] note that there are several instances where the probability distribution is skewed such that the correct class probability is close to 1 and the remaining classes have probabilities closer to 0. For this reason, Hinton et al. [33] recommend computing the probabilities q_i from the prenormalized logits z_i, such that:

q_i = σ(z; T) = exp(z_i / T) / Σ_j exp(z_j / T)   (5)

where T is a temperature factor normally set to 1. Higher values of T produce softer probability distributions over classes. The loss that is minimized is the model distillation loss, further explained in Section III-C.

III. METHODOLOGY

A. Gradient Adversarial Transformation Network

This work studies black-box and white-box attacks on multivariate time series. Both attacks use methodologies expanded from ATNs [31]. These ATNs are generative neural networks that take a multivariate time series x as an input and output an adversarial sample x̂.

An Adversarial Transformation Network can be parametrized as a neural network g_f(x) : x → x̂, where f is the model to be attacked. The ATN is further adjusted with the gradient of the input sample x with respect to the softmax-scaled logits of the target class output by the attacked classifier. This adjustment results in the Gradient Adversarial Transformation Network (GATN), a neural network g_f(x, x̃) : (x, x̃) → x̂, where:

x̃ = ∂x / ∂f_t   (6)

such that x ∈ R^T is an input multivariate time series of maximum length T and f_t represents the probability of the input series being classified as class t. Given the input gradient x̃, the GATN can construct better adversarial samples that affect the targeted model while reducing the perturbation added to the sample. For this reason, the GATN model is used for all our attacks.

This study focuses on attacking the 1-NN DTW and Multi-FCN time series classifiers. The 1-NN DTW classifier is non-differentiable, which creates a problem for the GATN model. A solution to overcoming the non-differentiability issue, discussed in Section III-D, is to train a student network s to approximate the output of the non-differentiable time series classifier f.

B. Black-box and White-box Restrictions

The formulation presented in Section III-A is satisfactory for white-box attacks, where the attacked model f or the student model s is known. For black-box attacks, we do not have access to the time series classifier or to the dataset used for model training. A further restriction for black-box attacks is to utilize only the output predicted label, and not the probabilistic class vector obtained from either a softmax layer or probabilistic approximations of classical model outputs.

For each dataset D, the training data is split into two halves. The GATN is trained on one half of the training data, D_train. The remaining training data, D_eval, is used to perform evaluations. In addition to D_eval, the unseen test set D_test is used for evaluation. The available dataset D is not the dataset that the attacked model f was trained on. The training set of the attacked model is never utilized to train or evaluate the GATN model. To satisfy these constraints, the available dataset D is defined as the test set of the multivariate time series classification task. This test set is not used to train any attacked model f; therefore it can be used as an unseen dataset. The test dataset is then split into two halves with equivalent class balance. When evaluating black-box attacks, the available dataset is treated as if it were unlabeled. Due to this restriction, the predicted label from the attacked model f is utilized to label the dataset prior to the attacks. This restriction adds realism to the training of GATNs, as it is difficult to obtain or create labeled datasets for time series tasks compared to computer vision tasks.
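The even, class-balanced split of the available set into D_train and D_eval described above amounts to a stratified 50/50 split. A possible sketch using scikit-learn (the variable names are illustrative; the repository may perform this step differently):

import numpy as np
from sklearn.model_selection import train_test_split

# X: samples of shape (num_samples, channels, length); y: labels predicted by the
# attacked model f (black-box setting) or the known labels (white-box evaluation).
X = np.random.randn(100, 3, 50)
y = np.random.randint(0, 4, size=100)

X_train, X_eval, y_train, y_eval = train_test_split(
    X, y, test_size=0.5, stratify=y, random_state=0)   # equivalent class balance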

C. Training Methodology

When training the ATN and GATN models, the selected reranking function r(·) strongly affects the loss formulation on the prediction space (L_y). When we opt for one-hot encoding of the target class, we lose the ability to keep the class ordering and the ability to adjust the ranking weight (α) to obtain less skewed adversaries. Nonetheless, to use the correct reranking function, we must have access to the class probability distribution, which is inaccessible to black-box attacks and which some classical models, such as 1-NN DTW, which uses distance-based computations to evaluate the nearest neighbor, may not even be able to calculate.

To address this limitation, we use knowledge distillation as a method to train a student neural network s that is equipped to replicate the predictions of the targeted model f. As such, we need to measure the attacked model's predictions on the dataset we possess only once; these predictions can either be class labels or probability distributions across all classes. We then use such labels as the ground truth labels that the student s is conditioned to mimic. We use a one-hot encoding scheme and measure the cross entropy loss in case the predictions are class labels; otherwise we imitate the probability distribution directly. It should be noted that the student model shares the training dataset D_eval with the GATN model.

As suggested by Hinton et al. [33], we describe the training scheme of the student as shown in Figure 1. We scale the logits of the student s and the teacher f (iff the teacher provides probabilities and it is a white-box attack) by a temperature scaling parameter τ, which is kept constant at 10 for all experiments. When training the student model, we minimize the loss function defined as:

L_transfer = γ · L_distillation + (1 − γ) · L_student   (7)

L_distillation = H(σ(z_f; T = τ), σ(z_s; T = τ))   (8)

L_student = H(y, σ(z_s; T = 1))   (9)

where H is the standard cross entropy loss function, z_s and z_f are the un-normalized logits of the student (s) and teacher (f) models respectively, σ(·) is the scaled-softmax operation described in (5), y is the ground truth label, and γ is a gating parameter between the two losses, used to maintain a balance between how much the student s imitates the teacher f versus how much it learns from the hard-label loss. When training a student for a white-box attack, we set γ to 0.5, giving equal weight to both losses, whereas for a black-box attack, we set γ to 1. Therefore, for black-box attacks, we force the student s to only mimic the teacher f to the limit of its capacity. In setting this restriction, we limit the amount of information that may be made available to the GATN.

D. Evaluation Methodology

Because of the different restrictions on the available information depending on whether the attack is a white-box or black-box attack, we train the GATN on one of the two models. We train the GATN by targeting the neural network f directly only when conducting a white-box attack on a neural network. In all other cases, whether the attack is a white-box or a black-box attack, and whether the attacked model is a neural network or a classical model, we pick the student model s as the model attacked to train the GATN, and then use the GATN's predictions (x̂) to test whether the teacher model f is also attacked when the generated adversarial input (x̂) is used as a reference.

During the evaluation of the trained GATN, we calculate the number of adversaries of the attacked model f that were obtained on the D_eval set. Metrics can be computed in two situations during this assessment. Given a labeled dataset split, we can double-check whether or not an adversary has been detected. First, we check that the ground truth label matches the classifier's predicted label when supplied with an unmodified input (y = y′ when sample x is supplied to f), and then check that this predicted label is different from the predicted label when supplied with the adversarial input (y ≠ ŷ′ when x̂ is supplied). This ensures that we do not count an incorrect prediction from a random classifier as an attack.

The other circumstance is that we do not have any labeled samples prior to splitting the dataset. This set is unseen by the attacked model f; therefore we consider the dataset to be unlabeled and assume that the label predicted by the base classifier is the ground truth (y = y′ by default, when sample x is provided to f). This is done prior to any attack by the GATN and is computed just once. We then define an adversarial sample as a sample x̂ whose predicted class label is different from the predicted ground truth label (y ≠ ŷ, when sample x̂ is provided to f). A drawback of this approach is that it is overly optimistic and rewards sensitive classifiers that misclassify due to very minor alterations. In order to adhere to an unbiased evaluation, we choose the first option and use the labels we know from the test set to measure the adversarial inputs properly. In doing so, we accept the need for a labeled test set, but as shown above, following this method is not strictly necessary.

IV. EXPERIMENTS AND RESULTS

All methodologies were tested on 18 benchmark datasets for multivariate time series classification found in the UEA and UCR repository [26]. Table I gives information about the multivariate time series. The evaluation has two objectives: to minimize the mean squared error (MSE) between the training dataset and the generated samples, and to maximize the number of adversaries for a set of chosen β values. For all experiments, we keep α, the reranking weight, set to 1.5, keep the target class set to 0, and perform a grid search over 5 possible values of β, the reconstruction weight term, such that β = 10^−b, b ∈ {1, 2, 3, 4, 5}. The code for all models is available at https://github.com/houshd/TS_Adv_multivariate.

A. Experiments

In this study, both neural networks and traditional time series classifiers were chosen as the model f to target. We use a Fully Convolutional Network for the attacked neural network and 1-NN Dynamic Time Warping for the traditional base model.
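Equations (7)-(9) combine a match against the teacher, computed with the temperature-scaled softmax of Equation (5), with a hard-label cross entropy term, gated by γ. The NumPy sketch below illustrates the computation for a single sample; it is a simplified stand-in for the actual training code, not the repository implementation.

import numpy as np

def softmax(z, T=1.0):
    # Equation (5): temperature-scaled softmax over pre-normalized logits.
    z = np.asarray(z, dtype=float) / T
    e = np.exp(z - z.max())      # subtract max for numerical stability
    return e / e.sum()

def cross_entropy(p, q, eps=1e-12):
    # H(p, q) for two probability vectors.
    return -np.sum(p * np.log(q + eps))

def transfer_loss(z_teacher, z_student, y_onehot, tau=10.0, gamma=0.5):
    # Equations (7)-(9): gamma gates the distillation loss against the hard-label loss.
    l_distill = cross_entropy(softmax(z_teacher, T=tau), softmax(z_student, T=tau))
    l_student = cross_entropy(y_onehot, softmax(z_student, T=1.0))
    return gamma * l_distill + (1.0 - gamma) * l_student

z_f = np.array([4.0, 1.0, 0.5])     # teacher logits
z_s = np.array([3.0, 1.5, 0.2])     # student logits
y = np.array([1.0, 0.0, 0.0])       # hard label
print(transfer_loss(z_f, z_s, y, tau=10.0, gamma=0.5))   # white-box setting
print(transfer_loss(z_f, z_s, y, tau=10.0, gamma=1.0))   # black-box setting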
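The stricter adversary-counting rule of Section III-D (a sample counts only if the classifier is correct on the clean input and changes its prediction on the perturbed one) reduces to a simple mask over predictions. A small sketch with hypothetical variable names, for illustration only:

import numpy as np

def count_adversaries(y_true, y_pred_clean, y_pred_adv):
    # A sample counts as an adversary only if f is right on x and flips on x_hat.
    correct_on_clean = (y_pred_clean == y_true)
    flipped_by_attack = (y_pred_adv != y_pred_clean)
    return int(np.sum(correct_on_clean & flipped_by_attack))

y_true  = np.array([0, 1, 2, 1])
y_clean = np.array([0, 1, 1, 1])    # the classifier is already wrong on sample 2
y_adv   = np.array([2, 1, 0, 0])    # samples 0, 2 and 3 change label under attack
print(count_adversaries(y_true, y_clean, y_adv))   # -> 2 (samples 0 and 3)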

Fig. 1. The top diagram shows the methodology of training the model distillation used in the white-box and black-box attacks. The bottom diagram is the
methodology utilized to attack a time series classifier.

To retain the strictest definition of black-box and white-box attacks, we only use the attacked model's discrete class label for black-box attacks, and we use the probability distribution output by the classifier for white-box attacks. The only exception where a student-teacher network is not used is when conducting a white-box attack on an FCN time series model, since an Adversarial Transformation Network (ATN) can directly manipulate the gradient information of a neural network. The performance of the adversarial model is evaluated on the teacher classification model for the original time series.

For every student model we train, we utilize the LeNet-5 architecture [34]. The LeNet-5 time series classifier is defined as a classical convolutional neural network following the structure: Conv (6 filters, 5x5, valid padding) -> Max Pooling -> Conv (16 filters, 5x5, valid padding) -> Max Pooling -> Fully Connected (120 units, relu) -> Fully Connected (84 units, relu) -> Fully Connected (number of classes, softmax).

The fully convolutional network is an extension of the FCN model proposed by Wang et al. [29]. The FCN is comprised of three blocks, each comprised of a sequence of Convolution layer -> Batch Normalization -> ReLU activation. All convolutional kernels are initialized using the uniform He initialization proposed by He et al. [35]. We utilize filters of size [128, 256, 128] and kernels of size [8, 5, 3].

One strong baseline deterministic model for classifying multivariate time series is 1-NN DTW without a warping window. Due to the distance-based nature of the 1-NN classifier and its reliance on a distance matrix, 1-NN DTW cannot easily be used to compute an equivalent soft probabilistic representation. Since white-box attacks have access only to the probability distribution predicted for each sample, the distance matrix generated by DTW is used to compute an equivalent soft probabilistic representation. The analogous representation is such that we get the exact same result as selecting the 1-NN on the real distance matrix if we determine the top class from this representation.
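The attacked FCN described above (three Conv-BN-ReLU blocks with filters [128, 256, 128], kernel sizes [8, 5, 3], uniform He initialization, global average pooling and a softmax output) can be sketched with tf.keras as follows. This sketch feeds the multivariate series as (timesteps, channels) to 1D convolutions; the authors' repository may organize the inputs and layers differently, so treat this only as an illustration of the described architecture.

import tensorflow as tf

def build_fcn(timesteps, channels, num_classes):
    # FCN-style classifier: three Conv-BN-ReLU blocks, global pooling, softmax output.
    inp = tf.keras.Input(shape=(timesteps, channels))
    x = inp
    for filters, kernel in zip([128, 256, 128], [8, 5, 3]):
        x = tf.keras.layers.Conv1D(filters, kernel, padding="same",
                                   kernel_initializer="he_uniform")(x)
        x = tf.keras.layers.BatchNormalization()(x)
        x = tf.keras.layers.Activation("relu")(x)
    x = tf.keras.layers.GlobalAveragePooling1D()(x)
    out = tf.keras.layers.Dense(num_classes, activation="softmax")(x)
    return tf.keras.Model(inp, out)

model = build_fcn(timesteps=100, channels=6, num_classes=4)   # e.g. BasicMotions
model.compile(optimizer="adam", loss="categorical_crossentropy", metrics=["accuracy"])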

TABLE I
DATASET DESCRIPTION FOR UEA AND UCR MULTIVARIATE TIME SERIES BENCHMARKS

Dataset                     Train Samples  Test Samples  Dimensions  Max Series Length  Num. Classes
ArticularyWordRecognition   275            300           9           144                25
AtrialFibrillation          15             15            2           640                3
BasicMotions                40             40            6           100                4
CharacterTrajectories       1422           1436          3           182                20
Cricket                     108            72            6           1197               12
Epilepsy                    137            138           3           206                4
EthanolConcentration        261            263           3           1751               4
ERing                       30             270           4           65                 6
FingerMovements             316            100           28          50                 2
HandMovementDirection       160            74            10          400                4
Handwriting                 150            850           3           152                26
JapaneseVowels              270            370           12          29                 9
Libras                      180            180           2           45                 15
LSST                        2459           2466          6           36                 14
NATOPS                      180            180           24          51                 6
PenDigits                   7494           3498          2           8                  10
RacketSports                151            152           6           30                 4
UWaveGestureLibrary         120            320           3           315                8

To compute this soft probabilistic representation, consider a set of distance matrices V computed using a distance measure such as DTW between all possible pairs of samples from the two datasets being compared.

Algorithm 1: Equivalent Probabilistic Representation of the Distance Matrix for 1-Nearest Neighbor Classification
Data: V is a distance matrix of shape [N_test; N_train; N_Channels] and y is the train set label vector of length N_train
Result: Softmax-normalized predictions p of shape [N_test; C] and the discrete label vector q of length N_test
begin
    V <- (-V)
    uniqueLabels = Unique(y)          // unique class labels
    Vc = []
    for ci in uniqueLabels do
        vc = V(y = ci)                // [N_test; N_train(y = ci); N_Channels]
        vc_max = max(vc)              // [N_test]
        Vc.append(vc_max)
    V' = concatenate(Vc)              // [N_test; # of classes]
    p = softmax(V')                   // [N_test; # of classes]
    q = argmax(p)                     // [N_test]
    return (p, q)

Algorithm 1 is an intermediate standardization algorithm that accepts the set of distance matrices V and the training class labels y as inputs, and calculates an analogous probabilistic representation that can be used directly to determine the 1-Nearest Neighbor. The Soft-1NN algorithm selects all samples that belong to a class ci, where i ∈ {1, ..., C}, as vc, computes the maximum over all train samples for that class, then appends the vector vc_max to the list Vc. The concatenation of all of these vectors in Vc then forms the matrix V', which is passed to the softmax function, as shown in Equation (5) with T set to 1, to represent this matrix V' as a probabilistic equivalent of the original distance matrix V. An implicit restriction placed on Algorithm 1 is that the representation is equivalent only when computing the 1-NN DTW. It cannot be used to represent the K-NN DTW (or any other distance metric) and therefore cannot be used for K-NN classification. However, in time series classification, the value of K is typically set to 1 for nearest neighbor classifiers. While Algorithm 1 has been used here to convert the 1-NN DTW distance matrices, any set of distance matrices used for 1-NN classification can be standardized in the same way.

B. Results

Figures 2 and 3 depict the results from white-box attacks on 1-NN DTW and FCN classifiers that are applied to 18 multivariate time series datasets. Figures 4 and 5 present the results from black-box attacks on 1-NN DTW and FCN classifiers trained on the same 18 datasets. The experiments on these 18 datasets aimed to generate adversaries for only one class. Detailed results can be found in Appendix A. The proposed methodology is successful in capturing adversaries on all datasets.

The number of adversaries in each dataset and the amount of perturbation per sample depend on the hyper-parameters being tested. As an example, the dataset "AtrialFibrillation" was only able to generate multiple adversaries for the black-box attack on 1-NN DTW when the target class was 0. However, if the target class is 1, the number of adversaries generated increases to 5, 2, 3, and 13 for a black-box attack on 1-NN DTW, a white-box attack on 1-NN DTW, a black-box attack on FCN and a white-box attack on FCN, respectively. These numbers could potentially be higher if the hyper-parameters were optimized for this target class. Additionally, the target class has a significant impact on the adversaries being produced because of the ATN's loss function. For certain time series classes, it is easier to generate adversaries which are identical to one another.
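Algorithm 1 translates almost line for line into NumPy. The sketch below assumes the per-channel distances have already been summed into a single matrix of shape [N_test, N_train]; it is one possible rendering of the Soft-1NN procedure, not the repository code.

import numpy as np

def soft_1nn(V, y_train):
    # Algorithm 1: convert a test-vs-train distance matrix into softmax scores
    # whose arg max equals the 1-NN prediction. V has shape [N_test, N_train].
    V = -V                                           # smaller distance -> larger score
    classes = np.unique(y_train)
    per_class = []
    for c in classes:
        vc = V[:, y_train == c]                      # columns belonging to class c
        per_class.append(vc.max(axis=1))             # nearest-neighbor score per class
    Vp = np.stack(per_class, axis=1)                 # [N_test, num_classes]
    e = np.exp(Vp - Vp.max(axis=1, keepdims=True))   # Equation (5) with T = 1
    p = e / e.sum(axis=1, keepdims=True)
    q = classes[np.argmax(p, axis=1)]
    return p, q

V = np.random.rand(5, 20)                  # toy distance matrix
y_train = np.random.randint(0, 3, size=20)
p, q = soft_1nn(V, y_train)

Because the softmax is monotonic, the arg max of p always coincides with the class of the single nearest training sample, which is the equivalence the algorithm relies on.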

A Wilcoxon signed-rank test is utilized to compare the number of adversaries generated by white-box and black-box attacks on the FCN and 1-NN DTW classifiers that are trained on the 18 datasets, summarized in Table II. Our findings indicate that the FCN classifier is more susceptible to a white-box attack than 1-NN DTW is to a white-box attack. It should be noted that the FCN white-box attack produces considerably more adversaries than its counterparts. This is because the white-box attack is performed explicitly on the FCN model and not on a student model approximating the classifier's behaviour. We observe that the number of adversarial samples obtained on FCN classifiers from black-box attacks is greater than the number of adversarial samples from either white-box or black-box attacks on DTW classifiers. A Wilcoxon signed-rank test supports this finding by showing a statistically significant difference (at a rate of 0.05) in the number of adversarial samples observed on 1-NN DTW classifiers due to black-box or white-box attacks versus the number of adversarial samples obtained by black-box attacks on FCN classifiers. We also observe that 1-NN DTW classifiers under either type of attack have approximately the same number of adversaries generated. Finally, we find that FCNs have the smallest number of adversarial samples after black-box attacks, although each of these samples requires indistinguishable disturbances to the original signal. These observations are important for future research into the development of time series classifiers, as the number of adversarial samples generated under each methodology can be used as a secondary evaluation metric to measure the robustness of a model. The average MSE of adversarial samples after black-box attacks on FCN classifiers is lower than the average MSE of the adversarial samples obtained via black-box and white-box attacks on 1-NN DTW classifiers, but the difference is only statistically significant when compared to the white-box attack, as observed in Table III. A lower MSE indicates that the black-box attack on FCN classifiers requires minimal perturbation per time series sample in comparison to the attacks on 1-NN DTW classifiers.

Next, we test how well the GATN generalizes to an unseen dataset, D_test, such that the GATN does not require any additional training. This is beneficial in situations where the time series adversarial samples must be generated in the constant time of a single forward pass of the GATN model without requiring further training. Such a generalization is uncommon among adversarial methodologies because they require retraining to generate adversarial samples. Our proposed methodology is robust, successfully generating adversarial samples on data that is unseen to both the GATN and the student models, for the respective targeted time series classification models. Figure 6 depicts the number of adversarial samples detected, on an unseen dataset, with white-box and black-box attacks on the 1-NN DTW classifiers and FCN classifiers. The white-box attack on the FCN classifier obtains the most adversarial samples per dataset. This is followed by the white-box and black-box attacks on 1-NN DTW, which show a similar number of adversarial samples constructed. Finally, we find that the FCN classifier is the least susceptible to black-box attacks.

The unique consequence of this generalization is that trained GATN models can be applied for attacks that are feasible on real-world devices, even for black-box attacks. The deployment of a trained GATN with the paired student model affords a near constant-time cost of generating a reasonable number of adversarial samples. As the forward pass of the GATN requires few resources, and the student model is small enough to compute the input gradient (x̃) in reasonable time, these attacks can be constructed without significant computation on small, portable devices. Therefore, the fact that certain classifiers trained on certain datasets can be attacked without requiring any additional on-device training is concerning.

TABLE II
WILCOXON SIGNED-RANK TEST COMPARING THE NUMBER OF ADVERSARIES BETWEEN THE DIFFERENT ATTACKS

                        White-box 1-NN DTW   Black-box FCN   White-box FCN
Black-box 1-NN DTW      1.99E-01             1.02E-03        1.59E-03
White-box 1-NN DTW                           1.28E-03        3.26E-04
Black-box FCN                                                2.92E-04

TABLE III
WILCOXON SIGNED-RANK TEST COMPARING THE MSE BETWEEN THE DIFFERENT ATTACKS

                        White-box 1-NN DTW   Black-box FCN   White-box FCN
Black-box 1-NN DTW      1.33E-01             6.42E-02        1.08E-02
White-box 1-NN DTW                           2.49E-02        7.07E-02
Black-box FCN                                                4.34E-03
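The entries of Tables II and III are Wilcoxon signed-rank tests over the 18 paired per-dataset values. As an illustration (not a reproduction of the exact table entries, since details such as the handling of zero differences may differ), the FCN adversary counts from Tables VI and VII can be compared with SciPy:

from scipy.stats import wilcoxon

# Per-dataset adversary counts for the FCN classifier, taken from Tables VI and VII.
black_box_fcn = [4, 1, 4, 9, 1, 5, 9, 10, 3, 2, 2, 17, 3, 27, 3, 129, 5, 11]
white_box_fcn = [223, 1, 14, 706, 15, 32, 118, 37, 13, 17, 94, 224, 24, 376, 69, 850, 39, 166]

stat, p_value = wilcoxon(black_box_fcn, white_box_fcn)
print(p_value)   # a small p-value indicates a significant difference in adversary counts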
V. CONCLUSION

This work proposes a model distillation technique to mimic the behavior of various classical multivariate time series classification models, together with an adversarial transformation network to attack various multivariate time series datasets. The proposed methodology is applied to 1-NN DTW and a Fully Convolutional Network (FCN) that are trained on 18 University of East Anglia (UEA) and University of California Riverside (UCR) datasets. All 18 datasets were shown to be susceptible to at least one type of attack. To the best of our knowledge, this is the first time multivariate time series classifiers have been attacked using adversarial neural networks. The classical 1-NN DTW proved to be more robust to adversarial attacks than the FCN model. We suggest that researchers creating models for the classification of time series take model robustness as an evaluative metric. In addition, we suggest that adversarial data samples be integrated into their training data sets to further enhance resistance to adversarial attacks.

Fig. 2. White-box attack on 1-NN DTW that is trained on 18 datasets

Fig. 3. White-box attack on FCN that is trained on 18 datasets



Fig. 4. Black-box attack on 1-NN DTW that is trained on 18 datasets

Fig. 5. Black-box attack on FCN that is trained on 18 datasets



Fig. 6. Black-box and white-box attacks on FCN and 1-NN DTW classifiers that are tested on Dtest without any retraining.

APPENDIX A
DETAILED RESULTS

TABLE IV
BLACK-BOX ATTACK ON 1-NN DTW MODELS

Dataset                     Num. of Adversaries   MSE
ArticularyWordRecognition   8                     0.063085
AtrialFibrillation          2                     0.015005
BasicMotions                2                     0.114111
CharacterTrajectories       202                   0.08986
Cricket                     0                     0.012914
Epilepsy                    5                     0.123488
ERing                       30                    0.128423
EthanolConcentration        72                    0.466593
FingerMovements             10                    0.09904
HandMovementDirection       2                     0.004315
Handwriting                 49                    0.125444
JapaneseVowels              28                    0.090313
Libras                      31                    0.142043
LSST                        219                   0.185534
NATOPS                      14                    0.073556
PenDigits                   293                   0.197882
RacketSports                14                    0.147109
UWaveGestureLibrary         34                    0.075413

TABLE V
WHITE-BOX ATTACK ON 1-NN DTW MODELS

Dataset                     Num. of Adversaries   MSE
ArticularyWordRecognition   18                    0.104909
AtrialFibrillation          0                     0.061174
BasicMotions                3                     0.105032
CharacterTrajectories       203                   0.084731
Cricket                     1                     0.013019
Epilepsy                    11                    0.249629
ERing                       29                    0.176111
EthanolConcentration        26                    0.033794
FingerMovements             8                     0.071567
HandMovementDirection       4                     0.023995
Handwriting                 61                    0.206134
JapaneseVowels              33                    0.114339
Libras                      31                    0.109957
LSST                        223                   0.285733
NATOPS                      16                    0.129278
PenDigits                   123                   0.136015
RacketSports                16                    0.157508
UWaveGestureLibrary         52                    0.373157

ACKNOWLEDGMENT
The authors would like to thank all the researchers that
helped create and clean the data available in the updated
UEA and UCR Multivariate Time Series Classification Archive.
Sustained research in this domain would be much more
challenging without their efforts.

TABLE VI
BLACK-BOX ATTACK ON FCN MODELS

Dataset                     Num. of Adversaries   MSE
ArticularyWordRecognition   4                     0.000156
AtrialFibrillation          1                     0.016261
BasicMotions                4                     0.131301
CharacterTrajectories       9                     0.002883
Cricket                     1                     3.046668
Epilepsy                    5                     0.203446
ERing                       9                     0.10876
EthanolConcentration        10                    0.019332
FingerMovements             3                     0.0031
HandMovementDirection       2                     0.018381
Handwriting                 2                     0.012477
JapaneseVowels              17                    0.043202
Libras                      3                     0.200899
LSST                        27                    0.01518
NATOPS                      3                     0.011849
PenDigits                   129                   0.129774
RacketSports                5                     0.096207
UWaveGestureLibrary         11                    0.016174

TABLE VII
WHITE-BOX ATTACK ON FCN MODELS

Dataset                     Num. of Adversaries   MSE
ArticularyWordRecognition   223                   0.103408
AtrialFibrillation          1                     0.027168
BasicMotions                14                    0.196938
CharacterTrajectories       706                   0.300598
Cricket                     15                    0.083996
Epilepsy                    32                    0.859604
ERing                       118                   0.250408
EthanolConcentration        37                    0.05864
FingerMovements             13                    0.127995
HandMovementDirection       17                    0.006507
Handwriting                 94                    0.076364
JapaneseVowels              224                   0.512852
Libras                      24                    0.259869
LSST                        376                   0.141442
NATOPS                      69                    0.178026
PenDigits                   850                   0.270337
RacketSports                39                    0.236502
UWaveGestureLibrary         166                   0.248728

REFERENCES

[1] Y. LeCun, Y. Bengio, and G. Hinton. Deep learning. Nature, 521, 2015.
[2] Shih-Chung B. Lo, Heang-Ping Chan, Jyh-Shyan Lin, Huai Li, Matthew T. Freedman, and Seong K. Mun. Artificial convolution neural network for medical image pattern recognition. Neural Networks, 8(7-8):1201–1214, 1995.
[3] Vikramjit Mitra, Horacio Franco, Martin Graciarena, and Dimitra Vergyri. Medium-duration modulation cepstral feature for robust speech recognition. In 2014 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pages 1749–1753. IEEE, 2014.
[4] Stefanos Doltsinis, Marios Krestenitis, and Zoe Doulgeri. A machine learning framework for real-time identification of successful snap-fit assemblies. IEEE Transactions on Automation Science and Engineering, 2019.
[5] Kukkong Sirisambhand and Chotirat Ann Ratanamahatana. A dimensionality reduction technique for time series classification using additive representation. In Third International Congress on Information and Communication Technology, pages 717–724. Springer, 2019.
[6] Anooshiravan Sharabiani, Houshang Darabi, Ashkan Rezaei, Samuel Harford, Hereford Johnson, and Fazle Karim. Efficient classification of long time series by 3-D dynamic time warping. IEEE Transactions on Systems, Man, and Cybernetics: Systems, 47(10):2688–2703, 2017.
[7] Anooshiravan Sharabiani, Houshang Darabi, Samuel Harford, Elnaz Douzali, Fazle Karim, Hereford Johnson, and Shun Chen. Asymptotic dynamic time warping calculation with utilizing value repetition. Knowledge and Information Systems, 57(2):359–388, 2018.
[8] Xiaopeng Xi, Eamonn Keogh, Christian Shelton, Li Wei, and Chotirat Ann Ratanamahatana. Fast time series classification using numerosity reduction. In Proceedings of the 23rd International Conference on Machine Learning, pages 1033–1040. ACM, 2006.
[9] Wenjie Pei, Hamdi Dibeklioğlu, David M. J. Tax, and Laurens van der Maaten. Multivariate time-series classification using the hidden-unit logistic model. IEEE Transactions on Neural Networks and Learning Systems, 29(4):920–931, 2017.
[10] Patrick Schäfer and Ulf Leser. Multivariate time series classification with WEASEL+MUSE. arXiv preprint arXiv:1711.11343, 2017.
[11] Fazle Karim, Somshubra Majumdar, Houshang Darabi, and Samuel Harford. Multivariate LSTM-FCNs for time series classification. Neural Networks, 116:237–245, 2019.
[12] Ian J. Goodfellow, Jonathon Shlens, and Christian Szegedy. Explaining and harnessing adversarial examples. arXiv preprint arXiv:1412.6572, 2014.
[13] Konda Reddy Mopuri, Aditya Ganeshan, and Venkatesh Babu Radhakrishnan. Generalizable data-free objective for crafting universal adversarial perturbations. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2018.
[14] Konda Reddy Mopuri, Utkarsh Ojha, Utsav Garg, and R. Venkatesh Babu. NAG: Network for adversary generation. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 742–751, 2018.
[15] Fazle Karim, Somshubra Majumdar, Houshang Darabi, and Shun Chen. LSTM fully convolutional networks for time series classification. IEEE Access, 6:1662–1669, 2017.
[16] Shuichi Hashida and Keiichi Tamura. Multi-channel MHLF: LSTM-FCN using MACD-histogram with multi-channel input for time series classification. In 2019 IEEE 11th International Workshop on Computational Intelligence and Applications (IWCIA), pages 67–72. IEEE, 2019.
[17] Takeru Miyato, Shin-ichi Maeda, Masanori Koyama, and Shin Ishii. Virtual adversarial training: a regularization method for supervised and semi-supervised learning. IEEE Transactions on Pattern Analysis and Machine Intelligence, 41(8):1979–1993, 2018.
[18] Nicolas Papernot, Patrick McDaniel, and Ian Goodfellow. Transferability in machine learning: from phenomena to black-box attacks using adversarial samples. arXiv preprint arXiv:1605.07277, 2016.
[19] Izaskun Oregi, Javier Del Ser, Aritz Perez, and Jose A. Lozano. Adversarial sample crafting for time series classification with elastic similarity measures. In International Symposium on Intelligent and Distributed Computing, pages 26–39. Springer, 2018.
[20] Naveed Akhtar and Ajmal Mian. Threat of adversarial attacks on deep learning in computer vision: A survey. IEEE Access, 6:14410–14430, 2018.
[21] Aleksander Madry, Aleksandar Makelov, Ludwig Schmidt, Dimitris Tsipras, and Adrian Vladu. Towards deep learning models resistant to adversarial attacks. arXiv preprint arXiv:1706.06083, 2017.
[22] Florian Tramèr, Alexey Kurakin, Nicolas Papernot, Ian Goodfellow, Dan Boneh, and Patrick McDaniel. Ensemble adversarial training: Attacks and defenses. arXiv preprint arXiv:1705.07204, 2017.
[23] Fazle Karim, Somshubra Majumdar, and Houshang Darabi. Adversarial attacks on time series. arXiv preprint arXiv:1902.10755, 2019.
[24] Nicholas Carlini and David Wagner. Audio adversarial examples: Targeted attacks on speech-to-text. In 2018 IEEE Security and Privacy Workshops (SPW), pages 1–7. IEEE, 2018.
[25] Skyler Seto, Wenyu Zhang, and Yichen Zhou. Multivariate time series classification using dynamic time warping template selection for human activity recognition. In 2015 IEEE Symposium Series on Computational Intelligence, pages 1399–1406. IEEE, 2015.
[26] Anthony J. Bagnall, Hoang Anh Dau, Jason Lines, Michael Flynn, James Large, Aaron Bostrom, Paul Southam, and Eamonn J. Keogh. The UEA multivariate time series classification archive, 2018. CoRR, abs/1811.00075, 2018.
[27] Rohit J. Kate. Using dynamic time warping distances as features for improved time series classification. Data Mining and Knowledge Discovery, 30(2):283–312, 2016.
[28] Eamonn Keogh and Chotirat Ann Ratanamahatana. Exact indexing of dynamic time warping. Knowledge and Information Systems, 7(3):358–386, 2005.
[29] Zhiguang Wang, Weizhong Yan, and Tim Oates. Time series classification from scratch with deep neural networks: A strong baseline. In 2017 International Joint Conference on Neural Networks (IJCNN), pages 1578–1585. IEEE, 2017.

[30] Sergey Ioffe and Christian Szegedy. Batch normalization: Accelerating deep network training by reducing internal covariate shift. arXiv preprint arXiv:1502.03167, 2015.
[31] Shumeet Baluja and Ian Fischer. Adversarial transformation networks: Learning to generate adversarial examples. arXiv preprint arXiv:1703.09387, 2017.
[32] Cristian Buciluǎ, Rich Caruana, and Alexandru Niculescu-Mizil. Model compression. In Proceedings of the 12th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pages 535–541. ACM, 2006.
[33] Geoffrey Hinton, Oriol Vinyals, and Jeff Dean. Distilling the knowledge in a neural network. arXiv preprint arXiv:1503.02531, 2015.
[34] Yann LeCun et al. LeNet-5, convolutional neural networks. URL: http://yann.lecun.com/exdb/lenet, 20:5, 2015.
[35] Kaiming He, Xiangyu Zhang, Shaoqing Ren, and Jian Sun. Delving deep into rectifiers: Surpassing human-level performance on ImageNet classification. In Proceedings of the IEEE International Conference on Computer Vision, pages 1026–1034, 2015.

Houshang Darabi (S'98-A'00-M'10-SM'14) received the Ph.D. degree in Industrial and Systems Engineering from Rutgers University, New Brunswick, NJ, USA, in 2000. He is currently an Associate Professor with the Department of Mechanical and Industrial Engineering, University of Illinois at Chicago (UIC), and also an Associate Professor with the Department of Computer Science, UIC. He has been a contributing author of two books in the areas of scalable enterprise systems and reconfigurable discrete event systems. His research has been supported by several federal and private agencies, such as the National Science Foundation, the National Institute of Standards and Technology, the Department of Energy, and Motorola. He has extensively published on various automation and project management subjects, including wireless sensory networks for location sensing, planning and management of projects with tasks requiring multi-mode resources, and workflow modeling and management. He has published in different prestigious journals and conference proceedings, such as the IEEE Transactions on Robotics and Automation, the IEEE Transactions on Automation Science and Engineering, the IEEE Transactions on Systems, Man, and Cybernetics, and Information Sciences. His current research interests include the application of data mining, process mining, and optimization in the design and analysis of manufacturing, business, project management, and workflow management systems.

Samuel Harford received the B.S. degree in Industrial Engineering from the University of Illinois at Chicago in 2016 and the M.S. degree in Industrial Engineering from the University of Illinois at Chicago in 2018. He is currently pursuing the Ph.D. degree with the Mechanical and Industrial Engineering Department, University of Illinois at Chicago. His research interests lie in the domain of machine learning, time series classification, and health care data mining.

Fazle Karim received the B.Sc. degree in industrial engineering from the University of Illinois at Urbana-Champaign in 2012 and the M.Sc. degree in industrial engineering from the University of Illinois at Chicago in 2016. He is currently pursuing the Ph.D. degree with the Mechanical and Industrial Engineering Department, University of Illinois at Chicago. He is also the Lead Data Scientist with Prominent Laboratory, the university's foremost research facility in process mining. His research interests include education data mining, health care data mining, and time series analysis.
