Using Variational Autoencoder to Augment Sparse Time Series Datasets


Maxime Goubeaud, Philipp Joußen, Nicolla Gmyrek, Farzin Ghorban, Lucas Schelkes and Anton Kummert
2021 7th International Conference on Optimization and Applications (ICOA) | 978-1-6654-4103-2/20/$31.00 ©2021 IEEE | DOI: 10.1109/ICOA51614.2021.9442619

School of Electrical, Information and Media Engineering


University of Wuppertal
42119 Wuppertal, Germany
{maxime.goubeaud, philipp.joussen, nicolla.gmyrek, farzingrz, lucas.schelkes, kummert}@uni-wuppertal.de

Abstract—In machine learning, data augmentation is the process of generating synthetic samples in order to augment sparse training datasets. The main motivation is to reduce the error rate of classifiers. In this paper, we generate synthetic training samples of time series data using a simple implementation of the Variational Autoencoder, to test whether classification performance increases when augmenting the original training sets with multiples of generated samples. We demonstrate the effectiveness of data augmentation using the Variational Autoencoder as a generative model by conducting experiments with different standard classifiers evaluated on nine datasets from the UCR Time Series Classification Archive. We show that our method is beneficial in most cases, as we observed an increase of accuracy and F1-score on all datasets.

Index Terms—data augmentation, time series, variational autoencoder

I. INTRODUCTION

Data augmentation (DA) is the process of creating synthetic samples from a set of real-world data. It is used to increase the number of samples in a dataset in order to improve the generalization capability of classifiers and to prevent overfitting. Data augmentation has recently become an important research field, especially for applications where only small datasets are available. Another opportunity that data augmentation offers is to compensate for class imbalances. Neural networks used as classification models tend to be biased towards classes that have a significantly larger number of training samples than the other classes of a training set. Augmenting the underrepresented class with synthetic samples can help to avoid the classification bias towards relatively overrepresented classes. An example of this type of augmentation technique is the SMOTE (Synthetic Minority Over-sampling Technique) method [1].

In the field of image processing, but also in speech recognition, there are many straightforward and easy-to-implement data augmentation methods. Rotating or mirroring images, for example, can be seen as a label-preserving transformation that can easily multiply the size of a training set [2]. Speeding up or slowing down samples is a method used in the field of speech recognition [3]. Different methods to augment time series datasets, such as rotating, scaling, and jittering, have been tested in [4]. Dynamic Time Warping Barycenter Averaging (weighted DBA) [5] and simulating time series signals [6], [7] are further examples of data augmentation methods for time series data.

Variational Autoencoders (VAEs) have recently been shown to be suitable for unsupervised anomaly detection [8] as well as for data augmentation purposes [9].

In this paper, we use a Variational Autoencoder to generate synthetic examples of time series data in order to augment relatively small training sets. By providing an increased amount of training data in this way, we aim to achieve a better classification performance of a model.

The rest of the paper is organized as follows: after the introduction, section II reviews insights and experiments related to VAE applications, section III briefly discusses the functionality of the VAE, and section IV explains our method in detail and presents the datasets used for evaluation. This is followed by a detailed evaluation in section V. Section VI concludes the paper.

II. RELATED WORK

A VAE with encoder and decoder implemented as Bi-LSTM networks with tanh activations was used in [8] to detect anomalies in the ECG5000 UCR dataset in a fully unsupervised manner. Here, introducing a dynamical term in the loss function that increased during training helped to achieve better reconstructions of the training samples in early stages of training and to better regularize at the end of training.

A VAE architecture based on Recurrent Neural Networks (RNNs) was used in [10] to detect anomalies in time series data derived from a motor experiment platform. Abnormal data was created by loosening screws on the bearing blocks or adding extra loads. Classification models like Support Vector Machine (SVM) and Random Forest showed worse results with dimensionally reduced training data using a standard Autoencoder and VAE, but the classification performance improved when lower-dimensional features were extracted using the RNN-based VAE.

In [9], the authors compare a Generative Adversarial Network and a VAE for generating synthetic training data to improve classification performance. The goal was to augment underrepresented classes in a dataset of malware samples. The network structure of the VAE was based on Convolutional

Neural Networks (CNNs) with ReLU activations for the convolutional layers, as the raw malware bytes were converted into images. Training an 18-layered Residual Network (see [11]) classifier only on real data resulted in 83% accuracy. Then, out of 2000 generated malware samples, 100 random samples for each class have been added to the dataset. For evaluation, results have been analyzed with precision, recall, and F1-scores. Test accuracy improved by 2%.
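The dynamically weighted loss reported for [8] can be illustrated with a short sketch. It assumes a diagonal Gaussian encoder output and a unit Gaussian prior, for which the KL divergence has a closed form, and a linear warm-up of the KL weight. The linear schedule, the function names, and the `warmup_epochs` parameter are assumptions made for illustration; they are not details taken from [8].

```python
import math

def kl_to_unit_gaussian(mu, sigma):
    """Closed-form KL( N(mu, diag(sigma^2)) || N(0, I) ) for one sample."""
    return 0.5 * sum(s * s + m * m - 1.0 - math.log(s * s)
                     for m, s in zip(mu, sigma))

def kl_weight(epoch, warmup_epochs=10):
    """Hypothetical linear warm-up: the weight grows from 0 to 1, then stays at 1."""
    return min(1.0, epoch / warmup_epochs)

def weighted_loss(reconstruction_error, mu, sigma, epoch, warmup_epochs=10):
    """Reconstruction term plus the KL term scaled by the current weight."""
    weight = kl_weight(epoch, warmup_epochs)
    return reconstruction_error + weight * kl_to_unit_gaussian(mu, sigma)

# Early in training the loss is dominated by reconstruction,
# later the KL regularizer contributes with full weight.
early = weighted_loss(2.0, mu=[1.0, 0.0], sigma=[1.0, 1.0], epoch=0)
late = weighted_loss(2.0, mu=[1.0, 0.0], sigma=[1.0, 1.0], epoch=10)
```

With a small weight the model is free to focus on reconstruction first; as the weight approaches 1 the loss approaches the usual negative ELBO.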

III. VARIATIONAL AUTOENCODER


The VAE was proposed by Kingma and Welling as a combination of a neural network-based inference model and a neural network-based generative model for generative modeling and representation learning [12]. The encoder in a VAE is denoted as q(z|x) and approximates the true posterior distribution p(z|x), with input x and hidden representations z. Training samples are encoded into a lower-dimensional hidden representation space z that is stochastic, because the encoder outputs the parameters of a Gaussian probability density. A decoder network, denoted as p(x|z), takes a sample from the latent space as input and outputs the parameters of the data probability distribution (see Fig. 1). In addition to a reconstruction term, the Kullback-Leibler divergence (KL divergence) is used in the loss function, so that the approximate posterior distribution q(z|x) stays close to the unknown true posterior p(z|x). The KL divergence is always non-negative, and it is zero if q(z|x) is equal to p(z|x). Optimization is achieved by maximizing the evidence lower bound (ELBO), which consists of a term maximizing the reconstruction likelihood and the negative KL divergence between q(z|x) and the prior over z. Maximizing the ELBO makes q(z|x) a better approximation of the true posterior. The reparameterization trick is applied so that gradients can backpropagate through the encoder network while it is still possible to sample randomly from the latent distribution.

[Figure 1: diagram x → q(z|x) → z (latent space) → p(x|z) → x̂; the encoder maps x to z, the decoder maps z to x̂.]

Fig. 1. VAE schema: Encoder model learns a mapping from input x to latent representation z. The decoder model maps latent input back to x̂.

After a VAE is trained, synthetic training samples can be generated by sampling latent codes from the latent space and feeding them into the decoder network. Small training sets can be augmented with these synthetic examples to increase classification performance or to tackle the problem of class imbalances. A latent vector $z = \mu + \sigma \odot \epsilon$ can be sampled using the reparameterization trick, where σ is scaled with $\epsilon \sim \mathcal{N}(0, I)$ and added to µ. The prior distribution can also be defined as a Gaussian distribution with zero mean and unit variance.

To prove the functionality of DA using the VAE, we have selected nine different datasets with time series data from the UCR Time Series Classification Archive [13]. Augmented training sets containing different multiples of synthetic samples are used to train simple standard algorithms, and the results are compared to baseline experiments in which the original UCR training sets are used alone.

IV. METHOD AND IMPLEMENTATION

For our experiments, both encoder and decoder networks have been implemented with TensorFlow using fully connected layers with ReLU activation. A time series sample is fed into the encoder network, which returns a multivariate normal distribution with parameters µ and σ that are derived from the last hidden layer. The variance is parametrized as non-negative by using a Softplus activation. The prior distribution is defined as a unit Gaussian distribution and the latent space is defined to be 2-dimensional. The symmetrically built decoder network takes a two-dimensional sample from the latent space as input, which is propagated through the network's hidden layers of the same size as in the encoder network, and outputs to a Bernoulli distribution. Using the Adam optimizer, the ELBO is maximized by minimizing the negative ELBO. Fig. 2 and Fig. 3 show samples from the original Starlight Curves dataset and generated samples after training the VAE for 100 epochs.

Samples for training set multiples of n = 2, 5, and 10 are drawn from the latent space after 100 epochs of training and fed to the decoder network in order to augment the original training data. As a result, an augmented training set is n times the size of the original training set: it contains the original training set itself plus synthetic training samples amounting to n − 1 times the size of the original training set. The test datasets are kept untouched. For a UCR training set, the VAE is trained separately for each class, so it generates only time series that it has learned from being trained with one class. Afterwards, the generated samples and the original training data are concatenated into the final augmented training sets. The proportions of the classes are kept the same in the augmented training sets, as each class of a dataset has the same n parameter. Three VAE models have been used in the experiments; they differ in the number of hidden layers in the encoder and decoder networks.

Only for visualization purposes, the models have also been trained on all classes of a dataset, in order to see whether the VAE is able to encode its input efficiently. It is desirable to be able to accurately reconstruct input samples. On the other hand, smooth latent space representations of the input data are to be learned, so that each area of the latent space represents the observed data. Fig. 4 shows latent codes

Fig. 2. Training samples of class two from the original Starlight Curves training set.

Fig. 3. Decoded samples of class two from the prior using the VAE.

TABLE I
Overview of the datasets used for evaluation. Abbreviations: classes = number of different classes, train = number of time series transformed into spectrogram training samples, test = number of time series transformed into spectrogram test samples, SL = length of underlying time series.

Dataset               classes  train  test  SL    type
ECG200                2        100    100   96    ECG
ECG5000               5        500    4500  140   ECG
TwoLeadECG            2        23     1139  82    ECG
StarlightCurves       3        1000   8236  1024  sensor
SonyAIBORobotSurface  2        20     601   70    sensor
MoteStrain            2        20     1252  84    sensor
GunPoint              2        50     150   150   motion
InlineSkate           7        100    550   1882  motion
UWaveGestureLibraryX  8        896    3582  315   motion

Fig. 4. Visualization of the latent codes for Starlight Curves samples after 100 epochs of training, color-coded by their class label.

color-coded by class labels for the Starlight Curves dataset. It is noticeable that there are two classes that the model cannot separate clearly while encoding (namely class one (purple) and class three (yellow)), whereas another dense region only contains latent codes of class two.

The first model has two hidden layers in each network, where the number of neurons equals half the length of an input sample in each layer. The second model has two hidden layers in each network, where the number of neurons equals half the length of an input sample in the first layer and is reduced by half again in the second layer of the encoder network. The third model has one hidden layer in each network, where the number of neurons equals half the length of an input sample.

For each UCR dataset tested, three augmented training sets with a multiplication number of n = 2, n = 5, and n = 10 are used to train a k-Nearest-Neighbor (kNN) and an SVM classifier, and the results are compared against the baseline results in terms of accuracy and F1-score.

V. EVALUATION

For the evaluation, we select two different well-known classifiers from the Python machine learning library scikit-learn. These classifiers are chosen for their widespread application in knowledge discovery and industry. In a first step, we use these classifiers to create a benchmark result in accuracy (Acc.) and F1-score (F1). Then, we use the VAEs as explained in section IV to augment the datasets used for evaluation. Next, we use the augmented datasets to train SVM and kNN and compare the results achieved on the unchanged test datasets with the previously created benchmarks. In summary, there are 10 different results for each dataset and classifier: the benchmark result obtained with the unchanged training dataset as well as nine additional results, each corresponding to one of the three VAE architectures and a different multiplication number n of the respective dataset. Tab. II shows the evaluation results of kNN and Tab. III of

SVM using Acc. and F1 as metrics. In the following, the most important findings for each of the classifiers are discussed:

• kNN: With the augmented datasets, improvements can be achieved for most tests using kNN as a classifier. The best results were achieved with n = 5 or n = 10, while n = 2 falls behind in comparison. This observation was made for all three architectures used. For four datasets (StarlightCurves, SonyAIBORobotSurface, GunPoint, InlineSkate), the results in Acc. and F1 could be increased in every experiment. Regarding the different characteristics of the datasets, the sample length does not seem to have any influence on the basic functionality of VAEs as a data augmentation method, since improvements could be achieved both for datasets with long samples (e.g. InlineSkate) and for datasets with short samples (e.g. ECG200). However, particularly high improvements were recorded for datasets with few training samples (e.g. TwoLeadECG, SonyAIBORobotSurface).

  When comparing the results of the different architectures, it is noticeable that architecture 1 provides the best results. For 5 out of 9 datasets, the best result could be achieved with architecture 1. Additionally, the benchmark result was missed in only a few cases. All training datasets for which the benchmark result could not be achieved with architecture 1 were based on n = 2. With n = 10, an improved result could be achieved for each dataset.

  For architectures 2 and 3, the best results were also achieved with n = 10. Architecture 2 also shows a very low degradation rate and here, too, all degradations are based on n = 2. The datasets enriched with n = 5 or n = 10 show good results, even if they drop slightly in comparison with architecture 1. With architecture 3, degradations also occur for n = 5. However, with n = 10, even with architecture 3, the results can be improved for all datasets except for the UWaveGestureLibraryX dataset.

  Overall, it can be concluded that the training datasets enriched with synthetic samples generated by a VAE achieved good results compared to the benchmark results. The results show that architecture 1 is the best fit for kNN in combination with the datasets used. With n = 10, very good results can be achieved and almost no degradation takes place. Architecture 1 in combination with n = 10 can therefore be recommended as a data augmentation method for comparable time series datasets.

• SVM: Very good results could be achieved for the SVM. The results of the enriched datasets outperformed the benchmark results in almost all cases. Only for the UWaveGestureLibraryX dataset was the benchmark result not reached in three experiments. Very good results were achieved for n = 5 and n = 10, with n = 2 falling behind. This confirms the observations that were made with the kNN.

  With respect to the different characteristics of the datasets, the number of classes, the different types, and the sample length have no significant influence on the functionality. Improvements could be achieved for datasets with few classes (e.g. ECG200) as well as with many classes (e.g. InlineSkate), with short samples (e.g. TwoLeadECG) as well as with long ones (e.g. StarlightCurves), and across all 3 different types. Regarding the number of training samples available in the original dataset, however, it is noticeable that particularly high improvements in Acc. and F1 could be achieved for datasets with few training samples (e.g. SonyAIBORobotSurface or TwoLeadECG).

  A comparison of the different architectures shows that enriched datasets, regardless of the architecture used to generate the synthetic samples, perform better in our tests than the original training datasets. Nevertheless, the results of architecture 3 turn out particularly well. With this architecture, the best overall result was recorded for 7 of 9 datasets. The results achieved with n = 10 stand out positively.

  Overall, the use of VAEs to generate artificial samples is a good method when an SVM is used as the classifier. The results are promising across all datasets. With n greater than or equal to 5, benchmark results were improved in all tests. Only with n = 2 did deterioration occur for the UWaveGestureLibraryX dataset.

In general, for both the kNN and the SVM, an improvement could be achieved in most cases with the augmented datasets compared to the benchmark results with the original datasets. VAEs are particularly well suited as a data augmentation method when the SVM is used. Here, better results in Acc. and F1 could be achieved for n = 5 and n = 10 in all tests. For both kNN and SVM, the sample size, the type of the respective dataset, and the number of classes have no noticeable influence on the functionality. Likewise, a choice of n = 10 is recommended for both classifiers. In the comparison of the different architectures, it is noticeable that the best results could be achieved with architecture 1 for the kNN and with architecture 3 for the SVM.

VI. CONCLUSION

In this paper, we have presented a new data augmentation technique for time series, using a Variational Autoencoder to generate new, synthetic training samples for sparse training datasets. To test whether adequate samples of time series can be generated with a VAE, we selected 9 different well-known time series datasets. We enriched the training data of these datasets for experiments with different numbers of synthetic training samples, which were generated with 3 different VAE architectures. The original datasets were benchmarked first, before the same experiments were performed with the enriched datasets. The kNN and SVM were used as classifiers, and accuracy (Acc.) and F1-score (F1) were selected as the metrics for the results.
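The enrichment step summarized here (a per-class generator, a multiplication number n, original samples kept) can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the authors' implementation: `augment_training_set` and `make_toy_generator` are hypothetical names, and the toy generator (class mean plus Gaussian noise) merely stands in for "sample a latent code from the prior and decode it with the VAE trained on that class".

```python
import random
from collections import Counter

def augment_training_set(samples, labels, n, generators, seed=0):
    """Return a training set that is n times the size of the original one.

    samples    : list of time series (lists of floats)
    labels     : list of class labels, one per sample
    n          : multiplication number (n = 2, 5, 10 in the experiments)
    generators : dict mapping a class label to a callable rng -> synthetic series
    """
    rng = random.Random(seed)
    aug_samples, aug_labels = list(samples), list(labels)  # originals are kept
    for label, count in Counter(labels).items():
        # Each class receives (n - 1) * count synthetic samples,
        # so the class proportions stay unchanged.
        for _ in range((n - 1) * count):
            aug_samples.append(generators[label](rng))
            aug_labels.append(label)
    return aug_samples, aug_labels

def make_toy_generator(class_samples, noise=0.05):
    """Placeholder generator (NOT a VAE): class mean plus Gaussian noise."""
    length = len(class_samples[0])
    mean = [sum(s[t] for s in class_samples) / len(class_samples)
            for t in range(length)]
    return lambda rng: [m + rng.gauss(0.0, noise) for m in mean]

# Toy usage with two classes and n = 5.
X = [[0.0, 0.1, 0.2], [0.1, 0.2, 0.3], [1.0, 0.9, 0.8]]
y = [0, 0, 1]
gens = {c: make_toy_generator([x for x, l in zip(X, y) if l == c]) for c in set(y)}
X_aug, y_aug = augment_training_set(X, y, n=5, generators=gens)
```

In this toy run the augmented set holds 15 series (3 original plus 12 synthetic) and the 2:1 class ratio of the original labels is preserved, mirroring the statement that every class uses the same n.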

TABLE II
Evaluation results for the kNN classifier. Shown are the results for the original dataset followed by the results using the augmented training datasets for n = 2, n = 5, and n = 10 with the three different architectures. Indicated are accuracy [%] and F1-score as mean of 5 runs. Settings: n_neighbors = 5, leaf_size = 30, p = 2, metric = minkowski.

                                        Architecture 1          Architecture 2          Architecture 3
Dataset               metric  original  n=2     n=5     n=10    n=2     n=5     n=10    n=2     n=5     n=10
ECG200                Acc     0.9000    0.9000  0.9020  0.9120  0.9020  0.9040  0.9060  0.8980  0.9080  0.9060
                      F1      0.8986    0.8986  0.8998  0.9055  0.8998  0.9010  0.9020  0.8965  0.9032  0.9022
ECG5000               Acc     0.9391    0.9389  0.9391  0.9395  0.9368  0.9405  0.9397  0.9389  0.9379  0.9398
                      F1      0.9292    0.9288  0.9286  0.9293  0.9288  0.9307  0.9298  0.9288  0.9291  0.9306
TwoLeadECG            Acc     0.5981    0.5882  0.6253  0.5979  0.5940  0.6047  0.6116  0.5960  0.5961  0.5979
                      F1      0.5626    0.5833  0.5969  0.5626  0.5709  0.5712  0.5798  0.5667  0.5669  0.5626
StarlightCurves       Acc     0.8451    0.8467  0.8492  0.8483  0.8459  0.8481  0.8542  0.8457  0.8481  0.8564
                      F1      0.8596    0.8490  0.8643  0.8616  0.8605  0.8480  0.8549  0.8480  0.8485  0.8628
SonyAIBORobotSurface  Acc     0.4692    0.5048  0.5325  0.5648  0.4709  0.4848  0.4986  0.4788  0.4889  0.5109
                      F1      0.3422    0.4080  0.4614  0.5160  0.3454  0.3721  0.3987  0.3641  0.3721  0.3987
MoteStrain            Acc     0.8514    0.8482  0.8557  0.8580  0.8477  0.8516  0.8537  0.8474  0.8496  0.8579
                      F1      0.8502    0.8465  0.8540  0.8563  0.8460  0.8499  0.8520  0.8457  0.8479  0.8562
GunPoint              Acc     0.7900    0.8027  0.8053  0.8027  0.0828  0.8096  0.8053  0.8013  0.8027  0.8107
                      F1      0.7887    0.7991  0.8020  0.8018  0.8020  0.8089  0.8047  0.8004  0.8018  0.8100
InlineSkate           Acc     0.2255    0.2273  0.2284  0.2287  0.2280  0.2289  0.2291  0.2255  0.2280  0.2287
                      F1      0.2274    0.2295  0.2306  0.2311  0.2303  0.2312  0.2314  0.2274  0.2303  0.2310
UWaveGestureLibraryX  Acc     0.7289    0.7267  0.7291  0.7318  0.7224  0.7291  0.7301  0.7244  0.7264  0.7281
                      F1      0.6794    0.6794  0.6898  0.7050  0.6705  0.6898  0.6891  0.6726  0.6746  0.6998
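For readers who want to reproduce the two reported metrics, the following sketch computes accuracy and an F1-score from predicted labels. The paper does not state how F1 is averaged over classes; the unweighted (macro) average over the classes present in the true labels is an assumption here, and the function names and toy labels are illustrative only.

```python
def accuracy(y_true, y_pred):
    """Fraction of samples whose predicted label equals the true label."""
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)

def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1 (assumed averaging; not stated in the paper)."""
    scores = []
    for c in sorted(set(y_true)):
        tp = sum(t == c and p == c for t, p in zip(y_true, y_pred))
        fp = sum(t != c and p == c for t, p in zip(y_true, y_pred))
        fn = sum(t == c and p != c for t, p in zip(y_true, y_pred))
        denom = 2 * tp + fp + fn
        # F1 = 2*TP / (2*TP + FP + FN); defined as 0 when the class never occurs
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)

# Toy labels for three classes (not data from the paper).
y_true = [0, 0, 0, 1, 1, 2]
y_pred = [0, 0, 1, 1, 1, 2]
acc = accuracy(y_true, y_pred)
f1 = macro_f1(y_true, y_pred)
```

Because the macro average weights every class equally, F1 can differ noticeably from accuracy on datasets with unequal class sizes, which is one reason to report both.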

TABLE III
Evaluation results for the SVM classifier. Shown are the results for the original dataset followed by the results using the augmented training datasets for n = 2, n = 5, and n = 10 with the three different architectures. Indicated are accuracy [%] and F1-score as mean of 5 runs. Settings: kernel = rbf, gamma = auto, C = 1.

                                        Architecture 1          Architecture 2          Architecture 3
Dataset               metric  original  n=2     n=5     n=10    n=2     n=5     n=10    n=2     n=5     n=10
ECG200                Acc     0.8200    0.8700  0.8760  0.8880  0.8700  0.8780  0.8880  0.8600  0.8800  0.8900
                      F1      0.8191    0.8687  0.8752  0.8862  0.8676  0.8769  0.8862  0.8580  0.8792  0.8880
ECG5000               Acc     0.9044    0.9377  0.9352  0.9336  0.9385  0.9394  0.9350  0.9391  0.9362  0.9384
                      F1      0.9032    0.9269  0.9238  0.9219  0.9279  0.9300  0.9236  0.9286  0.9250  0.9298
TwoLeadECG            Acc     0.4707    0.5882  0.6372  0.6699  0.5991  0.6481  0.6645  0.5994  0.6444  0.6699
                      F1      0.4701    0.5833  0.6272  0.6564  0.5931  0.6366  0.6515  0.5935  0.6337  0.6564
StarlightCurves       Acc     0.8468    0.8593  0.8938  0.9356  0.8596  0.8918  0.9343  0.8602  0.8924  0.9365
                      F1      0.7933    0.7994  0.8772  0.9356  0.8003  0.8742  0.9322  0.8017  0.8755  0.9347
SonyAIBORobotSurface  Acc     0.5706    0.6878  0.7754  0.7045  0.6955  0.7720  0.6689  0.7022  0.7787  0.7178
                      F1      0.5556    0.6705  0.7711  0.6880  0.6800  0.7675  0.6573  0.6879  0.7748  0.7045
MoteStrain            Acc     0.8182    0.8187  0.8458  0.8586  0.8179  0.8474  0.8578  0.8195  0.8530  0.8594
                      F1      0.6556    0.8168  0.8453  0.8586  0.8160  0.8470  0.8586  0.8177  0.8529  0.8595
GunPoint              Acc     0.7667    0.7867  0.8133  0.8240  0.7867  0.8212  0.8240  0.7840  0.8067  0.8267
                      F1      0.7667    0.7866  0.8132  0.8214  0.7862  0.8200  0.8240  0.7838  0.8066  0.8267
InlineSkate           Acc     0.1764    0.2164  0.2491  0.2582  0.2164  0.2493  0.2582  0.2236  0.2509  0.2462
                      F1      0.1698    0.2129  0.2438  0.2550  0.2121  0.2451  0.2561  0.2202  0.2445  0.2426
UWaveGestureLibraryX  Acc     0.7599    0.7398  0.7635  0.7694  0.7395  0.7856  0.7736  0.7451  0.7859  0.7728
                      F1      0.7418    0.7508  0.7539  0.7547  0.7501  0.7819  0.7594  0.7563  0.7818  0.7585
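As a worked reading of Table III, the snippet below computes the absolute accuracy gain of the SVM for one configuration, Architecture 3 with n = 10, relative to the original data. The numbers are transcribed from the table; the choice of this single column and the variable names are made only for illustration.

```python
# Accuracy values transcribed from Table III (SVM):
# dataset -> (original data, Architecture 3 with n = 10)
svm_acc = {
    "ECG200": (0.8200, 0.8900),
    "ECG5000": (0.9044, 0.9384),
    "TwoLeadECG": (0.4707, 0.6699),
    "StarlightCurves": (0.8468, 0.9365),
    "SonyAIBORobotSurface": (0.5706, 0.7178),
    "MoteStrain": (0.8182, 0.8594),
    "GunPoint": (0.7667, 0.8267),
    "InlineSkate": (0.1764, 0.2462),
    "UWaveGestureLibraryX": (0.7599, 0.7728),
}

# Absolute gain in accuracy per dataset, rounded to the table's precision.
gains = {name: round(aug - base, 4) for name, (base, aug) in svm_acc.items()}
improved = [name for name, gain in gains.items() if gain > 0]
largest = max(gains, key=gains.get)
smallest = min(gains, key=gains.get)

# Print the datasets ordered from largest to smallest gain.
for name, gain in sorted(gains.items(), key=lambda item: -item[1]):
    print(f"{name:22s} {gain:+.4f}")
```

For this column all nine differences are positive. The largest gain occurs for TwoLeadECG (+0.1992) and the smallest for UWaveGestureLibraryX (+0.0129), in line with the observation that datasets with few training samples benefit most.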

In most evaluation tasks, improvements could be achieved with the augmented datasets using both kNN and SVM as classifiers. For both kNN and SVM, n = 10 has been shown to be a recommendable choice. With n = 10, an improvement in Acc. and F1 was achieved for the SVM in every test. For the kNN, too, n = 10 provides the best results. Examining the different architectures, we notice that the kNN works best with the synthetic samples of architecture 1, while for the SVM the samples generated by architecture 3 fit best. Considering the different characteristics of the datasets, the sample length, the type, and the number of classes have no influence on the applicability of the VAE as a data augmentation method. However, with the SVM it can be seen that especially small training datasets benefit from augmentation.

Overall, the VAE is an adequate method to enrich sparse datasets with synthetic data. For the SVM, an improvement in Acc. and F1 could be achieved in almost every test. For the kNN, too, most of the results were better than the benchmark results.

In future work, we plan to use VAEs to compensate for class imbalances in unbalanced datasets. Furthermore, experiments with a higher-dimensional latent space may be interesting.

REFERENCES

[1] N. V. Chawla, K. W. Bowyer, L. O. Hall, and W. P. Kegelmeyer, “SMOTE: synthetic minority over-sampling technique,” Journal of Artificial Intelligence Research, vol. 16, pp. 321–357, 2002.
[2] A. Krizhevsky, I. Sutskever, and G. E. Hinton, “Imagenet classification with deep convolutional neural networks,” in Advances in Neural Information Processing Systems, 2012, pp. 1097–1105.
[3] A. Hannun, C. Case, J. Casper, B. Catanzaro, G. Diamos, E. Elsen, R. Prenger, S. Satheesh, S. Sengupta, A. Coates et al., “Deep speech: Scaling up end-to-end speech recognition,” arXiv preprint arXiv:1412.5567, 2014.
[4] T. T. Um et al., “Data augmentation of wearable sensor data for Parkinson’s disease monitoring using convolutional neural networks,” in Proceedings of the 19th ACM International Conference on Multimodal Interaction, 2017, pp. 216–220.
[5] G. Forestier, F. Petitjean, H. A. Dau, G. I. Webb, and E. Keogh, “Generating Synthetic Time Series to Augment Sparse Datasets,” 2017 IEEE International Conference on Data Mining (ICDM), New Orleans, LA, 2017, pp. 865–870.
[6] C. Kim, A. Misra, K. Chin, T. Hughes, A. Narayanan, T. Sainath, and M. Bacchiani, “Generation of large-scale simulated utterances in virtual rooms to train deep-neural networks for far-field speech recognition in Google home,” in Proc. INTERSPEECH, 2017, pp. 379–383.
[7] J. Yeomans, S. Thwaites, W. S. P. Robertson, D. Booth, B. Ng, and D. Thewlis, “Simulating Time-Series Data for Improved Deep Neural Network Performance,” in IEEE Access, vol. 7, pp. 131248–131255, 2019.
[8] J. Pereira and M. Silveira, “Learning Representations from Healthcare Time Series Data for Unsupervised Anomaly Detection,” 2019 IEEE International Conference on Big Data and Smart Computing (BigComp), 2019.
[9] Y. Lu and J. Li, “Generative Adversarial Network for Improving Deep Learning Based Malware Classification,” 2019 Winter Simulation Conference (WSC), National Harbor, MD, USA, 2019, pp. 584–593, doi: 10.1109/WSC40007.2019.9004932.
[10] Y. Huang, C. Chen, and C. Huang, “Motor Fault Detection and Feature Extraction Using RNN-Based Variational Autoencoder,” in IEEE Access, vol. 7, pp. 139086–139096, 2019, doi: 10.1109/ACCESS.2019.2940769.
[11] K. He, X. Zhang, S. Ren, and J. Sun, “Deep Residual Learning for Image Recognition,” 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Las Vegas, NV, 2016, pp. 770–778, doi: 10.1109/CVPR.2016.90.
[12] D. P. Kingma and M. Welling, “Auto-Encoding Variational Bayes,” 2nd International Conference on Learning Representations, ICLR 2014, Banff, AB, Canada, April 14–16, 2014, Conference Track Proceedings.
[13] H. A. Dau et al., “The UCR time series archive,” in IEEE/CAA Journal of Automatica Sinica, vol. 6, no. 6, pp. 1293–1305, November 2019.
