ABSTRACT
This paper proposes to use autoencoders with nonlinear dimensionality reduction in the anomaly detection task. The authors apply dimensionality reduction using an autoencoder to both artificial data and real data, and compare it with linear PCA and kernel PCA to clarify its properties. The artificial data is generated from the Lorenz system, and the real data is the spacecrafts' telemetry data. This paper demonstrates that autoencoders are able to detect subtle anomalies which linear PCA fails to detect. Also, autoencoders can increase their accuracy by extending them to denoising autoencoders. Moreover, autoencoders can be useful as nonlinear techniques without the complex computation that kernel PCA requires. Finally, the authors examine the learned features in the hidden layer of autoencoders, and present that autoencoders learn the normal state properly and activate differently with anomalous input.

Categories and Subject Descriptors
I.2.1 [Artificial Intelligence]: Applications and Expert Systems—Industrial automation; I.5.4 [Pattern Recognition]: Applications—Signal processing

General Terms
Performance

Keywords
anomaly detection, novelty detection, fault detection, autoencoder, auto-associative neural network, denoising autoencoder, dimensionality reduction, nonlinear, spacecrafts

1. INTRODUCTION
Recently, feature learning using neural networks with dimensionality reduction has become popular in the Deep Learning context [4]. An autoencoder, which is a neural network with nonlinear dimensionality reduction capability, has been used since the 1990s, often under the name auto-associative neural network [8], [7]. However, there are still few works in which researchers try to apply those learned features to other data mining tasks. Our idea is to apply them to one of the fundamental data mining tasks: the anomaly detection task. We perform dimensionality reduction using autoencoders on data which contain anomalies. We investigate the difference in detection performance by comparing an autoencoder with other traditional approaches such as linear principal component analysis (hereinafter referred to as PCA) and kernel PCA. Previous work proposed another extension of the ordinary autoencoder, the denoising autoencoder [13], and we also include this approach in our comparison.
Our work eventually aims to detect anomalies in spacecrafts' telemetry data by dimensionality reduction techniques. Spacecrafts have a complex system and their telemetry data have hundreds of variables. Most of the variables are nonlinearly correlated and temporally dependent. It is difficult for humans to distinguish the abnormal state from the normal state from the raw data alone. For this reason, training the machine to learn the normal state and using the reconstruction error as the anomaly score is valuable. Thus, in this paper we especially focus on time series data which consist of 10-100 variables with nonlinear correlations.
Our contribution is three-fold. First, we apply dimensionality reduction using autoencoders to both artificial data and real data, and show that autoencoders are applicable to anomaly detection. Second, we compare the performance among autoencoders, denoising autoencoders, linear PCA and kernel PCA to clarify the properties of autoencoders. We found that 1) autoencoders can detect anomalies which linear PCA fails to detect, and can further increase the accuracy when extended to denoising autoencoders, and 2) autoencoders avoid the complex computation that kernel PCA requires without degrading detection performance. Finally, we investigate the learned features in the hidden layer of the autoencoder, and show that they learn the normal state properly and activate differently with anomalous input.
2. RELATED WORK
One of the properties of autoencoders is that they can perform nonlinear dimensionality reduction. There are several papers, such as [8], [6], [10], in which the authors investigated this nonlinear property. In [6], the authors theoretically demonstrated the nonlinearity of autoencoders. In [8], [6], [10], they applied autoencoders to artificially generated nonlinear anomaly detection data. However, the data they used are too simple to simulate real data. In our work, we generated data with 25 dimensions from a more complicated nonlinear system using the Lorenz equations.
Some of the previous works applied autoencoders to real data or to realistic data generated by simulating a real-world model [11], [3], [12], [9]. However, these works are insufficient in that they either use only low dimensional data or lack a comparison with other approaches. We applied two kinds of real data: one has 10 dimensions and the other has more than 100 dimensions. Although some works compare an autoencoder with other approaches [7], [15], in this paper we focus on dimensionality reduction and determine the difference in performance according to the reconstruction error.
Figure 1: Autoencoder [1]. (Diagram of a three-layer network: input layer L1 with units x1-x5, hidden layer L2, and output layer L3 producing the reconstructions x̂1-x̂5; "+1" denotes the bias units.)

3. ANOMALY DETECTION USING AUTOENCODERS
Figure 2: Top: Normal {z(1), z(2), ..., z(849)} (blue) and anomalous {z(850), z(851), ..., z(1000)} (red) data from the Lorenz system (axes z1, z2, z3). Bottom: Normalized 25 dimensional Lorenz system data x over the time index, split into training and test ranges.

Figure 3: Top: Normalized data of Satellite-A. Bottom: Normalized data of Satellite-B.
[...] with a small subset of neurons. We can achieve this by increasing the number of hidden units and adding some noise to the input. There are several ways to add noise to the input, but in this work we corrupt the input by randomly choosing a fixed number of its components and forcing them to 0, which is sometimes called salt-and-pepper noise [14].
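As an illustration, here is a minimal sketch of such a corruption step (the function name and the use of NumPy are ours, not the authors' code):

```python
import numpy as np

def corrupt_salt_and_pepper(X, destruction_level=0.1, rng=None):
    """Force a fixed number of randomly chosen components of each sample to 0.

    X: array of shape (n_samples, n_features); destruction_level: fraction of
    components zeroed per sample (0.1 in the experiments below).
    """
    rng = np.random.default_rng(rng)
    X_corrupted = X.copy()
    n_features = X.shape[1]
    n_zero = max(1, int(round(destruction_level * n_features)))
    for i in range(X.shape[0]):
        idx = rng.choice(n_features, size=n_zero, replace=False)
        X_corrupted[i, idx] = 0.0
    return X_corrupted
```

The denoising autoencoder is then trained to reconstruct the clean input from the corrupted one, as in [13].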
4. EXPERIMENTAL SETUP
We performed dimensionality reduction on each dataset with four methods: linear PCA, an autoencoder, a denoising autoencoder and kernel PCA. In each method, the dimension of the latent space was adjusted manually. For the autoencoders and denoising autoencoders, we set the parameters of the objective function (Eq. 3) to λ = 0.00001, β = 3 and ρ = 0.01. The destruction level, i.e., the probability that each element is forced to 0, was fixed to 0.1. We compared the performances based on the reconstruction error in Eq. 1.
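Eq. 1 and Eq. 3 are not reproduced in this excerpt; as a hedged sketch, assume the usual sparse-autoencoder objective of squared reconstruction error plus an L2 weight-decay term weighted by λ and a KL-divergence sparsity penalty weighted by β with target hidden activation ρ. The Keras-based helpers below, the layer sizes and the training settings are illustrative, not the authors' implementation:

```python
import numpy as np
import tensorflow as tf
from tensorflow import keras

class KLSparsity(keras.regularizers.Regularizer):
    """KL-divergence sparsity penalty: target mean activation rho, weight beta (assumed form of Eq. 3)."""
    def __init__(self, beta=3.0, rho=0.01):
        self.beta, self.rho = beta, rho

    def __call__(self, h):
        rho_hat = tf.reduce_mean(h, axis=0)  # mean activation of each hidden unit over the batch
        kl = (self.rho * tf.math.log(self.rho / (rho_hat + 1e-10))
              + (1.0 - self.rho) * tf.math.log((1.0 - self.rho) / (1.0 - rho_hat + 1e-10)))
        return self.beta * tf.reduce_sum(kl)

def build_autoencoder(n_inputs, n_hidden, lam=1e-5, beta=3.0, rho=0.01):
    """One-hidden-layer autoencoder trained with squared error,
    L2 weight decay (lam) and the sparsity penalty above."""
    x = keras.Input(shape=(n_inputs,))
    h = keras.layers.Dense(n_hidden, activation="sigmoid",
                           kernel_regularizer=keras.regularizers.l2(lam),
                           activity_regularizer=KLSparsity(beta, rho))(x)
    x_hat = keras.layers.Dense(n_inputs,
                               kernel_regularizer=keras.regularizers.l2(lam))(h)
    model = keras.Model(x, x_hat)
    model.compile(optimizer="adam", loss="mse")
    return model

def reconstruction_error(model, X):
    """Per-sample squared reconstruction error, used as the anomaly score."""
    X_hat = model.predict(X, verbose=0)
    return np.sum((X - X_hat) ** 2, axis=1)

# Denoising variant: feed corrupted inputs, reconstruct the clean ones.
# ae = build_autoencoder(n_inputs=X_train.shape[1], n_hidden=30)   # hidden size illustrative
# ae.fit(corrupt_salt_and_pepper(X_train, destruction_level=0.1), X_train,
#        epochs=200, batch_size=32, verbose=0)
# scores = reconstruction_error(ae, X_test)
```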
4.1 Artificial Data
We prepared nonlinear simulated data using the Lorenz system, which consists of the following equations:

    ż1(t) = σ(z2(t) − z1(t))
    ż2(t) = z1(t)(ρ − z3(t)) − z2(t)        (4)
    ż3(t) = z1(t)z2(t) − βz3(t)

We set the three parameters σ, ρ and β to 28, 10 and 8/3 respectively. According to Eq. 4, we first generated the vectors z(t) = (z1(t) z2(t) z3(t))^T. We sampled 1000 vectors by running this simulation for 100 s at a sampling interval of 0.1 s, with small observation noise and system transition noise. To generate the anomalous data, after sampling we flipped the values from z3(850) to z3(1000) horizontally so that z3 is aligned in reverse chronological order after the 850th sample. To generate the high dimensional vectors x(t), we first constructed a matrix W ∈ R^{25×3} whose components were randomly chosen from the interval (−5, 5). Then we multiplied W by each vector z(t), i.e., x(t) = Wz(t). We divided the 1000 samples into 700 training samples {x(1), x(2), ..., x(700)} and 300 test samples {x(701), x(702), ..., x(1000)}, with the latter half of the test samples containing anomalies. Fig. 2 shows the distribution of the 1000 vectors z and the data x after normalization to zero mean and unit variance.
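A minimal sketch of this data-generation procedure (ours, not the authors' code): it uses simple Euler integration, assumes the classic chaotic Lorenz parameterization σ = 10, ρ = 28, β = 8/3, and the noise levels, initial state and seed are illustrative.

```python
import numpy as np

def generate_lorenz_data(n_samples=1000, dt=0.1, n_dims=25, anomaly_start=850, seed=0):
    """Simulate the Lorenz system, reverse z3 after `anomaly_start`,
    project to n_dims dimensions with a random matrix W, and normalize."""
    rng = np.random.default_rng(seed)
    sigma, rho, beta = 10.0, 28.0, 8.0 / 3.0      # classic chaotic regime (assumed)
    z = np.array([1.0, 1.0, 1.0])                 # illustrative initial state
    Z = np.empty((n_samples, 3))
    substeps = 100                                # Euler sub-steps per 0.1 s sample
    h = dt / substeps
    for t in range(n_samples):
        for _ in range(substeps):
            dz = np.array([sigma * (z[1] - z[0]),
                           z[0] * (rho - z[2]) - z[1],
                           z[0] * z[1] - beta * z[2]])
            z = z + h * dz + 1e-3 * rng.standard_normal(3)   # small transition noise
        Z[t] = z + 1e-2 * rng.standard_normal(3)             # small observation noise
    Z[anomaly_start:, 2] = Z[anomaly_start:, 2][::-1]        # reverse z3 after the 850th sample
    W = rng.uniform(-5.0, 5.0, size=(n_dims, 3))             # random projection W in R^{25x3}
    X = Z @ W.T                                              # x(t) = W z(t)
    X = (X - X.mean(axis=0)) / X.std(axis=0)                 # zero mean, unit variance
    return X[:700], X[700:]                                  # 700 training, 300 test samples

X_train, X_test = generate_lorenz_data()
```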
4.2 Real Data
We used two kinds of spacecraft telemetry data in our real data experiment: Satellite-A and Satellite-B. Satellite-A and Satellite-B have 17 and 106 continuous sensor measurements respectively. Spacecraft telemetry data have many different sensor measurements, and in general these inputs are correlated with each other [16], [17]. This means that we can remove redundant inputs and represent each data sample as a lower dimensional vector. Fig. 3 shows the data after we normalized each variable so that its mean and variance become 0 and 1 respectively.

Table 1: The average AUC of the 4 different methods on the 3 different datasets. LPCA, AE, dAE and KPCA denote linear PCA, an autoencoder, a denoising autoencoder and kernel PCA respectively. The first row (Lorenz) has the results on the artificial data from the Lorenz system, and the last two rows (Sat-A and Sat-B) have the results on the two kinds of spacecrafts' telemetry data.

         LPCA     AE       dAE      KPCA
Lorenz   0.5104   0.6473   0.7011   0.7045
Sat-A    0.8852   0.8847   0.9354   0.8862
Sat-B    0.9764   0.9763   0.8355   0.7689
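Table 1 summarizes detection performance as AUC over reconstruction-error scores. A rough sketch of such a computation (ours, not the authors' pipeline), shown for a linear-PCA baseline on the Lorenz split generated above; the number of components and the label construction are illustrative.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.metrics import roc_auc_score

def pca_reconstruction_error(X_train, X_test, n_components):
    """Fit linear PCA on the (normal) training data and score each test
    sample by its squared reconstruction error."""
    pca = PCA(n_components=n_components).fit(X_train)
    X_hat = pca.inverse_transform(pca.transform(X_test))
    return np.sum((X_test - X_hat) ** 2, axis=1)

# In the Lorenz split above, the latter half of the 300 test samples is anomalous.
y_true = np.concatenate([np.zeros(150), np.ones(150)])
scores = pca_reconstruction_error(X_train, X_test, n_components=3)
print("linear PCA AUC:", roc_auc_score(y_true, scores))  # higher error on anomalies -> AUC near 1
```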
5. RESULT AND DISCUSSION

Figure 4: Result on Lorenz system data. The reconstruction error (left column) and the difference between the original and reconstructed data (right column) of linear PCA, an autoencoder, a denoising autoencoder and kernel PCA. The normal and anomalous regions of the test set are marked in each panel.
Figure 5: Result on Satellite-A data. The reconstruction error (left column) and the difference (right column) of linear PCA (latent space: 4 dim), an autoencoder, a denoising autoencoder and kernel PCA are shown from top to bottom.
Figure 6: Result on Satellite-B data. The reconstruction error (left column) and the difference (right column) of linear PCA (latent space: 16 dim), an autoencoder, a denoising autoencoder and kernel PCA are shown from top to bottom.
[4] G. E. Hinton and R. R. Salakhutdinov. Reducing the
dimensionality of data with neural networks. Science,
313(5786):504–507, 2006.
[5] H. Hoffmann. Kernel pca for novelty detection.
Pattern Recognition, 40(3):863–874, 2007.
[6] B. Hwang and S. Cho. Characteristics of
auto-associative mlp as a novelty detector. In
Proceedings of the International Joint Conference on
Neural Networks, volume 5, pages 3086–3091, 1999.
[7] N. Japkowicz, C. Myers, and M. Gluck. A novelty
detection approach to classification. In Proceedings of
the 14th International Joint Conference on Artificial
Intelligence, volume 1, pages 518–523, 1995.
[8] M. A. Kramer. Nonlinear principal component analysis using autoassociative neural networks. AIChE J., 37(2):233–243, 1991.
[9] M. Martinelli, E. Tronci, G. Dipoppa, and [...]
[...] Networks, volume 3, pages 2878–2883, 2002.
[13] P. Vincent, H. Larochelle, Y. Bengio, and P.-A. Manzagol. Extracting and composing robust features with denoising autoencoders. In Proceedings of the 25th International Conference on Machine Learning, pages 1096–1103, 2008.
[14] P. Vincent, H. Larochelle, I. Lajoie, Y. Bengio, and P.-A. Manzagol. Stacked denoising autoencoders: Learning useful representations in a deep network with a local denoising criterion. Journal of Machine Learning Research, 11:3371–3408, Dec. 2010.
[15] G. Williams, R. Baxter, H. He, S. Hawkins, and L. Gu. A comparative study of rnn for outlier detection in data mining. In Proceedings of the International Conference on Data Mining, page 709, 2002.
[16] T. Yairi, M. Inui, A. Yoshiki, Y. Kawahara, and N. Takata. Spacecraft telemetry data monitoring by dimensionality reduction techniques. In Proceedings of SICE Annual Conference, pages 1230–1234, Aug. 2010.
[17] T. Yairi, T. Tagawa, and N. Takata. Telemetry monitoring by dimensionality reduction and learning hidden markov model. In Proceedings of the International Symposium on Artificial Intelligence, Robotics and Automation in Space, 2012.

Figure 7: Top: An example of the activation of three of the neurons in the hidden layer of the denoising autoencoder with normal input (blue) and anomalous input (red). Bottom: Another example of the activation of two of the neurons in the hidden layer of the denoising autoencoder with normal input (blue) and anomalous input (red). The axes are the activations of individual hidden units (e.g., Unit 1 and Unit 3). In these two figures, the hidden units of the denoising autoencoder activate in a different way with anomalous input.