
Anomaly Detection Using Autoencoders with Nonlinear Dimensionality Reduction

Mayu Sakurada
The University of Tokyo
Department of Aeronautics and Astronautics
[email protected]

Takehisa Yairi
The University of Tokyo
Research Center for Advanced Science and Technology
[email protected]

MLSDA '14, December 2, 2014, Gold Coast, QLD, Australia. Copyright 2014 ACM 978-1-4503-3159-3/14/12. http://dx.doi.org/10.1145/2689746.2689747

ABSTRACT
This paper proposes to use autoencoders with nonlinear dimensionality reduction for the anomaly detection task. The authors apply dimensionality reduction with an autoencoder to both artificial and real data, and compare it with linear PCA and kernel PCA to clarify its properties. The artificial data is generated from the Lorenz system, and the real data is spacecraft telemetry data. This paper demonstrates that autoencoders are able to detect subtle anomalies which linear PCA fails to detect, and that their accuracy can be increased further by extending them to denoising autoencoders. Moreover, autoencoders are useful as nonlinear techniques that do not require the complex computation of kernel PCA. Finally, the authors examine the learned features in the hidden layer of autoencoders, and show that autoencoders learn the normal state properly and activate differently with anomalous input.

Categories and Subject Descriptors
I.2.1 [Artificial Intelligence]: Applications and Expert Systems—Industrial automation; I.5.4 [Pattern Recognition]: Applications—Signal processing

General Terms
Performance

Keywords
anomaly detection, novelty detection, fault detection, autoencoder, auto-associative neural network, denoising autoencoder, dimensionality reduction, nonlinear, spacecraft

1. INTRODUCTION
Recently, feature learning using neural networks with dimensionality reduction has become popular in the Deep Learning context [4]. The autoencoder, a neural network with nonlinear dimensionality reduction capability, has in fact been used since the 1990s, often under the name autoassociative neural network [8], [7]. However, there are still few works in which researchers apply the learned features to other data mining tasks. Our idea is to apply them to one of the fundamental data mining tasks: anomaly detection. We perform dimensionality reduction using autoencoders on data which contain anomalies, and we investigate the difference in detection performance by comparing an autoencoder with other traditional approaches such as linear principal component analysis (hereinafter referred to as PCA) and kernel PCA. Previous work proposed another extension of the ordinary autoencoder, the denoising autoencoder [13], and we also include this approach in our comparison.

Our work ultimately aims to detect anomalies in spacecraft telemetry data through dimensionality reduction. Spacecraft are complex systems and their telemetry data contain hundreds of variables. Most of the variables are nonlinearly correlated and temporally dependent, and it is difficult for humans to distinguish the abnormal state from the normal state from the raw data alone. For this reason, training a machine to learn the normal state and displaying the reconstruction error as an anomaly score is valuable. Thus, in this paper we especially focus on time series data consisting of 10-100 variables with nonlinear correlations.

Our contribution is three-fold. First, we apply dimensionality reduction using autoencoders to both artificial and real data, and show that autoencoders are applicable to anomaly detection. Second, we compare the performance of autoencoders, denoising autoencoders, linear PCA and kernel PCA to clarify the properties of autoencoders. We found that 1) autoencoders can detect anomalies which linear PCA fails to detect, and the accuracy can be increased further by extending autoencoders to denoising autoencoders, and 2) autoencoders can avoid the complex computation that kernel PCA requires without degrading detection quality. Finally, we investigate the learned features in the hidden layer of the autoencoder, and show that they learn the normal state properly and activate differently with anomalous input.
2. RELATED WORK
One of the properties of autoencoders is that they can perform nonlinear dimensionality reduction. There are several papers, such as [8], [6], [10], in which the authors investigated this nonlinear property. In [6], the nonlinearity of autoencoders was demonstrated theoretically. In [8], [6], [10], autoencoders were applied to nonlinear anomaly detection data which were artificially generated. However, the data they used are too simple to simulate real data. In our work, we generate data with 25 dimensions from a more complicated nonlinear system using the Lorenz equations.

Some previous works applied autoencoders to real data or to realistic data generated by simulating a real-world model [11], [3], [12], [9]. However, these works are insufficient in that they either use only low dimensional data or lack a comparison with other approaches. We apply two kinds of real data: one has 10 dimensions and the other has more than 100 dimensions. Although some works compare an autoencoder with other approaches [7], [15], in this paper we focus on dimensionality reduction and determine the difference in performance according to the reconstruction error.

Figure 1: Autoencoder [1]. Layer L1 is the input layer (x1, ..., x5 plus a bias unit +1), Layer L2 is the hidden layer (with a bias unit +1), and Layer L3 is the output layer producing the reconstructions x̂1, ..., x̂5.

3. ANOMALY DETECTION USING AUTOENCODERS

3.1 Anomaly Detection by Dimensionality Reduction
In anomaly detection based on machine learning or data mining, we obtain a model which captures the normal behavior during the training period, and afterwards we check whether the test data can be fitted with the trained model. If the test data is inconsistent with the trained model, we regard it as an anomaly.

Anomaly detection using dimensionality reduction is based on the assumption that the data has variables correlated with each other and can be embedded into a lower dimensional subspace in which normal samples and anomalous samples appear significantly different [2]. In the training phase, we have normal data as a training set {x(1), x(2), ..., x(m)}, where each data sample x(i) ∈ R^D is represented by a vector of D different variables. We compress the data into a lower dimensional latent subspace and reproduce the output {x̂(1), x̂(2), ..., x̂(m)} so that the reconstruction error in Eq. 1 becomes small.

Err(i) = \sqrt{ \sum_{j=1}^{D} ( x_j(i) - \hat{x}_j(i) )^2 }    (1)

After we determine the subspace, in the test phase we project the test data into the subspace and reconstruct the original data. We use the reconstruction error in Eq. 1 as the anomaly score. The reconstruction error is low if test samples are normal instances that satisfy the correlation learned during the training phase, while it becomes large for anomalous samples.

There are several dimensionality reduction techniques. In this work, we choose representative linear and nonlinear techniques, linear PCA and kernel PCA, as the baselines to be compared with autoencoders.

Kernel PCA [5] performs a nonlinear mapping to a high-dimensional feature space with a kernel function, and then applies linear PCA in that feature space. In this work, we use the Gaussian kernel k(x(i), x(j)) = \exp( - \| x(i) - x(j) \|^2 / \sigma^2 ). In kernel PCA, the reconstruction error is computed in the feature space, not in the original observation space. To obtain the reconstructed data in the original space, we must solve the pre-image problem, which has high computational complexity. Furthermore, kernel PCA basically requires holding all the training samples, which is also computationally expensive. Therefore, compared to kernel PCA, autoencoders have an advantage in computation cost.

3.2 Dimensionality Reduction by Autoencoders
An autoencoder is an unsupervised neural network whose objective is to learn to reproduce input vectors {x(1), x(2), ..., x(m)} as outputs {x̂(1), x̂(2), ..., x̂(m)}. Fig. 1 shows an autoencoder. In this figure, Layer L2 is the hidden layer, in which the inputs are compressed into a small number of neurons. The activation of unit i in layer l is given by Eq. 2:

a_i^{(l)} = f\Big( \sum_{j=1}^{n} W_{ij}^{(l-1)} a_j^{(l-1)} + b_i^{(l-1)} \Big)    (2)

where W and b denote the weight and bias parameters respectively. In the first layer, i.e., the input layer, a^{(1)} = x, and in the last layer, i.e., the output layer, a^{(3)} = x̂. For the activation function f, we used the sigmoid function in the hidden layer, but in the output layer we used a linear function, since we do not pre-scale every input example to a specific interval like [-1, 1].

During the training period, we minimize the objective function in Eq. 3 with respect to W and b. The objective function includes a regularization term, and the parameter λ determines the strength of the regularization.

J(W, b) = \frac{1}{m} \sum_{i=1}^{m} \frac{1}{2} \| x(i) - \hat{x}(i) \|^2 + \frac{\lambda}{2} \sum_{l=1}^{n_l - 1} \sum_{i=1}^{s_l} \sum_{j=1}^{s_{l+1}} \big( W_{ji}^{(l)} \big)^2    (3)

where n_l denotes the number of layers in the network and s_l denotes the number of units in layer L_l.

Recently the denoising autoencoder [13], one of the extensions of the ordinary autoencoder, has been developed. The idea is to learn an over-complete set of basis vectors to represent the input vectors, so that the basis vectors can better capture structures and patterns inherent in the input data. At the same time, in order to avoid a highly compressed encoding, which is usually highly entangled, we can encode the input with a small subset of neurons. We can achieve this by increasing the number of hidden units and adding some noise to the input. There are several ways of adding noise to each input, but in this work we corrupt the input by randomly choosing a fixed number of its components and setting them to 0, which is sometimes called salt-and-pepper noise [14].
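To make the formulation above concrete, the following is a minimal NumPy sketch of the pieces described in Eqs. 1-3 together with the salt-and-pepper corruption used by the denoising variant. It is an illustration rather than the authors' implementation: the layer sizes, training loop and optimizer are not specified in the paper and are omitted here.

    import numpy as np

    def sigmoid(z):
        return 1.0 / (1.0 + np.exp(-z))

    def forward(x, W1, b1, W2, b2):
        """Eq. 2 with a sigmoid hidden layer and a linear output layer (Sec. 3.2)."""
        a2 = sigmoid(W1 @ x + b1)   # hidden activations a^(2)
        x_hat = W2 @ a2 + b2        # linear reconstruction a^(3)
        return a2, x_hat

    def reconstruction_error(x, x_hat):
        """Eq. 1: Euclidean distance between input and reconstruction (the anomaly score)."""
        return np.sqrt(np.sum((x - x_hat) ** 2))

    def objective(X, W1, b1, W2, b2, lam):
        """Eq. 3: average squared reconstruction error plus L2 weight decay of strength lam."""
        recon = sum(0.5 * np.sum((x - forward(x, W1, b1, W2, b2)[1]) ** 2) for x in X) / len(X)
        decay = 0.5 * lam * (np.sum(W1 ** 2) + np.sum(W2 ** 2))
        return recon + decay

    def salt_and_pepper(x, dest_level, rng):
        """Denoising-autoencoder corruption: force a random subset of components to 0."""
        return x * (rng.random(x.shape) >= dest_level)

Training would minimize objective() with respect to the weights and biases (for example by gradient descent); at test time reconstruction_error() is used directly as the anomaly score, and the denoising variant is trained to reconstruct the clean x from salt_and_pepper(x, dest_level, rng).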
Figure 2: Top: Normal {z(1), z(2), ..., z(849)} (blue) and anomalous {z(850), z(851), ..., z(1000)} (red) data from the Lorenz system, plotted in the (z1, z2, z3) space. Bottom: Normalized 25-dimensional Lorenz system data x over the training and test periods.

Figure 3: Top: Normalized data of Satellite-A. Bottom: Normalized data of Satellite-B. Both plots show the variables over the training and test periods.
4. EXPERIMENTAL SETUP
We performed dimensionality reduction on each data set with four methods: linear PCA, an autoencoder, a denoising autoencoder and kernel PCA. For each method, the number of latent space dimensions was adjusted manually. For autoencoders and denoising autoencoders, we adjusted several parameters in the objective function (Eq. 3) as λ = 0.00001, β = 3 and ρ = 0.01. The destruction level, i.e., the probability that each element is forced to 0, was fixed to 0.1. We compared the performances based on the reconstruction error in Eq. 1.
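For reference, the two PCA baselines in this comparison can be set up with scikit-learn; this is one plausible realization, not the authors' code, and the latent dimension and kernel width below are placeholders. Note also that the paper evaluates kernel PCA's reconstruction error in feature space, whereas this sketch scores in the input space through scikit-learn's approximate pre-image, which is a simplification.

    import numpy as np
    from sklearn.decomposition import PCA, KernelPCA

    def fit_baselines(X_train, n_components=10, sigma=1.0):
        """Fit the linear PCA and kernel PCA baselines on normal training data."""
        lpca = PCA(n_components=n_components).fit(X_train)
        # gamma = 1 / sigma^2 corresponds to the Gaussian kernel of Sec. 3.1;
        # fit_inverse_transform=True learns an approximate pre-image map.
        kpca = KernelPCA(n_components=n_components, kernel="rbf",
                         gamma=1.0 / sigma ** 2,
                         fit_inverse_transform=True).fit(X_train)
        return lpca, kpca

    def reconstruction_errors(model, X_test):
        """Project each test sample, reconstruct it, and score it by Eq. 1."""
        X_rec = model.inverse_transform(model.transform(X_test))
        return np.sqrt(np.sum((X_test - X_rec) ** 2, axis=1))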
4.1 Artificial Data
We prepared nonlinear simulated data using the Lorenz system, which consists of the following equations:

\dot{z}_1(t) = \sigma ( z_2(t) - z_1(t) )
\dot{z}_2(t) = z_1(t) ( \rho - z_3(t) ) - z_2(t)    (4)
\dot{z}_3(t) = z_1(t) z_2(t) - \beta z_3(t)

We set the three parameters σ, ρ and β to 28, 10 and 8/3 respectively. According to Eq. 4, we first generated the vectors z(t) = (z_1(t), z_2(t), z_3(t))^T. We sampled 1000 vectors by running this simulation for 100 s at a sampling rate of 0.1 s, with small observation noise and system transition noise. To generate the anomalous data, after sampling we flipped the values from z_3(850) to z_3(1000) horizontally so that z_3 runs in reverse chronological order after the 850th sample. To generate the high dimensional vectors x(t), we first constructed a matrix W ∈ R^{25×3} whose components were randomly chosen from the interval (-5, 5), and then multiplied W by each vector z(t), i.e., x(t) = W z(t). We divided the 1000 samples into 700 training samples {x(1), x(2), ..., x(700)} and 300 test samples {x(701), x(702), ..., x(1000)}, with the latter half of the test samples containing the anomalies. Fig. 2 shows the distribution of the 1000 vectors z and the data x after normalization to zero mean and unit variance.
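A rough sketch of this data-generation procedure is given below. The integrator (plain Euler steps at the 0.1 s sampling rate), the noise magnitudes, the initial state and the random seed are assumptions, since the paper does not specify them; in practice a finer internal integration step would normally be used and then subsampled.

    import numpy as np

    rng = np.random.default_rng(0)
    sigma, rho, beta = 28.0, 10.0, 8.0 / 3.0   # parameter values as stated above
    dt, n_samples = 0.1, 1000                  # 100 s at a 0.1 s sampling rate

    # Integrate Eq. 4 with simple Euler steps plus small system transition noise
    z = np.zeros((n_samples, 3))
    z[0] = [1.0, 1.0, 1.0]                     # assumed initial state
    for t in range(1, n_samples):
        z1, z2, z3 = z[t - 1]
        dz = np.array([sigma * (z2 - z1),
                       z1 * (rho - z3) - z2,
                       z1 * z2 - beta * z3])
        z[t] = z[t - 1] + dt * dz + rng.normal(scale=0.01, size=3)

    # Anomaly: reverse z3 over samples 850..1000 so it runs in reverse chronological order
    z[849:, 2] = z[849:, 2][::-1]

    # Lift to 25 dimensions with a random W in R^{25x3}, x(t) = W z(t), plus observation noise
    W = rng.uniform(-5, 5, size=(25, 3))
    X = z @ W.T + rng.normal(scale=0.01, size=(n_samples, 25))

    # Normalize to zero mean and unit variance, then split 700 training / 300 test samples
    X = (X - X.mean(axis=0)) / X.std(axis=0)
    X_train, X_test = X[:700], X[700:]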
4.2 Real Data
We used two kinds of spacecraft telemetry data in our real data experiment: Satellite-A and Satellite-B, which have 17 and 106 continuous sensor measurements respectively. Spacecraft telemetry data contain many different sensor measurements, and in general these inputs are correlated with each other [16], [17]. This means that we can remove redundant inputs and represent each data sample as a lower dimensional vector. Fig. 3 shows the data after each variable was normalized to zero mean and unit variance.

Table 1: The average AUC of the 4 different methods on the 3 different data sets. LPCA, AE, dAE and KPCA denote linear PCA, an autoencoder, a denoising autoencoder and kernel PCA respectively. The first row (Lorenz) gives the results on the artificial data from the Lorenz system, and the last two rows (Sat-A and Sat-B) give the results on the two kinds of spacecraft telemetry data.

           LPCA     AE       dAE      KPCA
  Lorenz   0.5104   0.6473   0.7011   0.7045
  Sat-A    0.8852   0.8847   0.9354   0.8862
  Sat-B    0.9764   0.9763   0.8355   0.7689
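The AUC values in Table 1 follow from treating each test sample's reconstruction error (Eq. 1) as its anomaly score and comparing the scores against the ground-truth labels. A minimal sketch, assuming scikit-learn's roc_auc_score (the paper does not state which AUC implementation it used):

    import numpy as np
    from sklearn.metrics import roc_auc_score

    def auc_from_errors(X_test, reconstruct, y_true):
        """Score each test sample by Eq. 1 and compute the ROC AUC against the labels.

        reconstruct: maps a test sample to its reconstruction (linear PCA, an
                     autoencoder, a denoising autoencoder or kernel PCA).
        y_true:      1 for anomalous test samples, 0 for normal ones.
        """
        scores = np.array([np.sqrt(np.sum((x - reconstruct(x)) ** 2)) for x in X_test])
        return roc_auc_score(y_true, scores)

    # Lorenz test set: roughly the last 150 of the 300 test samples are anomalous, so
    # y_true = np.r_[np.zeros(150), np.ones(150)]  (hypothetical labeling based on Sec. 4.1)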
5. RESULT AND DISCUSSION

5.1 Artificial Data
Looking at Fig. 4, it is clear that the reconstruction error becomes far larger after the 150th sample of the test set for the autoencoder, the denoising autoencoder and kernel PCA. Since this data contains verified anomalies after the 150th test sample, the anomaly detection with the nonlinear techniques was successful. Linear PCA, however, failed to show a significant difference between the anomalous and the normal data. We can also see this in Tab. 1, where linear PCA performs poorly in the Lorenz row. This is a good example of how a nonlinear dimensionality reduction technique such as an autoencoder can learn the nonlinear correlation among many variables and succeed in detecting anomalies, while linear PCA, which employs linear dimensionality reduction, fails and misses them. In the Lorenz row of Tab. 1, we can also see that the denoising autoencoder performs better than the ordinary autoencoder; in this case, we succeeded in increasing the accuracy by extending the autoencoder to a denoising autoencoder.

5.2 Real Data
We can see from Fig. 5 and Fig. 6 that basically all dimensionality reduction methods succeeded in detecting anomalies on both spacecrafts' data. Fig. 5 and the Sat-A row in Tab. 1 show that, although the performances of linear PCA and the autoencoder are almost the same, the denoising autoencoder performs better than both. Unlike the experiments on the Lorenz system data and the Satellite-A data, the Sat-B row in Tab. 1 shows that the denoising autoencoder fails to increase the accuracy. In this case, since the detection performance is already good enough with the ordinary autoencoder, adding noise to the input rather has a harmful effect. We ran this experiment with several different numbers of latent dimensions. Linear PCA turned out to be very sensitive to the number of latent dimensions, and it was harder to tune; autoencoders can detect anomalies even with relatively high latent dimensions while linear PCA cannot.

When compared to kernel PCA, the autoencoder and the denoising autoencoder performed either better than or the same as kernel PCA. Kernel PCA, however, requires heavy computation. By using autoencoders, we do not need to hold all the training samples and we can avoid the memory-intensive kernel computation. Also, with autoencoders we can compare the original and reconstructed data in the original observation space, so we do not need to solve the complex pre-image problem which kernel PCA requires. In fact, the overall running time including the training and test phases was more than an hour for kernel PCA, while the autoencoder and the denoising autoencoder took only several minutes.

Furthermore, we visualized the activation of a part of the neurons in the hidden layer in Fig. 7. We can see that the anomalous data is significantly different from the normal data in the latent space. This means that the denoising autoencoder is able to learn meaningful features to reproduce the normal state, and that these learned features cannot be used to reproduce anomalous input.

6. CONCLUSION AND FUTURE WORK
In this study, we demonstrated examples of applying feature learning by autoencoders to anomaly detection, which is one of the fundamental data mining tasks. Another contribution was the comparison of autoencoders with linear PCA and kernel PCA on artificial data and real data; we clarified the properties and the effectiveness of autoencoders based on that comparison. In addition, we examined the learned features in the hidden layer to show the different activations with normal and anomalous input, which has not been done before.

At the moment we manually tune the parameters of the regularization term of autoencoders, the destruction level of denoising autoencoders, the number of latent dimensions, and so on. Further detailed investigation of those parameters will be necessary in future work. Also, additional comparison with other techniques such as vector quantization PCA and mixture probabilistic PCA, which are known as hybrids of clustering and dimensionality reduction [16], [17], would be interesting for clarifying the properties of autoencoders. We regard each data sample at each time index as independent, i.e., we disregard the time sequence. Although the performance is already good enough without temporal information, we can add this information by giving autoencoders a data vector including the current as well as past samples, and see how it improves the performance.
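The temporal extension mentioned in the last sentence amounts to feeding the autoencoder a window of consecutive samples rather than a single time step. One way this preprocessing could look (the window width is an arbitrary assumption, not a value from the paper):

    import numpy as np

    def window_samples(X, width=5):
        """Stack each sample with its (width - 1) predecessors into one input vector.

        X is an array of shape (T, D) holding T time-ordered samples of dimension D;
        the result has shape (T - width + 1, width * D).
        """
        return np.stack([X[t - width + 1:t + 1].ravel()
                         for t in range(width - 1, len(X))])

    # e.g. X_train_windowed = window_samples(X_train, width=5)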
Figure 4: Result on Lorenz system data. The reconstruction error (left column) and the difference between the original and reconstructed data (right column) of linear PCA, an autoencoder, a denoising autoencoder and kernel PCA, over the test time indices with normal and anomalous regions marked. (Panels: linear PCA, 10-dim latent space; autoencoder, 10-dim; denoising autoencoder, 50-dim, destruction level 10%; kernel PCA, 10-dim.)
Figure 5: Result on Satellite-A data. The reconstruction error (left column) and the difference (right column) of linear PCA, an autoencoder, a denoising autoencoder and kernel PCA are shown from top to bottom, over the test time indices with normal and anomalous regions marked. (Panels: linear PCA, 4-dim latent space; autoencoder, 4-dim; denoising autoencoder, 10-dim, destruction level 10%; kernel PCA, 10-dim.)
Figure 6: Result on Satellite-B data. The reconstruction error (left column) and the difference (right column) of linear PCA, an autoencoder, a denoising autoencoder and kernel PCA are shown from top to bottom, over the test time indices with normal and anomalous regions marked. (Panels: linear PCA, 16-dim latent space; autoencoder, 16-dim; denoising autoencoder, 16-dim, destruction level 10%; kernel PCA, 10-dim.)
Figure 7: Top: An example of the activation of three neurons in the hidden layer of the denoising autoencoder with normal input (blue) and anomalous input (red). Bottom: Another example of the activation of two neurons in the hidden layer of the denoising autoencoder with normal input (blue) and anomalous input (red). In both figures, the hidden units of the denoising autoencoder activate in a different way with anomalous input.

7. REFERENCES
[1] UFLDL Tutorial. http://ufldl.stanford.edu/wiki/index.php/Autoencoders_and_Sparsity. [Online; accessed 10-August-2014].
[2] V. Chandola, A. Banerjee, and V. Kumar. Anomaly detection: A survey. ACM Computing Surveys, 41(3):1-58, July 2009.
[3] S. Hawkins, H. He, G. Williams, and R. Baxter. Outlier detection using replicator neural networks. In Proceedings of the Fifth International Conference on Data Warehousing and Knowledge Discovery, pages 170-180, 2002.
[4] G. E. Hinton and R. R. Salakhutdinov. Reducing the dimensionality of data with neural networks. Science, 313(5786):504-507, 2006.
[5] H. Hoffmann. Kernel PCA for novelty detection. Pattern Recognition, 40(3):863-874, 2007.
[6] B. Hwang and S. Cho. Characteristics of auto-associative MLP as a novelty detector. In Proceedings of the International Joint Conference on Neural Networks, volume 5, pages 3086-3091, 1999.
[7] N. Japkowicz, C. Myers, and M. Gluck. A novelty detection approach to classification. In Proceedings of the 14th International Joint Conference on Artificial Intelligence, volume 1, pages 518-523, 1995.
[8] M. A. Kramer. Nonlinear principal component analysis using autoassociative neural networks. AIChE Journal, 37(2):233-243, 1991.
[9] M. Martinelli, E. Tronci, G. Dipoppa, and C. Balducelli. Electric power system anomaly detection using neural networks. In Knowledge-Based Intelligent Information and Engineering Systems, volume 3213 of Lecture Notes in Computer Science, pages 1242-1248, 2004.
[10] S. O. Song, D. Shin, and E. S. Yoon. Analysis of novelty detection properties of auto-associators. In Proceedings of the International Congress on Condition Monitoring and Diagnostic Engineering Management, pages 577-584, 2001.
[11] C. Surace, K. Worden, and G. Tomlinson. A novelty detection approach to diagnose damage in a cracked beam. In Proceedings of SPIE, pages 947-953, 1997.
[12] B. Thompson, R. Marks, J. Choi, M. El-Sharkawi, M.-Y. Huang, and C. Bunje. Implicit learning in autoencoder novelty assessment. In Proceedings of the 2002 International Joint Conference on Neural Networks, volume 3, pages 2878-2883, 2002.
[13] P. Vincent, H. Larochelle, Y. Bengio, and P.-A. Manzagol. Extracting and composing robust features with denoising autoencoders. In Proceedings of the 25th International Conference on Machine Learning, pages 1096-1103, 2008.
[14] P. Vincent, H. Larochelle, I. Lajoie, Y. Bengio, and P.-A. Manzagol. Stacked denoising autoencoders: Learning useful representations in a deep network with a local denoising criterion. Journal of Machine Learning Research, 11:3371-3408, Dec. 2010.
[15] G. Williams, R. Baxter, H. He, S. Hawkins, and L. Gu. A comparative study of RNN for outlier detection in data mining. In Proceedings of the International Conference on Data Mining, page 709, 2002.
[16] T. Yairi, M. Inui, A. Yoshiki, Y. Kawahara, and N. Takata. Spacecraft telemetry data monitoring by dimensionality reduction techniques. In Proceedings of the SICE Annual Conference, pages 1230-1234, Aug. 2010.
[17] T. Yairi, T. Tagawa, and N. Takata. Telemetry monitoring by dimensionality reduction and learning hidden Markov model. In Proceedings of the International Symposium on Artificial Intelligence, Robotics and Automation in Space, 2012.
