
Outlier Detection using Deep Variational Autoencoder
Machine Learning and Probabilistic Graphical Model

Tung Kieu

1 Introduction
Outlier detection is one of the most important fields in data mining and plays an essential
role in a range of domains such as business, healthcare, climate, and transportation.
For example, an outlier in an electrocardiogram may indicate a potential heart attack;
an outlier in a vehicle trajectory may indicate a serious incident; and an outlier in banking
transactions may indicate fraud.
Outlier detection algorithms can be divided into two categories: supervised models
and unsupervised models. A supervised model computes the difference (i.e., distance)
between an unseen data point and both the normal and the abnormal data points in order
to assign the unseen data point to the inlier or the outlier group. In particular, supervised
outlier detection can be considered a binary classification problem. The disadvantage of
supervised models is that they require either labeled data or source systems that generate
labeled data. In addition, labeling data is a difficult and uneconomical task because it
requires a large amount of effort from human experts. In contrast, unsupervised models use
assumptions about the density and distribution of the data to distinguish between inliers and
outliers. Most existing methods for outlier detection are based on similarity search [12]
and density-based clustering [2].
Recently, neural network based autoencoders (AE) [5] have been proposed for detecting
outliers [6], achieving competitive accuracy. The idea is to compress the original input data
into a compact hidden representation and then to reconstruct the input data from that
hidden representation. Since the hidden representation is very compact, it is only possible
to reconstruct the representative features of the input, not its specifics, including any
outliers. The difference, or reconstruction error, between the original data and the
reconstructed data therefore indicates how likely individual observations are to be outliers.
This idea works well in practice and produces impressive performance [6]. However, it
lacks a theoretical foundation.
Another approach is to use generative models for outlier detection, because generative
models can be seen as learning the distribution of the normal data. The generative model
then measures how far an observation is from the normal data distribution in order to
separate outliers from normal data. This idea has a strong theoretical foundation in
statistical learning [7] and is the core idea behind the family of unsupervised learning
methods known as variational techniques. In this report, I combine the variational idea with
neural network based autoencoders into a variational autoencoder (VAE) and use the VAE
for outlier detection.

2 Methodology
2.1 Variational Autoencoder
2.1.1 Probabilistic Graphical Model Perspective
A VAE can be considered a deep Bayesian model that approximates the posterior distribution
of the latent variable z given the observed variable x. Figure 1 shows the graphical model of the VAE.

Figure 1: Probabilistic Graphical Model of Variational Autoencoder

The latent variable z is drawn from a prior p(z). The data x has likelihood p(x | z).
The model defines a joint probability distribution over data and latent variables, p(x, z),
through the decomposition into likelihood and prior: p(x, z) = p(x | z) p(z). I
assume that the likelihood is Gaussian. I aim at inferring good values of the
latent variable z given observed data x; formally, I aim at computing the posterior
p(z | x). Using Bayes' rule I have:

\[
p(z \mid x) = \frac{p(x \mid z)\, p(z)}{p(x)} \tag{1}
\]

The denominator p(x), the evidence, is obtained by marginalizing out the latent variable:

\[
p(x) = \int p(x \mid z)\, p(z)\, dz \tag{2}
\]

Computing this integral exactly is intractable because it requires considering all possible
configurations of z.

Thus, I approximate the posterior p(z | x) with a tractable distribution qλ(z), for example
a Gaussian, where λ denotes the parameters of that distribution, e.g., (µ, σ). To evaluate
the difference between the approximate distribution qλ(z) and the true posterior
p(z | x), I use the Kullback-Leibler divergence:

\begin{align}
\mathrm{KL}(q_\lambda(z) \,\|\, p(z \mid x)) &= \mathbb{E}_q[\log q_\lambda(z)] - \mathbb{E}_q[\log p(x, z)] + \log p(x) \tag{3}\\
&= \log p(x) - \left( -\mathbb{E}_q\!\left[ \log \frac{q_\lambda(z)}{p(x, z)} \right] \right) \tag{4}\\
&= \log p(x) - \mathrm{ELBO}(\lambda) \tag{5}
\end{align}

From Equation 5 I have:

\[
\log p(x) = \mathrm{KL}(q_\lambda(z) \,\|\, p(z \mid x)) + \mathrm{ELBO}(\lambda), \tag{6}
\]

where

\[
\mathrm{ELBO}(\lambda) = -\mathbb{E}_q\!\left[ \log \frac{q_\lambda(z)}{p(x, z)} \right] \tag{7}
\]
I aim at minimizing the difference between the approximate distribution qλ(z) and the
true posterior p(z | x), that is, at minimizing the Kullback-Leibler divergence with respect
to the parameters λ:

\[
\arg\min_{\lambda} \mathrm{KL}(q_\lambda(z) \,\|\, p(z \mid x)) \tag{8}
\]
Equation 6 has two useful properties:
• log p(x) is a constant with respect to λ
• KL(qλ(z) || p(z | x)) ≥ 0, which follows from Jensen's inequality
Together, these properties imply that minimizing the Kullback-Leibler divergence
is equivalent to maximizing the ELBO. The ELBO can be rewritten as:
 
\begin{align}
\mathrm{ELBO}(\lambda) &= -\mathbb{E}_q\!\left[ \log \frac{q_\lambda(z)}{p(x, z)} \right] \tag{9}\\
&= \mathbb{E}_q\!\left[ \log \frac{p(x, z)}{q_\lambda(z)} \right] \tag{10}\\
&= \mathbb{E}_q\!\left[ \log \frac{p(x \mid z)\, p(z)}{q_\lambda(z)} \right] \tag{11}\\
&= \mathbb{E}_q[\log p(x \mid z)] + \mathbb{E}_q\!\left[ \log \frac{p(z)}{q_\lambda(z)} \right] \tag{12}\\
&= \mathbb{E}_q[\log p(x \mid z)] - \mathbb{E}_q\!\left[ \log \frac{q_\lambda(z)}{p(z)} \right] \tag{13}\\
&= \mathbb{E}_q[\log p(x \mid z)] - \mathrm{KL}(q_\lambda(z) \,\|\, p(z)) \tag{14}
\end{align}

The first term of Equation 14 is the expected log-likelihood; it measures how well samples from
qλ(z) explain the data x. The second term of Equation 14 penalizes the difference
between qλ(z) and the prior p(z).
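In the Gaussian case assumed above, with qλ(z) = N(µ, diag(σ²)) and the standard normal prior p(z) = N(0, I) that is commonly chosen for VAEs (an assumption made here for concreteness), the KL term of Equation 14 has a well-known closed form, which is what is evaluated when the loss is implemented later:

\[
\mathrm{KL}(q_\lambda(z) \,\|\, p(z)) = \frac{1}{2} \sum_{j} \left( \sigma_j^{2} + \mu_j^{2} - 1 - \log \sigma_j^{2} \right)
\]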

2.1.2 Deep Learning Perspective
From a neural network perspective, a VAE is an unsupervised feed-forward neural network. A
VAE contains three components: an encoder qφ(z | x), a decoder pθ(x | z), and a loss function J.
The goal of the VAE is to map data points x to a hidden representation z and then to map
the hidden representation z back to reconstructed data points x̂ that are similar to x.
The encoder is a feed-forward neural network that takes a data point x as input. The
input goes through fully connected hidden layers, and the encoder outputs the parameters
(µ, σ) of the Gaussian distribution that defines the hidden representation z. Formally, the
encoder is described as follows:

\begin{align}
\mu_i &= f(x^{\top} w_i^{E} + b_i^{E}) \tag{15}\\
\sigma_j &= f(x^{\top} w_j^{E} + b_j^{E}) \tag{16}
\end{align}

Here, f is a nonlinear activation function such as the sigmoid or ReLU, and w_i^E and b_i^E
are the i-th weight vector and bias of the encoder, respectively.
The hidden representation z is then drawn from a Gaussian distribution with parameters (µ, σ):

\[
z \sim \mathcal{N}(\mu, \sigma) \tag{18}
\]

The reconstruction x̂ is then computed from z by the decoder, which is another
feed-forward neural network:

\[
\hat{x} = f(z^{\top} w_i^{D} + b_i^{D}) \tag{19}
\]

Figure 2 shows the deep learning model of VAE.

Figure 2: Deep Learning Model of Variational Autoencoder
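To make the encoder/decoder structure concrete, the following is a minimal PyTorch sketch (PyTorch is the framework used for the implementation in Section 3.4). The linear layers correspond to Equations 15, 16, and 19, and the sigmoid activation and the 8-dimensional hidden representation follow the hyperparameters reported in Section 3.5; the class name and the reparameterized sampling step z = µ + σ·ε (a standard way to implement Equation 18 so that gradients can flow through the sampling) are illustrative choices of this sketch rather than details given in the report.

```python
import torch
import torch.nn as nn


class VAE(nn.Module):
    """Minimal VAE sketch: encoder -> (mu, sigma), sample z, decoder -> x_hat."""

    def __init__(self, input_dim, latent_dim=8):
        super().__init__()
        self.act = nn.Sigmoid()                              # activation f
        self.enc_mu = nn.Linear(input_dim, latent_dim)       # Equation 15
        self.enc_sigma = nn.Linear(input_dim, latent_dim)    # Equation 16
        self.dec = nn.Linear(latent_dim, input_dim)          # Equation 19

    def encode(self, x):
        mu = self.act(self.enc_mu(x))
        sigma = self.act(self.enc_sigma(x))
        return mu, sigma

    def decode(self, z):
        return self.act(self.dec(z))

    def forward(self, x):
        mu, sigma = self.encode(x)
        eps = torch.randn_like(sigma)    # eps ~ N(0, I)
        z = mu + sigma * eps             # reparameterized sample, cf. Equation 18
        x_hat = self.decode(z)
        return x_hat, mu, sigma
```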

2.2 Objective Function


From both perspectives, the objective is to minimize (1) the difference between
x and x̂ and (2) the difference between the approximate posterior qλ(z) and the
prior p(z). The objective function is defined as follows:

\[
J = \|\hat{x} - x\|_2^2 + \mathrm{KL}(q_\lambda(z) \,\|\, p(z)) \tag{21}
\]

Here, the first term is the reconstruction error between the original data points and the
reconstructed data points. The second term is the KL divergence between the approximate
distribution and the prior distribution of the latent variable z.
The model is trained to minimize the objective function J:

\[
\phi, \theta = \arg\min_{\phi, \theta} J \tag{22}
\]
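A sketch of how the objective J of Equation 21 and the training step of Equation 22 could be implemented is given below. The closed-form KL term assumes the standard normal prior discussed earlier, the fixed number of epochs stands in for the convergence check, and the function names and epoch count are illustrative; the Adadelta optimizer with learning rate 10^-3 matches the hyperparameters reported in Section 3.5.

```python
import torch


def vae_loss(x, x_hat, mu, sigma):
    """Objective J (Equation 21): squared reconstruction error plus the
    closed-form KL(q_lambda(z) || p(z)) for a diagonal Gaussian vs. N(0, I)."""
    recon = torch.sum((x_hat - x) ** 2)
    kl = 0.5 * torch.sum(sigma ** 2 + mu ** 2 - 1.0 - torch.log(sigma ** 2 + 1e-8))
    return recon + kl


def train_vae(data, num_epochs=200, latent_dim=8, lr=1e-3):
    """Train the VAE sketch above on `data`, a tensor of shape (n_instances, n_features)."""
    model = VAE(input_dim=data.shape[1], latent_dim=latent_dim)
    optimizer = torch.optim.Adadelta(model.parameters(), lr=lr)   # Section 3.5 settings
    for _ in range(num_epochs):
        optimizer.zero_grad()
        x_hat, mu, sigma = model(data)
        loss = vae_loss(data, x_hat, mu, sigma)
        loss.backward()
        optimizer.step()
    return model
```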

2.3 Outlier Detection


Following An and Cho [1], the algorithm for outlier detection is as follows:

Algorithm 1: Outlier Detection using VAE
Data: dataset x
Result: outlier score OS
1: φ, θ ← initialize parameters
2: while not converged do
3:    train VAE_{φ,θ}
4: end
5: for l = 1 to L do
6:    draw z^(l) ∼ q_φ(z | x) and compute µ_x̂^(l), σ_x̂^(l) with the decoder p_θ
7: end
8: OS ← (1/L) Σ_{l=1}^{L} p(x | µ_x̂^(l), σ_x̂^(l))
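A sketch of how Algorithm 1 could be realized on top of the VAE sketch above is shown below. Because the decoder of that sketch outputs only a reconstruction mean, a unit reconstruction standard deviation is assumed when evaluating the reconstruction probability; the function name and the default L are likewise illustrative choices.

```python
import torch


def outlier_scores(model, data, L=10):
    """Reconstruction-probability score of Algorithm 1: average, over L stochastic
    forward passes, the Gaussian density of x under the decoder output.
    A fixed reconstruction standard deviation of 1.0 is assumed for illustration."""
    scores = torch.zeros(data.shape[0])
    with torch.no_grad():
        for _ in range(L):
            x_hat, mu, sigma = model(data)    # each pass samples a fresh z ~ q(z | x)
            log_p = torch.distributions.Normal(x_hat, 1.0).log_prob(data).sum(dim=1)
            scores += torch.exp(log_p) / L
    return scores   # low reconstruction probability suggests an outlier
```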

3 Experiments
3.1 Datasets
I use three datasets to evaluate the performance of the VAE: (1) Glass, (2) Shuttle, and (3)
Stamps. These datasets are subsets of a repository for outlier detection evaluation [3]
and are available online at http://www.dbs.ifi.lmu.de/research/outlier-evaluation/DAMI/.
Table 1 shows the details of the three datasets: the Instances column gives the number of
samples, the Outliers column the number of outliers, and the Features column the
dimensionality of the dataset.

Datasets   Instances   Outliers   Features
Glass        214           9          7
Shuttle     1013          13          9
Stamps       341          31          9

Table 1: Datasets

3.2 Baselines
I compare the proposed VAE with four competing solutions: (1) Local Outlier Factor
(LOF) [2], a well-known density-based outlier detection method; (2) One-class Support
Vector Machines (SVM) [10], a kernel-based method; (3) Isolation Forest (IF) [8],
a randomized forest of isolation trees; and (4) Autoencoder (AE) [9], a deep learning based method.
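For reference, all of the baselines are available in scikit-learn; a sketch of how they could be instantiated is shown below. The hyperparameter values are illustrative defaults, not the exact settings of the experiments, which follow Kieu et al. [6].

```python
from sklearn.ensemble import IsolationForest
from sklearn.neighbors import LocalOutlierFactor
from sklearn.svm import OneClassSVM

# Baseline detectors (illustrative hyperparameters).
lof = LocalOutlierFactor(n_neighbors=20)       # density-based LOF [2]
ocsvm = OneClassSVM(kernel="rbf", nu=0.1)      # one-class SVM [10]
iforest = IsolationForest(n_estimators=100)    # Isolation Forest [8]

# Example: scikit-learn's score_samples returns higher values for more normal
# points, so the sign is flipped to obtain an outlier score.
# scores = -iforest.fit(X).score_samples(X)
```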

3.3 Evaluation Metrics


A typical approach to decide which data points are outliers is to set a
threshold and to consider the points whose outlier scores exceed the threshold as
outliers. However, setting the threshold is non-trivial and often requires human experts
or prior knowledge. Instead, following the evaluation metrics used for evaluating outlier
detection on non-sequential data [4], I employ two metrics that consider all possible thresholds:
the Area Under the Precision-Recall Curve (PR-AUC) [11] and the Area Under the
Receiver Operating Characteristic Curve (ROC-AUC) [11]. In other words, the two metrics
do not depend on a specific threshold. Rather, they reflect the full trade-off among
true positives, true negatives, false positives, and false negatives. Higher PR-AUC and
ROC-AUC values indicate higher accuracy.
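Both metrics can be computed directly from the outlier scores and the ground-truth labels, for example with scikit-learn as sketched below; average precision is used here as the usual estimate of PR-AUC, and the function name is my own.

```python
from sklearn.metrics import average_precision_score, roc_auc_score


def evaluate(labels, scores):
    """labels: 1 = outlier, 0 = inlier; scores: outlier scores, where higher means
    more anomalous. If the score is a reconstruction probability (higher = more
    normal), pass its negation instead."""
    return {
        "PR-AUC": average_precision_score(labels, scores),
        "ROC-AUC": roc_auc_score(labels, scores),
    }
```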

3.4 Implementation
All algorithms are implemented in Python 3.6. The VAE and AE are implemented using
PyTorch 1.0.1, while the remaining methods, i.e., LOF, SVM, and IF, are implemented using
Scikit-learn 0.19. Experiments are performed on a Linux workstation with dual 12-core
Xeon E5 CPUs, 64 GB RAM, and two K40M GPUs.

3.5 Hyperparameter Settings


For all deep learning based methods, I use Adadelta [13] as the optimizer and set the
learning rate to $10^{-3}$. For the AE and VAE, I use the sigmoid as the activation function f
and set the number of hidden representation units to 8. For all the other baselines, I follow the
settings used by Kieu et al. [6].

3.6 Experimental Results


Figure 3 shows the performance of all algorithms on all datasets.

Figure 3: Performance of Algorithms on All Datasets. (a) Glass, (b) Shuttle, (c) Stamps.

3.7 Discussion
The VAE produces very competitive results on the Glass and Shuttle datasets. However, on
Stamps, the VAE performs poorly compared with the other algorithms; in particular, it
only outperforms LOF.

4 Conclusion
In this report, I introduce an anomaly detection method that is based on a deep Bayesian
model. The core idea is to use a generative model that learns to generate normal data points;
when the model encounters outliers, it is unable to reconstruct them well, so the
reconstruction error increases. In particular, I introduce the variational autoencoder and
evaluate the proposed model on three datasets. Its performance is very competitive.
In the future, I will consider more advanced techniques such as robust models. In
addition, it would be interesting to consider outlier detection using variational autoencoders
for sequential data and graph data.

References
[1] An, J., and Cho, S. Variational autoencoder based anomaly detection using
reconstruction probability. Special IE (2015).

[2] Breunig, M. M., Kriegel, H., Ng, R. T., and Sander, J. LOF: identifying
density-based local outliers. In SIGMOD (2000), pp. 93–104.

[3] Campos, G. O., Zimek, A., Sander, J., Campello, R. J. G. B., Micenková,
B., Schubert, E., Assent, I., and Houle, M. E. On the evaluation of unsu-
pervised outlier detection: measures, datasets, and an empirical study. DMKD (2016).

[4] Chen, J., Sathe, S., Aggarwal, C. C., and Turaga, D. S. Outlier detection
with autoencoder ensembles. In SDM (2017), pp. 90–98.

[5] Hinton, G., and Salakhutdinov, R. Reducing the dimensionality of data with
neural networks. Science 313, 5786 (2006), 504–507.

[6] Kieu, T., Yang, B., and Jensen, C. S. Outlier detection for multidimensional
time series using deep neural networks. In MDM (2018), pp. 125–134.

[7] Kingma, D. P., and Welling, M. Auto-encoding variational Bayes. In ICLR (2014).

[8] Liu, F. T., Ting, K. M., and Zhou, Z. Isolation forest. In ICDM (2008),
pp. 413–422.

[9] Luo, T., and Nagarajan, S. G. Distributed anomaly detection using autoen-
coder neural networks in WSN for IoT. In ICC (2018), pp. 1–6.

[10] Manevitz, L. M., and Yousef, M. One-class SVMs for document classification.
JMLR 2 (2001), 139–154.

[11] Sammut, C., and Webb, G. I., Eds. Encyclopedia of Machine Learning and Data
Mining. Springer, 2017.

[12] Yeh, C. M., Zhu, Y., Ulanova, L., Begum, N., Ding, Y., Dau, H. A., Silva,
D. F., Mueen, A., and Keogh, E. J. Matrix profile I: all pairs similarity joins
for time series: A unifying view that includes motifs, discords and shapelets. In
ICDM (2016), pp. 1317–1322.

[13] Zeiler, M. D. ADADELTA: An adaptive learning rate method. CoRR abs/1212.5701 (2012).
