Autoencoder
Machine Learning and Probabilistic Graphical Model
Tung Kieu
1 Introduction
Outlier detection is one of the most important fields in data mining and plays an essential
role in a range of domains such as business, healthcare, climate, and transportation.
For example, an outlier in an electrocardiogram may indicate a potential heart attack;
an outlier in a vehicle trajectory may indicate a serious incident; an outlier in banking
transactions may indicate fraud.
Outlier detection algorithms can be divided into two categories: supervised models
and unsupervised models. A supervised model computes the difference (i.e., distance)
between an unseen data point and both normal and abnormal data points in order to assign
the unseen data point to the inlier or outlier group. In particular, supervised outlier
detection can be considered a binary classification problem. The disadvantage of supervised
models is that they require either labeled data or source systems that generate labeled
data. In addition, labeling data is a difficult and uneconomical task because it requires
substantial effort from human experts. In contrast, unsupervised models use
assumptions about the density and distribution of the data to distinguish between inliers and
outliers. Most existing methods for outlier detection are based on similarity search [12]
and density-based clustering [2].
Recently, neural network based autoencoders (AE) [5] have been proposed for the detection of
outliers [6], achieving competitive accuracy. The idea is to compress the original input data
into a compact, hidden representation and then to reconstruct the input data from the
hidden representation. Since the hidden representation is very compact, it is only possible
to reconstruct representative features of the input, not the specifics of the input
data, including any outliers. This way, the differences, or reconstruction errors, between
the original data and the reconstructed data indicate how likely observations in the data
are to be outliers. This idea works well in practice and produces impressive performance [6].
However, it lacks a theoretical foundation.
Another approach is to use generative models for outlier detection because generative
models can be seen as learning the distribution of the normal data. The generative
models then compute the distance between the normal data distribution and an outlier to
separate outliers from normal data. This idea has a strong theoretical foundation in
statistical learning [7] and underlies the family of unsupervised learning models known
as variational techniques. In this report, I combine variational inference with neural
network based autoencoders to obtain a variational autoencoder (VAE) and use the VAE
for outlier detection.
2 Methodology
2.1 Variational Autoencoder
2.1.1 Probabilistic Graphical Model Perspective
A VAE can be considered a deep Bayesian model that models the observed variable x
through a latent variable z and approximates the posterior over z. Figure 1 shows the
graphical model of the VAE. The latent variable z is drawn from a prior p(z). The data x
has a likelihood p(x | z). The model defines a joint probability distribution over data and
latent variables, p(x, z), through the decomposition into likelihood and prior:
p(x, z) = p(x | z) p(z). I assume that the likelihood is Gaussian. I aim at inferring good
values of the latent variable z given observed data x. Formally, I aim at computing the
posterior p(z | x). Using Bayes' rule, I have:
p(z \mid x) = \frac{p(x \mid z)\, p(z)}{p(x)} \qquad (1)

p(x) = \int p(x \mid z)\, p(z)\, dz \qquad (2)
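To make the generative story of this decomposition concrete, the following is a minimal sketch of ancestral sampling from p(x, z) = p(x | z) p(z); the standard Gaussian prior, the two-dimensional latent space, and the tanh mean function with a fixed noise scale are purely illustrative assumptions, not part of the report's setup.

import torch

# Ancestral sampling from the joint p(x, z) = p(x | z) p(z).
n_latent, n_features, n_points = 2, 5, 100
W = torch.randn(n_latent, n_features)                 # illustrative "decoder" weights

z = torch.randn(n_points, n_latent)                   # z ~ p(z) = N(0, I)
mean_x = torch.tanh(z @ W)                            # mean of the Gaussian likelihood p(x | z)
x = mean_x + 0.1 * torch.randn(n_points, n_features)  # x ~ N(mean_x, 0.1^2 I)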
The evidence p(x) in Equation 2 involves an integral over all configurations of the latent
variable and is intractable in general. Thus, I approximate the posterior p(z | x) with a
tractable distribution q_λ(z), such as a Gaussian distribution, where λ denotes the
parameters of the distribution, e.g., (µ, σ). To evaluate the difference between the
approximate distribution q_λ(z) and the true posterior p(z | x), I use the Kullback-Leibler
divergence:
\mathrm{KL}(q_\lambda(z) \,\|\, p(z \mid x)) = \mathbb{E}_q[\log q_\lambda(z)] - \mathbb{E}_q[\log p(x, z)] + \log p(x) \qquad (3)

= \log p(x) - \Big(-\mathbb{E}_q\Big[\log \frac{q_\lambda(z)}{p(x, z)}\Big]\Big) \qquad (4)

= \log p(x) - \mathrm{ELBO}(\lambda) \qquad (5)
From Equation 5, I have ELBO(λ) = log p(x) − KL(q_λ(z) || p(z | x)). Since the KL divergence
is non-negative and log p(x) does not depend on λ, maximizing the ELBO is equivalent to
minimizing the KL divergence between the approximate distribution q_λ(z) and the true
posterior p(z | x).
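For completeness, a standard rewriting of the ELBO (my own restatement based on the definitions above, using p(x, z) = p(x | z) p(z)) separates it into a reconstruction term and a KL term with respect to the prior:

\mathrm{ELBO}(\lambda) = \mathbb{E}_{q_\lambda(z)}[\log p(x \mid z)] - \mathrm{KL}(q_\lambda(z) \,\|\, p(z))

The first term rewards latent codes whose decoded reconstructions make x likely, while the second term keeps the approximate posterior close to the prior. These are exactly the two terms of the loss function J described in Section 2.1.2.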
2.1.2 Deep Learning Perspective
From a neural network perspective, a VAE is an unsupervised feed-forward neural network. A
VAE contains three components: an encoder q_φ(z | x), a decoder p_θ(x | z), and a loss
function J. The goal of the VAE is to map a data point x to a hidden representation z and
then to map the hidden representation z back to a data point x̂ that is similar to x.
The encoder is a feed-forward neural network that takes a data point x as input. The
input goes through fully connected hidden layers, and the encoder outputs the parameters
(µ, σ) of a Gaussian distribution over the hidden representation z. Formally, the encoder
is described as follows:
\mu_i = f(x^\top w_i^E + b_i^E) \qquad (15)

\sigma_j = f(x^\top w_j^E + b_j^E) \qquad (16)

Here, the function f is a nonlinear activation function such as sigmoid or ReLU, and
w_i^E and b_i^E are the i-th weight vector and bias of the encoder, respectively.
Then, the hidden representation z is drawn from a Gaussian distribution with mean µ and
standard deviation σ, and the decoder maps z back to a reconstruction x̂:

z \sim \mathcal{N}(\mu, \sigma) \qquad (18)

\hat{x} = f(z^\top w_i^D + b_i^D) \qquad (19)

Here, w_i^D and b_i^D are the i-th weight vector and bias of the decoder, respectively.
Figure 2: Deep Learning Model of Variational Autoencoder
The loss function J is the negative ELBO evaluated on the training data. The first term is
the reconstruction error between the original data points and the reconstructed data
points. The second term is the KL divergence between the approximate posterior of the
latent variable z and its prior p(z). The model is trained to minimize the objective
function J.
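To make this concrete, below is a minimal PyTorch sketch of such a VAE and its loss; the layer sizes, the ReLU activations, the unit-Gaussian prior, and the squared-error reconstruction term (a Gaussian likelihood with fixed variance, up to constants) are my own illustrative choices rather than the exact configuration used in the experiments.

import torch
import torch.nn as nn
import torch.nn.functional as F

class VAE(nn.Module):
    """Minimal VAE sketch: encoder -> (mu, log sigma^2) -> sampled z -> decoder."""

    def __init__(self, n_features, n_hidden=16, n_latent=2):
        super().__init__()
        self.enc = nn.Linear(n_features, n_hidden)       # encoder hidden layer
        self.enc_mu = nn.Linear(n_hidden, n_latent)      # outputs mu
        self.enc_logvar = nn.Linear(n_hidden, n_latent)  # outputs log sigma^2
        self.dec = nn.Linear(n_latent, n_hidden)         # decoder hidden layer
        self.dec_out = nn.Linear(n_hidden, n_features)   # outputs the reconstruction x_hat

    def encode(self, x):
        h = F.relu(self.enc(x))
        return self.enc_mu(h), self.enc_logvar(h)

    def reparameterize(self, mu, logvar):
        # z = mu + sigma * eps keeps the sampling step differentiable.
        std = torch.exp(0.5 * logvar)
        return mu + std * torch.randn_like(std)

    def decode(self, z):
        return self.dec_out(F.relu(self.dec(z)))

    def forward(self, x):
        mu, logvar = self.encode(x)
        z = self.reparameterize(mu, logvar)
        return self.decode(z), mu, logvar

def vae_loss(x, x_hat, mu, logvar):
    # Reconstruction term (sum of squared errors) ...
    recon = F.mse_loss(x_hat, x, reduction="sum")
    # ... plus the KL divergence between N(mu, sigma^2 I) and the prior N(0, I).
    kl = -0.5 * torch.sum(1 + logvar - mu.pow(2) - logvar.exp())
    return recon + kl

Training then amounts to minimizing vae_loss over mini-batches with a standard optimizer such as Adam.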
Algorithm 1: Outlier Detection using VAE
Data: dataset x
Result: outlier score OS
φ, θ ← initialize parameters
while not converged do
    train VAE_{φ,θ}
    for l = 1 to L do
        µ_x̂^(l), σ_x̂^(l) ← θ(φ(x))
    end
end
OS ← (1/L) Σ_{l=1}^{L} p(x | µ_x̂^(l), σ_x̂^(l))
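A sketch of the scoring step follows; it assumes the VAE class from the previous listing and uses the squared reconstruction error averaged over L latent samples as a stand-in for the reconstruction probability, since that listing's decoder outputs only a mean.

import torch

def outlier_scores(model, x, n_samples=32):
    """Score each row of x by its reconstruction error averaged over several
    latent samples; higher scores indicate more likely outliers."""
    model.eval()
    with torch.no_grad():
        mu, logvar = model.encode(x)
        errors = []
        for _ in range(n_samples):
            z = model.reparameterize(mu, logvar)   # draw one latent sample per data point
            x_hat = model.decode(z)
            errors.append(((x - x_hat) ** 2).sum(dim=1))
        return torch.stack(errors).mean(dim=0)     # average over the L samples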
3 Experiments
3.1 Datasets
I use three datasets to evaluate the performance of the VAE: (1) Glass, (2) Shuttle, and (3)
Stamps. These datasets are subsets of a repository for outlier detection evaluation [3].
All three datasets are available online at
http://www.dbs.ifi.lmu.de/research/outlier-evaluation/DAMI/. Table 1 shows the details of
the three datasets. The Instances column gives the number of samples, the Outliers column
gives the number of outliers, and the Features column gives the dimensionality of each
dataset.
Table 1: Datasets
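The repository distributes the datasets as ARFF files; the following is a loading sketch under the assumption that each file has a nominal label attribute (assumed here to be named "outlier", with value "yes" for outliers), an optional "id" attribute, and otherwise numeric feature columns. The attribute names may need adjusting for the actual files.

import numpy as np
from scipy.io import arff

def load_arff_dataset(path, label_attr="outlier", id_attr="id"):
    """Load an ARFF file into a numeric feature matrix X and a binary label vector y."""
    data, meta = arff.loadarff(path)
    y = np.array([1 if v == b"yes" else 0 for v in data[label_attr]])
    feature_names = [n for n in meta.names() if n not in (label_attr, id_attr)]
    X = np.stack([np.asarray(data[n], dtype=float) for n in feature_names], axis=1)
    return X, y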
3.2 Baselines
I compare the proposed VAE with four competing solutions: (1) Local Outlier Factor
(LOF) [2], a well-known density-based outlier detection method; (2) One-class Support
Vector Machines (SVM) [10], a kernel-based method; (3) Isolation Forest (IF) [8], an
ensemble of randomized trees that isolate data points; and (4) Autoencoder (AE) [9], a
deep learning based method.
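As a rough illustration, the three non-deep baselines can be instantiated with Scikit-learn as follows; the hyperparameters shown are illustrative defaults, not the settings used in the experiments.

from sklearn.ensemble import IsolationForest
from sklearn.neighbors import LocalOutlierFactor
from sklearn.svm import OneClassSVM

def baseline_scores(X):
    """Return outlier scores from LOF, one-class SVM, and Isolation Forest.
    Scores are oriented so that higher always means more outlying."""
    lof = LocalOutlierFactor(n_neighbors=20).fit(X)
    lof_scores = -lof.negative_outlier_factor_

    ocsvm = OneClassSVM(kernel="rbf", gamma="auto").fit(X)
    svm_scores = -ocsvm.decision_function(X)

    isf = IsolationForest(n_estimators=100, random_state=0).fit(X)
    if_scores = -isf.score_samples(X)

    return lof_scores, svm_scores, if_scores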
Figure: (a) Glass, (b) Shuttle, (c) Stamps.
3.3 Evaluation Metrics
Converting outlier scores into binary labels requires a threshold that depends on domain
expertise or prior knowledge. Instead, following the evaluation metrics used for evaluating
VAEs on non-sequential data [4], I employ two metrics that consider all possible thresholds:
Area Under the Curve of Precision-Recall (PR-AUC) [11] and Area Under the Curve
of Receiver Operating Characteristic (ROC-AUC) [11]. In other words, the two metrics
do not depend on a specific threshold. Rather, they reflect the full trade-off among
true positives, true negatives, false positives, and false negatives. Higher PR-AUC and
ROC-AUC values indicate higher accuracy.
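Given ground-truth labels and outlier scores, both metrics can be computed with standard Scikit-learn functions, as sketched below; average precision is used here as the usual summary of the precision-recall curve.

from sklearn.metrics import average_precision_score, roc_auc_score

def evaluate(y_true, scores):
    """y_true: 1 for outliers, 0 for inliers; scores: higher means more outlying."""
    pr_auc = average_precision_score(y_true, scores)   # summarizes the precision-recall curve
    roc_auc = roc_auc_score(y_true, scores)            # area under the ROC curve
    return pr_auc, roc_auc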
3.4 Implementation
All algorithms are implemented in Python 3.6. VAE and AE are implemented using
PyTorch 1.0.1, while the remaining methods, i.e., LOF, SVM, and IF, are implemented using
Scikit-learn 0.19. Experiments are performed on a Linux workstation with dual 12-core
Xeon E5 CPUs, 64 GB RAM, and two K40M GPUs.
3.7 Discussion
VAE produces very competitive results on the Glass and Shuttle datasets. However, on
Stamps, VAE performs poorly compared with the other algorithms. In particular, VAE only
outperforms LOF.
4 Conclusion
In this report, I introduce an anomaly detection method that is based on a deep Bayesian
model. The core idea is to use a generative model that learns to generate normal data
points; when the model encounters outliers, it is unable to reconstruct them well, so the
reconstruction error increases. In particular, I introduce the variational autoencoder.
Then, I evaluate the proposed model on three datasets. The performance of the proposed
model is very competitive.
In the future, I will consider more advanced techniques such as robust models. In
addition, it would be interesting to consider outlier detection using variational
autoencoders for sequential data and graph data.
References
[1] An, J., and Cho, S. Variational autoencoder based anomaly detection using
reconstruction probability. Special IE (2015).
[2] Breunig, M. M., Kriegel, H., Ng, R. T., and Sander, J. LOF: identifying
density-based local outliers. In SIGMOD (2000), pp. 93–104.
[3] Campos, G. O., Zimek, A., Sander, J., Campello, R. J. G. B., Micenková,
B., Schubert, E., Assent, I., and Houle, M. E. On the evaluation of unsu-
pervised outlier detection: measures, datasets, and an empirical study. DMKD (2016).
[4] Chen, J., Sathe, S., Aggarwal, C. C., and Turaga, D. S. Outlier detection
with autoencoder ensembles. In SDM (2017), pp. 90–98.
[5] Hinton, G., and Salakhutdinov, R. Reducing the dimensionality of data with
neural networks. Science 313, 5786 (2006), 504–507.
[6] Kieu, T., Yang, B., and Jensen, C. S. Outlier detection for multidimensional
time series using deep neural networks. In MDM (2018), pp. 125–134.
[8] Liu, F. T., Ting, K. M., and Zhou, Z. Isolation forest. In ICDM (2008),
pp. 413–422.
[9] Luo, T., and Nagarajan, S. G. Distributed anomaly detection using autoen-
coder neural networks in WSN for IoT. In ICC (2018), pp. 1–6.
[10] Manevitz, L. M., and Yousef, M. One-class SVMs for document classification.
JMLR 2 (2001), 139–154.
[11] Sammut, C., and Webb, G. I., Eds. Encyclopedia of Machine Learning and Data
Mining. Springer, 2017.
[12] Yeh, C. M., Zhu, Y., Ulanova, L., Begum, N., Ding, Y., Dau, H. A., Silva,
D. F., Mueen, A., and Keogh, E. J. Matrix profile I: all pairs similarity joins
for time series: A unifying view that includes motifs, discords and shapelets. In
ICDM (2016), pp. 1317–1322.