
Anomaly Detection using Generative Adversarial Networks

Reviewing methodological progress and challenges


Fiete Lüer
eMundo GmbH, Gofore Oyj
Hofmannstr. 25-27
Munich, Germany
fiete.lueer@e-mundo.de

Christian Böhm
University of Vienna
Währinger Straße 29
Vienna, Austria
christian.boehm@univie.ac.at

ABSTRACT

The applications of Generative Adversarial Networks (GANs) are just as diverse as their architectures, problem settings as well as challenges. A key area of research on GANs is anomaly detection, where they are most often utilized when only the data of one class is readily available.
In this work, we organize, summarize and compare key concepts and challenges of anomaly detection based on GANs. Common problems which have to be investigated to progress the applicability of GANs are identified and discussed. This includes stability and time requirements during training as well as inference, the restriction of the latent space to produce solely data from the normal class distribution, contaminated training data as well as the composition of the resulting anomaly detection score. We discuss the problems using existing work as well as possible (partial) solutions, including related work from similar areas of research such as related generative models or novelty detection. Our findings are also relevant for a variety of closely related generative modeling approaches, such as autoencoders, and are of interest for areas of research tangent to anomaly detection such as image inpainting or image translation.

Keywords

Adversarial Generative Models, Anomaly Detection, Generative Adversarial Network, Novelty Detection, Outlier Detection

1 Introduction

Anomalies are commonly described as patterns in data not conforming to expected behavior [1]. Detecting anomalies is a frequent problem occurring on various types of data where the "expected behavior" is usually represented by a set of normal instances. Anomaly detection can often be framed as a binary classification problem where the task is to distinguish solely between normal and abnormal instances. Detecting anomalies is crucial for an abundance of industries: It is important to detect intrusions into networks [2], abnormal data in (spatio-temporal) climate data [3], patterns which indicate human diseases [4] or to minimize the risk of fraud [5]. In many of these applications, prompt actions are required to diminish or avoid damage, or to enable novel applications. For many such applications, recent progress has shifted the focus to deep learning algorithms, with a major application being complex high dimensional data, e.g. image data, where handcrafting features is prone to errors.
Neural networks can be used to detect anomalies in various ways. It is possible to compare the anomalous input data in a discriminative way using a threshold output score or forecasted values. Discriminative models such as Recurrent Neural Networks (RNN) and Convolutional Neural Networks (CNN) have found frequent use but can suffer from a variety of problems. These issues include differences between the predicted probability estimate of a class label and the ground truth correctness likelihood, which can be improved by calibrating the output [6]. Other challenges cannot easily be solved by improvements of the processing or training itself. Requiring a balanced dataset to avoid a shift towards classifying only the predominant class is one of the most impactful problems. This led to a surge in generative modeling approaches such as (variational) autoencoders (VAEs) or Generative Adversarial Networks (GANs): Through one-class (OC) learning of only the normal class distribution, we can attempt to learn a generative procedure which should only produce normal class data, leveraging the reconstruction error between real and synthetic data.
For either method, an anomaly score is often calculated, e.g. based on the output score of discriminative networks or the difference between real and forecasted or synthetic values. This score can be compared to a predefined threshold which may not be exceeded by normal class input data (e.g. autoregressive neural networks [7]). Since the concept is based on learning a distribution from which specific data is generated, it can be applied to related models such as (V)AEs, whose applicability we further discuss in the following sections.
Existing reviews [8], [9] on GAN-based anomaly detection focus on applications and types of data as well as architectures and metrics. Our work especially targets the investigation of more general challenges of GAN-based anomaly detection, which are mostly independent of the application, type of data or network architecture. Existing work on solving these challenges is clustered into distinctive categories and connected to related work in similar domains such as AEs.
We introduce foundations on anomaly detection as well as GANs in Section 2. The basic idea of AnoGAN [4], a central approach used to detect anomalies using GANs, is presented in Section 3. Subsequent advances and challenges are used to discuss and extend this framework in Section 4, focusing on practical as well as theoretical challenges. This especially includes i) the speed and quality of the training process, ii) restrictions to the latent space, iii) contaminated training data, iv) novel anomaly score components and compositions to better extract relevant information as well as v) inference accuracy and speed. Lastly, this work is summarized in a general discussion in Section 5.¹

¹We further publish an open source framework to evaluate various GAN-based anomaly detection approaches at https://github.com/emundo/ecgan.

2 Preliminary

2.1 Anomaly detection

An outlier, also called anomaly, intuitively is an observation deviating so much from other observations as to arouse suspicions that it was generated by a different mechanism [10]. The detection of anomalies is especially important in the medical domain where recent advances, mostly through novel deep learning approaches, have significantly improved the performance on various tasks and types of medical imaging. This includes MRI segmentation [11], CT scan generation [12] or Alzheimer's disease prediction using F-FDG PET scans [13], but also extends to time series data, sometimes even surpassing the performance of human professionals (e.g. on ECG data [14]). Due to its relevance, anomaly detection has been researched and reviewed for several decades. This includes very broad reviews [1], as well as surveys which focus on more specific data structures, such as graphs [15], where recent advances on Graph Neural Networks [16] open up interesting future applications.
Anomaly detection is closely related to novelty detection, i.e. the problem of finding novel data points. While the problem formulation and methodology can differ, there is a variety of tasks where it does not differ. Abati et al. [17] introduce novelty detection as the discrimination of observations that do not conform to a learned model of regularity, which can be translated to the detection of data points with novel, additional informational value. Abati et al. further argue that the reconstruction error or discriminative in-distribution tests express how well we remember an event, and the surprisal is modeled by low probabilities of events under an expected model, which might be a network conditioned on normal samples, or by lowering a variational free energy. But just as in human memory, the remembrance itself might not be sufficient. Most anomaly detection tasks have a non-trivial and non-symmetric distribution of samples, which can lead to only a small allowed margin of reconstruction errors in some part of a resulting learned manifold and a larger margin in another part, which has to be accounted for in most applications to create a reliable convex hull. This is a major obstacle for OC anomaly detection where the manifold of normal class data, which corresponds to such "a learned model of regularity", is approximated.
The broad variety of anomaly detection methods itself ranges from fixed threshold values over more traditional machine learning approaches, such as density based methods like Kernel Density Estimators [18], to more recent deep learning approaches, which are often based on reconstruction losses, e.g. autoregressive predictions of future values [7] or GANs. One of the advantages of reconstruction-based anomaly detection is that it allows the localization of the anomalies within high dimensional data such as images [19], which enables the application of further processing techniques such as segmentation or inpainting tasks. Furthermore, the quality of the reconstruction, measured by the degree it differs from the test input, can allow the assessment of an anomaly score which does not only give information on whether some data is anomalous but also how stark the anomaly is in comparison to existing data.
One definition of the anomaly detection setting [20] assumes that there exists a probability density function p_n from which normal data instances are generated:

    X_n ∼ p_n(x) = p(x | y = 0)    (1)

with y being a label signaling that some data belongs to the normal class (y = 0) or abnormal class (y = 1). A dataset is usually composed of normal instances from X_n as well as anomalous instances from dataset X_a, with the latter being accordingly distributed with the anomalous distribution X_a ∼ p_a(x) = p(x | y = 1). This results in a joint distribution containing both normal and anomalous data points:

    X_total ∼ p_total(x) = (1 − λ) p_n(x) + λ p_a(x)    (2)

where λ ∈ [0, 1] encodes the relative amount of anomalous points p(y = 1). The task of anomaly detection then is to assess if data is drawn from the normal or abnormal data distribution.
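The setting of Eqs. 1-2 can be made concrete in a few lines. The following is a minimal sketch assuming, purely for illustration, one-dimensional Gaussian p_n and p_a and a contamination rate λ = 0.05; all names and parameters are placeholders:

    import numpy as np

    rng = np.random.default_rng(42)
    n, lam = 1000, 0.05                        # dataset size, share of anomalies

    y = rng.random(n) < lam                    # y = 1 marks anomalous points
    x_normal = rng.normal(0.0, 1.0, size=n)    # draws from p_n(x) = p(x | y = 0)
    x_anomal = rng.normal(5.0, 1.0, size=n)    # draws from p_a(x) = p(x | y = 1)
    x_total = np.where(y, x_anomal, x_normal)  # X_total ~ p_total(x), Eq. 2

    x_train = x_total[~y]                      # OC training set, i.e. lambda = 0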
2.2 Generative Adversarial Networks

Goodfellow et al. [21] introduce GANs as a framework consisting of two players, generator G and discriminator D, following a minimax game with value function V(D, G). We call this the adversarial loss function L:

    L = min_G max_D V(D, G) = E_{x ∼ p_data(x)}[log D(x)] + E_{z ∼ p_z(z)}[log(1 − D(G(z)))].    (3)

In this adversarial game, the generator tries to create synthetic data similar to real data points x ∼ p_data, x ∈ R^{d_x}, with x consisting of n samples during training. In case of OC anomaly detection, all training samples belong to the normal class. The discriminator tries to distinguish data from the real training set from data which the generator G(z) synthetically generates using the latent noise vector z ∼ p_z as input. p_z often follows a Gaussian (commonly z ∼ N(0, I)) or uniform distribution with z ∈ R^{d_z} and d_z ≪ d_x. The goal is to learn the parameters of G_θ(z) ∼ p_θ s.t. p_θ is a potential candidate to represent p_data.
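As a reference point for the later sections, the following sketch shows one alternating update of the minimax game in Eq. 3. It is a deliberately small PyTorch example with untuned, hypothetical layer sizes; as is common in practice, the generator uses the non-saturating loss rather than directly minimizing log(1 − D(G(z))):

    import torch
    import torch.nn as nn

    d_x, d_z = 64, 8  # illustrative data/latent dimensionality, d_z << d_x
    G = nn.Sequential(nn.Linear(d_z, 32), nn.ReLU(), nn.Linear(32, d_x))
    D = nn.Sequential(nn.Linear(d_x, 32), nn.ReLU(), nn.Linear(32, 1))
    opt_g = torch.optim.Adam(G.parameters(), lr=2e-4)
    opt_d = torch.optim.Adam(D.parameters(), lr=2e-4)
    bce = nn.BCEWithLogitsLoss()

    def training_step(x_real: torch.Tensor) -> None:
        b = x_real.size(0)
        z = torch.randn(b, d_z)  # latent noise z ~ N(0, I)

        # Discriminator ascent on log D(x) + log(1 - D(G(z))).
        opt_d.zero_grad()
        loss_d = bce(D(x_real), torch.ones(b, 1)) + \
                 bce(D(G(z).detach()), torch.zeros(b, 1))
        loss_d.backward()
        opt_d.step()

        # Non-saturating generator update: maximize log D(G(z)).
        opt_g.zero_grad()
        loss_g = bce(D(G(z)), torch.ones(b, 1))
        loss_g.backward()
        opt_g.step()

    # OC anomaly detection: x_real contains only normal class samples.
    training_step(torch.randn(16, d_x))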
GANs have been especially useful on image data, and with CNNs being very effective on this domain, deep convolutional GANs (DCGANs) have been proposed by Radford et al. [22]. A central empirical result, critical for many areas of research that have since evolved and most crucial for the detection of anomalies, is the existence of smooth transitions in latent space. This means that sampling similar noise z and using it as input for the generator also leads to the generation of similar images given sufficient training of the network.
This property has been observed on a variety of data structures: after being proposed by DCGAN, this property was used for many tasks related to imaging which are based on this behavior, such as image generation and style transfer [23]. Furthermore, investigations on time series data using RNNs suggest similar behavior [24], [25] and Bojchevski et al. [26] visualize these transitions using attributes of graphs.

Figure 1: General training pipeline. The discriminator assesses the quality of generated and real data. This information is used to iteratively improve the discriminator as well as the generator.

Figure 2: General anomaly detection pipeline. A given datum x is judged based on the discriminator as well as the reconstruction loss between x and an arbitrarily similar sample x̂. x̂ can be retrieved by an explicit latent optimization minimizing the dissimilarity or implicitly by retrieving an inverse mapping using an encoding network.

2.3 Anomaly Detection using Generative Adversarial Networks

The most widespread approach to GAN-based anomaly detection is based on learning the manifold of data of the normal class, meaning that the training data usually consists of only normal class data, i.e. p_data ≈ p_n, such that λ = 0 (Eq. 2) for the training data. The abnormal data is only used for validation and testing in the OC setting. It should be considered to include abnormal data during training as "synthetic" data in practical applications to improve the resulting learned generative procedure (e.g. using minimum likelihood regularization [27]) and to improve the separating hyperplane of the discriminator. Data points can be rated as normal or abnormal based on reconstructing the data x using G(z) (e.g. [28], [29]) and calculating a residual loss L_res using a well-defined and domain-dependent distance metric. Additionally, the discriminator can be used to assess if some datum belongs to the distribution represented by the generator by asserting a likelihood (e.g. [27]). A major approach in this area of research, AnoGAN, utilizes a combination of both components. The general training pipeline of GANs as well as the basic anomaly detection components are visualized in Fig. 1 and Fig. 2. Before discussing AnoGAN and subsequently the most important developments in this domain, its applicability to a variety of data structures as well as practical and theoretical advances, we will briefly discuss existing applications and reviews of GAN-based anomaly detection.
Previous surveys on anomaly detection frequently focus on a broad array of methods (e.g. [30], [1]) or the general use of deep learning in anomaly detection and very general introductions to models or applications, e.g. Chalapathy et al. [31] for general deep learning based anomaly detection or Brophy et al. [32] for a more general use of GANs on time series data, which includes the use of anomaly detection. More recently, GAN-based anomaly detection has received a significant amount of attention, and concurrent work exists which specifically reviews the usage of GANs for anomaly detection. Di Mattia et al. [9] perform a practical comparison of three major approaches in GAN-based anomaly detection. Sabuhi et al. [8] perform an exhaustive review of existing literature, investigating the domain of application, model architecture, datasets and evaluation metrics used by GAN-based anomaly detection. It shows a broad range of applications for GAN-based anomaly detection, ranging from medical imaging [4] over the detection of deceptive reviews [33] to time series data [24] or videos [34]. While this work is an excellent resource for an overview of the currently used components, existing work rarely discusses the actual challenges of GAN-based anomaly detection apart from short remarks on training stability. Our work focuses on investigating these practical as well as theoretical obstacles which - to the best of our knowledge - have not been discussed in a systematic manner before.
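To make the two scoring ingredients named above explicit, a minimal sketch - with the discriminator D and the reconstruction x̂ assumed to come from a trained GAN - could look as follows:

    import torch

    def residual_loss(x: torch.Tensor, x_hat: torch.Tensor) -> torch.Tensor:
        # L_res: well-defined, domain-dependent distance in data space;
        # the absolute error is a common default.
        return (x - x_hat).abs().sum()

    def discriminator_likelihood(D, x_hat: torch.Tensor) -> torch.Tensor:
        # Likelihood asserted by the discriminator that x_hat belongs to
        # the distribution represented by the generator.
        return torch.sigmoid(D(x_hat))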

Figure 3: Simplified mapping between latent and data space. Each point in the lower dimensional latent space M_z maps to a point in the data space M_d, resulting in a subspace of the data space which spans the data generatable by the GAN (blue area in M_d). By training on only normal class data (yellow area in M_d), it is attempted to restrict the generated data. In practice, the generatable data also includes abnormal data (red area in M_d) and data which does not belong to the problem domain (e.g. if the task is to investigate CT images, a valid image is generated but it is no realistic CT). Similarly, given a predefined threshold value for the discriminator error, one can separate a subset aiming to encompass normal class data as well as possible. By using a combination of both error components, we aim to minimize the amount of falsely classified samples.

3 The Fundamental Approach of AnoGAN

While GANs can be used to detect anomalies in a variety of ways, the procedure of AnoGAN by Schlegl et al. [4] can be considered central to most of these approaches, and subsequent developments presented in this work are derived as adaptations of this approach. AnoGAN utilizes both the difference in the features using the discriminator and its loss L_disc, as well as the difference in data space, using the generator to calculate the absolute error as the residual loss L_res. This is achieved by fully training a GAN on normal class data to learn the generator mapping G : z ↦ x. During inference, we try to determine if some novel datum x should be labeled as anomalous. To achieve this, it is required to find the noise vector z which is mapped as close as possible to x using the generator by interpolating through latent space: Initial noise z_1 is sampled randomly to generate G(z_1) in data space. The dissimilarity between x and G(z_1) is calculated. AnoGAN utilizes the absolute error, but other common, well-defined distances have frequently been applied as proxies for the similarity of data points, including the Euclidean distance. For time series data, RBF kernels have been a common choice, and more time series specific measures such as dynamic time warping can easily be utilized. This distance is used to define a loss function to provide gradients that allow to move to some z_2 where G(z_2) is more similar to x than G(z_1). This is repeated Γ times to find the most similar image G(z_Γ) which can be constructed using the normal class manifold learned by the generator. Γ can either be a fixed value or be dynamically determined by a target similarity ε. A mixture of both strategies is commonly used where we try to find an ε-similar datum and interrupt optimization after a maximum of n_max steps in case the input is too dissimilar for the target similarity to be reached and to guarantee a maximum runtime. As soon as x̂ = G(z_Γ) is determined, it is compared to x using the similarity in data space by calculating the residual loss L_res. In the case of images, and very similarly in the case of other regular data such as time series, L_res can be calculated by pointwise comparison of x and G(z_Γ):

    L_res(x, z_Γ) = |x − G(z_Γ)|,    (4)

or another distance measure, depending on the target domain. It does not necessarily need to be the distance minimized during the latent optimization procedure, even though they do not differ in most applications. Afterwards, the discriminator is used to calculate the discriminative loss which enforces G(z_Γ) to lie on the learned manifold. Just as we try to force the generator to only produce healthy data given any valid z, the discriminator should only assign high confidence values to healthy data. The resulting discriminator loss is computed by feeding G(z_Γ) to the discriminator, resulting in the following loss:

    L_disc(z_Γ, x) = σ(D(G(z_Γ)), α)    (5)

with σ being the sigmoid cross entropy which is used to describe the discriminator loss during training with logits D(G(z_Γ)) and targets α = 1. The exact calculation of L_disc and L_res can differ: Schlegl et al. [4] further propose another L_disc based on feature matching

    L_disc(z_Γ, x) = |f(x) − f(G(z_Γ))|,    (6)

which has since been frequently applied [35], [29], [36]. The generator and discriminator are jointly used to calculate a combined loss which is a weighted sum of both components:

    L_total(z_Γ, x) = (1 − λ) L_res(z_Γ) + λ L_disc(z_Γ).    (7)

Here, L_total(z_Γ, x) can be used directly to calculate an anomaly score. The anomaly score can be thresholded by some predefined or optimized τ to determine a label corresponding to x using H : L_total(z_Γ, x) ↦ {0, 1}, with H = 0 corresponding to normal samples and H = 1 corresponding to abnormal samples respectively:

    H(L_total(z_Γ, x), τ) = { 0 if L_total(z_Γ, x) ≤ τ
                              1 if L_total(z_Γ, x) > τ.    (8)

Parts of the general anomaly detection procedure are visualized in Fig. 3, the AnoGAN interpolation in Fig. 4. In practice, the sets are not convex and not easily interpolatable due to a complex loss surface when minimizing the dissimilarity.
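The complete AnoGAN scoring procedure of Eqs. 4-8 can be summarized in a short sketch. The networks below are untrained stand-ins for a GAN fully trained on normal class data, and all hyperparameter values are purely illustrative:

    import torch
    import torch.nn as nn

    d_x, d_z = 64, 8
    G = nn.Sequential(nn.Linear(d_z, 32), nn.ReLU(), nn.Linear(32, d_x))
    f = nn.Sequential(nn.Linear(d_x, 32), nn.ReLU())  # discriminator features, Eq. 6
    for p in list(G.parameters()) + list(f.parameters()):
        p.requires_grad_(False)  # only the latent vector is optimized

    def anomaly_score(x, n_max=500, eps=0.005, lam=0.1, lr=0.1):
        z = torch.randn(d_z, requires_grad=True)    # initial noise z_1 ~ N(0, I)
        opt = torch.optim.Adam([z], lr=lr)
        for _ in range(n_max):                      # at most n_max steps
            loss_res = (x - G(z)).abs().sum()       # dissimilarity in data space
            if loss_res.item() < eps:               # epsilon-similar datum found
                break
            opt.zero_grad()
            loss_res.backward()
            opt.step()
        with torch.no_grad():
            l_res = (x - G(z)).abs().sum()             # residual loss, Eq. 4
            l_disc = (f(x) - f(G(z))).abs().sum()      # feature matching, Eq. 6
        return (1 - lam) * l_res + lam * l_disc        # weighted sum, Eq. 7

    tau = 50.0                                         # threshold, tuned in practice
    x = torch.randn(d_x)
    label = int(anomaly_score(x) > tau)                # Eq. 8: 1 marks abnormal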
Figure 4: AnoGAN visualized. To approximate an abnormal sample x, an initial latent vector z_0 is sampled and used to generate x̂_0. By minimizing the dissimilarity in data space, the latent vector z_Γ is retrieved, resulting in the best reconstruction x̂ of x, i.e. minimizing the reconstruction error. The residual loss corresponds to the remaining difference in data space.

4 Advances in GAN-based Anomaly Detection

We focus on five important challenges that should be considered when performing anomaly detection with GANs: speed and quality of the training process, restricting the latent space, contaminated training data, the compositional choice of the anomaly score and inference performance. Some of these challenges introduce large and abstract areas of research and are related to each other. They can be split up further if desired, but shall act as a first reference point when considering anomaly detection with GANs or related generative models.
4.1 Speed and quality of the training process

Training GANs and the selection of suitable hyperparameters generally remains a non-trivial task: Choosing a sufficient latent distribution and dimensionality d_z ("the degree of compression" [37]) is not only crucial for inference time. It also has significant implications on the reconstruction of anomalous samples (e.g. [38]) and thus the reconstruction error and the anomaly detection performance in general. A small d_z can lead to non-convergence and insufficient preservation of information by the network because not all relevant features can be learned. If d_z is too large, training (and subsequently inference) are slowed down, which is especially relevant if interpolation through latent space up until ε-similarity is chosen. Furthermore, too many irrelevant features might be learned, which hinders anomaly detection performance.
Due to the inherent instabilities of the originally proposed GAN architecture, which commonly cause mode collapse and further problems that hinder stable training, many researchers have resorted to practical workarounds. These workarounds include minibatch discrimination [39] or using ensemble learning [27]. Since such methods often imply a significant overhead, novel objective functions such as the Wasserstein objective function (WGAN [40]) have found practical use. WGAN requires 1-Lipschitz continuity to stabilize training, which is enforced by weight clipping and has subsequently been replaced by ensuring that the norm of the gradient of the discriminator is penalized to stay (approximately) 1 in WGAN-GP [41]. This has also been successfully applied during training for anomaly detection, see e.g. [28], [35].
Depending on the underlying structure of data and networks used, different approaches to stabilize training can further be applied. These include spectral normalization [42], where a Lipschitz constraint to regularize the training is applied to the discriminator by normalizing all weights with the largest singular value, and a Lipschitz constraint to guarantee similar scores for data which is close in data space [43]. Progressive growing of GANs [44] attempts to stabilize and facilitate training by incrementally increasing the resolution of the data to iteratively increase the difficulty of the learned problem, which has since been used in the medical domain [45] as well as anomaly detection [46].
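Two of the stabilization techniques named above are straightforward to express in code. The following sketch shows a WGAN-GP style gradient penalty [41] and the application of spectral normalization [42] to a single layer; the sizes and the penalty weight are illustrative defaults, not prescriptions:

    import torch
    import torch.nn as nn

    def gradient_penalty(D, x_real, x_fake, weight=10.0):
        # WGAN-GP [41]: penalize deviations of the critic's gradient norm
        # from 1 on random interpolates between real and generated data.
        alpha = torch.rand(x_real.size(0), 1)  # per-sample mixing factor
        x_hat = (alpha * x_real + (1 - alpha) * x_fake).requires_grad_(True)
        grad = torch.autograd.grad(D(x_hat).sum(), x_hat, create_graph=True)[0]
        return weight * ((grad.norm(2, dim=1) - 1.0) ** 2).mean()

    # Spectral normalization [42]: rescale the layer's weight matrix by its
    # largest singular value to enforce a Lipschitz constraint.
    critic = nn.Sequential(nn.utils.spectral_norm(nn.Linear(64, 32)),
                           nn.ReLU(), nn.Linear(32, 1))
    gp = gradient_penalty(critic, torch.randn(16, 64), torch.randn(16, 64))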

4.2 Restricting the latent space

While the previously presented approaches focus on the stability of the training process itself, the restriction of producing only normal class samples is less frequently discussed. In most settings, GANs are used to generalize, and there is not always a need to completely restrict the latent space. While producing out-of-distribution data is often not only a byproduct but the explicit goal of image synthesis, it is difficult to restrict the generalizability of GANs to a subset of the generatable data. We initially assumed that the normal class data is a subset of R^{d_x} and G : z ↦ x with d_z ≪ d_x. Optimally, the latent space is a lower dimensional representation or approximation of the space of normal class data. However, it cannot be guaranteed that the generator is unable to produce abnormal class samples in the most common setting: Given z ∼ N(0, I) with z ∈ R^{d_z}, the sampled values during training will lie in a finite interval, likely with values only deviating up to a small number of standard deviations from zero in each dimension. Even though sampling large latent values is highly unlikely, with lim_{z_i → ∞} P(z_i) = 0 ∀ i ∈ {1, .., d_z}, it is in principle possible to draw them since each value has a non-zero probability to be drawn from the (standard) Gaussian distribution [36]. Since AnoGAN only utilizes similarity based gradient information and disregards how likely the reconstruction is, it is possible to reconstruct highly unlikely data, which has been observed by a variety of existing work [36], [47].
Tong et al. [43] discuss the sensitivity of the reconstruction error in low density regions of normal class data and abnormal data close to the convex hull using autoencoders. They form the hypothesis that the model interpolates "too well for anomaly detection". To avoid this, we can restrict the possible values a priori, e.g. by using a uniform or truncated normal distribution, which introduces a tradeoff between variety and fidelity [48]. This has, to the best of our knowledge, not been applied to anomaly detection and requires a theoretical or empirical estimation of the cardinality of the problem domain. A different approach is to restrict the "accepted" reconstructions: given a multivariate standard Gaussian distributed latent space, it is possible to punish unlikely interpolations. This can be evaluated using the χ distribution, where only certain deviations of the Euclidean distance of the latent vector from the origin (its latent norm) from its mode are allowed, using predefined accepted standard deviations. This can also implicitly be learned if the latent norm is utilized as an error component extending Eq. 7, using e.g. a grid search or an SVM [36].
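The χ-distribution criterion sketched above is easy to implement: for z ∼ N(0, I) in d_z dimensions, the latent norm ||z||₂ follows a χ distribution with d_z degrees of freedom, so reconstructions whose latent norm falls outside a central interval of that distribution can be rejected as unlikely. The quantile band below is an illustrative choice:

    import numpy as np
    from scipy.stats import chi

    d_z = 8
    lo, hi = chi.ppf([0.005, 0.995], df=d_z)  # accepted band of latent norms

    def latent_norm_plausible(z: np.ndarray) -> bool:
        return lo <= np.linalg.norm(z) <= hi

    print(latent_norm_plausible(np.random.randn(d_z)))       # True almost surely
    print(latent_norm_plausible(10 * np.random.randn(d_z)))  # False almost surely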
The ultimate goal of OC classification - and generation in this context - is to obtain a latent space where all instances from the latent space represent a datum from the given class [38]. With the approximation of the shape of the boundary separating in-class from out-of-class samples being the main goal, it becomes even more important to handle the data which is close to such a boundary, i.e. the data close to the convex hull of normal data, where at least L_res is low for anomalous points as well. Furthermore, L_disc is also low if points close to the convex hull are not accounted for. This is increasingly important if we only interpolate through latent space: Since the generator might in general also produce abnormal data, more so if d_z is chosen poorly, a bad initial random sampling might lead to some z producing G(z_Γ) which is far - the maximal allowed dissimilarity measured by ε - away from x, meaning that a normal point might be reconstructed with a high L_res, or an abnormal point could be reconstructed with a low L_res if ε is chosen poorly, leading to inaccurate classifications. This gets more important if the data is higher dimensional: Perera et al. [38] argue that especially complex data has a comparably weak novelty detection performance because the model automatically learns to represent some out-of-class objects if the shape is complex, leading to a low reconstruction error. This can further be problematic if the normality depends on the context, e.g. in the medical domain. This leads to a fuzzy and non-deterministic boundary since more information is required for a reliable detection. A stricter restriction of the manifold is thus required for many use cases. Sabokrou et al. [49] try to achieve a stricter decision boundary by enhancing inlier samples and distorting outliers, utilizing the addition of normally distributed noise. Koizumi et al. [50] generate a more suitable boundary by simulating non-normal data based on the Neyman-Pearson lemma, increasing the true positive rate by using an arbitrarily low false positive rate. They directly use an objective function which aims to increase the anomaly detection performance and utilize rejection sampling to simulate anomalous data and improve the resulting convex hull. Such a more distinctive boundary, created by explicitly punishing the generation of abnormal data during training, has received more attention recently (e.g. [46], [51]). Since the difficulties to establish the boundaries get larger the higher the dimensionality of the training data is, Liu et al. [52] utilize generative adversarial active learning to only generate informative potential outliers. Their approach shows to be especially effective on datasets with clusters of different shape as well as high ratios of irrelevant variables.
Further related approaches to divide anomalous from non-anomalous data samples include Deep SVDDs [37], which have been used to learn a minimum-volume hypersphere in which the representations of, as close as possible to, all non-anomalous samples shall lie, regularizing the compactness of the learnt representation. Similarly, OCGAN [38] tackles the problem of restricting the latent representations by using a denoising autoencoder with a latent space which is forced to have bounded support, enforced by the encoder's output layer. Instead of only rewarding the generation of similar samples which might also be close to the convex hull, they add an adversarially trained discriminator in latent space to ensure that the representations of in-class examples resemble uniform random samples drawn from the same bounded space. The latent space is explored to produce potential out-of-class and high quality informative-negative examples which are fed into the network to steer the training to produce only in-class examples. Another approach which leverages the discriminator is to have a generator with the aim to produce weak anomalies to create such a distinct boundary [27]. Similarly, the use of a generator that does not attempt to mirror the true data distribution but improves low density areas to improve the generalization has been proposed [53]. Combining such approaches is also of interest and includes using two generators, with one generator learning the distribution of normal class data p_g^normal and one generator working as a bad generator learning p_g^bad. The discriminator receives data from p_data, p_g^bad and p_g^normal, where data from p_g^bad would lead to an improved enforcement of the boundary between normal and abnormal data. Such a distinct boundary does not only have direct applications in common anomaly detection tasks: Neal et al. [54] use a GAN to generate counterfactual examples for unknown classes in image classification, reducing the influence of incorrect high-confidence predictions.
In a similar spirit to the previously mentioned investigation of the χ distribution, Berg et al. [19] attempt to structure the latent space in such a way that the anomalous and normal samples are separated. The origin distance is used to measure the distance from the encoded image to the origin in the latent space, assuming that anomalous samples are farther away in latent space. It is possible that using the distance from a latent space centered around normality might allow a more distinctive mapping of points close to the convex hull than using only the reconstruction error.

4.3 Contaminated training data

Previously we have assumed that the data set contains correctly labeled instances. We would like to highlight an often overlooked, but in practice extremely important, problem: the contamination of data, meaning that most data sets rarely contain only correctly labeled data. The discussed generative models remain sensitive to outliers in training data, and few anomalous points might contaminate the training set in OC classification [43]. One way to account for this is by assuming that a given sample (labeled as normal) can also come from an anomalous distribution [43] to allow a margin of error.

Only very recently, researchers have begun to systematically investigate the impact of contamination, focusing on image data [19], [55], [46]. Some approaches are to reject potential anomalies during training [55] or to jointly train an encoder with the GAN in a progressive manner [19]. Salehi et al. [56] mention that the latent space may also primarily capture features that are shared by both normal and anomalous data. Their work focuses on autoencoders, but this holds true for GANs as well. They attempt to force their network to capture features unique to the normal class using adversarial examples.
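One simple variant of such rejection during training - a sketch under the assumption that a rough contamination ratio is known, and not the specific method of [55] - is to drop the highest-error samples from each batch before the update:

    import torch

    def reject_suspected_anomalies(x_batch, recon_error, contamination=0.05):
        # Keep the (1 - contamination) fraction of samples with the lowest
        # reconstruction error; the rest is treated as suspected contamination.
        keep = int((1 - contamination) * x_batch.size(0))
        idx = torch.argsort(recon_error)[:keep]
        return x_batch[idx]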
discriminator output across a high number of normal train-
4.4 The compositional choice of the anomaly score

The multi-component anomaly score is one of the reasons why GANs are of interest for anomaly detection: Utilizing both the reconstructive and the discriminative capabilities seems to increase the anomaly detection performance, most likely because different, complementary information is learned by the networks. In theory, either component should be sufficient to decide if some point is anomalous, with the reconstruction error allowing some limited explainability through visualization, and it would be desirable to retrieve easy-to-interpret probabilities from the discriminator. In practice, it is usually impossible to train GANs to (near-) optimality, and since the underlying approach is not unsupervised, acquiring and labeling the normal class data is seldom possible in the real world, especially without the before mentioned contamination. Exploring the practical differences and (dis-)advantages is important, and current work rarely directly compares the performance of both: A variety of work argues that either the discriminator or the generator is partially unfit to deal with anomalous data without offering any experimental evidence. Most often, this is not specific to GANs, but focuses on the up- or downsides of the novelty likelihood, portrayed by the discrimination error in the GAN framework, or the residual error: On one hand, Deecke et al. [28] argue that the discrimination error is not equipped to deal with samples completely unlike the training data and only utilize the reconstruction error. On the other hand, it is argued that the reconstruction error does not always work very well in practice and suffers from a variety of problems such as intrinsic biases or instability if samples shall be reconstructed outside of the learned manifold [43], [57], [58], [51]. Pidhorskyi et al. [59] further argue that the reconstruction error only affects the noise portion of the model and does not include the signal portion.
Schlegl et al. [35] argue that only minimizing the reconstruction error is a subpar measure in regions of the latent space which are only sampled sparsely during training, but that the reconstruction error itself leads to better localization properties. Restricting the generator during training is an important body of future work (e.g. [56], [60], [61]).
Many problems presented here are shared with related frameworks such as (V)AEs, with An et al. [62] arguing that statistical anomaly detection methods, such as their proposed reconstruction probability, are a more objective, intuitive and robust anomaly score than the reconstruction error for VAEs. And while it is stated that they do "not require model specific thresholds for judging anomalies" using such probability based metrics, this is not necessarily true, even more so for GANs: using a fixed G, the optimal discriminator is described by

    D*_G(x) = p_data(x) / (p_data(x) + p_g(x))

and training up to optimality means p_g = p_data (having a Jensen-Shannon divergence of zero), where the optimal discriminator will always return 1/2 for all training samples as well as produced samples from the normal class manifold [21]. This has to be taken into account since it implies that the discrimination error might increase over time for both normal class and good synthetic samples if a target value of 1 is used (Eq. 5). During optimization of the anomaly score, this has to be accounted for, e.g. using non-linear punishments of low discriminator predictions and frequent reparameterization of the anomaly score. Schlegl et al. [35] average the discriminator output across a high number of normal training samples and subtract the discriminator output of some test data from the average discriminator score. While the problem solved by this methodology differs, it is one possible solution to the aforementioned problem. Some work tries to work around this by not using the direct discriminator score but the feature matching loss [4], [63], [36].
Choi et al. [64] further investigate the susceptibility to out-of-distribution errors, arguing that likelihood models are in fact very susceptible to out-of-distribution samples, assigning large likelihoods to such samples (see also [65]). Even if the computation of the likelihood were exact, a one-tailed test to check if some data has a low likelihood does not hold for high-dimensional data: They consider an isotropic high dimensional Gaussian distribution, where a datum at the origin has maximum likelihood but is considered highly atypical because most of the probability mass lies in an annulus of radius √d_z. While likelihoods can determine whether a point lies in the support of a distribution, they do not reveal where the probability mass is concentrated. This also motivates the previously mentioned restriction of the latent space based on the χ distribution. Although the density estimation of GANs should not be able to account for probability mass, the generative ensemble presented in Choi et al. [64] demonstrates anomaly detection capabilities by combining density estimation and uncertainty estimation.
Berg et al. [19] compare the use of the distance in image space and the distance in latent space as a discriminative factor of reconstructed data. They find that the distance in image space (the reconstruction error as introduced above) is clearly preferable when it comes to a separation of the validation samples in latent space, and that a good distance in data space implies a good distance in latent space in practice, but not vice versa.
Summarizing, the arguments supporting or opposing either error component can differ, but commonly named restrictions are the domain-specificity of the reconstruction error and the lack of its general interpretability, or the black-box nature of the discriminator. Since both perspectives commonly argue with the instability of the opposite component for out-of-distribution samples, a systematic evaluation would be of great use. For now, it is likely and reasonable that both methods struggle with such data and the performance often depends on the problem domain. Restrictions on the latent space named in the previous section can help to reduce this impact, and utilizing information of the latent space can be relevant for anomaly detection as well. Currently, the optimal weighting of the respective components should be evaluated for each experiment. Lüer et al. [36] find that the optimal weighting parameter between the error components further changes over time during training. They improve detection performance by considering non-linear relationships between the error components, using a non-linear SVM instead of a linear weighting of the anomaly score.
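Such a non-linear combination of the error components can be sketched in a few lines. The snippet below fits an RBF-kernel SVM on per-sample (L_res, L_disc) pairs; the random scores and labels are placeholders for validation data with known labels, and this is an illustration in the spirit of [36], not their implementation:

    import numpy as np
    from sklearn.svm import SVC

    rng = np.random.default_rng(0)
    scores = rng.random((200, 2))                    # columns: L_res, L_disc
    labels = (scores.sum(axis=1) > 1.2).astype(int)  # placeholder labels

    clf = SVC(kernel="rbf").fit(scores, labels)      # non-linear weighting
    print(clf.predict([[0.1, 0.2], [0.9, 0.8]]))     # 0 = normal, 1 = abnormal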

4.5 Inference speed and accuracy

One of the most significant practical limitations of the AnoGAN approach is that inference requires high amounts of computational resources and time. The required resources depend on a variety of variables: Deecke et al. [28] sample from multiple initial random z_0 to reduce the influence of erroneous local minima caused by initial sampling in unsuitable regions of the latent space. In general, this should increase the detection performance but also significantly increases the computational costs. The interpolation further depends on a variety of parameters, including the optimizer and its learning rate schedule, the similarity measure and resulting target similarity ε, as well as the maximum allowed iterations n_max for the latent space interpolation. The computational costs are not necessarily always a restriction, e.g. inference time is often less relevant if medical images are investigated. But many real-world use cases involve constant monitoring and thus evaluations of data which is changing at a fast pace. An example is time series analysis in intrusion detection or medical monitoring, where the non-trivial and non-convex optimization is too expensive. In addition to the inference speed, the optimizer, ε and the allowed interpolation iterations are strongly dependent on each other and strongly influence the anomaly detection performance: Each use case requires the selection of such parameters, and a low ε or a high amount of interpolation iterations might lead to the generation of anomalous samples, while a high ε or a low amount of allowed interpolation iterations leads to dissimilar samples which do not correspond to the true anomalousness.
To avoid the costly and error prone parameter selection and optimization, one can learn an additional, inverse mapping to G, E(x) = G⁻¹(x) = z. Following prior work, especially adversarially learned inference [66] and adversarial feature learning [67], Zenati et al. [68] utilize a bidirectional GAN to learn such a mapping for the task of anomaly detection. The discriminator does not only use G(z) or x as input but (E(x), x) or (z, G(z)). The mapping of the encoder can either be learned jointly during training [68], [69] but also after training [35]. Berg et al. [19] report that the joint training led to a better separation of normal and anomalous samples in latent space.
This work is closely related to CycleGAN [70], which utilizes a cycle consistency loss to obtain such an inverse mapping which is furthermore cycle consistent: While a mapping from a bidirectional GAN learns some mapping E : x → z, the inverse mapping of CycleGAN enforces that G and E are inverses or inverse approximations of each other. This means that both mappings are bijections, which is also desirable for the task of consistent anomaly detection, i.e. E(G(z)) ≈ z and G(E(x)) ≈ x, corresponding to the forward and backward cycle-consistency respectively. This has since been adopted for anomaly detection and leads to a significant speedup during inference [63], [71], [72].
Similarly to the BiGAN approach, a significant body of work has evolved around the use of adversarially trained autoencoders. While using a (variational) autoencoder deviates from the likelihood-free principle of traditional GANs, training and mode coverage are significantly improved. The extension of using the discriminator additionally to the autoencoder has shown to improve training as well as detection [29], [36], [73]. Here, the reconstruction error (Eq. 4) as well as the discriminator error, frequently using the feature matching error from Eq. 6, remain core components. Additionally, the latent error has been utilized to leverage properties of the latent space [36] or, more generally, to ensure learning to correctly encode normal class data. One notable example is GANomaly [69], which utilizes a second encoder to retrieve a latent representation of the generated image and learns to minimize the difference of the parametrizations of both encoders. Using such mappings to latent space, the model is not as dependent on random initial noise anymore, but subsequently depends even more on a sufficient inverse mapping, which can be difficult to learn by itself. Inverse mappings have since been widely adopted, not showing any systematic deficits in the performance and reporting a significant speedup during inference, see Table 1 (evaluated on the beatwise preprocessed MITBIH dataset [74] using a β-VAEGAN [36]). Latent optimization is based on the similarity measured by an RBF kernel, optimized using Adam [75] with an adaptive learning rate. The (runtime as well as anomaly detection) performance largely depends on the previously mentioned parameters, further including the batch size used during optimization, since similarity is frequently calculated batch-wise. However, this averaged similarity can distort the results and should be avoided by evaluating the similarity per sample. Comparability across datasets and across variations of these parameters is very limited. Due to only utilizing the forward pass of the encoder-based architecture, higher batch sizes can lead to significantly faster inference while retaining equally good detection performance, improving inference time from 1.9 ms on GPU (2.4 ms on CPU) using a batch size of 1 to 0.08 ms on GPU using a batch size of 512.

Table 1: Inference times (consumer GPU) for a β-VAEGAN model using different anomaly detection approaches on 22427 test samples of the beatwise preprocessed MITBIH dataset.

              ε       n_max   batch size   Runtime (ms)
    AnoGAN    0.005   500     1            1139.7 ± 483.3
    AnoGAN    0.005   500     64           38.27 ± 2.7
    AnoGAN    0.005   1000    1            2254.4 ± 997.6
    AnoGAN    0.05    500     1            325.8 ± 560.5
    AnoGAN    0.05    100     1            77.9 ± 117.7
    Encoder   -       -       1            1.9 ± 0.3
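The speedup in Table 1 stems from replacing the iterative latent optimization with a single forward pass. A minimal sketch of such encoder-based scoring, with untrained stand-ins for a jointly trained encoder E and generator G, looks as follows:

    import torch
    import torch.nn as nn

    d_x, d_z = 64, 8
    E = nn.Sequential(nn.Linear(d_x, 32), nn.ReLU(), nn.Linear(32, d_z))
    G = nn.Sequential(nn.Linear(d_z, 32), nn.ReLU(), nn.Linear(32, d_x))

    @torch.no_grad()
    def fast_anomaly_score(x: torch.Tensor) -> torch.Tensor:
        z = E(x)                              # inverse mapping E(x) = G^{-1}(x)
        x_hat = G(z)                          # reconstruction from the manifold
        return (x - x_hat).abs().sum(dim=-1)  # residual loss as in Eq. 4

    scores = fast_anomaly_score(torch.randn(512, d_x))  # batched inference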
5 Discussion

We identify five major, intertwined obstacles GAN-based anomaly detection needs to tackle: speed and quality of the training process, restrictions to the latent space, contaminated training data, novel anomaly score components and compositions as well as inference accuracy and speed. The speed and quality of the training process is a fundamental requirement to make GAN-based anomaly detection suitable for a wide array of applications. Adversarial generative models generalize well, which is not always a desired property during anomaly detection and requires restrictions of the latent space to produce only normal class data.


In general, losses are usually weighted as

    L_total = Σ_{i=1}^{k} L_i · λ_i,   with   Σ_{i=1}^{k} λ_i = 1,    (9)

with k = 1 [28] or k = 2 [4] being common choices. But in general, novel anomaly score components and compositions, such as explicit information about the latent space and non-linear weightings, can be incorporated. The performance and speed of AnoGAN depend on various parameters which are relevant for the search of G(z_Γ) - the optimizer, target dissimilarity ε and the maximum amount of iterations, as well as the component weighting λ and the anomaly score threshold τ. While increasing the amount of components can improve the performance, it also increases the time for optimizing λ_i and τ, making a large search space computationally infeasible. The optimization process can be performed by a variety of mechanisms, including a naive grid search or by optimizing an SVM [36]. Using fixed weighting parameters and anomaly score thresholds can hinder performance and should not be utilized. The optimization is usually still cheap in terms of the computational resources required in comparison to the GAN training, but one should be aware of the biases the optimization might introduce. To avoid tuning interpolation parameters and to speed up inference and accuracy, inverse mappings have been utilized in a variety of settings. Lastly, OC training still requires a definition and selection of normal class data, which can suffer from contaminated data. Additionally, more extensive research for non-image data will be required: Time series are usually split up into subsequences during training as well as evaluation. MAD-GAN [76] reports more false positives for larger subsequence lengths. A possible explanation is that more training data is required to model larger time series due to the curse of dimensionality. Furthermore, the subsequence length likely influences the (in)stability of the training process.
Although a variety of challenges remain open, and while a significant amount of progress will still be the crucial requirement for some practical applications, the recent success and interest in GANs in the domain of anomaly detection sketches their potential. Existing work focuses on the medical domain [8], which especially benefits from the GAN-based procedure: By only requiring the definition of a normal class, it is possible to detect unknown anomalies/pathologies which can be investigated in-depth. However, guarantees regarding the detection performance are crucial for many medical use cases, and the problem of generating abnormal data in the one-class setting has not yet received sufficient attention.
More investigations into the comparison of the learned structure between GANs and similarly used representation learning networks (especially autoencoders) are of interest to improve understanding and ultimately performance of existing work. First attempts at comparing generative models with traditional models (e.g. [77]) commonly do not cover the peculiarities of the respective models sufficiently, especially the sensitivity to hyperparameters. Automated machine learning approaches might allow a more robust and fair comparison.
GANs are most commonly used if the underlying dataset is heavily imbalanced, i.e. if anomalies are scarce and obtaining extensive data of the normal class is feasible. Most existing work focuses on the comparison of a narrow set of metrics, most commonly F_β and AUC. The F_β score is especially susceptible to imbalanced data, and since the test data often also includes only few anomalies, the F_β score is less meaningful, with the resulting score usually being very high and misrepresenting the actual capabilities. Using more balanced measures, such as the phi coefficient, might improve the insights that can be gained.
This work has focused on GAN-based approaches that are explicitly used to classify data, usually as a feature extractor or via reconstructions, not implicitly, e.g. through generating data to augment other anomaly detection methods or related tasks such as segmentation [78] or data imputation [79]. Learning the normal class manifold and using GANs to derive or detect changes from it is not limited to anomaly detection in a binary classification setting. It can also be extended to multiclass anomaly detection, and sufficient learning of the class boundaries also allows to generate more specific data. In medicine, such data can be used for augmentation [80], [78], [81], [82], [83] or practical training of medical professionals [84]. This has since allowed to improve anomaly detection performance on tasks with only few available training samples. More extensive evaluations on the implications and possible fallacies are still required. Approaches to protect the privacy of the patients (such as differential privacy, e.g. [85]) can be of high importance in this case and need to be incorporated in advance. Augmentations can also be of relevance during anomaly detection since it might be useful to use generative models to oversample infrequent normal samples (e.g. [86]) which are often close to the convex hull and are commonly misclassified as false positives.
GAN-based anomaly detection relies on significant amounts of normal class data. Since no negative information is incorporated in most settings, the detection of anomalies has proven to be difficult if the similarity in data space is high. If only the total anomaly score is available, it is more difficult to interpret the results, even though visualizations have started to give significant insights into the structure. Furthermore, training GANs is a significantly more ambitious, complex and time-consuming approach than solely utilizing a discriminative model. The training procedure requires a significant amount of domain knowledge and has a higher-dimensional hyperparameter space in comparison to many more traditional methods, not necessarily allowing the fast training of a reasonably good baseline. However, the use of implicit generative models can help improve generalizability on complex data manifolds, partially reducing the assumptions that have to be posed to define and detect anomalies. Implicit generative modeling thrives if it is difficult to explicitly describe anomalies, which is especially common for high dimensional data. GANs provide a general framework which can be applied to many different data structures, and established components can be easily incorporated (e.g. the use of CNNs for image data). Another advantage of more recent methods is the speed of inference in comparison to many related OC anomaly detection methods. An additional benefit of using GANs is the reduced amount of data required in a semi-supervised learning process in comparison to similar fully supervised approaches, which is especially relevant in domains where labeling is expensive [87].
Independent of the actual task, many of the listed challenges are applicable to other generative models.


This includes (also variational or adversarial) autoencoders, which are used in a similar way to detect anomalies, e.g. [88] or [51]. In-depth comparisons are left for future investigation. The similarities of their structure, the compression of data into a latent representation of a lower dimension than the data space, and the use of the encoder, which is commonly used to calculate a residual/reconstruction error, are of high relevance. VAEs themselves suffer from the problem of blurriness in image space. This most likely results from the diffuse probability mass distribution over the data space due to the combination of the conditional independence assumption with the maximum likelihood training paradigm [66], [89]. Due to the comparably new state of inverse mappings, possible disadvantages in quality or learned structure have yet to be explored on a larger scale.
146–157

6 Concluding Remarks

Pairing generative models and their reconstructive capabilities with adversarial feedback has shown state-of-the-art performance on many complex and high-dimensional problems, especially if the data is heavily imbalanced towards one class. The generative procedure still suffers from unstable training, frequently leading to deviations from implicit generative modeling by combining autoencoders with an adversarial component. The addition of an inverse mapping from data space to latent space has further significantly improved inference times. While the achieved performances are remarkable, sensitive applications such as medical use cases (currently the predominant domain GAN-based anomaly detection is applied to [8]) still have to be treated with caution in practice. Apart from improving the efficiency and effectiveness of the models themselves, future work especially needs to account for properties of the latent space and the possibility of generating data which is either abnormal or does not belong to the problem domain at all. This also includes work on utilizing negative information to strengthen the decision boundary. The influence of GAN-generated augmentation data on arbitrary anomaly detection approaches requires further investigation, especially regarding its theoretical limitations. The weighting of the individual anomaly detection components should be tuned for each experiment, as it can differ significantly between models and datasets. Even though the applicability largely depends on the availability of normal class data, GANs are very versatile, and a trained GAN can be used to perform a large variety of auxiliary tasks, making them a very interesting modeling approach. Many of our findings also apply more generally to other classification tasks using different generative models, such as diffusion models.
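As an illustration of the component weighting mentioned above, the following sketch combines a residual term and a discriminator term into a single score in the spirit of the convex combination used in [4]; the min-max normalization and the default weight are assumptions made for illustration, not prescriptions from the cited work.

```python
import numpy as np

def combined_anomaly_score(residual: np.ndarray,
                           discrimination: np.ndarray,
                           lam: float = 0.1) -> np.ndarray:
    """Convex combination A(x) = (1 - lam) * R(x) + lam * D(x).

    `residual` holds per-sample reconstruction errors, `discrimination`
    per-sample discriminator(-feature) errors. `lam` should be tuned per
    model and dataset rather than reused across experiments."""
    # Min-max scaling keeps lam comparable across runs; this is one
    # simple, assumption-laden normalization choice among many.
    r = (residual - residual.min()) / (np.ptp(residual) + 1e-12)
    d = (discrimination - discrimination.min()) / (np.ptp(discrimination) + 1e-12)
    return (1.0 - lam) * r + lam * d

# Usage with dummy scores for eight samples:
rng = np.random.default_rng(0)
print(combined_anomaly_score(rng.random(8), rng.random(8)))
```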
Acknowledgements

This work is supported by the Bavarian Research Foundation under grant AZ-1419-20.
7 REFERENCES

[1] Chandola, V., Banerjee, A., Kumar, V.: Anomaly detection: A survey. ACM computing surveys (CSUR) 41(3) (2009) 1–58

[2] Denning, D.E.: An intrusion-detection model. IEEE Transactions on software engineering (2) (1987) 222–232

[3] Das, M., Parthasarathy, S.: Anomaly detection and spatio-temporal analysis of global climate system. In: Proceedings of the third international workshop on knowledge discovery from sensor data. (2009) 142–150

[4] Schlegl, T., Seeböck, P., Waldstein, S.M., Schmidt-Erfurth, U., Langs, G.: Unsupervised anomaly detection with generative adversarial networks to guide marker discovery. In: International conference on information processing in medical imaging, Springer (2017) 146–157

[5] Abdallah, A., Maarof, M.A., Zainal, A.: Fraud detection system: A survey. Journal of Network and Computer Applications 68 (2016) 90–113

[6] Guo, C., Pleiss, G., Sun, Y., Weinberger, K.Q.: On calibration of modern neural networks. In: International Conference on Machine Learning, PMLR (2017) 1321–1330

[7] Malhotra, P., Vig, L., Shroff, G., Agarwal, P.: Long short term memory networks for anomaly detection in time series. In: Proceedings. Volume 89., Presses universitaires de Louvain (2015)

[8] Sabuhi, M., Zhou, M., Bezemer, C.P., Musilek, P.: Applications of generative adversarial networks in anomaly detection: A systematic literature review. IEEE Access (2021)

[9] Di Mattia, F., Galeone, P., De Simoni, M., Ghelfi, E.: A survey on gans for anomaly detection. arXiv preprint arXiv:1906.11632 (2019)

[10] Hawkins, D.M.: Identification of outliers. Volume 11. Springer (1980)

[11] Akkus, Z., Galimzianova, A., Hoogi, A., Rubin, D.L., Erickson, B.J.: Deep learning for brain mri segmentation: state of the art and future directions. Journal of digital imaging 30(4) (2017) 449–459

[12] Liu, F., Jang, H., Kijowski, R., Bradshaw, T., McMillan, A.B.: Deep learning mr imaging–based attenuation correction for pet/mr imaging. Radiology 286(2) (2018) 676–684

[13] Ding, Y., Sohn, J.H., Kawczynski, M.G., Trivedi, H., Harnish, R., Jenkins, N.W., Lituiev, D., Copeland, T.P., Aboian, M.S., Mari Aparici, C., et al.: A deep learning model to predict a diagnosis of alzheimer disease by using 18f-fdg pet of the brain. Radiology 290(2) (2019) 456–464

[14] Hannun, A.Y., Rajpurkar, P., Haghpanahi, M., Tison, G.H., Bourn, C., Turakhia, M.P., Ng, A.Y.: Cardiologist-level arrhythmia detection and classification in ambulatory electrocardiograms using a deep neural network. Nature medicine 25(1) (2019) 65

[15] Akoglu, L., Tong, H., Koutra, D.: Graph based anomaly detection and description: a survey. Data mining and knowledge discovery 29(3) (2015) 626–688

[16] Kipf, T.N., Welling, M.: Semi-supervised classification with graph convolutional networks. arXiv preprint arXiv:1609.02907 (2016)

[17] Abati, D., Porrello, A., Calderara, S., Cucchiara, R.: Latent space autoregression for novelty detection. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. (2019) 481–490

[18] Yeung, D.Y., Chow, C.: Parzen-window network intrusion detectors. In: Object recognition supported by user interaction for service robots. Volume 4., IEEE (2002) 385–388

[19] Berg, A., Ahlberg, J., Felsberg, M.: Unsupervised learning of anomaly detection from contaminated image data using simultaneous encoder training. arXiv preprint arXiv:1905.11034 (2019)

[20] Pimentel, T., Monteiro, M., Viana, J., Veloso, A., Ziviani, N.: A generalized active learning approach for unsupervised anomaly detection. stat 1050 (2018) 23

[21] Goodfellow, I., Pouget-Abadie, J., Mirza, M., Xu, B., Warde-Farley, D., Ozair, S., Courville, A., Bengio, Y.: Generative adversarial nets. In: Advances in neural information processing systems. (2014) 2672–2680

[22] Radford, A., Metz, L., Chintala, S.: Unsupervised representation learning with deep convolutional generative adversarial networks. arXiv preprint arXiv:1511.06434 (2015)

[23] Karras, T., Laine, S., Aila, T.: A style-based generator architecture for generative adversarial networks. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. (2019) 4401–4410

[24] Li, D., Chen, D., Jin, B., Shi, L., Goh, J., Ng, S.K.: Mad-gan: Multivariate anomaly detection for time series data with generative adversarial networks. In: International Conference on Artificial Neural Networks, Springer (2019) 703–716

[25] Lüer, F., Mautz, D., Böhm, C.: Anomaly detection in time series using generative adversarial networks. In: 2019 International Conference on Data Mining Workshops (ICDMW), IEEE (2019) 1047–1048

[26] Bojchevski, A., Shchur, O., Zügner, D., Günnemann, S.: Netgan: Generating graphs via random walks. arXiv preprint arXiv:1803.00816 (2018)

[27] Wang, C., Zhang, Y.M., Liu, C.L.: Anomaly detection via minimum likelihood generative adversarial networks. In: 2018 24th International Conference on Pattern Recognition (ICPR), IEEE (2018) 1121–1126

[28] Deecke, L., Vandermeulen, R., Ruff, L., Mandt, S., Kloft, M.: Image anomaly detection with generative adversarial networks. In: Joint European Conference on Machine Learning and Knowledge Discovery in Databases, Springer (2018) 3–17

[29] Zhou, B., Liu, S., Hooi, B., Cheng, X., Ye, J.: Beatgan: Anomalous rhythm detection using adversarially generated time series. In: IJCAI. (2019) 4433–4439

[30] Pimentel, M.A., Clifton, D.A., Clifton, L., Tarassenko, L.: A review of novelty detection. Signal Processing 99 (2014) 215–249

[31] Chalapathy, R., Chawla, S.: Deep learning for anomaly detection: A survey. arXiv preprint arXiv:1901.03407 (2019)

[32] Brophy, E., Wang, Z., She, Q., Ward, T.: Generative adversarial networks in time series: A survey and taxonomy. arXiv preprint arXiv:2107.11098 (2021)

[33] Aghakhani, H., Machiry, A., Nilizadeh, S., Kruegel, C., Vigna, G.: Detecting deceptive reviews using generative adversarial networks. In: 2018 IEEE Security and Privacy Workshops (SPW), IEEE (2018) 89–95

[34] Ravanbakhsh, M., Nabi, M., Sangineto, E., Marcenaro, L., Regazzoni, C., Sebe, N.: Abnormal event detection in videos using generative adversarial nets. In: 2017 IEEE International Conference on Image Processing (ICIP), IEEE (2017) 1577–1581

[35] Schlegl, T., Seeböck, P., Waldstein, S.M., Langs, G., Schmidt-Erfurth, U.: f-anogan: Fast unsupervised anomaly detection with generative adversarial networks. Medical image analysis 54 (2019) 30–44

[36] Lüer, F., Dolgich, M., Weber, T., Böhm, C.: Adversarial anomaly detection using gaussian priors and nonlinear anomaly scores. In: 2023 International Conference on Data Mining Workshops (ICDMW), IEEE (2023)

[37] Ruff, L., Vandermeulen, R., Goernitz, N., Deecke, L., Siddiqui, S.A., Binder, A., Müller, E., Kloft, M.: Deep one-class classification. In: International conference on machine learning. (2018) 4393–4402

[38] Perera, P., Nallapati, R., Xiang, B.: Ocgan: One-class novelty detection using gans with constrained latent representations. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. (2019) 2898–2906

[39] Salimans, T., Goodfellow, I.J., Zaremba, W., Cheung, V., Radford, A., Chen, X.: Improved techniques for training gans. In Lee, D.D., Sugiyama, M., von Luxburg, U., Guyon, I., Garnett, R., eds.: Advances in Neural Information Processing Systems 29: Annual Conference on Neural Information Processing Systems 2016, December 5-10, 2016, Barcelona, Spain. (2016) 2226–2234

[40] Arjovsky, M., Chintala, S., Bottou, L.: Wasserstein gan. arXiv preprint arXiv:1701.07875 (2017)

[41] Gulrajani, I., Ahmed, F., Arjovsky, M., Dumoulin, V., Courville, A.C.: Improved training of wasserstein gans. In: Advances in neural information processing systems. (2017) 5767–5777



[42] Miyato, T., Kataoka, T., Koyama, M., Yoshida, Y.: Spectral normalization for generative adversarial networks. In: 6th International Conference on Learning Representations, ICLR 2018, Vancouver, BC, Canada, April 30 - May 3, 2018, Conference Track Proceedings, OpenReview.net (2018)

[43] Tong, A., Wolf, G., Krishnaswamy, S.: A lipschitz-constrained anomaly discriminator framework. arXiv preprint arXiv:1905.10710 (2019)

[44] Karras, T., Aila, T., Laine, S., Lehtinen, J.: Progressive growing of gans for improved quality, stability, and variation. arXiv preprint arXiv:1710.10196 (2017)

[45] Baur, C., Albarqouni, S., Navab, N.: Generating highly realistic images of skin lesions with gans. In: OR 2.0 Context-Aware Operating Theaters, Computer Assisted Robotic Endoscopy, Clinical Image-Based Procedures, and Skin Image Analysis. Springer (2018) 260–267

[46] Kimura, M., Yanagihara, T.: Anomaly detection using gans for visual inspection in noisy training data. In: Asian Conference on Computer Vision, Springer (2018) 373–385

[47] Yoon, S., Noh, Y.K., Park, F.C.: Autoencoding under normalization constraints. arXiv preprint arXiv:2105.05735 (2021)

[48] Brock, A., Donahue, J., Simonyan, K.: Large scale gan training for high fidelity natural image synthesis. arXiv preprint arXiv:1809.11096 (2018)

[49] Sabokrou, M., Khalooei, M., Fathy, M., Adeli, E.: Adversarially learned one-class classifier for novelty detection. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. (2018) 3379–3388

[50] Koizumi, Y., Saito, S., Uematsu, H., Kawachi, Y., Harada, N.: Unsupervised detection of anomalous sound based on deep learning and the neyman–pearson lemma. IEEE/ACM Transactions on Audio, Speech, and Language Processing 27(1) (2018) 212–224

[51] Kimura, D., Chaudhury, S., Narita, M., Munawar, A., Tachibana, R.: Adversarial discriminative attention for robust anomaly detection. In: The IEEE Winter Conference on Applications of Computer Vision. (2020) 2172–2181

[52] Liu, Y., Li, Z., Zhou, C., Jiang, Y., Sun, J., Wang, M., He, X.: Generative adversarial active learning for unsupervised outlier detection. IEEE Transactions on Knowledge and Data Engineering (2019)

[53] Dai, Z., Yang, Z., Yang, F., Cohen, W.W., Salakhutdinov, R.R.: Good semi-supervised learning that requires a bad gan. In: Advances in neural information processing systems. (2017) 6510–6520

[54] Neal, L., Olson, M., Fern, X., Wong, W.K., Li, F.: Open set learning with counterfactual images. In: Proceedings of the European Conference on Computer Vision (ECCV). (2018) 613–628

[55] Beggel, L., Pfeiffer, M., Bischl, B.: Robust anomaly detection in images using adversarial autoencoders. arXiv preprint arXiv:1901.06355 (2019)

[56] Salehi, M., Arya, A., Pajoum, B., Otoofi, M., Shaeiri, A., Rohban, M.H., Rabiee, H.R.: Arae: Adversarially robust training of autoencoders improves novelty detection. Neural Networks 144 (2021) 726–736

[57] Šmídl, V., Bím, J., Pevný, T.: Anomaly scores for generative models. arXiv preprint arXiv:1905.11890 (2019)

[58] An, J., Cho, S.: Variational autoencoder based anomaly detection using reconstruction probability. Special Lecture on IE 2(1) (2015) 1–18

[59] Pidhorskyi, S., Almohsen, R., Doretto, G.: Generative probabilistic novelty detection with adversarial autoencoders. In: Advances in neural information processing systems. (2018) 6822–6833

[60] Ilyas, A., Santurkar, S., Tsipras, D., Engstrom, L., Tran, B., Madry, A.: Adversarial examples are not bugs, they are features. In: Advances in Neural Information Processing Systems. (2019) 125–136

[61] Madry, A., Makelov, A., Schmidt, L., Tsipras, D., Vladu, A.: Towards deep learning models resistant to adversarial attacks. arXiv preprint arXiv:1706.06083 (2017)

[62] An, J., Cho, S.: Variational autoencoder based anomaly detection using reconstruction probability. Special Lecture on IE 2(1) (2015) 1–18

[63] Zenati, H., Romain, M., Foo, C.S., Lecouat, B., Chandrasekhar, V.: Adversarially learned anomaly detection. In: 2018 IEEE International Conference on Data Mining (ICDM), IEEE (2018) 727–736

[64] Choi, H., Jang, E., Alemi, A.A.: Waic, but why? generative ensembles for robust anomaly detection. arXiv preprint arXiv:1810.01392 (2018)

[65] Nalisnick, E., Matsukawa, A., Teh, Y.W., Gorur, D., Lakshminarayanan, B.: Do deep generative models know what they don't know? arXiv preprint arXiv:1810.09136 (2018)

[66] Dumoulin, V., Belghazi, I., Poole, B., Mastropietro, O., Lamb, A., Arjovsky, M., Courville, A.: Adversarially learned inference. arXiv preprint arXiv:1606.00704 (2016)

[67] Donahue, J., Krähenbühl, P., Darrell, T.: Adversarial feature learning. arXiv preprint arXiv:1605.09782 (2016)

[68] Zenati, H., Foo, C.S., Lecouat, B., Manek, G., Chandrasekhar, V.R.: Efficient gan-based anomaly detection. arXiv preprint arXiv:1802.06222 (2018)

[69] Akcay, S., Atapour-Abarghouei, A., Breckon, T.P.: Ganomaly: Semi-supervised anomaly detection via adversarial training. In: Asian conference on computer vision, Springer (2018) 622–637



[70] Zhu, J.Y., Park, T., Isola, P., Efros, A.A.: Unpaired image-to-image translation using cycle-consistent adversarial networks. In: Proceedings of the IEEE international conference on computer vision. (2017) 2223–2232

[71] Hirose, N., Sadeghian, A., Vázquez, M., Goebel, P., Savarese, S.: Gonet: A semi-supervised deep learning approach for traversability estimation. In: 2018 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), IEEE (2018) 3044–3051

[72] Armanious, K., Jiang, C., Abdulatif, S., Küstner, T., Gatidis, S., Yang, B.: Unsupervised medical image translation using cycle-medgan. In: 2019 27th European Signal Processing Conference (EUSIPCO), IEEE (2019) 1–5

[73] van Hespen, K.M., Zwanenburg, J.J., Dankbaar, J.W., Geerlings, M.I., Hendrikse, J., Kuijf, H.J.: An anomaly detection approach to identify chronic brain infarcts on mri. Scientific Reports 11(1) (2021) 1–10

[74] Kachuee, M., Fazeli, S., Sarrafzadeh, M.: Ecg heartbeat classification: A deep transferable representation. In: 2018 IEEE international conference on healthcare informatics (ICHI), IEEE (2018) 443–444

[75] Kingma, D.P., Ba, J.: Adam: A method for stochastic optimization. arXiv preprint arXiv:1412.6980 (2014)

[76] Li, D., Chen, D., Jin, B., Shi, L., Goh, J., Ng, S.K.: Mad-gan: Multivariate anomaly detection for time series data with generative adversarial networks. In: International Conference on Artificial Neural Networks, Springer (2019) 703–716

[77] Škvára, V., Pevný, T., Šmídl, V.: Are generative deep models for novelty detection truly better? arXiv preprint arXiv:1807.05027 (2018)

[78] Mahmood, F., Borders, D., Chen, R., McKay, G.N., Salimian, K.J., Baras, A., Durr, N.J.: Deep adversarial training for multi-organ nuclei segmentation in histopathology images. IEEE transactions on medical imaging (2019)

[79] Luo, Y., Cai, X., Zhang, Y., Xu, J., et al.: Multivariate time series imputation with generative adversarial networks. In: Advances in Neural Information Processing Systems. (2018) 1596–1607

[80] Frid-Adar, M., Diamant, I., Klang, E., Amitai, M., Goldberger, J., Greenspan, H.: Gan-based synthetic medical image augmentation for increased cnn performance in liver lesion classification. Neurocomputing 321 (2018) 321–331

[81] Bowles, C., Chen, L., Guerrero, R., Bentley, P., Gunn, R., Hammers, A., Dickie, D.A., Hernández, M.V., Wardlaw, J., Rueckert, D.: Gan augmentation: Augmenting training data using generative adversarial networks. arXiv preprint arXiv:1810.10863 (2018)

[82] Frid-Adar, M., Klang, E., Amitai, M., Goldberger, J., Greenspan, H.: Synthetic data augmentation using gan for improved liver lesion classification. In: 2018 IEEE 15th international symposium on biomedical imaging (ISBI 2018), IEEE (2018) 289–293

[83] Han, C., Murao, K., Noguchi, T., Kawata, Y., Uchiyama, F., Rundo, L., Nakayama, H., Satoh, S.: Learning more with less: Conditional pggan-based data augmentation for brain metastases detection using highly-rough annotation on mr images. In: Proceedings of the 28th ACM International Conference on Information and Knowledge Management. (2019) 119–127

[84] Han, C., Hayashi, H., Rundo, L., Araki, R., Shimoda, W., Muramatsu, S., Furukawa, Y., Mauri, G., Nakayama, H.: Gan-based synthetic brain mr image generation. In: 2018 IEEE 15th International Symposium on Biomedical Imaging (ISBI 2018), IEEE (2018) 734–738

[85] Abadi, M., Chu, A., Goodfellow, I., McMahan, H.B., Mironov, I., Talwar, K., Zhang, L.: Deep learning with differential privacy. In: Proceedings of the 2016 ACM SIGSAC Conference on Computer and Communications Security. (2016) 308–318

[86] Lim, S.K., Loo, Y., Tran, N.T., Cheung, N.M., Roig, G., Elovici, Y.: Doping: Generative data augmentation for unsupervised anomaly detection with gan. In: 2018 IEEE International Conference on Data Mining (ICDM), IEEE (2018) 1122–1127

[87] Madani, A., Moradi, M., Karargyris, A., Syeda-Mahmood, T.: Semi-supervised learning with generative adversarial networks for chest x-ray classification with ability of data domain adaptation. In: 2018 IEEE 15th International Symposium on Biomedical Imaging (ISBI 2018), IEEE (2018) 1038–1042

[88] Chen, X., Konukoglu, E.: Unsupervised detection of lesions in brain mri using constrained adversarial autoencoders. arXiv preprint arXiv:1806.04972 (2018)

[89] Theis, L., Oord, A.v.d., Bethge, M.: A note on the evaluation of generative models. arXiv preprint arXiv:1511.01844 (2015)
