Anomaly Detection Using Generative Adversarial Networks: Reviewing Methodological Progress and Challenges
ABSTRACT

The applications of Generative Adversarial Networks (GANs) are just as diverse as their architectures, problem settings and challenges. A key area of research on GANs is anomaly detection, where they are most often utilized when only the data of one class is readily available.
In this work, we organize, summarize and compare key concepts and challenges of anomaly detection based on GANs. Common problems which have to be investigated to advance the applicability of GANs are identified and discussed. These include stability and time requirements during training as well as inference, the restriction of the latent space to produce solely data from the normal class distribution, contaminated training data, and the composition of the resulting anomaly detection score. We discuss these problems using existing work as well as possible (partial) solutions, including related work from similar areas of research such as related generative models or novelty detection. Our findings are also relevant for a variety of closely related generative modeling approaches, such as autoencoders, and are of interest for areas of research tangent to anomaly detection such as image inpainting or image translation.

Keywords

Adversarial Generative Models, Anomaly Detection, Generative Adversarial Network, Novelty Detection, Outlier Detection

1 Introduction

Anomalies are commonly described as patterns in data not conforming to expected behavior [1]. Detecting anomalies is a frequent problem occurring on various types of data, where the "expected behavior" is usually represented by a set of normal instances. Anomaly detection can often be framed as a binary classification problem where the task is to distinguish solely between normal and abnormal instances. Detecting anomalies is crucial for an abundance of industries: It is important to detect intrusions into networks [2], abnormal data in (spatio-temporal) climate data [3], patterns which indicate human diseases [4], or to minimize the risk of fraud [5]. In many of these applications, prompt actions are required to diminish or avoid damage, or to enable novel applications. For many such applications, recent progress has shifted the focus to deep learning algorithms, with a major application being complex high-dimensional data, e.g. image data, where handcrafting features is prone to errors.

Neural networks can be used to detect anomalies in various ways. It is possible to assess the anomalous input data in a discriminative way using a thresholded output score or forecasted values. Discriminative models such as Recurrent Neural Networks (RNNs) and Convolutional Neural Networks (CNNs) have found frequent use but can suffer from a variety of problems. These issues include differences between the predicted probability estimate of a class label and the ground truth correctness likelihood, which can be improved by calibrating the output [6]. Other challenges cannot easily be solved by improvements of the processing or training itself. Requiring a balanced dataset to avoid a shift towards classifying only the predominant class is one of the most impactful problems. This led to a surge in generative modeling approaches such as (variational) autoencoders (VAEs) and Generative Adversarial Networks (GANs): Through one-class (OC) learning of only the normal class distribution, we can attempt to learn a generative procedure which should only produce normal class data, leveraging the reconstruction error between real and synthetic data.

For either method, an anomaly score is often calculated, e.g. based on the output score of discriminative networks or the difference between real and forecasted or synthetic values. This score can be compared to a predefined threshold which may not be exceeded by normal class input data (e.g. autoregressive neural networks [7]). Since the concept is based on learning a distribution from which specific data is generated, it can be applied to related models such as (V)AEs, whose applicability we further discuss in the following sections.

Existing reviews [8], [9] on GAN-based anomaly detection focus on applications and types of data as well as architectures and metrics. Our work especially targets the investigation of more general challenges of GAN-based anomaly detection, which are mostly independent of the application, type of data or network architecture. Existing work on solving these challenges is clustered into distinctive categories and connected to related work in similar domains such as AEs. We introduce foundations on anomaly detection as well as GANs in Section 2. The basic idea of AnoGAN [4], a central approach used to detect anomalies using GANs, is presented in Section 3. Subsequent advances and challenges are used to discuss and extend this framework in Section 4, focusing on practical as well as theoretical challenges. This especially includes i) the speed and quality of the training process, ii) restricting the latent space, iii) contaminated training data, iv) the composition of the anomaly score, and v) inference performance.
Figure 1: General training pipeline. The discriminator assesses the quality of generated and real data. This information is used to iteratively improve the discriminator as well as the generator.

Figure 2: General anomaly detection pipeline. A given datum x is judged based on the discriminator as well as the reconstruction loss between x and an arbitrarily similar sample $\hat{x}$. $\hat{x}$ can be retrieved by an explicit latent optimization minimizing the dissimilarity or implicitly by retrieving an inverse mapping using an encoding network.
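As a minimal illustration of the training pipeline in Fig. 1, the following PyTorch sketch performs one adversarial update step on a batch of normal class data. Generator G, discriminator D (assumed to return one logit per sample) and both optimizers are placeholders; the binary cross entropy objective follows the original GAN formulation [21].

```python
# One adversarial training step (Fig. 1): the discriminator learns to
# separate real from generated data, the generator learns to fool it.
# G, D, opt_g and opt_d are assumed to be predefined placeholders.
import torch
import torch.nn.functional as F

def train_step(G, D, opt_g, opt_d, x_real, d_z):
    n = x_real.size(0)
    z = torch.randn(n, d_z)
    x_fake = G(z)

    # Discriminator update: real data -> target 1, generated -> target 0.
    opt_d.zero_grad()
    loss_d = (F.binary_cross_entropy_with_logits(D(x_real), torch.ones(n, 1))
              + F.binary_cross_entropy_with_logits(D(x_fake.detach()), torch.zeros(n, 1)))
    loss_d.backward()
    opt_d.step()

    # Generator update: make D assign target 1 to generated data.
    opt_g.zero_grad()
    loss_g = F.binary_cross_entropy_with_logits(D(x_fake), torch.ones(n, 1))
    loss_g.backward()
    opt_g.step()
    return loss_d.item(), loss_g.item()
```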
visualize these transitions using attributes of graphs.

2.3 Anomaly Detection using Generative Adversarial Networks

The most widespread approach to GAN-based anomaly detection is based on learning the manifold of the normal class data, meaning that the training data usually consists of only normal class data, i.e. $p_{data} \approx p_n$, such that the fraction of anomalous data in Eq. 2 is zero for the training data. The abnormal data is only used for validation and testing in the OC setting. In practical applications, it should be considered to include abnormal data during training as "synthetic" data to improve the resulting learned generative procedure (e.g. using minimum likelihood regularization [27]) and to improve the separating hyperplane of the discriminator. Data points can be rated as normal or abnormal based on reconstructing the data x using G(z) (e.g. [28], [29]) and calculating a residual loss $L_{res}$ using a well-defined and domain-dependent distance metric. Additionally, the discriminator can be used to assess if some datum belongs to the distribution represented by the generator by asserting a likelihood (e.g. [27]). A major approach in this area of research, AnoGAN, utilizes a combination of both components. The general training pipeline of GANs as well as the basic anomaly detection components are visualized in Fig. 1 and Fig. 2. Before discussing AnoGAN, the subsequent most important developments in this domain, its applicability to a variety of data structures, and practical as well as theoretical advances, we briefly discuss existing applications and reviews of GAN-based anomaly detection.

Previous surveys on anomaly detection frequently focus on a broad array of methods (e.g. [30], [1]), the general use of deep learning in anomaly detection, or very general introductions to models and applications, e.g. Chalapathy et al. [31] for general deep learning based anomaly detection or Brophy et al. [32] for a more general use of GANs on time series data, which includes the use of anomaly detection. More recently, GAN-based anomaly detection has received a significant amount of attention, and concurrent work exists which specifically reviews the usage of GANs for anomaly detection. Di Mattia et al. [9] perform a practical comparison of three major approaches in GAN-based anomaly detection. Sabuhi et al. [8] perform an exhaustive review of existing literature, investigating the domains of application, model architectures, datasets and evaluation metrics used by GAN-based anomaly detection. Their review shows a broad range of applications for GAN-based anomaly detection, ranging from medical imaging [4] over the detection of deceptive reviews [33] to time series data [24] and videos [34]. While this work is an excellent resource for an overview of the currently used components, existing work rarely discusses the actual challenges of GAN-based anomaly detection apart from short remarks on training stability. Our work focuses on investigating these practical as well as theoretical obstacles which, to the best of our knowledge, have not been discussed in a systematic manner before.

3 The Fundamental Approach of AnoGAN

While GANs can be used to detect anomalies in a variety of ways, the procedure of AnoGAN by Schlegl et al. [4] can be considered central to most of these approaches, and subsequent developments presented in this work are derived as adaptations of this approach. AnoGAN utilizes both the difference in features, using the discriminator and its loss $L_{disc}$, and the difference in data space, using the generator to calculate the absolute error as the residual loss $L_{res}$. This is achieved by fully training a GAN on normal class data to learn the generator mapping $G: z \mapsto x$.
Figure 3: Simplified mapping between latent and data space, where $\hat{x} = G(z_{\hat{x}}) \approx G(z_x) = x$. Each point in the lower dimensional latent space maps to a point in data space, resulting in a subspace of the data space which spans the data generatable by the GAN (blue area in $M_d$). By training on only normal class data (yellow area in $M_d$), it is attempted to restrict the generated data. In practice, the generatable data also includes abnormal data (red area in $M_d$) and data which does not belong to the problem domain (e.g. if the task is to investigate CT images, a valid image may be generated which is not a realistic CT). Similarly, given a predefined threshold value for the discriminator error, one can separate a subset aiming to encompass the normal class data as well as possible. By using a combination of both error components, we aim to minimize the amount of falsely classified samples.
During inference, we try to determine if some novel datum x should be labeled as anomalous. To achieve this, it is required to find the noise vector z which is mapped as close as possible to x by the generator, by interpolating through latent space: Initial noise $z_1$ is sampled randomly to generate $G(z_1)$ in data space. The dissimilarity between x and $G(z_1)$ is calculated. AnoGAN utilizes the absolute error, but other common, well-defined distances have frequently been applied as proxies for the similarity of data points, including the Euclidean distance. For time series data, RBF kernels have been a common choice, and more time-series-specific measures such as dynamic time warping can easily be utilized. This distance is used to define a loss function providing gradients that allow moving to some $z_2$ where $G(z_2)$ is more similar to x than $G(z_1)$. This is repeated Γ times to find the most similar image $G(z_\Gamma)$ which can be constructed using the normal class manifold learned by the generator. Γ can either be a fixed value or be dynamically determined by a target similarity ε. A mixture of both strategies is commonly used, where we try to find an ε-similar datum and interrupt the optimization after a maximum of $n_{max}$ steps in case the input is too dissimilar for the target similarity to be reached, guaranteeing a maximum runtime. As soon as $\hat{x} = G(z_\Gamma)$ is determined, it is compared to x using the similarity in data space by calculating the residual loss $L_{res}$.
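The iterative search can be summarized in a short PyTorch sketch. It assumes a pretrained, differentiable generator G; the optimizer, learning rate and stopping values are illustrative choices rather than the settings of [4], which performs plain gradient descent on z.

```python
# Minimal sketch of the AnoGAN latent search: optimize z so that G(z)
# approximates x under the pointwise absolute error. G is assumed to be
# a pretrained torch.nn.Module; n_max, eps and lr are placeholders.
import torch

def latent_search(G, x, d_z, n_max=500, eps=1e-3, lr=1e-2):
    z = torch.randn(1, d_z, requires_grad=True)  # initial noise z_1
    opt = torch.optim.Adam([z], lr=lr)
    for _ in range(n_max):                       # at most n_max steps
        opt.zero_grad()
        loss = (x - G(z)).abs().mean()           # dissimilarity in data space
        if loss.item() <= eps:                   # epsilon-similarity reached
            break
        loss.backward()
        opt.step()
    return z.detach()                            # approximates z_Gamma
```

f-AnoGAN [35] later replaces this per-sample optimization with a learned encoder, removing the iterative search at inference time.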
In the case of images, and very similarly in the case of other regular data such as time series, $L_{res}$ can be calculated by pointwise comparison of x and $G(z_\Gamma)$:

$$L_{res}(x, z_\Gamma) = |x - G(z_\Gamma)|, \qquad (4)$$

or another distance measure, depending on the target domain. It does not necessarily need to be the distance minimized during the latent optimization procedure, even though they do not differ in most applications. Afterwards, the discriminator is used to calculate the discriminative loss which enforces $G(z_\Gamma)$ to lie on the learned manifold. Just as we try to force the generator to only produce healthy data given any valid z, the discriminator should only assign high confidence values to healthy data. The resulting discriminator loss is obtained by feeding $G(z_\Gamma)$ to the discriminator:

$$L_{disc}(z_\Gamma, x) = \sigma(D(G(z_\Gamma)), \alpha) \qquad (5)$$

with σ being the sigmoid cross entropy which is used to describe the discriminator loss during training, with logits $D(G(z_\Gamma))$ and targets α = 1. The exact calculation of $L_{disc}$ and $L_{res}$ can differ: Schlegl et al. [4] further propose another $L_{disc}$ based on feature matching,

$$L_{disc}(z_\Gamma, x) = |f(x) - f(G(z_\Gamma))|, \qquad (6)$$

which has since been frequently applied [35], [29], [36]. The generator and discriminator are jointly used to calculate a combined loss which is a weighted sum of both components:

$$L_{total}(z_\Gamma, x) = (1 - \lambda) L_{res}(z_\Gamma) + \lambda L_D(z_\Gamma). \qquad (7)$$

Here, $L_{total}(z_\Gamma, x)$ can be used directly to calculate an anomaly score. The anomaly score can be thresholded by some predefined or optimized τ to determine a label corresponding to x using $H: L_{total}(z_\Gamma, x) \mapsto \{0, 1\}$, with H = 0 corresponding to normal samples and H = 1 corresponding to abnormal samples, respectively:

$$H(L_{total}(z_\Gamma, x), \tau) = \begin{cases} 0 & \text{if } L_{total}(z_\Gamma, x) \le \tau \\ 1 & \text{if } L_{total}(z_\Gamma, x) > \tau \end{cases} \qquad (8)$$

Parts of the general anomaly detection procedure are visualized in Fig. 3, the AnoGAN interpolation in Fig. 4. In practice, the sets are not convex and not easily interpolatable due to a complex loss surface when minimizing the dissimilarity.
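Put together, the scoring stage can be sketched as follows, using the feature matching variant of Eq. 6 for $L_{disc}$; f stands for an intermediate feature layer of the discriminator, and the models as well as the weighting are placeholder assumptions.

```python
# Sketch of Eqs. 4, 6, 7 and 8: combine the residual and discriminator
# losses into an anomaly score and threshold it. G, f and lam are
# placeholders, not a reference implementation.
import torch

def anomaly_score(G, f, x, z_gamma, lam=0.1):
    x_hat = G(z_gamma)                       # reconstruction from z_Gamma
    l_res = (x - x_hat).abs().sum()          # Eq. 4: residual loss
    l_disc = (f(x) - f(x_hat)).abs().sum()   # Eq. 6: feature matching loss
    return (1 - lam) * l_res + lam * l_disc  # Eq. 7: weighted combination

def label(l_total, tau):
    return int(l_total > tau)                # Eq. 8: 0 = normal, 1 = abnormal
```

A common choice weights the residual term more strongly, e.g. λ = 0.1, but λ remains a hyperparameter to be tuned per task.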
Figure 4: AnoGAN visualized. To approximate an abnormal sample x, an initial latent vector $z_0$ is sampled and used to generate $\hat{x}_0$. By minimizing the dissimilarity in data space, the latent vector $z_\Gamma$ is retrieved, resulting in the best reconstruction $\hat{x}$ of x, i.e. minimizing the reconstruction error. The residual loss corresponds to the remaining difference in data space.

4 Advances in GAN-based Anomaly Detection
We focus on five important challenges that should be considered when performing anomaly detection with GANs: speed and quality of the training process, restricting the latent space, contaminated training data, the compositional choice of the anomaly score, and inference performance. Some of these challenges introduce large and abstract areas of research and are related to each other. They can be split up further if desired, but shall act as a first reference point when considering anomaly detection with GANs or related generative models.

4.1 Speed and quality of the training process

Training GANs and selecting suitable hyperparameters generally remain a non-trivial task: Choosing a sufficient latent distribution and dimensionality $d_z$ ("the degree of compression" [37]) is not only crucial for inference time. It also has significant implications for the reconstruction of anomalous samples (e.g. [38]) and thus for the reconstruction error and the anomaly detection performance in general. A small $d_z$ can lead to non-convergence and insufficient preservation of information by the network because not all relevant features can be learned. If $d_z$ is too large, training (and subsequently inference) is slowed down, which is especially relevant if interpolation through latent space up until ε-similarity is chosen. Furthermore, too many irrelevant features might be learned, which hinders anomaly detection performance.

Due to the inherent instabilities of the originally proposed GAN architecture, which commonly cause mode collapse and further problems that hinder stable training, many researchers have resorted to practical workarounds. These workarounds include minibatch discrimination [39] or ensemble learning [27]. Since such methods often imply a significant overhead, novel objective functions such as the Wasserstein objective function (WGAN [40]) have found practical use. WGAN requires 1-Lipschitz continuity to stabilize training, which is enforced by weight clipping; this has subsequently been replaced in WGAN-GP [41] by penalizing the norm of the gradient of the discriminator so that it stays (approximately) 1. This has also been successfully applied during training for anomaly detection, see e.g. [28], [35].
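The gradient penalty of WGAN-GP [41] can be sketched as follows; the critic D and the sample batches are placeholders, and the penalty weight of 10 is only the commonly used default, not a requirement of the method.

```python
# Sketch of the WGAN-GP gradient penalty: penalize deviations of the
# critic's gradient norm from 1 on interpolates between real and
# generated samples. D, x_real and x_fake are placeholders.
import torch

def gradient_penalty(D, x_real, x_fake, gp_weight=10.0):
    # Random interpolation coefficients, broadcast over all data dims.
    eps = torch.rand(x_real.size(0), *([1] * (x_real.dim() - 1)))
    x_hat = (eps * x_real + (1 - eps) * x_fake).requires_grad_(True)
    # Gradient of the critic output w.r.t. the interpolates.
    grad = torch.autograd.grad(D(x_hat).sum(), x_hat, create_graph=True)[0]
    norm = grad.flatten(start_dim=1).norm(2, dim=1)
    return gp_weight * ((norm - 1) ** 2).mean()
```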
Depending on the underlying structure of the data and networks used, different approaches to stabilize training can further be applied. These include spectral normalization [42], where a Lipschitz constraint regularizing the training is applied to the discriminator by normalizing all weights with their largest singular value, and a Lipschitz constraint to guarantee similar scores for data which is close in data space [43]. Progressive growing of GANs [44] attempts to stabilize and facilitate training by incrementally increasing the resolution of the data to iteratively increase the difficulty of the learned problem, which has since been used in the medical domain [45] as well as anomaly detection [46].
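As one concrete example, spectral normalization is available in PyTorch as a layer wrapper; the small discriminator below is a placeholder architecture used only to show the mechanism.

```python
# Spectral normalization [42] in PyTorch: the wrapper renormalizes the
# layer weight by its largest singular value on every forward pass,
# enforcing a Lipschitz constraint. The architecture is a placeholder.
import torch.nn as nn
from torch.nn.utils import spectral_norm

D = nn.Sequential(
    spectral_norm(nn.Linear(784, 256)),
    nn.LeakyReLU(0.2),
    spectral_norm(nn.Linear(256, 1)),  # single logit output
)
```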
4.2 Restricting the latent space

While the previously presented approaches focus on the stability of the training process itself, the restriction to producing only normal class samples is less frequently discussed. In most settings, GANs are used to generalize, and there is not always a need to completely restrict the latent space. While producing out-of-distribution data is often not only a byproduct but the explicit goal of image synthesis, it is difficult to restrict the generalizability of GANs to a subset of the generatable data. We initially assumed that the normal class data is a subset of $\mathbb{R}^{d_x}$ and $G: z \mapsto x$ with $d_z \ll d_x$. Optimally, the latent space is a lower dimensional representation or approximation of the space of normal class data. However, it cannot be guaranteed that the generator is unable to produce abnormal class samples in the most common setting: Given $z \sim \mathcal{N}(0, I)^{d_z}$, the sampled values during training will lie in a finite interval, likely with values only deviating up to a small number of standard deviations from zero in each dimension. Even though sampling large latent values is highly unlikely, with $\lim_{z_i \to \infty} P(z_i) = 0 \ \forall\, i \in \{1, .., d_z\}$, it is in principle possible to draw them, since each value has a non-zero probability density under the (standard) Gaussian distribution [36]. Since AnoGAN only utilizes similarity-based gradient information and disregards how likely the reconstruction is, it is possible to reconstruct highly unlikely data, which has been observed by a variety of existing work [36], [47].
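One possible mitigation, in the spirit of the Gaussian prior arguments of [36], is to penalize anomaly scores whose reconstructions require unlikely latent vectors. The sketch below is purely illustrative: the additive composition and the weight beta are assumptions, not the formulation of [36].

```python
# Illustrative only: add the negative Gaussian log-prior of z_gamma to
# an existing anomaly score, penalizing reconstructions that rely on
# unlikely latent vectors. beta is a hypothetical weighting term.
import torch

def prior_penalized_score(l_total, z_gamma, beta=0.01):
    # -log N(z; 0, I) up to an additive constant: 0.5 * ||z||^2
    neg_log_prior = 0.5 * z_gamma.pow(2).sum()
    return l_total + beta * neg_log_prior
```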
Tong et al. [43] discuss the sensitivity of the
[16] Kipf, T.N., Welling, M.: Semi-supervised classification with graph convolutional networks. arXiv preprint arXiv:1609.02907 (2016)

[17] Abati, D., Porrello, A., Calderara, S., Cucchiara, R.: Latent space autoregression for novelty detection. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. (2019) 481–490

[18] Yeung, D.Y., Chow, C.: Parzen-window network intrusion detectors. In: Object recognition supported by user interaction for service robots. Volume 4., IEEE (2002) 385–388

[19] Berg, A., Ahlberg, J., Felsberg, M.: Unsupervised learning of anomaly detection from contaminated image data using simultaneous encoder training. arXiv preprint arXiv:1905.11034 (2019)

[20] Pimentel, T., Monteiro, M., Viana, J., Veloso, A., Ziviani, N.: A generalized active learning approach for unsupervised anomaly detection. stat 1050 (2018) 23

[21] Goodfellow, I., Pouget-Abadie, J., Mirza, M., Xu, B., Warde-Farley, D., Ozair, S., Courville, A., Bengio, Y.: Generative adversarial nets. In: Advances in neural information processing systems. (2014) 2672–2680

[22] Radford, A., Metz, L., Chintala, S.: Unsupervised representation learning with deep convolutional generative adversarial networks. arXiv preprint arXiv:1511.06434 (2015)

[23] Karras, T., Laine, S., Aila, T.: A style-based generator architecture for generative adversarial networks. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. (2019) 4401–4410

[24] Li, D., Chen, D., Jin, B., Shi, L., Goh, J., Ng, S.K.: MAD-GAN: Multivariate anomaly detection for time series data with generative adversarial networks. In: International Conference on Artificial Neural Networks, Springer (2019) 703–716

[25] Lüer, F., Mautz, D., Böhm, C.: Anomaly detection in time series using generative adversarial networks. In: 2019 International Conference on Data Mining Workshops (ICDMW), IEEE (2019) 1047–1048

[26] Bojchevski, A., Shchur, O., Zügner, D., Günnemann, S.: NetGAN: Generating graphs via random walks. arXiv preprint arXiv:1803.00816 (2018)

[27] Wang, C., Zhang, Y.M., Liu, C.L.: Anomaly detection via minimum likelihood generative adversarial networks. In: 2018 24th International Conference on Pattern Recognition (ICPR), IEEE (2018) 1121–1126

[28] Deecke, L., Vandermeulen, R., Ruff, L., Mandt, S., Kloft, M.: Image anomaly detection with generative adversarial networks. In: Joint European Conference on Machine Learning and Knowledge Discovery in Databases, Springer (2018) 3–17

[30] Pimentel, M.A., Clifton, D.A., Clifton, L., Tarassenko, L.: A review of novelty detection. Signal Processing 99 (2014) 215–249

[31] Chalapathy, R., Chawla, S.: Deep learning for anomaly detection: A survey. arXiv preprint arXiv:1901.03407 (2019)

[32] Brophy, E., Wang, Z., She, Q., Ward, T.: Generative adversarial networks in time series: A survey and taxonomy. arXiv preprint arXiv:2107.11098 (2021)

[33] Aghakhani, H., Machiry, A., Nilizadeh, S., Kruegel, C., Vigna, G.: Detecting deceptive reviews using generative adversarial networks. In: 2018 IEEE Security and Privacy Workshops (SPW), IEEE (2018) 89–95

[34] Ravanbakhsh, M., Nabi, M., Sangineto, E., Marcenaro, L., Regazzoni, C., Sebe, N.: Abnormal event detection in videos using generative adversarial nets. In: 2017 IEEE International Conference on Image Processing (ICIP), IEEE (2017) 1577–1581

[35] Schlegl, T., Seeböck, P., Waldstein, S.M., Langs, G., Schmidt-Erfurth, U.: f-AnoGAN: Fast unsupervised anomaly detection with generative adversarial networks. Medical Image Analysis 54 (2019) 30–44

[36] Lüer, F., Dolgich, M., Weber, T., Böhm, C.: Adversarial anomaly detection using gaussian priors and nonlinear anomaly scores. In: 2023 International Conference on Data Mining Workshops (ICDMW), IEEE (2023)

[37] Ruff, L., Vandermeulen, R., Goernitz, N., Deecke, L., Siddiqui, S.A., Binder, A., Müller, E., Kloft, M.: Deep one-class classification. In: International conference on machine learning. (2018) 4393–4402

[38] Perera, P., Nallapati, R., Xiang, B.: OCGAN: One-class novelty detection using gans with constrained latent representations. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. (2019) 2898–2906

[39] Salimans, T., Goodfellow, I.J., Zaremba, W., Cheung, V., Radford, A., Chen, X.: Improved techniques for training gans. In Lee, D.D., Sugiyama, M., von Luxburg, U., Guyon, I., Garnett, R., eds.: Advances in Neural Information Processing Systems 29: Annual Conference on Neural Information Processing Systems 2016, December 5-10, 2016, Barcelona, Spain. (2016) 2226–2234

[40] Arjovsky, M., Chintala, S., Bottou, L.: Wasserstein GAN. arXiv preprint arXiv:1701.07875 (2017)

[41] Gulrajani, I., Ahmed, F., Arjovsky, M., Dumoulin, V., Courville, A.C.: Improved training of wasserstein gans. In: Advances in neural information processing systems. (2017) 5767–5777

[50] Koizumi, Y., Saito, S., Uematsu, H., Kawachi, Y., Harada, N.: Unsupervised detection of anomalous sound based on deep learning and the Neyman–Pearson lemma. IEEE/ACM Transactions on Audio, Speech, and Language Processing 27(1) (2018) 212–224

[51] Kimura, D., Chaudhury, S., Narita, M., Munawar, A., Tachibana, R.: Adversarial discriminative attention for robust anomaly detection. In: The IEEE Winter Conference on Applications of Computer Vision. (2020) 2172–2181

[52] Liu, Y., Li, Z., Zhou, C., Jiang, Y., Sun, J., Wang, M., He, X.: Generative adversarial active learning for unsupervised outlier detection. IEEE Transactions on Knowledge and Data Engineering (2019)

[53] Dai, Z., Yang, Z., Yang, F., Cohen, W.W., Salakhutdinov, R.R.: Good semi-supervised learning that requires a bad gan. In: Advances in neural information processing systems. (2017) 6510–6520

[54] Neal, L., Olson, M., Fern, X., Wong, W.K., Li, F.: Open set learning with counterfactual images. In: Proceedings of the European Conference on Computer Vision (ECCV). (2018) 613–628

[64] Choi, H., Jang, E., Alemi, A.A.: WAIC, but why? Generative ensembles for robust anomaly detection. arXiv preprint arXiv:1810.01392 (2018)

[65] Nalisnick, E., Matsukawa, A., Teh, Y.W., Gorur, D., Lakshminarayanan, B.: Do deep generative models know what they don't know? arXiv preprint arXiv:1810.09136 (2018)

[66] Dumoulin, V., Belghazi, I., Poole, B., Mastropietro, O., Lamb, A., Arjovsky, M., Courville, A.: Adversarially learned inference. arXiv preprint arXiv:1606.00704 (2016)

[67] Donahue, J., Krähenbühl, P., Darrell, T.: Adversarial feature learning. arXiv preprint arXiv:1605.09782 (2016)

[68] Zenati, H., Foo, C.S., Lecouat, B., Manek, G., Chandrasekhar, V.R.: Efficient gan-based anomaly detection. arXiv preprint arXiv:1802.06222 (2018)

[69] Akcay, S., Atapour-Abarghouei, A., Breckon, T.P.: GANomaly: Semi-supervised anomaly detection via adversarial training. In: Asian conference on computer vision, Springer (2018) 622–637