
Received 29 September 2022, accepted 15 October 2022, date of publication 19 October 2022, date of current version 31 October 2022.

Digital Object Identifier 10.1109/ACCESS.2022.3216007

A Review of Neural Networks for Anomaly Detection
JOSÉ EDSON DE ALBUQUERQUE FILHO, LAISLLA C. P. BRANDÃO, BRUNO JOSÉ TORRES FERNANDES (Senior Member, IEEE), AND ALEXANDRE M. A. MACIEL
Escola Politécnica de Pernambuco, Universidade de Pernambuco, Recife 50720-001, Brazil
Corresponding author: José Edson De Albuquerque Filho ([email protected])
This work was supported by the Coordenação de Aperfeiçoamento de Pessoal de Nível Superior (CAPES).

ABSTRACT Anomaly detection is a critical issue across several academic fields and real-world applications.
Artificial neural networks have been proposed to detect anomalies from different input types, but there
is no clear guide to deciding which model to use in a specific case. Therefore, this study examines the
most relevant Neural Network Outlier Detection algorithms in the literature, compares their benefits and
drawbacks in some application scenarios, and displays their outcomes on benchmark datasets. The initial
search revealed 1422 papers on projects completed between 2017 and 2021. These papers were further
narrowed down based on title, abstract, quality assessment, and inclusion and exclusion criteria, leaving 76 articles.
Finally, we reviewed these publications and verified that Autoencoder Neural Networks, Convolutional Neural
Networks, Recurrent Neural Networks, and Generative Adversarial Networks show promising outcomes for
outlier detection; we also identify the advantages of these neural networks for outlier detection and the significant
challenges of outlier detection strategies.

INDEX TERMS Anomaly detection, neural networks, outlier detection, systematic review.

I. INTRODUCTION
In data mining, anomaly detection (or outlier detection) means identifying samples that arouse suspicion because they differ significantly from most of the data, as shown in Figure 1.
It is not easy to solve the problem of detecting anomalies in a broad sense. The majority of known anomaly detection approaches address a specific problem formulation. Several aspects influence the formulation, including the nature of the data, the availability of known anomalies in the training data, the sort of abnormalities to be found, and so on. These characteristics are usually determined by the application domain in which the anomaly must be discovered. Researchers have adopted concepts from various disciplines, such as statistics, machine learning, computational intelligence, data mining, and information theory, among others, and applied them to formulate specific cases.

FIGURE 1. A whale is an anomaly in a herd of elephants.

The associate editor coordinating the review of this manuscript and approving it for publication was Fahmi Khalifa.

Several factors make this seemingly simple problem complex. Defining a boundary that embraces all normal behavior is very hard because the border between ordinary and anomalous behavior is blurred and unclear. For that reason, an anomalous instance that approaches the limit might be considered normal, and vice versa.
When anomalies result from malicious actions, their creators try to make the anomalous observations look normal, making the mission of defining normal behavior even more difficult. In many domains, normal behavior continues to evolve, and a notion of what normal behavior is in the present may not be representative in the future [1].
An outlier is defined differently according to the application-specific domain. For example, a tiny variation from the standard, such as a changing body temperature, can be deemed an anomaly in the medical sector. On the other hand, a similar deviation in the stock market can be considered typical. As a result, transferring a technique created for one domain to another is not straightforward.
Usually, the availability of datasets with the anomalies necessary for training and validating detection models is small, which makes it more complex to define abnormal behavior. Furthermore, data often include noise that is close to true anomalies, making it difficult to recognize and delete them.
Neural networks stand out compared to classical approaches in contexts with complex behavioral or time-varying data. For this reason, the main objective of this study is to conduct an embracing review of the literature on anomaly detection and its applications using neural networks. Thereby, readers will have a better understanding of the different detection techniques involving neural networks.
Our review did not identify any other survey covering neural networks for anomaly detection. While the latest surveys present general machine learning approaches for anomaly detection, this work differs from them because it presents comprehensive research on anomaly detection through machine learning techniques, focusing on neural networks and covering the period from 2017 to 2021. This paper's contributions aim to provide a comprehensive review of neural networks for anomaly detection and a guide for deciding which model to use in a specific case.
The structure of this review has five sections: Introduction, Review Protocol, Background, Results and Discussion, and finally Conclusion and Future Works, as shown in Figure 2. The remainder of this paper is structured as follows. Section II details this survey's protocol and reveals the research questions. Section III introduces some important concepts for a better understanding. Section IV shows some related works. Section V presents the survey results. Finally, Section VI shows the conclusion and future works.

FIGURE 2. This diagram illustrates the structure of this review.

II. REVIEW PROTOCOL
The focus of this survey is publications of primary research that address issues related to anomaly detection. In addition, the search is for works published in international scientific journals during the last five years that employ machine learning methods and tools, especially those that use neural networks. This survey follows the initial steps of the protocol proposed by Kitchenham [2], [3]. A systematic review of the literature provides a means for evaluating and interpreting available research that is pertinent to a specific subject area.
According to Kitchenham [2], a systematic review has three phases: planning, conducting, and reporting the study. This study reflects a very similar structure:
1) Identification of research questions (Section II-B)
2) Development of search parameters and exclusion and inclusion criteria (Section II-C)
3) Search results (Section V)
For the best didactic purpose, Figure 3 expands the adopted research methodology for this systematic review.

FIGURE 3. Protocol steps.

A. CONTRIBUTION
Most of the current literature on anomaly detection is focused on specific applications or a specific area of research [4]. This work contributes an overview of different research areas and applications. In addition, it highlights the richness and complexity associated with each application domain to guide the decision of which model to use in a specific case.

B. DEFINITION OF RESEARCH QUESTIONS
The aim is to systematically collect, evaluate, and interpret all published studies relevant to the predefined research questions in order to provide consolidated information, list the benefits of certain approaches, and find a research gap that can be filled through investigation [3].


Research questions:
1) What kinds of Neural Networks are used for outlier detection? (Section V-C)
2) What are Neural Networks' strengths for outlier detection? (Section V-D)
3) What are the current challenges of outlier detection techniques? (Section V-E)

C. SEARCH CRITERIA
This study aims to identify relevant articles on anomaly detection using neural networks. Therefore, we selected primary papers based on keywords, search period, and inclusion and exclusion criteria. The main research keywords are described in Table 1.

TABLE 1. Primary query search parameters.

The search query was stated by fields and purpose applications and limited to the last five years. This work was conducted in the leading digital libraries, as shown in Table 2.

TABLE 2. Selected review sources.

The research exposed a large volume of literature, including journal publications, conference annals, and many other published materials, as shown in Table 5. All the digital repositories included were manually searched using predefined keywords with some synonyms, according to the criteria in Table 3 and the quality assessment in Table 4.

TABLE 3. List of all inclusion and exclusion criteria.

D. QUALITY ASSESSMENT
To assess the compliance of the papers with the research questions, a questionnaire with eight questions was applied. There are three possible answers: Yes (Y), Partially (P), and No (N), with specific weights for each question.

TABLE 4. Quality assessment rules.
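To illustrate how such a weighted questionnaire can be turned into a single score, the sketch below aggregates the Y/P/N answers. The per-question weights, the half credit for "Partially", and the acceptance cutoff are illustrative assumptions; the paper only states that each question has a specific weight.

```python
# Hypothetical scoring of the quality-assessment questionnaire.
# Y = full credit, P = half credit, N = none; weights and cutoff
# are assumed, not taken from the paper.
ANSWER_CREDIT = {"Y": 1.0, "P": 0.5, "N": 0.0}

def quality_score(answers, weights):
    """Weighted sum of questionnaire answers, normalized to [0, 1]."""
    if len(answers) != len(weights):
        raise ValueError("one answer per question is required")
    total = sum(ANSWER_CREDIT[a] * w for a, w in zip(answers, weights))
    return total / sum(weights)

# Example: eight questions, uniform weights, acceptance cutoff of 0.5.
answers = ["Y", "Y", "P", "N", "Y", "P", "Y", "N"]
weights = [1.0] * 8
print(quality_score(answers, weights) >= 0.5)  # True -> paper accepted
```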

III. BACKGROUND
In this section, we define some important concepts for a better understanding of the article, highlighting the concepts of anomalies and neural networks.

A. OUTLIERS
From the beginning of the research, we encountered several definitions of what an anomaly (outlier) would be. Here are some of them:


• Because the stochastic model does not create it, an outlier is thought to be partially or wholly unimportant [5].
• An outlier, or external observation, seems to diverge significantly from the other members of the group in which it originates [6].
• An outlier is an observation that differs so significantly from the rest of the data that it raises concerns that it was generated by a distinct process [7].
• An observation (or selection of observations) looks out of place in the dataset [8].
• Outliers are points with lower local density in comparison with the density of their neighborhood [9].
• Outliers are points that do not belong to the dataset or are subgroups that are considerably smaller than other subgroups [10].
• Outlier instances are not well generated in the output layer and have a significant reconstruction error [11].
• If the density of the region where the data instance is located is considerably less or greater than that of the neighboring clusters, the instances are termed outliers [12].
• A point is considered an outlier if, in some projection of a smaller dimension, it is present in a local region of very low density [13].
• An outlier is a data point that is very different from the others, or that does not conform to expected normal behavior, or that conforms to defined abnormal behavior [14].
This demonstrates how complex it is to provide a precise definition of what an outlier is. In this work, we will use a broader definition: anomalies are those exceptional data that deviate from the general pattern.
It is also necessary to consider some inherent factors to understand the anomalies that may indicate error, failure, fraud, or intrusion. For instance, noise promotes a change in the value of a data point with regard to its noiseless value; it has no real meaning, but it hinders the analysis.
According to some studies, there are weak and strong outliers [15], [16]. Data noise detection offers a wide range of applications. The reduction of noise, for example, results in a significantly cleaner data collection that other data mining methods may use. Although noise is not particularly interesting in and of itself, its removal and detection remain a significant concern in the mining industry. As a result, both noise and anomaly detection are critical issues that must be addressed. Throughout the process, methods specific to anomaly detection or noise removal will be identified. The majority of outlier detection techniques, on the other hand, could be utilized for either situation because the distinction is purely conceptual [15].
Anomalies must be correlated with inconsistency, noise, and incompleteness. The incompleteness of a database can occur in several ways. For example, values of a given attribute may be missing, an attribute of interest may be missing, or an object of interest may be missing. However, the absence of an attribute or object is not always noticed unless an expert in the problem domain analyzes the database and realizes the lack of data [17].
When various and contradictory copies of the same data emerge in separate places, it is called inconsistent data. In the area of data mining, an inconsistent datum is one whose value is outside the domain of the attribute or has a large discrepancy with regard to other data. Common examples of inconsistency occur when considering different systems of measurement or notation, such as weights given in kilograms (kg) or pounds (lb) and distances given in meters or kilometers [17].
Noise has different meanings depending on the context. For example, in videos, noise is that drizzle in the image, and in radio, it is that interference in the audio signal. Nonetheless, the notion of noise in data mining is closer to the concept of noise in statistics (unexplained variations in a sample) and in signal processing (unwanted and usually inexplicable variations in a signal). Noisy data has some variation from its noiseless value, and therefore noise in the database could lead to inconsistencies. Depending on the noise level, it is not always possible to know whether or not it is present in the data [17].

B. ANOMALY DETECTION
Detecting anomalies (or outliers) is the task of identifying strange data in comparison with the rest. Anomalies are not necessarily wrong data: they are just spots that stand out among the general population. There is enormous practical applicability to this problem, from detecting failures to discovering financial fraud, from finding health problems to identifying dissatisfied customers. According to Chandola et al. [1], there are three main types of anomalies:
• Isolated anomalies: An isolated anomaly (or point anomaly) is an observation point in the dataset that is far away from the rest of the data.
• Contextual anomalies: A contextual anomaly is an observation that would be regular in one setting but abnormal in another.
• Collective anomalies: It is necessary to analyze a sequence in order to determine an anomalous behavior.
One can divide anomaly detection techniques according to their way of learning, as follows:
• Supervised anomaly detection: In this class, both normal and anomalous data are known. The goal is to build a prediction model for both the anomaly and normal classes [18].
• Semi-supervised anomaly detection: In this type of algorithm, the training only includes ordinary data. Anything not classified that way is tagged as an anomaly [19].
• Unsupervised anomaly detection: There is no requirement for training in this scenario. This type of algorithm assumes that normal instances are much more common than anomalies. However, if this assumption fails, the algorithm produces a high rate of false positives [20].
Many of the semi-supervised techniques can be adapted to operate in unsupervised mode using unlabeled data for training. Such adaptation presumes that there are few but robust anomalies in the test data.
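As a concrete illustration of the unsupervised setting, the sketch below scores each observation by its distance from the bulk of the sample. It works only under the assumption quoted above that normal instances dominate, and the 3-sigma cutoff is an illustrative choice rather than one prescribed by the surveyed papers.

```python
import numpy as np

def zscore_outliers(x, threshold=3.0):
    """Flag point anomalies in a 1-D sample.

    Valid only under the unsupervised assumption that normal
    instances dominate; otherwise mean/std are themselves corrupted.
    """
    z = np.abs(x - x.mean()) / x.std()
    return z > threshold

rng = np.random.default_rng(0)
data = np.concatenate([rng.normal(0.0, 1.0, 995), [8.0, -9.0, 7.5, 10.0, -8.5]])
print(np.flatnonzero(zscore_outliers(data)))  # indices of the injected anomalies
```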


C. NEURAL NETWORKS
A neural network (NN) is a biologically inspired algorithm that learns from data. Neural networks are function approximators, particularly useful in reinforcement learning when the state space or action space is too large to be known. Their structure has a series of layers, each one composed of one or more neurons. Each neuron produces an output, or activation, based on the outputs of the previous layer and a set of weights [21]. This paper highlights some of them.

1) RECURRENT NEURAL NETWORKS (RNN)
This is a sort of artificial neural network that recognizes patterns in data sequences such as text, genomes, handwriting, spoken words, or data from sensors, stock markets, and government agencies. RNN is a class of neural networks that includes weighted connections within a layer (compared to traditional feed-forward networks, where connections feed only into subsequent layers) [22].

2) AUTOENCODER NEURAL NETWORKS (AE)
Autoencoders are neural networks trained to replicate their input in their output. They compress the data into a latent representation space, from which they rebuild the output. In the variational variant, the encoding distribution is regularized during the training phase. The goal is to ensure that the latent space has good properties, which allows us to generate new data [23].

3) GENERATIVE ADVERSARIAL NETWORKS (GAN)
These are adversarial deep neural network designs, made up of two networks that are pitted against each other. The model is up against a formidable foe: a discriminative model that learns to tell whether a sample comes from the model or from the data distribution. In this type of algorithm, the training only includes ordinary data; anything not classified that way is tagged as an anomaly [24].

4) DEEP LEARNING
Deep learning, also known as deep structured learning, is part of a family of learning methods in artificial neural networks with representational learning. Learning can be unsupervised, supervised, or semi-supervised. The term deep alludes to the network's employment of numerous layers. For example, early research demonstrated that while a perceptron cannot be a universal classifier, a network with a non-polynomial activation function and an unbounded-width hidden layer can. Deep learning is a recent variant that involves an unlimited number of bounded-size layers, allowing for practical application and optimization while maintaining theoretical universality under mild conditions [25]. This paper employs the term deep learning to designate neural networks with many hidden layers.

D. METRICS
This section summarizes the metrics most used for machine learning model comparison. The selected papers widely employ Accuracy, Sensitivity, Specificity, Precision, F-measure, AUC-ROC, and AUC-PR.

1) ACCURACY
Accuracy is the number of correct classifications divided by all classifications, as in (1). It is the most straightforward and widely used metric to measure the performance of a classifier: how close a given set of outcomes is to the actual values. However, accuracy is not always a good metric, especially with imbalanced data. The fundamental problem is that when the negative class is dominant, we can achieve high accuracy merely by predicting negative most of the time [26].

2) SENSITIVITY
Sensitivity (recall, hit rate, or true positive rate) and Specificity (selectivity or true negative rate) mathematically describe the accuracy of a test that reports the presence or absence of a condition. Individuals for which the condition is satisfied are considered positive, and those for which it is not are deemed negative. Sensitivity, as in (2), refers to the probability of a positive test, conditioned on truly being positive. Specificity, as in (3), refers to the likelihood of a negative test, conditioned on truly being negative [27].

3) PRECISION@K
Precision@K is defined as the proportion of true anomalies in a ranked list of K objects, as in (4). The ranking list is obtained in descending order according to the anomaly scores computed by a specific anomaly detection algorithm. Precision reflects how close or dispersed the measurements are to each other [28].

Accuracy = (TP + TN) / (TP + TN + FP + FN)    (1)
Sensitivity (recall) = TP / (TP + FN)    (2)
Specificity = TN / (TN + FP)    (3)
Precision = TP / (TP + FP)    (4)

where TP = true positive; FP = false positive; TN = true negative; FN = false negative.

4) F-MEASURE
Precision is also used with recall. The two measures are sometimes combined to provide a single measurement, as in (5). The F-score, F-measure, F1, or F1-Score is the harmonic mean of precision and recall [27].

F1 = 2 × (Precision × Sensitivity) / (Precision + Sensitivity)    (5)
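Formulas (1)-(5) translate directly into code. A minimal helper computing them from raw confusion-matrix counts:

```python
def classification_metrics(tp, fp, tn, fn):
    """Metrics (1)-(5) computed from raw confusion-matrix counts."""
    accuracy = (tp + tn) / (tp + tn + fp + fn)                     # (1)
    sensitivity = tp / (tp + fn)                                   # (2) recall
    specificity = tn / (tn + fp)                                   # (3)
    precision = tp / (tp + fp)                                     # (4)
    f1 = 2 * precision * sensitivity / (precision + sensitivity)  # (5)
    return {"accuracy": accuracy, "sensitivity": sensitivity,
            "specificity": specificity, "precision": precision, "f1": f1}

# Imbalanced example: 990 normals, 10 anomalies, detector misses 4.
print(classification_metrics(tp=6, fp=20, tn=970, fn=4))
```

Run on this imbalanced example, the helper also illustrates the caveat above: accuracy stays at 0.976 even though sensitivity and precision are poor.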


5) AUC-ROC
The area under the ROC curve (AUC-ROC) is interpreted as the probability that a randomly chosen anomaly receives a higher score than a randomly chosen ordinary object [28].

6) AUC-PR
AUC-PR is the area under the curve of precision against recall at different thresholds, and it only evaluates the performance on abnormal samples: the positive class. AUC-PR is computed as the average precision [28].
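The probabilistic reading of AUC-ROC above can be computed literally: compare every (anomaly, normal) score pair and count how often the anomaly scores higher, with ties counting half. A minimal sketch:

```python
import numpy as np

def auc_roc(anomaly_scores, normal_scores):
    """P(random anomaly scores higher than random normal); ties = 1/2."""
    a = np.asarray(anomaly_scores)[:, None]
    n = np.asarray(normal_scores)[None, :]
    wins = (a > n).sum() + 0.5 * (a == n).sum()
    return wins / (a.size * n.size)

# 11 of 12 pairs rank the anomaly higher -> AUC-ROC = 0.9166...
print(auc_roc([0.9, 0.8, 0.4], [0.3, 0.5, 0.1, 0.2]))
```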
IV. RELATED WORKS
Conventional detection approaches rely on statistical methods and spatial thresholds and therefore cannot deal with the complex and dynamic nature of anomalies. Recent advances in artificial intelligence using neural network approaches allow the detection of anomalies in more complex typologies because they are able to consider temporal and contextual characteristics of the data.
We found 168 surveys or review papers on anomaly detection in the last five years. Still, only 19 of them are general (not for a specific purpose), only four of these papers focus on neural networks, and only one is for general machine learning purposes:
1) Survey on Applying GAN for Anomaly Detection [29]
2) Deep Learning for Anomaly Detection: A Review [30]
3) A Unifying Review of Deep and Shallow Anomaly Detection [31]
4) Machine Learning for Anomaly Detection: A Systematic Review [4]
Other existing research on anomaly detection techniques, such as [32] and [33], only considers neural network-based approaches superficially. Current research developments on neural networks for detecting many anomalies are not incorporated.
This review is different from those described above because it presents an extensive research study on anomaly detection through machine learning techniques, focusing on neural networks and covering the period from 2017 to 2021, a period for which, as far as is known, there is no systematic review yet. Besides, [29] focuses only on GANs, [30] does not cover other neural networks, [31] investigates the deepness of networks, and [4] is shallow on neural networks.

V. RESULTS AND DISCUSSION
In this section, the reported results are based on the developed protocol. Each subsection discusses a domain with a developed set of machine learning techniques. Table 5 illustrates the overall result of the initial search, including journal and conference papers.

TABLE 5. Imported studies by source.

Many anomaly detection methods have been created for specific applications, while others are more general. Several papers were mainly focused on detecting anomalies in specific application domains or specific data types, such as intrusion detection, network anomaly detection, internet of things, image or video processing, crowded scenes, network security, or data streams. These kinds of works were rejected. Figure 4 shows accepted papers by publication year.

FIGURE 4. Accepted papers (it shows relevant growth in recent years).

A. ACCEPTED PAPERS
The following is an extensive overview of all the selected works with highlights of the main characteristics of each.
1) Towards explaining anomalies: A deep Taylor decomposition of one-class models: The method proposed an approach for one-class SVMs (OC-SVMs), based on the new insight that these models can be rewritten as distance/pooling neural networks. This neuralization step allows the application of deep Taylor decomposition (DTD); besides that, the method is applicable to different common distance-based kernel functions [34].
2) LRGAN: Visual anomaly detection using GAN with locality-preferred recoding: To partly avoid latent vectors of normal samples being recoded, the authors presented an improved model using GAN with an adaptive locality-preferred recoding (ALR) module, named LRGAN+. The ALR module applies a clustering algorithm to generate a more compact codebook, and in particular it helps LRGAN+ to automatically skip the LR module for possible normal samples with a threshold strategy [35].
3) Multi-Scale One-Class Recurrent Neural Networks for Discrete Event Sequence Anomaly Detection: The authors proposed OC4Seq, a multi-scale one-class recurrent neural network for detecting anomalies in discrete event sequences, with recurrent neural networks (RNNs) embedding the discrete event sequences into latent spaces, where anomalies can be easily detected. Besides, they designed a multi-scale RNN framework to capture different levels of sequential patterns at the same time [36].

4) Error-Bounded Graph Anomaly Loss for GNNs: To train Graph Neural Networks (GNNs) for anomaly-detectable node representations, the authors proposed an alternative loss function. It uses global grouping patterns discovered by graph mining methods to assess node similarity, and it can automatically adjust margins for minority classes based on the data distribution [37].
5) Imbalanced dataset-based echo state networks for anomaly detection: This approach uses the traditional echo state network (ESN), a brain-inspired neural computing model. The error between the input data and the output is smaller when normal data is given to the well-trained network than when abnormal input data is fed to it. Then, if the difference between the input data and the predicted value exceeds a specific threshold, anomalous behavior is detected [38].
6) Integration of deep feature extraction and ensemble learning for outlier detection: In this article, the authors employed stacked autoencoders to extract features and then an ensemble of probabilistic neural networks to do majority voting and find outliers, demonstrating that using autoencoders significantly improved outlier identification performance [39].
7) Deep anomaly detection with self-supervised learning and adversarial training: The proposed work was a deep adversarial anomaly detection (DAAD) method, in which an auxiliary task with self-supervised learning is designed first to learn task-specific features, and then a deep adversarial training (DAT) model is built to capture the marginal distributions of normal data in various spaces. In addition, to acquire trustworthy detection findings, a majority voting approach is used [40].
8) Deep Learning for Anomaly Detection: The authors looked at state-of-the-art deep learning models, from building-block neural network structures like MLP, CNN, and LSTM to more complex structures like autoencoders, generative models (VAE, GAN, flow-based models), and deep one-class detection models, and showed how transfer learning and reinforcement learning can help correct label scatter problems [41].
9) A machine-learning phase classification scheme for anomaly detection in signals with periodic characteristics: A novel machine-learning method is proposed here for data with periodic features that explicitly allows for randomly variable period lengths. Training a data-adapted classifier comprised of deep convolutional neural networks for phase classification is used to accomplish a multi-dimensional time series analysis [42].
10) Unsupervised anomaly detection of industrial robots using sliding-window convolutional variational autoencoder: In that paper, the authors offered an unsupervised anomaly detection method, which consists of a sliding-window convolutional variational autoencoder (SWCVAE) that can detect anomalies in real time, both spatially and temporally, by dealing with multivariate time series data [43].
11) Model fusion of deep neural networks for anomaly detection: This paper proposed using class weight optimization to train deep neural networks to learn complex patterns from rare anomalies observed in network traffic data. This original model fusion combined two deep neural networks: a binary normal/attack DNN for detecting the presence of any attack and a multi-attack DNN for categorizing the attacks [44].
12) GP-ELM-RNN: Garson-pruned extreme learning machine based replicator neural network for anomaly detection: For anomaly detection, the author offered an optimal Replicator Neural Network (RNN) that is optimized using Extreme Learning Machine (ELM) learning and the Garson method. The difficulty of calculating the number of hidden layers is solved by ELM-based RNNs, while the challenge of identifying the optimal number of neurons in the hidden layer is solved by the Garson method [45].
13) Outlier exposure with confidence control for out-of-distribution detection: The authors suggested Outlier Exposure with Confidence Control (OECC), a novel method for out-of-distribution (OOD) detection with OE on image and text classification tasks. They further demonstrated that combining OECC with cutting-edge post-training OOD detection approaches like the Mahalanobis Detector (MD) and Gramian Matrices (GM) increases performance [46].
14) Deep compact polyhedral conic classifier for open and closed set recognition: In this paper, they proposed a new deep neural network classifier that uses the polyhedral conic classification function to maximize inter-class separation while minimizing intra-class variance. There are two loss terms in the approach, one for each task (maximize and minimize) [47].
15) Multilayer one-class extreme learning machine: A multilayer neural network based one-class extreme learning machine (OC-ELM), in short ML-OCELM, was developed in this paper. ML-OCELM uses stacked autoencoders (AE) to leverage an effective feature representation for complex data [48].
16) Deep convolutional clustering-based time series anomaly detection: This paper presented an original approach that relies on unlabeled data and uses a 1D-convolutional neural network-based deep autoencoder architecture. The authors divide the autoencoder latent space into discriminative and reconstructive latent features, then use a Top-K objective to separate the latent space and apply an auxiliary loss based on k-means clustering to the discriminative latent variables [49].
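Several of the works above, such as the echo state network in item 5 and the autoencoder-based detectors in items 10, 15, and 16, share one scoring principle: train a model to reproduce normal data and flag inputs it reconstructs poorly. The PyTorch sketch below is a minimal illustration of that shared principle; the tiny fully-connected architecture, the training setup, and the 99th-percentile threshold are illustrative assumptions, not details of any single surveyed paper.

```python
import torch
from torch import nn

torch.manual_seed(0)

# Tiny fully-connected autoencoder; the surveyed works use CNN/RNN variants.
model = nn.Sequential(
    nn.Linear(16, 4), nn.ReLU(),   # encoder: compress to a latent space
    nn.Linear(4, 16),              # decoder: rebuild the input
)
opt = torch.optim.Adam(model.parameters(), lr=1e-2)
loss_fn = nn.MSELoss()

normal = torch.randn(512, 16) * 0.1          # stand-in for normal training data
for _ in range(200):                          # train on normal data only
    opt.zero_grad()
    loss = loss_fn(model(normal), normal)
    loss.backward()
    opt.step()

def anomaly_score(x):
    """Per-sample reconstruction error; high values suggest anomalies."""
    with torch.no_grad():
        return ((model(x) - x) ** 2).mean(dim=1)

threshold = anomaly_score(normal).quantile(0.99)   # illustrative cutoff
test = torch.cat([torch.randn(3, 16) * 0.1, torch.randn(2, 16) * 3.0])
print(anomaly_score(test) > threshold)  # last two rows should be flagged
```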


latent space into discriminative and reconstructive 22) HIFI: Anomaly Detection for Multivariate Time
latent features, then use a Top-K objective to sepa- Series with High-order Feature Interactions: The
rate the latent space and apply an auxiliary loss based authors offered a novel anomaly detection approach
on k-means clustering for the discriminatory latent for multivariate time series with HIgh-order Fea-
variables [49]. ture Interactions (HIFI). It created multivariate feature
17) On the Usage of Generative Models for Network interaction graph and used the graph convolutional
Anomaly Detection in Multivariate Time-Series: the neural network to achieve high-order feature interac-
authors presented Net-GAN, a new method for detect- tions, modeling the long-term temporal dependencies
ing network anomalies in time series that employs by attention mechanisms and using a variational encod-
recurrent neural networks (RNNs) and generative ing technique to improve the model [55].
adversarial networks (GAN). They also explore the 23) Activation Anomaly Analysis: The suggested method
concepts behind generative models to conceive Net- demonstrated that hidden activation values contain
VAE, a complementary approach to Net-GAN, based information that can be used to differentiate between
on variational auto-encoders (VAE) [50]. normal and abnormal samples. In a purely data-driven
18) MAMA Net: Multi-Scale Attention Memory end-to-end model, three neural networks were com-
Autoencoder Network for Anomaly Detection: The bined in the approach. The alarm network determined
authors proposeed a Multi-scale Attention Memory if a particular sample is normal based on the activation
with hash addressing Autoencoder network (MAMA values in the target network. [56]
Net). the authors designed a hash addressing mem- 24) TadGAN: Time Series Anomaly Detection Using
ory module that explores abnormalities to produce Generative Adversarial Networks: This article pre-
greater reconstruction error for classification, and sented TadGAN, an unsupervised anomaly detection
they connected the mean square error (MSE) with method based on Generative Adversarial Networks.
Wasserstein loss to improve the encoding data dis- The authors employed LSTM Recurrent Neural Net-
tribution, and created the multi-scale global spatial works as basic models for Generators and Critics to
attention block, which could be attached to any net- capture the temporal correlations of time series dis-
work as sampling, upsampling, and downsampling tributions. To enable effective time-series data recon-
functions [51]. struction, TadGAN was trained with cycle consistency
19) A novel multivariate time-series anomaly detection loss [57].
approach using an unsupervised deep neural net- 25) TAnoGAN: Time Series Anomaly Detection with
work: In this paper, the authors proposed a multilayer Generative Adversarial Networks: In this work, the
convolutional recurrent autoencoded anomaly detector authors suggested the method TAnoGan, an unsuper-
(MCRAAD), which was an unsupervised deep learning vised method based on Generative Adversarial Net-
method. They used the data in the sliding window works (GAN) for detecting anomalies in time series
to calculate the feature matrix sequence, a multilayer when just a few data points are available [58].
convolutional encoder to extract the feature matrix 26) A Transfer Learning Framework for Anomaly
sequence’s characteristics, several ConvLSTM units to Detection Using Model of Normality: Deep features
obtain the feature matrix’s time relations, and a con- could be extracted using a Convolutional Neural Net-
volutional decoder to reconstruct the feature matrix work (CNN), and anomalies could be detected using a
sequence [52]. comparable measure between extracted features and a
20) Hybrid discriminator with correlative autoencoder defined model of normality. Here, the authors presented
for anomaly detection: For anomaly detection, the a method for determining the decision threshold that
authors suggested a hybrid discriminator with a cor- improves detection accuracy and a transfer learning
relative autoencoder. The discriminator estimates the framework for anomaly detection based on a Model of
conditional probability density function in the sug- Normality (MoN) similarity measure [59].
gested framework, while the autoencoder enhances the 27) Novelty Detection Through Model-Based Charac-
capacity to manage the reconstruction error. [53] terization of Neural Networks: The authors proposed
21) Unsupervised Boosting-Based Autoencoder Ensem- a model-based characterization of neural networks to
bles for Outlier Detection:The authors created detect new input types and conditions. According to
the Boosting-based Autoencoder Ensemble approach them, most existing research has focused on activation-
(BAE). It was an unsupervised ensemble method. This based representations to detect abnormal inputs, and
method constructs an adaptive cascade of autoencoders back-propagated gradients have been used to formulate
to get better outcomes, very close to boosting. It trains the significance of the perspective in novelty detec-
the autoencoder components in sequence by perform- tion [60].
ing a weighted sampling of the data, intended to reduce 28) USAD: UnSupervised Anomaly Detection on Mul-
the number of anomalies during training and to insert tivariate Time Series: The authors proposed a rapid
diversity in the ensemble. [54] and stable technique based on adversely trained


29) Unsupervised anomaly detection with LSTM neural networks: Given variable-length data sequences, the authors first passed them through a long short-term memory (LSTM) neural network-based structure and obtained fixed-length sequences. Then they found a decision function based on the one-class support vector machine (OC-SVM) and support vector data description (SVDD) algorithms. Finally, they trained and optimized all the parameters using highly effective gradient- and quadratic-programming-based training methods. It was also possible to apply this approach to the gated recurrent unit (GRU) architecture by replacing the LSTM-based structure with a GRU-based structure [62].
30) Deep Active Learning for Anomaly Detection: In this work, the authors presented a new layer that can be attached to any deep learning unsupervised anomaly detection model to turn it into an active method. They showed results using Multi-layer Perceptron and Autoencoder architectures improved with the proposed active layer [63].
31) DeepAlign: Alignment-Based Process Anomaly Correction Using Recurrent Neural Networks: DeepAlign was a new approach based on recurrent neural networks and bidirectional beam search. It had two recurrent neural networks, one that reads sequences of process executions from left to right, while the other reads the sequences from right to left. By combining them, the authors showed that it is possible to calculate sequence alignments to detect and correct anomalies [64].
32) ARCADe: A rapid continual anomaly detector: This study addressed a situation in which the only examples given for training were from the regular class. The authors characterized it as a meta-learning issue and defined it as a continual anomaly detection (CAD) learning problem. As a result, they offered A Rapid Continual Anomaly Detector (ARCADe), a method for training neural networks to be robust to the main challenges of this novel learning problem, such as catastrophic forgetting and overfitting to the majority class [65].
33) Latent feature decentralization loss for one-class anomaly detection: The suggested method aims to disseminate the encoder's latent feature over multiple spaces, allowing it to generate images comparable to the standard class for any input. So, a decentralization term based on the dispersion measure for latent vectors is added to the existing mean-squared error loss. The authors limited the latent space by creating a dispersion measure upper bound based on a decentralization loss term. When the given test image is unknown, the reconstruction error increases [66].
34) Deep multi-sphere support vector data description: In this paper, the authors offered Deep Multi-sphere Support Vector Data Description, which jointly optimizes the deep network and the anomaly detection goal; enclosing standard data with a multimodal distribution into multiple minimum-volume hyperspheres provides valuable and discriminative features [67].
35) The Elliptical Basis Function Data Descriptor (EBFDD) Network: A One-Class Classification Approach to Anomaly Detection: The paper introduced the Elliptical Basis Function Data Descriptor (EBFDD) network, based on Radial Basis Function (RBF) neural networks, as a one-class classification method for anomaly detection. The EBFDD network makes use of elliptical basis functions to learn complex decision boundaries while maintaining the benefits of a shallow network [68].
36) Anomaly Detection by Learning Dynamics from a Graph: The authors offered a method for learning spatio-temporal properties, called dynamics, to forecast the evolution of graphs. It included two steps: extracting spatial features from static graphs at various times and learning temporal features from the time-varying structure. They identified dynamic anomalies by predicting the affinity score for a node in a dynamics graph rather than predicting overall changes [69].
37) A deep reinforcement learning based homeostatic system for unmanned position control: This research proposed a novel bio-inspired homeostatic technique based on a Receptor Density Algorithm (an artificial immune system-based anomaly detection application) and a Plastic Spiking Neuronal model to capture the randomness of the environment. Deep Reinforcement Learning (DRL) was used in conjunction with the hybrid model described above [70].
38) Deep Autoencoders with Value-at-Risk Thresholding for Unsupervised Anomaly Detection: The authors presented an incremental learning strategy in which a deep autoencoding (DAE) model of the regular data is learned and utilized to identify anomalies. They applied a unique thresholding approach based on the value at risk (VaR) for detecting them, then compared the resulting convolutional neural network (CNN) against various subspace methods [71].
39) Computation of person re-identification using self-learning anomaly detection framework in deep-learning: This paper proposed a self-learning anomaly detection application. To begin, it got metadata from the unsupervised data clustering module (DCM), which analyzes the pattern of monitoring data and can find unforeseen anomalies by enabling self-learning. The DCM's pattern is then transferred to a supervised data regression and classification module (DRCM) [72].
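The value-at-risk thresholding of item 38 amounts to cutting the upper tail of the score distribution observed on normal data. A minimal sketch follows, with the 1% tail level chosen purely for illustration.

```python
import numpy as np

def var_threshold(train_scores, tail=0.01):
    """Threshold = empirical value-at-risk of the normal-data scores."""
    return np.quantile(train_scores, 1.0 - tail)

rng = np.random.default_rng(1)
train_scores = rng.exponential(1.0, 10_000)   # anomaly scores on normal data
t = var_threshold(train_scores)
print(t, (train_scores > t).mean())           # ~1% of normal data exceeds t
```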


supervised data regression and classification module explored and analyzed in this paper. For the actual and
(DRCM) [72]. reduced datasets, the classifiers Decision Tree (DT),
40) Learning Deep Features for One-Class Classifica- Support Vector Machine (SVM), K Nearest Neigh-
tion: The authors described a new deep-learning-based bor (KNN), and Naive Bayes (NB) were also inves-
approach for one-class transfer learning in one-class tigated. In addition, new performance measurement
categorization. It provided descriptive features while metrics, such as the Classification Difference Measure
preserving a low intra-class variance in the feature (CDM), Specificity Difference Measure (SPDM), Sen-
space for the given class by operating on top of a sitivity Difference Measure (SNDM), and F1 Differ-
convolutional neural network (CNN) of choice. Two ence Measure (F1DM), have been defined and were
loss functions (compactness and descriptiveness) and being used to compare the outcomes on actual and
a parallel CNN architecture were proposed to achieve reduced datasets [78].
this goal [73]. 46) MAD-GAN: Multivariate Anomaly Detection for
41) Performance Analysis of Out-of-Distribution Detec- Time Series Data with Generative Adversarial Net-
tion on Various Trained Neural Networks: When works: In this work, the authors proposed an unsu-
Deep Neural Networks (DNN) were exposed to previ- pervised multivariate method based on Generative
ously unseen out-of-distribution samples, the authors Adversarial Networks (GANs), using the Long-Short-
faced a typical difficulty. Here, they focused on two Term-Memory Recurrent Neural Networks (LSTM-
supervisors on two well-known DNNs with different RNN) as the base models in the GAN framework to
training configurations and found that the quality of the capture the temporal correlation of time series distribu-
training technique increases the outlier identification tions. The Multivariate Anomaly Detection with GAN
performance [74]. (MAD-GAN) framework considers the entire set of
42) Sequential anomaly detection using inverse rein- variables simultaneously to capture latent interactions
forcement learning: The authors proposed an inverse between variables. They also use a new anomaly score
reinforcement learning (IRL) framework for sequential called DR-score to detect anomalies through discrimi-
anomaly detection and a Bayesian technique for IRL. nation and reconstruction [79].
The approach took the series of actions of a target agent 47) Outlier detection for time series with recurrent
as input, and the reward function inferred by IRL then autoencoder ensembles: The authors proposed two
understood the agent’s normal behavior. It represented methods for detecting outliers in time series using
a reward function using a neural network and analyzed recurrent autoencoder ensembles. The autoencoders
whether a new observation from the target agent fol- are made up of sparsely-connected recurrent neural
lows a regular pattern [75]. networks (S-RNNs). These methods permitted several
43) Cyclostationary statistical models and algorithms autoencoders to be created with varied neural network
for anomaly detection using multi-modal data:Using connection architectures. This ensemble-based tech-
a Deep Neural Network-based object detector to extract nique improved overall detection quality by limiting
counts of objects and sub-events from the data was the consequences of overfitting some autoencoders to
offered as a framework for detecting anomalies in outliers [80].
multimodal data. In order to model regular patterns 48) An Approximate Bayesian Long Short-Term Mem-
of behavior in count sequences, a cyclo-stationary ory Algorithm for Outlier Detection: In this study,
model was developed. The anomaly detection chal- the authors presented an Ensemble Kalman Filter-
lenge is defined as finding deviations from learned based approximate estimation of weights uncertainty,
cyclo-stationary behavior [76]. which was easily scalable to a high number of weights.
44) Concept learning through deep reinforcement Besides, they optimized the covariance of the noise dis-
learning with memory-augmented neural networks: tribution in the ensemble update step using maximum
The authors presented a memory-augmented neural likelihood estimation [81].
network inspired by the human concept learning pro- 49) DeepAnT: A Deep Learning Approach for Unsu-
cess. The training teaches how to discriminate between pervised Anomaly Detection in Time Series: The
samples from various classes and group examples authors described a new deep learning-based time
of the same type together. In addition, they sug- series data technique (DeepAnT). It learned the data
gested a sequential procedure in which the network distribution used to forecast the expected behavior of a
should determine how to recall each sample at each time series using unlabeled data. DeepAnT consists of
stage [77]. two modules. First, the predictor module projects the
45) A study of feature reduction techniques and clas- next time stamp on the defined horizon using a deep
sification for network anomaly detection: Principal convolutional neural network (CNN). The predicted
Component Analysis (PCA), Artificial Neural Network value is then provided to the anomaly detector module,
(ANN), and Nonlinear Principal Component Analy- which assigns a normal or abnormal label to the time
sis (NLPCA) are the three reduction methodologies stamp [82].


50) Outlier detection with autoencoder ensembles: For unsupervised anomaly detection, in this paper the authors introduced autoencoder ensembles. The main idea was to change the autoencoder's connectivity design at random to improve performance substantially. They also integrated this methodology with an adaptive sampling strategy to improve the efficiency of the approach [83].
51) Training autoencoder using three different reversed color models for anomaly detection: This paper introduced Autoencoders (AE) as an anomaly detector. The suggested AE was built on a convolutional neural network with three different color models: Hue Saturation Value (HSV), Red Green Blue (RGB), and the authors' own model (TUV), and it is trained using both normal and anomalous data. The trained AE reconstructs normal images unchanged, while anomalous images are rebuilt in the opposite direction [84].
52) Detection of Thin Boundaries between Different Types of Anomalies in Outlier Detection Using Enhanced Neural Networks: The authors defined new types of anomalies to improve detection of the boundary between different types of anomalies, and basic methods were introduced to detect these defined anomalies in supervised and unsupervised datasets. The Multi-Layer Perceptron Neural Network (MLP) was upgraded using a Genetic Algorithm to identify the newly defined anomalies with more precision, resulting in a lower test error than that calculated for the conventional MLP [85].
53) NADS-RA: Network Anomaly Detection Scheme Based on Feature Representation and Data Augmentation: This paper offered a Network Anomaly Detection Scheme based on feature representation and data augmentation (NADS-RA). To begin, the Re-circulation Pixel Permutation approach was used as a feature representation strategy for creating images. Then, an image-based augmentation strategy was designed to produce augmented images according to the distribution characteristics of rare anomaly images with the help of a Least Squares Generative Adversarial Network, which softens the effect of imbalanced training data. Lastly, NADS-RA was implemented on a Convolutional Neural Network classification model [86].
54) An improved BiGAN based approach for anomaly detection: In this study, the authors implemented the Bidirectional GAN (BiGAN) architecture, considering it as a one-class anomaly detection algorithm. They presented two different training methodologies for BiGAN by adding extra training stages, because the generator and discriminator are heavily dependent on each other during the training phase [87].
55) Anomalous Example Detection in Deep Learning: A Survey: This review provided a well-organized and thorough overview of anomaly detection research for Deep Learning (DL) based applications. The authors provided a classification of existing techniques based on their underlying assumptions and adopted approaches. They discussed different techniques in each of the categories and provided their relative strengths and weaknesses [88].
56) Deep Autoencoders and Feedforward Networks Based on a New Regularization for Anomaly Detection: Here, the authors overviewed two architectures that push the limits of model accuracy for anomaly detection and intrusion classification, the Feedforward Neural Network (FNN) and the Variational Autoencoder (VAE). They provided an overview of the architecture model, including training approaches, hyperparameters, regularization, and other aspects. Furthermore, they created a new regularization technique based on the standard deviation of weight values, which they applied to both models [89].
57) Convolutional Neural Network-Based Discriminator for Outlier Detection: The authors suggested a method for producing training datasets that used a small set of reliable data to train the discriminator. They evaluated the discriminator's performance using multiple benchmark datasets and noise ratios, and they used a Convolutional Neural Network (CNN) for the noise discriminator. They introduced random noise into each dataset and trained discriminators to clean it up [90].
58) Online anomaly detection with sparse Gaussian processes: The method of sparse Gaussian processes with Q-function (SGP-Q) is proposed in this study. The SGP-Q employs sparse Gaussian processes (SGPs), which have a lower time complexity than Gaussian processes (GPs), allowing for substantially faster online anomaly detection. If the Q-function is used correctly, the SGP-Q may adapt well to concept drift, where data properties and anomalous behaviors change with time [91].
59) Anomaly detection with inexact labels: The authors proposed a supervised method for data with inexact anomaly labels, where each label indicates that at least one instance in the set is anomalous. They defined the inexact AUC, an extension of the Area Under the ROC Curve (AUC), to quantify performance. The strategy improved the smooth approximation of the inexact AUC while decreasing scores for non-anomalous occurrences by training an anomaly score function. They used an unsupervised anomaly detection method based on neural networks, such as Autoencoders, to model the score function [92].
60) Skip-GANomaly: Skip Connected and Adversarially Trained Encoder-Decoder Anomaly Detection: The authors introduced an unsupervised model, trained only on the non-anomalous samples. The method employed an encoder-decoder Convolutional Neural Network with skip connections to completely capture the multi-scale distribution of the expected data in image space. Besides, an adversarial training scheme provides superior reconstruction within image space and a lower-dimensional vector space encoding. Furthermore, minimizing the reconstruction error metric during training helped the model learn the distribution of normality, and higher reconstruction metrics indicate an anomaly [93].
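The ensemble methods in items 47 and 50 (and item 21 earlier) share a final step: combine per-member outlier scores so that individual members that overfit outliers are outvoted. Below is a sketch of median-based score combination, where randomly perturbed stub scorers stand in for the differently connected autoencoders; the combination rule is an assumption, not the exact aggregation of those papers.

```python
import numpy as np

rng = np.random.default_rng(2)

def ensemble_scores(x, n_members=10):
    """Median of standardized per-member scores: robust to a few
    members that overfit outliers and under-score them."""
    all_scores = []
    for _ in range(n_members):
        noise = rng.normal(0.0, 0.3, size=x.shape)   # stand-in for member diversity
        s = np.abs(x - np.median(x)) + noise         # stub per-member score
        all_scores.append((s - s.mean()) / (s.std() + 1e-8))
    return np.median(np.stack(all_scores), axis=0)

data = np.concatenate([rng.normal(0, 1, 200), [9.0]])
print(ensemble_scores(data).argmax())  # 200 -> the injected outlier
```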


the multi-scale distribution of the expected data applications like constraint handling, noise reduction,
in image space. Besides, an adversarial training and anomaly detection. This paper presented a method
scheme provides superior reconstruction within image for employing negative learning to limit the net-
space and a lower-dimensional vector space encoding. work’s generative capabilities. For the desired input,
Furthermore, minimizing the reconstruction error met- the approach searched the solution in the gradient direc-
ric during the training helped the model learn the distri- tion and for the undesired input, such as anomalies,
bution of normality, and higher reconstruction metrics in the opposite direction. [98].
indicate an anomaly [93]. 66) Robust, Deep and Inductive Anomaly Detection:
61) Deep One-Class Classification Using Intra-Class Splitting: This paper introduced a generic method that enables the use of conventional deep neural networks for one-class classification, where only samples of one regular class are available for training. During inference, a closed and rigid decision border around the training samples is desired, which is unattainable with traditional binary or multi-class neural networks. This method can use a binary loss and creates a subnetwork for distance constraints in the latent space by separating data into typical and atypical normal subsets [94].
62) Anomaly detection based on mining six local data features and BP neural network: The authors presented a model that used six local data features as the input to a back-propagation (BP) neural network. The six mined local data features give subtle insight into local dynamics by describing the local monotonicity, convexity/concavity, inflection property, and peak distribution of one KPI time series through a vectorized description on a normalized dataset. This contrasts with traditional statistical data characteristics that describe the entire variation of one sequence [95].
63) An Empirical Evaluation of Deep Learning for Network Anomaly Detection: Their preliminary studies revealed a significant degree of non-linearity in network connection data. Furthermore, their approach explained why traditional algorithms like AdaBoost, SVM, and Random Forest struggled to enhance anomaly detection performance. In this research, the authors created and tested deep learning models based on Fully Connected Networks (FCNs), Variational AutoEncoders (VAE), and Sequence-to-Sequence (Seq2Seq) structures [96].
64) Optimized fuzzy min-max neural network: An efficient approach for supervised outlier detection: The authors modified the Fuzzy min-max Neural Network (FMNbasic)'s architecture to represent learned knowledge in a compact, coarse-grained manner similar to human thinking. The proposed method was known as the fuzzy min-max neural network with knowledge compaction (FMN-KC), and its potential for supervised outlier detection was shown using available online datasets [97].
65) Limiting the reconstruction capability of generative neural network using negative learning: Generative models with only a single input type are beneficial for applications like constraint handling, noise reduction, and anomaly detection. This paper presented a method for employing negative learning to limit the network's generative capabilities. For the desired input, the approach searched for the solution in the gradient direction and, for the undesired input, such as anomalies, in the opposite direction [98].
66) Robust, Deep and Inductive Anomaly Detection: This article addressed in a single model both the difficulty of overcoming PCA's sensitivity to input perturbations and that of searching for a linear subspace that reflects normal behavior. This method, the robust Autoencoder, learned a nonlinear subspace that captures most data points while allowing for random corruption of specific data. Moreover, it was easy to learn and takes advantage of recent Deep Neural Network optimization breakthroughs [99].
67) An Anomaly Event Detection Method Based on GNN Algorithm for Multi-data Sources: To address the known problem of the inadaptability of multi-source data in anomaly detection, the authors designed a new GNN-based method in this paper. First, they used a spectral clustering approach to extract features from numerous data sources and combine them. They then executed an improved anomalous social event detection, showing the threatening events, utilizing the power of a Deep Graph Neural Network (Deep-GNN) [100].
68) Few-shot network anomaly detection via cross-network meta-learning: The authors addressed the few-shot network anomaly detection problem by developing Graph Deviation Networks (GDN). It was a novel family of graph neural networks that could enforce statistically significant differences between abnormal and normal nodes on a network using a limited number of labeled abnormalities. They also included a new cross-network meta-learning technique in the proposed GDN to enable few-shot network anomaly detection by transferring meta-knowledge from auxiliary networks [28].
69) Anomaly Detection of Time Series with Smoothness-Inducing Sequential Variational Auto-Encoder: The authors introduced a smoothness-inducing Sequential Variational Auto-encoder (SISVAE) approach for robust estimation and anomaly detection in multidimensional time series. The model was based on the VAE, and both the generating and inference models were realized with a Recurrent Neural Network to capture latent temporal features of the time series. They offered a smoothness-inducing prior over potential estimations as a regularizer that penalized nonsmooth reconstructions, achieving robust density estimation. They used a novel stochastic gradient variational Bayes estimator to learn their model efficiently, and they investigated two decision criteria for anomaly detection: reconstruction probability and reconstruction error [101].


70) Deep End-to-End One-Class Classifier: The authors presented an adversarial training strategy for detecting out-of-distribution samples in an end-to-end trainable deep model. They achieved this by combining the training of two Deep Neural Networks (R and D). D serves as the discriminator, while R assists D in defining a probability distribution for the target class by providing adversarial instances during training and collaborates with it during testing to discover anomalies [102].
71) A survey of deep learning-based network anomaly detection: In this paper, the authors gave an overview of Deep Learning methodologies, such as restricted Boltzmann machine-based deep belief networks, Deep Neural Networks, and Recurrent Neural Networks, as well as machine learning techniques applicable to network anomaly detection [103].

FIGURE 5. Most used datasets.
72) Unsupervised Anomaly Detection in Stream Data with Online Evolving Spiking Neural Networks: Their goal was to modify the Online evolving Spiking Neural Network (OeSNN) classifier such that it could successfully detect anomalies without having access to labeled training data. As a result, the authors developed a method called Online evolving Spiking Neural Network for Unsupervised Anomaly Detection (OeSNN-UAD), which, unlike OeSNN, worked unsupervised and did not divide output neurons into discrete decision classes. Instead, it used three new modules: generation of output values of candidate output neurons, anomaly classification, and value correction [104].
73) Decoupling Representation Learning and Classification for GNN-based Anomaly Detection: The authors investigated options beyond joint training for decoupling node representation learning and classification for anomaly detection, based on graph neural networks (GNN) and self-supervised learning (SSL) on graphs. They showed that decoupled training using existing graph SSL schemes might deteriorate when the behavior patterns and the label semantics become highly inconsistent. They proposed an effective graph SSL scheme, called Deep Cluster Infomax (DCI), that clusters the entire graph into multiple parts to capture the fundamental graph attributes in more concentrated feature spaces [105].

FIGURE 6. Detailed techniques frequency.
74) A Unifying Review of Deep and Shallow Anomaly Detection: The authors aimed to identify the common principles and assumptions often made implicitly by various deep learning methods, such as generative models, one-class classification, and reconstruction. They drew linkages between classic and modern deep approaches and explained how this relationship might cross-fertilize or extend in both directions. They also provided an empirical assessment of the most often used approaches [31].
75) EBOD: An ensemble-based outlier detection algorithm for noisy datasets: The authors offered an unsupervised ensemble-based outlier detection (EBOD) strategy considering an ensemble of different algorithms. Each selected detector is exclusively responsible for finding a limited number of the most apparent outliers from its specific point of view. Compared to employing a single detector, having an ensemble of weak detectors reduces the possibility of bias. Forward-backward search was used to find the best detector combination [106].
76) Deep Structured Cross-Modal Anomaly Detection: The authors proposed a deep neural network-based cross-modal anomaly detection approach (CMAD) in this paper. First, they trained a deep structured model to represent features from different modalities and then projected them into a latent feature space. After that, they "pulled" the projections of a pair of instances from different modalities together if their cross-modal patterns were consistent; otherwise, they "pushed" them apart. Then they measured the distances between different modalities to identify cross-modality anomalies [107]; the second sketch after this list illustrates the pull/push idea.
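Two of the mechanisms above can be written down compactly. First, the reconstruction-error criterion used by methods such as Skip-GANomaly (item 60): the following minimal sketch is a hedged illustration of the general idea only, not the Skip-GANomaly architecture. A small fully connected autoencoder is trained on normal samples alone, a threshold is taken as a high quantile of the training reconstruction errors, and test points above it are flagged; the layer sizes, the quantile, and the synthetic data are all illustrative assumptions.

```python
import torch
import torch.nn as nn

# Minimal autoencoder trained on normal data only; reconstruction error is the
# anomaly score. Layer sizes and the 99th-percentile threshold are illustrative.
class AE(nn.Module):
    def __init__(self, dim_in=30, dim_latent=4):
        super().__init__()
        self.enc = nn.Sequential(nn.Linear(dim_in, 16), nn.ReLU(),
                                 nn.Linear(16, dim_latent))
        self.dec = nn.Sequential(nn.Linear(dim_latent, 16), nn.ReLU(),
                                 nn.Linear(16, dim_in))

    def forward(self, x):
        return self.dec(self.enc(x))

torch.manual_seed(0)
x_normal = torch.randn(1024, 30)                 # stand-in for normal training data
model = AE()
opt = torch.optim.Adam(model.parameters(), lr=1e-3)

for _ in range(200):                              # train on normal samples only
    opt.zero_grad()
    loss = ((model(x_normal) - x_normal) ** 2).mean()
    loss.backward()
    opt.step()

with torch.no_grad():
    err = ((model(x_normal) - x_normal) ** 2).mean(dim=1)
    tau = torch.quantile(err, 0.99)               # abnormality threshold
    x_test = torch.randn(8, 30) * 3               # wider spread, likely anomalous
    scores = ((model(x_test) - x_test) ** 2).mean(dim=1)
    print(scores > tau)                           # True = flagged as an anomaly
```

Second, the "pull/push" step of CMAD (item 76) can be read as a contrastive objective on paired embeddings. The sketch below is one hedged rendering of that idea, not the CMAD implementation: a margin-based loss pulls consistent cross-modal pairs together in a shared latent space and pushes inconsistent pairs apart, and at test time a large pair distance signals a cross-modal anomaly. The projection layers, feature sizes, and margin are illustrative assumptions.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

# Project each modality into a shared latent space (sizes are illustrative).
proj_a = nn.Linear(128, 32)   # e.g., image features -> latent
proj_b = nn.Linear(300, 32)   # e.g., text features  -> latent

def pull_push_loss(feat_a, feat_b, consistent, margin=1.0):
    """Contrastive loss: pull consistent cross-modal pairs, push others apart."""
    za, zb = proj_a(feat_a), proj_b(feat_b)
    dist = F.pairwise_distance(za, zb)                       # per-pair latent distance
    pull = consistent * dist.pow(2)                          # consistent pairs: shrink distance
    push = (1 - consistent) * F.relu(margin - dist).pow(2)   # inconsistent: enforce a margin
    return (pull + push).mean(), dist

feat_a = torch.randn(16, 128)
feat_b = torch.randn(16, 300)
consistent = (torch.rand(16) < 0.5).float()   # 1 = patterns match across modalities
loss, dist = pull_push_loss(feat_a, feat_b, consistent)
loss.backward()
# At test time, a large `dist` for a pair signals a cross-modal anomaly.
```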


FIGURE 7. Neural network frequencies.

B. DATASET, NETWORK AND PERFORMANCE
Figure 5 shows the Canadian Institute for Advanced Research (CIFAR) [108] and Modified National Institute of Standards and Technology database (MNIST) [109] as the most used datasets. CIFAR (or CIFAR-10) is a collection of 60000 32 × 32 color images in 10 different image classes, and MNIST is an extensive database of handwritten 28 × 28 pixel bounding box digits. Both are commonly used to train machine learning algorithms. Table 6 presents all selected papers with the employed neural network techniques, datasets, and their respective performance metrics.

TABLE 6. Datasets and Metrics compiled from the researched articles.
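Both benchmarks are available through the standard torchvision API, as the short sketch below shows. Note that the class-as-anomaly protocol in the comment is only one common convention, not necessarily the one used by every reviewed paper.

```python
from torchvision import datasets, transforms

# Standard torchvision loaders for the two most used benchmarks (documented API).
to_tensor = transforms.ToTensor()
mnist = datasets.MNIST(root="./data", train=True, download=True, transform=to_tensor)
cifar = datasets.CIFAR10(root="./data", train=True, download=True, transform=to_tensor)

img, label = mnist[0]
print(img.shape, label)   # torch.Size([1, 28, 28]) and the digit label
img, label = cifar[0]
print(img.shape, label)   # torch.Size([3, 32, 32]) and the class index

# A common (but not universal) anomaly-detection protocol on these datasets:
# treat one class as "normal" for training and the remaining classes as anomalies.
```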
C. WHAT KINDS OF NEURAL NETWORKS ARE USED TO OUTLIER DETECTION?
This section addresses the research question that aims to specify the neural networks used to detect anomalies within the period covered by the survey. As a fundamental point, the networks most used in detecting anomalies are identified along with a brief assessment.
The most common neural networks found:
• AE: Autoencoder Neural Network
• CNN: Convolutional Neural Network
• RNN: Recurrent Neural Network (LSTM)
• GAN: Generative Adversarial Network
AE: Autoencoders can figure out correlated input features. This algorithm will find some of those (linear or not) correlations. Indeed, an autoencoder learns a low-dimensional representation very close to PCA's. Since the algorithm learns the correlations, as shown in Figure 6, it is widely used to discover anomalies.
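This closeness to PCA can be checked numerically: a linear autoencoder (no activation functions) trained by gradient descent converges toward the same optimal low-rank reconstruction that PCA gives in closed form. The toy comparison below, on synthetic correlated data, is an illustrative sanity check rather than a formal equivalence proof.

```python
import numpy as np
import torch
import torch.nn as nn

rng = np.random.default_rng(1)
# Correlated 5-D data generated from a 2-D latent source.
z = rng.normal(size=(500, 2))
X = z @ rng.normal(size=(2, 5)) + 0.05 * rng.normal(size=(500, 5))
X = X - X.mean(axis=0)

# PCA baseline: best rank-2 linear reconstruction, computed via SVD.
_, _, Vt = np.linalg.svd(X, full_matrices=False)
err_pca = np.mean((X - X @ Vt[:2].T @ Vt[:2]) ** 2)

# Linear autoencoder (no nonlinearities), trained by gradient descent.
Xt = torch.tensor(X, dtype=torch.float32)
ae = nn.Sequential(nn.Linear(5, 2, bias=False), nn.Linear(2, 5, bias=False))
opt = torch.optim.Adam(ae.parameters(), lr=1e-2)
for _ in range(2000):
    opt.zero_grad()
    loss = ((ae(Xt) - Xt) ** 2).mean()
    loss.backward()
    opt.step()

print(f"PCA rank-2 error: {err_pca:.5f}")
print(f"linear AE error:  {loss.item():.5f}")  # converges toward the PCA error
```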
RNN: Recurrent Neural Network includes some other subtypes, such as LSTM (Long Short-Term Memory), GRU (Gated Recurrent Unit), and HTM (Hierarchical Temporal Memory). LSTM represents more than 60% of all RNNs.
Anomaly detection algorithms using neural networks can be separated into three main categories: feature extraction, normality learning, and abnormality threshold [30].
Feature extraction algorithms, such as CNN, simplify the data, usually by reducing the number of dimensions. The lower-dimensional space usually highlights hidden anomalies, which reduces the false-positive rate. Neural networks, notably deep learning ones, demonstrate a substantially better ability to extract complex features and nonlinear relationships. The features extracted by these models preserve the central information that separates anomalies from normal cases. This kind of network performs better than linear methods and is easy to build and run. On the other hand, when the correlation between characteristics is very weak, it is common for this network to get stuck in local minima.
In normality learning, networks are forced to capture the underlying regularities of the data. These algorithms assume that normal instances are easier to downsize or structure than anomalies. The abnormality threshold is not dependent on the existing outlier scores. Instead, the neural network directly learns the anomalies [110].

D. WHAT ARE NEURAL NETWORKS STRENGTHS FOR OUTLIER DETECTION
This section addresses the research question that aims to identify the main characteristics of neural networks used to detect anomalies, highlighting their strengths and limitations.
One of the most explored hypotheses is that feature representations extracted by deep networks preserve discriminatory information that helps separate anomalies from normal instances. One of the approaches in this direction is to use pre-trained deep learning models, such as AlexNet, VGG, and ResNet [111], [112], [113], to extract low-dimensional characteristics. This line is widely used in research to detect anomalies in complex high-dimensional data, such as images and videos.
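A minimal version of this pre-trained-feature pipeline is sketched below: a frozen torchvision ResNet-18 backbone embeds images into 512-dimensional features, and test images are scored by their distance to the k-th nearest normal embedding. The backbone call follows the standard torchvision API; the k-NN scoring rule and the random stand-in images are illustrative assumptions, not a method prescribed by the cited works.

```python
import torch
import torch.nn as nn
from torchvision.models import resnet18, ResNet18_Weights

# Frozen pre-trained backbone used purely as a feature extractor
# (standard torchvision API; downloads ImageNet weights on first use).
backbone = resnet18(weights=ResNet18_Weights.DEFAULT)
backbone.fc = nn.Identity()            # drop the classifier head -> 512-D embeddings
backbone.eval()

@torch.no_grad()
def embed(images):                     # images: (N, 3, 224, 224), ImageNet-normalized
    return backbone(images)

# Illustrative scoring rule (an assumption, not taken from [111]-[113]):
# the distance to the k-th nearest normal embedding is the anomaly score.
normal_imgs = torch.randn(64, 3, 224, 224)   # stand-ins for real preprocessed images
test_imgs = torch.randn(4, 3, 224, 224)
memory_bank = embed(normal_imgs)
dists = torch.cdist(embed(test_imgs), memory_bank)      # (4, 64) pairwise distances
scores = dists.topk(k=5, largest=False).values[:, -1]   # 5th-nearest-neighbor distance
print(scores)                          # higher = farther from normality = more anomalous
```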
Compared to popular dimension reduction methods in anomaly detection, such as principal component analysis and random projection, deep networks have demonstrated a substantially better ability to extract features in linear and nonlinear relationships [110].


Consider combining different outlier detection techniques, in which each of the selected detectors is only responsible for finding a small number of outliers that are the most visible from their individual perspectives, as shown in [106]. Moreover, compared to employing a single detector, using an ensemble of weak detectors minimizes the potential for bias during outlier detection, especially when dealing with noisy datasets.
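A stripped-down version of this ensemble idea is sketched below: three deliberately weak detectors (z-score, interquartile-range fence, and k-NN distance) each nominate only their most obvious outliers, and a majority vote forms the final set. The detectors, their parameters, and the voting rule are illustrative and far simpler than EBOD's forward-backward detector search [106].

```python
import numpy as np

rng = np.random.default_rng(2)
X = rng.normal(size=(300, 2))
X[:5] += 6                                   # plant a few obvious outliers

def zscore_top(X, m=5):                      # detector 1: largest |z| rows
    z = np.abs((X - X.mean(0)) / X.std(0)).max(axis=1)
    return set(np.argsort(z)[-m:])

def iqr_top(X, m=5):                         # detector 2: farthest outside the IQR box
    q1, q3 = np.percentile(X, [25, 75], axis=0)
    dist = np.maximum(q1 - X, X - q3).clip(min=0).max(axis=1)
    return set(np.argsort(dist)[-m:])

def knn_top(X, m=5, k=10):                   # detector 3: largest k-NN distance
    d = np.linalg.norm(X[:, None] - X[None], axis=-1)
    kth = np.sort(d, axis=1)[:, k]
    return set(np.argsort(kth)[-m:])

votes = [zscore_top(X), iqr_top(X), knn_top(X)]
counts = np.zeros(len(X))
for s in votes:                              # majority vote over weak detectors
    for i in s:
        counts[i] += 1
print(sorted(np.where(counts >= 2)[0]))      # indices flagged by at least two detectors
```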
Generative/reconstruction model-based approaches for OOD detection have an advantage because they do not require any labels for training [53].
Deep convolutional neural networks are widely used in the field of computer vision, producing state-of-the-art performance in many classification, action detection, and, consequently, anomaly detection tasks.
Generative model-based approaches and autoencoder-based models require no labels for training, and their anomaly detection performance degrades less due to changes in intraclass variance and different input complexities.
E. WHAT ARE THE CURRENT CHALLENGES OF OUTLIER DETECTION TECHNIQUES?
This section addresses the research question that aims to identify the main difficulties faced by scientists in detecting anomalies using neural networks. As a fundamental point of this review, the intention is to map some research gaps in order to solve them in the future.
Anomaly detection is a difficult problem to solve in general, and for that reason most of the techniques in the literature tend to solve a specific case of the general problem, based on the type of application, the type of input data and model, the availability of labels for the training and testing data, and also the types of anomalies. A particularly difficult problem arises when the abnormal examples in the dataset tend to disguise themselves as normal data [88].
The study made by [114] indicates that methods that leverage descriptors of pre-trained networks perform better than all other approaches, and that deep-learning-based generative models show considerable room for improvement.
Deep detection models can learn abnormalities beyond the scope of the given anomaly examples. Therefore, it would be necessary to understand and explore the extent of this generalizability and develop models to improve the accuracy [115], [116], [117], [118].
In big data normality learning, it is challenging to obtain sufficient labeled anomaly data. The goal is to transfer pre-trained representations to act unsupervised on unknown data.
Several anomaly detection methods address isolated anomalies. However, contextual and collective ones are significantly less explored.


Deep learning has a superior capability in capturing complex temporal/spatial dependence and learning representations of a set of unordered data points. It is essential to explore whether deep learning could also detect such complex anomalies better. For example, one can explore new neural network layers or new loss functions.
Without proper regularization, reconstruction-based methods, such as GAN, can easily become overfitted, resulting in low performance [57].
performance [57]. learning for anomaly detection: A systematic review,’’ IEEE Access,
vol. 9, pp. 78658–78700, 2021.
F. REVIEW LIMITATIONS [5] F. J. Anscombe, ‘‘Rejection of outliers,’’ Technometrics, vol. 2, no. 2,
pp. 123–146, May 1960.
This research only focuses on journal and conference papers [6] F. E. Grubbs, ‘‘Procedures for detecting outlying observations in sam-
related to outlier detection using neural networks. The pro- ples,’’ Technometrics, vol. 11, no. 1, pp. 1–21, 1969.
tocol excluded several papers at the initial review step. [7] D. M. Hawkins, Identification of Outliers, vol. 11. London, U.K.:
All papers match the research criteria and this review Chapman & Hall, 1980.
[8] V. Barnett and T. Lewis, ‘‘Outliers in statistical data,’’ in Wiley Series in
excludes approach only for specific purpose as shown in Probability and Mathematical Statistics (Applied Probability and Statis-
Table 3. Furthermore, the same principle applies to quality tics). Chichester, U.K.: Wiley, 1984.
evaluation, as shown in Table 4. [9] M. M. Breunig, H.-P. Kriegel, R. T. Ng, and J. Sander, ‘‘LOF: Identi-
fying density-based local outliers,’’ ACM SIGMOD Rec., vol. 29, no. 2,
pp. 93–104, 2000.
VI. CONCLUSION
[10] M.-F. Jiang, S.-S. Tseng, and C.-M. Su, ‘‘Two-phase clustering pro-
The protocol reaches 1422 papers as shown in Table 5. After cess for outliers detection,’’ Pattern Recognit. Lett., vol. 22, nos. 6–7,
applying inclusion and exclusion criteria detailed in Table 3, pp. 691–700, 2001.
and quality assessment described in section II-D there were [11] S. Hawkins, H. He, G. Williams, and R. Baxter, ‘‘Outlier detection using
replicator neural networks,’’ in Proc. Int. Conf. Data Warehousing Knowl.
76 papers left for the review as shown in Section V-A. Discovery. Berlin, Germany: Springer, 2002, pp. 170–180.
This study reviews anomaly detection using neural net- [12] T. Hu and S. Y. Sung, ‘‘Detecting pattern-based outliers,’’ Pattern Recog-
works techniques. The main goal is look three perspectives: nit. Lett., vol. 24, no. 16, pp. 3059–3068, Dec. 2003.
What kinds of Neural Networks are used to outlier detec- [13] C. C. Aggarwal and P. S. Yu, ‘‘An effective and efficient algorithm for
high-dimensional outlier detection,’’ VLDB J., vol. 14, no. 2, pp. 211–221,
tion, what are neural networks strengths for outlier detec- 2005.
tion and what are the current challenges of outlier detection [14] S. Sadik and L. Gruenwald, ‘‘Online outlier detection for data streams,’’
techniques. in Proc. 15th Symp. Int. Database Eng. Appl. (IDEAS), New York, NY,
USA, 2011, pp. 88–96.
[15] C. C. Aggarwal and P. S. Yu, ‘‘Outlier detection for high dimensional
A. FUTURE WORKS data,’’ in Proc. ACM SIGMOD Int. Conf. Manage. Data (SIGMOD), 2001,
There are some fascinating new research applications and pp. 37–46.
issue contexts where there may be significant room to develop [16] E. M. Knorr and R. T. Ng, ‘‘Finding intensional knowledge of distance-
based outliers,’’ in Proc. VLDB, vol. 99, 1999, pp. 211–222.
deep detection techniques. For example, the majority of shal- [17] L. N. D. Castro and D. G. Ferrari, ‘‘Introdução à mineração de dados:
low and deep models for anomaly identification assume that Conceitos básicos, algoritmos e aplicações,’’ São Paulo: Saraiva, vol. 5,
data anomalies are independently and uniformly distributed. pp. 1–376, Mar. 2016.
[18] N. Görnitz, M. Kloft, K. Rieck, and U. Brefeld, ‘‘Toward supervised
However, in reality, these assumptions are not always accu- anomaly detection,’’ J. Artif. Intell. Res., vol. 46, pp. 235–262, Jan. 2013.
rate, and it is an exciting gap to investigate. [19] O. Chapelle, B. Scholkopf, and A. Zien, ‘‘Semi-supervised learning
The manuscript will be enriched with publications from (Chapelle, O. et al., Eds.; 2006) [Book reviews],’’ IEEE Trans. Neural
Netw., vol. 20, no. 3, p. 542, Mar. 2009.
2022 onward, and additional research questions will be
[20] T. Schlegl, P. Seeböck, S. M. Waldstein, U. Schmidt-Erfurth, and
included; for example, what are data types used? What is G. Langs, ‘‘Unsupervised anomaly detection with generative adversarial
the anomaly detection technique used within each data type? networks to guide marker discovery,’’ in Proc. Int. Conf. Inf. Process.
Also, How is RNN formulated to detect deviations in serial Med. Imag. Cham, Switzerland: Springer, 2017, pp. 146–157.
[21] M. Valença, Aplicando Redes Neurais: UM GUIA COMPLETO. Olinda:
data? How is CNN used to extract features from array data? Meuser Valença, 2016.
Or how are attention models used to learn patterns in array [22] N. K. Manaswi, N. K. Manaswi, and S. John, Deep Learning With
data? Applications Using Python. Berkeley, CA, USA: Apress, 2018.
[23] D. P. Kingma and M. Welling, ‘‘Auto-encoding variational Bayes,’’ 2014,
The data types are fundamental because they can directly arXiv:1312.6114.
influence the choice of the most appropriate technique. [24] I. J. Goodfellow, J. Pouget-Abadie, M. Mirza, B. Xu, D. Warde-Farley,
Include discussions about the technical approaches that could S. Ozair, A. Courville, and Y. Bengio, ‘‘Generative adversarial networks,’’
be performed. At last, a discussion of the strategies and Assoc. Comput. Machinery, New York, NY, USA, Tech. Rep. ACM
63.11, 2014.
patterns can be valuable in providing deep analysis and inves- [25] L. Deng and D. Yu, ‘‘Deep learning: Methods and applications,’’ Found.
tigation of the topic. Trends Signal Process., vol. 7, nos. 3–4, pp. 197–387, Jun. 2014.
[26] B. Juba and H. S. Le, ‘‘Precision-recall versus accuracy and the role of
CONFLICT OF INTEREST large data sets,’’ in Proc. AAAI Conf. Artif. Intell., 2019, vol. 33, no. 1,
pp. 4039–4048.
The authors have no conflicts of interest to declare that are [27] Y. Sasaki, ‘‘The truth of the F-measure,’’ Teach Tutor Mater, vol. 1, no. 5,
relevant to the content of this article. pp. 1–5, 2007.


[28] K. Ding, Q. Zhou, H. Tong, and H. Liu, "Few-shot network anomaly detection via cross-network meta-learning," in Proc. Web Conf., Ljubljana, Slovenia, Apr. 2021, pp. 2448–2456.
[29] B. J. Beula Rani and L. Sumathi M. E, "Survey on applying GAN for anomaly detection," in Proc. Int. Conf. Comput. Commun. Informat. (ICCCI), Jan. 2020, pp. 1–5.
[30] G. Pang, C. Shen, L. Cao, and A. V. D. Hengel, "Deep learning for anomaly detection: A review," ACM Comput. Surv., vol. 54, no. 2, pp. 1–38, Mar. 2021.
[31] L. Ruff, J. R. Kauffmann, R. A. Vandermeulen, G. Montavon, W. Samek, M. Kloft, T. G. Dietterich, and K.-R. Müller, "A unifying review of deep and shallow anomaly detection," Proc. IEEE, vol. 109, no. 5, pp. 756–795, May 2021.
[32] M. Ahmed, A. N. Mahmood, and J. Hu, "A survey of network anomaly detection techniques," J. Netw. Comput. Appl., vol. 60, pp. 19–31, Jan. 2016.
[33] R. Chalapathy and S. Chawla, "Deep learning for anomaly detection: A survey," 2019, arXiv:1901.03407.
[34] J. Kauffmann, K.-R. Müller, and G. Montavon, "Towards explaining anomalies: A deep Taylor decomposition of one-class models," Pattern Recognit., vol. 101, May 2020, Art. no. 107198.
[35] J. Wang, W. Huang, S. Wang, P. Dai, and Q. Li, "LRGAN: Visual anomaly detection using GAN with locality-preferred recoding," J. Vis. Commun. Image Represent., vol. 79, Aug. 2021, Art. no. 103201.
[36] Z. Wang, Z. Chen, J. Ni, H. Liu, H. Chen, and J. Tang, "Multi-scale one-class recurrent neural networks for discrete event sequence anomaly detection," in Proc. 27th ACM SIGKDD Conf. Knowl. Discovery Data Mining, Aug. 2021, pp. 3726–3734.
[37] T. Zhao, C. Deng, K. Yu, T. Jiang, D. Wang, and M. Jiang, "Error-bounded graph anomaly loss for GNNs," in Proc. 29th ACM Int. Conf. Inf. Knowl. Manage., Oct. 2020, pp. 1873–1882.
[38] Q. Chen, A. Zhang, T. Huang, Q. He, and Y. Song, "Imbalanced dataset-based echo state networks for anomaly detection," Neural Comput. Appl., vol. 32, no. 8, pp. 3685–3694, Apr. 2020.
[39] D. Chakraborty, V. Narayanan, and A. Ghosh, "Integration of deep feature extraction and ensemble learning for outlier detection," Pattern Recognit., vol. 89, pp. 161–171, May 2019.
[40] X. Zhang, J. Mu, X. Zhang, H. Liu, L. Zong, and Y. Li, "Deep anomaly detection with self-supervised learning and adversarial training," Pattern Recognit., vol. 121, Jan. 2022, Art. no. 108234.
[41] R. Wang, K. Nie, Y.-J. Chang, X. Gong, T. Wang, Y. Yang, and B. Long, "Deep learning for anomaly detection," in Proc. 26th ACM SIGKDD Int. Conf. Knowl. Discovery Data Mining, Aug. 2020, pp. 3569–3570.
[42] L. Ahrens, J. Ahrens, and H. D. Schotten, "A machine-learning phase classification scheme for anomaly detection in signals with periodic characteristics," EURASIP J. Adv. Signal Process., vol. 2019, no. 1, p. 27, Dec. 2019.
[43] T. Chen, X. Liu, B. Xia, W. Wang, and Y. Lai, "Unsupervised anomaly detection of industrial robots using sliding-window convolutional variational autoencoder," IEEE Access, vol. 8, pp. 47072–47081, 2020.
[44] N. AlDahoul, H. Abdul Karim, and A. S. Ba Wazir, "Model fusion of deep neural networks for anomaly detection," J. Big Data, vol. 8, no. 1, p. 106, Dec. 2021.
[45] A. S. Hashmi and T. Ahmad, "GP-ELM-RNN: Garson-pruned extreme learning machine based replicator neural network for anomaly detection," J. King Saud Univ.-Comput. Inf. Sci., vol. 34, no. 5, pp. 1768–1774, May 2022.
[46] A.-A. Papadopoulos, M. R. Rajati, N. Shaikh, and J. Wang, "Outlier exposure with confidence control for out-of-distribution detection," Neurocomputing, vol. 441, pp. 138–150, Jun. 2021.
[47] H. Cevikalp, B. Uzun, O. Köpüklü, and G. Ozturk, "Deep compact polyhedral conic classifier for open and closed set recognition," Pattern Recognit., vol. 119, Nov. 2021, Art. no. 108080.
[48] H. Dai, J. Cao, T. Wang, M. Deng, and Z. Yang, "Multilayer one-class extreme learning machine," Neural Netw., vol. 115, pp. 11–22, Jul. 2019.
[49] G. S. Chadha, I. Islam, A. Schwung, and S. X. Ding, "Deep convolutional clustering-based time series anomaly detection," Sensors, vol. 21, no. 16, p. 5488, Aug. 2021.
[50] G. García González, P. Casas, A. Fernández, and G. Gómez, "On the usage of generative models for network anomaly detection in multivariate time-series," ACM SIGMETRICS Perform. Eval. Rev., vol. 48, no. 4, pp. 49–52, May 2021.
[51] Y. Chen, H. Zhang, Y. Wang, Y. Yang, X. Zhou, and Q. M. J. Wu, "MAMA Net: Multi-scale attention memory autoencoder network for anomaly detection," IEEE Trans. Med. Imag., vol. 40, no. 3, pp. 1032–1041, Mar. 2021.
[52] P. Zhao, X. Chang, and M. Wang, "A novel multivariate time-series anomaly detection approach using an unsupervised deep neural network," IEEE Access, vol. 9, pp. 109025–109041, 2021.
[53] J. Lee, M. Umar Karim Khan, and C.-M. Kyung, "Hybrid discriminator with correlative autoencoder for anomaly detection," IEEE Access, vol. 9, pp. 49098–49109, 2021.
[54] H. Sarvari, C. Domeniconi, B. Prenkaj, and G. Stilo, "Unsupervised boosting-based autoencoder ensembles for outlier detection," in Advances in Knowledge Discovery and Data Mining (Lecture Notes in Computer Science), vol. 12712, K. Karlapalem, H. Cheng, N. Ramakrishnan, R. K. Agrawal, P. K. Reddy, J. Srivastava, and T. Chakraborty, Eds. Cham, Switzerland: Springer, 2021, pp. 91–103.
[55] L. Deng, X. Chen, Y. Zhao, and K. Zheng, "HIFI: Anomaly detection for multivariate time series with high-order feature interactions," in Database Systems for Advanced Applications (Lecture Notes in Computer Science), vol. 12681, C. S. Jensen, E.-P. Lim, D.-N. Yang, W.-C. Lee, V. S. Tseng, V. Kalogeraki, J.-W. Huang, and C.-Y. Shen, Eds. Cham, Switzerland: Springer, 2021, pp. 641–649.
[56] P. Sperl, J.-P. Schulze, and K. Böttinger, "Activation anomaly analysis," in Machine Learning and Knowledge Discovery in Databases (Lecture Notes in Computer Science), vol. 12458, F. Hutter, K. Kersting, J. Lijffijt, and I. Valera, Eds. Cham, Switzerland: Springer, 2021, pp. 69–84.
[57] A. Geiger, D. Liu, S. Alnegheimish, A. Cuesta-Infante, and K. Veeramachaneni, "TadGAN: Time series anomaly detection using generative adversarial networks," in Proc. IEEE Int. Conf. Big Data (Big Data), Atlanta, GA, USA, Dec. 2020, pp. 33–43.
[58] M. A. Bashar and R. Nayak, "TAnoGAN: Time series anomaly detection with generative adversarial networks," in Proc. IEEE Symp. Comput. Intell. (SSCI), Canberra, ACT, Australia, Dec. 2020, pp. 1778–1785.
[59] S. Aburakhia, T. Tayeh, R. Myers, and A. Shami, "A transfer learning framework for anomaly detection using model of normality," in Proc. 11th IEEE Annu. Inf. Technol., Electron. Mobile Commun. Conf. (IEMCON), Vancouver, BC, Canada, Nov. 2020, pp. 0055–0061.
[60] G. Kwon, M. Prabhushankar, D. Temel, and G. AlRegib, "Novelty detection through model-based characterization of neural networks," in Proc. IEEE Int. Conf. Image Process. (ICIP), Abu Dhabi, United Arab Emirates, Oct. 2020, pp. 3179–3183.
[61] J. Audibert, P. Michiardi, F. Guyard, S. Marti, and M. A. Zuluaga, "USAD: Unsupervised anomaly detection on multivariate time series," in Proc. 26th ACM SIGKDD Int. Conf. Knowl. Discovery Data Mining, Aug. 2020, pp. 3395–3404.
[62] T. Ergen and S. S. Kozat, "Unsupervised anomaly detection with LSTM neural networks," IEEE Trans. Neural Netw. Learn. Syst., vol. 31, no. 8, pp. 3127–3141, Aug. 2020.
[63] T. Pimentel, M. Monteiro, A. Veloso, and N. Ziviani, "Deep active learning for anomaly detection," in Proc. Int. Joint Conf. Neural Netw. (IJCNN), Glasgow, U.K., Jul. 2020, pp. 1–8.
[64] T. Nolle, A. Seeliger, N. Thoma, and M. Mühlhäuser, "DeepAlign: Alignment-based process anomaly correction using recurrent neural networks," in Advanced Information Systems Engineering (Lecture Notes in Computer Science), vol. 12127, S. Dustdar, E. Yu, C. Salinesi, D. Rieu, and V. Pant, Eds. Cham, Switzerland: Springer, 2020, pp. 319–333.
[65] A. Frikha, D. Krompass, and V. Tresp, "ARCADe: A rapid continual anomaly detector," in Proc. 25th Int. Conf. Pattern Recognit. (ICPR), Milan, Italy, Jan. 2021, pp. 10449–10456.
[66] E. Hong and Y. Choe, "Latent feature decentralization loss for one-class anomaly detection," IEEE Access, vol. 8, pp. 165658–165669, 2020.
[67] Z. Ghafoori and C. Leckie, "Deep multi-sphere support vector data description," in Proc. SIAM Int. Conf. Data Mining, 2020, pp. 109–117.
[68] M. Hossein Zadeh Bazargani and B. Mac Namee, "The elliptical basis function data descriptor (EBFDD) network: A one-class classification approach to anomaly detection," in Machine Learning and Knowledge Discovery in Databases (Lecture Notes in Computer Science), vol. 11906, U. Brefeld, E. Fromont, A. Hotho, A. Knobbe, M. Maathuis, and C. Robardet, Eds. Cham, Switzerland: Springer, 2020, pp. 107–123.
[69] J. Lee, H. Bae, and S. Yoon, "Anomaly detection by learning dynamics from a graph," IEEE Access, vol. 8, pp. 64356–64365, 2020.
[70] P. M. Dassanayake, A. Anjum, W. Manning, and C. Bower, "A deep reinforcement learning based homeostatic system for unmanned position control," in Proc. 6th IEEE/ACM Int. Conf. Big Data Comput., Appl. Technol. (BDCAT), Auckland, New Zealand, Dec. 2019, pp. 127–136.


[71] A. Akhriev and J. Marecek, "Deep autoencoders with value-at-risk thresholding for unsupervised anomaly detection," in Proc. IEEE Int. Symp. Multimedia (ISM), San Diego, CA, USA, Dec. 2019, pp. 208–2083.
[72] J. Gowthamy, S. K. Swamy, M. Kumar, and M. Kumar, "Computation of person re-identification using self-learning anomaly detection framework in deep-learning," Int. J. Innov. Technol. Exploring Eng., vol. 9, no. 1, pp. 1106–1109, Nov. 2019.
[73] P. Perera and V. M. Patel, "Learning deep features for one-class classification," IEEE Trans. Image Process., vol. 28, no. 11, pp. 5450–5463, Nov. 2019.
[74] J. Henriksson, C. Berger, M. Borg, L. Tornberg, S. R. Sathyamoorthy, and C. Englund, "Performance analysis of out-of-distribution detection on various trained neural networks," in Proc. 45th Euromicro Conf. Softw. Eng. Adv. Appl. (SEAA), Kallithea-Chalkidiki, Greece, Aug. 2019, pp. 113–120.
[75] M.-H. Oh and G. Iyengar, "Sequential anomaly detection using inverse reinforcement learning," in Proc. 25th ACM SIGKDD Int. Conf. Knowl. Discovery Data Mining, Anchorage, AK, USA, Jul. 2019, pp. 1480–1490.
[76] T. Banerjee, G. Whipps, P. Gurram, and V. Tarokh, "Cyclostationary statistical models and algorithms for anomaly detection using multi-modal data," in Proc. IEEE Global Conf. Signal Inf. Process. (GlobalSIP), Anaheim, CA, USA, Nov. 2018, pp. 126–130.
[77] J. Shi, J. Xu, Y. Yao, and B. Xu, "Concept learning through deep reinforcement learning with memory-augmented neural networks," Neural Netw., vol. 110, pp. 47–54, Feb. 2019.
[78] M. Jain and G. Kaur, "A study of feature reduction techniques and classification for network anomaly detection," J. Comput. Inf. Technol., vol. 27, no. 4, pp. 1–16, Jun. 2020.
[79] D. Li, D. Chen, B. Jin, L. Shi, J. Goh, and S.-K. Ng, "MAD-GAN: Multivariate anomaly detection for time series data with generative adversarial networks," in Artificial Neural Networks and Machine Learning–ICANN 2019: Text and Time Series (Lecture Notes in Computer Science), vol. 11730, I. V. Tetko, V. Kurková, P. Karpov, and F. Theis, Eds. Cham, Switzerland: Springer, 2019, pp. 703–716.
[80] T. Kieu, B. Yang, C. Guo, and C. S. Jensen, "Outlier detection for time series with recurrent autoencoder ensembles," in Proc. 28th Int. Joint Conf. Artif. Intell., Macao, China, Aug. 2019, pp. 2725–2732.
[81] C. Chen, X. Lin, and G. Terejanu, "An approximate Bayesian long short-term memory algorithm for outlier detection," in Proc. 24th Int. Conf. Pattern Recognit. (ICPR), Beijing, China, Aug. 2018, pp. 201–206.
[82] M. Munir, S. A. Siddiqui, A. Dengel, and S. Ahmed, "DeepAnT: A deep learning approach for unsupervised anomaly detection in time series," IEEE Access, vol. 7, pp. 1991–2005, 2018.
[83] J. Chen, S. Sathe, C. Aggarwal, and D. Turaga, Outlier Detection With Autoencoder Ensembles. Philadelphia, PA, USA: Society for Industrial and Applied Mathematics, 2017, pp. 90–98.
[84] O. Al aama and H. Tamukoh, "Training autoencoder using three different reversed color models for anomaly detection," J. Robot., Netw. Artif. Life, vol. 7, no. 1, p. 35, 2020.
[85] R. Kiani, A. Keshavarzi, and M. Bohlouli, "Detection of thin boundaries between different types of anomalies in outlier detection using enhanced neural networks," Appl. Artif. Intell., vol. 34, no. 5, pp. 345–377, Apr. 2020.
[86] X. Liu, X. Di, Q. Ding, W. Liu, H. Qi, J. Li, and H. Yang, "NADS-RA: Network anomaly detection scheme based on feature representation and data augmentation," IEEE Access, vol. 8, pp. 214781–214800, 2020.
[87] M. O. Kaplan and S. E. Alptekin, "An improved BiGAN based approach for anomaly detection," Proc. Comput. Sci., vol. 176, pp. 185–194, Jan. 2020.
[88] S. Bulusu, B. Kailkhura, B. Li, P. K. Varshney, and D. Song, "Anomalous example detection in deep learning: A survey," IEEE Access, vol. 8, pp. 132330–132347, 2020.
[89] M. A. Albahar and M. Binsawad, "Deep autoencoders and feedforward networks based on a new regularization for anomaly detection," Secur. Commun. Netw., vol. 2020, pp. 1–9, Jul. 2020.
[90] F. Alharbi, K. El Hindi, S. Al Ahmadi, and H. Alsalamn, "Convolutional neural network-based discriminator for outlier detection," Comput. Intell. Neurosci., vol. 2021, pp. 1–13, Mar. 2021.
[91] M. Gu, J. Fei, and S. Sun, "Online anomaly detection with sparse Gaussian processes," Neurocomputing, vol. 403, pp. 383–399, Aug. 2020.
[92] T. Iwata, M. Toyoda, S. Tora, and N. Ueda, "Anomaly detection with inexact labels," Mach. Learn., vol. 109, no. 8, pp. 1617–1633, Aug. 2020.
[93] S. Akcay, A. Atapour-Abarghouei, and T. P. Breckon, "Skip-GANomaly: Skip connected and adversarially trained encoder–decoder anomaly detection," in Proc. Int. Joint Conf. Neural Netw. (IJCNN), Budapest, Hungary, Jul. 2019, pp. 1–8.
[94] P. Schlachter, Y. Liao, and B. Yang, "Deep one-class classification using intra-class splitting," in Proc. IEEE Data Sci. Workshop (DSW), Minneapolis, MN, USA, Jun. 2019, pp. 100–104.
[95] Y. Zhang, Y. Zhu, X. Li, X. Wang, and X. Guo, "Anomaly detection based on mining six local data features and BP neural network," Symmetry, vol. 11, no. 4, p. 571, Apr. 2019.
[96] R. K. Malaiya, D. Kwon, S. C. Suh, H. Kim, I. Kim, and J. Kim, "An empirical evaluation of deep learning for network anomaly detection," IEEE Access, vol. 7, pp. 140806–140817, 2019.
[97] N. Upasani and H. Om, "Optimized fuzzy min-max neural network: An efficient approach for supervised outlier detection," Neural Netw. World, vol. 28, no. 4, pp. 285–303, 2018.
[98] A. Munawar, P. Vinayavekhin, and G. De Magistris, "Limiting the reconstruction capability of generative neural network using negative learning," in Proc. IEEE 27th Int. Workshop Mach. Learn. Signal Process. (MLSP), Tokyo, Japan, Sep. 2017, pp. 1–6.
[99] R. Chalapathy, A. K. Menon, and S. Chawla, "Robust, deep and inductive anomaly detection," in Machine Learning and Knowledge Discovery in Databases (Lecture Notes in Computer Science), vol. 10534, M. Ceci, J. Hollmén, L. Todorovski, C. Vens, and S. Dzeroski, Eds. Cham, Switzerland: Springer, 2017, pp. 36–51.
[100] Y. Ji, J. Wang, S. Li, Y. Li, S. Lin, and X. Li, "An anomaly event detection method based on GNN algorithm for multi-data sources," in Proc. 3rd ACM Int. Symp. Blockchain Secure Crit. Infrastruct., Hong Kong, May 2021, pp. 91–96.
[101] L. Li, J. Yan, H. Wang, and Y. Jin, "Anomaly detection of time series with smoothness-inducing sequential variational auto-encoder," IEEE Trans. Neural Netw. Learn. Syst., vol. 32, no. 3, pp. 1177–1191, Mar. 2021.
[102] M. Sabokrou, M. Fathy, G. Zhao, and E. Adeli, "Deep end-to-end one-class classifier," IEEE Trans. Neural Netw. Learn. Syst., vol. 32, no. 2, pp. 675–684, Feb. 2021.
[103] D. Kwon, H. Kim, J. Kim, S. C. Suh, I. Kim, and K. J. Kim, "A survey of deep learning-based network anomaly detection," Cluster Comput., vol. 22, pp. 949–961, Jan. 2019.
[104] P. S. Maciąg, M. Kryszkiewicz, R. Bembenik, J. L. Lobo, and J. Del Ser, "Unsupervised anomaly detection in stream data with online evolving spiking neural networks," Neural Netw., vol. 139, pp. 118–139, Jul. 2021.
[105] Y. Wang, J. Zhang, S. Guo, H. Yin, C. Li, and H. Chen, "Decoupling representation learning and classification for GNN-based anomaly detection," in Proc. 44th Int. ACM SIGIR Conf. Res. Develop. Inf. Retr., Jul. 2021, pp. 1239–1248.
[106] B. Ouyang, Y. Song, Y. Li, G. Sant, and M. Bauchy, "EBOD: An ensemble-based outlier detection algorithm for noisy datasets," Knowl.-Based Syst., vol. 231, Nov. 2021, Art. no. 107400.
[107] Y. Li, N. Liu, J. Li, M. Du, and X. Hu, "Deep structured cross-modal anomaly detection," in Proc. Int. Joint Conf. Neural Netw. (IJCNN), Budapest, Hungary, Jul. 2019, pp. 1–8.
[108] A. Krizhevsky and G. Hinton, "Learning multiple layers of features from tiny images," Univ. Toronto, Toronto, ON, Canada, Tech. Rep., 2009.
[109] L. Deng, "The MNIST database of handwritten digit images for machine learning research," IEEE Signal Process. Mag., vol. 29, no. 6, pp. 141–142, Nov. 2012.
[110] I. Goodfellow, Y. Bengio, and A. Courville, Deep Learning. Cambridge, MA, USA: MIT Press, 2016.
[111] A. Krizhevsky, I. Sutskever, and G. E. Hinton, "ImageNet classification with deep convolutional neural networks," in Proc. Adv. Neural Inf. Process. Syst., vol. 25, 2012, pp. 1097–1105.
[112] K. He, X. Zhang, S. Ren, and J. Sun, "Deep residual learning for image recognition," in Proc. IEEE Conf. Comput. Vis. Pattern Recognit. (CVPR), Jun. 2016, pp. 770–778.
[113] K. Simonyan and A. Zisserman, "Very deep convolutional networks for large-scale image recognition," 2014, arXiv:1409.1556.
[114] P. Bergmann, K. Batzner, M. Fauser, D. Sattlegger, and C. Steger, "The MVTec anomaly detection dataset: A comprehensive real-world dataset for unsupervised anomaly detection," Int. J. Comput. Vis., vol. 129, pp. 1038–1059, Apr. 2021.
[115] G. Pang, A. van den Hengel, C. Shen, and L. Cao, "Toward deep supervised anomaly detection: Reinforcement learning from partially labeled anomaly data," in Proc. 27th ACM SIGKDD Conf. Knowl. Discovery Data Mining, 2021, pp. 1298–1308.


[116] G. Pang, C. Shen, H. Jin, and A. van den Hengel, "Deep weakly-supervised anomaly detection," 2019, arXiv:1910.13601.
[117] G. Pang, C. Shen, and A. van den Hengel, "Deep anomaly detection with deviation networks," in Proc. 25th ACM SIGKDD Int. Conf. Knowl. Discovery Data Mining, Jul. 2019, pp. 353–362.
[118] L. Ruff, R. A. Vandermeulen, N. Görnitz, A. Binder, E. Müller, K.-R. Müller, and M. Kloft, "Deep semi-supervised anomaly detection," 2019, arXiv:1906.02694.

JOSÉ EDSON DE ALBUQUERQUE FILHO received the B.Sc. and M.Sc. degrees in computer science from the Federal University of Pernambuco, Recife, Brazil, in 2002 and 2003, respectively. He is currently pursuing the Ph.D. degree with the Department of Computer Engineering, University of Pernambuco. He was a Professor with the Faculdade Maurício de Nassau and a Software Engineer Coordinator with the Data Process Federal Bureau (SERPRO), Brazil. He is a Senior System Analyst with the Business Intelligence Division, Pernambuco Prosecution Office. His research interests include machine learning, neural networks, and anomaly detection.

LAISLLA C. P. BRANDÃO received the B.Sc. degree in electrical and electronic engineering from the Federal University of Pernambuco, Recife, Brazil, in 2011. She is currently pursuing the M.Sc. degree with the Department of Computer Engineering, University of Pernambuco, Recife. Her research interests include artificial intelligence and the design and analysis of machine learning algorithms for anomaly detection and the prediction of anomalies applied to the welding process of the automotive industry.

BRUNO JOSÉ TORRES FERNANDES (Senior Member, IEEE) received the B.Sc., M.Sc., and Ph.D. degrees in computer science from the Federal University of Pernambuco, Recife, Brazil, in 2007, 2009, and 2013, respectively. He is currently an Associate Professor with the University of Pernambuco, where he received the title of Livre-Docente, in 2017. He is also a CNPq Productivity Fellow of technological development, the Coordinator with the Computer Vision Laboratory, Instituto de Inovação Tecnológica (IIT-UPE), and the Head of the Pattern Recognition and Digital Image Processing Research Group, UPE. His research interests include machine learning, computer vision, image processing, and neural networks. He was a recipient of awards, including the 2008 Google Academic Prize as the Top M.Sc. Student in the Federal University of Pernambuco and the Science and Technology Award for Outstanding Research in the Polytechnic School, University of Pernambuco, in 2011 and 2017.

ALEXANDRE M. A. MACIEL received the Ph.D. degree in computer science from the Federal University of Pernambuco, in 2012. He was the Chief Scientist of the Technological Innovation Institute, UPE (IIT/UPE), a member of the Innovation Chamber of Fundação de Amparo à Pesquisa do Estado de Pernambuco (FACEPE) (Foundation of Support for Science and Technology of the State of Pernambuco), and a Visiting Researcher at the State Agency for Information Technology (ATI). He is an Associate Professor of computer engineering and the Coordinator of the Data Science and Analytics Course with the University of Pernambuco (UPE). Currently, he is the General Manager of Innovation Environments with the Secretary of Science, Technology, and Innovation of Pernambuco State. Dr. Maciel received the Santander Science and Innovation Award in the Information Technology and Education Category.