
Highlighted Anomaly and Intrusion Detection Using Deep Learning For Software-Defined Networks A Survey

This survey reviews the application of deep learning techniques for anomaly and intrusion detection in Software-Defined Networks (SDN), emphasizing the importance of Network Intrusion Detection Systems (NIDS). It examines essential components such as benchmark datasets, data preprocessing, deep learning modeling, hyperparameter tuning, and performance evaluation, highlighting the growing trend in research since 2021. The paper identifies gaps in existing literature and suggests future research directions to enhance the effectiveness of NIDS in SDNs.


Expert Systems With Applications 256 (2024) 124982

Contents lists available at ScienceDirect

Expert Systems With Applications


journal homepage: www.elsevier.com/locate/eswa

Anomaly and intrusion detection using deep learning for software-defined networks: A survey

Vitor Gabriel da Silva Ruffo a, Daniel Matheus Brandão Lent b, Mateus Komarchesqui a, Vinícius Ferreira Schiavon a, Marcos Vinicius Oliveira de Assis c, Luiz Fernando Carvalho d, Mario Lemes Proença Jr. a,∗

a Computer Science Department, State University of Londrina, Londrina, Paraná, Brazil
b Electrical Engineering Department, State University of Londrina, Londrina, Paraná, Brazil
c Engineering and Exact Department, Federal University of Paraná, Paraná, Brazil
d Federal Technological University of Paraná, Paraná, Brazil

ARTICLE INFO

Keywords: Literature review; NIDS; SDN; Datasets; Deep learning; Hyperparameters

ABSTRACT

Software-Defined Networks (SDN) represent an adaptable paradigm for dealing with network users’ dynamic demands. Confidentiality, integrity, and availability are fundamental pillars for the security of the networks, which are often targeted by cyberattacks. The scientific community has been recently exploring deep learning to implement Network Intrusion Detection Systems (NIDS) against network attacks. In this survey, we aim to present an empirical literature review on state-of-the-art NIDS based on deep learning for defending SDNs. The essential steps to develop such systems are carefully examined: benchmark datasets, data preprocessing, deep learning modeling, hyperparameter tuning, and performance evaluation. There has been a growing trend in published works since 2021, underpinning the importance of the research field, which is still active and under investigation. We support the development of the area by discussing the identified open issues and future research directions.

1. Introduction

Computer networks have become as essential to modern society as energy and water services. Broadband, low-latency networks such as Ethernet and 5G over the TCP/IP architecture enable numerous applications. Examples include communication, cloud computing, distance education, digital banking, online shopping, on-demand entertainment, AI assistance, autonomous vehicles, and the Internet of Things. The networks’ serviceability has caused the number of connected devices to grow exponentially in recent decades. Consequently, the volume of network traffic and stored personal user information has drastically increased (Gupta, Jindal, & Bedi, 2021; Imrana, Xiang, Ali, & Abdul-Rauf, 2021; Javed, Khayat, Elghariani, & Ghafoor, 2023).

Traditional networks lack adaptability and struggle to meet users’ increasingly complex and dynamic requirements. Conversely, Software-Defined Networks (SDN) emerged as a flexible paradigm for simplifying network management. Network control is decoupled from the data plane and moved to a central entity, the controller. This mechanism controls the whole network, facilitating network orchestration and policy enforcement (Polat, Türkoğlu, Polat, & Şengür, 2022; Sattari et al., 2022). Service providers like Google and Microsoft have been exploring SDN’s advantages over the traditional paradigm to implement their global wide area networks (Zhang et al., 2023).

Malicious agents aim to disrupt network services’ confidentiality, integrity, or availability to achieve their goals (Aydın, Orman, & Aydın, 2022; Gupta, Tripathi, & Grover, 2022; Mustapha et al., 2023). Cyberattacks may compromise the services’ functioning, potentially leading to financial and reputational losses and harming human lives (Friha et al., 2022; Hidalgo et al., 2022).

Network Intrusion Detection Systems (NIDS) are among the most popular security solutions studied to promptly identify cyberattacks on network services (Friha et al., 2023; Qazi, Imran, Haider, Shoaib, & Razzak, 2022; Udas, Karim, & Roy, 2022). NIDS are positioned strategically on a network, gathering and analyzing all incoming traffic. Network administrators are notified to counter the problem whenever a traffic anomaly is identified. Some solutions

∗ Corresponding author.
E-mail addresses: [email protected] (V.G. da Silva Ruffo), [email protected] (D.M. Brandão Lent), [email protected]
(M. Komarchesqui), [email protected] (V.F. Schiavon), [email protected] (M.V.O. de Assis), [email protected] (L.F. Carvalho),
[email protected] (M.L. Proença Jr.).

https://doi.org/10.1016/j.eswa.2024.124982
Received 11 May 2024; Received in revised form 30 June 2024; Accepted 1 August 2024
Available online 5 August 2024
0957-4174/© 2024 Elsevier Ltd. All rights are reserved, including those for text and data mining, AI training, and similar technologies.
V.G. da Silva Ruffo et al. Expert Systems With Applications 256 (2024) 124982

even include a mitigation module to attenuate the detected anomalies automatically (Illy & Kaddoum, 2023; Myneni, Chowdhary, Huang, & Alshamrani, 2022).

The scientific community has been studying novel strategies to implement efficient and robust NIDS. Deep Learning (DL) is a state-of-the-art solution applied to intrusion detection. DL represents a class of mathematical models developed to identify complex patterns in high-dimensional data (Hairab, Said Elsayed, Jurcut, & Azer, 2022; Yousuf & Mir, 2022). Deep Learning models are suitable for handling network traffic since they can effectively process high volumes of data. Examples include the Long Short-Term Memory (LSTM) (Tayfour et al., 2023) and Gated Recurrent Unit (GRU) (Assis, Carvalho, Lloret, & Proença, 2021) networks, which can discover long-term dependencies in sequential data. Also, Convolutional Neural Network (CNN) (Fox & Boppana, 2023) models are usually applied to identify spatial patterns in the processed input. Autoencoder (AE) (Fouladi, Ermiş, & Anarim, 2022) networks compress data to a fundamental low-dimensional representation. Generative Adversarial Networks (GAN) (Shu, Zhou, Zhang, Du, & Guizani, 2021) efficiently capture the statistical distribution of the training data.

This survey article presents a comprehensive, empirical review of state-of-the-art deep learning-based intrusion detection for SDNs. None of the reviewed related works simultaneously addressed benchmark datasets, data preprocessing, deep learning modeling, hyperparameter tuning, and performance evaluation. The present study aims to fill this research gap by discussing these critical topics often employed jointly in developing such systems. Combined with a holistic view of the current intrusion detection scenario, an up-to-date taxonomy of deep learning model applications, public datasets, and dataset generation tools is presented. We support innovation in the study field by pinpointing the research directions that require further investigation. The contributions of this work are six-fold:

• Present the most used datasets, attacks, and tools for developing and evaluating intrusion detection systems to inform researchers of standard and underexplored data sources;
• List the preprocessing steps commonly applied to NIDS input data and their frequency for newcomers’ awareness;
• Introduce an up-to-date application taxonomy of state-of-the-art deep learning models, highlighting the established and superficially explored ones;
• Discuss how state-of-the-art works usually optimize deep learning models’ hyperparameters;
• Quantify the most popular performance metrics among the scientific community to benchmark and compare intrusion detection systems;
• Identify the knowledge gaps in the area, pointing out future research directions.

The remainder of this survey article is structured as follows: Section 2 reviews the related works; Section 3 briefly introduces the background theory; Section 4 discusses the methodology applied to conduct this paper’s research; Section 5 summarizes the found results; Section 6 points out the open issues and future research directions within the area of study; and Section 7 elaborates on the conclusions of the work.

2. Related works

In this section, we review relevant studies from leading journals that survey intrusion detection using Machine and Deep Learning. The general outline and particular insights are discussed for each work. We conclude by pointing out how our work builds upon them, providing an up-to-date, thorough view of the state of the art.

Yang et al. (2022) proposed a systematic and comprehensive literature review that surveyed 119 highly cited papers on anomaly-based network intrusion detection approaches. The authors covered the application of IDS in the so-called Internet (Traditional Networks), Internet of Things (IoT), Software-Defined Networks, and Industrial Control Networks (ICN). This study highlighted that 70% of the analyzed papers apply to conventional networks rather than a specific application domain (e.g., SDN and IoT), indicating that security in traditional networks is an important research topic. It is also pointed out that the lack of datasets and the confidentiality of industrial data are key factors limiting the development of IDSs in SDN and ICN, respectively. In addition, data preprocessing proved necessary for building high-performance detectors, and traditional supervised machine learning is the most applied technology. They also present the datasets and performance metrics usually utilized for experimentation.

de Souza, Westphall, Machado, Loffi, Westphall, and Geronimo (2022) conducted a systematic literature review that addresses 108 studies on intrusion detection and mitigation in fog computing and IoT-based environments. The authors categorize IDSs according to their detection analysis approach, as either signature-based or behavior-based, and their deployment strategy, distinguishing between Network-based IDS and Host-based IDS. Due to the resource constraints of IoT devices, implementing monitoring systems in the fog is considered one of the most promising strategies in this study. The Machine Learning techniques evaluated were divided into supervised, unsupervised, semi-supervised, ensemble, and reinforcement learning. Besides that, collaborative IDS approaches, post-detection countermeasures, evaluation strategies, and leading public datasets were also pointed out.

Nuaimi, Fourati, and Hamed (2023) published a systematic literature review that presents novel learning-based approaches for intrusion detection in the Industrial Internet of Things (IIoT) in papers between 2017 and 2022. The authors employ the PRISMA (Preferred Reporting Items for Systematic Reviews and Meta-Analyses) methodology to facilitate reporting systematic reviews and meta-analyses. This work surveys the IIoT domain in the selected articles, provides a brief background on different systems and architectures for such domain, and exposes the addressed intrusion detection methods, datasets, metrics, and attacks. The security threats are organized into five categories based on the security aspect they compromise. Also, the reviewed ML/DL models are separated into supervised, unsupervised, and reinforcement learning, and the IDS’s placement (centralized, distributed, or hybrid) and approach (signature or anomaly-based) are considered. This survey concluded that deep learning models tend to be better than machine learning at detecting network attacks and suggests that Federated Learning and Blockchain should be explored further. Finally, it is stated that most IIoT devices are battery-powered; therefore, power consumption must be taken into account.

M. and Sethuraman (2023) present a comprehensive survey of Machine and Deep Learning-based malware detection approaches applied to Sandboxing, Android, iOS, Windows, IoT, APT, and Ransomware. The authors noted that most malicious software exploitation targets the Windows operating system, which is classified into two areas: bulk automated for widespread malware and sophisticated specialized malware. According to this work, these threats can be detected statically, by examining the executable file of the potentially malicious software, or dynamically, by evaluating its behavior while executing the malicious program in a virtual sandboxed environment. It is inferred that static analysis techniques cannot correctly handle obfuscated malware but provide better detection for familiar malware families. Hence, the dynamic approach or a hybrid of both is employed for more efficient results.

Taheri, Ahmed, and Arslan (2023) surveyed the utilization of Deep Learning algorithms for the security of Software-Defined Networks. The authors separate Deep Learning into supervised discriminative learning, unsupervised generative learning, and hybrid deep learning, like transfer and reinforcement learning. This work also analyses the architectural vulnerabilities of the SDN, organizing which type of attack thrives in each plane. It provides an in-depth analysis of the


datasets used throughout security papers and reviews the commonly applied performance metrics. Finally, it is pointed out that the need for large-scale, high-quality datasets majorly affects the overall performance of Deep Learning models that have notoriously lengthy and resource-intensive training phases.

Abdulganiyu, Ait Tchakoucht, and Saheed (2023) compare the signature, anomaly, and hybrid-based Network Intrusion Detection System approaches. The authors analyze every Machine and Deep Learning model applied to each approach, collecting the most used evaluation metrics and datasets. Thus, it is pointed out that anomaly-based systems account for the majority of research in NIDS. They presented that most papers evaluate the proposed systems with simulated datasets, cover only a fraction of the system, use biased parameters, and report questionable results, which may compromise the application of such systems in the real world.

Bilot, Madhoun, Agha, and Zouaoui (2023) surveyed the different types of graph-based learning employed in network and host intrusion detection systems, reviewing current methods and datasets. The authors also evaluate the robustness of IDS based on Graph Neural Network (GNN) against adversarial attacks. Even though GNN is relatively neglected in cybersecurity, it holds great promise, especially compared to traditional ML and most DL models. This method can learn and detect complex attacks and transcend obfuscation strategies, proving its strong suitability for security systems.

Melis, Sadi, Berardi, Callegati, and Prandini (2023) conducted a systematic literature review of offensive and defensive security solutions for Software-Defined Networks. In this work, the authors gathered 466 relevant publications and submitted them to quantitative and qualitative analysis to aggregate the literature. As expected, the number of published papers increased from 2015 to 2021. The correlations between the articles’ keywords and research questions highlight the connected topics and present open challenges. This analysis evidenced the focus on anti-DoS techniques, the absence of an exhaustive threat model, and the need for more variety of technologies.

Sabeel, Heydari, El-Khatib, and Elgazzar (2024) surveyed recent techniques for detecting atypical, polymorphic, and unknown network attacks using Deep Learning. The authors separated the detection methods into misuse, anomaly, hybrid, adversarial generation, transfer learning, and reinforcement-based approaches, presenting background on the reviewed attacks and the often-used datasets. They concluded that many implementations rely on supervised learning, which is restricted to a network traffic distribution and may perform poorly in practical scenarios. Only a few labeled and quality network security datasets are available, and most training data have many features that demand an attribute extraction phase, making the IDS more complex. It is presented that a good evaluation metric selection, alongside explainability techniques, is paramount to avoid biased results.

Ozkan-Okay, Akin, Aslan, Kosunalp, Iliev, Stoyanov, and Beloev (2024) review Machine Learning, Deep Learning, Reinforcement Learning, and AI tools such as ChatGPT devoted to cybersecurity, presenting each approach categorized into different learning algorithms, models, and applications. This work addresses malware and intrusion detection, vulnerability assessment, data quality, interpretability, and adversarial attacks. The authors point out that voluminous datasets are fundamental for Machine and Deep Learning techniques to thrive against threats, and their performance in real-world scenarios can be affected by poor training. Conversely, Reinforcement Learning, especially combined with other strategies, has shown significant potential for cybersecurity defense and should be further explored. Finally, ChatGPT-like tools hold great promise as valuable resources for enhancing security.

Table 1 summarizes the related works based on their coverage of fundamental topics within deep learning-based intrusion detection. Note that we only considered a specific topic to be covered by a work when the authors discussed it comprehensively. Many studies presented an analysis of both datasets and evaluation metrics. Only one work conducted a detailed review of preprocessing methods, and none discussed hyperparameter tuning. Despite the critical importance of these subjects, no reviewed study simultaneously addressed the most used benchmark datasets, data preprocessing methods, hyperparameter tuning techniques, and performance evaluation metrics. We bridge this gap by conducting an updated, in-depth review of the state of the art, embracing all elementary topics.

3. Background

3.1. Deep learning

Deep Learning represents a subset of Machine Learning. It comprises mathematical models that are made of multiple layers of abstraction. These models mimic the human brain and can find complex patterns in large volumes of data without human intervention (Elsayed, Hamada, Abdalla, & Elsaid, 2023; Shaji, Jain, Muthalagu, & Pawar, 2023).

3.1.1. Models

Fig. 1 illustrates examples of common Deep Learning models, including Deep Neural Network (DNN), CNN, AE, GAN, LSTM, GRU, and Graph Neural Network (GNN). These models are based on the concept of collections of artificial neurons that collectively calculate a non-linear function of the input. The DNN (Cil, Yildiz, & Buldu, 2021), or Multilayer Perceptron (MLP), is the most basic neural network within DL, comprising a stack of layers of artificial neurons. Each layer progressively calculates higher-level patterns from input data, producing a corresponding output.

CNN (Nguyen & Le, 2023) is a neural network specialized to discover spatial patterns in data through convolutional and pooling operations. It is usually less computationally complex than DNN since its internal architecture utilizes fewer connections between neurons, reducing the trainable parameters.

The AE (Ding, Kou, & Wu, 2022) is a neural network with a structure similar to DNN aimed at compressing and decompressing data accurately. It is composed of two internal subnetworks, called encoder and decoder. The former is responsible for calculating a low-dimensional representation of input data. The latter takes the codified input and reconstructs it as precisely as possible.

The GAN (Siniosoglou, Radoglou-Grammatikis, Efstathopoulos, Fouliras, & Sarigiannidis, 2021) model is formed by two inner networks with opposite training objectives: the discriminator and the generator. The discriminator is a binary classifier that aims to correctly distinguish between samples coming from inside and outside its training set. The generator strives to create synthetic samples that resemble the training set, fooling the discriminator into misclassifying them. A well-trained GAN model’s generator can produce convincing fake samples that follow the training data distribution.

None of the previously described models can remember information about past processed samples. Conversely, the LSTM (Sahu et al., 2022) and GRU (Ahmad, Wan, & Ahmad, 2023) networks are improved versions of the traditional Recurrent Neural Network (RNN), capable of learning long-term temporal dependencies between data instances. They are composed of neurons that feed their own output back as input. This design allows the neurons to retain information about previously processed data, which can be leveraged when processing new incoming samples. These recurrent models generally demand more computing resources than the previous networks, since they utilize neurons with a more complex internal architecture.

The GNN (Caville, Lo, Layeghy, & Portmann, 2022) uses a graph’s topological structure to aggregate neighboring node features. Each node encapsulates its intrinsic properties and the context provided by its neighbors. Through a process known as message passing, each node iteratively exchanges information with its neighbors, and the aggregated information is used to update its representation. The resulting node embeddings are n-dimensional vector representations that capture topological and node-specific information. This process can be extended to edges and entire graphs.
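The message-passing idea described for the GNN can be illustrated with a minimal sketch. It is not taken from any of the cited works: the function name `message_passing_step`, the toy three-node graph, and the choice of a plain element-wise mean as the aggregation are assumptions made for illustration, and the learned weights and non-linearity that a real GNN layer applies after aggregation are deliberately left out.

```python
def message_passing_step(features, neighbors):
    """One round of mean-aggregation message passing (illustrative only).

    features:  dict mapping node -> list of floats (current representation)
    neighbors: dict mapping node -> list of adjacent nodes
    """
    updated = {}
    for node, own in features.items():
        # Messages: the node's own vector plus the vectors of its neighbors.
        messages = [own] + [features[n] for n in neighbors[node]]
        # Aggregate with an element-wise mean to form the new representation.
        updated[node] = [sum(vals) / len(messages) for vals in zip(*messages)]
    return updated


# Hypothetical toy graph: a path 0 - 1 - 2 with two features per node.
neighbors = {0: [1], 1: [0, 2], 2: [1]}
features = {0: [1.0, 0.0], 1: [0.0, 1.0], 2: [1.0, 1.0]}

h1 = message_passing_step(features, neighbors)  # each node sees 1-hop context
h2 = message_passing_step(h1, neighbors)        # node 0 now also reflects node 2
```

After one round, each node's vector mixes its own features with those of its direct neighbors; repeating the step lets information travel one hop further each time, which is how the embeddings come to capture topological context.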


Table 1
Related surveys on deep learning-based intrusion detection.

Reference                   Benchmark datasets   Data preprocessing   Hyperparameter tuning   Performance evaluation
Yang et al. (2022)          ✓                    ✓                    ✗                       ✓
de Souza et al. (2022)      ✓                    ✗                    ✗                       ✗
Nuaimi et al. (2023)        ✓                    ✗                    ✗                       ✓
M. and Sethuraman (2023)    ✗                    ✗                    ✗                       ✗
Taheri et al. (2023)        ✓                    ✗                    ✗                       ✓
Abdulganiyu et al. (2023)   ✓                    ✗                    ✗                       ✓
Bilot et al. (2023)         ✓                    ✗                    ✗                       ✗
Melis et al. (2023)         ✗                    ✗                    ✗                       ✗
Sabeel et al. (2024)        ✓                    ✗                    ✗                       ✓
Ozkan-Okay et al. (2024)    ✗                    ✗                    ✗                       ✗
Our work (2024)             ✓                    ✓                    ✓                       ✓

Fig. 1. Common deep learning models: (a) Deep Neural Network, (b) Convolutional Neural Network, (c) Autoencoder, (d) Generative Adversarial Network, (e) LSTM/GRU, and (f) Graph Neural Network.

3.1.2. Learning paradigms

DL models can be categorized as supervised, unsupervised, federated, or reinforcement learning based on how the learning process occurs. A supervised model is trained on labeled data to approximate a function that accurately maps input samples to their correct labels (Fouladi et al., 2022; Nadeem, Goh, Aun, & Ponnusamy, 2023). Labels are continuous or discrete values associated with each data sample that provide information about it. An illustrative example of supervised learning is categorizing cat and dog images. Each data sample comprises a cat or dog picture and a discrete label indicating the depicted animal. By training on labeled data first, a CNN classifier can learn to accurately assign labels to unseen cat and dog images, effectively discerning the type of animal presented in each analyzed input.

In real-world scenarios, it is common for data to be unlabeled, as labeling a dataset is very costly (Yang, Song, King, & Xu, 2023). Unsupervised learning-based models are utilized to find underlying, hidden structures in unlabeled data (Van Engelen & Hoos, 2020). For instance, the GAN model is widely applied to learn the statistical distribution of an unlabeled dataset. The trained network can generate new synthetic data samples that are not in the training set but fit its distribution.

Federated learning is an implementation paradigm that may be applied over supervised or unsupervised models. It addresses data privacy and distribution challenges in deep learning. Unlike a traditional centralized approach, where data is aggregated on a single server for training, federated learning enables model training directly on decentralized data sources, like mobile devices or edge servers (Duy et al., 2021). Each device maintains control over its data, and only model updates, rather than raw data, are exchanged with a central server. This decentralized approach to training allows for collaborative model learning across distributed devices while mitigating privacy risks associated with centralized data collection (Houda, Hafid, & Khoukhi, 2023).

Reinforcement learning introduces a dynamic interaction between an agent and its environment (Shukla, Maheshwary, Subramanian, Shilpa, & Varma, 2023). The model does not rely on labeled or unlabeled data. Instead, it learns through trial and error, taking actions in an environment and receiving rewards or penalties based on the consequences of its actions. This feedback loop enables the agent to improve its decision-making policy iteratively, ultimately maximizing cumulative rewards over time (Kim et al., 2022).

3.2. Software-defined networks

The Software-Defined Network paradigm removes the network control decision from the forwarding devices, moving it to a logically centralized entity, the controller. Control rules and network policies are defined and updated only on the controller, which installs them into the network switches and routers automatically. This design approach allows for a centralized view, programmability, and simplified management of the whole network (Kumar et al., 2023; Zhou, Zheng, Jia, & Shu, 2023).

Fig. 2 illustrates the SDN architecture and network planes. The data plane represents the forwarding devices responsible for directing incoming data to the correct destination. The controller resides in the control plane. It utilizes the southbound API to send control rules to the forwarders or to gather traffic statistics from them. The application plane comprises the software needed for managing the network. The administrators utilize the northbound API provided by the controller to specify and automate the desired network policies utilizing a high-level programming language (Lent et al., 2022; Liu et al., 2022).
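The division of labor between the three planes can be made concrete with a toy model. The sketch below is only an illustration under stated assumptions: the classes `FlowRule`, `Switch`, and `Controller`, their method names, and the action strings are all hypothetical and do not correspond to OpenFlow or to any real controller API. It shows a policy entering through a northbound-style call, rules being pushed to switches through a southbound-style call, and traffic statistics flowing back to the controller.

```python
from dataclasses import dataclass, field


@dataclass
class FlowRule:
    match_dst: str  # destination address the rule matches on
    action: str     # hypothetical action label, e.g. "forward:2" or "drop"


@dataclass
class Switch:
    """Data-plane device: holds a flow table and per-destination packet counters."""
    name: str
    flow_table: list = field(default_factory=list)
    packet_counts: dict = field(default_factory=dict)

    def install(self, rule):
        self.flow_table.append(rule)
        self.packet_counts[rule.match_dst] = 0

    def handle_packet(self, dst):
        for rule in self.flow_table:
            if rule.match_dst == dst:
                self.packet_counts[dst] += 1
                return rule.action
        return "send_to_controller"  # table miss: defer the decision


class Controller:
    """Control-plane entity; methods stand in for northbound/southbound calls."""

    def __init__(self, switches):
        self.switches = {s.name: s for s in switches}

    def apply_policy(self, dst, action):
        # Northbound side: an application states a network-wide policy.
        for name in self.switches:
            self.push_rule(name, FlowRule(dst, action))

    def push_rule(self, switch_name, rule):
        # Southbound side: install the rule on one forwarding device.
        self.switches[switch_name].install(rule)

    def collect_stats(self):
        # Southbound side: gather traffic counters from every device.
        return {name: dict(sw.packet_counts) for name, sw in self.switches.items()}
```

In this toy setting, an application would call `Controller.apply_policy("10.0.0.5", "drop")` once, and every switch would then answer `"drop"` from `handle_packet("10.0.0.5")` without holding any decision logic of its own. The counters returned by `collect_stats()` are the kind of per-flow data that a management application on top of the controller could consume.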


Fig. 2. SDN architecture and network planes.

As traditional networks grow, new devices and rules need to be introduced. However, the introduction process is hindered by different factors, such as manufacturer software and compatibility between devices. The management functionalities are limited to the forwarding devices’ software, which can become expensive to change for larger networks. Software-defined networks do not have this problem since they frequently utilize an open protocol to control each device. For this reason, management systems such as intrusion detection and mitigation can be attached to a network without requiring major reconfiguration, which would not always be possible for traditional ones, such as the Internet, Internet of Things, and Industrial Control System networks. Besides, the controller can request traffic flow data from its switches, which can feed an intrusion detection module through the northbound interface. Similarly, a mitigation module can quickly implement new forwarding rules through the controller to every device in time to avoid significant consequences from attacks (Choobdar, Naderan, & Naderan, 2021).

SDN’s centralized nature makes the controller a point where malicious attacks may focus. For example, an attacker may conduct a Distributed Denial of Service (DDoS) attack to take the controller down, jeopardizing the whole network operation and affecting the services’ availability (Long & Jinsong, 2022). The SDN can also be targeted by other types of attacks, such as scanning, to obtain information about the target network and allow the execution of other attacks; hijacking, to gain total control over a network element; tampering, to compromise the integrity of network data; and man-in-the-middle, to listen to the communication between the controller and the data plane, endangering their confidentiality (Li, Meng, & Kwok, 2016; Nisar et al., 2020). The scientific community has been studying and implementing efficient protection mechanisms to safeguard the confidentiality, integrity, and availability of SDN services (Assis et al., 2021).

define anomaly as any malicious activity, such as denial of service attacks, since they are deliberately harmful. This definition excludes legitimate abnormal events like flash crowds. Some works consider any deviation from the statistical baseline observed from legitimate traffic as an anomaly (Fouladi et al., 2022; Novaes, Carvalho, Lloret, & Proença, 2021; Zavrak & Iskefiyeli, 2023).

Efficient detection of attacks is crucial to defend computer networks and maintain their services online, supporting availability, confidentiality, and integrity. Network Intrusion Detection Systems are state-of-the-art solutions that have been studied and developed to counter such threats (Huang, Ye, Hu, & Wu, 2023; Sood et al., 2023; Xue & Jing, 2023). These systems collect and analyze traffic data periodically. The NIDS generates a warning alarm for the administrators if any uncommon behavior is identified.

There are two common design strategies for implementing NIDS: signature-based and anomaly-based. The first one keeps a database of known attack signatures. The system analyzes incoming traffic, identifying when an observed pattern matches a stored signature. In this case, an anomaly is detected and reported. This approach usually uses supervised learning to extract signatures of each type of anomaly. The main drawback of this method is that it cannot recognize zero-day attacks, whose signatures are unknown, which increases its false negative rate (Hairab et al., 2022). The anomaly-based design strategy, or anomaly detection, builds a model representing legitimate traffic behavior. Incoming traffic is compared with the normality model, and if their difference exceeds a tolerance threshold, an alarm is raised. This design allows for the detection of unknown attacks using unsupervised or semi-supervised learning. Its principal disadvantage is that it may interpret normal behavior variations as anomalies, increasing its false alarm rate (Gupta, Jindal, & Bedi, 2022).

The anomaly score is a concept some authors apply when implementing anomaly-based systems. It represents the degree of anomalousness of a traffic sample evaluated by the IDS. This value can be obtained through the raw output of a deep learning model or calculation of some loss function, for example. Hu et al. (2022) take the output of a DNN and calculate its Euclidean distance to a fixed point, obtaining the anomaly score of the evaluated sample. Shu et al. (2021) define the anomaly score as a linear combination of a Bidirectional GAN’s reconstruction and discrimination losses. Intrusion Detection Systems based on anomaly scores usually calculate a threshold representing an upper bound for tolerating anomalous behavior. The respective traffic sample is classified as malicious whenever the anomaly score surpasses the defined threshold. The threshold can be set in several ways, for instance from the mean and standard deviation of the scores or by exhaustive search. Garcia et al. (2021) implement an anomaly score function based on an autoencoder’s loss. The tolerance threshold is given by the mean plus standard deviation of normal instances’ reconstruction errors. Lent et al. (2022) define the anomaly score using fuzzy logic. A Gaussian membership function is used to calculate the anomaly level of a sample. The decision threshold is defined using trial and error on attack validation data.

One of the main challenges in developing an IDS for SDN is to avoid overloading the network’s controller (Janabi, Kanakis, & Johnson, 2022). Using a large number of features for intrusion detection may increase resource usage and congestion on the southbound channel during data collection. Higher memory consumption and longer processing times can cause bottlenecks in the controller, leading to overhead and crashes when applied to large-scale networks. Another frequently addressed challenge is the lack of purely SDN datasets and the fact that they may not accurately reflect the breadth of real-world security threats (Duy, Khoa, Hien, Hoang, & Pham, 2023).

Deep Learning models are the current leading solution for building
intrusion detection systems due to their ability to process massive
3.3. Intrusion detection amounts of traffic, which is essential for the ever-increasing speeds
of modern SDN networks. Their capacity to automatically capture
The definition of anomaly is a core concept in intrusion detection. intricate structures within the data is used to solve many computa-
Different works may have contrasting considerations on anomalies and tional problems within intrusion detection, including pattern extrac-
build their systems accordingly. For example, Siniosoglou et al. (2021) tion (Hnamte & Hussain, 2023), classification (Sivanesan & Archana,

V.G. da Silva Ruffo et al. Expert Systems With Applications 256 (2024) 124982

2023), regression (Yeom, Choi, & Kim, 2022), synthetic data generation (H., Rao, & Prasad, 2023),
and data distribution learning (Duan, Fu, & Wang, 2023). Selecting and optimizing features tends
to mitigate the risk of controller overload without relinquishing the detection ability of the
model. Fig. 3 shows the steps generally conducted during DL-based NIDS development, including
data collection and preprocessing, deep learning modeling, hyperparameter tuning, and
performance evaluation.

Fig. 3. Deep learning-based NIDS development common framework.

4. Methodology

The objective of this research is to empirically review the state of the art of deep
learning-based anomaly and intrusion detection on Software-Defined Networks. To meet this goal,
we must answer important research questions regarding the common steps applied during NIDS
development:

• RQ1 - Which datasets and attacks are considered in the benchmarking experiments?
• RQ2 - What preprocessing steps are usually taken before the anomaly detection phase?
• RQ3 - How are deep learning models applied to detect anomalies and intrusions?
• RQ4 - How are the models' hyperparameters optimized?
• RQ5 - Which performance metrics are commonly calculated to evaluate and compare the models?
• RQ6 - What are the open issues and possible future directions within the study area?

Fig. 4 illustrates the step-by-step sequence of the research methodology to conduct the proposed
empirical literature review and answer the previous questions. The first step toward reviewing
an area of study is the selection of adequate research article sources (Database Selection).
There are plenty of online scientific databases indexing hundreds of thousands of articles. We
defined some minimum criteria for a database to be eligible for our work: online accessibility
through our institutions' agreements, including high-impact journals that are aligned with our
work scope, having the predominance of papers in the English language, supporting the search by
boolean logic and the application of filters, and allowing the export of search results into
.csv or .bib file formats.

We specify the search scope by defining a boolean query string with the most critical keywords
(Querying and Filtering): "Deep Learning" AND (Intrusion OR Anomaly) AND Detection AND
("Software-Defined Network" OR "Software-Defined Networking" OR SDN). All selected databases are
queried with the string. Besides, we also apply some filtering to refine the search results:
only full-length papers published in journals between 2021 and 2024 are considered since we are
interested in the state-of-the-art. Although we are aware that several relevant papers were
published before 2021, this is an ever-changing research area, and we focus on providing a
complete guide of its advances and challenges in recent years. We also filter out any article
representing a survey or a review. The search results metadata are downloaded in .csv file
format and organized using software for data tabulation.

The retrieved articles pass through a manual selection stage, which aims to analyze if they fit
the work scope (Article Selection). We check if the papers propose an SDN intrusion detection
solution utilizing a deep learning model. The checking is an essential step toward further
result refinement, as the returned articles do not necessarily match our search goals. To
illustrate, despite filtering out surveys and conference papers and excluding articles that do
not consider deep learning, we still get those works returned from our queries to the databases.
We also selected some pertinent works for other network environments despite focusing on
Software-Defined Networks. We include them since they presented innovative approaches that could
be straightforwardly adapted to the SDN paradigm.

After the data is collected and cleansed, we must thoroughly read all selected articles to
answer the six research questions introduced previously (Article Reading). Each question is
tackled and answered separately. Finally, we use the insights extracted from the reviewed works
to write about the current state of the art (Result Writing). The novelties in the concerned
area are presented through distinct sections dedicated exclusively to answering each proposed
research question. We chose this organization scheme so that one could skip through the text and
read only the desired insights. The remainder of this work showcases the results of the
conducted literature review.

5. Results

In this section, we review the state of the art on Software-Defined Network anomaly and
intrusion detection using deep learning. The following sections present the latest trends within
the study area regarding benchmarking datasets, preprocessing methods, deep learning models,
hyperparameter optimization, and performance metrics.

5.1. Metadata

Firstly, before introducing the findings, we display the metadata relating to the collected
articles using our research methodology. We utilized our database selection requirements to pick
the most relevant data sources for the proposed work: IEEE Xplore, Science Direct, ACM Digital
Library, and Springer Link. The databases were accessed and queried on September 20th, 2023. The
search query and the applied filters retrieved 557, 414, 155, and 350 articles from the data
sources, totaling 1476 works.

After the article selection step, we kept only the works that fit our research scope. The number
of selected articles was 105, corresponding to approximately 7% of the original retrieved
results. From the selected


works, 35 (33%) represent approaches not initially meant for the SDN environment. These
solutions easily adapt to the Software-Defined Network paradigm since they are based on IP flow
data. Thus, we considered them in this work.

Fig. 5 shows the distribution of the selected articles per year. There has been a growing trend
in published papers in the research field since 2021, reinforcing that deep learning-based
intrusion detection for SDNs is still an evolving area of research. The true amount of work
available in 2023 surpasses that of the previous year, as our literature review did not consider
papers from October to December 2023. We also illustrate the number of selected papers per
database in Fig. 6. Almost 60% of the papers come from Science Direct. In contrast, no paper
indexed by ACM DL has been selected, indicating that the former is a valuable data source for
the field. Fig. 7 shows the distribution of the selected articles per publication journal. This
graph lets us see which journals publish the most in the area, helping researchers find relevant
periodicals for their publications. Remarkably, almost 30% of the papers come from either IEEE
Access or Computers & Security. Note that we omitted the journals with two or fewer published
papers for ease of visualization.

Fig. 4. Research methodology steps.

Fig. 5. Selected articles distribution per year.

Fig. 6. Selected articles distribution per database.

5.2. Datasets and attacks

In this section, we answer the first research question (RQ1) by providing an extensive overview
of the datasets and attacks utilized in the reviewed works. Table 2 presents the usage of public
datasets for developing state-of-the-art NIDS solutions. Most papers apply more than one dataset
in their experiments, so the total number of works in the table exceeds the amount of reviewed
papers. The five most used datasets are, respectively, CIC-IDS2017 (Sharafaldin, Lashkari,
Ghorbani, et al., 2018), NSL-KDD (Tavallaee, Bagheri, Lu, & Ghorbani, 2009), CIC-DDoS2019
(Sharafaldin, Lashkari, Hakak, & Ghorbani, 2019), CSE-CIC-IDS2018 (Sharafaldin et al., 2018),
and KDD Cup 1999 (Hettich & Bay, 1999). There are 36 unique datasets, but their utilization
distribution is highly skewed since most were utilized only twice or once in the analyzed
literature studies. The top ten datasets represent only 28% of all identified data sources, but
they account for 76% of the total data usage.

Table 2
Public datasets usage.
#    Name                Works
1    CIC-IDS2017         24
2    NSL-KDD             22
3    CIC-DDoS2019        21
4    CSE-CIC-IDS2018     14
5    KDD Cup 1999        11
6    InSDN               9
7    UNSW-NB15           9
8    ISCXIDS2012         5
9    Bot-IoT             5
10   MIT LL DARPA        4
11   CIC-DoS2017         3
12   N-BaIoT             3
13   Hogzilla            3
14   MAWI                3
15   UTSA-2021           2
16   ToN-IoT             2
17   CTU-13              2
18   WUSTL-IIOT-2018     2
19   CIDDS-001           2
20   WorldCup 1998       2
21   ISCX2016            1
22   KDD 2019            1
23   Orion               1
24   USTC-TFC16          1
25   Mendeley Data       1
26   MQTTset             1
27   CSIC 2010           1
28   CESA                1
29   Edge-IIoTset        1
30   CAIDA DDoS 2007     1
31   WSN-DS              1
32   SDN-IoT             1
33   IRAD                1
34   SPEAR Project       1
35   Kyoto 2006+         1
36   NITIDS              1

Fig. 8 presents a taxonomy of the found public datasets. Firstly, we classify them by the type
of networking environment they represent,


Fig. 7. Selected articles distribution per journal.

Fig. 8. Public datasets taxonomy.

including traditional, SDN, and IoT. The green-colored datasets are composed of traffic data
from real networks. The blue-colored ones comprise synthetic traffic data generated in a
controlled environment. Only 30% of the found datasets contain real data, while the remaining
70% simulate or emulate networking systems to collect artificial data. We emphasize that the ten
most used datasets listed in Table 2 are all synthetic. Also, popular datasets among scientists,
namely NSL-KDD and KDD Cup 1999, are not representative of modern networks since they contain
outdated traffic patterns and attacks. Traffic data captured on real, contemporary networks is
essential for a realistic assessment of NIDS performance. Thus, future research should focus on
collecting and utilizing real-world, up-to-date datasets to develop more reliable intrusion
detection systems.

Fig. 9 illustrates a word cloud of the network attacks found in the analyzed public datasets.
The bigger and bolder an attack is, the greater its frequency in the data sources. The DoS and
DDoS attacks stand out, highlighting the scientific community's effort in studying network
availability threats. Botnet attacks are also a problem that captures scientists' attention. The
popularization of the IoT paradigm is likely facilitating the construction of botnets to launch
distributed attacks. Port scanning may disclose network vulnerabilities that can be exploited by
other types of attacks, explaining its frequent appearance in benchmark datasets. A multitude of
network services are available through websites. Malicious agents are constantly updating web
attacks to exploit such systems, so it is coherent to investigate these kinds of traffic
anomalies. Computers have grown exponentially in processing and storage capacity in the last
decades. The accessibility of powerful computing is probably enabling the execution of brute
force attacks with a greater success rate, thus requiring the scientists' cautious examination.

We recognized many tools some works utilize to generate synthetic datasets for their
experiments. Table 3 lists the eleven different tools found in the literature. We also classify
them according to how they support dataset creation, as illustrated in Fig. 10. The most popular
tool is Mininet (Bob Lantz, 2024), a network emulator applied in ten distinct works. An emulator
is a tool that aims to replicate the inner workings of a real system, in this case, a network.
MaxiNet (Wette et al., 2014) is an alternative emulator utilized by only one study. hping3
(Oliveira, 2024) is used by four works to generate traffic since it allows for the sending of
custom packets. Network simulators, such as GNS3 (SolarWinds Worldwide, 2024) and OMNeT++ (Ltd.,
2024b), are utilized by two works each. A simulation tool tries to reproduce the general
behavior of a real system, in this case, a network. There are also tools for implementing
virtual SDN controllers, for example, Ryu (Community, 2024), POX (McCauley, 2024), and ONOS
(Foundation, 2024). We also found tools for traffic replaying and analysis in the reviewed
works: Tcpreplay (Fred Klassen, 2024) and DNS-STATS (Ltd., 2024a).
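
The concentration figures quoted for Table 2 in this section (the ten most used datasets being roughly 28% of the data sources but about 76% of the usage) can be re-derived from the table's counts. The sketch below is only a sanity check under the assumption that the "Works" column is read exactly as listed; the function name `top_k_shares` is illustrative and not part of the survey.

```python
# Usage counts copied from the "Works" column of Table 2, in rank order.
usage = [24, 22, 21, 14, 11, 9, 9, 5, 5, 4,  # ranks 1-10
         3, 3, 3, 3,                           # ranks 11-14
         2, 2, 2, 2, 2, 2]                     # ranks 15-20
usage += [1] * 16                              # ranks 21-36, one use each


def top_k_shares(counts, k):
    """Return (share of datasets, share of total usage) for the k most used."""
    ranked = sorted(counts, reverse=True)
    return k / len(ranked), sum(ranked[:k]) / sum(ranked)


dataset_share, usage_share = top_k_shares(usage, 10)
print(f"top 10: {dataset_share:.0%} of datasets, {usage_share:.0%} of usage")
# prints: top 10: 28% of datasets, 76% of usage
```

The same helper can be called with other values of `k`, for example `top_k_shares(usage, 5)`, to see how quickly usage concentrates in the five datasets named in the text.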


Fig. 9. Attacks word cloud.

Fig. 10. Taxonomy of dataset generation supporting tools.

Table 3
Usage of dataset generation supporting tools.
#    Tools        Works
1    Mininet      10
2    hping3       4
3    GNS3         2
4    Ryu          2
5    Tcpreplay    2
6    OMNeT++      2
7    POX          1
8    ONOS         1
9    MaxiNet      1
10   NS2          1
11   DNS-STATS    1

5.3. Preprocessing methods

Table 4
Preprocessing methods summary.
Subtopic                 Description                                                               Works
Data Normalization       Scale data to fit into certain interval.                                  49
Data Cleaning            Remove invalid data (e.g., NaN and infinite values).                      28
Remove Duplicate Data    Remove data that may bias the model during training.                      9
Data Sampling            Balancing a dataset by creating or removing entries.                      5
Data Encoding            Conversion of categorical data into numerical data.                       37
Feature Extraction       The processing of data to create new features.                            20
Feature Selection        Statistical analysis to reduce the number of features used by the model.  37
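
As a rough illustration of the data normalization row in Table 4, the sketch below applies min–max scaling and z-score standardization to one feature column containing a single extreme value. The feature values and function names are hypothetical, and the population standard deviation is an assumed choice; the reviewed works do not prescribe a specific implementation.

```python
from statistics import mean, pstdev


def min_max_scale(values, lo=0.0, hi=1.0):
    """Map values linearly so that min(values) -> lo and max(values) -> hi."""
    v_min, v_max = min(values), max(values)
    span = v_max - v_min
    if span == 0:
        raise ValueError("constant feature: min-max scaling is undefined")
    return [lo + (v - v_min) * (hi - lo) / span for v in values]


def z_score(values):
    """Express each value as a number of standard deviations from the mean."""
    mu, sigma = mean(values), pstdev(values)  # population std is an assumption
    if sigma == 0:
        raise ValueError("constant feature: z-score is undefined")
    return [(v - mu) / sigma for v in values]


# Hypothetical feature column (e.g., bytes per flow) with one extreme outlier.
bytes_per_flow = [100, 120, 110, 130, 10_000]

scaled = min_max_scale(bytes_per_flow)
standardized = z_score(bytes_per_flow)
print([round(v, 4) for v in scaled])
print([round(v, 2) for v in standardized])
```

With min–max scaling, the four ordinary values are squeezed into a narrow band near 0 while the outlier sits at 1. With the z-score, the outlier appears as a value far from 0 in units of standard deviations.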

In this section, we answer the second research question (RQ2) by compiling the methods used by
the analyzed articles to prepare data before feeding it into their models. All procedures made
before the main model receives the data are considered preprocessing methods. This excludes
approaches that, for instance, apply the anomaly detection method as the feature selection
itself. Table 4 summarizes each method for a quick overview.

5.3.1. Data normalization

Data normalization is the process of transforming values from each feature to match a certain
scale. This is especially important to neural networks due to how neurons work (Lent et al.,
2022). Each feature's value is multiplied by its weight and added to the other features'
results. Any feature whose values are extremely high or excessively low may monopolize the
neuron's "attention" for several epochs until its weight adapts to it. This hinders the model's
convergence and delays training.

Min–max scaling and z-score were the most mentioned methods among the collected articles. The
former scales the values to a given range, usually between 0 and 1 (Duy et al., 2023, 2021).
This maintains the values' proportion, so newer data may still be mapped to numbers outside this
interval if they are greater or smaller than the initial ones (Sayed, Le-Khac, Azer, & Jurcut,
2022). For this reason, min–max normalization is vulnerable to outliers since the result would
have normal data grouped in a smaller interval while leaving outliers at the edges.

Part of these problems can be mitigated with the z-score, a standardization technique (ElSayed,
Le-Khac, Albahar, & Jurcut, 2021). In this method, the resulting value represents how many
standard deviations from the set's mean the value is (Sayed et al., 2022). This way, outliers


become immediately clear and can be treated as such without affecting the mapping significantly.

5.3.2. Data cleaning

Data cleaning is a necessary step in data preprocessing in certain datasets to avoid the
malfunction of deep learning models. Invalid data, e.g., NaN and infinite values, are replaced
or have their whole entries removed (Duy et al., 2023).

Among the analyzed articles, the data cleaning process was mentioned for a variety of datasets,
namely CICIDS2017 (Yungaicela-Naula, Vargas-Rosales, & Perez-Diaz, 2021), CICIDS2018 (Duy et
al., 2023), InSDN (Hnamte & Hussain, 2023), and CICDDoS2019 (O. Lopes, Zou, Abdulqadder, Akbar,
Li, Ruambo, & Pereira, 2023). The process was also mentioned when the works used datasets
generated for their own study, to clean collection errors, as discussed by Nadeem et al. (2023).

5.3.3. Remove duplicate data

Although not as crucial as the previous procedures, the removal of duplicate data can be
impactful on the performance of a deep learning model. If there are multiple copies of an entry
during training, it is possible that a bias is created toward that type of data (O. Lopes et
al., 2023). Note that this would only impact a model if there is a lack of entries or the
redundancy is considerably high.

Another and more problematic issue caused by duplicate data happens during the split between
training and testing, when the redundant data may end up in separate groups. This way, the model
will be tested with data seen during training, inflating the resulting metrics.

5.3.4. Data sampling

Data sampling plays an important role in deep learning training since a considerable amount of
data is necessary for this type of model to become capable of generalizing a class or a problem.
For this reason, developers may upsample minority classes or downsample majority ones (O. Lopes
et al., 2023). Among the analyzed works, it was common not to have the sampling process
explained, except for Friha et al. (2023), who specified SMOTE.

5.3.5. Data encoding

Data encoding is fundamental for some features in deep learning models. Categorical or
qualitative data are incompatible with most deep learning models since they require numerical
input. In addition, the numerical values need to have a meaning of quantity to impact the
model's training (Lent et al., 2022). Many analyzed articles mentioned the encoding of features
such as IP addresses, protocols, and labels in their system's preprocessing stage.

Another similar process employed is called one-hot encoding, where instead of simply
transforming a categorical value into a number, a binary vector is created. This vector
comprises zeros except for a single element with the value 1 representing the former category.
This is frequently applied to labels, but it can also be used on features (Shaji et al., 2023).
Some works handle categorical data by extracting entropies from the features. These were
considered a form of feature extraction and are mentioned in the next section.

5.3.6. Feature extraction

Due to the nature of the problem, anomaly detection systems require traffic data to discover
anomalies. The SDN paradigm makes collecting this data significantly simpler since the
controller can request flow data from the whole network and group it in a single place. Some
analyzed works use features from the dumped flows directly in their deep learning models. It is
also possible to extract features from this data to either improve training by reducing the
number of features or to add information that would not be available in its current form.

There are systems that do not use flows or packets directly to detect anomalies. Instead, they
may digest traffic by grouping those elements, for example, by seconds (Janabi et al., 2022) or
flow tables (Fouladi et al., 2022). The extracted features vary by system, either due to how the
system works or due to some feature selection algorithm.

We identified that entropy is a frequently extracted feature. It was used to represent IP
addresses and ports in the work presented by Novaes et al. (2021) and other features in the
study by Fouladi et al. (2022). Entropy is a measure of the randomness of a variable. The
entropy measure changes based on the probability of each possible value that the variable can
assume. If there are multiple values with none standing out, the entropy may increase.
Otherwise, it may decrease if an element is significantly more common than the others.

It is important to highlight that deep learning models can be responsible for extracting
abstract features. Choobdar et al. (2021) use an autoencoder to extract features from data
before sending it to the detection module. Some works may use the main deep learning module to
select and extract features automatically, but they were not considered preprocessing and will
be mentioned in Section 5.4.

5.3.7. Feature selection

Some network anomaly detection datasets can have more than eighty features that describe the
network's traffic. Several of those features do not contribute to anomaly detection and tend not
to be used by the deep learning models. Those features may be discarded to save memory and even
lower training times. Less data means less processing, avoiding extra training epochs for the
network to lower the relevance of those features.

Some algorithms can be used to find a correlation between a set of features and the dataset
labels, identifying the most relevant ones. Some examples of algorithms are the Principal
component analysis (PCA), used by Cherian and Varma (2023); XGBoost, used by Zainudin, Ahakonye,
Akter, Kim, and Lee (2023); Random forest, used by Sayed et al. (2022); and Autoencoders, used
by Choobdar et al. (2021).

5.4. Deep learning-based intrusion detection

In this section, we answer the third research question (RQ3). Table 5 lists deep learning models
and their usage in the reviewed works. Note that some papers implement multiple models, so the
total number of uses surpasses the amount of selected works. CNN and LSTM are the two most
applied models, both with over 30 uses. Next, the DNN and GRU networks were applied 15 and 11
times, respectively. There is a high concentration of usage for these four models. They reflect
only 11% of all applied models but represent 61% of the total deep learning utilization. Despite
this, we draw attention to the scientific community's efforts to test new deep learning networks
and improve IDS performance since there are more than 30 different models under experimentation.
Researchers are implementing multiple Autoencoder variations, such as VAE (Bårli, Yazidi,
Viedma, & Haugerud, 2021), SSAE (Long & Jinsong, 2022), DAE1 (Sood et al., 2023), and DAE2
(Lopes et al., 2022). There is an interest in neural networks based on graph data structure,
including GCN (Ding & Li, 2022), HGCN (Phu et al., 2023), and ST-GCN (Wang et al., 2023).
Generative networks such as GAN (Siniosoglou et al., 2021), WGAN (Mustapha et al., 2023),
WGAN-GP (Duy et al., 2023), Bi-GAN (Shu et al., 2021), and Bi-CGAN (H. et al., 2023) are also
being analyzed for IDS development.

Fig. 11 presents a taxonomy of how the listed deep learning models are usually applied during
the development of an intrusion detection system. The reviewed works are classified using a
tree-like diagram of four levels. Each tree level indicates a different system characteristic:
the learning approach, the system architecture, the type of problem solved by the deep learning
model, and the applied model itself. A complete path in the tree describes a deep learning
application pattern in the literature. For instance, it is possible to visualize that the


Table 5
Deep learning models usage.
# Acronym Model Works
1 CNN Convolutional Neural Network 36
2 LSTM Long Short-Term Memory 31
3 DNN Deep Neural Network 15
4 GRU Gated Recurrent Unit Networks 11
5 AE Autoencoder 6
6 RNN1 Recurrent Neural Network 6
7 GAN Generative Adversarial Network 4
8 LSTM-AE Long Short-Term Memory Autoencoder 4
9 SAE Stacked Autoencoder 4
10 Bi-LSTM Bidirectional Long Short-Term Memory 3
11 GCN Graph Convolutional Network 3
12 WGAN Wasserstein Generative Adversarial Network 3
13 WGAN-GP Wasserstein Generative Adversarial Network with Gradient Penalty 2
14 DAE1 Deep Autoencoder 2
15 Bi-GRU Bidirectional Gated Recurrent Unit 2
16 AM Attention Mechanism 2
17 Bi-CGAN Bidirectional Cross Generative Adversarial Network 1
18 Bi-GAN Bidirectional Generative Adversarial Network 1
19 DAE2 Denoising Autoencoder 1
20 DBN Deep Belief Network 1
21 DDPG Deep Deterministic Policy Gradient 1
22 DDQN Double Deep Q-Network 1
23 DMN Deep Maxout Network 1
24 SE Stacking Ensemble 1
25 SSAE Stacked Sparse Autoencoder 1
26 ST-GCN Spatio-Temporal Graph Convolutional Network 1
27 TST Time Series Transformer 1
28 VAE Variational Autoencoder 1
29 RNN2 Replicator Neural Network 1
30 RNN3 Recursive Neural Network 1
31 PNN Probabilistic Neural Network 1
32 HGCN Hyper Graph Convolutional Network 1
33 GNN Graph Neural Network 1
34 GC-LSTM Graph Convolutional Long Short-Term Memory 1
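
Several of the reconstruction-based models in Table 5 (for example AE and LSTM-AE) are typically paired with a decision threshold on an anomaly score. As a minimal sketch of the rule described in Section 3.3 for Garcia et al. (2021), where the threshold is the mean plus the standard deviation of normal instances' reconstruction errors, the code below fits a threshold and classifies two samples. The error values are hypothetical, the function names are illustrative, and the use of the population standard deviation is an assumption rather than something the survey specifies.

```python
from statistics import mean, pstdev


def fit_threshold(normal_errors):
    """Threshold = mean + one standard deviation of errors on normal traffic."""
    return mean(normal_errors) + pstdev(normal_errors)  # pstdev: assumed choice


def is_anomalous(error, threshold):
    """A sample is flagged when its anomaly score surpasses the threshold."""
    return error > threshold


# Hypothetical reconstruction errors measured on benign validation traffic.
normal_errors = [0.010, 0.012, 0.009, 0.011, 0.013, 0.010]
threshold = fit_threshold(normal_errors)

for error in (0.011, 0.050):
    label = "anomalous" if is_anomalous(error, threshold) else "normal"
    print(f"error={error:.3f} -> {label}")
```

A one-standard-deviation margin is tight: some benign samples (here the 0.013 error) would also exceed it, which is one reason other works tune the threshold by exhaustive search or trial and error instead.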

autoencoder is applied across many scenarios: supervised and unsupervised approaches;
centralized or decentralized architectures; and feature extraction and data reconstruction
problems.

Fig. 12 shows how the selected works are distributed according to their learning approach,
system architecture, and the type of solved problem. All three distributions are highly skewed,
as the evaluated works tend to follow similar design decisions. Firstly, the leftmost graph (a)
reveals that 96 out of 105 works are dependent on labeled data for developing their system.
Implementing these solutions in a real environment may require a high investment in data
labeling. Systems independent of data labels are proposed by 6 works. Notably, only 3 works use
the reinforcement learning paradigm for their solutions. Secondly, regarding the architecture
(b), we observe that approximately 85% of the systems are built to run on a central server,
while the remaining 15% follow a distributed design. Thirdly, the majority of the selected
papers use deep learning to solve feature extraction, classification, or both. There are 9 works
using data reconstruction to build their IDS. Few solutions are implemented to solve regression,
data generation, and Markov decision process.

The skewness in these distributions highlights many possible future research possibilities:
unsupervised and reinforcement-based systems, distributed IDS architectures, deep learning
modeling for solving regression, data generation and reconstruction, and Markov decision
process. We encourage researchers to investigate these uncommon design choices to promote
further development in the intrusion detection area. Some innovative works explore unusual
models and approaches for implementing NIDS. Zavrak and Iskefiyeli (2023) train an unsupervised
Replicator Neural Network and an LSTM-based encoder–decoder for reconstructing normal traffic
data. Every sample whose reconstruction error exceeds a threshold is inferred to be anomalous.
Shu et al. (2021) proposed an unsupervised NIDS installed on distributed SDN controllers to
avoid computational overhead. The system applies Bi-GAN to calculate an anomaly score based on a
linear combination of reconstruction error and discrimination loss. Friha et al. (2022)
introduce a federated learning IDS that coordinates collaborating edge nodes to build a global
traffic classification model. Shukla et al. (2023) describe a system based on a Recursive Neural
Network for solving the Markov Decision Process. Yeom et al. (2022) utilize LSTM to implement a
regression model that predicts future normal traffic volume. The following sections dive deeper
into the proposed organization taxonomy by discussing the selected works.

5.4.1. Supervised learning

Every IDS dependent on labeled data in any stage of development was classified as supervised.
For instance, the works of Janabi et al. (2022) and Imrana et al. (2021) were considered
supervised. The former utilized labeled data to build a CNN model that maps input records to
benign or anomalous classes. The latter applied a Bi-LSTM to extract patterns from input data,
which is then fed to a regular, fully connected neural network to produce the corresponding
classification. Kaur and Kakkar (2022) created a supervised attack detection system that applies
the Deep Maxout Network model to classify traffic as normal or anomalous.

Fig. 11 and Fig. 12 reveal a broad exploration of supervised approaches in the literature. These
may be implemented centrally, where the data and the code are contained in a single server.
Hairab et al. (2022) and El-Ghamry, Darwish, and Hassanien (2023) designed their IDS to be
installed on a central computer responsible for receiving and processing all incoming traffic
data. Both systems are based on a CNN model implementing binary classification. Novaes et al.
(2021) discussed a centralized IDS based on the GAN model to detect DDoS attacks. Their system
exploits the discriminator network to perform binary classification on traffic samples. There
are also supervised approaches based on a decentralized architecture. Illy and Kaddoum (2023)
introduced an intrusion detection solution that may be deployed in local servers of different
network segments. The IDS utilizes DNN, CNN, RNN, and LSTM for traffic classification. Illy,
Kaddoum, de


Fig. 11. Deep learning application taxonomy.

Fig. 12. Selected works distribution according to their learning approach (a), system architecture (b), and solved problem (c).

Araujo-Filho, Kaur, and Garg (2022) proposed a collaborative intrusion detection and prevention system that may be installed on multiple subnetworks to react promptly to emerging anomalies. Their system employs DNN, RNN, and CNN for binary and multiclass classification. Ferrag, Friha, Hamouda, Maglaras, and Janicke (2022) describe a decentralized system that uses the federated learning paradigm to develop multiple DNN models based on different devices’ local data. These models are trained to perform data classification on their available data. They are later shared and aggregated to build a global model for the whole network.
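The aggregation step of a federated system can be illustrated with a minimal sketch. The text does not state which aggregation rule the cited work uses; the example assumes the common federated averaging (FedAvg) rule, in which each locally trained parameter vector is weighted by the number of local training samples. The function name `federated_average` and the toy values are hypothetical.

```python
from typing import List, Sequence


def federated_average(client_weights: Sequence[Sequence[float]],
                      client_sizes: Sequence[int]) -> List[float]:
    """Combine flattened client parameter vectors into one global vector.

    Assumption: FedAvg-style weighting by local sample count.
    """
    if not client_weights or len(client_weights) != len(client_sizes):
        raise ValueError("need exactly one sample count per client")
    total = sum(client_sizes)
    if total <= 0:
        raise ValueError("total sample count must be positive")
    dim = len(client_weights[0])
    if any(len(w) != dim for w in client_weights):
        raise ValueError("all clients must share the same model shape")

    global_weights = [0.0] * dim
    for weights, n_samples in zip(client_weights, client_sizes):
        share = n_samples / total  # contribution of this client
        for i, value in enumerate(weights):
            global_weights[i] += share * value
    return global_weights


# Two hypothetical devices: the second one saw three times more traffic.
clients = [[1.0, 2.0], [3.0, 4.0]]
sizes = [100, 300]
global_model = federated_average(clients, sizes)
```

Only model parameters are exchanged in this scheme; the raw traffic data stays on each device.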


The analyzed supervised works implemented their detection systems by applying deep learning to solve specific mathematical problems. The most common type of solved problem is feature extraction. It involves reducing input dimensionality by calculating more expressive data characteristics. Reducing the number of features (input dimensions) allows for more straightforward pattern finding in complex data. Data classification is another issue commonly solved when building intrusion detection mechanisms. It represents the challenge of deriving a function to map an input to a correct discrete output. Feature extraction and classification are usually solved in conjunction. To illustrate, Cherian and Varma (2023), Soltani, Siavoshani, and Jahangir (2021), and Tayfour et al. (2023) developed intrusion detection systems that use LSTM to extract temporal features from traffic data. The reduced input is processed by a fully connected neural network, which solves classification and outputs the traffic state (benign or malicious). Kumar et al. (2023) built a Bi-GRU network to calculate temporal features from incoming input. The extracted patterns pass through dense layers, producing the corresponding sample classification.

The regression problem is similar to classification, as it corresponds to the derivation of a function to map the input to a corresponding continuous output. Lent et al. (2022) utilized GRU to solve regression and learn a non-linear function that inputs past network traffic and outputs the expected traffic for the moment of analysis. The system issues an anomaly alarm if the traffic patterns observed in the network do not match the GRU prediction.

The scientific community also addresses the topic of data generation. Duy et al. (2021) and Duy et al. (2023) apply variations of the Generative Adversarial Network model, e.g., WGAN, WGAN-GP, and AdvGAN, to synthesize adversarial attacks and prepare their IDS to be robust against them. Ding et al. (2022) and H. et al. (2023) develop GAN and Bi-CGAN models to oversample known attack classes with few data instances, aiming to improve the performance of classification models.

Data reconstruction concerns the problem of accurately compressing and decompressing data. The most common deep learning model used to solve reconstruction is the autoencoder. For data that fit the model training set, the mean squared error between original and reconstructed (decompressed) samples should be small. Meanwhile, the error may be significant for outliers, as the model cannot precisely reconstruct them. Sarıkaya, Kılıç, and Demirci (2023) built an IDS that reconstructs legitimate and anomalous traffic data using an autoencoder. Afterward, an ensemble of machine learning models classifies the input based on the magnitude of its reconstruction error.

5.4.2. Unsupervised learning

Intrusion detection systems that do not rely on labeled data for their configuration were considered unsupervised. The unsupervised paradigm is not as widely adopted as the supervised one. Fig. 11 shows that these systems solely solve the data reconstruction problem, independent of having a centralized or decentralized architecture. We should point out that no analyzed work approached unsupervised federated learning.

Only a few deep learning models have been explored for implementing unsupervised systems. AE and its variations, like Stacked AE and LSTM-based AE, are common for solving data reconstruction and building unsupervised IDS. For example, Fouladi et al. (2022) and Garcia et al. (2021) described detection systems that use an autoencoder to reconstruct benign traffic samples. If the error of a sample reconstruction surpasses a predefined threshold value, the input is inferred anomalous. Duan et al. (2023) developed a centralized intrusion detection system based on a Stacked AE. The model learns to reconstruct benign samples with low error while attributing high error to outliers. Fu, Duan, Wang, and Li (2022) devised an LSTM-based autoencoder for compressing and decompressing data sequences. Their approach relies on a threshold for separating benign and anomalous input. Nonetheless, other models including Bi-GAN (Shu et al., 2021) and RNN3 (Zavrak & Iskefiyeli, 2023) have also been tried.

Table 6
Hyperparameter tuning summary.

Method class                      Works
Grid Search                       Ravi, Chaganti, and Alazab (2022), Tsogbaatar et al. (2021), Liu et al. (2022), Ferrag et al. (2022).
Population-based Metaheuristics   El-Ghamry et al. (2023), Mansour (2022), Sivanesan and Archana (2023), Ahmad et al. (2023), Vatambeti et al. (2023).
Bayesian Optimization             Presekal, Ştefanov, Rajkumar, and Palensky (2023).
Genetic Algorithm                 Song et al. (2023).
Taguchi Method                    Dinh and Park (2021).

5.4.3. Reinforcement learning

The reinforcement learning paradigm was rarely applied in the selected works for this survey. The works that relied on this learning approach were all implemented centrally to solve the Markov Decision Process (MDP) problem. Phan and Bauschert (2022) proposed an ML-based intrusion detection system associated with an adaptive intrusion response system based on a Double Deep Q-Network (DDQN). This system was developed using reinforcement learning to obtain the optimal intrusion response policy for malicious activity. The optimization problem was modeled as an MDP, where DDQN is intended to address the slow convergence of Q-learning due to large state space.

Kim et al. (2022) developed a framework that dynamically allocates traffic inspection resources. It employs a deep deterministic policy gradient algorithm to learn an optimal resource allocation policy, adapting to changing network conditions and malicious flow occurrences by modeling the problem as an MDP. It includes a DRL-based network traffic inspection mechanism and an address shuffling-based moving target defense technique for proactive intrusion prevention.

Shukla et al. (2023) implemented a Recursive Neural Network strategy to monitor traffic flow and improve detection performance. This framework aims to teach users how to match traffic flows effectively. It enhances transparency and proactively protects the SDN data plane from overload. Applying a learned traffic flow matching control policy optimizes traffic data acquisition for real-time abnormality detection.

Reinforcement learning’s ability to learn and adapt over time makes it a promising direction for future research in enhancing network security and resilience against evolving cyber threats.

5.5. Hyperparameter optimization

In this section, we answer the fourth research question (RQ4) regarding hyperparameter tuning. This represents an optimization problem that aims to calculate a deep learning model’s hyperparameters to improve its results in a certain task. It is present in the development of every deep learning model since there are unlimited combinations of parameters to be chosen for a neural network. Most of the studied works omit this process since it often contributes neither to the description of the presented process nor to its final result. The lack of innovation in the tuning process contributes to its disregard. Table 6 summarizes the optimization methods and respective works mentioned in this section.

Most of the works that mention the tuning process describe it briefly in two main ways: random and empirical search, and grid search (Ferrag et al., 2022; Liu et al., 2022; Ravi et al., 2022; Tsogbaatar et al., 2021). The former consists of testing different values and altering them based on performance, while the latter establishes a range for each hyperparameter and tests each combination. Those methods, just like most of these optimizations, will result in good enough combinations but not the optimal ones. It is important to understand that finding the optimal values is unrealistic, since there are too many mutually dependent parameters to optimize. In addition, the random nature of


neural network training generates different results on each trial, which can favor or disfavor certain hyperparameter combinations, even if only by small amounts.

Some works present optimization techniques based on metaheuristics, which are a way of avoiding exhaustive combination trials. El-Ghamry et al. (2023) employed particle swarm optimization to optimize some CNN parameters: learning rate, dropout rate, and others specific to the proposed model. Other uncommon metaheuristics are the chicken swarm optimization used by Mansour (2022), the firefly optimization applied by Sivanesan and Archana (2023), the slime mould optimization used by Ahmad et al. (2023), and the prairie dog optimization utilized by Vatambeti et al. (2023). All mentioned works apply population-based techniques that simulate certain group behaviors in order to maximize a metric.

A different class of optimization is used by Presekal et al. (2023). The authors apply Bayesian optimization. This method is intended to optimize functions that are costly to evaluate. This fits the deep learning hyperparameter optimization problem since it is necessary for a neural network to be trained to evaluate each combination, thus requiring significant computational processing.

A genetic heuristic was employed by Song et al. (2023). It is called gene expression programming, a combination of genetic programming with genetic algorithms to make a global search optimization. This algorithm works with chromosome encoding to build the hyperparameter combination for a CNN network.

Most works do not present the tuning process but may mention the final hyperparameter combination to improve replicability. The parameters frequently mentioned include learning rate, number of epochs, batch size, number of neurons, activation functions, and the optimizer algorithm. Those are core parameters of a deep learning model, which explains their presence.

5.6. Performance metrics

In this section, we answer the fifth research question (RQ5). The study explores the metrics commonly employed in the literature to gauge the effectiveness of proposed deep learning models for anomaly and intrusion detection in Software-Defined Networks. Understanding these metrics is essential for assessing the performance of such models accurately. By examining the metrics utilized in existing studies, we aim to gain insights into the evaluation criteria adopted by researchers to measure the efficiency of their proposed detection systems.

An extensive analysis of the 105 selected papers showed that the 10 most used performance metrics are Recall, Accuracy, F1-score, Precision, False Positive Rate (FPR), ROC curve, True Negative Rate, Confusion matrix, Area under the ROC curve and training time, as can be seen in Fig. 13.

As most of these metrics are based on primitive metrics defined by the confusion matrix, we will address this topic first. A confusion matrix provides a clear breakdown of the IDS’s correct and incorrect classifications. It comprises four key variables, as illustrated in Fig. 14:

True Positive (TP): The IDS correctly identifies abnormal network traffic as anomalous.
True Negative (TN): The IDS correctly identifies legitimate network traffic as normal.
False Positive (FP): The IDS incorrectly identifies normal traffic as anomalous.
False Negative (FN): The IDS incorrectly identifies anomalous traffic as normal.

For each classification made by the IDS, the corresponding variable is incremented accordingly. The matrix’s rows represent all samples classified as anomalous or normal, while the columns represent the actual anomalous and normal samples, respectively. Correct classifications, including both true positives and true negatives, are represented along the main diagonal, while incorrect classifications, encompassing false positives and false negatives, are situated along the secondary diagonal. This structured representation provides a comprehensive overview of the IDS’s performance in distinguishing between normal and anomalous network traffic.

With the components of the confusion matrix defined, we can address the definitions of the remaining metrics outlined above. First, we will discuss the more commonly used metrics, which appeared in more than 70% of the evaluated papers.

The Accuracy (Eq. (1)) quantifies the proportion of all samples correctly classified by the IDS. It addresses the fundamental question: What percentage of all samples were accurately classified? Accuracy may not be suitable for scenarios with significant class imbalances, where the model might favor the most prevalent classes, thus failing to detect rarer ones. Consequently, a poorly performing model, for instance, one predicting all samples as the most common class, could achieve high accuracy. Despite its drawbacks, it remains one of the most used metrics in this area, being applied in 86% of the evaluated papers, as shown in Fig. 13.

\[ \text{Accuracy} = \frac{TP + TN}{TP + FP + TN + FN} \tag{1} \]

Recall (Eq. (2)), also known as Sensitivity, True Positive Rate (TPR), and Detection Rate (DR), quantifies the proportion of actual anomalous samples correctly identified as anomalies by the IDS. It addresses the question: What percentage of all anomalous samples in the network traffic can be detected by the IDS? This metric provides insights into model errors since higher recall indicates fewer false negatives. It was the most frequently used metric in the evaluated papers, being applied in 87.6% of them.

\[ \text{Recall} = \frac{TP}{TP + FN} \tag{2} \]

Precision (Eq. (3)), also known as Positive Predictive Value (PPV) and Attack Predictive Value (APV), measures the proportion of anomaly predictions that are correct. It answers the question: What percentage of the samples identified as anomalies by the IDS are truly anomalous? This metric indicates the level of false positives: higher precision implies fewer false positives.

\[ \text{Precision} = \frac{TP}{TP + FP} \tag{3} \]

The F1-score (Eq. (4)), also referred to as F-measure or F-score, represents the harmonic mean between Precision and Recall. Because the harmonic mean is pulled toward the smaller of its two arguments, the F1-score is high only when both Precision and Recall are high; if one of them is high while the other is low, the F1-score stays close to the lower value. Unlike accuracy, the F1-score is a preferable metric for scenarios with data imbalance.

\[ \text{F1-score} = \frac{2 \cdot \text{Precision} \cdot \text{Recall}}{\text{Precision} + \text{Recall}} \tag{4} \]

Now, we will discuss some less commonly used metrics, which appeared in 10% to 30% of the evaluated papers. Although less common, they are well-known in the literature.

The False Positive Rate (FPR) (Eq. (5)), also known as False Alarm Rate (FAR) and Fall-out, calculates the percentage of normal samples incorrectly classified as anomalies. It answers the question: What fraction of all normal samples were erroneously labeled as anomalies?

\[ FPR = \frac{FP}{FP + TN} \tag{5} \]

The Receiver Operating Characteristic (ROC) curve offers a graphical representation of an IDS’s balance between True Positive Rate (TPR) and False Positive Rate (FPR) for various thresholds. Fig. 15 illustrates an example ROC curve, with the x-axis denoting FPR values and the y-axis representing TPR values. Each point on this curve corresponds to the TPR and FPR for a specific threshold. A favorable threshold choice yields high TPR coupled with low FPR, indicating minimal false positives and false negatives. The ROC curve empowers modelers to select the threshold that optimally balances false negatives and false positives for the IDS’s particular use case.
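The threshold sweep behind a ROC curve can be sketched in a few lines of Python. This is an illustrative sketch rather than code from any reviewed work: the function name `roc_points`, the anomaly scores, and the labels are hypothetical, and a sample is assumed to be flagged as anomalous when its score is at or above the threshold.

```python
def roc_points(scores, labels):
    """Return one (FPR, TPR) pair per candidate threshold.

    labels: 1 for anomalous samples, 0 for normal samples.
    """
    positives = sum(labels)
    negatives = len(labels) - positives
    if positives == 0 or negatives == 0:
        raise ValueError("both classes must be present")

    # A threshold above every score flags nothing: FPR = TPR = 0.
    points = [(0.0, 0.0)]
    for threshold in sorted(set(scores), reverse=True):
        tp = sum(1 for s, y in zip(scores, labels) if s >= threshold and y == 1)
        fp = sum(1 for s, y in zip(scores, labels) if s >= threshold and y == 0)
        points.append((fp / negatives, tp / positives))
    return points


# Hypothetical anomaly scores produced by an IDS, with ground-truth labels.
scores = [0.9, 0.8, 0.7, 0.4, 0.3, 0.1]
labels = [1, 1, 0, 1, 0, 0]
points = roc_points(scores, labels)
```

Lowering the threshold moves along the curve from (0, 0) toward (1, 1): more attacks are caught, at the cost of more false alarms.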


Fig. 13. Most used performance metrics.

AUC (Area Under the Curve) serves as a metric summarizing the in-
formation provided by the ROC curve. It gauges the model’s proficiency
in distinguishing between positive and negative classes by measuring
the area beneath the ROC curve. Consequently, a higher AUC implies
superior IDS performance, irrespective of the chosen threshold.
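One way to compute the AUC without drawing the curve relies on a well-known equivalence: the area under the ROC curve equals the probability that a randomly chosen anomalous sample receives a higher score than a randomly chosen normal one, with ties counted as one half. The sketch below uses this pairwise formulation; the function name `auc_by_ranking` and the sample data are hypothetical.

```python
def auc_by_ranking(scores, labels):
    """AUC as the fraction of (anomalous, normal) pairs ranked correctly."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("both classes must be present")

    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0   # anomalous sample scored higher: correct order
            elif p == n:
                wins += 0.5   # tie counts as half
    return wins / (len(pos) * len(neg))


# Hypothetical anomaly scores with ground-truth labels (1 = anomalous).
scores = [0.9, 0.8, 0.7, 0.4, 0.3, 0.1]
labels = [1, 1, 0, 1, 0, 0]
auc = auc_by_ranking(scores, labels)
```

The pairwise loop is quadratic in the number of samples, so it suits small illustrations only; a sort-based rank computation would be the usual choice for large test sets.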
True Negative Rate (TNR) (Eq. (6)), also known as Specificity and
Normal Predictive Value (NPV), quantifies the proportion of normal
samples correctly classified by the IDS as normal. It addresses the
question: What percentage of all normal samples in network traffic
could be correctly identified by the IDS? TNR complements the False
Positive Rate (FPR) and, therefore, achieving high TNR and low FPR is
desirable for IDS to reduce the amount of false alarms generated.
\[ TNR = \frac{TN}{TN + FP} \tag{6} \]
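Eqs. (1) to (6) can be computed directly from the four confusion matrix counts. The sketch below is illustrative: the names `safe_div` and `ids_metrics` are hypothetical, and returning 0.0 when a denominator is zero is a convention assumed here, not one prescribed by the survey. The example reproduces the accuracy pitfall discussed earlier, using a made-up test set of 990 normal and 10 anomalous samples and a model that labels everything as normal.

```python
def safe_div(numerator, denominator):
    """Divide, returning 0.0 for an empty denominator (assumed convention)."""
    return numerator / denominator if denominator else 0.0


def ids_metrics(tp, fp, tn, fn):
    """Compute Eqs. (1)-(6) from confusion matrix counts."""
    precision = safe_div(tp, tp + fp)
    recall = safe_div(tp, tp + fn)
    return {
        "accuracy": safe_div(tp + tn, tp + fp + tn + fn),
        "recall": recall,
        "precision": precision,
        "f1": safe_div(2 * precision * recall, precision + recall),
        "fpr": safe_div(fp, fp + tn),
        "tnr": safe_div(tn, tn + fp),
    }


# A model that predicts "normal" for every sample of an imbalanced test set.
all_normal = ids_metrics(tp=0, fp=0, tn=990, fn=10)
```

Here accuracy is 0.99 although recall and F1-score are both 0: the model detects no attack at all, which is exactly the failure mode that accuracy hides on imbalanced data.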
Some authors also employed the training time, which was the most frequent computing performance-related metric. Some authors also measured the testing time, the inference time, and the convergence time. Another approach was to evaluate detection and mitigation times and the computational complexity of the proposed method.

Fig. 14. Confusion matrix.
The remaining observed metrics were rarely applied in the studies and represent more specific approaches, such as G mean, Bookmaker Informedness (BM), and mitigation rate. Of these remaining metrics, we will discuss four, since they are more common in general classification research problems.
The first is the Matthews Correlation Coefficient (MCC) (Eq. (7)),
which is considered a robust metric as it takes into account all four
cells of the confusion matrix in its computation. Therefore, it only
yields a favorable result if the model performs well across all four
matrix cells: high TP and TN, and low FP and FN. This property
makes MCC a preferred metric for scenarios with imbalanced class
distributions. Its results range between −1 and 1, with −1 representing
the poorest model performance and 1 indicating the best. An MCC of
zero represents that the model’s prediction is no better than a random
one.
\[ MCC = \frac{TP \times TN - FP \times FN}{\sqrt{(TP + FP)(TP + FN)(TN + FP)(TN + FN)}} \tag{7} \]

Fig. 15. ROC curve example.
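Eq. (7) translates into a short function. The sketch is illustrative: the name `mcc` is hypothetical, and returning 0.0 when the denominator vanishes (a whole row or column of the confusion matrix is empty) is a common convention assumed here rather than something stated in the survey.

```python
import math


def mcc(tp, fp, tn, fn):
    """Matthews Correlation Coefficient from confusion matrix counts."""
    denominator = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denominator == 0:
        return 0.0  # assumed convention for the undefined case
    return (tp * tn - fp * fn) / denominator


perfect = mcc(tp=50, fp=0, tn=50, fn=0)       # every sample classified correctly
inverted = mcc(tp=0, fp=50, tn=0, fn=50)      # every sample classified wrongly
all_normal = mcc(tp=0, fp=0, tn=990, fn=10)   # model never raises an alarm
```

With these counts the perfect classifier scores 1, the inverted one scores −1, and the model that never raises an alarm scores 0 under the assumed convention, even though it classifies 99% of the samples correctly.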
Secondly, the False Negative Rate (FNR) (Eq. (8)) quantifies the
proportion of actual anomalous samples that were incorrectly classified
as normal by the IDS. It complements the recall by addressing the


question: What percentage of all anomalous samples in network traffic could not be detected by the IDS? Thus, a desirable performance goal for FNR is to achieve a low value, indicating that the IDS accurately identifies the majority of anomalous samples present in the network traffic.

\[ FNR = 1 - \text{Recall} = \frac{FN}{FN + TP} \tag{8} \]

The third one is the False Discovery Rate (FDR) (Eq. (9)), which is the complement of Precision, measuring the proportion of incorrect anomaly predictions. FDR answers the question: What percentage of the samples flagged as anomalies by the IDS were, in fact, not anomalous?

\[ FDR = 1 - \text{Precision} = \frac{FP}{FP + TP} \tag{9} \]

Finally, the fourth metric is the False Omission Rate (FOR) (Eq. (10)), which quantifies the proportion of samples classified as normal by the IDS that were actually anomalous. It addresses the question: What percentage of the samples flagged as normal by the IDS were, in fact, anomalous? FOR is the complement of the proportion of correct normal predictions, TN/(TN + FN), and not of TNR, since 1 − TNR equals FPR.

\[ FOR = 1 - \frac{TN}{TN + FN} = \frac{FN}{FN + TN} \tag{10} \]

6. Open issues and future directions

Recall, Accuracy, F1-score, and Precision were the most used performance metrics from the 105 analyzed papers, appearing in more than 70% of the selected studies. These were employed mainly in measuring the performance of CNN, LSTM, DNN, and GRU models. These four Deep Learning approaches reflect only 11% of all applied models but represent 61% of the total deep learning utilization. The five most used datasets are synthetic and represent traditional networks, including CIC-IDS2017, NSL-KDD, CIC-DDoS2019, CSE-CIC-IDS2018, and KDD Cup 1999. Data preprocessing and hyperparameter tuning tend to limit the versatility of a model, often making it highly specialized for a specific application. The lack of SDN datasets with real data, spanning a significant number of days and containing diverse, up-to-date attacks, is a limitation found in this research. The best choice of model, data, and metrics is fundamentally conditioned on the target application, and this survey aims to highlight the trends in the field.

In this section, we answer the final research question (RQ6). We compiled important open issues found while conducting the analysis of the selected works, indicating the possible future research directions.

6.1. Datasets

The effectiveness of intrusion detection systems hinges critically on the quality and relevance of the datasets used for training and testing these systems. There is a pronounced reliance on a limited number of public datasets, such as CIC-IDS2017, NSL-KDD, CIC-DDoS2019, CSE-CIC-IDS2018, and KDD Cup 1999. While these datasets have been invaluable to the research community, they may not accurately reflect the breadth of real-world network security challenges.

The first issue with this concentrated use of specific datasets is the risk of model overfitting. IDS models trained on these datasets might perform exceptionally well in a lab environment, but they may fail to generalize to different or more recent network behaviors. This discrepancy arises because these datasets, which are often several years old, do not contain the latest attack signatures or reflect contemporary network traffic conditions. The landscape of cyber threats is dynamic, with new vulnerabilities and attack methodologies continually emerging. This rapid evolution renders existing datasets quickly obsolete. The absence of contemporary threats impairs the ability of IDSs to identify and mitigate current risks effectively.

Another critical aspect is the representation of legitimate network behavior in these datasets. Many of them emphasize attack scenarios, providing an imbalanced view that can lead to high false-positive rates in deployed IDS. A dataset predominantly consisting of attack vectors, without a corresponding diversity of normal behaviors, limits the system’s ability to learn and distinguish between benign and malicious activities under real operational conditions.

There is an urgent need for regular updates to existing datasets and the development of new ones that mirror the current network environments. These updates should include recent attack types and tactics, ensuring that IDS can respond to the latest threats. Additionally, these datasets should reflect the varied and legitimate uses of network resources in contemporary settings, encompassing different network architectures, traffic, and user behavior. Besides, researchers should evaluate their solutions on available datasets containing real traffic data, as presented in Fig. 8 (e.g., CTU-13, ISCX2016, and N-BaIoT). The UGR’16 (Maciá-Fernández, Camacho, Magán-Carrión, García-Teodoro, & Therón, 2018) is a valuable dataset not used in the reviewed works. It comprises five months of real legitimate traffic collected from a Spanish ISP, which can be used in NIDS benchmarking.

6.2. Deep learning

Deep learning techniques have become popular for attack detection. The usage of DL models is highly concentrated in four traditional methods: CNN, LSTM, DNN, and GRU. The scientific community has been exploring more than 30 unique models, as listed in Table 5. However, further investigation of their applicability in different anomaly detection contexts is still an open issue.

The majority of the reviewed works apply deep learning to solve feature extraction or classification problems. A more in-depth analysis of solving regression, data generation, reconstruction, and the Markov decision process for implementing NIDS needs to be conducted.

Based on the reviewed works, we recommend a more in-depth exploration of Graph Convolutional Network models, such as vanilla GCN, ST-GCN, HGCN, and GC-LSTM. Also, Time Series Transformer and Attention Mechanism are propitious deep learning solutions rarely applied in intrusion detection. These models present promising results but are still not widely evaluated.

6.2.1. Unsupervised learning

We identified that unsupervised learning models are not commonly used. Training such models can be complex. Unlike supervised learning, which relies on labeled data, unsupervised learning has to learn from the underlying data patterns and structures without explicit guidance. As a result, developing and validating unsupervised models requires more advanced modeling and evaluation techniques, which can be costly.

Another challenge in developing unsupervised models is the need for suitable and representative datasets for training. Unsupervised learning requires diverse and complex datasets, which are not always feasible or available. Unsupervised learning models tend to create complex and latent representations, which make it difficult to understand the patterns learned. This lack of understanding can limit the confidence and adoption of these techniques in cybersecurity environments.

Our analysis has shown a significant concentration of work exploring Autoencoder and Stacked Autoencoder networks in unsupervised learning for detecting attacks. Other model variations, such as VAE, SSAE, DAE1, and DAE2, should be further evaluated and compared. Although autoencoders are powerful at reconstructing data, more unsupervised deep learning models must be tested to promote innovation and progress. For example, generative adversarial network variations are still not sufficiently explored in this context, leaving vast territory for future research: WGAN, WGAN-GP, and Bi-GAN.


6.2.2. Reinforcement learning

We also noticed that reinforcement learning has low adoption in detecting attacks on computer networks. Although reinforcement learning has been widely explored in other areas, e.g., games and robotics, its application in cybersecurity is still incipient. This may be due to the complexity of the attack detection environment, which may not be well suited to the reward model used in reinforcement learning, or to the difficulty of adequately simulating the network environment to train reinforcement learning agents. Considering their promising results in the reviewed works, we recommend that researchers conduct more experimentation on reinforcement learning models, such as Deep Q-Network, Recursive Neural Network, and Policy Gradient.

6.2.3. Decentralized architecture

Implementing centralized NIDS is a clear trend, as most analyzed works apply this architecture design. Decentralized systems are a promising area of research since NIDS may leverage parallel computing to scale and cope with the tremendous amount of traffic produced by modern networks. Scalability is a fundamental characteristic of a NIDS, considering the complexity of current networks. Nonetheless, it is a neglected study branch, as many authors have yet to discuss their system’s adaptation to high volumes of input data. We encourage researchers to design their solutions to run on multiple nodes, as this architecture may be more suitable to real-world scenarios.

6.3. Explainable artificial intelligence

In the past decade, significant progress has been made in Artificial Intelligence (AI), resulting in the adoption of algorithms to solve various problems. This success has led to increased model complexity and the use of black-box AI models that lack transparency. Many of the decisions taken by these models come without a clear account of the reasons behind them. In response to this challenge, Explainable AI (XAI) has been proposed to enhance the transparency of AI and facilitate its adoption in critical domains.

Despite its rapid growth and increasing acceptance, XAI remains an emerging field that lacks formality and agreed-upon definitions (Linardatos, Papastefanopoulos, & Kotsiantis, 2021). Among several studies dealing with XAI, the work by Roscher, Bohn, Duarte, and Garcke (2020) stands out. They discuss the requirements for using ML for scientific discovery and organize them into three core elements: transparency, interpretability, and explainability. An ML model is transparent if its construction process (including methods for model structure choices and fitting the parameters) can all be well described and motivated. In general, interpretability refers to the ability to make sense of an ML model obtained. The definition of interpretability takes into account the data and the model created. Thus, interpretability methods try to explain which input data was responsible for the model’s prediction. Finally, explainability presents the relationship between the data, the model, and the user. Explainability is the ability to explain the model’s operation, making its behavior more intelligible to humans.

The potential of using XAI is twofold. Firstly, it can justify algorithmic decisions, enabling users to understand and improve the model’s performance. Secondly, XAI can contribute to knowledge discovery by revealing learned patterns (Nauta et al., 2023). XAI algorithms must be extensively validated to ensure their effectiveness and usefulness in achieving this potential. The increase in studies looking at the interpretability of deep learning models confirms the room for improvement in this area, not only by improving the model training pipeline but also by highlighting the flaws and how much they are lacking in performance. In general, XAI has many unexplored aspects and potential to be unlocked in future work. We suggest new research to study how XAI methods, such as SHAP and LIME, may be integrated into NIDS solutions to assist in explaining their inference process.

7. Conclusion

We presented a thorough, empirical literature review on deep learning-based NIDS for the Software-Defined Network environment. This survey builds upon previous studies by meticulously analyzing each common step employed in cutting-edge intrusion detection solutions. These steps encompass benchmark datasets, data preprocessing, deep learning modeling, hyperparameter tuning, and performance evaluation.

Following the proposed methodology, we selected 105 final papers to form the basis of our review. The five most used datasets for developing, testing, and comparing NIDS are CIC-IDS2017, NSL-KDD, CIC-DDoS2019, CSE-CIC-IDS2018, and KDD Cup 1999. The most common attacks found in the data sources are DoS, DDoS, Botnet, Portscan, WebAttacks, and Brute force, highlighting the importance of studying such threats. Preprocessing is an essential step before feeding data to a machine learning model. We discussed the methods the reviewed works usually apply to data: normalization, cleaning, duplicate deletion, over- and under-sampling, and feature engineering. The most used deep learning models are CNN, LSTM, DNN, and GRU, among more than 30 unique explored models. These are utilized to build NIDS following different implementation paradigms. The DL networks solve feature extraction, classification, regression, data generation, reconstruction, or Markov decision processes. They may function centrally or in a decentralized manner. Also, the systems can be supervised, unsupervised, or reinforcement-based. Most works did not describe their deep learning hyperparameter optimization process. Nonetheless, some studies use hyperparameter search approaches, including empirical, random, and grid-based ones. The most commonly used metrics to evaluate the deep learning-based NIDS results are Recall, Accuracy, F1-score, Precision, False Positive Rate, and ROC curve. Many works still utilize accuracy in imbalanced datasets despite the consensus that it is not a trustworthy metric in such a context.

There has been an increasing number of published works in this research field. The scientific community has tried to develop better solutions for the intrusion detection problem. Nevertheless, we identified some critical issues that remain open and require further investigation. More real-world, updated datasets are needed for reliable NIDS evaluation. Unsupervised learning has not been significantly explored in the literature. Since collecting labeled data in real environments is challenging, this solution paradigm is more practical and cheaper than the supervised one. Novel unsupervised NIDS studies may experiment with alternative deep learning models to solve problems other than reconstruction. Our research revealed that reinforcement learning is an implementation paradigm rarely utilized in state-of-the-art NIDS. The examination of such solutions remains a future direction to be studied. Scalable, decentralized NIDS is also a promising future research topic. XAI is an evolving area that aims to provide insight into the inner workings of black-box machine learning models. Scientists may apply this solution to deep learning-based NIDS to understand their inference process, allowing for debugging and improvement.

CRediT authorship contribution statement

Vitor Gabriel da Silva Ruffo: Conceptualization, Formal analysis, Investigation, Data curation, Writing – original draft, Writing – review & editing. Daniel Matheus Brandão Lent: Conceptualization, Formal analysis, Investigation, Data curation, Writing – original draft, Writing – review & editing. Mateus Komarchesqui: Conceptualization, Formal analysis, Investigation, Data curation, Writing – original draft, Writing – review & editing. Vinícius Ferreira Schiavon: Conceptualization, Formal analysis, Investigation, Data curation, Writing – original draft, Writing – review & editing. Marcos Vinicius Oliveira de Assis: Conceptualization, Formal analysis, Investigation, Data curation, Writing – original draft, Writing – review & editing. Luiz Fernando Carvalho:


Conceptualization, Formal analysis, Investigation, Data curation, Writing – original draft, Writing – review & editing. Mario Lemes Proença Jr.: Conceptualization, Formal analysis, Investigation, Data curation, Writing – original draft, Writing – review & editing.

Declaration of competing interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Data availability

Data will be made available on request.

Acknowledgments

This work was supported by CAPES, Brazil, due to the concession of scholarships and by the National Council for Scientific and Technological Development (CNPq) of Brazil under Grant of Project 306397/2022-6.

References

Abdulganiyu, O. H., Ait Tchakoucht, T., & Saheed, Y. K. (2023). A systematic literature review for network intrusion detection system (IDS). International Journal of Information Security, 22(5), 1125–1162. http://dx.doi.org/10.1007/s10207-023-00682-2.
Ahmad, I., Wan, Z., & Ahmad, A. (2023). A big data analytics for DDOS attack detection using optimized ensemble framework in internet of things. Internet of Things, 23, Article 100825. http://dx.doi.org/10.1016/j.iot.2023.100825.
Assis, M. V., Carvalho, L. F., Lloret, J., & Proença, M. L. (2021). A GRU deep learning system against attacks in software defined networks. Journal of Network and Computer Applications, 177, Article 102942. http://dx.doi.org/10.1016/j.jnca.2020.102942.
Aydın, H., Orman, Z., & Aydın, M. A. (2022). A long short-term memory (LSTM)-based distributed denial of service (DDoS) detection and defense system design in public cloud network environment. Computers & Security, 118, Article 102725. http://dx.doi.org/10.1016/j.cose.2022.102725.
Bilot, T., Madhoun, N. E., Agha, K. A., & Zouaoui, A. (2023). Graph neural networks for intrusion detection: A survey. IEEE Access, 11, 49114–49139. http://dx.doi.org/10.1109/ACCESS.2023.3275789.
Bob Lantz, M. C. (2024). Mininet: An instant virtual network on your laptop (or other PC). http://mininet.org/. (Last Access 6 May 2024).
Brandao Lent, D. M., Novaes, M. P., Carvalho, L. F., Lloret, J., Rodrigues, J. J. P. C., & Proenca, M. L. (2022). A gated recurrent unit deep learning model to detect and mitigate distributed denial of service and portscan attacks. IEEE Access, 10, 73229–73242. http://dx.doi.org/10.1109/access.2022.3190008.
Bårli, E. M., Yazidi, A., Viedma, E. H., & Haugerud, H. (2021). Dos and DDoS mitigation using variational autoencoders. Computer Networks, 199, Article 108399. http://dx.doi.org/10.1016/j.comnet.2021.108399.
Caville, E., Lo, W. W., Layeghy, S., & Portmann, M. (2022). Anomal-E: A self-supervised network intrusion detection system based on graph neural networks. Knowledge-Based Systems, 258, Article 110030. http://dx.doi.org/10.1016/j.knosys.2022.110030.
Cherian, M., & Varma, S. L. (2023). Secure SDN–IoT framework for ddos attack detection using deep learning and counter based approach. Journal of Network and Systems Management, 31(3), http://dx.doi.org/10.1007/s10922-023-09749-w.
Choobdar, P., Naderan, M., & Naderan, M. (2021). Detection and multi-class classification of intrusion in software defined networks using stacked auto-encoders and CICIDS2017 dataset. Wireless Personal Communications, 123(1), 437–471. http://dx.doi.org/10.1007/s11277-021-09139-y.
Cil, A. E., Yildiz, K., & Buldu, A. (2021). Detection of ddos attacks with feed forward based deep neural network model. Expert Systems with Applications, 169, Article 114520. http://dx.doi.org/10.1016/j.eswa.2020.114520.
Community, R. S. F. (2024). Ryu SDN framework. https://ryu-sdn.org/. (Last Access 6 May 2024).
de Souza, C. A., Westphall, C. B., Machado, R. B., Loffi, L., Westphall, C. M., & Geronimo, G. A. (2022). Intrusion detection and prevention in fog based IoT environments: A systematic literature review. Computer Networks, 214, Article 109154. http://dx.doi.org/10.1016/j.comnet.2022.109154.
Ding, S., Kou, L., & Wu, T. (2022). A GAN-based intrusion detection model for 5G enabled future metaverse. Mobile Networks and Applications, 27(6), 2596–2610. http://dx.doi.org/10.1007/s11036-022-02075-6.
Ding, Q., & Li, J. (2022). Anogla: An efficient scheme to improve network anomaly detection. Journal of Information Security and Applications, 66, Article 103149. http://dx.doi.org/10.1016/j.jisa.2022.103149.
Dinh, P. T., & Park, M. (2021). R-EDoS: Robust economic denial of sustainability detection in an SDN-based cloud through stochastic recurrent neural network. IEEE Access, 9, 35057–35074. http://dx.doi.org/10.1109/access.2021.3061601.
Duan, X., Fu, Y., & Wang, K. (2023). Network traffic anomaly detection method based on multi-scale residual classifier. Computer Communications, 198, 206–216. http://dx.doi.org/10.1016/j.comcom.2022.10.024.
Duy, P. T., Khoa, N. H., Hien, D. T. T., Hoang, H. D., & Pham, V. H. (2023). Investigating on the robustness of flow-based intrusion detection system against adversarial samples using generative adversarial networks. Journal of Information Security and Applications, 74, Article 103472. http://dx.doi.org/10.1016/j.jisa.2023.103472.
Duy, P. T., Tien, L. K., Khoa, N. H., Hien, D. T. T., Nguyen, A. G. T., & Pham, V.-H. (2021). DIGFuPAS: Deceive IDS with GAN and function-preserving on adversarial samples in SDN-enabled networks. Computers & Security, 109, Article 102367. http://dx.doi.org/10.1016/j.cose.2021.102367.
El-Ghamry, A., Darwish, A., & Hassanien, A. E. (2023). An optimized CNN-based intrusion detection system for reducing risks in smart farming. Internet of Things, 22, Article 100709. http://dx.doi.org/10.1016/j.iot.2023.100709.
Elsayed, R. A., Hamada, R. A., Abdalla, M. I., & Elsaid, S. A. (2023). Securing IoT and SDN systems using deep-learning based automatic intrusion detection. Ain Shams Engineering Journal, 14(10), Article 102211. http://dx.doi.org/10.1016/j.asej.2023.102211.
ElSayed, M. S., Le-Khac, N. A., Albahar, M. A., & Jurcut, A. (2021). A novel hybrid model for intrusion detection systems in SDNs based on CNN and a new regularization technique. Journal of Network and Computer Applications, 191, Article 103160. http://dx.doi.org/10.1016/j.jnca.2021.103160.
Ferrag, M. A., Friha, O., Hamouda, D., Maglaras, L., & Janicke, H. (2022). Edge-IIoTset: A new comprehensive realistic cyber security dataset of IoT and IIoT applications for centralized and federated learning. IEEE Access, 10, 40281–40306. http://dx.doi.org/10.1109/access.2022.3165809.
Fouladi, R. F., Ermiş, O., & Anarim, E. (2022). A DDoS attack detection and countermeasure scheme based on DWT and auto-encoder neural network for SDN. Computer Networks, 214, Article 109140. http://dx.doi.org/10.1016/j.comnet.2022.109140.
Foundation, O. N. (2024). Open network operating system (ONOS) SDN controller for SDN/NFV solutions. https://opennetworking.org/onos/. (Last Access 6 May 2024).
Fox, G. T., & Boppana, R. V. (2023). On early detection of anomalous network flows. IEEE Access, 11, 68588–68603. http://dx.doi.org/10.1109/access.2023.3291686.
Fred Klassen, A. (2024). Tcpreplay - pcap editing and replaying utilities. https://tcpreplay.appneta.com/. (Last Access 6 May 2024).
Friha, O., Ferrag, M. A., Benbouzid, M., Berghout, T., Kantarci, B., & Choo, K. K. R. (2023). 2DF-IDS: Decentralized and differentially private federated learning-based intrusion detection system for industrial IoT. Computers & Security, 127, Article 103097. http://dx.doi.org/10.1016/j.cose.2023.103097.
Friha, O., Ferrag, M. A., Shu, L., Maglaras, L., Choo, K. K. R., & Nafaa, M. (2022). FELIDS: Federated learning-based intrusion detection system for agricultural internet of things. Journal of Parallel and Distributed Computing, 165, 17–31. http://dx.doi.org/10.1016/j.jpdc.2022.03.003.
Fu, Y., Duan, X., Wang, K., & Li, B. (2022). Low-rate denial of service attack detection method based on time-frequency characteristics. Journal of Cloud Computing, 11(1), http://dx.doi.org/10.1186/s13677-022-00308-3.
Garcia, N., Alcaniz, T., González-Vidal, A., Bernabe, J. B., Rivera, D., & Skarmeta, A. (2021). Distributed real-time SlowDoS attacks detection over encrypted traffic using artificial intelligence. Journal of Network and Computer Applications, 173, Article 102871. http://dx.doi.org/10.1016/j.jnca.2020.102871.
Gupta, N., Jindal, V., & Bedi, P. (2021). LIO-IDS: Handling class imbalance using LSTM and improved one-vs-one technique in intrusion detection system. Computer Networks, 192, Article 108076. http://dx.doi.org/10.1016/j.comnet.2021.108076.
Gupta, N., Jindal, V., & Bedi, P. (2022). CSE-IDS: Using cost-sensitive deep learning and ensemble algorithms to handle class imbalance in network-based intrusion detection systems. Computers & Security, 112, Article 102499. http://dx.doi.org/10.1016/j.cose.2021.102499.
Gupta, S. K., Tripathi, M., & Grover, J. (2022). Hybrid optimization and deep learning based intrusion detection system. Computers & Electrical Engineering, 100, Article 107876. http://dx.doi.org/10.1016/j.compeleceng.2022.107876.
H., S. C., Rao, K. V., & Prasad, M. H. M. K. (2023). Deep neural network empowered bi-directional cross GAN in context of classifying DDoS over flash crowd event on web server. Multimedia Tools and Applications, 82(24), 37303–37326. http://dx.doi.org/10.1007/s11042-023-15030-8.
Hairab, B. I., Said Elsayed, M., Jurcut, A. D., & Azer, M. A. (2022). Anomaly detection based on CNN and regularization techniques against zero-day attacks in IoT networks. IEEE Access, 10, 98427–98440. http://dx.doi.org/10.1109/access.2022.3206367.
Hettich, S., & Bay, S. D. (1999). KDD Cup 1999 Data. University of California, Department of Information and Computer Science, https://kdd.ics.uci.edu/. (Last Access 6 May 2024).


Hidalgo, C., Vaca, M., Nowak, M. P., Frölich, P., Reed, M., Al-Naday, M., et al. (2022). Detection, control and mitigation system for secure vehicular communication. Vehicular Communications, 34, Article 100425. http://dx.doi.org/10.1016/j.vehcom.2021.100425.
Hnamte, V., & Hussain, J. (2023). An efficient DDoS attack detection mechanism in SDN environment. International Journal of Information Technology, 15(5), 2623–2636. http://dx.doi.org/10.1007/s41870-023-01332-5.
Houda, Z. A. E., Hafid, A. S., & Khoukhi, L. (2023). MiTFed: A privacy preserving collaborative network attack mitigation framework based on federated learning using SDN and blockchain. IEEE Transactions on Network Science and Engineering, 10(4), 1985–2001. http://dx.doi.org/10.1109/tnse.2023.3237367.
Hu, B., Bi, Y., Zhi, M., Zhang, K., Yan, F., Zhang, Q., et al. (2022). A deep one-class intrusion detection scheme in software-defined industrial networks. IEEE Transactions on Industrial Informatics, 18(6), 4286–4296. http://dx.doi.org/10.1109/tii.2021.3133300.
Huang, H., Ye, P., Hu, M., & Wu, J. (2023). A multi-point collaborative ddos defense mechanism for IIoT environment. Digital Communications and Networks, 9(2), 590–601. http://dx.doi.org/10.1016/j.dcan.2022.04.008.
Illy, P., & Kaddoum, G. (2023). A collaborative DNN-based low-latency IDPS for mission-critical smart factory networks. IEEE Access, 11, 96317–96329. http://dx.doi.org/10.1109/access.2023.3311822.
Illy, P., Kaddoum, G., de Araujo-Filho, P. F., Kaur, K., & Garg, S. (2022). A hybrid multistage DNN-based collaborative IDPS for high-risk smart factory networks. IEEE Transactions on Network and Service Management, 19(4), 4273–4283. http://dx.doi.org/10.1109/tnsm.2022.3202801.
Imrana, Y., Xiang, Y., Ali, L., & Abdul-Rauf, Z. (2021). A bidirectional LSTM deep learning approach for intrusion detection. Expert Systems with Applications, 185, Article 115524. http://dx.doi.org/10.1016/j.eswa.2021.115524.
Janabi, A. H., Kanakis, T., & Johnson, M. (2022). Convolutional neural network based algorithm for early warning proactive system security in software defined networks. IEEE Access, 10, 14301–14310. http://dx.doi.org/10.1109/ACCESS.2022.3148134.
Javed, Y., Khayat, M. A., Elghariani, A. A., & Ghafoor, A. (2023). PRISM: A hierarchical intrusion detection architecture for large-scale cyber networks. IEEE Transactions on Dependable and Secure Computing, 20(6), 5070–5086. http://dx.doi.org/10.1109/TDSC.2023.3240315.
Kaur, G., & Kakkar, D. (2022). Hybrid optimization enabled trust-based secure routing with deep learning-based attack detection in VANET. Ad Hoc Networks, 136, Article 102961. http://dx.doi.org/10.1016/j.adhoc.2022.102961.
Kim, S., Yoon, S., Cho, J. H., Kim, D. S., Moore, T. J., Free-Nelson, F., et al. (2022). DIVERGENCE: Deep reinforcement learning-based adaptive traffic inspection and moving target defense countermeasure framework. IEEE Transactions on Network and Service Management, 19(4), 4834–4846. http://dx.doi.org/10.1109/tnsm.2021.3139928.
Kumar, P., Kumar, R., Aljuhani, A., Javeed, D., Jolfaei, A., & Islam, A. K. M. N. (2023). Digital twin-driven SDN for smart grid: A deep learning integrated blockchain for cybersecurity. Solar Energy, 263, Article 111921. http://dx.doi.org/10.1016/j.solener.2023.111921.
Li, W., Meng, W., & Kwok, L. F. (2016). A survey on OpenFlow-based software defined networks: Security challenges and countermeasures. Journal of Network and Computer Applications, 68, 126–139. http://dx.doi.org/10.1016/j.jnca.2016.04.011.
Linardatos, P., Papastefanopoulos, V., & Kotsiantis, S. (2021). Explainable AI: A review of machine learning interpretability methods. Entropy, 23(1), http://dx.doi.org/10.3390/e23010018.
Liu, Y., Zhi, T., Shen, M., Wang, L., Li, Y., & Wan, M. (2022). Software-defined ddos detection with information entropy analysis and optimized deep learning. Future Generation Computer Systems, 129, 99–114. http://dx.doi.org/10.1016/j.future.2021.11.009.
Long, Z., & Jinsong, W. (2022). A hybrid method of entropy and SSAE-SVM based DDoS detection and mitigation mechanism in SDN. Computers & Security, 115, Article 102604. http://dx.doi.org/10.1016/j.cose.2022.102604.
Lopes, I. O., Zou, D., Abdulqadder, I. H., Ruambo, F. A., Yuan, B., & Jin, H. (2022). Effective network intrusion detection via representation learning: A denoising AutoEncoder approach. Computer Communications, 194, 55–65. http://dx.doi.org/10.1016/j.comcom.2022.07.027.
Ltd., S. I. T. (2024a). DNS-STATS. https://dns-stats.org/. (Last Access 6 May 2024).
Ltd., O. (2024b). OMNeT++ discrete event simulator. https://omnetpp.org/. (Last Access 6 May 2024).
M., G., & Sethuraman, S. C. (2023). A comprehensive survey on deep learning based malware detection techniques. Computer Science Review, 47, Article 100529. http://dx.doi.org/10.1016/j.cosrev.2022.100529.
Maciá-Fernández, G., Camacho, J., Magán-Carrión, R., García-Teodoro, P., & Therón, R. (2018). UGR ‘16: A new dataset for the evaluation of cyclostationarity-based network IDSs. Computers & Security, 73, 411–424. http://dx.doi.org/10.1016/j.cose.2017.11.004.
Mansour, R. F. (2022). Blockchain assisted clustering with intrusion detection system for industrial internet of things environment. Expert Systems with Applications, 207, Article 117995. http://dx.doi.org/10.1016/j.eswa.2022.117995.
McCauley, M. (2024). The POX network software platform. https://github.com/noxrepo/pox. (Last Access 6 May 2024).
Melis, A., Sadi, A. A., Berardi, D., Callegati, F., & Prandini, M. (2023). A systematic literature review of offensive and defensive security solutions with software defined network. IEEE Access, 11, 93431–93463. http://dx.doi.org/10.1109/ACCESS.2023.3276238.
Mustapha, A., Khatoun, R., Zeadally, S., Chbib, F., Fadlallah, A., Fahs, W., et al. (2023). Detecting ddos attacks using adversarial neural network. Computers & Security, 127, Article 103117. http://dx.doi.org/10.1016/j.cose.2023.103117.
Myneni, S., Chowdhary, A., Huang, D., & Alshamrani, A. (2022). SmartDefense: A distributed deep defense against DDoS attacks with edge computing. Computer Networks, 209, Article 108874. http://dx.doi.org/10.1016/j.comnet.2022.108874.
Nadeem, M. W., Goh, H. G., Aun, Y., & Ponnusamy, V. (2023). Detecting and mitigating botnet attacks in software-defined networks using deep learning techniques. IEEE Access, 11, 49153–49171. http://dx.doi.org/10.1109/access.2023.3277397.
Nauta, M., Trienes, J., Pathak, S., Nguyen, E., Peters, M., Schmitt, Y., et al. (2023). From anecdotal evidence to quantitative evaluation methods: A systematic review on evaluating explainable AI. ACM Computing Surveys, 55(13s), http://dx.doi.org/10.1145/3583558.
Nguyen, X. H., & Le, K. H. (2023). Robust detection of unknown DoS/DDoS attacks in IoT networks using a hybrid learning model. Internet of Things, 23, Article 100851. http://dx.doi.org/10.1016/j.iot.2023.100851.
Nisar, K., Jimson, E. R., Hijazi, M. H. A., Welch, I., Hassan, R., Aman, A. H. M., et al. (2020). A survey on the architecture, application, and security of software defined networking: Challenges and open issues. Internet of Things, 12, Article 100289. http://dx.doi.org/10.1016/j.iot.2020.100289.
Novaes, M. P., Carvalho, L. F., Lloret, J., & Proença, M. L. (2021). Adversarial deep learning approach detection and defense against DDoS attacks in SDN environments. Future Generation Computer Systems, 125, 156–167. http://dx.doi.org/10.1016/j.future.2021.06.047.
Nuaimi, M., Fourati, L. C., & Hamed, B. B. (2023). Intelligent approaches toward intrusion detection systems for industrial internet of things: A systematic comprehensive review. Journal of Network and Computer Applications, 215, Article 103637. http://dx.doi.org/10.1016/j.jnca.2023.103637.
O. Lopes, I., Zou, D., Abdulqadder, I. H., Akbar, S., Li, Z., Ruambo, F., et al. (2023). Network intrusion detection based on the temporal convolutional model. Computers & Security, 135, Article 103465. http://dx.doi.org/10.1016/j.cose.2023.103465.
Oliveira, M. d. (2024). hping3 | kali linux tools. https://www.kali.org/tools/hping3/. (Last Access 6 May 2024).
Ozkan-Okay, M., Akin, E., Aslan, Ö., Kosunalp, S., Iliev, T., Stoyanov, I., et al. (2024). A comprehensive survey: Evaluating the efficiency of artificial intelligence and machine learning techniques on cyber security solutions. IEEE Access, 12, 12229–12256. http://dx.doi.org/10.1109/ACCESS.2024.3355547.
Phan, T. V., & Bauschert, T. (2022). DeepAir: Deep reinforcement learning for adaptive intrusion response in software-defined networks. IEEE Transactions on Network and Service Management, 19(3), 2207–2218. http://dx.doi.org/10.1109/tnsm.2022.3158468.
Phu, A. T., Li, B., Ullah, F., Ul Huque, T., Naha, R., Babar, M. A., et al. (2023). Defending SDN against packet injection attacks using deep learning. Computer Networks, 234, Article 109935. http://dx.doi.org/10.1016/j.comnet.2023.109935.
Polat, H., Türkoğlu, M., Polat, O., & Şengür, A. (2022). A novel approach for accurate detection of the DDoS attacks in SDN-based SCADA systems based on deep recurrent neural networks. Expert Systems with Applications, 197, Article 116748. http://dx.doi.org/10.1016/j.eswa.2022.116748.
Presekal, A., Ştefanov, A., Rajkumar, V. S., & Palensky, P. (2023). Attack graph model for cyber-physical power systems using hybrid deep learning. IEEE Transactions on Smart Grid, 14(5), 4007–4020. http://dx.doi.org/10.1109/tsg.2023.3237011.
Qazi, E. u.-H., Imran, M., Haider, N., Shoaib, M., & Razzak, I. (2022). An intelligent and efficient network intrusion detection system using deep learning. Computers & Electrical Engineering, 99, Article 107764. http://dx.doi.org/10.1016/j.compeleceng.2022.107764.
Ravi, V., Chaganti, R., & Alazab, M. (2022). Recurrent deep learning-based feature fusion ensemble meta-classifier approach for intelligent network intrusion detection system. Computers & Electrical Engineering, 102, Article 108156. http://dx.doi.org/10.1016/j.compeleceng.2022.108156.
Roscher, R., Bohn, B., Duarte, M. F., & Garcke, J. (2020). Explainable machine learning for scientific insights and discoveries. IEEE Access, 8, 42200–42216. http://dx.doi.org/10.1109/ACCESS.2020.2976199.
Sabeel, U., Heydari, S. S., El-Khatib, K., & Elgazzar, K. (2024). Unknown, atypical and polymorphic network intrusion detection: A systematic survey. IEEE Transactions on Network and Service Management, 21(1), 1190–1212. http://dx.doi.org/10.1109/TNSM.2023.3298533.
Sahu, S. K., Mohapatra, D. P., Rout, J. K., Sahoo, K. S., Pham, Q. V., & Dao, N. N. (2022). A LSTM-FCNN based multi-class intrusion detection using scalable framework. Computers & Electrical Engineering, 99, Article 107720. http://dx.doi.org/10.1016/j.compeleceng.2022.107720.
Sarıkaya, A., Kılıç, B. G., & Demirci, M. (2023). RAIDS: Robust autoencoder-based intrusion detection system model against adversarial attacks. Computers & Security, 135, Article 103483. http://dx.doi.org/10.1016/j.cose.2023.103483.
Sattari, F., Farooqi, A. H., Qadir, Z., Raza, B., Nazari, H., & Almutiry, M. (2022). A hybrid deep learning approach for bottleneck detection in IoT. IEEE Access, 10, 77039–77053. http://dx.doi.org/10.1109/access.2022.3188635.


Sayed, M. S. E., Le-Khac, N. A., Azer, M. A., & Jurcut, A. D. (2022). A flow-based anomaly detection approach with feature selection method against ddos attacks in SDNs. IEEE Transactions on Cognitive Communications and Networking, 8(4), 1862–1880. http://dx.doi.org/10.1109/tccn.2022.3186331.
Shaji, N. S., Jain, T., Muthalagu, R., & Pawar, P. M. (2023). Deep-discovery: Anomaly discovery in software-defined networks using artificial neural networks. Computers & Security, 132, Article 103320. http://dx.doi.org/10.1016/j.cose.2023.103320.
Sharafaldin, I., Lashkari, A. H., Ghorbani, A. A., et al. (2018). Toward generating a new intrusion detection dataset and intrusion traffic characterization. ICISSp, 1, 108–116. http://dx.doi.org/10.5220/0006639801080116.
Sharafaldin, I., Lashkari, A. H., Hakak, S., & Ghorbani, A. A. (2019). Developing realistic distributed denial of service (DDoS) attack dataset and taxonomy. http://dx.doi.org/10.1109/CCST.2019.8888419.
Shu, J., Zhou, L., Zhang, W., Du, X., & Guizani, M. (2021). Collaborative intrusion detection for VANETs: A deep learning-based distributed SDN approach. IEEE Transactions on Intelligent Transportation Systems, 22(7), 4519–4530. http://dx.doi.org/10.1109/tits.2020.3027390.
Shukla, P. K., Maheshwary, P., Subramanian, E., Shilpa, V. J., & Varma, P. R. K. (2023). Traffic flow monitoring in software-defined network using modified recursive learning. Physical Communication, 57, Article 101997. http://dx.doi.org/10.1016/j.phycom.2022.101997.
Siniosoglou, I., Radoglou-Grammatikis, P., Efstathopoulos, G., Fouliras, P., & Sarigiannidis, P. (2021). A unified deep learning anomaly detection and classification approach for smart grid environments. IEEE Transactions on Network and Service Management, 18(2), 1137–1151. http://dx.doi.org/10.1109/tnsm.2021.3078381.
Sivanesan, N., & Archana, K. S. (2023). Detecting distributed denial of service (DDoS) in SD-IoT environment with enhanced firefly algorithm and convolution neural network. Optical and Quantum Electronics, 55(5), http://dx.doi.org/10.1007/s11082-023-04553-x.
SolarWinds Worldwide, L. (2024). GNS3 | the software that empowers network professionals. https://www.gns3.com/. (Last Access 6 May 2024).
Soltani, M., Siavoshani, M. J., & Jahangir, A. H. (2021). A content-based deep intrusion detection system. International Journal of Information Security, 21(3), 547–562. http://dx.doi.org/10.1007/s10207-021-00567-2.
Song, D., Yuan, X., Li, Q., Zhang, J., Sun, M., Fu, X., et al. (2023). Intrusion detection model using gene expression programming to optimize parameters of convolutional neural network for energy internet. Applied Soft Computing, 134, Article 109960. http://dx.doi.org/10.1016/j.asoc.2022.109960.
Sood, K., Nosouhi, M. R., Nguyen, D. D. N., Jiang, F., Chowdhury, M., & Doss, R. (2023). Intrusion detection scheme with dimensionality reduction in next generation networks. IEEE Transactions on Information Forensics and Security, 18, 965–979. http://dx.doi.org/10.1109/tifs.2022.3233777.
Taheri, R., Ahmed, H., & Arslan, E. (2023). Deep learning for the security of software-defined networks: A review. Cluster Computing, 26(5), 3089–3112. http://dx.doi.org/10.1007/s10586-023-04069-9.
Tavallaee, M., Bagheri, E., Lu, W., & Ghorbani, A. A. (2009). A detailed analysis of the KDD CUP 99 data set. In 2009 IEEE symposium on computational intelligence for security and defense applications. http://dx.doi.org/10.1109/CISDA.2009.5356528.
Tayfour, O. E., Mubarakali, A., Tayfour, A. E., Marsono, M. N., Hassan, E., & Abdelrahman, A. M. (2023). Adapting deep learning-LSTM method using optimized dataset in SDN controller for secure IoT. Soft Computing, http://dx.doi.org/10.1007/s00500-023-08348-w.
Tsogbaatar, E., Bhuyan, M. H., Taenaka, Y., Fall, D., Gonchigsumlaa, K., Elmroth, E., et al. (2021). Del-IoT: A deep ensemble learning approach to uncover anomalies in IoT. Internet of Things, 14, Article 100391. http://dx.doi.org/10.1016/j.iot.2021.100391.
Udas, P. B., Karim, M. E., & Roy, K. S. (2022). SPIDER: A shallow PCA based network intrusion detection system with enhanced recurrent neural networks. Journal of King Saud University - Computer and Information Sciences, 34(10), 10246–10272. http://dx.doi.org/10.1016/j.jksuci.2022.10.019.
Van Engelen, J. E., & Hoos, H. H. (2020). A survey on semi-supervised learning. Machine Learning, 109(2), 373–440. http://dx.doi.org/10.1007/s10994-019-05855-6.
Vatambeti, R., Venkatesh, D., Mamidisetti, G., Damera, V. K., Manohar, M., & Yadav, N. S. (2023). Prediction of ddos attacks in agriculture 4.0 with the help of prairie dog optimization algorithm with IDSNet. Scientific Reports, 13(1), http://dx.doi.org/10.1038/s41598-023-42678-x.
Wang, K., Cui, Y., Qian, Q., Chen, Y., Guo, C., & Shen, G. (2023). USAGE: Uncertain flow graph and spatio-temporal graph convolutional network-based saturation attack detection method. Journal of Network and Computer Applications, 219, Article 103722. http://dx.doi.org/10.1016/j.jnca.2023.103722.
Wette, P., Dräxler, M., Schwabe, A., Wallaschek, F., Zahraee, M. H., & Karl, H. (2014). MaxiNet: Distributed emulation of software-defined networks. In 2014 IFIP networking conference. http://dx.doi.org/10.1109/IFIPNetworking.2014.6857078.
Xue, H., & Jing, B. (2023). SDN attack identification model based on CNN algorithm. IEEE Access, 11, 87652–87666. http://dx.doi.org/10.1109/access.2023.3296798.
Yang, Z., Liu, X., Li, T., Wu, D., Wang, J., Zhao, Y., et al. (2022). A systematic literature review of methods and datasets for anomaly-based network intrusion detection. Computers & Security, 116, Article 102675. http://dx.doi.org/10.1016/j.cose.2022.102675.
Yang, X., Song, Z., King, I., & Xu, Z. (2023). A survey on deep semi-supervised learning. IEEE Transactions on Knowledge and Data Engineering, 35(9), 8934–8954. http://dx.doi.org/10.1109/TKDE.2022.3220219.
Yeom, S., Choi, C., & Kim, K. (2022). LSTM-based collaborative source-side DDoS attack detection. IEEE Access, 10, 44033–44045. http://dx.doi.org/10.1109/access.2022.3169616.
Yousuf, O., & Mir, R. N. (2022). DDoS attack detection in internet of things using recurrent neural network. Computers & Electrical Engineering, 101, Article 108034. http://dx.doi.org/10.1016/j.compeleceng.2022.108034.
Yungaicela-Naula, N. M., Vargas-Rosales, C., & Perez-Diaz, J. A. (2021). SDN-based architecture for transport and application layer DDoS attack detection by using machine and deep learning. IEEE Access, 9, 108495–108512. http://dx.doi.org/10.1109/access.2021.3101650.
Zainudin, A., Ahakonye, L. A. C., Akter, R., Kim, D. S., & Lee, J. M. (2023). An efficient hybrid-DNN for DDoS detection and classification in software-defined IIoT networks. IEEE Internet of Things Journal, 10(10), 8491–8504. http://dx.doi.org/10.1109/jiot.2022.3196942.
Zavrak, S., & Iskefiyeli, M. (2023). Flow-based intrusion detection on software-defined networks: A multivariate time series anomaly detection approach. Neural Computing and Applications, 35(16), 12175–12193. http://dx.doi.org/10.1007/s00521-023-08376-5.
Zhang, P., He, F., Zhang, H., Hu, J., Huang, X., Wang, J., et al. (2023). Real-time malicious traffic detection with online isolation forest over sd-wan. IEEE Transactions on Information Forensics and Security, 18, 2076–2090. http://dx.doi.org/10.1109/TIFS.2023.3262121.
Zhou, H., Zheng, Y., Jia, X., & Shu, J. (2023). Collaborative prediction and detection of DDoS attacks in edge computing: A deep learning-based approach with distributed SDN. Computer Networks, 225, Article 109642. http://dx.doi.org/10.1016/j.comnet.2023.109642.
