
Computer Networks 212 (2022) 109032

Contents lists available at ScienceDirect

Computer Networks
journal homepage: www.elsevier.com/locate/comnet

Survey paper

A survey on deep learning for cybersecurity: Progress, challenges, and opportunities
Mayra Macas a,b,∗,1, Chunming Wu a, Walter Fuertes b
a College of Computer Science and Technology, Zhejiang University, No. 38 Zheda Road, Hangzhou 310027, China
b Department of Computer Science, Universidad de las Fuerzas Armadas ESPE, Av. General Rumiñahui S/N, P.O. Box 17-15-231B, Sangolquí, Ecuador

ARTICLE INFO

Keywords: Cybersecurity; Artificial intelligence; Machine learning; Deep learning; Cyber-threat; Botnets; Intrusion detection; Spam filtering; Encrypted traffic analysis

ABSTRACT

As the number of Internet-connected systems rises, cyber analysts find it increasingly difficult to effectively monitor the produced volume of data, its velocity and diversity. Signature-based cybersecurity strategies are unlikely to achieve the required performance for detecting new attack vectors. Moreover, technological advances enable attackers to develop sophisticated attack strategies that can avoid detection by current security systems. As the cyber-threat landscape worsens, we need advanced tools and technologies to detect, investigate, and make quick decisions regarding emerging attacks and threats. Applications of artificial intelligence (AI) have the potential to analyze and automatically classify vast amounts of Internet traffic. AI-based solutions that automate the detection of attacks and tackle complex cybersecurity problems are gaining increasing attention. This paper comprehensively presents the promising applications of deep learning, a subfield of AI based on multiple layers of artificial neural networks, in a wide variety of security tasks. Before critically and comparatively surveying state-of-the-art solutions from the literature, we discuss the key characteristics of representative deep learning architectures employed in cybersecurity applications, we introduce the emerging trends in deep learning, and we provide an overview of necessary resources like a generic framework and suitable datasets. We identify the limitations of the reviewed works, and we bring forth a vision of the current challenges of the area, providing valuable insights and good practices for researchers and developers working on related problems. Finally, we uncover current pain points and outline directions for future research to address them.

∗ Corresponding author at: College of Computer Science and Technology, Zhejiang University, No. 38 Zheda Road, Hangzhou 310027, China.
E-mail address: [email protected] (M. Macas).
1 The research work of Mrs. Mayra Macas was partially supported by the Chinese government scholarship CSC Reg. No.: 2017GBJ005834.

https://doi.org/10.1016/j.comnet.2022.109032
Received 8 October 2021; Received in revised form 28 February 2022; Accepted 22 April 2022
Available online 16 May 2022
1389-1286/© 2022 Elsevier B.V. All rights reserved.

1. Introduction

Cybersecurity is a set of technologies and methods that strive to safeguard computer networks, end systems, programs, and data from attacks, unauthorized access, changes, or harm. Cyber-defense mechanisms exist at the host, network, application, and data level. A plethora of tools, such as firewalls, antivirus software, intrusion detection systems (IDS) and intrusion prevention systems (IPS), work in the background to prevent attacks and detect security breaches. However, adversaries are still at an advantage because they need to find only one vulnerability in the systems under protection. As the number of systems connected to the Internet rises, cyber-attacks are also increasing in size, sophistication, and cost. During 2019 and 2020, the attack surface expanded due to accelerated digitization and the increasing dependence on digital infrastructure (e.g., cloud-based services, remote work). According to Symantec's report, over 60 million malicious attempts were blocked in the second quarter of 2020, representing a 74.6% increase over the previous months [1]. At the beginning of 2021, one of the most bizarre and terrifying cyberattacks happened. A cybercriminal gained unlawful access to the water treatment system of the City of Oldsmar in Florida and attempted to make the water unsafe to consume by changing specific chemical levels [2]. Following this trend, the latest threat landscape report compiled by the European Union Agency for Network and Information Security (ENISA) [3] predicts that a secure and reliable cyberspace will become even more important in the new social and economic norm established after the COVID-19 pandemic and the consequent transformation of the digital environment.

Organizations face an urgent need to ramp up and improve their cybersecurity due to the continuous growth of the number of end-user devices, networks, and user interfaces, combined with the resulting increase in the quantity of data transmitted over the Internet brought by the advances in cloud and fog/edge computing, the Internet of Things (IoT), Industry 4.0/5.0, and 5G/6G [4,5]. In such a context, signature-based cybersecurity strategies have become time-consuming and make security protection systems behave reactively rather than proactively. Furthermore, technological advances are also benefiting attackers who are developing new, complex, and sophisticated attack strategies that can evade detection by existing security systems [6]. As the cyber-threat landscape continues to expand, advanced technologies and tools for predicting, detecting, and making decisions faster to address emerging attacks and threats are required.

In the past few years, cybersecurity researchers have started to explore AI because it can potentially analyze and automatically classify large amounts of Internet traffic [4,5]. It is estimated that the market for AI in cybersecurity will grow from US$8.8 billion in 2019 to US$38.2 billion by 2026 [7]. Deep learning, a subfield of AI based on multiple layers of artificial neural networks, has established a key role in solving complicated cybersecurity problems due to its ability to manage complex data structures, its automatic feature extraction, and its efficiency in recognizing patterns and correlations. In practice, the application of deep learning in cybersecurity offers three strategic advantages:
• Simplicity: In contrast to traditional machine learning (ML) techniques, deep learning considerably simplifies feature handcrafting, replacing brittle, complex, engineering-heavy pipelines with straightforward, end-to-end trainable models that offload much of the work [8,9]. Within the cybersecurity context, feature handcrafting for each type of attack requires outstanding human effort because of the ever-changing and growing cyber-threat landscape [9]. Deep learning techniques, on the other hand, can be trained to learn the features, which makes them an excellent choice for security tasks because they can detect previously unknown intrusions (e.g., zero-day malware) [10].
• Scalability: Classical ML algorithms often require storing all data points in memory, something that is computationally non-viable under big data scenarios. Besides, traditional ML algorithms do not significantly improve their performance with a massive volume of data. Thus, they do not provide scalability [8]. In contrast, deep learning models can be trained on datasets of varying size, since they can iterate over small batches of data (e.g., using stochastic gradient descent, SGD [11]). Furthermore, using a vast amount of data in the training process of deep learning techniques has the additional benefit of reducing the risk of model over-fitting. These properties are compatible with the security domain, where vast amounts of heterogeneous data are produced from sensors, logs, endpoint agents, and distributed directory systems.
• Reusability: Unlike many traditional ML approaches, deep learning models can be trained on additional data without starting again from scratch. Thus, they are suitable for continuous online training, a desirable property for huge production models [8]. Moreover, trained deep learning models are repurposable and, therefore, reusable via transfer learning, which allows previous work to be reinvested into increasingly sophisticated and robust models. This is essential in the cybersecurity domain because it decreases the computational and memory requirements of cyber defense systems when performing multi-task learning applications, e.g., collaborative spam filtering [12,13].

1.1. Survey scope

This survey aims to present a comprehensive analysis of state-of-the-art deep learning practices in the cybersecurity domain. By doing this, we aim to answer the following key questions:
• Q1. What are the cutting-edge deep learning techniques significant to the cybersecurity domain?
• Q2. How can cyber analysts, researchers, or engineers apply deep learning to specific cybersecurity problems?
• Q3. What are the available datasets for training, validating and testing deep learning-based cyber-defense systems?
• Q4. What are the latest successful deep learning-based systems in cybersecurity?
• Q5. Which are the most important and promising directions for further study?
To that end, we explore the application of deep learning techniques to cybersecurity tasks like intrusion detection in IoT and Software Defined Networks (SDNs), IoT malware analysis, botnet detection and domain generation algorithms (DGAs), cyber–physical system security, spam filtering, fraud detection, and encrypted traffic analysis.

1.2. Previous surveys

Cybersecurity and deep learning problems have been researched mostly independently. Only recently has a crossover between the two areas emerged. In [14–16], machine learning applications to cybersecurity problems have been studied without deep learning techniques. Other authors describe deep learning methods for a narrow set of cybersecurity applications. Xin et al. [17] analyze only the attacks related to intrusion detection and the deficiencies of the existing datasets. Similarly, the authors in [5] review different deep learning techniques focusing on their application to intrusion detection, along with the respective datasets. Shaukat et al. [18] analyze cyber-attacks related solely to malware analysis, intrusion detection, and spam detection and do not cover other areas. In [19], the authors summarize and present works that aim to secure cyber–physical systems (CPSs). Several deep learning-based anomaly detection approaches for CPSs are also analyzed and evaluated in [20]. The studies [21,22] review machine learning and deep learning methods for securing IoT. Rodriguez et al. [23] compile deep learning techniques in mobile network security.

More recent surveys like [10,24,25] present deep learning models that have been used to automate various security tasks such as intrusion detection and malware analysis, but they suffer from the following shortcomings: (i) they do not include the most cutting-edge deep learning methods; (ii) deep learning model selection directions for solving security problems are not provided; (iii) they focus only on unencrypted traffic and traditional networks, neglecting emerging paradigms like IoT and SDNs; (iv) spam filtering is examined exclusively within the context of e-mail, ignoring online social networks and user reviews; (v) credit card and telecommunication fraud detection is overlooked; (vi) intelligent transportation systems are not examined; and (vii) they do not provide any guidelines regarding the use of the most appropriate dataset per case.

1.3. Contributions

This paper comprehensively reviews recent advances and state-of-the-art DL-based solutions in a wide range of cybersecurity applications, in order to bridge the gaps and limitations identified in previous surveys. The contributions of our work can be summarized as follows:
• We examine cutting-edge deep learning methods from the perspective of the security domain, focusing on their applicability to this area, while giving less attention to conventional deep learning models that may be out-of-date. Moreover, we provide insights about selecting the most appropriate deep learning architecture for each security task (answering Q1 in Section 2).
• We introduce a step-wise deep learning framework customized for cybersecurity applications. At every constituent stage of the framework, we discuss the related challenges, and we provide insights (answering Q2 in Section 3.1).
• We present an exhaustive list and a brief overview of the available datasets that can be used for each cybersecurity application (answering Q3 in Section 3.2).
• We explore applications previously overlooked in other related surveys, such as malware detection in IoT devices, network intrusion detection in IoT and SDNs, credit card and telecommunication fraud detection, spam filtering in online social networks and user reviews, cyber–physical systems security, and encrypted traffic analysis. Furthermore, we discuss the advantages and limitations of each reviewed study in order to highlight best practices and caution against potential pitfalls in developing deep learning-based cyber-defense systems (answering Q4 in Section 4).
• We identify the open issues and current challenges of the crossover between deep learning and cybersecurity, and we provide a comprehensive list of promising research directions to address and overcome them (answering Q5 in Section 5).

1.4. Literature search methodology

As mentioned in the introduction, the applicability of deep learning techniques in cybersecurity is not straightforward. In order to prepare and compile this survey, we adopted the following methodological approach. Initially, we meticulously investigated research contributions that addressed various aspects of the most prominent threats to security and privacy in recent times. The aim was to extract relevant, common, and impactful cyberthreats. We then confirmed their consistency with the latest threat landscape report compiled by ENISA [3]. This report lists 15 types of current threats, with malware, web-based attacks, phishing, spam, denial-of-service, and botnets among the most prominent. Next, we searched academic publication repositories (e.g., Google Scholar, ACM Digital Library, and IEEE Xplore) to check whether there is research work that applied artificial intelligence-based methods to detect such threats. During this process, we identified several cybersecurity-related applications with an extensive body of work based on deep learning. These applications included Malware Detection and Analysis, Fraud Detection, Spam Filtering, Cyber–Physical System (CPS) Attack Detection, Botnet Detection & DGAs, Encrypted Traffic Analysis, Network Intrusion Detection, and Authentication Solutions (especially biometric-based ones such as face, iris, and fingerprint recognition). Subsequently, we short-listed the aforementioned cybersecurity topics based on their maturity in the field of deep learning and their significance in the current cybersecurity threat landscape. Finally, we organized the review of state-of-the-art DL-based studies around this taxonomy.

More precisely, even though authentication solutions are an essential element in cybersecurity, we noticed that authentication schemes based on face, iris, or fingerprints had been significantly investigated and surveyed from the perspective of deep learning. For instance, Guo et al. [26] and Wang et al. [27] extensively surveyed face recognition techniques that utilize different deep learning techniques. Similarly, other image-based biometric modalities such as iris recognition and fingerprint recognition have been extensively studied [28,29]. Thus, we excluded such topics from our work. Moreover, we observed that Intrusion Detection in traditional networks and PC-based Malware Detection and Analysis are also well-surveyed areas [10,24,25]. Given the above, we have come up with the following comprehensive list of topics that form the structure of the main body of this survey: Malware Detection (in particular mobile device malware analysis and IoT malware analysis), Botnet Detection and DGAs, Spam Filtering, Fraud Detection, Network Intrusion Detection (specifically in IoT networks, SDNs, and WLANs), CPS Attack Detection, and Encrypted Traffic Analysis. For each of these topics/applications, the selected publications are presented in chronological order from 2016 up to 2021, inclusively. We did not include papers published prior to 2016, except for seminal and highly relevant works. Citation counts reported by Google Scholar and Scopus were used to filter the most influential research papers. Overall, we reviewed 75 research works, whose distribution per year is illustrated in Fig. 1.

Fig. 1. Distribution of analyzed research works by year.

Within the scope of this survey, we examined the following types of attacks:
• Malware that describes any software (e.g., worm [30], trojan [31], ransomware [32], spyware [33], and advanced persistent threats, APTs [34]) designed to cause harm or obtain unauthorized access to computer systems;
• Spam [35] which is unsolicited bulk messaging, usually for the purpose of advertising;
• Telecommunication fraud [36] that regards the abuse of telecommunications products and services to illegally gain money from the telecom operator or its subscribers;
• Robocalls [37] that are calls delivering pre-recorded messages realized by auto-dialing software;
• Online credit card fraud [38] which refers to the unauthorized use of funds in online transactions realized through credit or debit cards;
• Denial of Service (DoS) [39] which is based on temporarily blocking the normal use of network utilities by flooding the network with traffic;
• Distributed DoS (DDoS) [40] in which the flooding is performed by a great number of sources;
• Phishing [41] in which the attackers pretend to be a reputable entity to trick victims into revealing personal information or to obtain private assets;
• Click Fraud [42,43] that refers to a bot pretending to be a legitimate user/visitor on a website and clicking on ads, buttons, or other types of hyperlinks;
• Cyber, Physical, and Cyber–Physical attacks [20] in CPSs that impact the computer and/or physical components of the targeted systems, respectively;
• User-to-Root (U2R) [39] that involves behaving as a regular user to detect system vulnerabilities and gain root access;
• Remote-to-Local (R2L) [39] that attempts to use a remote system to gain unauthorized access and cause damage;
• Port scanning (i.e., vulnerability probe) [39] which involves searching for vulnerabilities throughout the entire network by sending scan packets and gaining information about the system;
• Brute force (i.e., password) [44] which is an attempt to gain unauthorized access to the system by using guessing techniques to steal passwords;
• SQL Injection [5] that uses scripts injecting commands/queries to gain unauthorized access and steal information.

For every cybersecurity application we analyzed, there are one or more relevant attacks as outlined in Table 1. It should be noted that although encrypted traffic analysis does not directly correspond to a particular attack, it plays an increasingly important role in network management and cybersecurity since network traffic encryption has become ubiquitous. Over time, malicious usage of encrypted traffic classification has evolved significantly and includes website fingerprinting (i.e., identifying the websites that users access by discovering patterns in the monitored network packets) and protocol/application identification.
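To make the idea of website fingerprinting on encrypted traffic concrete, the following is a minimal sketch, not a method from the surveyed works. It assumes that an observer sees only the sizes and directions of packets (positive for outgoing, negative for incoming), and it swaps the deep models discussed in this survey for a plain nearest-centroid classifier over three hand-picked features. All names (`features`, `fit_centroids`, `predict`, `site_a`, `site_b`) and all traces are hypothetical and synthetic.

```python
from math import dist
from statistics import mean


def features(trace):
    """Summarize a trace of signed packet sizes (+ outgoing, - incoming)."""
    out_bytes = sum(size for size in trace if size > 0)
    in_bytes = -sum(size for size in trace if size < 0)
    return (len(trace), out_bytes, in_bytes)


def fit_centroids(labelled_traces):
    """Average the feature vectors of each site's training traces."""
    by_site = {}
    for site, trace in labelled_traces:
        by_site.setdefault(site, []).append(features(trace))
    return {
        site: tuple(mean(column) for column in zip(*rows))
        for site, rows in by_site.items()
    }


def predict(centroids, trace):
    """Return the site whose centroid is closest (Euclidean) to the trace."""
    observed = features(trace)
    return min(centroids, key=lambda site: dist(centroids[site], observed))


# Synthetic, hypothetical traces: payloads are never inspected, only sizes.
TRAINING = [
    ("site_a", [+300, -1500, -1500, -1500, +80]),
    ("site_a", [+280, -1500, -1500, -1400, +80]),
    ("site_b", [+600, -400, +600, -400, +600, -400]),
    ("site_b", [+620, -380, +600, -420, +580, -400]),
]
CENTROIDS = fit_centroids(TRAINING)
GUESS = predict(CENTROIDS, [+310, -1500, -1450, -1500, +80])
print(GUESS)
```

The point of the sketch is only that traffic shape can identify a destination even when content is encrypted; a realistic classifier would need far richer features (timing, ordering, bursts) and much more data than these toy traces.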

Table 1
Examined cybersecurity applications and related attacks.

Type of attack                    NID  M-DA  BD-DGA  CPS-AD  SP-F  FR-D
Worm [30]                         ◦    ∙     ◦       ◦       ◦     ◦
Trojan [31]                       ◦    ∙     ◦       ◦       ◦     ◦
Spyware [33]                      ◦    ∙     ◦       ◦       ◦     ◦
Adware [45]                       ◦    ∙     ◦       ◦       ◦     ◦
Ransomware [32]                   ◦    ∙     ◦       ◦       ◦     ◦
Rootkit [46]                      ◦    ∙     ◦       ◦       ◦     ◦
APTs [34]                         ◦    ∙     ◦       ◦       ◦     ◦
Spam [35]                         ◦    ◦     ∙       ◦       ∙     ◦
Telecommunication fraud [36]      ◦    ◦     ◦       ◦       ◦     ∙
Robocalls [37]                    ◦    ◦     ◦       ◦       ◦     ∙
Online credit card fraud [38]     ◦    ◦     ◦       ◦       ◦     ∙
DoS [39]                          ∙    ◦     ◦       ◦       ◦     ◦
DDoS [40]                         ∙    ◦     ∙       ◦       ◦     ◦
Phishing [41]                     ◦    ◦     ∙       ◦       ◦     ◦
Click Fraud [42,43]               ◦    ◦     ∙       ◦       ◦     ◦
Cyber attacks [20]                ◦    ◦     ◦       ∙       ◦     ◦
Physical attacks [20]             ◦    ◦     ◦       ∙       ◦     ◦
Cyber–physical attacks [20]       ◦    ◦     ◦       ∙       ◦     ◦
U2R [39]                          ∙    ◦     ◦       ◦       ◦     ◦
R2L [39]                          ∙    ◦     ◦       ◦       ◦     ◦
Probe [39]                        ∙    ◦     ◦       ∙       ◦     ◦
Brute force (i.e., password) [44] ∙    ◦     ◦       ◦       ◦     ◦
SQL Injection attack [5]          ∙    ◦     ◦       ∙       ◦     ◦

Legend: ∙ Yes, ◦ No. NID: Network Intrusion Detection; M-DA: Malware Detection and Analysis; BD-DGA: Botnet Detection and DGA; CPS-AD: Cyber–Physical System Attack Detection; SP-F: Spam Filtering; FR-D: Fraud Detection.

1.5. Paper organization

The remainder of this article is structured as follows. Section 2 introduces and compares the most relevant deep learning models, also providing guidelines for model selection towards solving cybersecurity problems. Furthermore, it analyzes the most recent advances of deep learning (e.g., attention and transformers) and their influence in the cybersecurity domain. The proposed framework and the datasets used for training, validation, and testing of DL-based defense systems are overviewed in Section 3. In Section 4, state-of-the-art studies applying deep learning techniques to specific cybersecurity applications are reviewed, and the related advantages and shortcomings are discussed. Lessons learned, future research directions and open challenges are presented in Section 5. Finally, the paper is concluded in Section 6 with a summary of its main take-away messages.

2. Deep learning background

AI is an approach striving to build intelligent machines that mimic or even surpass human intelligence. Many techniques fall under this broad umbrella, such as expert systems, evolutionary algorithms, and machine learning. Machine learning enables an artificial system to absorb knowledge from data and make decisions without being explicitly programmed. Generally, machine learning algorithms are categorized into supervised, unsupervised, and reinforcement learning. Deep learning (DL) is a subfield of machine learning that carries out representation learning through multilayer transformation, thereby generating more accurate results for detection and prediction tasks. Particularly in cybersecurity, DL-based defense systems are being used to automate the detection of cyber-attacks while evolving and improving their capabilities over time. With the goal of answering Q1 (What are the cutting-edge deep learning techniques significant to the cybersecurity domain?), this section summarizes the most representative models and emerging trends in deep learning for cybersecurity applications.

2.1. Deep learning models

DL is a set of prediction models that are based on Artificial Neural Networks (ANNs) [47,48]. ANN is a generic term that encompasses any structure of interconnected neurons sending information to each other. The key difference between Deep Neural Networks (DNNs) and the simpler single-hidden-layer neural networks is the large depth of the former; that is, the high number of hidden layers participating in the multistep process of pattern recognition. Specifically, a DNN consists of an input layer, a suitable number (more than one) of hidden layers, and an output layer. Each layer of a DNN is composed of neurons that are capable of producing nonlinear outputs from their inputs. Data is propagated to the hidden layers by the input layer neurons that initially receive it. Then, the neurons in the hidden layers generate weighted sums of the input data on which they apply specific activation functions (e.g., ReLU or tanh). Subsequently, these outputs are propagated to the output layer, where the final results are presented. Fig. 2(a) shows an example of a DNN.

2.1.1. Convolutional Neural Networks (CNNs)
CNNs, also called ConvNets, are a specialized kind of neural network designed to process data that come in the form of multiple arrays. Many data structures are organized in multiple arrays, such as 1D arrays for signals and sequences, 2D arrays for digital images or spectrograms of audio, and 3D arrays for volumetric images and videos. To fully use the 2D structure of input data, local connections and shared weights in the network are employed in place of the traditional fully connected networks. This process results in significantly fewer parameters, making the network faster to train. In a typical CNN, a series of convolutional layers are followed by pooling (subsampling) layers, while in the final stage fully connected layers (similar to a multilayer perceptron, MLP) are generally employed. An example of image classification using CNN is illustrated in Fig. 2(c). The CNN and its variants (e.g., ResNet [49], DenseNet [50], SqueezeNet [51], MobileNets [52], and YOLO [53]) have been investigated for a variety of cybersecurity applications such as user authentication, fraud, and malware detection.

2.1.2. Recurrent Neural Network (RNN) and Long Short-Term Memory (LSTM)
RNNs [54] are robust sequence learners, arranged to capture the temporal dependencies in data by including memory. Fig. 2(b) illustrates a conventional RNN unfolded in time that can be trained across many time steps using backpropagation through time (BPTT) [55]. Despite their efficiency in modeling sequential data, RNNs suffer from the so-called vanishing gradient problem that occurs when the output at any given time step depends on inputs much earlier in time. The LSTM [56,57] architecture has been developed in order to address this issue. As shown in Fig. 2(e), each unit in the LSTM model has a cell memory with a state that stores information about the input sequences across time steps. The reading and modifying access to the memory units are controlled through sigmoid gates. LSTM models perform better than RNN models when data is characterized by a long dependency on time [58]. Such long dependency can be observed in data generated by IoT networks or complex systems (e.g., Cyber–Physical Systems, CPSs). Overall, the use of LSTM and its variants (e.g., Convolutional Long Short-Term Memory, ConvLSTM [59]) shows promise in improving the attack detection and prediction accuracy in settings where data is time-dependent.

2.1.3. Autoencoder (AE)
In AEs, the desired output is set to be equal to the input [47]. As shown in Fig. 2(d), AEs generally include two parts, namely the encoder and the decoder, which are non-linear mapping functions implemented through NNs. The encoder maps the input data into the low-dimensional latent space, whereas the decoder maps the latent representation into the output layer to reconstruct the input. The encoder and decoder can be implemented by different types of NNs, including recurrent neural networks or feedforward non-recurrent neural networks. This type of model has been mainly employed to solve unsupervised learning problems and transfer learning [47]. Depending
on the size of the hidden layer, an autoencoder is classified as undercomplete or overcomplete. Moreover, based on the constraints imposed on the loss function, there exist various types of AEs, such as sparse AEs [47], denoising AEs [60], Stacked Contractive AEs [61,62], Adversarial AEs [63], and Variational Autoencoders (VAEs) [64–66]. Because they reconstruct the input at the output layer, AEs are typically employed for network intrusion and spam detection tasks. This architecture has received much attention for enabling applications of Industrial IoT (IIoT), such as fault diagnosis in machines and hardware devices, and physical-based anomaly detection mechanisms.

Fig. 2. DNN, RNN, CNN, and LSTM architectures.

2.1.4. Deep Belief Network (DBN)
As can be seen in Fig. 3(a), DBNs [67] are a type of generative ANN that resembles a composition of several stacked Restricted Boltzmann Machines (RBMs) [68]. The RBM is an energy-based model that has a single layer of hidden units without connection to each other and an undirected connection to a layer of visible units. Multiple hidden layers can be trained using the hidden layer output of one RBM as the training data for the next higher-level RBM [69–71]. This method of training with multiple layers in a greedy layer-by-layer manner makes it possible to build a hybrid probabilistic generative model, termed DBN, that involves both undirected connections between its top two layers and downward directed connections between the remaining layers [70,71]. The lower visible layer represents the states of the input layer as a data vector. A DBN learns to probabilistically reconstruct its inputs in an unsupervised manner, while the layers act as feature detectors on the inputs. Moreover, an additional supervised training process provides the ability to perform classification tasks. Many applications can benefit from the use of DBNs, such as false data injection attack detection in industrial environments and the detection of anomalies in IoT networks.

2.1.5. Generative Adversarial Network (GAN)
GAN was developed by Goodfellow et al. [72] and consists of a generative network and a discriminative network. The generator captures the distribution of the real data and tries to produce samples with the same characteristics in order to fool and confuse the discriminator, which in turn attempts to distinguish the real data from
5
M. Macas et al. Computer Networks 212 (2022) 109032

Fig. 3. DBN, GAN, DRL, TL architectures.

the fake/generated. Typically, the training process of conventional GANs is highly sensitive to model structures, learning rates, and other hyper-parameters. Thus, numerous ad hoc “tricks” are usually needed for achieving convergence and improving the fidelity of generated data. In order to mitigate this problem, several variants of GANs have been introduced, such as Wasserstein Generative Adversarial Network (WGAN) [73], BigGAN [74], and Loss-Sensitive Generative Adversarial Network (LS-GAN) [75]. In the cybersecurity domain, GANs are typically applied to overcome the data imbalance problem by using artificial variations to minimize any bias in data collection [76,77]. An example of GAN is depicted in Fig. 3(b).

2.1.6. Deep Reinforcement Learning (DRL)
DRL is composed of Reinforcement Learning (RL) and DNNs. It aims to create an intelligent agent that can carry out efficient policies for maximizing the rewards of long-term tasks with controllable actions (see Fig. 3(c)). Typically, RL searches for the optimal policy of actions over states from the environment, and the DNN represents a large number of states and approximates the action values to estimate the quality of the action for any given state. Representative DRL methods comprise Deep Q-Networks (DQNs) [78], Deep Deterministic Policy Gradient (DDPG) algorithm [79], Asynchronous Advantage Actor–Critic (A3C) [80], deep policy gradient methods [81], Rainbow [82], Distributed Proximal Policy Optimization (DPPO) [83], adaptive deep Q-learning (ADQL) algorithm [84], and content-based deep reinforcement learning (C-DRL) [85]. By incorporating DL into traditional RL, DRL is highly efficient in solving dynamic, complex, and especially high-dimensional security problems [86], including DRL-based security methods for CPSs, multiagent DRL-based game theory simulations for cyber-defense strategies against cyber-attacks, and autonomous intrusion detection approaches.

Lessons learned. More recent architectures, including VAE and GAN, are expected to have a significant impact on cybersecurity applications since they cover semi-supervised/unsupervised learning. Those are more favorable for cybersecurity applications, particularly in IIoT-based security, where only a small fraction of the vast amount of generated data can be annotated for supervised ML. Emerging machine learning architectures such as DRL can support autonomous intrusion detection approaches and DRL-based security methods used for CPSs.

2.2. Emerging trends in deep learning
Herein, we briefly review the recent advances and emerging trends in deep learning.

2.2.1. Transfer learning (TL)
Ideally, in machine learning, there is a considerable volume of labeled training data that follows the same distribution as the test data [87,88]. Nevertheless, collecting relevant and sufficient training data is often time-consuming, expensive, or even unrealistic in some scenarios, particularly in the cybersecurity domain, where new types of attacks (e.g., zero-day attacks) appear daily [87,89]. Semi-supervised learning can partly alleviate this problem by relaxing the requirement of a large volume of labeled data. However, in many cases, unlabeled instances are also complicated to collect. To solve the above problem, a promising machine learning methodology that focuses on transferring knowledge across domains is TL [47]. TL aims at improving the performance of target learners on target domains by transferring the knowledge contained in different but related source domains (see Fig. 3(d)) [88]. In this way, target learners can be constructed without depending on the existence of a large amount of target domain data. TL is an attractive potential solution for many cybersecurity applications where gathering training data is not an easy task (e.g., data from IoT devices [87]). Deep learning models are a good match for TL due to their capability of learning both low-level and abstract representations from input data [90]. In particular, stacked denoising autoencoders [87,91] and other variants of autoencoders [92] have been shown to perform very well in this area. More information about TL can be found in the survey [88].

2.2.2. One-shot/few-shot learning
Two extreme TL paradigms are one-shot learning and zero-shot learning. The former involves a pre-trained model and only one or a handful of samples per category, whereas the latter does not require any sample [93]. Instead, it leverages the meta description of the category and the correlations with existing training data. Even though research regarding deep one-shot learning and deep zero-shot learning [94–96] is in its infancy, both paradigms are very promising in detecting new threats or intrusions. Some initial works in cybersecurity leveraged ANNs such as Triplet Networks [97] and Siamese Networks [96] for one-shot/few-shot learning, alleviating in this way the need to gather and train with a large dataset. A Siamese neural network (SNN) (a.k.a. twin NN) [98] is composed of two “twin” networks that are trained simultaneously to learn the similarity of two instances called a pair. Triplet networks [99] are composed of parallel and identical sub-networks that share the same weights and hyperparameters. The networks are trained using three different inputs called triplets. During training, each input is individually fed to its corresponding sub-network.

2.2.3. DL with attention
Broadly speaking, attention is used to focus on the important parts of the input data, while ignoring other irrelevant information. To that end, an attention mechanism examines the input sequence and decides


Fig. 4. Deep learning framework for cybersecurity applications.

at each step which other parts of the sequence are important. In the cybersecurity domain, attention mechanisms (e.g., temporal and spatial) have been demonstrated to achieve outstanding accuracy in predicting intrusion/attacks. Usually, such attention mechanisms are used in conjunction with a recurrent network (e.g., GRU, LSTM, Bi-LSTM, and so on) [100–105]. However, recurrent architectures are computationally inefficient because they rely on the sequential processing of input at the encoding step, prohibiting parallelization. A novel architecture called Transformer (Tr) [106–108] addresses this issue by relying only on a self-attention mechanism to capture global dependencies between input and output. For further details about attention models, see Ref. [109].

2.2.4. DL with non-Euclidean data
Deep learning techniques have been very successful in processing grid-like data such as image, sound, video or text. However, it is important to explore DL in non-Euclidean domains (e.g., graphs and manifolds), which are becoming increasingly present in real life. Within the context of cybersecurity, recent works [36,38,110,111] have started employing Graph Neural Networks (GNNs) due to their ability to capture complex relationships between objects and make inferences based on data described by graphs [112].

3. Deep learning framework and datasets for cybersecurity

In order to answer Q2 (How can cyber analysts, researchers, or engineers apply deep learning to specific cybersecurity problems?) and Q3 (What are the available datasets for training, validating and testing deep learning-based cyber-defense systems?), this section introduces a deep learning framework for cybersecurity applications and briefly overviews the datasets and testbeds that can be used for training, validating, and testing deep learning-based cyber-defense solutions.

3.1. Deep learning framework
Herein, we propose a generic deep learning framework for cybersecurity applications (DLF-CA) that is built upon the knowledge distilled from the papers examined in this survey. The conceptual model of our DLF-CA is illustrated in Fig. 4. As can be seen, the first step in building a cyber defense system (i.e., classifier/predictor) is the problem formulation. In this step, we should clearly define the goal of the classifier/predictor. Typically, goals include malware detection/classification, botnet detection, cyber–physical system security, network intrusion detection, spam filtering, fraud detection, and encrypted traffic analysis. Then, it is necessary to collect a vast amount of data. Sufficient data quality and quantity are crucial for solving complex and challenging security problems. In practice, researchers often collect a dataset specific to their classifier’s/predictor’s goal. To do this, the first step is to determine the data collection location(s), which can drastically affect the capability to learn general trends, as well as the available features, the granularity, and the reliability of the data. In particular, data collection can happen in various locations, namely the client or server-side of the communication channel, the edge of the network, or any place in between. After collecting an appropriate and sufficient amount of raw data, it is necessary to contemplate how to store it for carrying out further processing. Typically, physical devices and cloud storage services are used to store the data in databases or files [113].

Subsequently, in the preprocessing step, the previously collected data should be cleaned, mapped into a common schema, merged, and converted into suitable formats and types. Feature engineering refers to extracting and selecting suitable features, which are of paramount importance in machine learning since they are pivotal in defining and enriching the predictors. Nevertheless, in practice, coming up with appropriate features is usually challenging and requires a lot of labor, time, and technical expertise. An alternative approach is representation learning, whose fundamental concept is recognizing and disentangling the latent explanatory factors present in the data [114]. In that way, the extraction of valuable information in the form of appropriate features is facilitated via learning representations of the data.

Selecting the “right” deep learning model depends greatly on the input features, which directly determine the model’s accuracy. The modeling process is iterative, providing critical insights regarding the refinement of data preparation and model specification at each repetition. In order to find the optimal model, it is necessary to try several algorithms with specific parameters (and hyper-parameters) in a trial-and-error fashion. Typically, a dataset comprises three parts: training, validation, and test set. The first subset is employed during training, while the second is used to measure the prediction accuracy. This validation accuracy is one of the principal criteria for deciding whether to accept or reject the trained model. When we settle on the chosen model type and hyper-parameters, we proceed to train a new model with the entire set of available data using the best hyper-parameters found. This should include any data that was previously held aside for validation. The last step is periodic evaluation over updated test sets, which is essential for verifying that the model can recognize and predict zero-day attacks. Table 2 describes the most common evaluation metrics for deep learning models.

Fig. 5. Cybersecurity datasets per application/category.

3.2. Cybersecurity datasets and testbeds
Benchmark datasets allow researchers to train, validate, and test the proposed DL-based security solutions. Furthermore, they are essential for reproducing experiments and comparatively evaluating the achieved performance using the same data. In this subsection, we provide an overview of the most prominent datasets employed in cybersecurity, organized per specific application/category (Fig. 5), and


Table 2
Common evaluation metrics for deep learning models.
Metric | Description | Equation
False Positive Rate (FPR) or False Acceptance Rate (FAR) | The proportion of the elements that were wrongly determined as positive among the actual negatives. | $\frac{FP}{FP + TN}$
Recall, Sensitivity or True Positive Rate (TPR) | The proportion of actual positives that were correctly identified. | $\frac{TP}{FN + TP}$
False Negative Rate (FNR) or False Rejection Rate (FRR) | The proportion of the elements that were wrongly determined as negatives among the actual positives. | $\frac{FN}{FN + TP}$
Specificity or True Negative Rate (TNR) | The proportion of actual negatives that were correctly identified. | $\frac{TN}{FP + TN}$
Precision | The ratio of actual positives over all the elements predicted as positives. | $\frac{TP}{TP + FP}$
Accuracy | The ratio of correctly predicted items over the total number of items. | $\frac{TP + TN}{TP + TN + FP + FN}$
F1-Score | The harmonic mean of precision and recall. Also known as F-Score or F-measure. | $\frac{2 \times Pr \times Rec}{Pr + Rec}$
Area Under Curve (AUC) | The area covered by the plot of TPR and FPR (ROC Curve) at different threshold values between 0 and 1. | $\int_{0}^{1} TPR \, d(FPR)$
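
To make the definitions in Table 2 concrete, the following minimal Python sketch computes them from the four confusion-matrix counts. The names `ConfusionCounts`, `evaluation_metrics`, and `roc_auc` are illustrative and do not come from the surveyed works. As simplifying assumptions, empty denominators are mapped to 0.0, and the AUC is computed through its equivalent rank formulation: the probability that a randomly chosen positive sample receives a higher score than a randomly chosen negative one, with ties counted as one half.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ConfusionCounts:
    """Binary confusion-matrix counts, with attacks as the positive class."""
    tp: int
    fp: int
    tn: int
    fn: int


def _ratio(numerator: float, denominator: float) -> float:
    # Assumption: report 0.0 instead of raising when a denominator is empty.
    return numerator / denominator if denominator else 0.0


def evaluation_metrics(c: ConfusionCounts) -> dict:
    """Compute the count-based metrics of Table 2."""
    precision = _ratio(c.tp, c.tp + c.fp)
    recall = _ratio(c.tp, c.fn + c.tp)
    return {
        "fpr": _ratio(c.fp, c.fp + c.tn),
        "recall": recall,
        "fnr": _ratio(c.fn, c.fn + c.tp),
        "specificity": _ratio(c.tn, c.fp + c.tn),
        "precision": precision,
        "accuracy": _ratio(c.tp + c.tn, c.tp + c.tn + c.fp + c.fn),
        "f1": _ratio(2 * precision * recall, precision + recall),
    }


def roc_auc(scores, labels) -> float:
    """Area under the ROC curve via pairwise comparison of scores."""
    positives = [s for s, y in zip(scores, labels) if y == 1]
    negatives = [s for s, y in zip(scores, labels) if y == 0]
    if not positives or not negatives:
        raise ValueError("AUC needs at least one positive and one negative sample")
    # Each (positive, negative) pair counts 1 if ranked correctly, 0.5 on a tie.
    wins = sum((p > n) + 0.5 * (p == n) for p in positives for n in negatives)
    return wins / (len(positives) * len(negatives))


# Hypothetical counts for a detector evaluated on 1000 flows.
metrics = evaluation_metrics(ConfusionCounts(tp=90, fp=10, tn=880, fn=20))
```

Note that FPR and specificity always sum to one, as do FNR and recall, so reporting one of each pair is sufficient. On heavily imbalanced traffic, accuracy alone can look high even when recall is poor, which is why the remaining metrics are usually reported alongside it.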

a list of suitable testbeds. We have also created a GitHub repository² with an extended curated list of available datasets, which we aim to update continuously.

² https://github.com/mmacas11/Cybersecurity-Datasets

The most commonly used datasets for network intrusion detection are KDD Cup 1999 [115] and its evolution NSL-KDD [39]. Both datasets include traffic records that are labeled in five classes: Normal, Probe, Remote to Local (R2L) attacks, User to Root (U2R) attacks, and Denial of Service (DoS) attacks. However, it should be noted that the underlying network traffic dates back to the year 1998. NSL-KDD was created to mitigate the redundancy and bias problems of its predecessor. Another popular dataset is CTU-13 [42], which contains the raw packet data (PCAP files) and considers 13 scenarios with different botnet attacks. UNB ISCX 2012 [116] was created by the Canadian Institute for Cybersecurity of the University of New Brunswick (UNB), in 2012. Traffic was captured in an emulated network environment over a period of 7 days. AWID [117] is a labeled dataset that focuses on 802.11 networks. A small network environment with 10 clients was designed to capture the WLAN traffic in a packet-based format, while 15 specific attacks (e.g., Probe Request, and CTS Flooding) were executed. The CIC-IDS2017 [118] dataset was also created by UNB and contains different types of user profiles (generating background traffic) and multistage attacks (e.g., Heartbleed and DDoS). The CICFlowMeter tool was used to extract 80 traffic features.

CSE-CIC-IDS2018 [119] contains 6 different types of network attacks (i.e., DoS, DDoS, Botnet, brute-force, infiltration, and web attacks) and was generated by using synthetic user profiles to capture abstract representations of network events and behaviors. A victim infrastructure composed of 420 computers and 30 servers was attacked by fifty network nodes. The CICFlowMeter-V3 tool was employed to extract the 84 network traffic features of the dataset. The CIC-DDoS2019 dataset [120] includes many different DDoS attacks carried out through application layer protocols using TCP/UDP. IoT-23 [121] was created in 2019 and consists of 20 captures with malware activity and three captures of benign IoT traffic. TON_IoT was generated in 2019 by Abdullah et al. [122] and includes various types of IoT data (e.g., operating system logs, telemetry data), as well as IoT traffic gathered from a medium-scale network at the Cyber Range and IoT Labs of the UNSW Canberra, Australia. Finally, the LITNET-2020 dataset [123] contains feature vectors generated during 12 attacks on general computers deployed on an academic network. It was collected for a period of 10 months.

Regarding malware detection and analysis in IoT, N-BaIoT [124] contains real traffic (115 numerical features) obtained from 9 commercial IoT devices. For each device, the data was collected under normal operating conditions and when several different attacks were launched by BASHLITE. Furthermore, IoTPOT [125] contains 500 IoT malware samples (belonging to four major families) collected by an IoT honeypot. Finally, VirusShare [126] is a repository of malware samples that is continuously updated, providing access to the latest threats discovered.

To detect spam in mobile devices, Genome [127] has 1242 malicious applications gathered from unofficial Chinese marketplaces in 2010 and 2011. The samples are divided into 49 malware families comprising almost all malware categories: Rootkit, Botnet, SMS Trojans, Trojan, Installer, and Spyware. Contagio-Mobile [128] is a blog-like website operating since 2012 that aims to collect malware for different mobile operating systems. Contrary to Genome, which contains several samples of each malware family, Contagio provides only a few samples (most commonly one). Finally, AndroZoo [129] currently contains more than 16 million apps collected from several sources. Each app has been analyzed by tens of different AntiVirus products in order to determine if it is malware or not.

The most significant data sources for domain generation algorithm (DGA) detection are several dynamic lists such as the global “Top Sites” published by Alexa Internet [130] for generating benign domain names and OSINT [131], DGArchive [132], 360netlab [133], AmritaDGA [134], and UMUDGA [135] for producing malicious domain names. In particular, Bambenek Consulting provides more than 800 thousand malicious domain names from 50 different families via the OSINT DGA feed. The DGArchive contains more than 30 reverse-engineered DGAs that can be leveraged to generate malicious domain names on an internal network. Similarly, 360netlab and AmritaDGA include 52 and 20 DGA families, respectively. Finally, UMUDGA offers a collection of over 30 million manually-labeled algorithmically generated domains, sorted in 50 malware classes. Apart from the above, the list of the most queried domains based on passive DNS usage provided by Cisco Umbrella [136] is also used for DNS-based detection.

The following four datasets are widely utilized in industrial control systems. BATADAL [137] regards a water distribution network composed of seven storage tanks with eleven pumps and five valves, controlled via nine Programmable Logic Controllers (PLCs). The network was created using epanetCPA [138] and the dataset contains 8761 records of 43 variables. SWaT [139] concerns a scaled-down, fully operational water treatment plant and comprises a six-stage process. The entire dataset has 946,722 labeled (attack or normal) records,


containing 51 attributes corresponding to the sensor and actuator data. WADI [140] is also available upon request and regards several large water tanks supplying water to consumer tanks. It contains 15 attacks, aiming to stop the water supply to the consumer tanks. It is significantly larger than the SWaT and BATADAL datasets, with 1,221,372 records and 126 features. Finally, the HAI Dataset (HIL-based Augmented ICS) [141] is a collection of data from three physical control systems (i.e., a GE’s turbine, an Emerson’s boiler, and a FESTO’s water treatment system), combined through the dSPACE HIL simulator. On the other hand, the studies focusing on smart grids most frequently employ simulations. To that end, the IEEE X-bus (e.g., 39-bus, 123-bus) is an effective evaluation platform. Regarding intelligent transportation systems, experimentation is usually performed on real-world datasets that contain CAN bus data collected from actual vehicles.

The most frequent dataset used for web spam detection is WEBSPAM-UK2007 [142]. It is based on crawls of the .uk Web domain performed in May 2007 and includes 105.9 million pages, over 3.7 billion links, and about 114,529 hosts. In order to detect spam in the Twitter social network, the following datasets can be employed: Social honeypot [143] and UtkMl [144]. Social honeypot has 22,223 content polluters together with their number of followings and 2,353,473 tweets, as well as 19,276 legitimate users with the corresponding number of followings and 3,259,693 tweets, collected over 7 months. The UtkMl’s Twitter dataset contains 11968 tweets and 8 features (i.e., id, tweet, following, followers, action, is_tweet, location and type), with 49% of the tweets being spam. Regarding spam detection in short message service (SMS), the SMS Spam Collection v.1 [145] dataset is employed. It is a collection of 5574 spam and legitimate English text messages distributed in 4827 legitimate messages and 747 spam messages.

The available datasets for encrypted internet traffic classification include ISCX VPN-nonVPN [146] and ISCX Tor-nonTor [147]. The former covers 15 popular applications such as Facebook, YouTube, Netflix, etc., which are encrypted using various protocols. The latter contains eight types of traffic (namely, VOIP, chat, audio-streaming, video-streaming, mail, P2P, browsing, and File Transfer) from more than 18 representative applications. It uses benign traffic from a VPN project created by Draper-Gil [146]. The Open HTTPS [148] and QUIC [149] datasets can be employed for the performance analysis of dedicated encryption protocols. The first contains full HTTPS raw PCAP files from crawling the top 779 accessed HTTPS websites. The second is a self-collected encrypted dataset of the newly established QUIC protocol.

Cybersecurity research can be challenging due to the continuous technological advances and the required interdisciplinary collaboration. Going beyond simulations, which are useful during the development of the initial proofs-of-concept, researchers need to test and evaluate their prototypes in suitable platforms experimentally. To that end, Table 3 provides some indicative testbeds for experimentally evaluating the proposed systems with real pilots and experiments over bare-metal hardware.

Table 3
Some open access testbeds for cybersecurity.
Testbed | Focus area
DETER [150] | Networked or distributed cyber and cyber–physical systems.
FIT Lab [151] | Wireless sensors and IoT
NITOS [152] | The Outdoor Testbed, the Indoor RF Isolated Testbed, and the Office Testbed.
ORBIT [153] | Cognitive radio networks, future Internet architecture, WiFi networks, inter-layer wireless security and cloud computing.
Fed4FIRE+ project [154] | 5G technology, IoT, OpenFlow, Cloud computing, Big Data, wired, and wireless networks.
DRAKVUF [155] | Malware detection
Emulab [156] | Networking and distributed systems
COSMOS [157] | Real-world experimentation on next-generation wireless technologies and applications
EdgeNet [158] | Distributed edge cloud

4. DL applications to cybersecurity

Deep Learning is playing an increasingly important role in the cybersecurity domain, enabling and facilitating a wide range of applications (Fig. 6). In this section, we address Q4 (What are the latest successful deep learning-based systems in cybersecurity?) by critically reviewing state-of-the-art DL-based defense systems for each main application category.

4.1. Network intrusion detection

Intrusion Detection is a way of monitoring the events happening within a network or on a local system to detect any signs of abnormal or malicious behavior breaching the security or standard policies. Intrusion Detection Systems (IDSs) are broadly classified into host-based and network-based. The former monitors an individual computer system (e.g., operating system files and logs) looking for malicious activities, whereas the latter examines network traffic to recognize any malicious and anomalous activities that can be part of an attack.

Generally, network-based IDSs (NIDSs) can be designed following two main methods of identifying and alerting against threats: a signature- or an anomaly-based detection approach. Signature-based detection compares new data with a knowledge base of known intrusions, searching for specific patterns or signatures of attacks. Even though this approach demands regular updates to databases storing rules and signatures and, as a consequence, is not useful against zero-day attacks, it remains the most popular choice in commercial intrusion detection systems. On the other hand, anomaly-based detection focuses on knowing normal behavior to identify abnormalities (i.e., significant deviations from normal traffic) as malicious activity. As a result, this approach can detect zero-day attacks that have never been seen before and is difficult for attackers to evade, since the normal activity is customized for the particular user, applications, and network being monitored. Anomaly-based detection, however, may produce high FPRs because it considers any previously unseen traffic as a potential malicious attack, even though it might actually be benign.

In this section, we focus on anomaly-based NIDSs, for which a large body of DL-based methods has been developed in recent years. More precisely, machine learning (ML) techniques have been adopted in NIDSs to create models of normal behavior and detect deviations that cannot be recognized by the conventional signature-based methods, hence providing better generalization capabilities for detecting advanced attacks [125,159,160]. Moreover, they require a lighter deep packet inspection, which alleviates some privacy concerns. However, although many traditional ML techniques, such as NNs, fuzzy logic, and Hidden Markov Models (HMMs), have been successfully applied for network anomaly detection, they suffer from their shallow architecture [90]. This leads to some limitations in dealing with the emergence of new technologies and the increased Internet traffic that produces large-scale and multidimensional data, making the attack scenarios increasingly more sophisticated. In contrast, DL techniques have demonstrated outstanding performance in heterogeneous and non-linear data analysis. They can extract better representations from the raw data and leverage them to construct better models. Additionally, deep networks can automatically reduce the network traffic complexity to find the correlations among data without human intervention. Finally, unlike shallow ML algorithms, DL approaches can be designed to perform both the feature extraction and classification tasks together. Different NIDS deployment strategies are shown in Fig. 7. Table 4 presents a summary of the DL-based cybersecurity systems for network intrusion detection.
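
The trade-off described above, in which anomaly-based detection catches previously unseen attacks at the cost of false alarms, can be illustrated with a deliberately simple sketch. It is neither a DL model nor a method from the surveyed works: a per-feature mean and standard deviation profile stands in for the learned model of normal behavior, the traffic is synthetic Gaussian data, and the class name `BaselineAnomalyDetector` and the 0.99 quantile are assumptions made purely for illustration.

```python
import math
import random
import statistics


class BaselineAnomalyDetector:
    """Flags samples that deviate strongly from a profile of benign traffic."""

    def __init__(self, quantile=0.99):
        self.quantile = quantile

    def fit(self, benign):
        columns = list(zip(*benign))
        self.means = [statistics.fmean(col) for col in columns]
        # Fall back to 1.0 for constant features to avoid division by zero.
        self.stds = [statistics.pstdev(col) or 1.0 for col in columns]
        # The threshold is a high quantile of the benign training scores, so
        # roughly (1 - quantile) of normal traffic is expected to raise alarms.
        scores = sorted(self.score(row) for row in benign)
        cut = min(len(scores) - 1, int(self.quantile * len(scores)))
        self.threshold = scores[cut]
        return self

    def score(self, row):
        # Root-mean-square z-score: distance from the normal profile.
        z2 = [((v - m) / s) ** 2 for v, m, s in zip(row, self.means, self.stds)]
        return math.sqrt(sum(z2) / len(z2))

    def is_anomalous(self, row):
        return self.score(row) > self.threshold


def sample(n, centre, rng, dims=5):
    """Synthetic feature vectors drawn around a common centre (illustrative)."""
    return [[rng.gauss(centre, 1.0) for _ in range(dims)] for _ in range(n)]


rng = random.Random(0)
benign_train = sample(2000, 0.0, rng)
benign_test = sample(1000, 0.0, rng)
attacks = sample(200, 4.0, rng)  # shifted distribution, never seen in training

detector = BaselineAnomalyDetector(quantile=0.99).fit(benign_train)
false_positive_rate = sum(map(detector.is_anomalous, benign_test)) / len(benign_test)
detection_rate = sum(map(detector.is_anomalous, attacks)) / len(attacks)
```

No attack sample is used during fitting, which mirrors how anomaly-based NIDSs can flag zero-day attacks. Raising the quantile lowers the false positive rate on benign traffic but lets attacks that lie closer to the normal profile pass undetected. In a DL-based detector, the score would be produced by the learned model instead, for example by the reconstruction error of an autoencoder.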


Fig. 6. Overview of the most common cybersecurity applications.

Fig. 7. Various Architectures for NIDS deployment. Centralized deployment is usually used in small networks due to the involved transmissions to the central server, whereas
distributed and decentralized architectures employ inter-agent communication and hierarchical decision-making. Finally, federated learning takes advantage of edge computing.
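
Fig. 7 does not state how a federated deployment combines the models trained at the edge. One widely used aggregation rule is federated averaging (FedAvg), which is assumed here purely for illustration: each edge node trains on its local traffic, and the server averages the resulting parameter vectors weighted by the number of local samples. The function name `federated_average` and the flat-list representation of model parameters are hypothetical.

```python
def federated_average(client_weights, client_sizes):
    """Sample-size-weighted average of per-client parameter vectors."""
    if not client_weights or len(client_weights) != len(client_sizes):
        raise ValueError("need one sample count per client model")
    dim = len(client_weights[0])
    if any(len(w) != dim for w in client_weights):
        raise ValueError("all client models must have the same shape")
    total = sum(client_sizes)
    if total <= 0:
        raise ValueError("clients must hold at least one sample in total")
    # Clients that observed more traffic contribute proportionally more.
    return [
        sum(w[i] * n for w, n in zip(client_weights, client_sizes)) / total
        for i in range(dim)
    ]


# Two hypothetical edge nodes holding 1 and 3 local samples, respectively.
global_model = federated_average([[1.0, 2.0], [3.0, 4.0]], [1, 3])
```

Only model parameters leave the edge nodes in this scheme; the raw traffic records stay local, which is the property that makes the federated architecture attractive for edge deployments.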

4.1.1. Software Defined Networks (SDNs) and Wireless Local Area Networks (WLANs)
In SDN, the brain of the system is decoupled from the nodes comprising the network. It is located in a centralized and well-separated entity called the Controller. This entity has control of the entire network and can act at a higher level coordinating all the network nodes in order to evade potential intrusions. Tang et al. [160] used a DNN model consisting of one input layer, three hidden layers, and a softmax layer as an output layer in an SDN environment to detect intrusion activities by classifying traffic flows into normal and abnormal classes. After being trained on the NSL-KDD dataset [39], the model was loaded in the (logically) centralized SDN controller, which received the statistics of the traffic traversing the network, and applied the prediction scheme in order to decide whether a particular flow is affected by malicious code or not. The experimental results showed that their model achieved a detection accuracy rate of 75.75%, utilizing only six basic flow features. Nonetheless, there was notable evidence of overfitting to the data, suggesting that regularization techniques could enhance results.

DDoS attacks can benefit from existing vulnerabilities in virtualization technologies (e.g., virtual machines and containers) and IoT devices, which can be harnessed as part of a botnet to launch attacks [125,159]. Elsayed et al. [159] introduced an IDS to combat DDoS attacks in an SDN environment, combining an RNN with AE models. The RNN model was applied to address the loss in data information due to the sequential traffic, and the flow-based features from CICFlowMeter were exploited by employing the CIC-DDoS2019 dataset [120]. In most of the experiments, an AUC of 0.988 was achieved. Due to the limitations of RNNs, the proposed method is unsuitable in scenarios where the network data is characterized by long-term temporal dependencies.

With the extensive popularization of WLAN technology used in hardware devices, the IEEE 802.11 protocol-based short-distance transmission wireless network confronts significant security challenges [161,162]. The application of DL to recognize the attack features and perform wireless network intrusion detection on two imbalanced datasets (i.e., AWID [117] and LITNET [123]) is described by Yang et al. [162]. In order to handle the impact of the imbalanced dataset and the data redundancy on the detection accuracy, a window-based instance selection algorithm “SamSelect” was adopted to undersample the majority class data samples. Then, the stacked contractive AE algorithm [61,62] was used to reduce the dimensionality of the data samples. Finally, Conditional DBN [162] performed the attack detection. The proposed approach achieved an accuracy of 97.40% and recall and F1 score of 97.60% and 97.10%, respectively, on both datasets, outperforming LR and SVM. However, more experimental scenarios and the FPR/FNR should be examined and analyzed.

In [161], a Restricted Boltzmann Machine-based Clustered IDS (RBC-IDS) was employed for monitoring critical infrastructures by wireless sensor networks (WSNs). RBC-IDS slightly outperformed the adaptively supervised and clustered hybrid IDS (ASCH-IDS) [163], achieving 99.12% and 99.91% for detection rate and accuracy, respectively, over the KDD Cup 1999 dataset. The detection time of


RBC-IDS was approximately twice that of ASCH-IDS, which additionally degrades the performance of IDS. However, the KDD Cup 1999 dataset was generated over a decade ago and may not depict the type of network attack traffic that would be expected in today’s WSNs.

4.1.2. Internet of Things (IoT)
The number of client and IoT/mobile devices (i.e., edge devices) is continuously rising, leading to the rapid expansion of the digital substrate serving as a launchpad for automated attacks, which are expected to grow both in scope and in frequency [164]. However, it is necessary to consider that such devices are not able to support the same type of cybersecurity functionality found in enterprise data centers and clouds. The main challenge in the design and implementation of robust attack detection systems for IoT devices regards resource limitations, delay sensitivity, and distribution issues [165]. The enterprise infrastructure and services deployed closer to the end-user devices (i.e., at the perimeter of the enterprise networks) comprise a definite first line of defense for dealing with the scale and distribution of cyberattacks [166,167]. The fog/edge infrastructure tier introduces new opportunities to support cybersecurity strategies, since it resides closer to the devices and therefore can see, detect, and react to events more quickly, thus forming a new security perimeter that can contain attacks faster and more efficiently.

In this regard, Abeshu et al. [165] used the self-learning abilities of DL methods to detect cyberattacks by employing fog nodes as data and control processing centers in IoT with a fog platform. Using the NSL-KDD dataset [39], they built stacked autoencoders (SAEs) with two hidden layers to extract hidden features. Then, the obtained features were applied to the test data to extract end features for softmax classification. The experimental results showed that their model reached an accuracy of 99.2% with a detection rate of 99.27% and a false alarm rate of 0.85% on a four-class problem, outperforming shallow learning models. Although the proposed approach seems to be useful to intrusion

standalone IDS. Nonetheless, the proposed distributed approach might be vulnerable to iterative generated attacks. Once the discriminator is trained, it is possible to find ways to generate instances capable of bypassing the detection system.

Notwithstanding the significant effort made in annotating IoT traffic records, the number of labeled records is still limited, raising the difficulty in identifying attacks and intrusions. To address this issue, Abdel-Basset et al. [172] introduced a semi-supervised DL approach for intrusion detection called SS-Deep-ID, which consists of a multiscale residual temporal convolutional (MS-Res) module that finetunes the network capability in learning spatio-temporal representations. The key in the MS-Res module is the dilated causal convolutions (DC-Conv) [173], and a traffic attention module is incorporated to help the network emphasize the most significant features for detecting intrusions. The CIC-IDS2017 [118] and CSE-CIC-IDS2018 [119] datasets were used in the experiments. Despite its efficiency, the proposed model suffers from the following limitations: (i) It is not clear whether it can preserve its effectiveness in scenarios with a massive amount of IoT traffic data; (ii) Distributed training, which is essential in intelligent IoT applications, is not analyzed.

Given that the fifth generation (5G) mobile communications are still in their infancy, several security gaps that can be exploited in intrusion attempts are expected. Rezvy et al. [174] introduced a DL model for intrusion classification and prediction in 5G and IoT networks. In the proposed model, an AE was used to provide a compressed representation of the input space and a dense NN functioned as the supervised classifier to distinguish intrusive events (i.e., impersonation, flooding, and injection) from the normal ones. The study reported its five-fold cross-validation performance over the imbalanced AWID dataset [117] as overall 99.9% accuracy with unknown variance. The lowest performance of 99.42% corresponded to the flooding category. Although the work achieved promising results, it, unfortunately, did not present how the multi-class imbalanced problem was addressed.
detection in the fog-to-things computing platform, the dataset used in Energy-efficient IoT is known as Green IoT. With the development
the evaluation is out-of-date (i.e., it lacks the more recent types of of 5G mobile communications, Green IoT has attracted considerably
attacks) and so its relevance is questionable. more attention. Nie et al. [175] developed an IDS based on the DDPG
The Industrial IoT (IIoT) regards the use of many interconnected algorithm [79]. Their method first extracts the statistical features of
smart sensors, instruments, and other devices in industrial sectors and prior network traffic to capture the trends of traffic flows and perform
applications, including manufacturing and energy management. Yao traffic prediction. Then, the developed traffic predictors are employed
et al. [168] proposed a hybrid IDS architecture for edge-based IIoT in combination with a suitable threshold to enable intrusion detection.
leveraging ML techniques. They divided the IIoT scenario into a central The CIC-DDoS2019 dataset [120] was used to evaluate the proposed
network component and an edge component. Devices with reliable model. The achieved performance included 99% precision with an FPR
computing power and enough resources such as edge routers were of 1.21%, outperforming PCA and Sparse Regularized Matrix Factoriza-
considered as the master nodes, while the industrial equipment of the tion (SRMF) [176]. With the 5G cellular applications, traffic will show
edge part was regarded as edge nodes. Due to the restricted computing more complex features, raising a more significant challenge to network
power and resources of edge nodes, they applied the LightGBM algo- traffic prediction [177]. How to identify and extract significant features
rithm on them, and performed the first intrusion detection task. On for improving the accuracy must be analyzed.
master nodes, they employed a CNN structure to perform the second Social IoT (SIoT), which combines users’ social behaviors and phys-
intrusion detection task and enhance the detection accuracy of the ical IoT [178], can provide ubiquitous Internet access for users. As a
overall network. The authors claimed that their proposed IDS could strategy to mitigate the rapid increase of resource congestion, collabo-
improve detection accuracy and reduce detection time and network rative edge computing (CEC) has become a paradigm for covering the
resource consumption. Nevertheless, not enough information about the demands of IoT. To ensure the security of CEC, the authors in [178]
employed datasets (e.g., size and date of creation) and the evaluation proposed a GAN-based IDS for extracting low-dimensional features
metrics for the entire system is included. from original network flows. The CIC-DDoS2019 [120] and CSE-CIC-
Ferdowsi and Saad [169] introduced a distributed GAN-based IDS IDS2018 [119] datasets were used to evaluate the proposed methods
that enabled the IIoT devices to monitor their neighbors in order to for binary and multi-class classification, showing a high achieved pre-
detect intrusions with a minimum dependence on a central unit. The cision. However, it is unclear how they handle the spatio-temporal
objective of their distributed GAN [170] was to place a discriminator patterns present in networks data, given that the generator and the
at every IoT device without sharing their local datasets. The principal discriminator network of the employed GAN were constructed using
difference between the distributed IDS and the standalone IDS is that FFNNs.
the latter learns to compare a new data point with its own data There are scenarios in which it is useful or even mandatory to
distribution. Contrary to that, in the proposed distributed IDS, every isolate different subsets of training data from each other [179,180].
IoT device could compare a new data point with the distribution of Federated learning systems fully embrace this principle [179,181,182].
the total data. The SBHAR dataset [171] was used in the experiments. Wang et al. [183] proposed a federated anomaly detection system
The study reported the achieved performance as 20% higher accuracy, employing DRL to enable multiple parties to jointly learn an accurate
25% higher precision, and 60% lower false-positive rate than the deep model, while preserving the data itself local and confidential.
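The idea of jointly learning a model while each party keeps its data local can be illustrated with a minimal federated averaging sketch. This is not the DRL-based system of Wang et al. [183]: it swaps in plain logistic regression and a sample-weighted average of locally trained weights. All names in it (`local_update`, `federated_average`, `run_rounds`, `make_party`) and the synthetic data are hypothetical and chosen only for illustration.

```python
import math
import random


def local_update(weights, data, lr=0.1, epochs=5):
    """A few full-batch gradient steps of logistic regression on one party's private data."""
    w = list(weights)
    for _ in range(epochs):
        grad = [0.0] * len(w)
        for x, y in data:
            z = sum(wi * xi for wi, xi in zip(w, x))
            p = 1.0 / (1.0 + math.exp(-z))
            for j, xj in enumerate(x):
                grad[j] += (p - y) * xj
        w = [wj - lr * gj / len(data) for wj, gj in zip(w, grad)]
    return w


def federated_average(local_weights, sample_counts):
    """Sample-weighted mean of the parties' weight vectors (no raw data is exchanged)."""
    total = sum(sample_counts)
    dim = len(local_weights[0])
    return [
        sum(w[j] * n for w, n in zip(local_weights, sample_counts)) / total
        for j in range(dim)
    ]


def run_rounds(parties, dim, rounds=20):
    """Alternate local training and server-side averaging for a fixed number of rounds."""
    global_w = [0.0] * dim
    for _ in range(rounds):
        updates = [local_update(global_w, data) for data in parties]
        global_w = federated_average(updates, [len(d) for d in parties])
    return global_w


def make_party(rng, n, true_w):
    """Synthetic, linearly separable data standing in for one party's local traffic records."""
    data = []
    for _ in range(n):
        x = [rng.gauss(0.0, 1.0) for _ in true_w]
        y = 1.0 if sum(a * b for a, b in zip(true_w, x)) > 0 else 0.0
        data.append((x, y))
    return data


def accuracy(w, data):
    """Fraction of samples whose predicted side of the decision boundary matches the label."""
    hits = sum((sum(a * b for a, b in zip(w, x)) > 0) == (y == 1.0) for x, y in data)
    return hits / len(data)


rng = random.Random(0)
true_w = [2.0, -1.0, 0.5]  # hypothetical ground-truth separator
parties = [make_party(rng, n, true_w) for n in (200, 300, 500)]
global_w = run_rounds(parties, dim=3)
pooled = [sample for party in parties for sample in party]
print(f"accuracy of the joint model on pooled data: {accuracy(global_w, pooled):.3f}")
```

Only weight vectors cross party boundaries in `run_rounds`. Note that `federated_average` accepts whatever weights a party submits, so it has no means of telling an honest update from a manipulated one.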

Table 4
Selected studies focusing on DL-based security methods for Network Intrusion Detection.

Software Defined Networks and Wireless Networks (Section 4.1.1)
Year | Authors | Deep Model | Dataset | Advantages and Limitations
2016 | Tang et al. [160] | DNN | NSL-KDD [39] | The model is overfitting.
2019 | Rezvy et al. [174] | AE; Dense NN | AWID [117] | No explanation about how the multi-class imbalanced problem was addressed.
2019 | Otoum et al. [161] | RBM | KDD Cup 1999 [115] | Outdated dataset.
2020 | Yang et al. [162] | Contractive AE; Conditional DBN | AWID [117]; LITNET [123] | Needs more experimental scenarios and the FPR/FNR should be analyzed.
2020 | Elsayed et al. [159] | RNN; AE | CIC-DDoS2019 [120] | Not suitable for scenarios when the network data is characterized by long dependency on time.

Internet of Things (Section 4.1.2)
Year | Authors | Deep Model | Dataset | Advantages and Limitations
2018 | Abeshu et al. [165] | SAEs | NSL-KDD [39] | The dataset lacks more recent types of attacks.
2019 | Yao et al. [168] | LightGBM; CNN | Unknown | Not enough information about the datasets used.
2019 | Ferdowsi et al. [169] | Distributed GAN | SBHAR [171] | This approach might be prone to iterative generated attacks.
2021 | Abdel-Basset et al. [172] | DC-Conv; C-Conv; Attention module | CIC-IDS2017 [118]; CSE-CIC-IDS2018 [119] | Experiments with a massive amount of IoT traffic data are needed. Distributed training is not analyzed.
2021 | Nie et al. [175] | DDPG | CIC-DDoS2019 [120] | 99% precision with an FPR of 1.21%.
2021 | Nie et al. [178] | GAN | CIC-DDoS2019 [120]; CSE-CIC-IDS2018 [119] | It is unclear how the spatio-temporal patterns present in network data are handled.
2021 | Wang [183] | DDPG | Simulated IoT environment | In order to prevent privacy leakage, abnormal actions of users were detected on time.

Another significant advantage of using federated distribution instead of a centralized architecture is that unexpected intrusions in one or more client systems do not affect the whole system. However, since federated learning employs secure aggregation to protect the confidentiality of the local models, it cannot detect anomalies in the participants’ contributions to the joint model. For instance, a compromised participant can submit a malicious model that has been tampered with to include backdoor functionality [184].

Findings. We note that CNNs and AEs are the most commonly employed deep neural networks. This can be attributed to the former’s capability of processing data from multiple arrays and the latter’s suitability for semi-supervised/unsupervised learning. Although a large part of the analyzed works focuses on algorithms that improve detection results, most studies have neglected to evaluate the reliability of the benchmark datasets. Furthermore, some features are less effective in the detection of new variants of attacks. Consequently, the reported performance evaluation that is based on detecting older attack patterns in well-known datasets can be misleading. Thus, there is a need for the construction and usage of more up-to-date datasets.

4.2. Malware detection and analysis

Techniques for malware (short for malicious software) analysis can be divided into three groups [185]: (1) Static, (2) Dynamic, and (3) Hybrid. Static malware analysis is conducted by reverse-engineering the malware binary to its assembly code and then examining the included instructions without actually executing it. Nevertheless, such a technique can be effortlessly defeated by evasion techniques like obfuscation and embedding of syntactic code errors. Dynamic malware analysis is carried out by executing the malware in a controlled sandbox environment in order to observe its behavior and effect on the host system. Although resource-intensive, it is effective against malware obfuscation. However, sophisticated malware can avoid dynamic analysis by detecting whether it is being run inside a sandbox or a controlled environment and, based on this, deciding not to exhibit any malicious behavior. The hybrid technique combines the respective advantages of both the static and dynamic analysis methods.

In practice, these techniques are time-consuming and involve a manual component, which makes them hard to scale as the number, sophistication, and complexity of the malware increase. Many ML-based techniques have been proposed that address scalability by automating various steps of the malware detection and categorization processes. Nevertheless, their effectiveness is restricted by high FPRs that render them inaccurate. To address this issue, researchers have recently focused on DL-based systems. DL-based methods have been demonstrated to classify malware much faster than human analysis and with high accuracy rates. These methods can be used in malware analysis to comb out the suspicious binaries that can then be examined and validated by a human expert. Table 5 summarizes the examined deep learning-based research in malware detection.

4.2.1. Mobile malware analysis
Globally, the total number of mobile devices is expected to grow from 8.8 billion in 2018 to 13.1 billion by 2023, 6.7 billion of which are anticipated to be smartphones [136]. Smartphones have become attractive due to the accessibility of office applications, Internet, vehicle guidance employing location-based services, and games along with traditional services such as voice calls, SMS texts, and multimedia services. The Android smartphone OS has captured a significant market share because of its open architecture and the high acceptance of its application programming interfaces (APIs) in the developer community, leaving its major competitor iOS far behind [186]. The increasing popularity of Android devices and associated monetary gains have drawn malware developers’ attention, leading to a notable increase in Android malware apps. Recent studies reveal that a new malicious application for Android is introduced every ten seconds [187]. The common types of Android malware apps include, but are not limited to, trojans, spyware, backdoors, worms, botnets, adware, and ransomware, which use different techniques to infiltrate the users’ systems. Dynamic payload, code obfuscation, drive-by download, and repackaging popular applications are some of the many methods leveraged by malware authors to bypass the existing protection mechanisms.

Consequently, the exclusion of those malicious Android applications is in high demand by app markets. Commonly, the employed methods utilize ML algorithms to detect malware. However, their performance relies heavily on human-engineered features, which limits their generalizability. On the contrary, DL removes the necessity of a domain expert selecting features because it automatically selects features by training. Thus, it is no wonder that DL methods have recently been widely applied in Android malware detection and categorization.

Starting from the raw sequences of the app’s API method calls derived from the DEX (Dalvik Executable) file, Karbab et al. [188] introduced MalDozer, which aimed to detect malware and its associated class using a CNN architecture that received as input a sequence of vectors obtained using the Word2Vec word embedding technique [189]. The experiments employed several different datasets, including MalGenome [190] (1K data points), Drebin [191] (5.5K data points), the MalDozer dataset (20K data points) that contained data collected from the Internet (e.g., VirusShare [126] and Contagio Minidump [128]), and a merged dataset of 33K malware data samples. Apart from that, around 38K benign apps were downloaded from Google Play Store [192] and were used during evaluation. The experimental results showed that the F1 scores achieved for the class attribution and detection tasks were between 96%–98% and 96%–99%, respectively, whereas the false positive rate was in the range of 0.06%–2%. Although MalDozer appears to be useful in malware detection and class attribution, the Drebin [191] and MalGenome [190] datasets are out-of-date and could negatively influence the resilience of the proposed approach when facing new, sophisticated, and obfuscated malware families.

MobiDroid [193] is a lightweight Android malware detection system that can be executed on the users’ mobile devices in real-time and relies on the information from APK files. The first stage of the system is feature preparation, which is based on decoding each APK file into original resources and .smali files. The resulting feature vector is then fed to a CNN classifier. Finally, the Android app prediction is performed with the help of a migrated and quantized detection model. The study used a dataset of 50,000 Android apps, 29,010 of which were malware samples collected from various sources including Drebin [191], Genome [127], Contagio [194], Pwnzen, and VirusShare [126], whereas the rest were benign apps crawled from Google Play Store [192]. The authors reported a performance of 97% accuracy and demonstrated the fast reactive (less than 10 s) detection service provided directly on mobile services. A similar approach was proposed in [195], but instead of decompiling the APK into source code, like smali code, the manifest properties and API calls were extracted and vectorized directly from the binary code.

Multivector malware usually hides under legitimate third-party software and can be easily turned into an executable file extension, making its detection extremely challenging. Haq et al. [196] proposed a hybrid DL system that leverages CNN and Bi-LSTM to identify persistent malware. The data used consists of 30,831 legitimate Android APKs extracted from Androzoo [129] and 8,011 malware APKs obtained from the AMD dataset [197]. Although the constructed dataset was highly imbalanced, the proposed model achieved the best performance with a precision rate of 99.39% and an FPR of 1.9% using tenfold cross-validation and 905 features.

4.2.2. IoT malware analysis
Existing vulnerabilities in IoT devices that could be employed for malware injection are related to application security, authorization, and authentication. Apart from these, physically tampering with the IoT devices for software modification and misconfiguration of security parameters could also enable attackers to inject malicious code. Traditional approaches, such as classical ML-based malware detection mechanisms, have been applied during the last decades. However, it has already been proven that they have low accuracy and limited scalability for malware detection and analysis in IoT devices [87,198,199]. DL is a promising approach for IoT devices due to some of their specific properties. For example, IoT devices produce the vast amount of data required by DL techniques to bring intelligence to the systems. Furthermore, the heterogeneous data generated by IoT devices is better utilized by DL techniques, which enable the IoT systems to make informed, fast, and intelligent decisions.

A DL-based method to detect the Internet of Battlefield Things (IoBT) malware via the device’s OpCode sequence was proposed by Azmoodeh et al. [198]. The Class-Wise Information Gain technique was used to select the top 82 features and, at the same time, overcome the problem of the imbalanced dataset. Each sample’s selected features were converted into a Control Flow Graph, which was used to classify IoT malware and goodware applications applying deep eigenspace learning and CNN techniques. The experimental results showed that the proposed system achieved 99.68% accuracy in detecting malware samples, with precision and recall rates of 98.59% and 98.37%, respectively. Unfortunately, the study does not report any information about the hyper-parameters used during the training phase of the network.

In order to detect IoT malware in embedded Linux-based IoT devices, Jeon et al. [199] introduced DAIMD, which performed dynamic analysis in nested cloud-based VM environments and learned behavior images compressed with behavior data based on CNNs. The study stated a performance of 99.28% with 0.63% FPR. However, the dataset used in the experiments includes only 1,401 samples, which are not cross-validated. Moreover, sending data to the cloud for inference or training may incur additional queuing and propagation delays from the network and cannot satisfy the strict end-to-end low-latency requirements of real-time detection.

To address the “lack of labeled information” for training the detection model in pervasive IoT devices, Vu et al. [87] developed a system based on deep TL named MMD-AE. The labeled and unlabeled data were fitted into two AE models with the same network structure. Moreover, the Maximum Mean Discrepancy (MMD) metric was used to transfer knowledge from the first AE to the second AE. This study used nine IoT attacks from N-BaIoT [124] for the evaluation. Overall, the IoT attack detection task in the target domain achieved an AUC score of 0.937. A limitation of this approach is the excessive time for training the model compared to baseline methods.

In [200], Dib et al. defined a multi-dimensional classification approach employing LSTM and CNN models to first extract and then combine the features of string- and image-based representations of the executable binaries towards accurate IoT malware classification and family attribution. The proposed model was evaluated over 74,429 IoT malware binaries from well-known online malware repositories together with a special-purpose IoT honeypot (IoTPOT [125]). The results were 99.78% accuracy and 99.57% F-score, outperforming classifiers based on a single data modality.

Findings. Since deep learning can automatically find correlations in the data, it is a promising technology for developing novel malware detection methods, particularly for detecting zero-day malware. Note that the final stage in an ML-based malware detection system’s life cycle is evaluating and validating the learned model via large-scale deployments in real-world environments and settings. This is a challenging task that requires manual analysis and human intervention, consuming considerable human effort by malware analysts. Most existing works in the literature do not proceed to this type of experimental validation, relying on performance evaluation using one or more well-known datasets. However, for the proposed systems to find practical commercial applications in the industry, this type of trial deployment must not be overlooked.
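For reference, the MMD metric mentioned in connection with MMD-AE [87] measures the distance between two sample distributions through their kernel mean embeddings. The sketch below computes the standard biased estimate of squared MMD with an RBF kernel. The kernel choice, the bandwidth `gamma`, and the synthetic samples are assumptions made for illustration; the survey does not specify how MMD-AE configures the metric or applies it between the two AEs.

```python
import math
import random


def rbf_kernel(a, b, gamma=0.5):
    """Gaussian (RBF) kernel between two feature vectors."""
    sq_dist = sum((ai - bi) ** 2 for ai, bi in zip(a, b))
    return math.exp(-gamma * sq_dist)


def mean_kernel(xs, ys, gamma):
    """Average kernel value over all pairs drawn from the two samples."""
    total = sum(rbf_kernel(x, y, gamma) for x in xs for y in ys)
    return total / (len(xs) * len(ys))


def mmd_squared(xs, ys, gamma=0.5):
    """Biased estimate: mean k(x, x') + mean k(y, y') - 2 * mean k(x, y)."""
    return (
        mean_kernel(xs, xs, gamma)
        + mean_kernel(ys, ys, gamma)
        - 2.0 * mean_kernel(xs, ys, gamma)
    )


def sample(rng, mean, n=100, dim=4):
    """Hypothetical feature vectors, e.g. latent codes of a source or target domain."""
    return [[rng.gauss(mean, 1.0) for _ in range(dim)] for _ in range(n)]


rng = random.Random(0)
source = sample(rng, 0.0)
same_domain = sample(rng, 0.0)     # drawn from the same distribution as source
shifted_domain = sample(rng, 1.5)  # drawn from a shifted distribution

mmd_same = mmd_squared(source, same_domain)
mmd_shifted = mmd_squared(source, shifted_domain)
print(f"MMD^2, same distribution:    {mmd_same:.4f}")
print(f"MMD^2, shifted distribution: {mmd_shifted:.4f}")
```

A small value indicates that the two samples are hard to tell apart under the chosen kernel, which is why a transfer-learning objective can penalize MMD to pull source and target representations together.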

Table 5
Selected studies focusing on DL-based security methods for Malware detection and analysis.

Mobile devices (Section 4.2.1)
Year | Authors | Deep Model | Dataset | Advantages and Limitations
2018 | Karbab et al. [188] | CNN | Drebin [191]; MalGenome [190]; MalDozer [188] | Datasets are out-of-date. This work introduced the MalDozer dataset.
2019 | Feng et al. [193] | CNN; Parameter quantization | Drebin [191]; Genome [127]; Contagio [194]; Pwnzen; VirusShare [126]; Google Play Store [192] | 97% accuracy and fast reactive (less than 10 s) detection service provided directly on mobile services.
2020 | Feng et al. [195] | LSTM/GRU; Bi LSTM/GRU, CNN; Parameter quantization | Drebin [191]; Genome [127]; Contagio [194]; Pwnzen; VirusShare [126]; Google Play Store [192] | Extension of previous work.
2021 | Haq et al. [196] | CNN, Bi-LSTM | Androzoo [129]; AMD [197] | Highly imbalanced dataset.

IoT devices (Section 4.2.2)
Year | Authors | Deep Model | Dataset | Advantages and Limitations
2018 | Azmoodeh et al. [198] | CNN | IoT App store [201] | Does not report any information about the hyper-parameters used during the training phase.
2020 | Jeon et al. [199] | CNN | Live | Limited set of samples considered. May not achieve a good accuracy and time response for real-time IoT malware detection.
2020 | Vu et al. [87] | AEs, TL | N-BaIoT [124] | Excessive training time compared to baseline methods.
2021 | Dib et al. [200] | LSTM, CNN | IoTPOT [125]; VirusShare [126] | 99.78% accuracy and 99.57% F-score.
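Table 5 lists parameter quantization among the techniques used to fit detection models onto mobile devices. As a rough illustration of what quantizing model parameters involves, the sketch below applies generic symmetric 8-bit quantization to a list of weights. It is not the scheme used by Feng et al. [193,195], which the survey does not describe; the function names and the example weights are hypothetical.

```python
def quantize_int8(weights):
    """Map float weights to integers in [-127, 127] using one shared scale factor."""
    max_abs = max(abs(w) for w in weights)
    scale = max_abs / 127.0 if max_abs > 0 else 1.0
    quantized = [max(-127, min(127, round(w / scale))) for w in weights]
    return quantized, scale


def dequantize(quantized, scale):
    """Recover approximate float weights from the integer representation."""
    return [q * scale for q in quantized]


weights = [0.5, -1.27, 0.0, 1.0]  # hypothetical layer weights
q, scale = quantize_int8(weights)
restored = dequantize(q, scale)
max_error = max(abs(w - r) for w, r in zip(weights, restored))
print(f"quantized values: {q}")
print(f"largest round-trip error: {max_error:.6f}")
```

Each weight then needs one byte instead of four (for 32-bit floats), at the cost of a rounding error of at most half the scale factor per weight.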

Fig. 8. The basic botnet life cycle contains four phases: (1) injection (i.e., spreading bots by injection), (2) Command & Control (i.e., bots are ready to receive commands and launch an attack), (3) attack (i.e., the botmaster sends the command for launching an attack), and (4) release (i.e., the botmaster removes his footprint or distributed source code).

4.3. Botnet detection and DGAs

According to [202], the global Botnet Detection market size is expected to reach US$965.6 million by 2027, from US$207.4 million in 2020, at a compound annual growth rate (CAGR) of 24.0% over the period 2021–2027. A botnet is a software program that manages computers (or other devices) for malicious intentions. Bots are small scripts created for carrying out particular automated tasks [203] and are operated by one or a small group of collaborating attackers known as “botmaster(s)” [204]. Fig. 8 shows a typical botnet’s life cycle.

Botnets rely on DGAs to connect to their command and control (C&C or C2) server. These DGAs periodically produce many algorithmically generated domains (AGDs), which serve as trust points for the botnet [104,205–207]. Although the bots query all of the AGDs, only the ones registered by the botmaster in advance resolve to valid IP addresses. In this way, blocking the bot’s connection attempts to its C2 server is more complex than when fixed IP addresses or fixed domain names are used, and conventional techniques such as blacklisting or sink-holing are becoming less efficient. Traditional ML methods for DGA detection based on domain name strings rely on the extraction of predefined, human-engineered lexical features [208,209]. As a consequence, the maintenance of such ML systems is rather labor-intensive. Today, the DL techniques for detecting DGAs learn features automatically, thereby potentially bypassing the human effort of feature engineering. Table 6 presents the DL-based security systems for Botnet Detection and DGAs.

4.3.1. DGA detection and family classification

To perform DGA detection and family classification, Woodbridge et al. [207] proposed a data-driven approach using an LSTM model trained on data collected from the Alexa (top 1 million) whitelist [130] and approximately 750,000 DGA domain names from the Bambenek Consulting blacklist [131], including thirty DGA malware families. The LSTM network for binary classification (DGA or not-DGA) was extended to an LSTM network with a softmax layer to perform multi-class classification (i.e., detect the DGA class) and was compared with a One-versus-All Random Forest (OVA-RF) approach, including Alexa [130] as a benign family. The experimental results showed that the LSTM approach outperformed the OVA-RF approach for all families; however, it failed to identify some of the families correctly due to their insufficient representation in the dataset. As a potential solution, the authors proposed to train a DGA classifier that assigns domain names to super-families, effectively making the multi-class classification problem more straightforward and resulting in higher predictive accuracy scores.

The study by Tran et al. [206] proposed the use of a cost-sensitive learning algorithm to train an LSTM network for DGA family classification that takes class imbalances into account. Specifically, they employed a hierarchical classifier architecture. The paper used the data collected from Alexa (whitelist) [130] and Bambenek Consulting (blacklist). The authors claimed that their approach allows them to provide an improvement of at least 7% in terms of macro-averaged recall, precision, and F1-score compared to the CS-NN, CS-SVM, CS-C4.5, Weighted Extreme Learning Machine, HMM, C5.0 DT and the earlier study of Woodbridge et al. [207]. Still, some families are not accurately classified at all by this approach, such as the locky family (ransomware), which was not included in the work of Woodbridge et al. [207].
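To make the notions of cost-sensitive training and macro-averaged scores concrete, the following sketch derives inverse-frequency class weights, applies them in a weighted cross-entropy, and computes macro-averaged precision, recall, and F1. It is an illustration under assumptions: the survey does not give the cost function of Tran et al. [206], so inverse-frequency weighting is used as a common stand-in, and the class names (`benign`, `dga_a`, `dga_b`) and sample counts are hypothetical.

```python
import math
from collections import Counter


def class_weights(labels):
    """Inverse-frequency weights: n_samples / (n_classes * count of the class)."""
    counts = Counter(labels)
    n, k = len(labels), len(counts)
    return {c: n / (k * cnt) for c, cnt in counts.items()}


def weighted_cross_entropy(probs, labels, weights):
    """Cross-entropy in which errors on rare classes cost more; probs are dicts class -> probability."""
    total = sum(-weights[y] * math.log(p[y]) for p, y in zip(probs, labels))
    return total / sum(weights[y] for y in labels)


def macro_scores(y_true, y_pred):
    """Per-class precision, recall, and F1, averaged with equal weight per class."""
    per_class = []
    for c in sorted(set(y_true)):
        tp = sum(t == c and p == c for t, p in zip(y_true, y_pred))
        fp = sum(t != c and p == c for t, p in zip(y_true, y_pred))
        fn = sum(t == c and p != c for t, p in zip(y_true, y_pred))
        prec = tp / (tp + fp) if tp + fp else 0.0
        rec = tp / (tp + fn) if tp + fn else 0.0
        f1 = 2 * prec * rec / (prec + rec) if prec + rec else 0.0
        per_class.append((prec, rec, f1))
    k = len(per_class)
    return {
        "precision": sum(s[0] for s in per_class) / k,
        "recall": sum(s[1] for s in per_class) / k,
        "f1": sum(s[2] for s in per_class) / k,
    }


# Hypothetical imbalanced label set: one large benign class and two small DGA families.
y_true = ["benign"] * 90 + ["dga_a"] * 8 + ["dga_b"] * 2
weights = class_weights(y_true)

# A classifier that always answers "benign" looks accurate but has poor macro scores.
y_pred = ["benign"] * len(y_true)
plain_accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
macro = macro_scores(y_true, y_pred)

print("class weights:", {c: round(w, 3) for c, w in weights.items()})
print("accuracy:", plain_accuracy)
print("macro scores:", {m: round(v, 3) for m, v in macro.items()})
```

The example shows why macro-averaged scores are reported for imbalanced family classification: overall accuracy is dominated by the largest class, whereas macro averaging exposes families that are never recognized.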

Many DGAs use English wordlists to generate plausibly clean-looking domain names, making automatic detection difficult. Curtin et al. [205] applied an RNN architecture for DGA domain detection using a smashword score as a measure of domain resemblance to English words. The proposed DGA detection model split the input data into subdomain, domain, and top-level domain (TLD). The subdomain and domain were fed into individual RNN (with LSTM cells) models to predict the next character in them and these predictions were combined via a generalized likelihood ratio test (GLRT). Then, the output was combined with WHOIS data and the one-hot encoded TLD into the final logistic regression model to predict whether the input domain is malicious or not. The employed dataset included 41 DGA families plus not-DGA data, totaling 2.29 million domain names (1.01 million were not-DGA, and 1.28 million were DGA), which were used in several experiments. The experimental results demonstrated that the proposed approach provides better performance at lower false-positive rates on DGA families with high smashword scores, such as the difficult matsnu and suppobox families.

A heterogeneous DNN framework for extracting the local features of domain names and a self-attention based Bi-LSTM (SA-Bi-LSTM) for extracting further global features were presented by Yang et al. [104]. The authors used benign samples from the top 1 million domain names dataset collected by Cisco [210], whereas the DGA samples were from the dataset collected by 360netlab [133]. The focal loss function [211] was introduced to mitigate the imbalance of the samples’ quantity in the training phase. Although HDNN achieved better DGA detection, the average accuracy rate was still not higher than 90%. According to the authors, the detection of DGA domain names can be further improved by incorporating more perspectives, such as the side information of DNS request behaviors. Moreover, the proposed approach is computationally intensive, which limits its application in the real world.

With the development of beyond 5G (B5G) mobile networks, issues that threaten security and privacy have risen. To detect malicious domain names, Xu et al. [212] used a hierarchical Bidirectional LSTM (H-BiLSTM) model, which was trained with data collected from Alexa [130] and 360netlab [133]. The experimental results showed that the proposed model with three layers in the hierarchy achieved 96.1% precision, outperforming traditional algorithms (i.e., SVM, DT, LR, and NB). However, when the number of layers increases, the computational complexity of the model increases too, decreasing the precision rate.

4.3.2. DNS-based and IoT botnet detection
DNS data carry rich traces of Internet activities and are a powerful resource in fighting against malicious domains. DNS-based detection techniques rely on particular DNS information generated by a botnet. In order to access the C&C server that is typically hosted by a Dynamic DNS (DDNS) provider, the bots send DNS queries. Therefore, it is possible to identify botnet DNS traffic and, ultimately, traffic anomalies via DNS monitoring. To detect whether domain names and IP addresses are benign, malicious, or sinkholes, Lison et al. [208] used a DNN architecture. The model was trained on a large passive DNS database provided by Mnemonic [213] (not freely available to the public). Rather than employing the domain name as the only input to the NN, numerical (e.g., the lifespan of the DNS record, and number of TTL changes) and categorical (e.g., ISP associated with the IP address or the TLD) features were also used as inputs. While the domain names were fed into an RNN with Gated Recurrent Units, the categorical features

phase and a live phase that acts as a custom DNS server. In the first stage, the required benign and malicious data were gathered. Subsequently, labeled data was fed into the DL module, which was composed of an embedding layer followed by a 1-dimensional CNN that was then forwarded as input to an LSTM recurrent neural network. The trained model was used in the second stage to make predictions in real time when answering DNS requests. The study reported a mean AUC value of 0.950.

To detect attacks launched from IoT bots, Meidan et al. [124] proposed a network-based approach that comprises data collection, feature extraction, and deep AE stages, combined with continuous monitoring. In the first stage, real traffic data (in PCAP format) was collected from IoT devices that were connected via Wi-Fi to several access points. In the second stage, whenever a packet arrived, a behavioral snapshot of the hosts and protocols that communicated this packet was taken. The snapshot obtained the packet’s context by extracting 115 traffic statistics over several temporal windows. The behavioral snapshots of benign IoT traffic were used in the third stage to train a deep autoencoder for each IoT device. The main idea is that after the model is trained with normal traffic, it is expected to reconstruct it efficiently. However, when the model receives anomalous input, it should not be able to reconstruct it equally well, thus leading to higher reconstruction errors. The experimental results showed that the method succeeded in detecting every single attack launched by every compromised IoT device with a TPR of 100% and an FPR of 0.007 ± 0.01. Although this approach achieved a very high detection rate, its scalability is questionable given that it requires a separate NN for modeling the behavior of each IoT device type.

In [203], Kim et al. proposed a flow-based botnet detection system using DL to cope with the periodicity of traffic flows, consisting of three stages: data pre-processing, anomaly scoring, and anomaly detection. In the first stage, every flow sorted in chronological order is aggregated to obtain statistical features within the windows, which are then fed to the second stage, an RVAE. The RVAE model produces anomaly scores by comparing the input with the model’s output (i.e., the reconstructed input). Lastly, based on the calculated anomaly scores, the anomaly detection function classifies individual connections into either Malicious or Non-malicious. The CTU-13 dataset [42] was employed during evaluation, showing good detection performance and demonstrating generalization capabilities. However, the sequences used in this work were created by aggregating NetFlows on their source IP and then each sequence was reduced by summarizing NetFlows, thus inducing a non-negligible loss of data which may break the underlying temporal patterns.

To detect domain name spoofing attacks in smart cities from DNS traffic, Vinayakumar et al. [204] proposed cost-sensitive deep learning architectures (i.e., RNN, LSTM, GRU, IRNN, B-RNN, B-LSTM, B-GRU, and B-IRNN) combined with Siamese Neural Networks (SNNs) [214]. Although the experimental results revealed substantial improvements in terms of F1-score, speed of detection, and false alarm rate compared to other DL-based DGA approaches (i.e., LSTM, RNN, and GRU), the proposed models were not able to correctly recognize some DGA families.

DNS homograph (DNSH) attacks are a form of phishing that is used by an attacker to make the domain name look very similar to a trusted domain name (e.g., netflixlife.com → netflixlîfe.com). In order to detect randomly generated domain names and domain name system homograph attacks, Ravi et al. [215] focused on several DL-based
were fed into an embedding layer and finally all three inputs were fed SNNs and DL-based cost-sensitive models. The SNNs accept a pair
into dense feed-forward layers. The authors claimed that the model was of domain names and have identical DL subnetworks for each input.
capable of detecting 95% of the malicious hosts with a false positive The Euclidean distance between the fully-connected layer outputs is
rate of 0.1%. Nevertheless, their model takes much time to converge, computed and then passed through a sigmoid activation function to
which can be a problem in a real environment. determine similarity (i.e., similar or spoof and dissimilar or not-spoof).
The study by Spaulding et al. [209] proposed a DNS Filtering and Moreover, cost-sensitive classification using several DL models is per-
Extraction Network System called DFENS, which served as the first formed for DGA domain name categorization. Experimental validation
line of defense against malicious domains. DFENS employed a training used four datasets (i.e., HDN [216], HPN [216], IDFC [206], and


Table 6
Selected studies focusing on DL-based security methods for Botnet Detection and Domain Generation Algorithms.

DGA detection and family classification (Section 4.3.1)
Year | Authors | Deep Model | Dataset | Advantages and Limitations
2016 | Woodbridge et al. [207] | LSTM | Alexa top 1M domain names [130] and 750,000 DGA domain names from the Bambenek Consulting blacklist. | Pioneer work in applying DL for DGA detection and family classification.
2018 | Tran et al. [206] | LSTM | 88,347 benign domains collected from Alexa [130], and 81,484 DGA domain names from the Bambenek. | Suitable for DGA families with a limited representation in the data. The collected dataset is publicly available.
2019 | Curtin et al. [205] | LSTM | 1.02M benign domains from Alexa [130] and OpenDNS. 1M malware domains from different sources such as DGArchive and Andrey Abakumov's DGA repository. | Variety of experiments were performed.
2020 | Yang et al. [104] | SA-Bi-LSTM | Benign samples from the top 1 million domain names dataset collected by Cisco [210]; DGA samples from the dataset collected by 360netlab [133]. | Computationally intensive.
2021 | Xu et al. [212] | H-Bi-LSTM | Alexa top 1M domain names [130] and 360netlab [133]. | With more than three layers in the hierarchy, the precision rate falls drastically.

DNS-based and IoT botnet detection (Section 4.3.2)
Year | Authors | Deep Model | Dataset | Advantages and Limitations
2017 | Lison et al. [208] | GRU | 171 million distinct domain names and 17 million IP addresses extracted from Mnemonic. | The model takes a lot of time to converge.
2018 | Spaulding et al. [209] | CNN; LSTM | Around 1.5M benign samples from Alexa [130], DMOZ directory, and Agten et al. [217]'s dataset, and around 40k malicious records from Openphish, malwaredomainlist.com, and Agten et al. [217]'s dataset. | AUC value of 0.950.
2018 | Meidan et al. [124] | AEs | Simulated IoT environment | The scalability of the model is questionable.
2020 | Kim et al. [203] | RVAE | CTU-13 dataset [42] | The data preprocessing stage induces a non-negligible loss of data.
2020 | Vinayakumar et al. [204] | SNNs; RNNs | AmritaDGA [134] | High model complexity.
2021 | Ravi et al. [215] | SNNs; RNNs; FNNs | HDN [216], HPN [216], IDFC [206], AmritaDGA [134] | Unsuitable when an attacker uses a spoofed domain name with multiple unknown characters.

AmritaDGA [134]). The best performance was achieved using the Bi-LSTM with an accuracy of 99%. However, a major limitation of the proposed approach appears when an attacker uses a spoofed domain name that has more than one unknown character, falsely resulting in a greater distance than the predefined threshold.

Findings. RNNs (e.g., LSTM, GRU, and Bi-LSTM) can efficiently handle sequential, time-dependent, and high-dimensional massive data. For that reason, the majority of the works in this category employed this type of NN. Additionally, the employed dataset plays an essential role in developing DGA domain detection systems using ML/DL. By carefully selecting the data involved in the analysis, it is possible to boost the accuracy and generally increase the performance. Thus, it is necessary to use more representative and up-to-date DGA datasets, e.g., UMUDGA.

4.4. CPS attack detection

CPSs comprise a new generation of complex systems whose normal operation depends mainly on the robust communications between their cyber and physical components. The CPS market is expected to grow by 9.7% annually, reaching US$9,563 million by 2025 [218]. These systems have proven instrumental to various sectors and have been widely implemented in several industrial environments, such as electrical power grids, oil refineries, water treatment & distribution plants, and public transportation systems. As the deployment of IoT and 5G/6G mobile communications is undergoing an exponential increase, the rise in the use of CPSs comes as no surprise. Simultaneously, CPSs, IoT, and 5G/6G also increase the likelihood of cybersecurity incidents and vulnerabilities [219]. From the cybercriminals' perspective, attacking CPSs is a unique opportunity to provoke maximum damage with minimum effort [220]. As a result, many elaborate techniques have been applied to stealthily exploit the CPSs of essential sectors, including smart cities [221], energy networks [222], and supply chain management [223].

Since the services provided by such systems are essential for the well-being of the community, CPSs can be classified as Critical Infrastructures [219]. Therefore, they must be robust and flexible against cyber-attacks. Any attack that could compromise or interrupt the provided services would result in severe consequences for public safety and order, the economy, and the environment. Hence, the ability to detect sophisticated cyber-attacks on increasingly heterogeneous CPSs, a heterogeneity amplified by the arrival of IoT and 5G/6G, has become a critical task. Regrettably, developing a well-defined security model of the complex physical process is not a trivial task [103,224]. It demands in-depth knowledge of the system and its implementation, which takes significant time and cannot scale properly to large-scale and complex systems. An alternative strategy that has recently received attention uses deep learning techniques to build more intelligent and powerful methods that leverage big data to identify intrusions and anomalies. Table 7 summarizes the representative works focusing on deep learning-based security methods for CPSs.


4.4.1. Industrial Control Systems (ICSs)

In ICSs, different methods based on DL have been proposed for detecting both attacks and faults [103,225–229]. The attack types include injecting false control commands, spoofing sensor values, and altering communication traffic packets. Goh et al. [225] applied LSTM and Cumulative Sum (CUSUM) to detect anomalies on the first stage of the SWaT dataset [139]. They reported the recognition of nine out of ten attacks with four false positives. Unfortunately, this study did not provide a comprehensive analysis of the stability of the results. In [226], a comparative study of two anomaly detection techniques, namely DNN and OC-SVM, was conducted. The study was carried out on all six stages of the SWaT dataset and achieved 92% and 98% precision for OC-SVM and DNN, respectively. However, 23 out of the 36 attacks had a detection recall of zero for DNN and only slightly better for OC-SVM, leading to a low F1 score for both models. Additionally, the proposed framework is resource-demanding and complex.

Similarly, Kravchik et al. [227] used two deep learning models, namely 1D-CNN and LSTM, for detecting attacks on the SWaT testbed [139]. They reported that the proposed system reached average rates of 96.8%, 79.1%, and 87.1% for precision, recall, and F1 score, respectively. However, the attack detection was performed separately at each stage, with no way of learning inter-stage dependencies. Contrary to that, Macas et al. [103] applied an Attention-based Convolutional LSTM Encoder-Decoder (ConvLSTM-ED) model on the SWaT testbed [139] in its entirety (i.e., all six subsystems). Thus, their framework was able to model both inter-sensor correlation and temporal dependencies of multivariate time series. This study reported 96.0%, 81.5%, and 88.0% for precision, recall, and F1 score, respectively. In [228], the authors applied 1D convolution and autoencoders to detect anomalies (cyber-attacks) using the physical state of the system as measured by the sensors. Among the three employed datasets (namely, SWaT [139], BATDAL [137], and WADI [140]), the experimental evaluation of the proposed model reported higher performance results for the SWaT dataset (89.0%, 80.3%, and 84.4% for precision, recall, and F1 score). A drawback of this research is that it requires the manual setting of a threshold to detect attacks. Xie et al. [229] proposed a hybrid NN architecture that relies on CNN and RNN for anomaly detection in CPSs on the SWaT dataset [139]. Although their framework reported a high precision, they do not consider that many SWaT features do not have the same distribution in training and testing data [139]. Features like this create a lot of false positives and, therefore, should be excluded from modeling.

Lu et al. [230] proposed the population extremal optimization-based deep belief network detection method (PEO-DBN), where the PEO algorithm [231] was employed to determine the DBN's parameters. Furthermore, the performance in cyber-attack detection was enhanced by introducing a majority voting scheme for aggregating the proposed PEO-DBN, leading to the creation of EnPEO-DBN. The study used two SCADA network datasets of a gas pipeline system [232] and a water storage tank system [232] in several experiments. Although simulation results showed that PEO-DBN and EnPEO-DBN outperform the accuracy of other methods such as SVM, DT, the ensemble of SVM, and the ensemble of DBN [233], the whole process of the PEO algorithm has high computational complexity. The authors suggest using a surrogate-assisted model to address this problem.

Considering that, with the advent of 5G and the increased use of CPSs in the industry, the attack surface will become broader, the authors in [234] developed a framework based on the ResNet-50 model to mitigate such attacks. This work used the Telecom Italia's dataset [235] for experimentation. Even though the proposed model achieved higher than 91% attack detection accuracy, its complexity is not analyzed. Therefore, it is not known how well it could scale to a real-time environment.

4.4.2. Smart Grid (SG)

Smart grids take advantage of CPSs to provide services with high reliability and efficiency, focusing on consumer needs. They can adapt to energy demands in real time, allowing for increased functionality [24]. However, these grids depend on information technology, which is vulnerable to cyber-attacks. One such attack is false data injection (FDI) [236–239]. Generally, FDI attacks inject malicious packets with the goal of creating small measurement errors that corrupt the component of the smart grid that performs state estimation. To overcome this problem, He et al. [236] used a Conditional Deep Belief Network (CDBN) to efficiently reveal the high-dimensional temporal behavior features of the unobservable FDI attacks. Their approach was evaluated on the IEEE 118-bus power test system and the IEEE 300-bus system and was able to achieve an accuracy surpassing 93% on several tests. However, the number of examined experimental scenarios was rather limited.

According to [237], attackers could inject multivariate malicious data points in a time period (contextual or collective anomalies). Since such FDI attacks are stealthier, inspecting measurement data alone may fail to detect them. The authors proposed a hybrid anomaly detection model based on 1D-CNN and RNN that combined sensor measurements and network packets to address this issue. Data points that generated large prediction errors were classified as anomalies. The proposed framework was evaluated on an IEEE 39-bus system, where it achieved an accuracy above 90%. By considering anomaly detection as a binary classification problem, Wang et al. [239] applied an RNN model to detect FDI attacks. Simulations over the IEEE 39-bus system indicate that their model can achieve an acceptable FDI attack detection accuracy. However, the performance of their framework was compared only with shallow architectures, and the settings of the RNN model were not specified.

Given that collecting large-scale labeled data can be excessively expensive and time-consuming, Zhang et al. [240] proposed a semi-supervised learning approach by integrating the AEs into a GAN framework for detecting unobservable FDI attacks in distribution systems. The AEs serve for dimension reduction and feature extraction of measurement datasets, and the GAN is employed in the attack detection task. The authors evaluated the proposed approach on three-phase unbalanced benchmarks: IEEE 13-bus and 123-bus distribution systems [241]. The experimental results showed that their method has a high and robust detection accuracy (around 97% on both systems) compared to other semi-supervised learning techniques. Similar to [240], the authors in [242] combined two DNNs, namely AE and GAN, to develop an anomaly detection model capable of (i) detecting anomalies and (ii) classifying Modbus/TCP and DNP3 cyberattacks. The proposed model was validated on three SG evaluation environments originating from the SPEAR project [243]: (i) SG lab, (ii) hydropower plant, and (iii) power plant. The performance of the parallel detection of both anomalies and particular cyberattacks was 95% accuracy with 3.6% FPR.

A phasor measurement unit (PMU) data manipulation attack (PDMA) can blind the control centers to the real-time operating conditions of power systems. To detect this type of attack, Wang et al. [238] used a deep AE. The input of the AE was 108 features extracted from PMU measurements (e.g., the three-phase magnitude, angles, and voltages). An attack was detected if the reconstruction error was above a pre-defined threshold. The proposed model achieved a high detection performance reaching 94.1% accuracy, 99.6% precision, 88.6% recall, and 93.8% F1 score.

4.4.3. Intelligent Transportation Systems (ITSs)

By enabling the seamless exchange of information between vehicles and roadside infrastructure in real-time, connected and automated vehicles (CAVs) are expected to entirely and drastically change the transportation industry [244,249]. CAVs rely heavily on their sensor


Table 7
Selected studies focusing on DL-based security methods for Cyber-physical Systems.

Industrial Control Systems (Section 4.4.1)
Year | Authors | Deep Model | Dataset | Advantages and Limitations
2017 | Goh et al. [225] | LSTM | SWaT [139] | Does not provide a detailed analysis regarding the stability of the results.
2017 | Inoue et al. [226] | DNN | SWaT [139] | The detection recall for 23 out of the 36 attacks was 0% for DNN.
2018 | Kravchik et al. [227] | 1D-CNN, LSTM | SWaT [139] | Learning inter-stage dependencies was not examined.
2019 | Macas et al. [103] | Attention module; ConvLSTM-ED | SWaT [139] | Adaptively selects the most significant input features at each time step.
2019 | Kravchik et al. [228] | 1D-CNN, AE | SWaT [139]; BATDAL [137]; WADI [140] | A manually set up threshold is required.
2020 | Xie et al. [229] | 1D-CNN, RNN | SWaT [139] | Does not consider that several SWaT features do not have the same distribution in the training and test data.
2021 | Lu et al. [230] | PEO [231], DBN | Gas Pipeline [232]; Water Storage Tank [232] | High computational complexity.
2021 | Hussain et al. [234] | CNN (ResNet-50) | Telecom Italia [235] | The complexity of the model was not analyzed.

Smart Grid (Section 4.4.2)
Year | Authors | Deep Model | Dataset | Advantages and Limitations
2017 | He et al. [236] | CDBN | IEEE 118-bus, IEEE 300-bus | The number of examined experimental scenarios was rather limited.
2018 | Wang et al. [238] | AE | IEEE 9-bus, IEEE 30-bus | 99.6% precision in detecting PDMA.
2019 | Niu et al. [237] | 1D-CNN, LSTM | IEEE 39-bus | Achieved higher than 90% accuracy in several experiments.
2019 | Wang et al. [239] | RNN | IEEE 39-bus | The settings (i.e., hyper-parameters) of the deep model are not specified.
2020 | Zhang et al. [240] | AEs, GAN | IEEE 13-bus, IEEE 123-bus | A reduced amount of annotated data was used.
2021 | Siniosoglou et al. [242] | AEs, GAN | SPEAR testbed [243] | A great number of scenarios in the experimental stage.

Intelligent Transportation Systems (Section 4.4.3)
Year | Authors | Deep Model | Dataset | Advantages and Limitations
2018 | Wyk et al. [244] | CNN | SPMD [245] | Achieved 99.8% precision and 99.5% F1 score.
2020 | Hanselmann et al. [246] | LSTM, AE | Real and synthetic CAN data | True Positive (detection) Rate of around 99%. Synthetic data is publicly available in [247].
2021 | Li et al. [108] | Transformer | Real-world (DCS and substation system traffic) | Accuracy and F1-Score of 99.80%.
2021 | Yue et al. [248] | CNN, RNN | ECN testbed | Time complexity is not examined.

readings and on the information received from other vehicles and roadside units to navigate roadways. Therefore, anomalous sensor readings caused by either malicious cyber-attacks or faulty vehicle sensors can result in disruptive consequences and lead to fatal crashes. In this context, before the mass implementation of CAVs, it is essential to develop strategies for the real-time and seamless detection of anomalies, including identifying their sources. Wyk et al. [244] created an anomaly detection approach for CAVs by combining CNN and Kalman filtering (KF). First, a CNN model that consisted of three CNN layers and two fully connected layers was employed to eliminate false sensor readings. Then, the scrutinized data was fed to KF to remove further anomalies undetected by the CNN model. The method was validated on a two-year real-world dataset obtained from the Safety Pilot Model Deployment (SPMD) program [245]. The overall hybrid approach is promising, reaching up to 99.7% accuracy, 99.2% sensitivity, 99.8% precision, and 99.5% F1 score, outperforming the two baseline methods (standalone KF and CNN).

In-vehicle security is particularly challenging because the controller area network (CAN) bus does not have built-in security. Aiming to detect attacks/anomalies on the CAN bus, which is responsible for the communication between devices (e.g., airbags) and Electronic Control Units (ECUs) [246,250], Hanselmann et al. [246] introduced a DL-based framework called CANet trained in an unsupervised manner. They designed CANet using LSTM to capture the CAN bus time series behavior, AE to learn the normal behavior, and Exponential Linear Unit (ELU) [251] to improve the classification of their framework. CANet was tested on high-dimensional real-world and synthetic data and achieved a True Positive (detection) Rate of around 99%.

In intelligent transportation infrastructure, safe and reliable intelligent charging stations are of paramount importance. Many smart charging stations have been deployed over the past few years, and most of them are online and connected, raising the potential risks of threats. Li et al. [108] proposed a DL-based anomaly detection method for in-vehicle power supply systems on real data (i.e., data collected by the authors' institute). In particular, they used the Transformer architecture that considers the inherent correlations of traffic generated by ICSs. The results showed that their model achieved an accuracy rate and an F1 score of 99.80%, outperforming other traditional and deep architectures such as DT, RF, and CNN. The train Ethernet Consist Network (ECN) undertakes the task of transmitting critical train control instructions. In order to detect network attacks against the train ECN, Yue et al. [248] introduced an ensemble IDS based on CNN and RNN. In particular, they used three variants of CNN, namely LeNet-5, AlexNet, and VGGNet, to capture the spatial patterns in the data. At the same time, three variants of recurrent NN, namely vanilla-RNN, LSTM, and GRU, were used to capture the temporal patterns. Thirty-four features of various protocol contents were extracted from the raw data produced by employing an ECN testbed to build a specific dataset. Although the proposed model achieved an accuracy rate of 98%, it is not possible to evaluate whether it could work in a real environment since its time complexity is not examined.
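
The Kalman-filtering stage that Wyk et al. [244] pair with a CNN can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes a scalar random-walk sensor model, and the function name `kalman_innovation_flags`, the noise variances `q` and `r`, and the gate value are hypothetical choices made only for illustration. A reading is flagged when its normalized squared innovation (the gap between the reading and the filter's prediction, scaled by the predicted variance) exceeds the gate, and flagged readings are not used to update the state.

```python
import numpy as np


def kalman_innovation_flags(z, q=1e-3, r=0.05 ** 2, gate=9.0):
    """Flag readings whose normalized squared innovation exceeds `gate`.

    Assumed scalar random-walk model (illustrative only):
        x_k = x_{k-1} + w_k,  w_k ~ N(0, q)
        z_k = x_k + v_k,      v_k ~ N(0, r)
    """
    z = np.asarray(z, dtype=float)
    x, p = z[0], r                      # initialise state from the first reading
    flags = np.zeros(z.shape[0], dtype=bool)
    for k in range(1, z.shape[0]):
        p_pred = p + q                  # predict: state unchanged, variance grows
        innov = z[k] - x                # innovation (measurement residual)
        s = p_pred + r                  # innovation variance
        if innov * innov / s > gate:
            flags[k] = True             # anomalous reading: skip the update
            p = p_pred
            continue
        gain = p_pred / s               # Kalman gain
        x = x + gain * innov            # update state estimate
        p = (1.0 - gain) * p_pred       # update estimate variance
    return flags


# Hypothetical sensor trace: small bounded ripple plus one injected spike.
t = np.arange(200)
readings = 20.0 + 0.05 * np.sin(t)
readings[120] += 2.0
flags = kalman_innovation_flags(readings)
print(np.flatnonzero(flags))
```

Skipping the update for flagged readings keeps a single spoofed value from dragging the state estimate, at the cost of reacting more slowly to a genuine abrupt change; a gate of 9 corresponds to roughly three standard deviations under the assumed model.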


Findings. We can see that in ICSs, sensor time-series measurement data is usually collected. Often, the attacker's goal is to change the system's physical behavior, to which end they spoof the sensors' values, thus breaking time relations in the data. LSTM-based models and variants are used to capture such time relations. In contrast, FDI attacks are widespread in the SG. As can be observed, the majority of security methods are employed to aid conventional state estimator methods. AEs and LSTM techniques can both be adopted. In ITSs, attacks on the CAN bus system are the most frequent. Thus, LSTM and CNN are applied to capture both time relations and context information (e.g., packet order and content). Lastly, DNNs coupled with attention mechanisms have been shown to improve the performance of DL-based security methods, as they enable the model to learn and focus automatically on the essential features.

4.5. Spam filtering

Spam is a severe problem for most Internet users and the most common cyber-attack. It is estimated that spam e-mails constitute the majority of global e-mails, around 73% [252]. According to [252], spam costs businesses US$20.5 billion annually in decreased productivity and technical expenses. Thus, it becomes evident that there is a need for reliable and intelligent anti-spam filters. Traditional ML methods are sometimes considered inadequate in capturing the variability of spamming behavior. Recently, DL has been adopted in developing anti-spam filters and was proven to be effective in this area [111,253–260]. Table 8 summarizes the representative works focusing on DL-based spam filtering.

4.5.1. Spam in E-mails and websites

Seth et al. [258] proposed two multimodal architectures based on CNN to tackle spam e-mails based on images and spam content. These architectures combined both text and image classifiers to produce an output class (i.e., spam or non-spam). The first architecture employed feature fusion, whereas the other mined the rules between the two classifiers and employed class probabilities. Among the two approaches, the second architecture achieved the best accuracy with a rate of 98.11%. However, the size of the used dataset is limited (it includes only 1500 images), resulting in an overfitting problem. On the other hand, to detect spam websites in IoT environments, Makkar et al. [256] introduced an LSTM-based model that was trained using the link features. This work used the WEBSPAM-UK2007 dataset [142] and achieved 95.25% accuracy in detecting spam hosts. Regrettably, the evaluation part does not provide any other measures such as precision, recall, or F1 score.

A DL-based scheme providing edge intelligence for web data filtration and spam detection was proposed in [261]. The proposed solution comprised three layers. The bottom layer was responsible for collecting web data from various sources and delivering the processed data to the next layer. The middle layer was responsible for web spam detection at the edge, where the LSTM and CNN models were used for building the spam detector. Finally, the upper layer was in charge of storing efficient web information in the cloud environment. Although the proposed approach achieved an accuracy of 98.77%, a small-sized dataset (300 samples for training and 300 samples for testing) was used, which might cause an overfitting problem.

4.5.2. Spam in Online Social Networks (OSNs)

OSNs such as Twitter, Facebook, and Sina Weibo have become increasingly popular platforms where users can post their messages and share ideas worldwide. Unfortunately, spammers are also active on these social networks using many social techniques to spread spam, such as sending unsolicited messages to legitimate users, posting malicious links, displaying aggressive behavior to obtain attention, and repeatedly posting duplicate updates. In this regard, the study [259] applied a DL technique based on Word2Vec [189] and MLP to detect spam on Twitter at the tweet level. The proposed model was evaluated on four real-world datasets and achieved an accuracy higher than 90% on several tests, outperforming traditional classifiers such as Naive Bayes, Random Forest, and Decision Tree. However, this study does not include explanations of the technical details of the individual methods. A similar approach was developed in [255] but using the HSpam14 [262] and 1KS10KN [263] datasets.

A multistage spam detector for mobile social networks using DL techniques was developed by Feng et al. [253]. The authors followed an edge-computing architecture, where initial detection occurred at a mobile terminal and results were then forwarded to the cloud server for further calculation. The study used the Sina Weibo dataset and reported an accuracy of 88.62%, with 9.04% FPR. Specialized strategies (e.g., parameter pruning, parameter quantization, and knowledge distillation) for compressing deep networks to accommodate their high resource requirements on less powerful mobile terminals could improve the accuracy and reduce the FPR of the proposed framework. Guo et al. [111] investigated spamming problems in IoT-based social media applications. They proposed a spammer detection mechanism named Co-Spam, composed of three NNs: Bi-AE, GCN, and LSTM. A series of experiments were performed on Twitter [260] and Weibo datasets. The results revealed that Co-Spam achieved a high precision rate, but many hyper-parameters were introduced to construct a fine-grained feature space.

Taking into account that deep reinforcement learning algorithms converge slowly when looking for an optimal sequence of actions to reach a goal state, Lingam et al. [264] introduced a particle swarm optimization (PSO) based deep Q learning algorithm (P-DQL) for detecting social spam bots by integrating PSO with a Q-value function. The performance of the proposed P-DQL algorithm was evaluated on two Twitter datasets, namely the Social Honeypot dataset [143] and the Fake Project dataset [265]. The results demonstrated an improvement of up to 15% in precision over other existing algorithms such as FFNN, ADQL [84], and C-DRL [85]. This approach works in offline settings; thus, an interactive environment with online experiments can be embedded in the Twitter network as future work.

Attackers exploit OSNs by using abusive accounts to perform malicious actions for personal or political gain. To address this issue, Xu et al. [266] introduced a Deep Entity Classification (DEC) framework that leverages DNNs and the multi-stage multi-task learning (MS-MTL) paradigm to detect abusive accounts in OSNs. DNNs are used to extract the accounts' features based on the properties and behavioral features observed in their social graph neighbors, whereas MS-MTL allows DEC to learn the common underlying representations of different abuse types (i.e., fake, compromised, spam, and scam). During its deployment for a period longer than two years at Facebook, DEC detected hundreds of millions of abusive accounts. The authors estimate that DEC is responsible for a 27% reduction in the platform's volume of active abusive accounts. The major limitation of DEC is that it is computationally expensive, mainly due to deep features.

4.5.3. Spam in online reviews and short message service

With the prevalence of the Internet, online reviews have become a valuable information resource for people. However, the authenticity of online reviews remains a concern, and deceptive reviews have become one of the security problems to be solved. The study [254] introduced an unsupervised spam detection model based on DL for online reviews using a modified version of Conditional GAN. The proposed model was evaluated on reviews collected from Douban (a Chinese online community where users share their reviews to express their feelings about movies) and reported 87.03% accuracy, outperforming several


Table 8
Selected studies focusing on DL-based security methods for Spam filtering.

Spam in E-mails and Websites (Section 4.5.1)
Year | Authors | Deep Model | Dataset | Advantages and Limitations
2017 | Seth et al. [258] | CNN | Real-world (images) | Dataset with only 1500 images.
2020 | Makkar et al. [256] | LSTM | WEBSPAM-UK2007 [142] | Only accuracy rate was reported.
2021 | Makkar et al. [261] | LSTM, CNN | Real-world (images) | Small-size dataset (only 600 images).

Spam in Online Social Networks (Section 4.5.2)
Year | Authors | Deep Model | Dataset | Advantages and Limitations
2017 | Wu et al. [259] | WordVector, MLP | Real-world (tweets) | Comprehensive technical details were not reported.
2018 | Madisetty et al. [255] | CNN | HSpam14 [262], 1KS10KN [263] | Several experiments were executed.
2018 | Feng et al. [253] | CNN | Real-world (Sina Weibo) | Multistage Spam Detection (mobile terminal and cloud server).
2020 | Guo et al. [111] | Co-Spam (Bi-AE, GCN, LSTM) | Yang et al.'s dataset [260]; Live (Sina Weibo) | High accuracy requires high complexity.
2021 | Lingam et al. [264] | P-DQL | Social Honeypot [143]; Fake Project [265] | This approach works in offline settings.
2021 | Xu et al. [266] | DNN, MS-MTL | Real-world (Facebook) | Computationally expensive.

Spam in Online Reviews and Short Message Service (Section 4.5.3)
Year | Authors | Deep Model | Dataset | Advantages and Limitations
2020 | Gong et al. [254] | Improved Conditional GAN | Real-world (Douban) | Unsupervised learning.
2020 | Roy et al. [257] | CNN, LSTM | SMS Spam Collection v.1 [145] | English only text messages.
2021 | Liu et al. [107] | Modified Transformer [267] | SMS Spam Collection v.1 [145]; UtkMl's Twitter [144] | A multihead attention mechanism is used for improving accuracy.

popular deep and traditional unsupervised classifiers (e.g., VAE, LOF, and OC-SVM).

Considering that short message service (SMS) usage has been rising over the last decade, Roy et al. [257] conducted a comparative analysis of two DL models (namely, LSTM and CNN) and several traditional systems for classifying spam and not-spam text messages. The proposed models were based only on text data, and the SMS Spam Collection v.1 dataset [145] was used for their evaluation. The study reports that CNN outperformed LSTM and the traditional ML models, achieving 98.5%, 97.6%, and 98.0% for precision, recall, and F1-score, respectively. However, the dataset used is small, with only 747 spam messages and 4827 not-spam messages, which may lead to overfitting. Moreover, the text messages examined in this study are written exclusively in English.

To address the computational efficiency limitation of RNN variants such as LSTM, the authors in [107] proposed a modified version of the Transformer model [267], which uses only a multi-head attention mechanism instead of RNN variants as encoders and decoders. In the experiments, two different datasets were utilized: the SMS Spam Collection v.1 [145] and UtkMl’s Twitter [144] datasets. Although the results revealed that the proposed model outperforms both traditional and deep learning models (e.g., LR, NB, RF, SVM, LSTM, and CNN-LSTM) on the two datasets, more complex Transformer-based models such as GPT-3 [268] and BERT [269,270] could be examined in the future.

Findings. It is interesting to note that the majority of deep learning-enabled methods for spam detection employ RNNs and CNNs, as they can efficiently extract the underlying spatial and temporal correlations. Furthermore, we observe that it is necessary to introduce new tools and extend novel approaches for multilingual spam detection. Finally, another frequently encountered flaw of spam detectors is their inability to detect new spam variants, caused by training with relatively small and severely outdated datasets.

4.6. Fraud detection

In technological systems, fraudulent actions occur in several areas of daily life, e.g., in telecommunication networks, online banking, mobile communications, and e-commerce [271]. These frauds lead to considerable financial loss for individuals, businesses, and governments. According to [272], the global market for Fraud Detection and Prevention is projected to reach US$51.3 billion by 2027, from US$19.5 billion in 2020, growing at a CAGR of 14.8% over the period 2020–2027. Fraud detection refers to recognizing fraud as soon as possible after it has been committed [36,37,273,274]. Detection methods are under constant development to confront criminals by adapting to their strategies, usually leveraging data mining, statistics, and machine learning. In particular, DL techniques allow the extraction of complex information from the data and are better able to explore deeper implicit fraud patterns. The summary of the analyzed works in this subsection can be found in Table 9.

4.6.1. Telecommunication fraud
Fraud is expensive for a network carrier both in terms of lost income and wasted capacity. Aiming to detect fraudulent activities at a city-wide telecommunication network, Ji et al. [36] proposed the Multi-Range Gated Graph Neural Network (MRG-GNN) for learning latent features from social networks. First, a social network was modeled as a directed graph whose vertices represent subscribers and edges represent activities between them. Then, a graph convolution block was used to capture content information and related information between users, with convolutions based on efficient short walks and node-merging. A real-world dataset collected from Shanghai, China, was used for experimental evaluation. The proposed model achieved an AUC of 0.948, outperforming traditional models (e.g., SVM). Unfortunately, the study examines a limited number of scenarios for the evaluation of the proposed model.

Robocalling systems affect millions of people daily. Regrettably, traditional approaches to detecting such activities rely on the construction of number blacklisting systems. Nevertheless, criminals can easily disguise their phone numbers. In order to address the above challenge, Yu et al. [37] introduced a DL-based approach for blacklisting


unwanted phone numbers, while keeping a high detection rate through


distributed crowdsourcing. The system comprises two parts: (i) a semi-
automated caller ID tagging system leveraging the predictions of an
LSTM model and (ii) a blacklist-based crowdsourcing and aggregation
edge system. The experiments were performed against real incoming
calls on Android phones. Although the results showed that the system
design could attain decent detection rates, the study used a small
dataset of just 1488 data points and did not apply cross-validation.
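
The limitation noted above, a small dataset evaluated without cross-validation, can be illustrated with a minimal sketch of stratified k-fold splitting. This is not the evaluation protocol of [37]; the function name `stratified_kfold`, the number of folds, and the toy labels are illustrative assumptions.

```python
import random
from collections import defaultdict


def stratified_kfold(labels, k=5, seed=0):
    """Yield (train_idx, test_idx) pairs whose folds keep the class ratio roughly constant."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)

    folds = [[] for _ in range(k)]
    for indices in by_class.values():
        rng.shuffle(indices)
        # Deal the indices of each class round-robin across the k folds.
        for pos, idx in enumerate(indices):
            folds[pos % k].append(idx)

    for i in range(k):
        test_idx = sorted(folds[i])
        train_idx = sorted(j for fold in folds[:i] + folds[i + 1:] for j in fold)
        yield train_idx, test_idx


# Toy labels (hypothetical): 1 = unwanted call, 0 = legitimate call.
labels = [1] * 20 + [0] * 80
splits = list(stratified_kfold(labels, k=5))
```

Every sample is used for testing exactly once, so a detection rate averaged over the folds is less sensitive to a single lucky or unlucky split than one fixed train/test partition.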
RobocallGuard, a DNN-based virtual assistant, was introduced by
Pandit et al. [273] to stop current robocalls. In particular, Robocall-
Guard mimics a human call screener who picks up an incoming phone
call and makes the user aware of the call only after confirming that
the call is not a robocall or other type of spam. RobocallGuard was tested over a corpus of 8,081 real robocalls and managed to correctly label 97.8% of robocalls without negatively impacting legitimate calls. However, the dataset is not recent, and as such, it does not reflect the behavior of more sophisticated robocalls.

4.6.2. Credit card fraud
Generally, credit card fraud activities can happen both online and offline (see Fig. 9). Online fraud is performed via phone shopping, the web, or other cardholder-not-present transactions. The criminals solely require the card information, and it is not necessary to have the card in hand or simulate the cardholder’s signature. In order to detect the most relevant fraudulent behavior patterns in an online detection system, Cheng et al. [38] proposed a spatial–temporal attention-based graph network (STAGN), where the temporal and location-based transaction graph features were learned by a GNN. The attentional weights were jointly learned in an end-to-end manner with 3D convolution and detection networks. STAGN was evaluated on a real-world card transaction dataset from a commercial bank. Although a mean AUC of 0.88 to 0.90 was achieved in most of the conducted experiments, human interaction is required and there is no way to block the fraudulent transactions in real time.

To identify fraudulent online transactions, Cao et al. [105] proposed a two-level attention model to capture the deep representation of features of online transaction behaviors. This is achieved by integrating two data embeddings at the data sample level (tree-based model) and the feature level (bidirectional GRU). Finally, the embeddings learned by the two attention mechanisms were combined for the training of a fraud detection model. The proposed model was evaluated on four public datasets and a private dataset (card transaction records) provided by a financial company in China. The results showed that the proposed method achieved an accuracy, precision, recall, and F1 of over 85% in the conducted experiments, outperforming traditional techniques such as gradient boosting.

Findings. We note that the challenge in credit card fraud detection is that frauds have no consistent patterns. The typical approach in credit card fraud detection is to maintain a usage profile for each user and monitor the user profiles to detect any deviations. Since there are billions of credit card users, this technique of user profiling is not very scalable. Thus, the majority of the studies used deep anomaly detection methods. On the other hand, most of the fraud detection methods in telecommunications focused on mobile cellular networks due to their rapid deployment and evolution. In this context, using GNNs for developing security frameworks against telecommunication frauds is becoming a trend due to their ability to capture complex relationships between objects and make inferences based on data described by graphs.

Fig. 9. The comprehensive lifecycle of credit card fraud. It begins with attackers stealing card numbers and using them to purchase services and/or goods, leading to chargebacks for merchants, while law enforcement has limited or no routes of action.

4.7. Encrypted traffic analysis

Encryption protocols provide security guarantees for data confidentiality and integrity, reducing as a side effect the network administrators’ ability to monitor their infrastructure for malicious traffic and sensitive data exfiltration. Attackers have shifted to using encryption and cryptographic methods in their attacks, extending from ransomware to HTTPS for protecting communications with infected devices and avoiding detection. Accordingly, the main objective of security professionals is to find a balance between end-to-end security and the ability to gather valuable information from the traffic to detect possible threats and better allocate and protect resources.

Traffic classification refers to categorizing network traffic into suitable classes, which is essential for several applications including malware/intrusion detection. The first and more straightforward approach employs port numbers. Nevertheless, its accuracy has deteriorated because new applications use well-known port numbers for disguising their traffic or evading standard registration port numbers [275]. Deep packet inspection (DPI) is the next generation of traffic classification, which focuses on payloads. However, this technique can be applied only to unencrypted traffic and has a high computational cost. As a consequence, new methods emerged that depend on statistical or time-series features, enabling them to handle both encrypted and unencrypted traffic. They usually use traditional ML techniques such as RF and K-NN, and their performance depends mainly on human-engineered features, which limits their generalizability. DL techniques address this by eliminating the need to construct features manually. Table 10 presents the reviewed DL-based security methods in encrypted traffic analysis.

4.7.1. Website fingerprinting, and protocol/application identification
Identifying network traffic of visited websites through privacy-enhancing technologies like Tor is also known as Website Fingerprinting (WF). In [276], the authors studied the efficiency of DL-based classifiers in WF for the first time. They demonstrated that stacked denoising AEs are useful in detecting websites using only the Tor packets’ direction (incoming or outgoing) and inter-arrival times with an 86% success rate. Rimmer et al. [277] showed that DL is a helpful tool for automating the feature engineering process, and their SDAE model achieved a 95.3% success rate using only the Tor packets’ direction. In [278], a deeper CNN classifier was built to outperform earlier studies, improving the success rate to 98%.

Application protocol classification is closely related to application type classification. The former identifies protocols such as HTTP or SSH, while the latter recognizes individual applications (such as Skype and Google Talk). A mobile traffic classifier based on DL was introduced in [279], demonstrating that deep NNs can handle encrypted traffic and represent its complex patterns. The experimental


Table 9
Selected studies focusing on DL-based security methods for Fraud detection.
Telecommunication Fraud (Section 4.6.1)
Year Authors Deep Model Dataset Advantages and Limitations
2020 Ji et al. [36] – MRG-GNN – Real-world (call detail records in a city) – Limited number of scenarios for the model evaluation.
2020 Yu et al. [37] – LSTM – Real-world (incoming calls on Android phones) – Small dataset.
2021 Pandit et al. [273] – DNN – Real-world (robocalls records) – The dataset is not recent.
Credit Card Fraud (Section 4.6.2)
Year Authors Deep Model Dataset Advantages and Limitations
2020 Cheng et al. [38] – STAGN – Real-world (card transaction) – Human interaction is necessary.
2021 Cao et al. [105] – Attention-based bidirectional GRU – Custom; Real-world (card transaction records) – There is no detailed information about the used datasets.

results showed that DL-based solutions (e.g., MLPs, CNNs, and LSTMs) achieved superior accuracy over RF in classifying iOS, Android, and Facebook traffic. However, the experiment settings were not completely fair and equal because the input features used for RF and the DL methods were different.

A convolutional neural network (CNN) model is proposed in [280], whose input consists of each Internet flow represented as a picture. A LeNet-5 style architecture was used for classifying Internet traffic into five categories (VoIP, video, file transfer, chat, and browsing). For each flow, an image was constructed based on the packet sizes and arrival times. The ISCX VPN-nonVPN [146] and ISCX Tor-nonTor [147] datasets were used for evaluating the model. The authors demonstrated that their model can classify traffic to a category with an accuracy of over 96%, except for browsing. Furthermore, the proposed method is able to classify traffic that passes through a VPN with an accuracy of over 99.2% and also achieves good results for traffic that traverses Tor (over 89% except for file transfer). However, it requires the recording of the size and the timestamping of the packets of each flow, causing additional overhead to the classification time.

To reduce the number of parameters and shorten the running time of a DL architecture for performing real-time traffic classification, Cheng et al. [106] developed a DL model based on multi-head attention and 1D-CNN. The model automatically extracts high-order flow-level and packet-level features. The experimental results demonstrated that the number of parameters was reduced to 1.8% and 2.7% of the number used for 1D-CNN and CNN with LSTM, respectively. Similarly, the training time was 49.7% and 6.8% of the time needed by 1D-CNN and CNN with LSTM, respectively. Although the proposed model reached a precision and recall of around 100% on both datasets, outperforming the 1D-CNN and CNN with LSTM, the employed datasets only include HTTPS and VPN traffic, without involving other protocols.

In [110], an encrypted traffic classification method based on Graph Convolutional Network (GCN) and AE was proposed. The authors used a two-layer GCN architecture for flow feature extraction and encrypted traffic classification. Furthermore, an autoencoder was employed to learn the representation of the flow data itself and integrate it into the GCN-learned representation to form a complete feature representation. The experimental results demonstrated that their method achieved an accuracy of 85.82% and 94.33% on the ISCX VPN-nonVPN [146] and USTC-TFC2016 [281] datasets, respectively, outperforming traditional methods like RF.

In order to avoid the need for an extensive labeled dataset, Rezaei et al. [282] introduced a multi-task learning model architecture using a 1D-CNN. They employed a large unlabeled dataset and a small labeled dataset to train a model that predicts three tasks: application, bandwidth, and duration of flows. The experimental results revealed that the proposed model significantly outperforms both single-task and transfer learning approaches. To examine a larger number of classes, Wang et al. [283] proposed a GAN-based semi-supervised learning encrypted traffic classification method named ByteSGAN. The ISCX VPN-nonVPN traffic dataset [146] was used in the experiments. When the number of labeled samples is 1000, the classification accuracy of ByteSGAN is 99.18%. However, this performance was achieved using only 7 of the total 15 classes and gets worse when more classes are employed. Finally, the authors in [284] demonstrate a mobile application for encrypted traffic classification based on the TLS flow sequence.

Findings. As DL techniques can perform automatic feature extraction, they are strong candidates for encrypted traffic classification and analysis. Among them, we find systems based on either CNNs, which are effective at processing data coming in the form of multiple arrays and capturing the spatial patterns, or AEs, which can be pre-trained on unlabeled data and fine-tuned on a small amount of labeled data. Although some initial works have demonstrated the potential of DL-based methods over more robust encryption protocols, the employed datasets are not large and diverse enough to represent real-world settings. Therefore, the scalability of such methods remains an open issue and needs to be further investigated.

5. Lessons learned and future directions

To address Q5 (Which are the most important and promising directions for further study?), this section reviews the lessons learned and outlines future directions.

5.1. Lessons learned

This section provides useful insights and outlines recommendations to developers, researchers, and practitioners of the security domain who intend to use DL to solve security problems of interest. Table 11 presents an overview of the research in each cybersecurity application, focusing on the employed DL model. Fig. 10 also depicts the frequency with which the different research works have utilized the different models. About 28.57% of the papers have used RNNs and variants (e.g., sequence models) for constructing the proposed systems, while RBMs are the least used models (about 1%) overall. This highlights that defense methods based on deep learning are increasingly evolving, since researchers no longer focus on traditional DL models (e.g., RBM, DBN). The table also emphasizes that around one-quarter of the cybersecurity applications are related to time-series, text streams, or serial data. Moreover, another one-quarter of the works used CNNs due to their capability to extract high-level feature representations from spatial data such as images.

The main guidelines that we have identified via our analysis can be summarized as follows:

Obtain and use the right data. The data must be sufficient, representative, relevant to the current attack landscape, and correctly


Table 10
Selected studies focusing on DL-based security methods for Encrypted Traffic Analysis.
Year Authors Deep Model Dataset Advantages and Limitations
2016 Abe et al. [276] – SDAE – Wang’s dataset [285] – Pioneer study in implementing DL-based classifiers for WF.
2018 Rimmer et al. [277] – SDAE – Real-world (page visits) – Larger datasets are used in the experimental phase. The employed dataset is available upon request in [286].
2018 Sirinam et al. [278] – CNN – Real-world (page visits) – Accuracy rate of 98%.
2018 Aceto et al. [279] – MLPs, CNNs, LSTMs – Real-world (mobile user activity) – Features used for the RF and the DL methods are different.
2019 Shapira et al. [280] – CNN (LeNet-5) – ISCX VPN-nonVPN [146]; ISCX Tor-nonTor [147] – Additional processing and overhead to the classification time.
2020 Cheng et al. [106] – Multi-head Attention, 1D-CNN – ISCX VPN-nonVPN [146]; Open HTTPS [148] – The dataset only includes HTTPS and VPN traffic, but it does not involve other protocols.
2020 Sun et al. [110] – GCN, AE – ISCX VPN-nonVPN [146]; USTC-TFC2016 [281] – The encrypted traffic classification problem with the traffic of unknown application types should be analyzed.
2020 Rezaei et al. [282] – CNN (Multi-task learning) – QUIC [149]; ISCX VPN-nonVPN [146] – Only a few classes are considered in the experiments.
2021 Wang et al. [283] – DCGAN [287] – ISCX VPN-nonVPN [146] – With 15 classes, the accuracy drops to 92.15%.

Table 11
The use of different DL models in Cybersecurity applications.
Domain DL techniques
DNN AE CNN DBN RBM RNN GAN DRL SNN TR GNN
Network Intrusion [160] [162,174] [168] [161] [159] [169] [175]
Detection [174] [159,165] [172] [178] [183]
Malware detection [87] [198,199] [195,200],
and analysis [188,200], [196]
[193,196]
Botnet Detection [124,203] [209] [206,207] [204]
and DGAs [104,205] [215]
[208,212]
[204,209]
[215]
Cyber–Physical [226] [103,228] [227,228] [230] [139,227] [240] [108]
System Security [238,240] [229,234] [236] [103,229] [242]
[242,246] [237,244] [237,239]
[248] [246,248]
Spam filtering [259] [111] [258,261] [256,261] [254] [264] [107] [111]
[266] [253,255] [111,257]
[257]
Fraud detection [273] [37,105] [36]
[38]
Encrypted Traffic [279] [276,277] [278,279] [279] [283] [106] [110]
Analysis [110] [106,280]
[282]

labeled if needed. Moreover, it is essential not to overlook other issues


related to public benchmark datasets, such as repeated data, miss-
ing values, and incorrect labeling. Building models based on skewed
and biased data produces systems that are unsuitable for deployment.
Therefore, obtaining valid, representative, and accurate data should be
a priority and a primary objective of the research.
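
The dataset issues listed above (repeated data, missing values, and incorrect labeling) can be screened for before any training. The following is a minimal sketch under stated assumptions: records are plain dictionaries, the label is stored under a hypothetical `label` key, and the field names in the toy rows are invented for illustration.

```python
from collections import Counter, defaultdict


def audit_dataset(rows, label_key="label"):
    """Count repeated records, records with missing values, and
    feature vectors that appear with more than one label."""
    seen = Counter()
    labels_by_features = defaultdict(set)
    missing = 0
    for row in rows:
        # Canonical, hashable view of the features (everything except the label).
        features = tuple(sorted((k, v) for k, v in row.items() if k != label_key))
        seen[(features, row.get(label_key))] += 1
        labels_by_features[features].add(row.get(label_key))
        if any(v is None or v == "" for v in row.values()):
            missing += 1
    return {
        "duplicates": sum(count - 1 for count in seen.values()),
        "missing": missing,
        "conflicting": sum(1 for s in labels_by_features.values() if len(s) > 1),
    }


# Toy flow records (hypothetical field names and values).
rows = [
    {"duration": 1.2, "bytes": 300, "label": "benign"},
    {"duration": 1.2, "bytes": 300, "label": "benign"},  # exact repeat
    {"duration": 1.2, "bytes": 300, "label": "attack"},  # same features, other label
    {"duration": None, "bytes": 80, "label": "attack"},  # missing value
]
report = audit_dataset(rows)
```

Identical feature vectors carrying different labels do not prove that a label is wrong, but they are a cheap signal of where to look first.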
Combine detection methods in a multi-layered architecture.
Due to the diversity of today’s attack vectors, cybersecurity
solutions should be organized in a multi-layered manner. In other
words, deep learning-based detection should work in synergy with
alternative kinds of detection, forming a multi-layered approach for
achieving efficient cybersecurity protection.
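
One way to picture such a multi-layered arrangement is a cascade in which cheap, exact checks run before the learned detector. The sketch below is purely illustrative: the three layers, the function names `layered_verdict` and `toy_score`, and the threshold are assumptions, and `toy_score` is a stand-in for a trained DL model rather than a real detector.

```python
from typing import Callable, List, Set


def layered_verdict(sample_hash: str, payload: bytes, blocklist: Set[str],
                    signatures: List[bytes],
                    model_score: Callable[[bytes], float],
                    threshold: float = 0.9) -> str:
    """Return the name of the first layer that flags the sample, or 'clean'."""
    if sample_hash in blocklist:                   # Layer 1: known-bad identifiers
        return "blocklist"
    if any(sig in payload for sig in signatures):  # Layer 2: byte signatures
        return "signature"
    if model_score(payload) >= threshold:          # Layer 3: learned detector
        return "model"
    return "clean"


def toy_score(payload: bytes) -> float:
    """Placeholder for a DL model (assumption): fraction of non-printable bytes."""
    if not payload:
        return 0.0
    return sum(1 for b in payload if b < 32 or b > 126) / len(payload)


blocklist = {"abc"}
signatures = [b"EVIL"]
verdicts = [
    layered_verdict("abc", b"hello", blocklist, signatures, toy_score),
    layered_verdict("x", b"xxEVILxx", blocklist, signatures, toy_score),
    layered_verdict("y", bytes([0, 1, 2, 255]), blocklist, signatures, toy_score),
    layered_verdict("z", b"hello", blocklist, signatures, toy_score),
]
```

Returning the name of the layer that fired keeps the decision traceable, which helps when a false positive has to be attributed to one layer and corrected there.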
Fig. 10. Use of DL models in the surveyed papers.

Consider using online learning. The data used in the cybersecurity domain is increasing and evolving very quickly, so the data-driven attack detection models should be frequently retrained and updated. Online learning [288] adapts to constant change in the learning environment and as such can be used when deploying security models. Nevertheless, many challenges associated with such setups remain, like catastrophic forgetting [289]. This challenge refers to a learning model that forgets its prior knowledge when fine-tuned with new data and is a severe problem for neural networks [290,291]. Another aspect of real-time data handling, especially in traffic anomaly detection, is


performing accurate traffic flow sampling. In large networks, it is not feasible to analyze all traffic flows. Therefore, it is crucial to explore methods proposed for traffic sampling [292–294] and incorporate them into the training process.

Beware of deep learning against rare attacks. ML/DL handles tasks efficiently when malicious and benign samples are well represented in the training set. However, some attacks are so rare that we have only a few samples for training. This is typical for high-profile targeted attacks.

Decrease false-positive rates as much as possible. The objective should be to decrease the false positive rate as much as possible, ideally down to zero. To achieve this goal, it is necessary to impose stringent requirements for ML/DL models and metrics optimized during training, focusing on low FPR models. This still might be insufficient because new, previously unseen benign files may occasionally be falsely categorized as malicious. Thus, the goal should be to implement a flexible model design that allows fixing false positives on the fly without completely retraining the employed neural networks.

Do not overlook attack detection in resource-constrained platforms. The key challenge is supporting advanced ML/DL while still running on devices with a wide range of performance capabilities. For example, prediction times for the same ML/DL model with identical inputs and system architectures may vary by one to four orders of magnitude, depending on the underlying hardware. The core issue is that security-related ML/DL must be executed efficiently, or else it will exclude people using low-end devices.

Optimize the hyper-parameters. Often, the hyper-parameters of the deep models are not fine-tuned accurately and thoroughly. At first sight, it is impossible to know the optimal value for a model’s hyper-parameter on every examined problem. Thus, we may follow heuristic guidelines [230,295,296], copy values from previous implementations, or search by trial and error to create a competent NN architecture. Another commonly used structured and systematic way of determining a model’s optimal hyper-parameters is grid search [297]. Furthermore, AutoML techniques can also be used to automate this process [298].

5.2. Future research directions

Construction of high quality and up to date datasets: Since the performance of deep learning-based methods strongly depends on the quantity and quality of the available data [299], the biases and limitations of the datasets used for training the models affect the reliability of the predictions. The majority of works currently focus on researching algorithms that can yield improved detection results, but very few studies are dedicated to evaluating the reliability of benchmark datasets. For instance, studies such as [300,301] proposed a list of criteria for assessing the reliability of a dataset for intrusion detection. Among them, attack and traffic diversity play a significant role since a limited diversity or a high imbalance among the attack types might increase the bias of the detection approaches towards specific situations. The research community has not yet discovered a way to artificially generate adequately realistic cyber data [302]. On the other hand, data from real networks or the Internet usually contain sensitive information such as personal or company details and could potentially reveal security vulnerabilities of the network from which they originate if made publicly available. We can conclude that deep learning applications in the cybersecurity area can only advance if researchers and industry stakeholders release more realistic datasets, especially considering that attack patterns tend to evolve to better overcome the existing security systems. Finally, according to [303], some features in available benchmark datasets are less useful in detecting emerging attacks and better serve the detection of older attack patterns. Thus, it is essential to evaluate whether new features will be required for maintaining a high detection accuracy level.

Deep N-Shot Learning (Learning from Few Samples): Few-shot, low-shot, or n-shot learning allows a model to classify samples accurately based only on a few training examples (usually between zero and five) [93]. Although few-shot learning has received much research attention in computer vision, its application in cybersecurity remains mostly unexplored. Representative works include an IDS based on one-shot learning [96], a data augmentation method for the few-shot WF attack [94], and a behavioral biometrics-based user authentication scheme [95]. Nevertheless, n-shot learning is still an emerging area in cybersecurity, and extensive research must be conducted to understand how such methods can be effectively utilized.

Deep Lifelong Learning (Learning Continuously): Deep lifelong learning [304] aims to mimic human behavior and seeks to build a machine that can continuously adapt to new environments, while retaining as much knowledge as possible from previous learning experiences. This concept can be adopted in the dynamic IoT environment. Given IoT’s nature, normal structures and patterns, as well as threats and attacks, can considerably change over time [22,23]. Therefore, the distinction between normal and abnormal IoT system behavior cannot always be pre-defined. As a consequence, security models must be frequently updated in order to handle and understand IoT modifications. This novel learning paradigm is in its infancy regarding the cybersecurity domain. Nevertheless, it shows great promise and potential in detecting new threats in dynamic environments.

Deep Active Learning: Active learning (AL) [305] strives to maximize a model’s performance gain while annotating as few samples as possible. The main idea is to mitigate the cost of labeling without affecting performance by selecting only the most useful samples from the unlabeled dataset. This concept can also be applied in the cybersecurity domain, where the potent learning capabilities of DL can be retained while at the same time reducing the sample annotating cost [306,307]. Deep AL is an emerging research area in cybersecurity, and we expect more efforts in this direction in the future.

Interpretability of Deep Neural Models: The black-box nature of DNNs constitutes one of the primary obstacles for their wide acceptance in mission-critical applications. Recent studies suggest that model interpretability and robustness are closely connected [308]. On the one hand, improvements made in a model’s robustness also improve its interpretability. For example, a DNN that has been subjected to adversarial training shows better interpretability (with more accurate saliency maps) than the same model trained without adversarial examples. On the other hand, deeply understanding a model enables us to better determine its weaknesses and potential vulnerabilities, ultimately improving its accuracy and reliability. Apart from that, interpretability plays a vital role in the ethical use of DL [309]. In [310], the authors used SHAP [311] to provide the reasoning behind the IDSs’ predictions and interpret the detected intrusions to the cybersecurity personnel. However, excluding this work, research on the interpretability of DNNs in cybersecurity is currently scarce.

Deep Reinforcement Learning for Cybersecurity: Deep reinforcement learning, which is created by incorporating deep learning into traditional reinforcement learning, can solve dynamic, complex, and mainly high-dimensional cybersecurity problems, as discussed in Section 4. However, current deep reinforcement learning applications to cybersecurity are usually limited by discretizing the action space, which restricts the performance achieved on real-world problems. Studying methods that can deal with continuous action spaces in cyber environments (e.g., policy gradient [312] and actor–critic [313] algorithms) is another promising research direction.

Adversarial attacks in the Cybersecurity domain: Although AI can help in cyber-defense, it can also facilitate dangerous attacks (i.e., offensive AI). Attackers can employ AI to make attacks smarter and more complex, avoiding detection methods to infiltrate computer systems or networks. Machine learning-based systems can mimic humans to craft convincing fake messages utilized in large-scale phishing attacks [314,315]. Attackers can also inject or manipulate training data to either create a backdoor to use at inference time or to corrupt the training process [316]. Furthermore, hackers can manipulate the states


or policies and falsify part of RL’s reward signals to fool the agent into taking sub-optimal actions [317]. These types of attacks are hard to prevent, detect, and fight against as they are part of a battle between AI systems. In the cybersecurity domain, techniques for evasion attacks have been widely adopted. Nevertheless, there are not enough studies dealing with feature-targeted attacks (a.k.a. Trojan neural network attacks [318]), backdoor attacks [319,320], or attacks against deep reinforcement learning and deep unsupervised models [321]. In this context, further research is required to construct defense methods against these emergent types of attacks.

Deep Learning at the edge: The edge infrastructure tier can offer new opportunities for supporting cybersecurity strategies, mainly because it is closer to the data sources and can detect and respond to events more rapidly. The characteristics of edge computing devices indicate that they cannot support the same cybersecurity functionality found in enterprise data centers and clouds [166]. The limited scale of the edge compared to the elastic cloud is one significant difference. Another difference is the localized context in which the edge-deployed functionality operates. These differences have several effects on the interplay of deep learning and security. First, concerning cybersecurity methods based on deep learning, the reduced resource footprint available at the edge dictates the types of deep learning models that can be successfully employed. Second, the mismatch of capabilities between the edge and the data centers calls for the construction and deployment of different, more lightweight cybersecurity techniques at the former. Strategies and techniques for making cybersecurity more effective and more scalable are an active research topic.

Secure and Privacy-Preserving Deep Learning: Conventional AI data processing systems often involve simple models of data transactions. This traditional procedure faces challenges with new regulations for the protection of data security and privacy, such as GDPR [322]. How to legally resolve data isolation and fragmentation is a significant challenge for researchers and AI professionals today. Federated learning is a possible solution to these challenges, as privacy is one of its essential properties [179,181,182]. When data cannot be directly aggregated due to intellectual property rights, privacy protection, or data security concerns, federated learning can be employed as a promising solution.

Federated Averaging (FedAvg), proposed in [323], is the most commonly used method for FL. According to it, a client locally updates model weights and sends the local weights to a server for model aggregation, collaboratively training a global model together with other clients. In [324], a framework for multi-task learning that allows multiple clients to train different tasks was presented. Its primary advantage is mitigating communication costs and stragglers during the training stage. In [325], homomorphic encryption was applied to a horizontal FL framework for protecting the gradients. Gao et al. [326] introduced a heterogeneous Federated Transfer Learning (FTL) framework for feature space training among multiple clients. The conducted experiments demonstrated that it outperforms local training schemes and homogeneous FL schemes. Finally, an FTL framework called FedSteg was proposed by Yang et al. [327] to train a personalized and distributed model for secure image steganalysis. Interested readers can refer to other works that surveyed privacy-preserving federated learning in depth, such as [328].

A significant challenge when working with federated machine learning models in settings with several different actors is building a highly distributed, secure, and reliable platform that enables the scalable cooperation of participants who do not fully trust each other. Blockchain and smart contract technologies can create an immutable audit trail for federated models to achieve higher trustworthiness in tracking and proving provenance. As a representative example, a blockchain-based framework using a DL approach to detect intrusion attacks while preserving data privacy was introduced by Alkadi et al. [329]. Moreover, the authors in [330] presented a privacy-aware DL method, which allows the collaboration of multiple nodes for training DNNs.

Another way to guarantee user privacy in DL-based services is to train the network on encrypted data. Cryptographic techniques, like fully homomorphic encryption [331,332], enable the processing of encrypted data. Nevertheless, they are too slow for training DNN models due to the computational complexity and the arduous operations involved. Gilad-Bachrach et al. [331] presented CryptoNets, which performs the inference phase of an NN on encrypted data. Despite its originality and novelty, this work has much room for improvement, particularly in terms of achieved throughput and latency. Subsequently, Nandakumar et al. [332] extended the aforementioned approach and built the first fully homomorphic computationally efficient DL service for training on encrypted data. Other works in the literature address privacy vulnerabilities using dummy approaches. A dummy approach refers to constructing a group of dummy requests for each user service request and then submitting them with the real one in random order to the server-side [333,334]. The goal is to make it difficult for the untrusted server to obtain the users’ real requests, and thus protect the users’ privacy in recommendation services [335], digital libraries (book search [333] and browsing privacy [334,336]), and location services [337,338].

Uncertainty Handling in CPSs: Security and uncertainty are the most common causes of CPS failure. Given that our dependence on CPSs is expected to grow significantly, dealing with uncertainty at an acceptable cost is vital to ensure proper operation and avoid threats to both users and the environment. Uncertainty is intrinsic in CPSs due to the novel interactions of embedded systems, networking equipment, cloud infrastructures, and humans [339–341]. In other words, uncertainty arises with the increase in complexity of the CPS environment and required functionality [340,342]. CPSs can benefit by incorporating machine learning components such as DNNs to handle the physical world’s uncertainty and variability. The ability of DNNs to extract high-level features for representation learning grows with the size of the employed model and the quantity of training data. In particular, the capability of DRL to achieve significant performance on various decision-making tasks involving uncertainty has drawn a lot of research attention from both industry and academia in recent years [341]. For example, DRL is employed in autonomous vehicles to make decisions regarding intersection crossing, trust computation, changing lanes, and speed control [341]. Moreover, Shin et al. [339] presented a DL-enabled method for detecting adversarial attacks in sensors deployed in autonomous vehicles.

While the ability to adapt to a changing environment is often seen as a default property of ML/DL, studies demonstrate that the generalization ability of a model primarily relies on the configuration and variety of the available training data and is far from assured [343,344]. Long-term reliability and handling of uncertainties caused by degrading equipment or faulty sensors are critical challenges and significant hurdles when deploying ML/DL systems in CPS environments [343]. Ovadia et al. [345] investigated the effect of DL for out-of-distribution (OOD) examples on the accuracy and calibration of classification tasks in industrial systems. They evaluated uncertainty not only for in-distribution examples but also for OOD examples. In [343], the long-term reliability of ML applications in the manufacturing industry was analyzed, highlighting the domain-specific issues and potential sources of drift. According to [346], deterministic RNN models are not appropriate for representing the significant uncertainty exhibited in CPSs. Instead, stochastic regularization techniques [347] can be used to cast deterministic RNNs as Bayesian RNNs that adapt deterministic sequential predictions into a sequence of posterior distributions for estimating the uncertainty. Finally, Monte Carlo Dropout is another approach to obtain the uncertainty estimation of a model [346,348]. Overall, strategies and techniques for mitigating uncertainty and developing resilient DL-enabled security in CPSs are an active research topic.

Deep Learning in Mobile and Wireless Networks: DL is typically computationally demanding. The high accuracy of deep models comes at the expense of high computational and memory requirements for
both the training and inference phases [349]. However, current wireless devices have limited hardware capabilities, which means that implementing complex DL architectures on such equipment may be computationally infeasible unless appropriate model tuning is performed. A promising direction to address the limited-resources problem is adopting the edge computing paradigm. The key idea behind edge computing is to extend cloud computing to the network edge to have the computation in the proximity of data sources (i.e., mobile/edge devices), providing in this way benefits in terms of privacy, bandwidth efficiency, and scalability [349–351]. However, since edge computing involves deployment at the edge and the participating devices usually have limited computing resources and battery power, special strategies are required for DL implementation.

There are two main strategies for compressing deep networks that we can use to accommodate their high resource requirements on less powerful edge computing resources [349,352]. The first strategy is designing a system with a reduced number of parameters in the deep model, combined with a reduction in memory and execution latency while preserving high accuracy. Many deep learning models for resource-constrained devices adapted from computer vision can be useful in the cybersecurity domain, such as MobileNets [52], SqueezeNet [51], and YOLO [53], with the state of the art evolving very rapidly [349]. Several such models, with pre-trained weights, are accessible for download on open-source ML platforms (e.g., TensorFlow and Caffe) for fast bootstrapping [349,353]. The second strategy aims to compress the existing high-accuracy deep neural network models with minimal accuracy loss compared to the original model [182,349,351]. A comprehensive review of techniques to compress high-accuracy deep models can be found in [354]. Most popular among them are parameter pruning, parameter quantization, and knowledge distillation. Pruning implies removing the least essential parameters. In parameter quantization, an existing DNN has its parameters compressed by changing from floating-point numbers to low-bit-width numbers, avoiding costly floating-point multiplications. Knowledge distillation aims to create a smaller DNN that imitates the behavior of a more robust and larger DNN [355].

Another approach that is an excellent match and complement for edge computing is the adoption of distributed learning paradigms. When data is inherently distributed or too large to be stored on a single machine, training an ML/DL model in a centralized fashion is not a suitable option. Thus, solutions that enable parallel computing, as well as data distribution, should be applied. There are two fundamentally different ways of training a model in a distributed system: data parallelism and model parallelism [180,182]. In scenarios where it is useful or even mandatory to isolate different subsets of training data from each other, an emerging distributed learning paradigm called federated learning can be used. Going forward, DL architectures that are tailored to mobile/wireless networking and can overcome the limitations of traditional models deserve more research attention.

6. Conclusions

Deep Learning is playing an increasingly important role in the cybersecurity domain. In this paper, we provided a comprehensive survey of recent work regarding deep learning for cybersecurity applications. We summarized both fundamental concepts and advanced principles of various deep learning models, along with necessary resources like a generic framework and datasets. We reviewed the state-of-the-art DL-based cybersecurity systems across different application scenarios. Finally, we concluded by pinpointing several open research issues and promising directions, leading to valuable future research suggestions. We hope that this article will become a guide to developers/researchers and security practitioners interested in applying artificial intelligence (particularly deep learning) to complex problems in cyber environments. Lastly, we would like to caution against a potential pitfall. Despite its impressive capabilities, Deep Learning is not a panacea and should not be used indiscriminately in every use case. Instead, it should be employed in problems characterized by large-scale data and complex non-linear hypotheses with many features and high-order polynomial terms.

Declaration of competing interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Acknowledgment

The authors would like to thank the Universidad de las Fuerzas Armadas-ESPE of Sangolquí, Ecuador, for the resources granted to develop the research project entitled: "Design and Implementation of the IT infrastructure and service management system for the ESPE Academic CERT", coded as PIM-03-2020-ESPE-CERT.

References

[1] T.H.T. Symantec, Threat landscape trends - Q2 2020, 2020, [online] (Accessed 5 Sep 2021).
[2] S. Magazine, Hacker breaks into florida water treatment facility, changes chemical levels, 2021, [online] (Accessed 5 Sep 2021).
[3] European Union Agency for Cybersecurity, Enisa threat landscape - the year in review, 2020, [online] (Accessed 5 Sep 2021).
[4] C.R. Institute, Reinventing cybersecurity with artificial intelligence, 2019, [online] (Accessed 5 Sep 2021).
[5] D. Gumusbas, T. Yıldırım, A. Genovese, F. Scotti, A comprehensive survey of databases and deep learning methods for cybersecurity and intrusion detection systems, IEEE Syst. J. (2020) 1–15, http://dx.doi.org/10.1109/jsyst.2020.2992966.
[6] S. Zeadally, E. Adi, Z. Baig, I.A. Khan, Harnessing artificial intelligence capabilities to improve cybersecurity, IEEE Access 8 (2020) 23817–23837, http://dx.doi.org/10.1109/access.2020.2968045.
[7] M. Research, Artificial intelligence in cybersecurity market, 2021, [online] (Accessed 5 Sep 2021).
[8] F. Chollet, Deep Learning Mit Python Und Keras: Das Praxis-Handbuch Vom Entwickler Der Keras-Bibliothek, MITP-Verlags GmbH & Co. KG, 2018.
[9] J. Saxe, H. Sanders, Malware Data Science: Attack Detection and Attribution, No Starch Press, 2018.
[10] A. Singla, E. Bertino, How deep learning is making information security more intelligent, IEEE Secur. Privacy 17 (3) (2019) 56–65, http://dx.doi.org/10.1109/msec.2019.2902347.
[11] L. Bottou, Stochastic gradient descent tricks, in: Lecture Notes in Computer Science, Springer, Berlin Heidelberg, 2012, pp. 421–436, http://dx.doi.org/10.1007/978-3-642-35289-8_25.
[12] K. Weinberger, A. Dasgupta, J. Langford, A. Smola, J. Attenberg, Feature hashing for large scale multitask learning, in: Proceedings of the 26th Annual International Conference on Machine Learning - ICML ’09, ACM Press, CEAS, 2009, http://dx.doi.org/10.1145/1553374.1553516.
[13] S. Ruder, An overview of multi-task learning in deep neural networks, 2017, arXiv preprint arXiv:1706.05098.
[14] A.L. Buczak, E. Guven, A survey of data mining and machine learning methods for cyber security intrusion detection, IEEE Commun. Surv. Tutor. 18 (2) (2016) 1153–1176, http://dx.doi.org/10.1109/comst.2015.2494502.
[15] P.A.A. Resende, A.C. Drummond, A survey of random forest based methods for intrusion detection systems, ACM Comput. Surv. 51 (3) (2018) 1–36, http://dx.doi.org/10.1145/3178582.
[16] S.X. Wu, W. Banzhaf, The use of computational intelligence in intrusion detection systems: A review, Appl. Soft Comput. 10 (1) (2010) 1–35, http://dx.doi.org/10.1016/j.asoc.2009.06.019.
[17] Y. Xin, L. Kong, Z. Liu, Y. Chen, Y. Li, H. Zhu, M. Gao, H. Hou, C. Wang, Machine learning and deep learning methods for cybersecurity, IEEE Access 6 (2018) 35365–35381, http://dx.doi.org/10.1109/access.2018.2836950.
[18] K. Shaukat, S. Luo, V. Varadharajan, I.A. Hameed, M. Xu, A survey on machine learning techniques for cyber security in the last decade, IEEE Access 8 (2020) 222310–222354, http://dx.doi.org/10.1109/access.2020.3041951.
[19] C.S. Wickramasinghe, D.L. Marino, K. Amarasinghe, M. Manic, Generalization of deep learning for cyber-physical system security: A survey, in: IECON 2018 - 44th Annual Conference of the IEEE Industrial Electronics Society, IEEE, 2018, pp. 745–751, http://dx.doi.org/10.1109/iecon.2018.8591773.
[20] Y. Luo, Y. Xiao, L. Cheng, G. Peng, D.D. Yao, Deep learning-based anomaly detection in cyber-physical systems: Progress and opportunities, ACM Comput. Surv. 54 (5) (2021) http://dx.doi.org/10.1145/3453155.
[21] F. Hussain, R. Hussain, S.A. Hassan, E. Hossain, Machine learning in IoT security: Current solutions and future challenges, IEEE Commun. Surv. Tutor. 22 (3) (2020) 1686–1721, http://dx.doi.org/10.1109/comst.2020.2986444.
[22] M.A. Al-Garadi, A. Mohamed, A.K. Al-Ali, X. Du, I. Ali, M. Guizani, A survey of machine and deep learning methods for internet of things (IoT) security, IEEE Commun. Surv. Tutor. 22 (3) (2020) 1646–1685, http://dx.doi.org/10.1109/comst.2020.2988293.
[23] E. Rodriguez, B. Otero, N. Gutierrez, R. Canal, A survey of deep learning techniques for cybersecurity in mobile networks, IEEE Commun. Surv. Tutor. 23 (3) (2021) 1920–1955, http://dx.doi.org/10.1109/comst.2021.3086296.
[24] D. Berman, A. Buczak, J. Chavis, C. Corbett, A survey of deep learning methods for cyber security, Information 10 (4) (2019) 122, http://dx.doi.org/10.3390/info10040122.
[25] S. Mahdavifar, A.A. Ghorbani, Application of deep learning to cybersecurity: A survey, Neurocomputing 347 (2019) 149–176, http://dx.doi.org/10.1016/j.neucom.2019.02.056.
[26] G. Guo, N. Zhang, A survey on deep learning based face recognition, Comput. Vis. Image Underst. 189 (2019) 102805, http://dx.doi.org/10.1016/j.cviu.2019.102805.
[27] M. Wang, W. Deng, Deep face recognition: A survey, Neurocomputing 429 (2021) 215–244, http://dx.doi.org/10.1016/j.neucom.2020.10.081.
[28] L. Fei, G. Lu, W. Jia, S. Teng, D. Zhang, Feature extraction methods for palmprint recognition: A survey and evaluation, IEEE Trans. Syst. Man Cybern. 49 (2) (2019) 346–363, http://dx.doi.org/10.1109/tsmc.2018.2795609.
[29] K. Sundararajan, D.L. Woodard, Deep learning for biometrics, ACM Comput. Surv. 51 (3) (2019) 1–34, http://dx.doi.org/10.1145/3190618.
[30] P. Li, M. Salour, X. Su, A survey of internet worm detection and containment, IEEE Commun. Surv. Tutor. 10 (1) (2008) 20–35, http://dx.doi.org/10.1109/comst.2008.4483668.
[31] S. Bhunia, M.S. Hsiao, M. Banga, S. Narasimhan, Hardware Trojan attacks: Threat analysis and countermeasures, Proc. IEEE 102 (8) (2014) 1229–1247, http://dx.doi.org/10.1109/jproc.2014.2334493.
[32] R. Brewer, Ransomware attacks: Detection, prevention and cure, Netw. Secur. 2016 (9) (2016) 5–9, http://dx.doi.org/10.1016/s1353-4858(16)30086-1.
[33] M.B. Schmidt, K.P. Arnett, Spyware, Commun. ACM 48 (8) (2005) 67–70, http://dx.doi.org/10.1145/1076211.1076242.
[34] C. Tankard, Advanced persistent threats and how to monitor and deter them, Netw. Secur. 2011 (8) (2011) 16–19, http://dx.doi.org/10.1016/s1353-4858(11)70086-1.
[35] N. Jindal, B. Liu, Review spam detection, in: Proceedings of the 16th International Conference on World Wide Web - WWW ’07, ACM Press, 2007, pp. 1189–1190, http://dx.doi.org/10.1145/1242572.1242759.
[36] S. Ji, J. Li, Q. Yuan, J. Lu, Multi-range gated graph neural network for telecommunication fraud detection, in: 2020 International Joint Conference on Neural Networks (IJCNN), IEEE, 2020, pp. 1–6.
[37] C.-Y. Yu, C.K. Chang, W. Zhang, An edge computing based situation enabled crowdsourcing blacklisting system for efficient identification of scammer phone numbers, in: 2020 International Conference on Computational Science and Computational Intelligence (CSCI), IEEE, 2020, pp. 776–781, http://dx.doi.org/10.1109/csci51800.2020.00146.
[38] D. Cheng, X. Wang, Y. Zhang, L. Zhang, Graph neural network for fraud detection via spatial-temporal attention, IEEE Trans. Knowl. Data Eng. (2020) http://dx.doi.org/10.1109/tkde.2020.3025588.
[39] M. Tavallaee, E. Bagheri, W. Lu, A.A. Ghorbani, A detailed analysis of the KDD CUP 99 data set, in: 2009 IEEE Symposium on Computational Intelligence for Security and Defense Applications, IEEE, 2009, pp. 1–6, http://dx.doi.org/10.1109/cisda.2009.5356528.
[40] J. Mirkovic, P. Reiher, A taxonomy of DDoS attack and DDoS defense mechanisms, SIGCOMM Comput. Commun. Rev. 34 (2) (2004) 39–53, http://dx.doi.org/10.1145/997150.997156.
[41] J. Hong, The state of phishing attacks, Commun. ACM 55 (1) (2012) 74–81, http://dx.doi.org/10.1145/2063176.2063197.
[42] S. García, M. Grill, J. Stiborek, A. Zunino, An empirical comparison of botnet detection methods, Comput. Secur. 45 (2014) 100–123, http://dx.doi.org/10.1016/j.cose.2014.05.011.
[43] M. Eslahi, R. Salleh, N.B. Anuar, Bots and botnets: An overview of characteristics, detection and challenges, in: 2012 IEEE International Conference on Control System, Computing and Engineering, IEEE, 2012, pp. 349–354, http://dx.doi.org/10.1109/iccsce.2012.6487169.
[44] P. Mishra, V. Varadharajan, U. Tupakula, E.S. Pilli, A detailed investigation and analysis of using machine learning techniques for intrusion detection, IEEE Commun. Surv. Tutor. 21 (1) (2019) 686–728, http://dx.doi.org/10.1109/comst.2018.2847722.
[45] Kaspersky, What is adware? - definition, 2019, URL https://usa.kaspersky.com/resource-center/threats/adware.
[46] J. Joy, A. John, J. Joy, Rootkit detection mechanism: A survey, in: International Conference on Parallel Distributed Computing Technologies and Applications, Springer, 2011, pp. 366–374, http://dx.doi.org/10.1007/978-3-642-24037-9_36.
[47] I. Goodfellow, Y. Bengio, A. Courville, Deep Learning, MIT Press, 2016, URL https://books.google.com.ec/books?id=omivDQAAQBAJ.
[48] Y. LeCun, Y. Bengio, G. Hinton, Deep learning, Nature 521 (7553) (2015) 436–444, http://dx.doi.org/10.1038/nature14539.
[49] K. He, X. Zhang, S. Ren, J. Sun, Deep residual learning for image recognition, in: 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), IEEE, 2016, pp. 770–778, http://dx.doi.org/10.1109/cvpr.2016.90.
[50] G. Huang, Z. Liu, L. Van Der Maaten, K.Q. Weinberger, Densely connected convolutional networks, in: 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), IEEE, 2017, pp. 4700–4708, http://dx.doi.org/10.1109/cvpr.2017.243.
[51] F.N. Iandola, S. Han, M.W. Moskewicz, K. Ashraf, W.J. Dally, K. Keutzer, SqueezeNet: alexnet-level accuracy with 50x fewer parameters and < 0.5 MB model size, 2016, arXiv preprint arXiv:1602.07360.
[52] A.G. Howard, M. Zhu, B. Chen, D. Kalenichenko, W. Wang, T. Weyand, M. Andreetto, H. Adam, Mobilenets: Efficient convolutional neural networks for mobile vision applications, 2017, arXiv preprint arXiv:1704.04861.
[53] J. Redmon, A. Farhadi, YOLO9000: better, faster, stronger, in: 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), IEEE, 2017, pp. 7263–7271, http://dx.doi.org/10.1109/cvpr.2017.690.
[54] D.E. Rumelhart, G.E. Hinton, R.J. Williams, Learning representations by back-propagating errors, Nature 323 (6088) (1986) 533–536, http://dx.doi.org/10.1038/323533a0.
[55] P. Werbos, Backpropagation through time: What it does and how to do it, Proc. IEEE 78 (10) (1990) 1550–1560, http://dx.doi.org/10.1109/5.58337.
[56] A. Graves, Generating sequences with recurrent neural networks, 2013, arXiv preprint arXiv:1308.0850.
[57] R. Pascanu, C. Gulcehre, K. Cho, Y. Bengio, How to construct deep recurrent neural networks, 2013, arXiv preprint arXiv:1312.6026.
[58] J. Chung, C. Gulcehre, K. Cho, Y. Bengio, Empirical evaluation of gated recurrent neural networks on sequence modeling, 2014, arXiv preprint arXiv:1412.3555.
[59] S. Xingjian, Z. Chen, H. Wang, D.-Y. Yeung, W.-K. Wong, W.-c. Woo, Convolutional LSTM network: A machine learning approach for precipitation nowcasting, in: Advances in Neural Information Processing Systems, 2015, pp. 802–810, URL https://proceedings.neurips.cc/paper/2015/file/07563a3fe3bbe7e3ba84431ad9d055af-Paper.pdf.
[60] P. Vincent, H. Larochelle, I. Lajoie, Y. Bengio, P.-A. Manzagol, L. Bottou, Stacked denoising autoencoders: Learning useful representations in a deep network with a local denoising criterion, J. Mach. Learn. Res. 11 (12) (2010).
[61] S. Rifai, X. Muller, X. Glorot, G. Mesnil, Y. Bengio, P. Vincent, Learning invariant features through local space contraction, 2011, arXiv preprint arXiv:1104.4153.
[62] S. Rifai, P. Vincent, X. Muller, X. Glorot, Y. Bengio, Contractive auto-encoders: Explicit invariance during feature extraction, in: ICML, 2011.
[63] A. Makhzani, J. Shlens, N. Jaitly, I. Goodfellow, B. Frey, Adversarial autoencoders, 2015, arXiv preprint arXiv:1511.05644.
[64] G. Kakkavas, M. Kalntis, V. Karyotis, S. Papavassiliou, Future network traffic matrix synthesis and estimation based on deep generative models, in: 2021 International Conference on Computer Communications and Networks (ICCCN), IEEE, 2021, http://dx.doi.org/10.1109/icccn52240.2021.9522222.
[65] D.P. Kingma, M. Welling, Auto-encoding variational bayes, 2013, arXiv preprint arXiv:1312.6114.
[66] D.J. Rezende, S. Mohamed, D. Wierstra, Stochastic backpropagation and approximate inference in deep generative models, 2014, arXiv preprint arXiv:1401.4082.
[67] G.E. Hinton, S. Osindero, Y.-W. Teh, A fast learning algorithm for deep belief nets, Neural Comput. 18 (7) (2006) 1527–1554, http://dx.doi.org/10.1162/neco.2006.18.7.1527.
[68] P. Smolensky, Information processing in dynamical systems: Foundations of harmony theory, Tech. rep., Colorado Univ at Boulder Dept of Computer Science, 1986.
[69] L. Deng, A tutorial survey of architectures, algorithms, and applications for deep learning, APSIPA Trans. Signal Inf. Process. 3 (2014) http://dx.doi.org/10.1017/atsip.2013.9.
[70] G.E. Hinton, Learning multiple layers of representation, Trends Cogn. Sci. 11 (10) (2007) 428–434, http://dx.doi.org/10.1016/j.tics.2007.09.004.
[71] G.E. Hinton, To recognize shapes, first learn to generate images, Prog. Brain Res. 165 (2007) 535–547.
[72] I. Goodfellow, J. Pouget-Abadie, M. Mirza, B. Xu, D. Warde-Farley, S. Ozair, A. Courville, Y. Bengio, Generative adversarial nets, in: Advances in Neural Information Processing Systems, 2014, pp. 2672–2680.
[73] M. Arjovsky, S. Chintala, L. Bottou, Wasserstein GAN, 2017, arXiv preprint arXiv:1701.07875.
[74] A. Brock, J. Donahue, K. Simonyan, Large scale gan training for high fidelity natural image synthesis, 2018, arXiv preprint arXiv:1809.11096.
[75] G.-J. Qi, Loss-sensitive generative adversarial networks on Lipschitz densities, Int. J. Comput. Vis. 128 (5) (2019) 1118–1140, http://dx.doi.org/10.1007/s11263-019-01265-2.
[76] A. Ali-Gombe, E. Elyan, MFC-GAN: class-imbalanced dataset classification using [100] W. Yao, Y. Ding, X. Li, Deep learning for phishing detection, in: 2018
multiple fake class generative adversarial network, Neurocomputing 361 (2019) IEEE Intl Conf on Parallel & Distributed Processing with Applications,
212–221, http://dx.doi.org/10.1016/j.neucom.2019.06.043. Ubiquitous Computing & Communications, Big Data & Cloud Computing,
[77] A. Antoniou, A. Storkey, H. Edwards, Data augmentation generative adversarial Social Computing & Networking, Sustainable Computing & Communications
networks, 2017, arXiv preprint arXiv:1711.04340. (ISPA/IUCC/BDCloud/SocialCom/SustainCom), IEEE, 2018, pp. 645–650, http:
[78] V. Mnih, K. Kavukcuoglu, D. Silver, A.A. Rusu, J. Veness, M.G. Bellemare, A. //dx.doi.org/10.1109/bdcloud.2018.00099.
Graves, M. Riedmiller, A.K. Fidjeland, G. Ostrovski, S. Petersen, C. Beattie, A. [101] R. Agrawal, J.W. Stokes, K. Selvaraj, M. Marinescu, Attention in recurrent
Sadik, I. Antonoglou, H. King, D. Kumaran, D. Wierstra, S. Legg, D. Hassabis, neural networks for ransomware detection, in: ICASSP 2019 - 2019 IEEE
Human-level control through deep reinforcement learning, Nature 518 (7540) International Conference on Acoustics, Speech and Signal Processing (ICASSP),
(2015) 529–533, http://dx.doi.org/10.1038/nature14236. IEEE, 2019, http://dx.doi.org/10.1109/icassp.2019.8682899.
[79] T.P. Lillicrap, J.J. Hunt, A. Pritzel, N. Heess, T. Erez, Y. Tassa, D. Silver, D. Wierstra, Continuous control with deep reinforcement learning, 2015, arXiv preprint arXiv:1509.02971.
[80] V. Mnih, A.P. Badia, M. Mirza, A. Graves, T. Lillicrap, T. Harley, D. Silver, K. Kavukcuoglu, Asynchronous methods for deep reinforcement learning, in: International Conference on Machine Learning, 2016, pp. 1928–1937.
[81] D. Silver, J. Schrittwieser, K. Simonyan, I. Antonoglou, A. Huang, A. Guez, T. Hubert, L. Baker, M. Lai, A. Bolton, Y. Chen, T. Lillicrap, F. Hui, L. Sifre, G. van den Driessche, T. Graepel, D. Hassabis, Mastering the game of go without human knowledge, Nature 550 (7676) (2017) 354–359, http://dx.doi.org/10.1038/nature24270.
[82] M. Hessel, J. Modayil, H. Van Hasselt, T. Schaul, G. Ostrovski, W. Dabney, D. Horgan, B. Piot, M. Azar, D. Silver, Rainbow: Combining improvements in deep reinforcement learning, 2017, arXiv preprint arXiv:1710.02298.
[83] J. Schulman, F. Wolski, P. Dhariwal, A. Radford, O. Klimov, Proximal policy optimization algorithms, 2017, arXiv preprint arXiv:1707.06347.
[84] G. Lingam, R.R. Rout, D.V. Somayajulu, Adaptive deep Q-learning model for detecting social bots and influential users in online social networks, Appl. Intell. 49 (11) (2019) 3947–3964.
[85] N. Zhou, J. Du, X. Yao, W. Cui, Z. Xue, M. Liang, A content search method for security topics in microblog based on deep reinforcement learning, World Wide Web 23 (1) (2020) 75–101.
[86] J. Gantz, D. Reinsel, The digital universe in 2020: Big data, bigger digital shadows, and biggest growth in the far east, IDC IView: IDC Anal. Future 2007 (2012) (2012) 1–16.
[87] L. Vu, Q.U. Nguyen, D.N. Nguyen, D.T. Hoang, E. Dutkiewicz, Deep transfer learning for IoT attack detection, IEEE Access 8 (2020) 107335–107344, http://dx.doi.org/10.1109/access.2020.3000476.
[88] F. Zhuang, Z. Qi, K. Duan, D. Xi, Y. Zhu, H. Zhu, H. Xiong, Q. He, A comprehensive survey on transfer learning, Proc. IEEE 109 (1) (2021) 43–76, http://dx.doi.org/10.1109/jproc.2020.3004555.
[89] J. Zhao, S. Shetty, J.W. Pan, C. Kamhoua, K. Kwiat, Transfer learning for detecting unknown network attacks, EURASIP J. Info. Secur. 2019 (1) (2019) 1, http://dx.doi.org/10.1186/s13635-019-0084-4.
[90] M. Mohammadi, A. Al-Fuqaha, S. Sorour, M. Guizani, Deep learning for IoT big data and streaming analytics: A survey, IEEE Commun. Surv. Tutor. 20 (4) (2018) 2923–2960, http://dx.doi.org/10.1109/comst.2018.2844341.
[91] Y. Bengio, Deep learning of representations for unsupervised and transfer learning, in: Proceedings of ICML Workshop on Unsupervised and Transfer Learning, 2012, pp. 17–36.
[92] J. Deng, R. Xia, Z. Zhang, Y. Liu, B. Schuller, Introducing shared-hidden-layer autoencoders for transfer learning and their application in acoustic emotion recognition, in: 2014 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), IEEE, 2014, pp. 4818–4822, http://dx.doi.org/10.1109/icassp.2014.6854517.
[93] N. Bendre, H.T. Marín, P. Najafirad, Learning from few samples: A survey, 2020, arXiv:2007.15484.
[94] M. Chen, Y. Wang, Z. Qin, X. Zhu, Few-shot website fingerprinting attack, 2021, arXiv:2101.10063.
[95] Y. Gu, H. Yan, M. Dong, M. Wang, X. Zhang, Z. Liu, F. Ren, WiONE: One-shot learning for environment-robust device-free user authentication via commodity wi-fi in man–machine system, IEEE Trans. Comput. Soc. Syst. 8 (3) (2021) 630–642, http://dx.doi.org/10.1109/tcss.2021.3056654.
[96] H. Hindy, C. Tachtatzis, R. Atkinson, D. Brosset, M. Bures, I. Andonovic, C. Michie, X. Bellekens, Leveraging siamese networks for one-shot intrusion detection model, 2021, arXiv:2006.15343.
[97] P. Sirinam, N. Mathews, M.S. Rahman, M. Wright, Triplet fingerprinting: More practical and portable website fingerprinting with N-shot learning, in: Proceedings of the 2019 ACM SIGSAC Conference on Computer and Communications Security, ACM, 2019, http://dx.doi.org/10.1145/3319535.3354217.
[98] J. Bromley, J.W. Bentz, L. Bottou, I. Guyon, Y. Lecun, C. Moore, E. Säckinger, R. Shah, [Signature] verification using a “siamese” time delay neural network, Int. J. Pattern Recognit. Artif. Intell. 07 (04) (1993) 669–688, http://dx.doi.org/10.1142/s0218001493000339.
[99] E. Hoffer, N. Ailon, Deep metric learning using triplet network, in: Similarity-Based Pattern Recognition, Springer International Publishing, 2015, pp. 84–92, http://dx.doi.org/10.1007/978-3-319-24261-3_7.
[102] Y. Huang, Q. Yang, J. Qin, W. Wen, Phishing URL detection via CNN and attention-based hierarchical RNN, in: 2019 18th IEEE International Conference on Trust, Security and Privacy in Computing and Communications/13th IEEE International Conference on Big Data Science and Engineering (TrustCom/BigDataSE), IEEE, 2019, pp. 112–119, http://dx.doi.org/10.1109/trustcom/bigdatase.2019.00024.
[103] M. Macas, C. Wu, An unsupervised framework for anomaly detection in a water treatment system, in: 2019 18th IEEE International Conference on Machine Learning and Applications (ICMLA), IEEE, 2019, pp. 1298–1305, http://dx.doi.org/10.1109/icmla.2019.00212.
[104] L. Yang, G. Liu, Y. Dai, J. Wang, J. Zhai, Detecting stealthy domain generation algorithms using heterogeneous deep neural network framework, IEEE Access 8 (2020) 82876–82889.
[105] R. Cao, G. Liu, Y. Xie, C. Jiang, Two-level attention model of representation learning for fraud detection, IEEE Trans. Comput. Soc. Syst. (2021).
[106] J. Cheng, R. He, E. Yuepeng, Y. Wu, J. You, T. Li, Real-time encrypted traffic classification via lightweight neural networks, in: GLOBECOM 2020-2020 IEEE Global Communications Conference, IEEE, 2020, pp. 1–6.
[107] X. Liu, H. Lu, A. Nayak, A spam transformer model for SMS spam detection, IEEE Access 9 (2021) 80253–80263, http://dx.doi.org/10.1109/ACCESS.2021.3081479.
[108] Y. Li, L. Zhang, Z. Lv, W. Wang, Detecting anomalies in intelligent vehicle charging and station power supply systems with multi-head attention models, IEEE Trans. Intell. Transp. Syst. 22 (1) (2021) 555–564, http://dx.doi.org/10.1109/TITS.2020.3018259.
[109] S. Chaudhari, V. Mithal, G. Polatkan, R. Ramanath, An attentive survey of attention models, 2021, arXiv:1904.02874.
[110] B. Sun, W. Yang, M. Yan, D. Wu, Y. Zhu, Z. Bai, An encrypted traffic classification method combining graph convolutional network and autoencoder, in: 2020 IEEE 39th International Performance Computing and Communications Conference (IPCCC), IEEE, 2020, pp. 1–8.
[111] Z. Guo, Y. Shen, A.K. Bashir, M. Imran, N. Kumar, D. Zhang, K. Yu, Robust spammer detection using collaborative neural network in internet of thing applications, IEEE Internet Things J. (2020) 1, http://dx.doi.org/10.1109/jiot.2020.3003802.
[112] B. Bowman, H.H. Huang, Towards next-generation cybersecurity with graph AI, SIGOPS Oper. Syst. Rev. 55 (1) (2021) 61–67, http://dx.doi.org/10.1145/3469379.3469386.
[113] N. Sun, J. Zhang, P. Rimba, S. Gao, L.Y. Zhang, Y. Xiang, Data-driven cybersecurity incident prediction: A survey, IEEE Commun. Surv. Tutor. 21 (2) (2019) 1744–1772, http://dx.doi.org/10.1109/comst.2018.2885561.
[114] Y. Bengio, I. Goodfellow, A. Courville, Deep Learning, Vol. 1, Citeseer, 2017.
[115] Kdd, KDD cup task presentation, 1999, [online] (Accessed 9 Dec 2020).
[116] A. Shiravi, H. Shiravi, M. Tavallaee, A.A. Ghorbani, Toward developing a systematic approach to generate benchmark datasets for intrusion detection, Comput. Secur. 31 (3) (2012) 357–374, http://dx.doi.org/10.1016/j.cose.2011.12.012.
[117] C. Kolias, G. Kambourakis, A. Stavrou, S. Gritzalis, Intrusion detection in 802.11 networks: Empirical evaluation of threats and a public dataset, IEEE Commun. Surv. Tutor. 18 (1) (2016) 184–208, http://dx.doi.org/10.1109/comst.2015.2402161.
[118] I. Sharafaldin, A.H. Lashkari, A.A. Ghorbani, Intrusion detection evaluation dataset (CIC-IDS2017), 2021, [online] (Accessed 25 May 2021).
[119] I. Sharafaldin, A.H. Lashkari, A.A. Ghorbani, CSE-CIC-IDS2018 on AWS, 2021, [online] (Accessed 25 May 2021).
[120] I. Sharafaldin, A.H. Lashkari, S. Hakak, A.A. Ghorbani, Developing realistic distributed denial of service (DDoS) attack dataset and taxonomy, in: 2019 International Carnahan Conference on Security Technology (ICCST), IEEE, 2019, pp. 1–8.
[121] S. Laboratory, A labeled dataset with malicious and benign iot network traffic, 2021, [online] (Accessed 2 Sep 2021).
[122] A. Alsaedi, N. Moustafa, Z. Tari, A. Mahmood, A. Anwar, TON_IoT Telemetry Dataset: A New Generation Dataset of IoT and IIoT for Data-Driven Intrusion Detection Systems, IEEE Access 8 (2020) 165130–165150, http://dx.doi.org/10.1109/access.2020.3022862.
[123] R. Damasevicius, A. Venckauskas, S. Grigaliunas, J. Toldinas, N. Morkevicius, T. Aleliunas, P. Smuikys, LITNET-2020: An annotated real-world network flow dataset for network intrusion detection, Electronics 9 (5) (2020) 800.


[124] Y. Meidan, M. Bohadana, Y. Mathov, Y. Mirsky, A. Shabtai, D. Breitenbacher, Y. Elovici, N-BaIoT—Network-based detection of IoT botnet attacks using deep autoencoders, IEEE Pervasive Comput. 17 (3) (2018) 12–22, http://dx.doi.org/10.1109/mprv.2018.03367731.
[125] Y.M.P. Pa, S. Suzuki, K. Yoshioka, T. Matsumoto, T. Kasama, C. Rossow, IoTPOT: a novel honeypot for revealing current IoT threats, J. Inform. Process. 24 (3) (2016) 522–533, http://dx.doi.org/10.2197/ipsjjip.24.522.
[126] Virusshare, Virustotal, 2020, [online] (Accessed 2 Feb 2021).
[127] Y. Zhou, X. Jiang, Dissecting android malware: Characterization and evolution, in: 2012 IEEE Symposium on Security and Privacy, IEEE, 2012, pp. 95–109, http://dx.doi.org/10.1109/sp.2012.16.
[128] C. mobile, Contagio mobile, mobile malware mini dump, 2019, [online] (Accessed 20 Dec 2020).
[129] K. Allix, T.F. Bissyandé, J. Klein, Y. Le Traon, AndroZoo, in: Proceedings of the 13th International Conference on Mining Software Repositories, ACM, IEEE, 2016, pp. 468–471, http://dx.doi.org/10.1145/2901739.2903508.
[130] A. Internet, Alexa top sites, 2021, [online] (Accessed 9 Nov 2021).
[131] B. Consulting, Free OSINT tools, 2021, [online] (Accessed 2 Sep 2021).
[132] P. Daniel, Dgarchive, 2021, [online] (Accessed 2 Sep 2021).
[133] Netw. Secur. Research Lab at 360, Netlab DGA project, 2021, [online] (Accessed 25 May 2021).
[134] R. Vinayakumar, K. Soman, P. Poornachandran, M. Alazab, S. Thampi, Amritadga: a comprehensive data set for domain generation algorithms (dgas) based domain name detection systems and application of deep learning, in: Big Data Recommender Systems-Volume 2: Application Paradigms, Institution of Engineering and Technology (IET), 2019, pp. 455–485.
[135] M. Zago, M.G. Pérez, G.M. Pérez, UMUDGA: A dataset for profiling DGA-based botnet, Comput. Secur. 92 (2020) 101719, http://dx.doi.org/10.1016/j.cose.2020.101719.
[136] Cisco, Cisco annual internet report, 2020, [online] (Accessed 20 Dec 2020).
[137] M. Aghashahi, R. Sundararajan, M. Pourahmadi, M.K. Banks, Water distribution systems analysis symposium–battle of the attack detection algorithms (BATADAL), in: World Environmental and Water Resources Congress 2017, American Society of Civil Engineers, 2017, pp. 101–108, http://dx.doi.org/10.1061/9780784480595.010.
[138] R. Taormina, S. Galelli, H. Douglas, N. Tippenhauer, E. Salomons, A. Ostfeld, A toolbox for assessing the impacts of cyber-physical attacks on water distribution systems, Environ. Modell. Softw. 112 (2019) 46–51, http://dx.doi.org/10.1016/j.envsoft.2018.11.008.
[139] J. Goh, S. Adepu, K.N. Junejo, A. Mathur, A dataset to support research in the design of secure water treatment systems, in: International Conference on Critical Information Infrastructures Security, Springer, 2016, pp. 88–99.
[140] C.M. Ahmed, V.R. Palleti, A.P. Mathur, Wadi, in: Proceedings of the 3rd International Workshop on Cyber-Physical Systems for Smart Water Networks, ACM, 2017, pp. 25–28, http://dx.doi.org/10.1145/3055366.3055375.
[141] H.-K. Shin, W. Lee, J.-H. Yun, H. Kim, HAI 1.0: HIL-based augmented ICS security dataset, in: 13th USENIX Workshop on Cyber Security Experimentation and Test (CSET 20), USENIX Association, 2020.
[142] C. Castillo, D. Donato, L. Becchetti, P. Boldi, S. Leonardi, M. Santini, S. Vigna, A reference collection for web spam, in: ACM Sigir Forum, Vol. 40, (2) ACM New York, NY, USA, 2006, pp. 11–24.
[143] K. Lee, B. Eoff, J. Caverlee, Seven months with the devils: A long-term study of content polluters on twitter, in: Proceedings of the International AAAI Conference on Web and Social Media, Vol. 5, 2011.
[144] kaggle, Utkml’s Twitter spam detection competition, 2019, [online] (Accessed 25 May 2021).
[145] T.A. Almeida, J.M.G. Hidalgo, A. Yamakami, Contributions to the study of SMS spam filtering: new collection and results, in: Proceedings of the 11th ACM Symposium on Document Engineering, 2011, pp. 259–262.
[146] G. Draper-Gil, A.H. Lashkari, M.S.I. Mamun, A.A. Ghorbani, Characterization of encrypted and vpn traffic using time-related, in: Proceedings of the 2nd International Conference on Information Systems Security and Privacy (ICISSP), 2016, pp. 407–414.
[147] A.H. Lashkari, G. Draper-Gil, M.S.I. Mamun, A.A. Ghorbani, Characterization of tor traffic using time based features, in: ICISSP, 2017, pp. 253–262.
[148] S. Wazen, C. Thibault, F. Jerome, C. Isabelle, HTTPS websites dataset, 2016, [online] (Accessed 25 May 2021).
[149] S. Rezaei, X. Liu, How to achieve high classification accuracy with just a few labels: A semi-supervised approach using sampled packets, 2018, arXiv preprint arXiv:1812.09761.
[150] The University of Southern California, The DETER project, 2012, [online] (Accessed 20 Jun 2021).
[151] S. Université, FIT future internet testing facility, 2019, [online] (Accessed 20 Jun 2021).
[152] NITlab, Nitos facility, 2019, [online] (Accessed 20 Jun 2021).
[153] Orbit, Open-access research testbed for next-generation wireless networks (ORBIT), 2016, [online] (Accessed 20 Jun 2021).
[154] F. Consortium, Open-access research testbed for next-generation wireless networks (ORBIT), 2017, [online] (Accessed 20 Jun 2021).
[155] T.K. Lengyel, S. Maresca, B.D. Payne, G.D. Webster, S. Vogl, A. Kiayias, Scalability, fidelity and stealth in the DRAKVUF dynamic malware analysis system, in: Proceedings of the 30th Annual Computer Security Applications Conference on - ACSAC ’14, ACM Press, 2014, http://dx.doi.org/10.1145/2664243.2664252.
[156] The University of Utah, Emulab, 2019, [online] (Accessed 20 Jun 2021).
[157] D. Raychaudhuri, I. Seskar, G. Zussman, T. Korakis, D. Kilper, T. Chen, J. Kolodziejski, M. Sherman, Z. Kostic, X. Gu, H. Krishnaswamy, S. Maheshwari, P. Skrimponis, C. Gutterman, Challenge, in: Proceedings of the 26th Annual International Conference on Mobile Computing and Networking, ACM, 2020, pp. 1–13, http://dx.doi.org/10.1145/3372224.3380891.
[158] J. Cappos, M. Hemmings, R. McGeer, A. Rafetseder, G. Ricart, Edgenet: a global cloud that spreads by local action, in: 2018 IEEE/ACM Symposium on Edge Computing (SEC), IEEE, 2018, pp. 359–360, http://dx.doi.org/10.1109/sec.2018.00045.
[159] M.S. Elsayed, N.-A. Le-Khac, S. Dev, A.D. Jurcut, Ddosnet: A deep-learning model for detecting network attacks, in: 2020 IEEE 21st International Symposium on “a World of Wireless, Mobile and Multimedia Networks” (WoWMoM), IEEE, 2020, pp. 391–396.
[160] T.A. Tang, L. Mhamdi, D. McLernon, S.A.R. Zaidi, M. Ghogho, Deep learning approach for network intrusion detection in software defined networking, in: 2016 International Conference on Wireless Networks and Mobile Communications (WINCOM), IEEE, 2016, pp. 258–263, http://dx.doi.org/10.1109/wincom.2016.7777224.
[161] S. Otoum, B. Kantarci, H.T. Mouftah, On the feasibility of deep learning in sensor network intrusion detection, IEEE Netw. Lett. 1 (2) (2019) 68–71, http://dx.doi.org/10.1109/lnet.2019.2901792.
[162] L. Yang, J. Li, L. Yin, Z. Sun, Y. Zhao, Z. Li, Real-time intrusion detection in wireless network: A deep learning-based intelligent mechanism, IEEE Access 8 (2020) 170128–170139.
[163] S. Otoum, B. Kantarci, H. Mouftah, Adaptively supervised and intrusion-aware data aggregation for wireless sensor clusters in critical infrastructures, in: 2018 IEEE International Conference on Communications (ICC), IEEE, 2018, pp. 1–6, http://dx.doi.org/10.1109/icc.2018.8422401.
[164] M. Antonakakis, T. April, M. Bailey, M. Bernhard, E. Bursztein, J. Cochran, Z. Durumeric, J.A. Halderman, L. Invernizzi, M. Kallitsis, D. Kumar, C. Lever, Z. Ma, J. Mason, D. Menscher, C. Seaman, N. Sullivan, K. Thomas, Y. Zhou, Understanding the mirai botnet, in: 26th USENIX Security Symposium (USENIX Security 17), USENIX Association, Vancouver, BC, 2017, pp. 1093–1110.
[165] A. Abeshu, N. Chilamkurti, Deep learning: The frontier for distributed attack detection in fog-to-things computing, IEEE Commun. Mag. 56 (2) (2018) 169–175, http://dx.doi.org/10.1109/mcom.2018.1700332.
[166] K. Bresniker, A. Gavrilovska, J. Holt, D. Milojicic, T. Tran, Grand challenge: Applying artificial intelligence and machine learning to cybersecurity, Computer 52 (12) (2019) 45–52, http://dx.doi.org/10.1109/mc.2019.2942584.
[167] Y. Xiao, Y. Jia, C. Liu, X. Cheng, J. Yu, W. Lv, Edge computing security: State of the art and challenges, Proc. IEEE 107 (8) (2019) 1608–1631.
[168] H. Yao, P. Gao, P. Zhang, J. Wang, C. Jiang, L. Lu, Hybrid intrusion detection system for edge-based iIoT relying on machine-learning-aided detection, IEEE Netw. 33 (5) (2019) 75–81, http://dx.doi.org/10.1109/mnet.001.1800479.
[169] A. Ferdowsi, W. Saad, Generative adversarial networks for distributed intrusion detection in the internet of things, in: 2019 IEEE Global Communications Conference (GLOBECOM), IEEE, 2019, pp. 1–6, http://dx.doi.org/10.1109/globecom38437.2019.9014102.
[170] C. Hardy, E. Le Merrer, B. Sericola, MD-GAN: multi-discriminator generative adversarial networks for distributed datasets, in: 2019 IEEE International Parallel and Distributed Processing Symposium (IPDPS), IEEE, 2019, pp. 866–877, http://dx.doi.org/10.1109/ipdps.2019.00095.
[171] D. Anguita, A. Ghio, L. Oneto, X. Parra, J.L. Reyes-Ortiz, A public domain dataset for human activity recognition using smartphones, in: Esann, 2013.
[172] M. Abdel-Basset, H. Hawash, R.K. Chakrabortty, M.J. Ryan, Semi-supervised spatio-temporal deep learning for intrusions detection in IoT networks, IEEE Internet Things J. (2021).
[173] N. Ravi, S.M. Shalinie, Semisupervised-learning-based security to detect and mitigate intrusions in IoT network, IEEE Internet Things J. 7 (11) (2020) 11041–11052.
[174] S. Rezvy, Y. Luo, M. Petridis, A. Lasebae, T. Zebin, An efficient deep learning model for intrusion classification and prediction in 5G and IoT networks, in: 2019 53rd Annual Conference on Information Sciences and Systems (CISS), IEEE, 2019, pp. 1–6, http://dx.doi.org/10.1109/ciss.2019.8693059.
[175] L. Nie, W. Sun, S. Wang, Z. Ning, J.J. Rodrigues, Y. Wu, S. Li, Intrusion detection in green internet of things: A deep deterministic policy gradient-based algorithm, IEEE Trans. Green Commun. Netw. 5 (2) (2021) 778–788.
[176] L. Nie, Z. Ning, M.S. Obaidat, B. Sadoun, H. Wang, S. Li, L. Guo, G. Wang, A reinforcement learning-based network traffic prediction mechanism in intelligent internet of things, IEEE Trans. Ind. Inf. 17 (3) (2020) 2169–2180.
[177] G. Kakkavas, A. Stamou, V. Karyotis, S. Papavassiliou, Network tomography for efficient monitoring in SDN-enabled 5G networks and beyond: Challenges and opportunities, IEEE Commun. Magaz. 59 (3) (2021) 70–76, http://dx.doi.org/10.1109/mcom.001.2000458.


[178] L. Nie, Y. Wu, X. Wang, L. Guo, G. Wang, X. Gao, S. Li, Intrusion detection for secure social internet of things based on collaborative edge computing: A generative adversarial network-based approach, IEEE Trans. Comput. Soc. Syst. (2021).
[179] P. Vepakomma, T. Swedish, R. Raskar, O. Gupta, A. Dubey, No peek: A survey of private distributed deep learning, 2018, arXiv preprint arXiv:1812.03288.
[180] J. Verbraeken, M. Wolting, J. Katzy, J. Kloppenburg, T. Verbelen, J.S. Rellermeyer, A survey on distributed machine learning, ACM Comput. Surv. 53 (2) (2020) 1–33, http://dx.doi.org/10.1145/3377454.
[181] H.B. McMahan, E. Moore, D. Ramage, B.A. y Arcas, Federated learning of deep networks using model averaging, 2016, CoRR arXiv:1602.05629.
[182] X. Wang, Y. Han, V.C. Leung, D. Niyato, X. Yan, X. Chen, Convergence of edge computing and deep learning: A comprehensive survey, IEEE Commun. Surv. Tutor. 22 (2) (2020) 869–904, http://dx.doi.org/10.1109/comst.2020.2970550.
[183] X. Wang, S. Garg, H. Lin, J. Hu, G. Kaddoum, M.J. Piran, M.S. Hossain, Towards accurate anomaly detection in industrial internet-of-things using hierarchical federated learning, IEEE Internet Things J. (2021) 1, http://dx.doi.org/10.1109/jiot.2021.3074382.
[184] E. Bagdasaryan, A. Veit, Y. Hua, D. Estrin, V. Shmatikov, How to backdoor federated learning, in: International Conference on Artificial Intelligence and Statistics, PMLR, 2020, pp. 2938–2948.
[185] P. Faruki, A. Bharmal, V. Laxmi, V. Ganmoor, M.S. Gaur, M. Conti, M. Rajarajan, Android security: A survey of issues, malware penetration, and defenses, IEEE Commun. Surv. Tutor. 17 (2) (2015) 998–1022, http://dx.doi.org/10.1109/comst.2014.2386139.
[186] I. Gartner, Gartner says worldwide smartphone sales will grow 3% in 2020, 2020, [online] (Accessed 20 Dec 2020).
[187] MalwarebytesLABS, 2019 State of malware, 2019, [online] (Accessed 20 Nov 2020).
[188] E.B. Karbab, M. Debbabi, A. Derhab, D. Mouheb, MalDozer: automatic framework for android malware detection using deep learning, Digit. Investig. 24 (2018) S48–S59, http://dx.doi.org/10.1016/j.diin.2018.01.007.
[189] T. Mikolov, I. Sutskever, K. Chen, G.S. Corrado, J. Dean, Distributed representations of words and phrases and their compositionality, in: Advances in Neural Information Processing Systems, 2013, pp. 3111–3119.
[190] North Carolina State University, Android malware genome project, 2012, [online] (Accessed 20 Feb 2021).
[191] D. Arp, M. Spreitzenbarth, M. Hübner, H. Gascon, K. Rieck, Drebin: Effective and explainable detection of android malware in your pocket, in: Proceedings 2014 Network and Distributed System Security Symposium, Vol. 14, Internet Society, 2014, pp. 23–26, http://dx.doi.org/10.14722/ndss.2014.23247.
[192] Google LLC, Google play store, 2021, [online] (Accessed 25 May 2021).
[193] R. Feng, S. Chen, X. Xie, L. Ma, G. Meng, Y. Liu, S.-W. Lin, Mobidroid: a performance-sensitive malware detection system on mobile platform, in: 2019 24th International Conference on Engineering of Complex Computer Systems (ICECCS), IEEE, 2019, pp. 61–70, http://dx.doi.org/10.1109/iceccs.2019.00014.
[194] Contagiodump, Contagio malware dump, 2020, [online] (Accessed 2 Feb 2021).
[195] R. Feng, S. Chen, X. Xie, G. Meng, S.-W. Lin, Y. Liu, A performance-sensitive malware detection system using deep learning on mobile devices, IEEE Trans. Inform. Forensic Secur. 16 (2021) 1563–1578, http://dx.doi.org/10.1109/tifs.2020.3025436.
[196] I.U. Haq, T.A. Khan, A. Akhunzada, A dynamic robust DL-based model for android malware detection, IEEE Access 9 (2021) 74510–74521.
[197] F. Wei, Y. Li, S. Roy, X. Ou, W. Zhou, Deep ground truth analysis of current android malware, in: International Conference on Detection of Intrusions and Malware, and Vulnerability Assessment, Springer, 2017, pp. 252–276.
[198] A. Azmoodeh, A. Dehghantanha, K.-K.R. Choo, Robust malware detection for internet of (battlefield) things devices using deep eigenspace learning, IEEE Trans. Sustain. Comput. 4 (1) (2019) 88–95, http://dx.doi.org/10.1109/tsusc.2018.2809665.
[199] J. Jeon, J.H. Park, Y.-S. Jeong, Dynamic analysis for IoT malware detection with convolution neural network model, IEEE Access 8 (2020) 96899–96911, http://dx.doi.org/10.1109/access.2020.2995887.
[200] M. Dib, S. Torabi, E. Bou-Harb, C. Assi, A multi-dimensional deep learning framework for IoT malware classification and family attribution, IEEE Trans. Netw. Serv. Manag. (2021).
[201] The Pi Hut, Raspberry pi store, 2021, [online] (Accessed 25 May 2021).
[202] DataBridge Market Research, Global botnet detection market – industry trends and forecast to 2027, 2021, [online] (Accessed 20 Jun 2021).
[203] J. Kim, A. Sim, J. Kim, K. Wu, Botnet detection using recurrent variational autoencoder, in: GLOBECOM 2020-2020 IEEE Global Communications Conference, IEEE, 2020, pp. 1–6.
[204] R. Vinayakumar, M. Alazab, S. Srinivasan, Q.-V. Pham, S.K. Padannayil, K. Simran, A visualized botnet detection system based deep learning for the internet of things networks of smart cities, IEEE Trans. Ind. Appl. 56 (4) (2020) 4436–4456.
[205] R.R. Curtin, A.B. Gardner, S. Grzonkowski, A. Kleymenov, A. Mosquera, Detecting DGA domains with recurrent neural networks and side information, in: Proceedings of the 14th International Conference on Availability, Reliability and Security, ACM, 2019, p. 20, http://dx.doi.org/10.1145/3339252.3339258.
[206] D. Tran, H. Mac, V. Tong, H.A. Tran, L.G. Nguyen, A LSTM based framework for handling multiclass imbalance in DGA botnet detection, Neurocomputing 275 (2018) 2401–2413, http://dx.doi.org/10.1016/j.neucom.2017.11.018.
[207] J. Woodbridge, H.S. Anderson, A. Ahuja, D. Grant, Predicting domain generation algorithms with long short-term memory networks, 2016, arXiv preprint arXiv:1611.00791.
[208] P. Lison, V. Mavroeidis, Neural reputation models learned from passive DNS data, in: 2017 IEEE International Conference on Big Data (Big Data), IEEE, 2017, pp. 3662–3671, http://dx.doi.org/10.1109/bigdata.2017.8258361.
[209] J. Spaulding, A. Mohaisen, Defending internet of things against malicious domain names using D-FENS, in: 2018 IEEE/ACM Symposium on Edge Computing (SEC), IEEE, 2018, pp. 387–392, http://dx.doi.org/10.1109/sec.2018.00051.
[210] Cisco, Umbrella popularity list, 2021, [online] (Accessed 25 May 2021).
[211] T.-Y. Lin, P. Goyal, R. Girshick, K. He, P. Dollár, Focal loss for dense object detection, in: Proceedings of the IEEE International Conference on Computer Vision, 2017, pp. 2980–2988.
[212] Y. Xu, X. Yan, Y. Wu, Y. Hu, W. Liang, J. Zhang, Hierarchical bidirectional RNN for safety-enhanced b5g heterogeneous networks, IEEE Trans. Netw. Sci. Eng. (2021).
[213] Mnemonic, Passive DNS, 2021, [online] (Accessed 2 Sep 2021).
[214] J. Bromley, J.W. Bentz, L. Bottou, I. Guyon, Y. LeCun, C. Moore, E. Säckinger, R. Shah, Signature verification using a “siamese” time delay neural network, Int. J. Pattern Recognit. Artif. Intell. 7 (04) (1993) 669–688.
[215] V. Ravi, M. Alazab, S. Srinivasan, A. Arunachalam, K. Soman, Adversarial defense: DGA-based botnets and DNS homographs detection through integrated deep learning, IEEE Trans. Eng. Manage. (2021) http://dx.doi.org/10.1109/tem.2021.3059664.
[216] J. Woodbridge, H.S. Anderson, A. Ahuja, D. Grant, Detecting homoglyph attacks with a siamese neural network, in: 2018 IEEE Security and Privacy Workshops (SPW), IEEE, 2018, pp. 22–28.
[217] P. Agten, W. Joosen, F. Piessens, N. Nikiforakis, Seven months’ worth of mistakes: A longitudinal study of typosquatting abuse, in: Proceedings 2015 Network and Distributed System Security Symposium, Internet Society, 2015, http://dx.doi.org/10.14722/ndss.2015.23058.
[218] Orbis Research Rendering Conscientious Research, Global cyber physical systems market 2020 by company, regions, type and application, forecast to 2025, 2021, [online] (Accessed 5 Sep 2021).
[219] I. Stellios, P. Kotzanikolaou, M. Psarakis, C. Alcaraz, J. Lopez, A survey of IoT-enabled cyberattacks: Assessing attack paths to critical infrastructures and services, IEEE Commun. Surv. Tutor. 20 (4) (2018) 3453–3495, http://dx.doi.org/10.1109/comst.2018.2855563.
[220] S.D. Anton, D. Fraunholz, C. Lipps, F. Pohl, M. Zimmermann, H.D. Schotten, Two decades of SCADA exploitation: A brief history, in: 2017 IEEE Conference on Application, Information and Network Security (AINS), IEEE, 2017, pp. 98–104, http://dx.doi.org/10.1109/ains.2017.8270432.
[221] R. Khatoun, S. Zeadally, Cybersecurity and privacy solutions in smart cities, IEEE Commun. Mag. 55 (3) (2017) 51–59, http://dx.doi.org/10.1109/mcom.2017.1600297cm.
[222] Defense Use Case, Analysis of the cyber attack on the Ukrainian power grid, Electr. Inf. Shar. Anal. Center (E-ISAC) 388 (2016).
[223] H. Boyes, Cybersecurity and cyber-resilient supply chains, Technol. Innov. Manag. Rev. 5 (4) (2015) 28–34, http://dx.doi.org/10.22215/timreview888.
[224] M. Macas, W. Chunming, Enhanced cyber-physical security through deep learning techniques, in: Proc. CPS Summer School Ph. D. Workshop, 2019, pp. 72–83.
[225] J. Goh, S. Adepu, M. Tan, Z.S. Lee, Anomaly detection in cyber physical systems using recurrent neural networks, in: 2017 IEEE 18th International Symposium on High Assurance Systems Engineering (HASE), IEEE, 2017, pp. 140–145, http://dx.doi.org/10.1109/hase.2017.36.
[226] J. Inoue, Y. Yamagata, Y. Chen, C.M. Poskitt, J. Sun, Anomaly detection for a water treatment system using unsupervised machine learning, in: 2017 IEEE International Conference on Data Mining Workshops (ICDMW), IEEE, 2017, pp. 1058–1065, http://dx.doi.org/10.1109/icdmw.2017.149.
[227] M. Kravchik, A. Shabtai, Detecting cyber attacks in industrial control systems using convolutional neural networks, in: Proceedings of the 2018 Workshop on Cyber-Physical Systems Security and PrivaCy, ACM, 2018, pp. 72–83, http://dx.doi.org/10.1145/3264888.3264896.
[228] M. Kravchik, A. Shabtai, Efficient cyber attack detection in industrial control systems using lightweight neural networks and PCA, IEEE Trans. Depend. Secur. Comput. (2021) 1, http://dx.doi.org/10.1109/tdsc.2021.3050101.
[229] X. Xie, B. Wang, T. Wan, W. Tang, Multivariate abnormal detection for industrial control systems using 1D CNN and GRU, IEEE Access 8 (2020) 88348–88359, http://dx.doi.org/10.1109/access.2020.2993335.
[230] K.-D. Lu, G.-Q. Zeng, X. Luo, J. Weng, W. Luo, Y. Wu, Evolutionary deep belief network for cyber-attack detection in industrial automation and control system, IEEE Trans. Ind. Inf. (2021).
[231] S. Boettcher, A. Percus, Nature’s way of optimizing, Artificial Intelligence 119 (1–2) (2000) 275–286.
[232] T. Morris, W. Gao, Industrial control system traffic data sets for intrusion detection research, in: International Conference on Critical Infrastructure Protection, Springer, 2014, pp. 65–78.


[233] S. Huda, J. Yearwood, M.M. Hassan, A. Almogren, Securing the operations in SCADA-IoT platform based industrial control system using ensemble of deep belief networks, Appl. Soft Comput. 71 (2018) 66–77.
[234] B. Hussain, Q. Du, B. Sun, Z. Han, Deep learning-based ddos-attack detection for cyber–physical system over 5G network, IEEE Trans. Ind. Inf. 17 (2) (2021) 860–870, http://dx.doi.org/10.1109/TII.2020.2974520.
[235] G. Barlacchi, M. De Nadai, R. Larcher, A. Casella, C. Chitic, G. Torrisi, F. Antonelli, A. Vespignani, A. Pentland, B. Lepri, A multi-source dataset of urban life in the city of Milan and the Province of Trentino, Scientific Data 2 (1) (2015) 1–15.
[236] Y. He, G.J. Mendis, J. Wei, Real-time detection of false data injection attacks in smart grid: A deep learning-based intelligent mechanism, IEEE Trans. Smart Grid 8 (5) (2017) 2505–2516, http://dx.doi.org/10.1109/tsg.2017.2703842.
[237] X. Niu, J. Li, J. Sun, K. Tomsovic, Dynamic detection of false data injection attack in smart grid using deep learning, in: 2019 IEEE Power & Energy Society Innovative Smart Grid Technologies Conference (ISGT), IEEE, 2019, pp. 1–6, http://dx.doi.org/10.1109/isgt.2019.8791598.
[238] J. Wang, D. Shi, Y. Li, J. Chen, H. Ding, X. Duan, Distributed framework for detecting PMU data manipulation attacks with deep autoencoders, IEEE Trans. Smart Grid 10 (4) (2019) 4401–4410, http://dx.doi.org/10.1109/tsg.2018.2859339.
[239] Y. Wang, D. Chen, C. Zhang, X. Chen, B. Huang, X. Cheng, Wide and recurrent neural networks for detection of false data injection in smart grids, in: International Conference on Wireless Algorithms, Systems, and Applications, Springer, 2019, pp. 335–345.
[240] Y. Zhang, J. Wang, B. Chen, Detecting false data injection attacks in smart grids: A semi-supervised deep learning approach, IEEE Trans. Smart Grid 12 (1) (2020) 623–634.
[258] S. Seth, S. Biswas, Multimodal spam classification using deep learning techniques, in: 2017 13th International Conference on Signal-Image Technology & Internet-Based Systems (SITIS), IEEE, 2017, pp. 346–349, http://dx.doi.org/10.1109/sitis.2017.91.
[259] T. Wu, S. Liu, J. Zhang, Y. Xiang, Twitter spam detection based on deep learning, in: Proceedings of the Australasian Computer Science Week Multiconference, ACM, 2017, pp. 1–8, http://dx.doi.org/10.1145/3014812.3014815.
[260] C. Yang, R. Harkreader, G. Gu, Empirical evaluation and new design for fighting evolving Twitter spammers, IEEE Trans. Inform. Forensic Secur. 8 (8) (2013) 1280–1293, http://dx.doi.org/10.1109/tifs.2013.2267732.
[261] A. Makkar, U. Ghosh, P.K. Sharma, Artificial intelligence and edge computing-enabled web spam detection for next generation IoT applications, IEEE Sens. J. (2021).
[262] S. Sedhai, A. Sun, Hspam14, in: Proceedings of the 38th International ACM SIGIR Conference on Research and Development in Information Retrieval, ACM, 2015, pp. 223–232, http://dx.doi.org/10.1145/2766462.2767701.
[263] B. Wang, A. Zubiaga, M. Liakata, R. Procter, Making the most of tweet-inherent features for social spam detection on Twitter, 2015, arXiv preprint arXiv:1503.07405.
[264] G. Lingam, R.R. Rout, D.V.L.N. Somayajulu, S.K. Ghosh, Particle swarm optimization on deep reinforcement learning for detecting social spam bots and spam-influential users in Twitter network, IEEE Syst. J. 15 (2) (2021) 2281–2292, http://dx.doi.org/10.1109/JSYST.2020.3034416.
[265] S. Cresci, R. Di Pietro, M. Petrocchi, A. Spognardi, M. Tesconi, The paradigm-shift of social spambots: Evidence, theories, and tools for the arms race, in: Proceedings of the 26th International Conference on World Wide Web Companion, 2017, pp. 963–972.
[266] T. Xu, G. Goossen, H.K. Cevahir, S. Khodeir, Y. Jin, F. Li, S. Shan, S. Patel, D.
[241] K.P. Schneider, B. Mather, B. Pal, C.-W. Ten, G.J. Shirek, H. Zhu, J.C. Fuller, Freeman, P. Pearce, Deep entity classification: Abusive account detection for
J.L.R. Pereira, L.F. Ochoa, L.R. de Araujo, et al., Analytic considerations and online social networks, in: 30th USENIX Security Symposium (USENIX Security
design basis for the IEEE distribution test feeders, IEEE Trans. Power Syst. 33 21), USENIX Association, 2021, pp. 4097–4114.
(3) (2017) 3181–3188. [267] A. Vaswani, N. Shazeer, N. Parmar, J. Uszkoreit, L. Jones, A.N. Gomez, L.
[242] I. Siniosoglou, P. Radoglou-Grammatikis, G. Efstathopoulos, P. Fouliras, P. Kaiser, I. Polosukhin, Attention is all you need, 2017, arXiv preprint arXiv:
Sarigiannidis, A unified deep learning anomaly detection and classification 1706.03762.
approach for smart grid environments, IEEE Trans. Netw. Serv. Manag. (2021). [268] T.B. Brown, B. Mann, N. Ryder, M. Subbiah, J. Kaplan, P. Dhariwal, A.
[243] P.R. Grammatikis, P. Sarigiannidis, E. Iturbe, E. Rios, A. Sarigiannidis, O. Neelakantan, P. Shyam, G. Sastry, A. Askell, et al., Language models are
Nikolis, D. Ioannidis, V. Machamint, M. Tzifas, A. Giannakoulias, et al., Secure few-shot learners, 2020, arXiv preprint arXiv:2005.14165.
and private smart grid: The SPEAR architecture, in: 2020 6th IEEE Conference [269] J. Devlin, M.-W. Chang, K. Lee, K. Toutanova, Bert: Pre-training of deep
on Network Softwarization (NetSoft), IEEE, 2020, pp. 450–456. bidirectional transformers for language understanding, 2018, arXiv preprint
[244] F. van Wyk, Y. Wang, A. Khojandi, N. Masoud, Real-time sensor anomaly arXiv:1810.04805.
detection and identification in automated vehicles, IEEE Trans. Intell. Transport. [270] J. Cao, C. Lai, A bilingual multi-type spam detection model based on M-
Syst. 21 (3) (2020) 1264–1276, http://dx.doi.org/10.1109/tits.2019.2906038. BERT, in: GLOBECOM 2020-2020 IEEE Global Communications Conference,
[245] D. Bezzina, J. Sayer, Safety pilot model deployment: Test conductor team report, IEEE, 2020, pp. 1–6.
Report No. DOT HS, 812, (171) 2014, p. 18. [271] Y. Kou, C.-T. Lu, S. Sirwongwattana, Y.-P. Huang, Survey of fraud detection
[246] M. Hanselmann, T. Strauss, K. Dormann, H. Ulmer, Canet: An unsupervised techniques, in: IEEE International Conference on Networking, Sensing and
intrusion detection system for high dimensional CAN bus data, IEEE Access 8 Control, Vol. 2, IEEE, 2004, pp. 749–754.
(2020) 58194–58205. [272] M. Intelligence, Fraud detection and prevention market - growth, trends,
[247] M. Hanselmann, T. Strauss, K. Dormann, H. Ulmer, Syncan dataset, 2021, COVID-19 impact, and forecasts (2021 - 2026), 2021, [online] (Accessed 20
[online] (Accessed 20 Jun 2021). Jun 2021).
[248] C. Yue, L. Wang, D. Wang, R. Duo, X. Nie, An ensemble intrusion detection [273] S. Pandit, J. Liu, R. Perdisci, M. Ahamad, Applying deep learning to combat
method for train ethernet consist network based on CNN and RNN, IEEE Access mass robocalls, in: 2021 IEEE Security and Privacy Workshops (SPW), 2021,
9 (2021) 59527–59539. pp. 63–70, http://dx.doi.org/10.1109/SPW53761.2021.00018.
[249] G. Kakkavas, M. Diamanti, A. Stamou, V. Karyotis, F. Bouali, J. Pinola, O. Apilo, [274] S. Xu, S. Lai, Y. Li, A deep learning based framework for cloud masquerade
S. Papavassiliou, K. Moessner, Design, development, and evaluation of 5G- attack detection, in: 2018 IEEE 37th International Performance Computing and
enabled vehicular services: The 5G-HEART perspective, Sensors 22 (2) (2022) Communications Conference (IPCCC), IEEE, 2018, pp. 1–2, http://dx.doi.org/
426, http://dx.doi.org/10.3390/s22020426. 10.1109/pccc.2018.8711277.
[275] S. Rezaei, X. Liu, Deep learning for encrypted traffic classification: An overview,
[250] D.A. Hahn, A. Munir, V. Behzadan, Security and privacy issues in intelligent
IEEE Commun. Mag. 57 (5) (2019) 76–81, http://dx.doi.org/10.1109/mcom.
transportation systems: Classification and challenges, IEEE Intell. Transp. Syst
2019.1800819.
1 (2019).
[276] K. Abe, S. Goto, Fingerprinting attack on Tor anonymity using deep learning, in:
[251] D.-A. Clevert, T. Unterthiner, S. Hochreiter, Fast and accurate deep network
Proceedings of the Asia-Pacific Advanced Network, Vol. 42, 2016, pp. 15–20.
learning by exponential linear units (elus), 2015, arXiv preprint arXiv:1511.
[277] V. Rimmer, D. Preuveneers, M. Juárez, T. van Goethem, W. Joosen, Automated
07289.
website fingerprinting through deep learning, in: 25th Annual Network and
[252] D.E. Sorkin, Spam statistics and facts, 2021, [online] (Accessed 19 Jun 2021).
Distributed System Security Symposium, NDSS 2018, San Diego, California,
[253] B. Feng, Q. Fu, M. Dong, D. Guo, Q. Li, Multistage and elastic spam detection USA, February 18-21, 2018, The Internet Society, 2018, http://dx.doi.org/10.
in mobile social networks through deep learning, IEEE Netw. 32 (4) (2018) 14722/ndss.2018.23105.
15–21, http://dx.doi.org/10.1109/mnet.2018.1700406. [278] P. Sirinam, M. Imani, M. Juarez, M. Wright, Deep fingerprinting: Undermining
[254] Y. Gao, M. Gong, Y. Xie, A. Qin, An attention-based unsupervised adversarial website fingerprinting defenses with deep learning, in: Proceedings of the 2018
model for movie review spam detection, IEEE Trans. Multimedia 23 (2021) ACM SIGSAC Conference on Computer and Communications Security, 2018, pp.
784–796, http://dx.doi.org/10.1109/tmm.2020.2990085. 1928–1943.
[255] S. Madisetty, M.S. Desarkar, A neural network-based ensemble approach for [279] G. Aceto, D. Ciuonzo, A. Montieri, A. Pescapé, Mobile encrypted traffic
spam detection in Twitter, IEEE Trans. Comput. Soc. Syst. 5 (4) (2018) 973–984, classification using deep learning, in: 2018 Network Traffic Measurement and
http://dx.doi.org/10.1109/tcss.2018.2878852. Analysis Conference (TMA), IEEE, 2018, pp. 1–8.
[256] A. Makkar, N. Kumar, An efficient deep learning-based scheme for web spam [280] T. Shapira, Y. Shavitt, Flowpic: Encrypted internet traffic classification is as easy
detection in IoT environment, Future Gener. Comput. Syst. 108 (2020) 467–487, as image recognition, in: IEEE INFOCOM 2019-IEEE Conference on Computer
http://dx.doi.org/10.1016/j.future.2020.03.004. Communications Workshops (INFOCOM WKSHPS), IEEE, 2019, pp. 680–687.
[257] P.K. Roy, J.P. Singh, S. Banerjee, Deep learning to filter SMS spam, Future [281] W. Wang, M. Zhu, X. Zeng, X. Ye, Y. Sheng, Malware traffic classification using
Gener. Comput. Syst. 102 (2020) 524–533, http://dx.doi.org/10.1016/j.future. convolutional neural network for representation learning, in: 2017 International
2019.09.001. Conference on Information Networking (ICOIN), IEEE, 2017, pp. 712–717.


[282] S. Rezaei, X. Liu, Multitask learning for network traffic classification, in: 2020 29th International Conference on Computer Communications and Networks (ICCCN), IEEE, 2020, pp. 1–9.
[283] P. Wang, Z. Wang, F. Ye, X. Chen, Bytesgan: A semi-supervised generative adversarial network for encrypted traffic classification of SDN edge gateway in green communication network, 2021, arXiv preprint arXiv:2103.05250.
[284] H. Wu, L. Wang, G. Cheng, X. Hu, Mobile application encryption traffic classification based on TLS flow sequence network, in: 2021 IEEE International Conference on Communications Workshops (ICC Workshops), IEEE, 2021, pp. 1–6.
[285] T. Wang, Website fingerprinting, 2021, [online] (Accessed 25 May 2021).
[286] V. Rimmer, D. Preuveneers, M. Juarez, T. Van Goethem, W. Joosen, Dataset-website fingerprinting, 2021, [online] (Accessed 20 Jun 2021).
[287] A. Radford, L. Metz, S. Chintala, Unsupervised representation learning with deep convolutional generative adversarial networks, 2015, arXiv preprint arXiv:1511.06434.
[288] D. Sahoo, Q. Pham, J. Lu, S.C.H. Hoi, Online deep learning: Learning deep neural networks on the fly, 2017, arXiv:1711.03705.
[289] M. McCloskey, N.J. Cohen, Catastrophic interference in connectionist networks: The sequential learning problem, in: Psychology of Learning and Motivation, Elsevier, 1989, pp. 109–165, http://dx.doi.org/10.1016/s0079-7421(08)60536-8.
[290] J. Kirkpatrick, R. Pascanu, N. Rabinowitz, J. Veness, G. Desjardins, A.A. Rusu, K. Milan, J. Quan, T. Ramalho, A. Grabska-Barwinska, D. Hassabis, C. Clopath, D. Kumaran, R. Hadsell, Overcoming catastrophic forgetting in neural networks, Proc. Natl. Acad. Sci. 114 (13) (2017) 3521–3526, http://dx.doi.org/10.1073/pnas.1611835114.
[291] R. Kemker, M. McClure, A. Abitino, T. Hayes, C. Kanan, Measuring catastrophic forgetting in neural networks, in: Proceedings of the AAAI Conference on Artificial Intelligence, Vol. 32, (1) 2018.
[292] P. Tune, D. Veitch, Sampling vs sketching: An information theoretic comparison, in: 2011 Proceedings IEEE INFOCOM, IEEE, 2011, http://dx.doi.org/10.1109/infcom.2011.5935020.
[293] Z. Liu, A. Manousis, G. Vorsanger, V. Sekar, V. Braverman, One sketch to rule them all, in: Proceedings of the 2016 ACM SIGCOMM Conference, ACM, 2016, http://dx.doi.org/10.1145/2934872.2934906.
[294] T. Yang, J. Jiang, P. Liu, Q. Huang, J. Gong, Y. Zhou, R. Miao, X. Li, S. Uhlig, Elastic sketch, in: Proceedings of the 2018 Conference of the ACM Special Interest Group on Data Communication, ACM, 2018, http://dx.doi.org/10.1145/3230543.3230544.
[295] J.-Y. Li, T. Chow, Y.-L. Yu, The estimation theory and optimization algorithm for the number of hidden units in the higher-order feedforward neural network, in: Proceedings of ICNN’95 - International Conference on Neural Networks, IEEE, http://dx.doi.org/10.1109/icnn.1995.487330.
[296] D. Menotti, G. Chiachia, A. Pinto, W. Robson Schwartz, H. Pedrini, A. Xavier Falcao, A. Rocha, Deep representations for iris, face, and fingerprint spoofing detection, IEEE Trans. Inform. Forensic Secur. 10 (4) (2015) 864–879, http://dx.doi.org/10.1109/tifs.2015.2398817.
[297] A. Panesar, Evaluating machine learning models, in: Machine Learning and AI for Healthcare, Apress, 2020, pp. 189–205, http://dx.doi.org/10.1007/978-1-4842-6537-6_7.
[298] X. He, K. Zhao, X. Chu, Automl: A survey of the state-of-the-art, Knowl.-Based Syst. 212 (2021) 106622, http://dx.doi.org/10.1016/j.knosys.2020.106622.
[299] H. Hindy, D. Brosset, E. Bayne, A.K. Seeam, C. Tachtatzis, R. Atkinson, X. Bellekens, A taxonomy of network threats and the effect of current datasets on intrusion detection systems, IEEE Access 8 (2020) 104650–104675, http://dx.doi.org/10.1109/access.2020.3000179.
[300] A. Gharib, I. Sharafaldin, A.H. Lashkari, A.A. Ghorbani, An evaluation framework for intrusion detection dataset, in: 2016 International Conference on Information Science and Security (ICISS), IEEE, 2016, pp. 1–6, http://dx.doi.org/10.1109/icissec.2016.7885840.
[301] I. Sharafaldin, A. Habibi Lashkari, A.A. Ghorbani, Toward generating a new intrusion detection dataset and intrusion traffic characterization, in: Proceedings of the 4th International Conference on Information Systems Security and Privacy, SCITEPRESS - Science and Technology Publications, 2018, pp. 108–116, http://dx.doi.org/10.5220/0006639801080116.
[302] C. Chio, D. Freeman, Machine Learning and Security: Protecting Systems with Data and Algorithms, O’Reilly Media, Inc., 2018.
[303] R.F. Fouladi, T. Seifpoor, E. Anarim, Frequency characteristics of DoS and DDoS attacks, in: 2013 21st Signal Processing and Communications Applications Conference (SIU), IEEE, 2013, pp. 1–4, http://dx.doi.org/10.1109/siu.2013.6531200.
[304] P. Ruvolo, E. Eaton, ELLA: An efficient lifelong learning algorithm, in: S. Dasgupta, D. McAllester (Eds.), Proceedings of the 30th International Conference on Machine Learning, in: Proceedings of Machine Learning Research, Vol. 28, (1) PMLR, Atlanta, Georgia, USA, 2013, pp. 507–515.
[305] P. Ren, Y. Xiao, X. Chang, P.-Y. Huang, Z. Li, X. Chen, X. Wang, A survey of deep active learning, 2020, arXiv:2009.00236.
[306] D. Shu, N.O. Leslie, C.A. Kamhoua, C.S. Tucker, Generative adversarial attacks against intrusion detection systems using active learning, in: Proceedings of the 2nd ACM Workshop on Wireless Security and Machine Learning, 2020, pp. 1–6.
[307] A. Shahraki, M. Abbasi, A. Taherkordi, A.D. Jurcut, Active learning for network traffic classification: A technical study, 2021, arXiv:2106.06933.
[308] F.-L. Fan, J. Xiong, M. Li, G. Wang, On interpretability of artificial neural networks: A survey, IEEE Trans. Radiat. Plasma Med. Sci. (2021) 1, http://dx.doi.org/10.1109/trpms.2021.3066428.
[309] J.R. Geis, A.P. Brady, C.C. Wu, J. Spencer, E. Ranschaert, J.L. Jaremko, S.G. Langer, A.B. Kitts, J. Birch, W.F. Shields, R. van den Hoven van Genderen, E. Kotter, J.W. Gichoya, T.S. Cook, M.B. Morgan, A. Tang, N.M. Safdar, M. Kohli, Ethics of artificial intelligence in radiology: Summary of the joint European and North American multisociety statement, Can. Assoc. Radiol. J. 70 (4) (2019) 329–334, http://dx.doi.org/10.1016/j.carj.2019.08.010.
[310] M. Wang, K. Zheng, Y. Yang, X. Wang, An explainable machine learning framework for intrusion detection systems, IEEE Access 8 (2020) 73127–73141, http://dx.doi.org/10.1109/access.2020.2988359.
[311] S.M. Lundberg, S.-I. Lee, A unified approach to interpreting model predictions, in: Proceedings of the 31st International Conference on Neural Information Processing Systems, in: NIPS’17, Curran Associates Inc., Red Hook, NY, USA, 2017, pp. 4768–4777.
[312] D. Silver, G. Lever, N. Heess, T. Degris, D. Wierstra, M.A. Riedmiller, Deterministic policy gradient algorithms, in: Proceedings of the 31st International Conference on Machine Learning, ICML 2014, Beijing, China, 21-26 June 2014, in: JMLR Workshop and Conference Proceedings, Vol. 32, JMLR.org, 2014, pp. 387–395.
[313] I. Grondman, L. Busoniu, G.A. Lopes, R. Babuska, A survey of actor-critic reinforcement learning: Standard and natural policy gradients, IEEE Trans. Syst. Man Cybern. C 42 (6) (2012) 1291–1307, http://dx.doi.org/10.1109/tsmcc.2012.2218595.
[314] T. Jung, S. Kim, K. Kim, DeepVision: deepfakes detection using human eye blinking pattern, IEEE Access 8 (2020) 83144–83154, http://dx.doi.org/10.1109/access.2020.2988660.
[315] T.T. Nguyen, C.M. Nguyen, D.T. Nguyen, D.T. Nguyen, S. Nahavandi, Deep learning for deepfakes creation and detection, 1, 2019, arXiv preprint arXiv:1909.11573.
[316] S. Rezaei, X. Liu, Security of deep learning methodologies: Challenges and opportunities, 2019, arXiv preprint arXiv:1912.03735.
[317] T.T. Nguyen, V.J. Reddi, Deep reinforcement learning for cyber security, 2019, arXiv preprint arXiv:1906.05799.
[318] Y. Liu, S. Ma, Y. Aafer, W. Lee, J. Zhai, W. Wang, X. Zhang, Trojaning attack on neural networks, in: 25th Annual Network and Distributed System Security Symposium, NDSS 2018, San Diego, California, USA, February 18-21, 2018, The Internet Society, 2018.
[319] X. Chen, C. Liu, B. Li, K. Lu, D. Song, Targeted backdoor attacks on deep learning systems using data poisoning, 2017, arXiv preprint arXiv:1712.05526.
[320] T. Gu, K. Liu, B. Dolan-Gavitt, S. Garg, Badnets: evaluating backdooring attacks on deep neural networks, IEEE Access 7 (2019) 47230–47244, http://dx.doi.org/10.1109/access.2019.2909068.
[321] C.-Y. Hsu, P.-Y. Chen, S. Lu, S. Liu, C.-M. Yu, Adversarial examples for unsupervised machine learning models, 2021, arXiv:2103.01895.
[322] Gdpr.eu, General data protection regulation (GDPR) compliance guidelines, 2020, [online] (Accessed 5 Sep 2021).
[323] B. McMahan, E. Moore, D. Ramage, S. Hampson, B.A.y. Arcas, Communication-efficient learning of deep networks from decentralized data, in: Proceedings of Machine Learning Research, Vol. 54, PMLR, 2017, pp. 1273–1282, URL https://proceedings.mlr.press/v54/mcmahan17a.html.
[324] V. Smith, C.-K. Chiang, M. Sanjabi, A. Talwalkar, Federated multi-task learning, in: Proceedings of the 31st International Conference on Neural Information Processing Systems, in: NIPS’17, Curran Associates Inc., Red Hook, NY, USA, 2017, pp. 4427–4437.
[325] L.T. Phong, Y. Aono, T. Hayashi, L. Wang, S. Moriai, Privacy-preserving deep learning via additively homomorphic encryption, IEEE Trans. Inf. Forensics Secur. 13 (5) (2018) 1333–1345, http://dx.doi.org/10.1109/tifs.2017.2787987.
[326] D. Gao, Y. Liu, A. Huang, C. Ju, H. Yu, Q. Yang, Privacy-preserving heterogeneous federated transfer learning, in: 2019 IEEE International Conference on Big Data (Big Data), IEEE, 2019, http://dx.doi.org/10.1109/bigdata47090.2019.9005992.
[327] H. Yang, H. He, W. Zhang, X. Cao, FedSteg: A federated transfer learning framework for secure image steganalysis, IEEE Trans. Netw. Sci. Eng. 8 (2) (2021) 1084–1094, http://dx.doi.org/10.1109/tnse.2020.2996612.
[328] X. Yin, Y. Zhu, J. Hu, A comprehensive survey of privacy-preserving federated learning, ACM Comput. Surv. 54 (6) (2021) 1–36, http://dx.doi.org/10.1145/3460427.
[329] O. Alkadi, N. Moustafa, B. Turnbull, K.-K.R. Choo, A deep blockchain framework-enabled collaborative intrusion detection for protecting IoT and cloud networks, IEEE Internet Things J. 8 (12) (2021) 9463–9472, http://dx.doi.org/10.1109/jiot.2020.2996590.
[330] X. Liu, H. Li, G. Xu, S. Liu, Z. Liu, R. Lu, PADL: Privacy-aware and asynchronous deep learning for IoT applications, IEEE Internet Things J. 7 (8) (2020) 6955–6969, http://dx.doi.org/10.1109/jiot.2020.2981379.


[331] R. Gilad-Bachrach, N. Dowlin, K. Laine, K. Lauter, M. Naehrig, J. Wernsing, Cryptonets: Applying neural networks to encrypted data with high throughput and accuracy, in: M.F. Balcan, K.Q. Weinberger (Eds.), Proceedings of the 33rd International Conference on Machine Learning, in: Proceedings of Machine Learning Research, Vol. 48, PMLR, New York, New York, USA, 2016, pp. 201–210.
[332] K. Nandakumar, N. Ratha, S. Pankanti, S. Halevi, Towards deep neural network training on encrypted data, in: 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops (CVPRW), IEEE, 2019, http://dx.doi.org/10.1109/cvprw.2019.00011.
[333] Z. Wu, S. Shen, X. Lian, X. Su, E. Chen, A dummy-based user privacy protection approach for text information retrieval, Knowl.-Based Syst. 195 (2020) 105679, http://dx.doi.org/10.1016/j.knosys.2020.105679.
[334] Z. Wu, S. Shen, H. Zhou, H. Li, C. Lu, D. Zou, An effective approach for the protection of user commodity viewing privacy in e-commerce website, Knowl.-Based Syst. 220 (2021) 106952, http://dx.doi.org/10.1016/j.knosys.2021.106952.
[335] Z. Wu, G. Li, Q. Liu, G. Xu, E. Chen, Covering the sensitive subjects to protect personal privacy in personalized recommendation, IEEE Trans. Serv. Comput. 11 (3) (2018) 493–506, http://dx.doi.org/10.1109/tsc.2016.2575825.
[336] Z. Wu, G. Xu, C. Lu, E. Chen, F. Jiang, G. Li, An effective approach for the protection of privacy text data in the clouddb, World Wide Web 21 (4) (2017) 915–938, http://dx.doi.org/10.1007/s11280-017-0491-8.
[337] Z. Wu, R. Wang, Q. Li, X. Lian, G. Xu, E. Chen, X. Liu, A location privacy-preserving system based on query range cover-up or location-based services, IEEE Trans. Veh. Technol. 69 (5) (2020) 5244–5254, http://dx.doi.org/10.1109/tvt.2020.2981633.
[338] Z. Wu, G. Li, S. Shen, X. Lian, E. Chen, G. Xu, Constructing dummy query sequences to protect location privacy and query privacy in location-based services, World Wide Web 24 (1) (2020) 25–49, http://dx.doi.org/10.1007/s11280-020-00830-x.
[339] J. Shin, Y. Baek, Y. Eun, S.H. Son, Intelligent sensor attack detection and identification for automotive cyber-physical systems, in: 2017 IEEE Symposium Series on Computational Intelligence (SSCI), IEEE, 2017, http://dx.doi.org/10.1109/ssci.2017.8280915.
[340] A. Chatterjee, H. Reza, Toward modeling and verification of uncertainty in cyber-physical systems, in: 2020 IEEE International Conference on Electro Information Technology (EIT), IEEE, 2020, http://dx.doi.org/10.1109/eit48999.2020.9208273.
[341] F.O. Olowononi, D.B. Rawat, C. Liu, Resilient machine learning for networked cyber physical systems: A survey for machine learning security to securing machine learning for CPS, IEEE Commun. Surv. Tutor. 23 (1) (2021) 524–552, http://dx.doi.org/10.1109/comst.2020.3036778.
[342] M.N. Asmat, S.U.R. Khan, S. Hussain, Uncertainty handling in cyber–physical systems: State-of-the-art approaches, tools, causes, and future directions, J. Softw.: Evol. Process (2022) http://dx.doi.org/10.1002/smr.2428.
[343] N. Jourdan, S. Sen, E. Garcia, E.J. Husom, T. Biegel, J. Metternich, On the reliability of machine learning applications in manufacturing environments, in: NeurIPS 2021 Workshop on Distribution Shifts: Connecting Methods and Applications, 2021.
[344] J. Mena, O. Pujol, J. Vitrià, A survey on uncertainty estimation in deep learning classification systems from a Bayesian perspective, ACM Comput. Surv. 54 (9) (2022) 1–35, http://dx.doi.org/10.1145/3477140.
[345] Y. Ovadia, E. Fertig, J. Ren, Z. Nado, D. Sculley, S. Nowozin, J. Dillon, B. Lakshminarayanan, J. Snoek, Can you trust your model’s uncertainty? Evaluating predictive uncertainty under dataset shift, in: H. Wallach, H. Larochelle, A. Beygelzimer, F. d’Alché Buc, E. Fox, R. Garnett (Eds.), Advances in Neural Information Processing Systems, Vol. 32, Curran Associates, Inc., 2019.
[346] M. Ma, J. Stankovic, E. Bartocci, L. Feng, Predictive monitoring with logic-calibrated uncertainty for cyber-physical systems, ACM Trans. Embedded Comput. Syst. 20 (5s) (2021) 1–25, http://dx.doi.org/10.1145/3477032.
[347] Y. Gal, Z. Ghahramani, Dropout as a Bayesian approximation: Representing model uncertainty in deep learning, in: M.F. Balcan, K.Q. Weinberger (Eds.), Proceedings of the 33rd International Conference on Machine Learning, Vol. 48, PMLR, 2016, pp. 1050–1059.
[348] Y. Xiao, W.Y. Wang, Quantifying uncertainties in natural language processing tasks, in: Proceedings of the AAAI Conference on Artificial Intelligence, Vol. 33 (2019) 7322–7329, http://dx.doi.org/10.1609/aaai.v33i01.33017322.
[349] J. Chen, X. Ran, Deep learning with edge computing: A review, Proc. IEEE 107 (8) (2019) 1655–1674, http://dx.doi.org/10.1109/jproc.2019.2921977.
[350] M. De Donno, K. Tange, N. Dragoni, Foundations and evolution of modern computing paradigms: Cloud, IoT, edge, and fog, IEEE Access 7 (2019) 150936–150948, http://dx.doi.org/10.1109/access.2019.2947652.
[351] Z. Zhou, X. Chen, E. Li, L. Zeng, K. Luo, J. Zhang, Edge intelligence: Paving the last mile of artificial intelligence with edge computing, Proc. IEEE 107 (8) (2019) 1738–1762, http://dx.doi.org/10.1109/jproc.2019.2918951.
[352] J. Karlekar, J. Feng, Z.S. Wong, S. Pranata, Deep face recognition model compression via knowledge transfer and distillation, 2019, arXiv preprint arXiv:1906.00619.
[353] E. Park, X. Cui, T.H.B. Nguyen, H. Kim, Presentation attack detection using a tiny fully convolutional network, IEEE Trans. Inform. Forensic Secur. 14 (11) (2019) 3016–3025, http://dx.doi.org/10.1109/tifs.2019.2907184.
[354] Y. Cheng, D. Wang, P. Zhou, T. Zhang, Model compression and acceleration for deep neural networks: The principles, progress, and challenges, IEEE Signal Process. Mag. 35 (1) (2018) 126–136, http://dx.doi.org/10.1109/msp.2017.2765695.
[355] G. Hinton, O. Vinyals, J. Dean, Distilling the knowledge in a neural network, 2015, arXiv preprint arXiv:1503.02531.

Mayra Macas received a Diploma in Computer Systems Engineering (CSY) from Escuela Superior Politécnica de Chimborazo (ESPOCH), Ecuador, in 2012, and an M.Sc. degree in Management of Information System and Business Intelligence from Army University (ESPE), Ecuador, in 2017. Currently, she is a Ph.D. candidate at Zhejiang University with the College of Computer Science and Technology, Hangzhou, China, and a member of the NGNT laboratory. She was awarded the Chinese Government Scholarship in 2017 for her Ph.D. studies. She has been working in the IT industry for around eight years. Her research interests include intelligent cyber-security, network security management with smart agents, as well as Big Data security and privacy.

Chunming Wu received the Ph.D. degree in computer science from Zhejiang University, Hangzhou, China, in 1995. He is currently a Professor with the College of Computer Science and Technology, Zhejiang University, where he is also the Associate Director with the Research Institute of Computer System Architecture and Network Security. His research interests include software-defined networks, reconfigurable networks, proactive network defense, network virtualization, and intelligent networks.

Walter Fuertes currently works at the Universidad de las Fuerzas Armadas ESPE, Sangolquí, Ecuador. He is a full professor (lecturer-researcher) in the School of Computer Science of that polytechnic, where he received an Engineering degree in Computer Systems in 1995. He then received his Master’s in Science degree in Computer Networking from the Escuela Politécnica Nacional in Quito, Ecuador, in 1999, and the Ph.D. (Hons.) degree in Computer Science and Telecommunications Engineering from Universidad Autónoma de Madrid, Spain, in 2010. Since 2006, he has actively participated in several research projects focused on applying virtualization technologies, data sciences, and cybersecurity. His research interests include managing distributed environments, network security, cybersecurity, the applied research of virtualization technologies, and serious games. He has authored several technical publications in scientific journals and national and international conferences around the world.
