Deep Cybersecurity - A Comprehensive Overview From Neural Network and Deep Learning Perspective

This document discusses a paper that provides a comprehensive overview of deep learning techniques for cybersecurity from a neural network perspective. It describes popular deep learning methods like convolutional neural networks, recurrent neural networks, self-organizing maps, autoencoders, restricted Boltzmann machines, deep belief networks, generative adversarial networks, deep transfer learning, and deep reinforcement learning. It discusses how these techniques can be applied to various cybersecurity tasks such as intrusion detection, malware detection, phishing detection, predicting cyberattacks, fraud detection, and anomaly detection. The paper aims to serve as a reference for academics and cybersecurity professionals on applying deep learning to cybersecurity challenges.

Open access Posted Content DOI:10.20944/PREPRINTS202102.0340.

Deep Cybersecurity: A Comprehensive Overview from Neural Network and Deep Learning Perspective
Iqbal H. Sarker

Institutions: Swinburne University of Technology, Chittagong University of Engineering & Technology

Published on: 16 Feb 2021

Topics: Deep belief network, Deep learning, Recurrent neural network, Artificial neural network and Restricted Boltzmann machine

SN Computer Science (2021) 2:154
https://doi.org/10.1007/s42979-021-00535-6

SURVEY ARTICLE

Deep Cybersecurity: A Comprehensive Overview from Neural Network and Deep Learning Perspective
Iqbal H. Sarker 1,2

Received: 19 November 2020 / Accepted: 19 February 2021 / Published online: 20 March 2021
© The Author(s), under exclusive licence to Springer Nature Singapore Pte Ltd. part of Springer Nature 2021

Abstract
Deep learning, which originated from the artificial neural network (ANN), is one of the major technologies that enable today's smart cybersecurity systems and policies to function in an intelligent manner. Popular deep learning techniques, such as the multi-layer perceptron, convolutional neural network, recurrent neural network or long short-term memory, self-organizing map, auto-encoder, restricted Boltzmann machine, deep belief network, generative adversarial network, deep transfer learning, and deep reinforcement learning, as well as their ensembles and hybrid approaches, can be used to intelligently tackle diverse cybersecurity issues. In this paper, we present a comprehensive overview of these neural network and deep learning techniques in light of today's diverse needs. We also discuss the applicability of these techniques to various cybersecurity tasks such as intrusion detection, identification of malware or botnets, phishing, predicting cyberattacks (e.g., denial of service), and detecting fraud or cyber-anomalies. Finally, we highlight several research issues and future directions within the scope of our study. Overall, the ultimate goal of this paper is to serve as a reference point and guideline for academia and professionals in the cyber industries, especially from the deep learning point of view.

Keywords Cybersecurity · Deep learning · Artificial neural network · Artificial intelligence · Cyberattacks · Cybersecurity analytics · Cyber threat intelligence

This article is part of the topical collection "Deep learning approaches for data analysis: A practical perspective" guest edited by D. Jude Hemanth, Lipo Wang and Anastasia Angelopoulou.

* Iqbal H. Sarker
[email protected]

1 Swinburne University of Technology, Melbourne, VIC 3122, Australia
2 Department of Computer Science and Engineering, Chittagong University of Engineering & Technology, Chittagong 4349, Bangladesh

Introduction

Due to the increasing popularity of the internet of things (IoT) [1] and today's dependency on digitalization, various security incidents or attacks have grown rapidly in recent years. Malicious activities such as malware or ransomware attacks [2], zero-day attacks [3], cryptographic attacks, unauthorized access [4], denial of service (DoS) [4], data breaches [5], phishing or social engineering [6], and various attacks on IoT devices are common nowadays. These security incidents or cybercrimes can affect organizations and individuals, cause disruptions, and lead to devastating financial losses. For example, according to an IBM report, a data breach costs 8.19 million USD on average in the United States [7], and the total annual cost of cybercrime to the global economy is 400 billion USD [8]. Cybercrime is growing at an exponential rate, which is an alarming signal for cybersecurity professionals and researchers [9]. Security management tools capable of detecting and preventing such incidents in a timely and intelligent way are therefore urgently needed, since the overall national security of a country, including its businesses, government, and individual citizens, depends on them.

Typically, cybersecurity is characterized as a collection of technologies and processes designed to protect computers, networks, programs, and data against malicious activities, attacks, harm, or unauthorized access [10]. Given today's diverse needs, conventional well-known security solutions such as antivirus software, firewalls, user authentication, and encryption may not be effective [11–14]. The key issue with these systems is that they are normally operated by a few security analysts and their data management is carried out in an ad hoc manner; therefore, they cannot work
intelligently according to these needs [15, 16]. Meanwhile, data-driven learning techniques such as deep learning, which allow computing systems to operate intelligently in cybersecurity management, have evolved rapidly in recent years; these techniques are the focus of this paper.

Deep learning (DL) is considered part of machine learning (ML) as well as artificial intelligence (AI); it originated from the artificial neural network (ANN) and is one of the major technologies of the Fourth Industrial Revolution (Industry 4.0) [9, 17]. The worldwide popularity of "Cyber security" and "Deep learning" is increasing day by day, as shown in Fig. 1, which is based on data collected from Google Trends over the last 5 years [18]. In this paper, we take into account ten popular neural network and deep learning techniques, including supervised, semi-supervised, unsupervised, and reinforcement learning, in the context of cybersecurity. These are (i) multi-layer perceptron (MLP), (ii) convolutional neural network (CNN or ConvNet), (iii) recurrent neural network (RNN) or long short-term memory (LSTM), (iv) self-organizing map (SOM), (v) auto-encoder (AE), (vi) restricted Boltzmann machine (RBM), (vii) deep belief networks (DBN), (viii) generative adversarial network (GAN), (ix) deep transfer learning (DTL or deep TL), and (x) deep reinforcement learning (DRL or deep RL). These deep neural network learning techniques, or their ensembles and hybrid approaches, can be used to intelligently solve different cybersecurity issues, such as intrusion detection, identification of malware or botnets, phishing, predicting cyberattacks (e.g., DoS), fraud detection, or cyber-anomalies. Deep learning is beneficial for building security models because of its better accuracy, especially when learning from large quantities of security data [19]. The contributions of this paper are summarized as follows:

• This study concentrates on ANN and DL techniques, part of artificial intelligence (AI), that function in a timely, automated, and intelligent manner in the context of cybersecurity and are considered major technologies of the Fourth Industrial Revolution (Industry 4.0).
• We discuss various popular neural network and deep learning techniques, including supervised, unsupervised, and reinforcement learning, in the context of cybersecurity, as well as the applicability of these techniques in various cybersecurity tasks.
• Finally, we highlight several research issues and future directions within the scope of our study for future development and research in the domain of cybersecurity.

This paper is organized as follows. Section 2 provides a brief overview of cybersecurity data. In Sect. 3, we discuss various artificial neural networks and deep learning methods and their applicability within the area of cybersecurity. Several research issues and potential solutions based on our study are highlighted in Sect. 4. Finally, we conclude this paper in Sect. 5.

Fig. 1 The worldwide popularity score of "Cyber security" and "Deep learning" on a scale from 0 (min) to 100 (max) over time, where the x-axis represents the timestamp information and the y-axis represents the corresponding popularity score

Understanding Cybersecurity Data

A data-driven model based on ANN and DL methods usually depends on data availability [20]. Datasets typically consist of a series of data records with many attributes or characteristics and relevant information, from which the data-driven cybersecurity model is built. In the field of cybersecurity, many datasets exist, including for intrusion analysis, malware analysis, and spam analysis, which are used for different purposes. In our earlier paper on cybersecurity data science (Sarker et al. [9]), we summarized various
security datasets obtained from different sources. In the following, several such datasets, including their different characteristics and attacks, are summarized to discuss the applicability of security modeling based on ANN and DL, in line with the objective of this paper.

DARPA (Defense Advanced Research Projects Agency) made the earliest attempt to build an intrusion detection system dataset in 1998 [21]. Under the leadership of DARPA and AFRL/SNHS, the datasets were compiled and released by MIT Lincoln Laboratory's Cyber Infrastructure and Technology Division (formerly the DARPA Intrusion Detection Assessment Group) for the evaluation of computer network intrusion detection systems. The KDD Cup 99 dataset, which contains network traffic records with more than forty feature attributes and one class identifier, is one of the most commonly used datasets for intrusion detection [22]. The dataset contains different types of attacks that fall into four families, DoS, R2L, U2R, and Probe, as well as normal data. A refined version of this dataset, known as the NSL-KDD dataset, contains similar features [23], but duplicate records are excluded from both the training and test sets. As an example of security data, Table 1 shows the features of an intrusion detection dataset together with their value types, such as integer, float, or nominal, for a deeper understanding of security data [24]. Effectively processing these features according to the requirements, building the target ANN or DL model, and eventually performing the decision analysis could play a significant role in providing the intelligent cybersecurity services discussed briefly in Sect. 3.

Another dataset, ISCX [25], was created at the Canadian Institute for Cybersecurity. Profiles were defined to describe attacks and traffic distribution strategies in a network context, and several real traces were analyzed to create accurate profiles of attacks and other events for testing intrusion detection systems. A newer dataset, CSE-CIC-IDS2018 [26], was recently created at the same institution based on user profiles that track network events and activity. The MAWI [27] dataset is a collection of traffic traces from a Japanese network connecting research and academic institutions, used to characterize the global internet situation across a wide region. The dataset is updated daily to track new traffic, and some scholars use it for DDoS intrusion detection [27]. Because MAWI is real traffic data, the types of attacks found in it vary. The ADFA dataset is a set of host-level intrusion detection system datasets released by the Australian Defence Force Academy (ADFA) [28] and is commonly used to test intrusion detection products. It includes five types of attacks, namely Hydra-FTP, Hydra-SSH, Add User, Java-Meterpreter, and Webshell, as well as two sets of normal data, Training and Validation.

The CAIDA'07 dataset [29] represents anonymized traces of 1-h DDoS attack traffic collected on August 04, 2007. The 1-h trace is split into 5-min files.

Table 1 An example of features of an intrusion detection dataset [24]

Feature name                 Value type  Feature name                 Value type
dst_host_srv_count           Integer     same_srv_rate                Float
flag                         Nominal     dst_host_same_srv_rate       Float
srv_serror_rate              Float       dst_host_srv_serror_rate     Float
dst_host_serror_rate         Float       count                        Integer
protocol_type                Nominal     logged_in                    Integer
dst_host_same_src_port_rate  Float       dst_host_srv_diff_host_rate  Float
rerror_rate                  Float       src_bytes                    Integer
dst_host_srv_rerror_rate     Float       service                      Nominal
srv_rerror_rate              Float       dst_host_rerror_rate         Float
dst_host_count               Integer     dst_host_diff_srv_rate       Float
srv_count                    Integer     wrong_fragment               Integer
serror_rate                  Float       num_compromised              Integer
srv_diff_host_rate           Float       dst_bytes                    Integer
hot                          Integer     diff_srv_rate                Float
duration                     Integer     is_guest_login               Integer
root_shell                   Integer     land                         Integer
urgent                       Integer     num_failed_logins            Integer
su_attempted                 Integer     num_root                     Integer
num_file_creations           Integer     num_shells                   Integer
num_access_files             Integer     num_outbound_cmds            Integer
is_host_login                Integer     –                            –
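Table 1 mixes integer, float, and nominal features, whereas an ANN or DL model expects a single numeric input vector. One common way to prepare such records, sketched below under stated assumptions, is to one-hot encode the nominal features and min-max scale the numeric ones to [0, 1]. This is an illustrative preprocessing step, not the procedure used in the paper or in [24]; the feature subset, the three example records, and the helper names fit_encoder and transform are hypothetical, while the feature names and value types follow Table 1.

```python
"""Minimal sketch (not the paper's method): encode KDD-style records whose
features have the value types listed in Table 1 into numeric vectors.

Assumptions: each record is a dict keyed by Table 1 feature names; the feature
subset and example records below are hypothetical, chosen for illustration.
"""
from typing import Dict, List, Sequence, Tuple

Record = Dict[str, object]

# Hypothetical subset of Table 1, grouped by value type.
NOMINAL = ["protocol_type", "service", "flag"]
NUMERIC = ["duration", "src_bytes", "dst_bytes", "serror_rate"]  # Integer/Float


def fit_encoder(records: Sequence[Record]) -> dict:
    """Learn category lists (nominal) and min/max ranges (numeric) from training data."""
    vocab = {f: sorted({str(r[f]) for r in records}) for f in NOMINAL}
    ranges: Dict[str, Tuple[float, float]] = {}
    for f in NUMERIC:
        values = [float(r[f]) for r in records]
        ranges[f] = (min(values), max(values))
    return {"vocab": vocab, "ranges": ranges}


def transform(record: Record, enc: dict) -> List[float]:
    """One-hot encode nominal features and min-max scale numeric ones to [0, 1]."""
    vec: List[float] = []
    for f in NOMINAL:
        # A category not seen during fitting yields an all-zero block.
        vec.extend(1.0 if str(record[f]) == c else 0.0 for c in enc["vocab"][f])
    for f in NUMERIC:
        lo, hi = enc["ranges"][f]
        scaled = 0.0 if hi == lo else (float(record[f]) - lo) / (hi - lo)
        vec.append(min(max(scaled, 0.0), 1.0))  # clip values outside the fitted range
    return vec


# Hypothetical training records using Table 1 feature names.
train = [
    {"protocol_type": "tcp", "service": "http", "flag": "SF",
     "duration": 0, "src_bytes": 215, "dst_bytes": 45076, "serror_rate": 0.0},
    {"protocol_type": "udp", "service": "domain_u", "flag": "SF",
     "duration": 2, "src_bytes": 105, "dst_bytes": 146, "serror_rate": 0.0},
    {"protocol_type": "tcp", "service": "private", "flag": "S0",
     "duration": 0, "src_bytes": 0, "dst_bytes": 0, "serror_rate": 1.0},
]

enc = fit_encoder(train)
print(transform(train[2], enc))
# -> [1.0, 0.0, 0.0, 0.0, 1.0, 1.0, 0.0, 0.0, 0.0, 0.0, 1.0]
```

Fitting the category lists and value ranges on training records only, and reusing them for test records, keeps test statistics out of the preprocessing step. Scaling matters because networks such as the MLP are sensitive to feature scaling.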


The CAIDA'07 attack traffic consists primarily of SYN, ICMP, and HTTP floods. As most of the legitimate traffic was removed after collection, this dataset is biased towards DDoS attacks. The CAIDA'08 [30] dataset consists of legitimate and attack traces captured at the Equinix data centers in Chicago and San Jose on March 19, 2008 and July 17, 2008, respectively. The ISOT'10 dataset is a mixture of malicious and non-malicious data generated by the Information Security and Object Technology (ISOT) research group at the University of Victoria [31]. The malicious traffic consists of decentralized botnet data gathered by Honeynet [32], while the non-malicious traffic was obtained from the Ericsson Research Laboratory and Lawrence Berkeley National Lab. ISCX'12 captures traffic from a real-world physical test environment that produces network traffic containing centralized botnets. The CTU-13 dataset [33] contains botnet traffic captured at CTU (Czech Technical University), Czech Republic, in 2011. As a source of benign domain names, the Alexa Top Sites [34] dataset is commonly used because it provides as many as one million domain names, whereas OSINT [35] and DGArchive [36] are sources of malicious domain names. The UNSW-NB15 dataset [37] was established in 2015 at the University of New South Wales. It has 49 features and a total of almost 257,700 records covering nine different kinds of modern attacks. A systematic approach to generating benchmark datasets for intrusion detection has been presented in [38].

In recent years, the malware industry has become a well-organized market involving large amounts of money. Top apps in the Google Play Store [39] are the most common source of benign samples in malware experiments. While these apps are not guaranteed to be malware-free, they are the most likely to be so because of the combination of Google's vetting and the ubiquity of the apps; in addition, they are also vetted using the VirusTotal service [40]. Many malware datasets exist. The Genome Project dataset [41], for example, consists of 2123 apps, 1260 of which are malicious, covering 49 separate malware families. VirusShare [42] and VirusTotal [40] provide similar datasets. Another large dataset, containing 22,500 malicious and 22,500 benign raw files, is the Comodo dataset [43]. The Contagio [44] dataset contains 250 malicious files and is smaller than the others. The DREBIN dataset [45] is a highly imbalanced dataset containing 120,000 Android apps, 5000 of which are malicious. For a Kaggle competition, the Microsoft [46] dataset comprises 10,868 malware files, each given in hexadecimal and assembly representations, from nine different malware families. According to the statistical details reported in [47], there are some correlations between the datasets containing malicious data and the Google Play Store data. In addition, a broad synthetic dataset called the Computer Emergency Readiness Team (CERT) Insider Threat Dataset v6.2 [48, 49] is available for insider threat identification. This dataset includes 516 days of device logs containing over 130 million events, approximately 400 of which are malicious. Due to privacy issues, email datasets are difficult to obtain. Some common e-mail corpora, however, include EnronSpam [50], SpamAssassin [51], and LingSpam [52]. Bot-IoT [53] is a recent dataset that includes legitimate and simulated IoT network traffic along with various types of attacks, intended for network forensic analytics in the Internet of Things domain.

To examine the different trends of security incidents or malicious behavior, the above-discussed datasets could be used to construct a data-driven security model based on artificial neural networks and deep learning techniques. In Sect. 3, we discuss and review various ANN and DL methods, taking into account their applicability in various cybersecurity tasks.

ANN and Deep Learning in Cybersecurity

Deep learning (DL) is typically considered part of a broader family of machine learning methods as well as artificial intelligence (AI), and it originated from the artificial neural network (ANN) [9]. The main advantage of deep learning over traditional machine learning methods is its better performance in several cases, particularly when learning from large amounts of security data [19]. In the following, we discuss ten popular neural network and deep learning techniques, including supervised, semi-supervised, unsupervised, and reinforcement learning, in the context of cybersecurity. These techniques, or their ensembles and hybrid security models, can be used to intelligently tackle different cybersecurity issues, including intrusion detection, malware analysis, security threat analysis, and predicting cyberattacks or anomalies.

Multi‑layer Perceptron (MLP)

The multi-layer perceptron, a class of feedforward artificial neural network (ANN), is a supervised learning algorithm [54]. It is also considered the base architecture of deep learning or deep neural networks (DNN). A typical MLP is a fully connected network consisting of an input layer that receives the input data, an output layer that makes a decision or prediction about the input signal, and one or more hidden layers between these two [55], which are considered the true computational engine of the network, as shown in Fig. 2. Since MLPs are fully connected, each node in one layer connects with a certain weight to every node in the next layer. Activation functions such as ReLU (Rectified Linear Unit), Tanh, Sigmoid, and Softmax [54] determine the output of the network. These activation functions
(also known as transfer functions) introduce non-linear properties into the network, allowing it to learn complex functional mappings from data. For training, MLP uses a supervised learning technique called "Backpropagation" [56], which is the most "fundamental building block" of a neural network and the most widely used algorithm for training feedforward neural networks. The ultimate objective of the backpropagation algorithm is to optimize the network weights so that the inputs are accurately mapped to the target outputs. Various optimization techniques, such as Stochastic Gradient Descent (SGD), Limited-memory BFGS (L-BFGS), and Adaptive Moment Estimation (Adam) [54], are used during the training process. Such neural networks can be used to solve various issues in the domain of cybersecurity. For instance, MLP-based networks are used for building intrusion detection models [57], malware analysis [58], security threat analysis [59], detecting malicious botnet traffic [60], and building trustworthy IoT systems [61]. MLP is sensitive to feature scaling and requires tuning a range of hyperparameters, such as the number of hidden layers, neurons, and iterations, which can make it computationally expensive for complex security models. However, MLP has the advantage of learning non-linear models even in real time, i.e., in on-line learning using partial fit [54].

Fig. 2 An example of a feed-forward artificial neural network (ANN) with multiple hidden layers to detect cyber-anomalies or attacks

Convolutional Neural Network (CNN or ConvNet)

The convolutional neural network (CNN or ConvNet) [62] is a deep learning network architecture that learns directly from data, without the need for manual feature extraction. A typical CNN consists of an input layer, convolutional layers, pooling layers, fully connected layers, and an output layer, as shown in Fig. 3. The CNN thus improves on the architecture of the typical ANN, and CNNs are also considered regularized versions of multi-layer perceptrons. The parameters of each layer in a CNN are optimized to produce a meaningful outcome while reducing complexity. CNNs also commonly use 'dropout' [63], which helps address the over-fitting that may occur in a typical network.

Fig. 3 An example of a convolutional neural network (CNN or ConvNet) including multiple convolution and pooling layers

Convolutional neural networks are specifically designed to deal with the variability of 2D shapes [62]. In terms of application areas, CNNs are broadly used in image and video recognition, medical image analysis, recommender systems, image classification, image segmentation, natural language processing, financial time series, etc. Although CNNs are most commonly applied to analyzing visual imagery, these networks can also be used in the domain of cybersecurity. For instance, CNN-based deep learning models have been used for intrusion detection in IoT networks, e.g., detecting denial-of-service (DoS) attacks [64], for malware detection [65], and for Android malware detection [66]. In addition, a phishing detection model based on convolutional neural networks has been presented in [67], and a multi-CNN fusion-based model can be used for intrusion detection [68]. Although CNN has a greater computational burden, it has the advantage of automatically detecting important features without any human supervision, and thus CNN is considered more powerful than a typical ANN. Several advanced CNN-based deep learning models, such as AlexNet [69], Xception [70], Inception [71], visual geometry group (VGG) [72], and ResNet [73], or other lightweight architectures can be used to minimize these issues, depending on the problem domain and data characteristics.

Long Short‑Term Memory Recurrent Neural Network (LSTM‑RNN)

The recurrent neural network (RNN) [74] is another type of artificial neural network that can process a sequence of inputs and retain its state while processing the next input in the sequence. All RNNs have feedback loops in the recurrent layer, which allow them to maintain information in 'memory' over time. Long short-term memory (LSTM) networks are a type of RNN that uses special units in addition to standard units to deal with the vanishing gradient problem. LSTM units have a 'memory cell' that can store data
for long periods in memory. Figure 4 shows an example of a long short-term memory (LSTM) cell, where the 'Forget Gate', 'Input Gate', and 'Output Gate' work cooperatively to control the information flow in an LSTM unit [75]. The 'Forget Gate' decides which information from the previous cell state is kept and removes information that is no longer useful, the 'Input Gate' determines which information should enter the cell state, and the 'Output Gate' decides and controls the outputs.

Fig. 4 Basic structure of a long short-term memory (LSTM) unit

LSTM networks are well-suited for learning and analyzing sequential data, such as classifying, processing, and making predictions based on time-series data, which differentiates them from other conventional networks. Thus, LSTM is commonly applied in time-series prediction, time-series anomaly detection, natural language processing, question-answering chatbots, machine translation, speech recognition, etc. As large amounts of sequential security data, such as network traffic flows and time-dependent malicious activities, are generated these days, an LSTM model is also applicable in the domain of cybersecurity. Several LSTM-based security solutions have been studied in the area, such as intrusion detection [76], detecting and classifying malicious apps [77], phishing detection [78], and time-based botnet detection [79]. Although the main advantage of a recurrent network over a traditional network is its capability of modeling sequences of data, it may require a lot of resources and time to train. Considering this advantage, an effective LSTM-RNN network can improve security models for detecting security threats, particularly where the behavior patterns of the threats exhibit temporal dynamics.

Self‑organizing Map (SOM)

The self-organizing map (SOM), or Kohonen map [80], is a type of artificial neural network that follows an unsupervised learning approach. It uses a competitive learning algorithm to train its network, in which nodes compete for the right to respond to a subset of the input data. It learns the shape of a dataset by continuously moving its neurons nearer to the data points. Unlike other artificial neural networks that use error-correction learning, such as backpropagation with gradient descent [56], SOMs implement competitive learning and use a neighborhood function to preserve the topological properties of the input space. SOM is generally used for clustering [81] and for mapping a high-dimensional dataset to a low-dimensional (typically two-dimensional) discretized pattern, which reduces complex problems to an easily interpretable form; thus, it is also known as a dimensionality reduction algorithm. A Kohonen network or SOM, as shown in Fig. 5, consists of two layers of processing units called the input layer and the output layer. When an input pattern is fed to the network, the units in the output layer compete with each other, and the winning output unit is typically the one whose incoming link weights are closest to the input pattern, e.g., as measured by Euclidean distance [56].

Fig. 5 The self-organizing map (SOM) architecture

SOM has been widely used in, for instance, pattern recognition, health or medical diagnosis, anomaly recognition, and virus or worm attack detection [82, 83]. Several researchers have used SOM for different purposes in the domain of cybersecurity. For instance, in [84], the authors present a self-organizing map and its modeling for discovering malicious network traffic. To identify hierarchical relations within modern real-world datasets with mixed (numerical and categorical) attributes, the authors in [85] use the growing hierarchical self-organizing map (GHSOM) and the spark-GHSOM algorithm in their analysis. The authors in [86] have shown that SOMs have high potential as a data analytics tool for unknown traffic, where they can recognize botnet and normal flows with a high confidence of approximately 99%. SOMs are also used in [87] as a visual data mining technique for analyzing computer user behavior, security incidents, and fraud. The main
advantage of using a SOM is that the data are easily inter- of training. To enhance the intrusion detection method the
preted and understood. Thus, SOMs can play a significant authors in [95] use a stacked sparse auto-encoder. Thus, the
role to build a data-driven effective security model depend- AE-based model in the domain of cybersecurity can be use-
ing on the characteristics of the data. ful due to its capability to capture the main features of data.

Auto‑Encoder (AE) Restricted Boltzmann Machine (RBM)

An auto-encoder (AE) [74] is a type of artificial neural net- Boltzmann machines [96] are stochastic and generative neu-
work used in an unsupervised way to learn efficient data codings. The goal of an AE is to learn a representation for a data set, typically by training the network to ignore the 'noise' signal, for dimensionality reduction. An auto-encoder consists of three components, the encoder, the code, and the decoder, as shown in Fig. 6. The encoder compresses the input and generates the code, and the decoder then uses this code to reconstruct the input. One primary benefit of the AE is that, during propagation, the model can continuously extract useful features and filter out useless information [88]. A single-layer AE with a linear activation function is very similar to principal component analysis (PCA) [89], which is also used to reduce the dimensionality of large data sets.
The auto-encoder is widely used for unsupervised learning tasks, e.g., dimensionality reduction, feature extraction, efficient coding, and generative modeling [74, 90]. In the domain of cybersecurity, the deep AE can be used to build an effective security model, because an AE-based feature learning model typically uses fewer security features than other state-of-the-art algorithms. The resulting compact yet rich latent representation of the security features makes the model more effective and efficient, even on small devices such as smartphones and other internet of things (IoT) devices [91]. For example, the authors of [92] present an AE-based feature learning model for cybersecurity applications and demonstrate its efficacy for malware classification and the detection of network-based anomalies. An anomaly-based insider threat detection model using a deep AE is presented in [93]. In [94], the authors present a CNN-based Android malware detection model, where they use a deep AE as a pre-training tool to minimize the time
ral networks with only two types of nodes: visible nodes, which we can and do measure, and hidden nodes, which we cannot or do not measure. It is an unsupervised deep learning model in which every node is connected to every other node, which helps us understand abnormalities by learning how the system works under normal conditions.
Restricted Boltzmann machines (RBMs) [97] are a special class of Boltzmann machines with restricted connectivity: connections exist only between the visible layer and the hidden layer, never between two units of the same layer [96]. This restriction allows more efficient training algorithms than those available for general Boltzmann machines, in particular the gradient-based contrastive divergence algorithm [98]. Figure 7 illustrates an RBM consisting of $m$ visible units $V = (v_1, \ldots, v_m)$ representing observable data and $n$ hidden units $H = (h_1, \ldots, h_n)$ capturing dependencies between the observed variables.
The RBM plays an important role in dimensionality reduction, classification, regression, collaborative filtering, feature learning, topic modeling, and many other machine learning and deep learning tasks. In the domain of cybersecurity, the RBM can be used to build an effective security model. For example, the authors of [99] present network anomaly detection with the restricted Boltzmann machine, investigating how well the model combines the expressive power of generative models with the ability to infer part of its information from incomplete training data while keeping good classification accuracy. To increase the accuracy of DoS attack detection, the authors of [100] present a deep learning method based on a restricted Boltzmann machine. In [101], the authors present

Fig. 6 A structure of an auto-encoder (AE) with the components
Fig. 7 A graphical representation of a restricted Boltzmann machine (RBM) with m visible and n hidden nodes

SN Computer Science
154 Page 8 of 16 SN Computer Science (2021) 2:154

an approach for improving network intrusion detection accuracy by using an RBM that composes new data by removing noise and outliers from the input data. Overall, the restricted Boltzmann machine can automatically recognize patterns in data and build probabilistic or stochastic models that incorporate randomness, which are used for feature selection and feature extraction, as well as to form a deep belief network.
Deep Belief Networks (DBN)
A deep belief network (DBN) [102] is a generative graphical model, or probabilistic generative model, consisting of stacked restricted Boltzmann machines (RBMs), discussed earlier. As shown in Fig. 8, it is a type of deep neural network (DNN) with multiple RBMs and a back-propagation (BP) [56] neural network. A DBN can capture a hierarchical representation of the input data through its deep structure. Training is conducted in two sequential phases: (1) pre-training, i.e., unsupervised layer-wise learning of the stacked RBMs, where the layers act as feature detectors by probabilistically reconstructing their inputs, trained with the contrastive divergence technique [98]; and (2) fine-tuning, i.e., supervised learning with a classifier, e.g., a BP neural network. The main idea of a DBN is to initialize a feed-forward neural network through unsupervised pre-training on unlabeled data and then fine-tune the network using labeled data. DBNs can be seen as a composition of simple, unsupervised networks such as RBMs or auto-encoders, where each sub-network's hidden layer serves as the visible layer of the next [103].
In the area of cybersecurity, DBNs can be used in a large number of high-dimensional data applications. For instance, the authors of [104] used the DBN model as a feature reduction method to build an effective cybersecurity model, e.g., an intrusion detection scheme. In [105], an intrusion detection model based on a deep belief network is presented; the authors' experimental findings on the NSL-KDD dataset show that the DBN-based intrusion detection model achieves better classification results than SVM and takes less time to build, which significantly improves the speed of intrusion detection. In [103], the authors present an optimization technique for a DBN-based intrusion detection classification model and report higher detection speed and accuracy. Overall, DBN-based security models can play a significant role in the many high-dimensional data applications of cybersecurity, owing to their strong capability for feature extraction and classification.
Generative Adversarial Network (GAN)
A generative adversarial network (GAN) is a class of machine learning frameworks introduced by Ian Goodfellow et al. [106], and is considered one of the most interesting ideas in the area. A GAN is composed of two neural networks, a generator G and a discriminator D, as shown in Fig. 9, which are trained to compete with each other. The role of the generator is to generate new data with characteristics close to the actual input data, while the discriminator is trained to estimate the probability that a given sample comes from the actual data rather than from the generator.
GANs are widely used in natural image synthesis, medical image analysis, bioinformatics, data augmentation, video generation, voice generation, etc. They are also useful in the domain of cybersecurity. Attackers may use adversarial attacks to access and manipulate user data, so advanced security measures are needed to avoid leakage and misuse of sensitive information. GANs can therefore be trained to recognize such cases of fraud and to make deep learning models more robust.
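The competition between generator and discriminator can be made concrete with a deliberately tiny example. The following is a minimal sketch in pure Python, assuming a one-dimensional toy problem: "real" data drawn from a normal distribution with mean 4, an affine generator G(z) = a*z + b, and a logistic discriminator D(x) = sigmoid(w*x + c). These model choices, the learning rate, and all class and function names are illustrative assumptions, not the setup of any work cited in this section. The discriminator descends the binary cross-entropy of separating real from generated samples; the generator descends the non-saturating loss -log D(G(z)), the variant suggested in the original GAN paper as a practical alternative to minimizing log(1 - D(G(z))).

```python
import math
import random


def sigmoid(u: float) -> float:
    """Numerically stable logistic function."""
    if u >= 0:
        return 1.0 / (1.0 + math.exp(-u))
    e = math.exp(u)
    return e / (1.0 + e)


def softplus(u: float) -> float:
    """Stable log(1 + exp(u)); note that -log(sigmoid(u)) == softplus(-u)."""
    return max(u, 0.0) + math.log1p(math.exp(-abs(u)))


class Discriminator:
    """Toy D(x) = sigmoid(w*x + c): estimated probability that x is real data."""

    def __init__(self, w: float = 0.1, c: float = 0.0):
        self.w, self.c = w, c

    def logit(self, x: float) -> float:
        return self.w * x + self.c

    def prob_real(self, x: float) -> float:
        return sigmoid(self.logit(x))


class Generator:
    """Toy G(z) = a*z + b: maps noise z ~ N(0, 1) to a synthetic sample."""

    def __init__(self, a: float = 1.0, b: float = 0.0):
        self.a, self.b = a, b

    def sample(self, z: float) -> float:
        return self.a * z + self.b


def discriminator_loss(D, real, fake):
    # Binary cross-entropy with label 1 for real samples and 0 for generated ones.
    return (sum(softplus(-D.logit(x)) for x in real) / len(real)
            + sum(softplus(D.logit(x)) for x in fake) / len(fake))


def generator_loss(D, G, noise):
    # Non-saturating generator loss: mean of -log D(G(z)).
    return sum(softplus(-D.logit(G.sample(z))) for z in noise) / len(noise)


def discriminator_step(D, real, fake, lr=0.05):
    """One gradient-descent step on the discriminator loss, with G held fixed."""
    gw = gc = 0.0
    for x in real:  # d/du softplus(-u) = -(1 - sigmoid(u))
        g = -(1.0 - D.prob_real(x)) / len(real)
        gw += g * x
        gc += g
    for x in fake:  # d/du softplus(u) = sigmoid(u)
        g = D.prob_real(x) / len(fake)
        gw += g * x
        gc += g
    D.w -= lr * gw
    D.c -= lr * gc


def generator_step(D, G, noise, lr=0.05):
    """One gradient-descent step on the generator loss, with D held fixed."""
    ga = gb = 0.0
    for z in noise:
        # Chain rule: dLoss/dx = -(1 - D(x)) * w, with dx/da = z and dx/db = 1.
        g = -(1.0 - D.prob_real(G.sample(z))) * D.w / len(noise)
        ga += g * z
        gb += g
    G.a -= lr * ga
    G.b -= lr * gb


if __name__ == "__main__":
    rng = random.Random(0)
    D, G = Discriminator(), Generator()
    for _ in range(500):
        real = [rng.gauss(4.0, 1.0) for _ in range(32)]  # toy "real" data
        noise = [rng.gauss(0.0, 1.0) for _ in range(32)]
        discriminator_step(D, real, [G.sample(z) for z in noise])
        generator_step(D, G, noise)
    print(f"generator offset b = {G.b:.2f} (real data mean is 4.0)")
```

Because this toy discriminator is linear in x, it can essentially only detect, and therefore only push the generator to match, a difference in the mean of the two distributions; practical GANs use deep networks for both G and D. Plain alternating gradient steps may also oscillate around the equilibrium instead of settling, which is one reason GAN training is known to be delicate.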

Fig. 8 Schematic structure of a deep belief network (DBN) with several layers
Fig. 9 Schematic structure of a generative adversarial network (GAN)


Several works have applied GANs in the domain of cybersecurity. The authors of [107], for instance, present a transferred generative adversarial network (tGAN) for automatic zero-day attack classification and detection, which outperforms traditional machine learning algorithms. In [108], the authors present a zero-day malware detection strategy using transferred generative adversarial networks based on deep auto-encoders, which generate fake malware and learn to distinguish it from real malware; they achieve 95.74% average classification accuracy in their experimental study. In [109], a GAN-based system for enhancing botnet detection models (Bot-GAN) is presented, which improves detection efficiency and decreases the false positive rate. A new GAN-based adversarial-example attack method is implemented in [110], which outperforms the state-of-the-art method by 247.68%. In [111], the authors explore GANs to improve the training, and ultimately the performance, of cyberattack detection systems by balancing data sets with generated data. The model generates data that closely mimics the distribution of data from various types of attacks and is used to balance previously imbalanced data sets, which is a viable solution for designing cyberattack intrusion detection systems. Since the main objective of a GAN is to learn from a collection of training data and generate new data with the same characteristics, GANs are useful not only for unsupervised learning but also, depending on the task, for semi-supervised learning, fully supervised learning, and reinforcement learning.
Deep Transfer Learning (DTL or Deep TL)
In machine and deep learning, transfer learning is an important method for solving the fundamental problem of insufficient training data. It reduces the need to train models from scratch, because it allows neural networks to be trained with relatively small amounts of data [112]. It is currently very common in data science, since most real-world problems do not have millions of labeled data points with which to train such complex models. As shown in Fig. 10, transfer learning takes models pre-trained on a source domain and uses them for tasks in a target domain. Transfer learning can be classified into three sub-settings [113], based on the relationship between the source and target domains and tasks:
• Inductive transfer learning: the target task differs from the source task. Relevant approaches include instance transfer, feature representation transfer, parameter transfer, and relational knowledge transfer.
• Transductive transfer learning: the source and target tasks are the same, while the source and target domains are different. Relevant approaches include instance transfer and feature representation transfer.
• Unsupervised transfer learning: as in inductive transfer learning, the target task is different from, but related to, the source task; in this setting, however, no labeled data are available in either domain. It is typically studied in the context of feature representation transfer.
Fig. 10 Learning process of transfer learning
Deep transfer learning is applicable in various areas such as natural language processing (NLP), sentiment classification, computer vision, image classification, speech recognition, medical imaging, and spam filtering. In the domain of cybersecurity, it also plays an important role owing to its modeling advantages, such as saving training time, improving output accuracy, and requiring less training data. For instance, the authors of [114] present a ConvNet model using transfer learning for network intrusion detection. In [115], the authors propose a signature generation method based on deep feature transfer learning that dramatically reduces signature generation and distribution time. In [116], a higher classification accuracy of 99.5% is achieved. In [117], the authors address transfer learning for identifying unknown network attacks, presenting a feature-based transfer learning approach that uses a linear transformation. A semi-supervised transfer learning model for malware detection is discussed in [118], where the transfer variable improves the byte classifier's accuracy from 94.72% to 96.90%. In [119], the authors classify malicious software using transfer learning with the ResNet-50 deep neural network; their experimental findings on a sample indicate 98.62% accuracy in classifying malware groups. In [120], the authors present deep transfer learning for IoT attack detection, achieving significantly better accuracy than the baseline deep learning technique. Overall, transfer learning can significantly accelerate the training of very deep neural networks in the field of cybersecurity while retaining high efficiency, even


on smaller datasets. Thus, instead of training a neural network from scratch, cybersecurity professionals can start from a pre-trained, open-source deep learning model and fine-tune it for their purpose.
Deep Reinforcement Learning (DRL or Deep RL)
Deep reinforcement learning (DRL or deep RL) [135] is a category of machine learning and AI in which intelligent machines learn from their actions, similar to the way humans learn from experience. It combines reinforcement learning (RL) algorithms, such as Q-learning, with deep learning, i.e., neural network learning, each defined below.
• Reinforcement learning (RL): the task of learning how agents in an environment can take sequences of actions to maximize cumulative rewards. RL considers the problem of a computational agent learning to make decisions by trial and error.
• Deep learning: a form of machine learning that uses multiple layers to progressively extract higher-level features from the raw input and make intelligent decisions through neural network learning.
Deep RL thus incorporates deep learning models, e.g., a deep neural network (DNN), as policy and/or value function approximators, based on the Markov decision process (MDP) principle [131]. An MDP is “a tuple $\langle S, A, T, R \rangle$, where S is a set of states, A is a set of actions, T is a mapping defining the transition probabilities from every state-action pair to every possible new state, and R is a reward function which associates a real value (reward) to every state-action pair”. Figure 11 provides an example of a deep RL schematic structure. The learning system aims to let the agent learn an optimized sequence of actions that maximizes the total reward.
Deep RL can be used in the domain of cybersecurity. For instance, the authors of [131] demonstrate that deep RL models using a deep Q-network (DQN) and a double deep Q-network (DDQN) achieve significantly better intrusion detection results than traditional machine learning models. Similarly, a deep RL-based adaptive intrusion detection framework using a DQN for cloud infrastructure is presented in [132], where the authors experimentally report higher accuracy and lower false-positive rates in detecting and identifying new and complex attacks.
Based on our study above, we have summarized the key points of each neural network and deep learning technique in Table 2. In Table 3, we have also summarized several cybersecurity applications based on these techniques. Moreover, hybrid network models, e.g., ensembles of networks, can be used to build an effective model that combines the advantages of its components. For instance, an LSTM network combined with a CNN can be used to detect cyberattacks, e.g., for malware detection [65] and for detecting and mitigating phishing and botnet attacks across multiple IoT devices [136]. Thus, we can conclude that the artificial neural network and deep learning techniques discussed above, as well as their variants and modified approaches, can play a significant role in meeting current needs in the context of cybersecurity.
Table 2 A summary of artificial neural network (ANN) and deep learning (DL) networks highlighting the key points
ANN and DL techniques | Descriptive key points
Multi-layer perceptron (MLP) | Supervised learning algorithm; a feed-forward, fully connected artificial neural network; computationally expensive for solving complex problems
Convolutional neural network (CNN, or ConvNet) | Regularized version of multi-layer perceptrons; can automatically learn or detect the key features from data; typically deals with the variability of 2D shapes, e.g., images
Long short-term memory recurrent neural network (LSTM-RNN) | Well-suited for learning and analyzing sequential data; preferred for NLP tasks, speech processing, and making predictions based on time-series data
Self-organizing map (SOM) | Follows an unsupervised learning approach; a dimensionality reduction algorithm used for clustering and for mapping a high-dimensional dataset to a low-dimensional one; uses competitive learning rather than backpropagation
Auto-encoder (AE) | An unsupervised learning algorithm that learns a representation of the inputs and is deterministic; significantly reduces the noise in the input data; typically used for dimensionality reduction, very similar to PCA
Restricted Boltzmann machine (RBM) | An unsupervised learning algorithm that learns a statistical distribution and is probabilistic or stochastic; used for feature selection and feature extraction; constitutes the building blocks of deep belief networks
Deep belief networks (DBN) | A probabilistic generative model with multiple RBMs; able to encode richer, higher-order network structures and can work in either an unsupervised or a supervised setting; can be used in a large number of high-dimensional data applications
Generative adversarial network (GAN) | A form of generative model typically used for unsupervised learning; generates new, synthetic instances of data with characteristics close to the actual input data; makes deep learning models more robust
Deep transfer learning (DTL or deep TL) | Solves the basic problem of insufficient training data; uses a pre-trained model, transferring knowledge from one model to another; advantages in modeling such as saving training time, improving output accuracy, and requiring less training data
Deep reinforcement learning (DRL) | Follows the way humans learn from experience; combines reinforcement learning (RL) algorithms such as Q-learning with deep learning; can be used to solve very complex problems that cannot be solved by conventional techniques
Challenges and Research Directions
Our study on ANN- and DL-based security analytics opens several research issues in the area of cybersecurity. In this section, we therefore summarize and discuss the challenges faced, the potential research opportunities, and future directions for making networks and systems secure, automated, and intelligent.
In general, the effectiveness and efficiency of an ANN- or DL-based security solution depend on the nature and characteristics of the security data and on the performance of the learning algorithms. Collecting security data in the domain of cybersecurity is not straightforward. Today's cyberspace produces a huge amount of data at very high frequency from different domains. Collecting useful data for the target application, e.g., security in smart city applications, and managing it are therefore important for further analysis, and a more in-depth investigation of data collection methods is needed when working with cybersecurity data. The historical security data discussed in Sect. 2 may contain many ambiguous values, missing values, outliers, and meaningless data. The ANN and DL algorithms discussed in Sect. 3, including supervised, unsupervised, and reinforcement learning, are highly affected by the quality and availability of training data, and consequently so is the resulting security model. Accurately cleaning and pre-processing diverse security data collected from diverse sources is thus a challenging task. Therefore, existing pre-processing methods need to be improved, or new data preparation techniques proposed, to use the learning algorithms effectively in the domain of cybersecurity.
Many neural network and deep learning algorithms exist for analyzing data, extracting insights, and building a security model, as discussed briefly in Sect. 3. Selecting a learning algorithm that is suitable for the target cybersecurity application is therefore challenging, because the outcomes of different ANN and DL algorithms may vary depending on the data characteristics [137]. We have summarized several key points of these techniques in Table 2. Selecting the wrong learning algorithm may produce unexpected outcomes, wasting effort and reducing the model's effectiveness and accuracy. In terms of model building, the techniques discussed in Sect. 3 can be used directly to solve many security issues. However, hybrid network models (e.g., ensembles of networks), improved variants of existing methods, newly designed methods, or combinations with machine learning techniques [137, 138], chosen according to the target outcome, are potential directions for future work in the area.
Similarly, irrelevant security data and features may result in garbage-in, garbage-out processing and incorrect results, which is another important issue in the area. If the security data are bad, e.g., non-representative, of poor quality, dominated by irrelevant features, or insufficient in quantity for training, the deep learning security models may become useless or produce lower accuracy. Relevant, high-quality security data are thus important for better outcomes. In addition to the security features, broader contextual information [139, 140, 141], such as temporal context, spatial context, or the relationships and dependencies among events, network connections, or users, might help to build an adaptive system. Recent pattern-based analysis, i.e., recency [142], and the design of corresponding learning techniques for cybersecurity solutions could also be effective, depending on the problem domain. Overall, we can conclude that the success of a data-driven security solution depends on both

Table 3 A summary of cybersecurity tasks based on artificial neural network (ANN) and deep learning (DL) techniques
Used techniques | Cybersecurity tasks | References
Multi-layer perceptron (MLP) | intrusion detection, malware analysis, detecting botnet traffic, security threat analysis | Florencio et al. [57], Karbab et al. [58], Javed et al. [60], Hodo et al. [59]
Convolutional neural network (CNN, or ConvNet) | intrusion detection, malware detection, phishing detection, malicious user detection | Susilo et al. [64], Li et al. [68], Yan et al. [65], McLaughlin et al. [66], Xiao et al. [67], Adebowale et al. [78], Hong et al. [121]
Long short-term memory recurrent neural network (LSTM-RNN) | intrusion detection, malicious activity detection, phishing detection, time-based botnet detection, authentication modeling | Kim et al. [76], Vinayakumar et al. [77], Adebowale et al. [78], Li et al. [122], Tran et al. [79], Shi et al. [123], Abuhamad et al. [124]
Self-organizing map (SOM) | discovering malignant network traffic, modern botnet analysis, distributed clustering | Langin et al. [84], Le et al. [86], Malondkar et al. [85]
Auto-encoder (AE) | feature learning model, insider threat detection, malware detection, intrusion detection system | Yousefi et al. [92], Liu et al. [93], Wang et al. [94], Yan et al. [95]
Restricted Boltzmann machine (RBM) | network anomaly detection, DoS attack detection, intrusion detection | Fiore et al. [99], Imamverdiyev et al. [100], Mayuranathan et al. [125], Alom et al. [126]
Deep belief networks (DBN) | intrusion detection system and optimization, phishing detection, malware detection | Salama et al. [104], Qu et al. [105], Wei et al. [103], Yi et al. [127], Arshey et al. [128], Saif et al. [129], Hou et al. [130]
Generative adversarial network (GAN) | zero-day malware detection, botnet detection, intrusion detection systems | Kim et al. [108], Li et al. [110], Yin et al. [109], Merino et al. [111]
Deep transfer learning (DTL or deep TL) | intrusion detection system, detecting unknown network attacks, malware detection, malicious software classification | Wu et al. [114], Zhao et al. [117], Gao et al. [118], Rezende et al. [119]
Deep reinforcement learning (DRL or deep RL) | intrusion detection system, malware detection, security and privacy | Lopez et al. [131], Sethi et al. [132], Fang et al. [133], Shakeel et al. [134]

the quality of the security data and the performance of the learning algorithms.
Fig. 11 Schematic structure of deep reinforcement learning (DRL or deep RL)
Concluding Remarks
In this paper, we have presented a comprehensive overview of cybersecurity from the perspective of artificial neural networks and deep learning methods. We have also reviewed recent studies in each category of neural networks to position this paper. In line with our goal, we have briefly discussed how various types of neural networks and deep learning methods can be used for cybersecurity solutions under various conditions. A successful security model must employ deep learning modeling appropriate to the data characteristics. The sophisticated learning algorithms then need to be trained on the collected security data and on knowledge related to the target application before the system can assist with intelligent decision making.
Finally, we have summarized and discussed the challenges faced, the potential research opportunities, and future directions in the area. The identified challenges create promising research opportunities in the field, which must be addressed with effective solutions to enhance security over time as these techniques grow in popularity. Overall, we believe that our study on neural network and deep learning-based security analytics opens a promising direction and can be used as a reference guide for potential research and applications by both academia and industry professionals in the domain of cybersecurity.
Declarations
Conflict of interest: The author declares no conflict of interest.
References
1. Li S, Da LX, Zhao S. The internet of things: a survey. Inf Syst Front. 2015;17(2):243–59.
2. McIntosh T, Jang-Jaccard J, Watters P, Susnjak T. The inadequacy of entropy-based ransomware detection. In: International conference on neural information processing. Springer; 2019. pp. 181–189.
3. Alazab M, Venkatraman S, Watters P, Alazab M et al. Zero-day malware detection based on supervised learning algorithms of API call signatures. 2010.
4. Sun N, Zhang J, Rimba P, Gao S, Zhang LY, Xiang Y. Data-driven cybersecurity incident prediction: a survey. IEEE Commun Surv Tutor. 2018;21(2):1744–72.
5. Abraham S. Data breach: from notification to prevention using PCI DSS. Colum JL Soc Probs. 2009;43:517.
6. Brij BG, Aakanksha T, Ankit KJ, Dharma PA. Fighting against phishing attacks: state of the art and future challenges. Neural Comput Appl. 2017;28(12):3629–54.
7. IBM security report. https://www.ibm.com/security/data-breach. Accessed 20 Oct 2019.
8. Fischer EA. Cybersecurity issues and challenges: In brief. 2014.
9. Sarker IH, Kayes ASM, Badsha S, Alqahtani H, Watters P, Ng A. Cybersecurity data science: an overview from machine learning perspective. J Big Data. 2020;7(1):1–29.
10. Steven A. Cybersecurity: the cold war online. Nature. 2017;547(7661):30.
11. Anwar S, Mohamad Zain J, Zolkipli MF, Inayat Z, Khan S, Anthony B, Chang V. From intrusion detection to an intrusion response system: fundamentals, requirements, and future directions. Algorithms. 2017;10(2):39.
12. Sara M, Hamid M, Mostafa G-A, Hadis K. Cyber intrusion detection by combined feature selection algorithm. J Inf Secur Appl. 2019;44:80–8.
13. Tapiador JE, Orfila A, Ribagorda A, Ramos B. Key-recovery attacks on KIDS, a keyed anomaly detection system. IEEE Trans Depend Secure Comput. 2013;12(3):312–25.
14. Tavallaee M, Stakhanova N, Ghorbani AA. Toward credible evaluation of anomaly-based intrusion-detection methods. IEEE Trans Syst Man Cybern Part C (Appl Rev). 2010;40(5):516–24.
15. Farhad F, Peter L. Data science methodology for cybersecurity projects. arXiv preprint arXiv:1803.04219. 2018.
16. Saxe J, Sanders H. Malware data science: attack detection and attribution. 2018.
17. Ślusarczyk B. Industry 4.0: Are we ready? Pol J Manag Stud. 2018;17.
18. Google trends. https://trends.google.com/trends/. 2021.
19. Yang X, Lingshuang K, Zhi L, Yuling C, Yanmiao L, Hongliang Z, Mingcheng G, Haixia H, Chunhua W. Machine learning and deep learning methods for cybersecurity. IEEE Access. 2018;6:35365–81.
20. Aya R, Ahmed E. Data science: developing theoretical contributions in information systems via text analytics. J Big Data. 2020;7(1):1–26.
21. Lippmann RP, Fried DJ, Graf I, Haines JW, Kendall KR, McClung D, Weber D, Webster SE, Wyschogrod D, Cunningham RK, et al. Evaluating intrusion detection systems: the 1998 DARPA off-line intrusion detection evaluation. In: Proceedings DARPA information survivability conference and exposition. DISCEX’00, vol 2. IEEE; 2000. pp. 12–26.
22. KDD Cup 99. Available online: http://kdd.ics.uci.edu/databases/kddcup99/kddcup99.html. Accessed 20 Oct 2019.
23. Tavallaee M, Bagheri E, Lu W, Ghorbani AA. A detailed analysis of the KDD cup 99 data set. In: 2009 IEEE symposium on


computational intelligence for security and defense applications. 49. Joshua G, Brian L. Bridging the gap: a pragmatic approach to
IEEE; 2009, pp. 1–6. generating insider threat data. In: 2013 IEEE security and pri-
24. Sarker IH, Abushark YB, Alsolami F, Khan AI. Intrudtree: a vacy workshops, pp. 98–104. IEEE. 2013.
machine learning based cyber security intrusion detection model. 50. Enronspam. https ://labs-repos .iit.demok ritos .gr/skel/i-confi g/
Symmetry. 2020;12(5):754. downloads/enron-spam/. Accessed 20 Oct 2019.
25. Canadian institute of cybersecurity, university of new brun- 51. Spamassassin. available online: http://www.spamassassin.org/
swick, ISCX dataset. http://www.unb.ca/cic/datasets/index.html/. publiccorpus/. Accessed 20 Oct 2019.
Accessed 20 Oct 2019. 52. Lingspam. https ://labs-repos .iit.demok ritos .gr/skel/i-confi g/
26. CSE-CIC-IDS 2018 [online]. https://www.unb.ca/cic/ datasets/ downloads/lingspampublic.tar.gz/. Accessed 20 Oct 2019.
ids-2018.html/. Accessed 20 Oct 2019. 53. Nickolaos K, Nour M, Elena S, Benjamin T. Towards the devel-
27. Xuyang J, Zheng Y, Xueqin J, Witold P. Network traffic fusion opment of realistic botnet dataset in the internet of things for
and analysis against DDOS flooding attacks with a novel revers- network forensic analytics: Bot-iot dataset. Future Gener Comput
ible sketch. Inf Fusion. 2019;51:100–13. Syst. 2019;100:779–96.
28. Xie M, Hu J, Yu CE. Evaluating host-based anomaly detection 54. Pedregosa F, Varoquaux G, Gramfort A, Michel V, Thirion B,
systems: application of the frequency-based algorithms to adfa- Grisel O, Blondel M, Prettenhofer P, Weiss R, Dubourg V, et
ld. In: International conference on network and system security. al. Scikit-learn: machine learning in python. J Mach Learn Res.
Springer (2015). 2011;12:2825–30.
29. Caida ddos attack 2007 dataset. http://www.caida.org/data/passive/ddos-20070804-dataset.xml/. Accessed 20 Oct 2019.
30. Caida anonymized internet traces 2008 dataset. http://www.caida.org/data/passive/passive-2008-dataset.xml/. Accessed 20 Oct 2019.
31. Isot botnet dataset. https://www.uvic.ca/engineering/ece/isot/datasets/index.php/. Accessed 20 Oct 2019.
32. The honeynet project. http://www.honeynet.org/chapters/france/. Accessed 20 Oct 2019.
33. The ctu-13 dataset. https://stratosphereips.org/category/datasets-ctu13. Accessed 20 Oct 2019.
34. Alexa top sites. https://aws.amazon.com/alexa-top-sites/. Accessed 20 Oct 2019.
35. Bambenek consulting–master feeds. http://osint.bambenekconsulting.com/feeds/. Accessed 20 Oct 2019.
36. Dgarchive. https://dgarchive.caad.fkie.fraunhofer.de/site/. Accessed 20 Oct 2019.
37. Moustafa N, Slay J. UNSW-NB15: a comprehensive data set for network intrusion detection systems (UNSW-NB15 network data set). In: 2015 military communications and information systems conference (MilCIS). IEEE; 2015, pp. 1–6.
38. Shiravi A, Shiravi H, Tavallaee M, Ghorbani AA. Toward developing a systematic approach to generate benchmark datasets for intrusion detection. Comput Secur. 2012;31(3):357–74.
39. Google play store. Available online: https://play.google.com/store/. Accessed 20 Oct 2019.
40. Virustotal. https://virustotal.com/. Accessed 20 Oct 2019.
41. Zhou Y, Jiang X. Dissecting android malware: characterization and evolution. In: 2012 IEEE symposium on security and privacy. IEEE; 2012. pp. 95–109.
42. Virusshare. http://virusshare.com/. Accessed 20 Oct 2019.
43. Comodo. https://www.comodo.com/home/internet-security/updates/vdp/database.php. Accessed 20 Oct 2019.
44. Contagio. http://contagiodump.blogspot.com/. Accessed 20 Oct 2019.
45. Kumar R, Zhang X, Ullah Khan R, Kumar J, Ahad I. Effective and explainable detection of android malware based on machine learning algorithms. In: Proceedings of the 2018 international conference on computing and artificial intelligence. ACM; 2018. pp. 35–40.
46. Microsoft malware classification (big 2015). http://arxiv.org/abs/1802.10135/. Accessed 20 Oct 2019.
47. Berman DS, Buczak AL, Chavis JS, Corbett CL. A survey of deep learning methods for cyber security. Information. 2019;10(4):122.
48. Lindauer B, Glasser J, Rosen M, Wallnau KC, Exactdata L. Generating test data for insider threat detectors. JoWUA. 2014;5(2):80–94.
55. Sarker IH. AI-driven cybersecurity: an overview, security intelligence modeling and research directions. 2021.
56. Jiawei H, Jian P, Micheline K. Data mining: concepts and techniques. Amsterdam: Elsevier; 2011.
57. Felipe De AF, Edward DMO, Hendrik TM, Ricardo JPDBS, Filipe Barreto Do N, Flavio AOS. Intrusion detection via MLP neural network using an arduino embedded system. In: 2018 VIII Brazilian symposium on computing systems engineering (SBESC), pp. 190–195. IEEE. 2018.
58. ElMouatez BK, Mourad D, Abdelouahid D, Djedjiga M. Maldozer: Automatic framework for android malware detection using deep learning. Digit Investig. 2018;24:S48–59.
59. Hodo E, Bellekens X, Hamilton A, Dubouilh P-L, Iorkyase E, Christos T, Robert A. Threat analysis of IoT networks using artificial neural network intrusion detection system. In: 2016 international symposium on networks, computers and communications (ISNCC). IEEE; 2016, pp. 1–6.
60. Yousra J, Navid R. Multi-layer perceptron artificial neural network based IoT botnet traffic classification. In: Proceedings of the future technologies conference. Springer; 2019, pp. 973–84.
61. Iván G-M, Rajarajan M, Jaime L. Human-centric AI for trustworthy IoT systems with explainable multilayer perceptrons. IEEE Access. 2019;7:125562–74.
62. Yann LC, Léon B, Yoshua B, Patrick H. Gradient-based learning applied to document recognition. Proc IEEE. 1998;86(11):2278–324.
63. Aurélien G. Hands-on machine learning with Scikit-Learn, Keras, and TensorFlow: concepts, tools, and techniques to build intelligent systems. O’Reilly Media, 2019.
64. Susilo B, Sari RF. Intrusion detection in IoT networks using deep learning algorithm. Information. 2020;11(5):279.
65. Yan J, Qi Y, Rao Q. Detecting malware with an ensemble method based on deep neural network. Secur Commun Netw. 2018; 2018.
66. McLaughlin N, Martinez del RJ, Kang BJ, Yerima S, Miller P, Sezer S, Safaei Y, Trickel E, Zhao Z, Doupé A et al. Deep android malware detection. In: Proceedings of the seventh ACM on conference on data and application security and privacy; 2017. pp. 301–308.
67. Xiao X, Zhang D, Hu G, Jiang Y, Xia S. CNN-MHSA: a convolutional neural network and multi-head self-attention combined approach for detecting phishing websites. Neural Netw (2020).
68. Yanmiao L, Yingying X, Zhi L, Haixia H, Yushuo Z, Yang X, Yuefeng Z, Lizhen C. Robust detection for network intrusion of industrial IoT based on multi-CNN fusion. Measurement. 2020;154:107450.


69. Krizhevsky A, Sutskever I, Hinton GE. Imagenet classification with deep convolutional neural networks. In: Advances in neural information processing systems; 2012, pp. 1097–1105.
70. Chollet F. Xception: Deep learning with depthwise separable convolutions. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp. 1251–1258. 2017.
71. Szegedy C, Liu W, Jia Y, Sermanet P, Reed S, Anguelov D, Erhan D, Vanhoucke V, Rabinovich A. Going deeper with convolutions. In: Proceedings of the IEEE conference on computer vision and pattern recognition; 2015, pp. 1–9.
72. Kaiming H, Xiangyu Z, Shaoqing R, Jian S. Spatial pyramid pooling in deep convolutional networks for visual recognition. IEEE Trans Pattern Anal Mach Intell. 2015;37(9):1904–16.
73. Kaiming H, Xiangyu Z, Shaoqing R, Jian S. Deep residual learning for image recognition. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp. 770–778. 2016.
74. Ian G, Yoshua B, Aaron C. Deep learning, vol. 1. Cambridge: MIT Press; 2016.
75. Changhui J, Yuwei C, Shuai C, Yuming B, Wei L, Wenxin T, Jun G. A mixed deep recurrent neural network for mems gyroscope noise suppressing. Electronics. 2019;8(2):181.
76. Jihyun K, Jaehyun K, Huong LTT, Howon K. Long short term memory recurrent neural network classifier for intrusion detection. In: 2016 international conference on platform technology and service (PlatCon). IEEE; 2016. pp. 1–5.
77. Vinayakumar R, Soman KP, Poornachandran P. Deep android malware detection and classification. In: 2017 International conference on advances in computing, communications and informatics (ICACCI). IEEE; 2017, pp. 1677–1683.
78. Adebowale MA, Lwin KT, Hossain MA. Intelligent phishing detection scheme using deep learning algorithms. J Enterp Inf Manag. 2020.
79. Tran D, Mac H, Tong V, Tran HA, Nguyen LG. A LSTM based framework for handling multiclass imbalance in DGA botnet detection. Neurocomputing. 2018;275:2401–13.
80. Teuvo K. The self-organizing map. Proc IEEE. 1990;78(9):1464–80.
81. Juha V, Esa A. Clustering of the self-organizing map. IEEE Trans Neural Netw. 2000;11(3):586–600.
82. Teuvo K. Essentials of the self-organizing map. Neural Netw. 2013;37:52–65.
83. Qu X, Yang L, Guo K, Ma L, Sun M, Ke M, Li M. A survey on the development of self-organizing maps for unsupervised intrusion detection. Mob Netw Appl. 2019; 1–22.
84. Langin C, Zhou H, Rahimi S, Gupta B, Zargham M, Sayeh MR. A self-organizing map and its modeling for discovering malignant network traffic. In: 2009 IEEE symposium on computational intelligence in cyber security. IEEE, 2009; pp. 122–129.
85. Ameya M, Roberto C, Iluju K, Michelangelo C, Nathalie J. Spark-GHSOM: growing hierarchical self-organizing map for large scale mixed attribute datasets. Inf Sci. 2019;496:572–91.
86. Le Duc C, Zincir-Heywood AN, Heywood MI. Data analytics on network traffic flows for botnet behaviour detection. In: 2016 IEEE symposium series on computational intelligence (SSCI), pp. 1–7. IEEE, 2016.
87. López AU, Mateo F, Navío-Marco J, Martínez-Martínez JM, Gómez-Sanchís J, Vila-Francés J, José Serrano-López A. Analysis of computer user behavior, security incidents and fraud using self-organizing maps. Comput Secur. 2019;83:38–51.
88. Liu W, Wang Z, Liu X, Zeng N, Liu Y, Alsaadi FE. A survey of deep neural network architectures and their applications. Neurocomputing. 2017;234:11–26.
89. Sarker IH, Abushark YB, Khan AI. Contextpca: Predicting context-aware smartphone apps usage based on machine learning techniques. Symmetry. 2020;12(4):499.
90. Guijuan Z, Yang L, Xiaoning J. A survey of autoencoder-based recommender systems. Front Comput Sci. 2020;14(2):430–50.
91. Sarker IH, Hoque MM, Uddin MK, Alsanoosy T. Mobile data science and intelligent apps: Concepts, AI-based modeling and research directions. Mob Netw Appl 1–19; 2020.
92. Yousefi-Azar M, Varadharajan V, Hamey L, Tupakula U. Autoencoder-based feature learning for cyber security applications. In: 2017 International joint conference on neural networks (IJCNN). IEEE; 2017. pp. 3854–3861.
93. Liu L, De Vel O, Chen C, Zhang J, Xiang Y. Anomaly-based insider threat detection using deep autoencoders. In: 2018 IEEE international conference on data mining workshops (ICDMW). IEEE, 2018, pp. 39–48.
94. Wei W, Mengxue Z, Jigang W. Effective android malware detection with a hybrid model based on deep autoencoder and convolutional neural network. J Ambient Intel Humaniz Comput. 2019;10(8):3035–43.
95. Binghao Y, Guodong H. Effective feature extraction via stacked sparse autoencoder to improve intrusion detection system. IEEE Access. 2018;6:41238–48.
96. Memisevic R, Hinton GE. Learning to represent spatial transformations with factored higher-order Boltzmann machines. Neural Comput. 2010;22(6):1473–92.
97. Benjamin M, Kevin S, Bo C, Nando F. Inductive principles for restricted Boltzmann machine learning. In: Proceedings of the thirteenth international conference on artificial intelligence and statistics. JMLR workshop and conference proceedings; 2010, pp. 509–516.
98. Hinton GE, Osindero S, Yee-Whye T. A fast learning algorithm for deep belief nets. Neural Comput. 2006;18(7):1527–54.
99. Fiore U, Palmieri F, Castiglione A, De Santis A. Network anomaly detection with the restricted Boltzmann machine. Neurocomputing. 2013;122:13–23.
100. Yadigar I, Fargana A. Deep learning method for denial of service attack detection based on restricted Boltzmann machine. Big Data. 2018;6(2):159–69.
101. Seo S, Park S, Kim J. Improvement of network intrusion detection accuracy by using restricted boltzmann machine. In: 2016 8th international conference on computational intelligence and communication networks (CICN). IEEE; 2016. pp. 413–417.
102. Hinton GE. Deep belief networks. Scholarpedia. 2009;4(5):5947.
103. Peng W, Yufeng L, Zhen Z, Tao H, Ziyong L, Diyang L. An optimization method for intrusion detection classification model based on deep belief network. IEEE Access. 2019;7:87593–605.
104. Salama MA, Eid HF, Ramadan RA, Darwish A, Hassanien AE. Hybrid intelligent intrusion detection scheme. In: Soft computing in industrial applications. Springer; 2011, pp. 293–303.
105. Qu F, Zhang J, Shao Z, Qi S. An intrusion detection model based on deep belief network. In: Proceedings of the 2017 VI international conference on network, communication and computing; 2017. pp. 97–101.
106. Ian G, Jean P-A, Mehdi M, Bing X, David W-F, Sherjil O, Aaron C, Yoshua B. Generative adversarial nets. In: Advances in neural information processing systems, pp. 2672–2680. 2014.
107. Jin-Young K, Seok-Jun B, Sung-Bae C. Malware detection using deep transferred generative adversarial networks. In: International conference on neural information processing. Springer; 2017. pp. 556–564.
108. Jin-Young K, Seok-Jun B, Sung-Bae C. Zero-day malware detection using transferred generative adversarial networks based on deep autoencoders. Inf Sci. 2018;460:83–102.
109. Yin C, Zhu Y, Liu S, Fei J, Zhang H. An enhancing framework for botnet detection using generative adversarial networks. In: 2018 international conference on artificial intelligence and big data (ICAIBD). IEEE; 2018. pp. 228–234.
110. Heng L, ShiYao Z, Wei Y, Jiahuan L, Henry L. Adversarial-example attacks toward android malware detection system. IEEE Syst J. 2019;14(1):653–6.
111. Merino T, Stillwell M, Steele M, Coplan M, Patton J, Stoyanov A, Deng L. Expansion of cyber attack data from unbalanced datasets using generative adversarial networks. In: International conference on software engineering research, management and applications. Springer; 2019, pp. 131–145.
112. Weiss K, Khoshgoftaar TM, Wang DD. A survey of transfer learning. J Big Data. 2016;3(1):9.
113. Pan SJ, Qiang Y. A survey on transfer learning. IEEE Trans Knowl Data Eng. 2009;22(10):1345–59.
114. Wu P, Guo H, Buckland R. A transfer learning approach for network intrusion detection. In: 2019 IEEE 4th international conference on big data analytics (ICBDA), pp. 281–285. IEEE (2019).
115. Daniel N, Aviad C, Nir N, Yuval E. Deep feature transfer learning for trusted and automated malware signature generation in private cloud environments. Neural Networks. 2020;124:243–57.
116. Nahmias D, Cohen A, Nissim N, Elovici Y. Trustsign: trusted malware signature generation in private clouds using deep feature transfer learning. In: 2019 international joint conference on neural networks (IJCNN). IEEE; 2019, pp. 1–8.
117. Zhao J, Shetty S, Pan JW, Kamhoua C, Kwiat K. Transfer learning for detecting unknown network attacks. EURASIP J Inf Secur. 2019;2019(1):1.
118. Xianwei G, Changzhen H, Chun S, Baoxu L, Zequn N, Hui X. Malware classification for the cloud via semi-supervised transfer learning. J Inf Secur Appl. 2020;55:102661.
119. Rezende E, Ruppert G, Carvalho T, Ramos F, De Geus P. Malicious software classification using transfer learning of resnet-50 deep neural network. In: 2017 16th IEEE international conference on machine learning and applications (ICMLA). IEEE; 2017. pp. 1011–1014.
120. Vu L, Nguyen QU, Nguyen DN, Hoang DT, Dutkiewicz E. Deep transfer learning for IoT attack detection. IEEE Access. 2020;8:107335–44.
121. Taekeun H, Chang C, Juhyun S. CNN-based malicious user detection in social networks. Concurr Comput Pract Exp. 2018;30(2):e4163.
122. Li Q, Cheng M, Wang J, Sun B. LSTM based phishing detection for big email data. IEEE Trans Big Data. 2020.
123. Shi W-C, Sun H-M. Deepbot: a time-based botnet detection with deep learning. Soft Comput. 2020.
124. Abuhamad M, Abuhmed T, Mohaisen D, Nyang D. AUToSen: Deep-learning-based implicit continuous authentication using smartphone sensors. IEEE Internet Things J. 2020;7(6):5008–20.
125. Mayuranathan M, Murugan M, Dhanakoti V. Best features based intrusion detection system by RBM model for detecting DDOS in cloud environment. J Ambient Intel Humaniz Comput. 2019;1–11.
126. Alom MZ, Taha TM. Network intrusion detection for cyber security using unsupervised deep learning approaches. In: 2017 IEEE national aerospace and electronics conference (NAECON), pp. 63–69. IEEE. 2017.
127. Yi P, Guan Y, Zou F, Yao Y, Wang W, Zhu T. Web phishing detection using a deep learning framework. Wirel Commun Mob Comput. 2018; 2018.
128. Arshey M, Angel VKS. An optimization-based deep belief network for the detection of phishing. Data Technol Appl. 2020.
129. Saif D, El-Gokhy SM, Sallam E. Deep belief networks-based framework for malware detection in android systems. Alex Eng J. 2018;57(4):4049–57.
130. Shifu H, Aaron S, Yanfang Y, Lifei C. Droiddelver: an android malware detection system using deep belief network based on API call blocks. In: International conference on web-age information management. Springer; 2016. pp. 54–66.
131. Manuel L-M, Belen C, Antonio S-E. Application of deep reinforcement learning to intrusion detection for supervised problems. Expert Syst Appl. 2020;141:112963.
132. Sethi K, Kumar R, Prajapati N, Bera P. Deep reinforcement learning based intrusion detection system for cloud infrastructure. In: 2020 international conference on communication systems & networks (COMSNETS). IEEE. 2020; pp. 1–6.
133. Zhiyang F, Junfeng W, Jiaxuan G, Xuan K. Feature selection for malware detection based on reinforcement learning. IEEE Access. 2019;7:176177–87.
134. Shakeel PM, Baskar S, Dhulipala VRS, Mishra S, Jaber MM. Maintaining security and privacy in health care system using learning based deep-q-networks. J Med Syst. 2018;42(10):186.
135. Arulkumaran K, Deisenroth MP, Brundage M, Bharath AA. Deep reinforcement learning: a brief survey. IEEE Signal Process Mag. 2017;34(6):26–38.
136. Parra GDLT, Rad P, Kim-Kwang RC, Nicole B. Detecting internet of things attacks using distributed deep learning. J Netw Comput Appl. 2020;102662.
137. Sarker IH, Kayes ASM, Watters P. Effectiveness analysis of machine learning classification models for predicting personalized context-aware smartphone usage. J Big Data. 2019;6(1):57.
138. Sarker IH. A machine learning based robust prediction model for real-life mobile phone data. Internet Things. 2019;5:180–93.
139. Sarker IH. Context-aware rule learning from smartphone data: survey, challenges and future directions. J Big Data. 2019;6(1):95.
140. Sarker IH, Colman A, Kabir MA, Han J. Individualized time-series segmentation for mining mobile phone user behavior. Comput J. 2018;61(3):349–68.
141. Sarker IH, Kayes ASM. ABC-ruleminer: user behavioral rule-based machine learning method for context-aware intelligent services. J Netw Comput Appl. 2020;168:102762.
142. Sarker IH, Colman A, Han J. Recencyminer: mining recency-based personalized behavior from contextual smartphone data. J Big Data. 2019;6(1):1–21.

Publisher’s Note: Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.