SN Computer Science (2021) 2:173

https://doi.org/10.1007/s42979-021-00557-0

REVIEW ARTICLE

AI-Driven Cybersecurity: An Overview, Security Intelligence Modeling and Research Directions
Iqbal H. Sarker¹,² · Md Hasan Furhad³ · Raza Nowrozy⁴

Received: 22 November 2020 / Accepted: 2 March 2021 / Published online: 26 March 2021
© The Author(s), under exclusive licence to Springer Nature Singapore Pte Ltd 2021

Abstract
Artificial intelligence (AI) is one of the key technologies of the Fourth Industrial Revolution (or Industry 4.0) and can be used to protect Internet-connected systems from cyber threats, attacks, damage, or unauthorized access. To intelligently solve today’s various cybersecurity issues, popular AI techniques can be used, including machine learning and deep learning methods, natural language processing, knowledge representation and reasoning, and knowledge- or rule-based expert systems modeling. Based on these AI methods, in this paper, we present a comprehensive view of “AI-driven Cybersecurity” that can play an important role in intelligent cybersecurity services and management. Security intelligence modeling based on such AI methods can make the cybersecurity computing process more automated and intelligent than conventional security systems. We also highlight several research directions within the scope of our study, which can guide future research in the area. Overall, this paper’s ultimate objective is to serve as a reference point and guideline for cybersecurity researchers as well as industry professionals in the area, especially from an intelligent computing or AI-based technical point of view.

Keywords Cybersecurity · Artificial intelligence · Machine learning · Cyber data analytics · Cyber-attacks · Anomaly · Intrusion detection · Security intelligence

This article is part of the topical collection “Cyber Security and Privacy in Communication Networks” guest edited by Rajiv Misra, R K Shyamsunder, Alexiei Dingli, Natalie Denk, Omer Rana, Alexander Pfeiffer, Ashok Patel and Nishtha Kesswani.

* Iqbal H. Sarker (corresponding author)

¹ Swinburne University of Technology, Melbourne, VIC 3122, Australia
² Department of Computer Science and Engineering, Chittagong University of Engineering & Technology, Chittagong 4349, Bangladesh
³ Centre for Cyber Security and Games, Canberra Institute of Technology, Reid, ACT 2601, Australia
⁴ Victoria University, Footscray, VIC 3011, Australia

Introduction

The modern world depends more on technology than ever before. A huge amount of data is generated and gathered through the widespread deployment of booming technologies such as the Internet of Things (IoT) [1] and cloud computing [2]. Although data can be used to better serve the corresponding business needs, cyber-attacks often pose major challenges.

A cyber-attack is usually a malicious and concerted attempt by an individual or organization to breach another individual’s or organization’s information system. Malware attacks, ransomware, denial of service (DoS), phishing or social engineering, SQL injection attacks, man-in-the-middle attacks, zero-day exploits, and insider threats are common nowadays [3]. These types of security incidents or cybercrime can affect organizations and individuals, causing disruptions as well as devastating financial losses. For instance, according to an IBM report, a data breach in the United States costs 8.19 million USD on average [4], and the estimated annual cost of cybercrime to the global economy is 400 billion USD [5]. Cybercrime is growing at an exponential rate, which is an alarming signal for cybersecurity professionals and researchers [3]. Therefore, effectively and intelligently protecting an information system, particularly an Internet-connected system, from various cyber-threats, attacks, damage, or unauthorized access is a key issue that must be solved urgently, and it is the focus of this paper.

In the real world, the overall national security of the businesses, government, organizations, and individual citizens of
a country depends on security management tools capable of detecting and preventing security incidents in a timely and intelligent way. Intelligent cybersecurity services and management are, therefore, essential because immense amounts of data on computers and other devices are collected, processed, and stored by government, military, corporate, financial, and medical organizations, among many others. Cybersecurity usually refers to a collection of technologies, procedures, and practices designed to protect networks, computers, programs, and data from attack, disruption, or unauthorized access. It is also known as “information technology security” or “electronic information security”. Several terms related to the concept of cybersecurity are briefly discussed and summarized in Sect. 2.

Given today’s numerous needs, conventional, well-known security solutions such as antivirus, firewalls, user authentication, and encryption may not be effective [6–9]. The key problem with these traditional systems is that they are normally operated by a few experienced security experts, and data processing is carried out in an ad-hoc manner, so they cannot adapt intelligently to changing needs [10, 11]. On the other hand, artificial intelligence (AI), known as one of the key technologies of the Fourth Industrial Revolution (Industry 4.0), can play an important role in intelligent cybersecurity services and management thanks to its computing power and capabilities. Thus, we focus on “AI-driven Cybersecurity” to make the cybersecurity computing process more automated and intelligent than conventional security systems.

Artificial intelligence (AI) is the branch of computer science that emphasizes the creation of intelligent machines that think and function like humans. To intelligently solve today’s various cybersecurity issues, e.g., in intrusion detection and prevention systems, popular AI techniques can be used, including machine learning (ML) and deep learning (DL) methods, natural language processing (NLP), knowledge representation and reasoning (KRR), and knowledge- or rule-based expert systems (ES) modeling; these are briefly discussed in Sect. 3. For instance, these techniques can be applied to identifying malicious activities, detecting fraud, predicting cyber-attacks, managing access control, and detecting cyber-anomalies or intrusions. The aim of this paper is therefore to provide a reference guide for professionals from academia and industry who want to work on and research intelligent computing in the field of cybersecurity. Accordingly, great emphasis is placed on common AI-based methods and their applicability to solving today’s diverse security issues. Overall, this paper provides a detailed view of AI-driven cybersecurity in terms of principles and modeling for intelligent and automated cybersecurity services and management, through intelligent decision making that takes into account the benefits of AI methods.

The main contributions of this paper are, therefore, as follows:

– To provide a brief overview of the concept of AI-driven cybersecurity for intelligent cybersecurity services and management according to today’s needs. For this, we first briefly review the related methods and systems in the context of cybersecurity, both to motivate our study and to position the term AI-driven cybersecurity.
– To present security intelligence modeling in which various AI-based methods, such as machine and deep learning, natural language processing, knowledge representation and reasoning, and knowledge- or rule-based expert systems modeling, are taken into account according to our goal.
– Finally, to discuss and highlight several research directions within the scope of our study, which can help cybersecurity researchers conduct future research in the area.

The rest of the paper is organized as follows. Section 2 provides background and reviews the related work in this domain. In Sect. 3, we discuss how various AI techniques can be used for security intelligence modeling. In Sect. 4, we identify and summarize several research issues and potential future directions, and finally, Sect. 5 concludes this paper.

Background and Related Work

In this section, we provide an overview of the relevant AI-driven cybersecurity technologies, including different types of cybersecurity incidents within the scope of our study.

Basic Security Properties and CIA Triad

Confidentiality, integrity, and availability, also known as the CIA triad, form a model usually designed to guide information security policies within an organization. Understanding security policy therefore requires understanding the CIA triad properties discussed below.

– Confidentiality is a property of security policy that typically refers to protecting information and systems from unauthorized parties. Confidentiality threats typically target databases, application servers, and system administrators, and can be considered “data theft”.
– Integrity is another property of security policy that typically refers to preventing any kind of destruction or modification of information by unauthorized parties. Integrity
threats typically include finance-related threats such as altering financial data, stealing money, rerouting deposits, or hijacking, as well as damaging the organization’s trustworthiness, and can be considered “data alteration”.
– Availability is another property of security policy that typically refers to ensuring that information systems or assets are accessible to an authorized party or entity in a reliable and timely manner. Availability threats typically include denial of service or physical destruction, and can be considered “denial of access to the data”.

Overall, based on the CIA triad discussed above, we can simply conclude that “Confidentiality” is limiting data access, “Integrity” is ensuring that the data is accurate, and “Availability” is making sure that the data is accessible to the right entity.

Cybersecurity and Related Terms

Over the last half-century, our modern digital society has become highly integrated with information and communication technology (ICT). As the smart computing devices used in our daily activities are mostly driven by global Internet connectivity, the associated risk of data breaches or cyber-attacks is increasing day by day. Thus, preventing and protecting ICT systems from various kinds of advanced cyber-attacks or threats, known as ICT security, has become a major concern for security professionals and policymakers in recent days [12]. ICT security refers to relevant incidents as well as the measures, controls, and procedures applied by enterprises to ensure the integrity, confidentiality, and availability of their data and systems. Cybersecurity is simply about securing things that are vulnerable through ICT. Although the term “Cybersecurity” is popular nowadays, several related terms such as “Information security”, “Data security”, “Network security”, and “Internet/IoT security” are often used interchangeably and may create confusion among readers as well as professionals in the area. In the following, we define these terms and highlight their worldwide popularity scores as well.

– Data security is all about securing data, typically data in storage. Thus, data security can be defined as the prevention of unauthorized access, use, disruption, modification, or destruction of data in storage.
– Information security is the prevention of unauthorized access, use, disruption, modification, or destruction of information. Information security, in a sense, can be considered a specific discipline under the cybersecurity umbrella, which is the broader practice of defending IT assets from attacks or threats.
– Network security is usually the practice of preventing and tracking unauthorized access, misuse, alteration, or denial of the services available to a computer network. It can thus be considered a subset of cybersecurity, which typically protects the data flowing over the network.
– Internet security is a specific aspect of broader concepts such as cybersecurity and computer security, focusing on the specific risks and vulnerabilities of Internet access and use. IoT security is another relevant term, typically concerned with protecting Internet-enabled devices, i.e., Internet of Things (IoT) devices, that connect on wireless networks [13].

The above-mentioned security terms are related to “Cybersecurity”, which is the practice of defending computers, servers, mobile devices, electronic systems, networks, and data from malicious attacks, cyber-threats, damage, or unauthorized access. Among these terms, the worldwide popularity of “cybersecurity” is higher than that of the others and is increasing day by day, as shown in Fig. 1. The popularity trend in Fig. 1 is based on data collected from Google Trends over the last 5 years [14]. According to Fig. 1, the popularity score for cybersecurity was low in 2016 and has been increasing since. Thus, in this paper, we focus on the popular term “cybersecurity”, which is key to achieving the Fourth Industrial Revolution (Industry 4.0).

Many researchers have defined cybersecurity in various ways. For instance, the diverse activities or policies that are taken into account to protect ICT systems from threats or attacks are known as cybersecurity [5]. Craigen et al. defined “cybersecurity as a set of tools, practices, and guidelines that can be used to protect computer networks, software programs, and data from attack, damage, or unauthorized access” [15]. According to Aftergood et al. [16], “cybersecurity is a set of technologies and processes designed to protect computers, networks, programs and data from attacks and unauthorized access, alteration, or destruction”. Overall, cybersecurity is typically concerned with understanding diverse cyber threats or attacks and the corresponding defense strategies to prevent them, and eventually with protecting the systems, which is associated with confidentiality, integrity, and availability [17–19]. Based on these definitions, we can conclude that cybersecurity is all about the security of anything in the cyber realm, such as network security, information security, application security, operational security, Internet of Things (IoT) security, cloud security, infrastructure security, and others. While traditional cybersecurity systems consist mainly of network protection systems and computer security systems [20], we aim to provide a broad view of cybersecurity to readers, as it is one of the major
concerns in our digital life from various perspectives, from commercial purposes to personalized mobile computing.

Fig. 1 The worldwide popularity score of cybersecurity compared with related terms on a scale of 0 (min) to 100 (max) over time; the x-axis represents the timestamp and the y-axis the corresponding score

Security Incident and Attacks

A security incident is typically a malicious activity that threatens the security factors defined earlier, i.e., confidentiality, integrity, and availability. Several types of cybersecurity incidents, i.e., cyber threats and attacks, may impact an organization or an individual [21]. In general, a cyber-threat can be defined as a possible security violation that might exploit a vulnerability of a system or asset, while an attack is a deliberate unauthorized action on a system or asset. Cyber-attacks include threats like computer viruses, data breaches, denial of service (DoS) attacks, etc. In Table 1, we list the most common cyber-threats and attacks that need to be considered in today’s cyber world.

Cybersecurity Defense Strategies

Cybersecurity defense strategies are typically responsible for protecting computer systems and networks from damage to the associated hardware, software, or data, as well as from disruption of the services they provide. More granularly, they are responsible for preventing data breaches or security incidents, which can be defined as any kind of malicious or unauthorized activity, in order to protect the systems [44]. In the following, we give an overview of traditional security mechanisms.

– Access control [45] is a security mechanism that typically regulates the access or use of resources, e.g., computer networks, system files, or data, in a computing environment. For example, based on the responsibilities of individual users, an attribute- or role-based access control scheme may be used to limit network access, reducing the risk to the company or entity.
– Firewall [46] is a network security framework that tracks and regulates incoming and outgoing network traffic. Firewalls are defined as network-based or host-based systems that use a set of security rules to allow or block traffic. They are also capable of filtering traffic from unsecured or suspicious sources, such as malicious traffic, to prevent attacks.
– Anti-malware [47], also known as antivirus software, is a computer program that is typically used to prevent, detect, and remove computer viruses or malware. Modern antivirus software can protect users from various malware attacks such as ransomware, backdoors, trojan horses, worms, spyware, etc.
– Sandbox [48] is a security mechanism used to keep system failures or software vulnerabilities from spreading by isolating running programs. It is often used to execute untrusted programs or code, possibly from unverified suppliers, users, websites, or untrusted third parties.
– Security information and event management (SIEM) [49] is a combination of security information management (SIM) and security event management (SEM) that provides real-time analysis of security alerts generated by devices and network hardware.
– Cryptography [50] is a popular method for protecting data or information that relies on secret-key (symmetric) and public-key (asymmetric) algorithms to encrypt and decrypt data for communication, together with hash functions that help verify data integrity.
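To make the role-based variant of access control more concrete, the following is a minimal Python sketch, assuming a static table that maps roles to (resource, action) permissions. The role names, resources, and the `User` and `is_allowed` helpers are hypothetical and not taken from any particular product. The sketch shows the default-deny check that such schemes typically apply: a request is granted only if one of the user’s roles holds the requested permission, and unknown roles grant nothing.

```python
from dataclasses import dataclass, field
from typing import Dict, Set, Tuple

# Hypothetical role -> permission table; each permission is a (resource, action) pair.
ROLE_PERMISSIONS: Dict[str, Set[Tuple[str, str]]] = {
    "admin": {("network", "configure"), ("files", "read"), ("files", "write")},
    "analyst": {("files", "read"), ("logs", "read")},
    "guest": set(),
}


@dataclass
class User:
    name: str
    roles: Set[str] = field(default_factory=set)


def is_allowed(user: User, resource: str, action: str) -> bool:
    """Default deny: grant only if some role of the user holds the permission."""
    return any(
        (resource, action) in ROLE_PERMISSIONS.get(role, set())
        for role in user.roles
    )


alice = User("alice", {"analyst"})
print(is_allowed(alice, "logs", "read"))          # True
print(is_allowed(alice, "network", "configure"))  # False
```

An attribute-based scheme would replace the fixed role table with a policy evaluated over user, resource, and context attributes (for example, department or time of day), while keeping the same default-deny structure.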
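The rule-based allow/block behavior described for firewalls can be sketched as a stateless packet filter that evaluates an ordered rule list, applies the first matching rule, and falls back to a default policy. This is a simplified illustration under stated assumptions: the `Rule` fields, the `filter_packet` helper, the example address ranges (taken from blocks reserved for documentation), and the default-block policy are hypothetical choices, and real firewalls also track connection state, interfaces, and traffic direction.

```python
import ipaddress
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Rule:
    action: str                     # "allow" or "block"
    src_net: str                    # source network in CIDR notation
    dst_port: Optional[int] = None  # None matches any destination port
    protocol: Optional[str] = None  # None matches any protocol

    def matches(self, src_ip: str, dst_port: int, protocol: str) -> bool:
        in_net = ipaddress.ip_address(src_ip) in ipaddress.ip_network(self.src_net)
        port_ok = self.dst_port is None or self.dst_port == dst_port
        proto_ok = self.protocol is None or self.protocol == protocol
        return in_net and port_ok and proto_ok


def filter_packet(rules: List[Rule], src_ip: str, dst_port: int,
                  protocol: str, default: str = "block") -> str:
    """Return the action of the first matching rule, else the default policy."""
    for rule in rules:
        if rule.matches(src_ip, dst_port, protocol):
            return rule.action
    return default


# Hypothetical rule set; addresses come from documentation-reserved ranges.
RULES = [
    Rule("block", "203.0.113.0/24"),                           # suspicious source range
    Rule("allow", "0.0.0.0/0", dst_port=443, protocol="tcp"),  # HTTPS from anywhere
    Rule("allow", "10.0.0.0/8", dst_port=22, protocol="tcp"),  # SSH from internal hosts only
]

print(filter_packet(RULES, "198.51.100.7", 443, "tcp"))  # allow
print(filter_packet(RULES, "203.0.113.5", 443, "tcp"))   # block (first rule wins)
print(filter_packet(RULES, "198.51.100.7", 22, "tcp"))   # block (default policy)
```

Rule order matters under first-match semantics: moving the HTTPS rule above the block rule would let the suspicious range reach port 443.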
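As an illustration of how a secret key and a hash function together protect the integrity property of the CIA triad, the sketch below uses HMAC-SHA256 from Python’s standard library (`hmac`, `hashlib`) to detect modification of a message. It is a minimal example, assuming the two parties already share a secret key. It provides integrity and authenticity only, not confidentiality, and key distribution is out of scope. The `sign` and `verify` helper names are hypothetical.

```python
import hashlib
import hmac
import secrets


def sign(key: bytes, message: bytes) -> bytes:
    """Compute an HMAC-SHA256 tag that binds the message to the shared key."""
    return hmac.new(key, message, hashlib.sha256).digest()


def verify(key: bytes, message: bytes, tag: bytes) -> bool:
    """Recompute the tag and compare it in constant time."""
    return hmac.compare_digest(sign(key, message), tag)


key = secrets.token_bytes(32)  # shared secret key (assumed to be distributed securely)
msg = b"transfer 100 USD to account 42"
tag = sign(key, msg)

print(verify(key, msg, tag))                                # True
print(verify(key, b"transfer 900 USD to account 42", tag))  # False: alteration detected
```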


Table 1 The most common cyber-threats and attacks in cybersecurity

Key terms | Description | References
--- | --- | ---
Unauthorized access | An act of accessing a network, system, or data without authorization, resulting in a breach or violation of a security policy | [21]
Malware | Malicious software or programs designed to cause extensive damage to data and systems or to obtain unauthorized access to a network | [18]
Ransomware | A kind of malware attack that prevents users from accessing their device or personal files and demands a ransom payment to regain access | [22]
Backdoor | A type of malware attack that bypasses normal authentication or encryption to gain high-level user access to a computer device, network, or software application | [23, 24]
Malicious bot | A type of malware, often used by cyber criminals, to steal information or infect a host | [18]
Typo-squatting attacks | A form of cybersquatting, also known as URL hijacking, domain mimicry, or fake URL, that tricks users into visiting a malicious website | [25]
Denial of service (DoS) | A type of cyber-attack on a service that interferes with its normal functioning and prevents other users from accessing it | [18]
Distributed DoS (DDoS) | A large-scale DoS attack in which the perpetrator uses multiple machines and networks | [18]
Botnets | A collection of malware-infected Internet-connected devices that allow hackers to carry out malicious activities such as credential leaks, unauthorized access, data theft, and DDoS attacks | [18]
Computer virus | A type of malicious software program that is loaded onto a user’s computer without the user’s knowledge and performs malicious acts | [18]
Social engineering | Psychological manipulation of people that enables attackers to gain legitimate, authorized access to confidential information | [18]
Phishing | A type of social engineering that involves fraudulent attempts to obtain sensitive information, such as banking and credit card details, login credentials, etc. | [18, 26]
Zero-day attack | The threat posed by a previously unknown security vulnerability | [27, 28]
Cryptographic attack | Finding a weakness in a code, cipher, cryptographic protocol, or key management scheme | [29]
Insider threats | Threats originating from within the organization, where legitimate users, e.g., employees, misuse their access to networks and assets | [30]
Supply chain attack | Targets less secure components of the supply network to harm organizations in any industry, from the financial sector to the oil or government sectors | [31, 32]
Man-in-the-middle (MitM) | A type of cyberattack in which a malicious actor inserts themselves into a two-party conversation to gain access to sensitive information | [33]
Data breaches | Also known as data leakage; the theft of data by a malicious actor, e.g., unauthorized access to data by an individual, application, or service | [34, 35]
Hacking | Compromising data and digital devices, such as computers, smartphones, tablets, and even entire networks | [26, 36]
SQL injection attack | Executing malicious SQL statements to manipulate a backend database and access information; typically used to attack data-driven applications | [37]
Attacks on IoT devices | Compromising an IoT device to make it part of a DDoS attack or to gain unauthorized access to the data it collects | [3]
Malware on mobile apps | Used by a malicious actor to gain access to personal information, location data, financial accounts, etc. | [38, 39]
Others | Privilege escalation [40], password attack [41], advanced persistent threat [42], cryptojacking attack [43], web application attack [41], and so on |
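The SQL injection entry in Table 1 can be illustrated with Python’s built-in `sqlite3` module. The following minimal sketch uses a hypothetical in-memory `users` table and hypothetical `lookup_unsafe`/`lookup_safe` helpers to contrast a query built by string formatting, where attacker input changes the SQL logic, with a parameterized query, where the driver binds the input strictly as data.

```python
import sqlite3

# Hypothetical in-memory table used only for this demonstration.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (name TEXT, secret TEXT)")
conn.executemany("INSERT INTO users VALUES (?, ?)",
                 [("alice", "a-secret"), ("bob", "b-secret")])


def lookup_unsafe(name: str) -> list:
    # VULNERABLE: input is spliced into the SQL text and can change the query logic.
    query = f"SELECT secret FROM users WHERE name = '{name}'"
    return conn.execute(query).fetchall()


def lookup_safe(name: str) -> list:
    # Parameterized query: the driver binds `name` as a value, never as SQL.
    return conn.execute("SELECT secret FROM users WHERE name = ?", (name,)).fetchall()


payload = "x' OR '1'='1"
print(len(lookup_unsafe(payload)))  # 2: the injected OR clause matches every row
print(lookup_safe(payload))         # []: no user is literally named "x' OR '1'='1"
```

Parameterized queries are a standard defense against this class of attack; input validation and least-privilege database accounts are commonly layered on top.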

Although traditional, well-known security approaches have their own merits for different purposes, they might not be effective for today’s diverse needs in the cyber industry because they lack intelligence and dynamism [6–9]. The intrusion detection system (IDS), typically defined as “a device or software application that monitors a computer network or systems for malicious activity or policy violations” [51], has therefore become more popular. An IDS is typically capable of identifying diverse cyber threats and attacks, even unknown zero-day attacks, and of responding in real time based on the user’s requirements. For this purpose, an IDS gathers data from different sources in a computer network or device and identifies security policy breaches, which can be used to detect internal and external attacks [52, 53]. IDSs can be of several types, depending on the environment and the detection approach. For instance, based on the scope, from single computers to large networks, the most common types of IDS are:

– Host-based IDS (HIDS) runs on a host, analyzes its traffic, and detects malicious or suspicious activity. Thus, it can provide real-time visibility into what’s happening on the
critical security systems, which adds an extra layer of security [3].
– Network-based IDS (NIDS), on the other hand, analyzes and monitors network connections to detect malicious activity or policy violations on a network [3].

Similarly, IDSs can be of several types depending on the detection method, where the most well-known versions are signature-based IDS and anomaly-based IDS [44].

– Signature-based IDS (SIDS) looks for unique patterns, such as network traffic byte sequences or known malicious sequences used by malware, which serve as signatures. It is also considered misuse- or knowledge-based detection, which performs well for known attacks [54]. It can, however, face its greatest challenge in detecting unknown or new attacks.
– Anomaly-based IDS (AIDS), on the other hand, is mainly used to detect unknown attacks, which matters given the rapid growth of malware in recent days. To detect anomalies such as unknown or zero-day attacks, machine learning techniques can also be used to build the protection model [3, 55].
– Hybrid IDS is obtained by combining the anomaly-based IDS with the misuse-based IDS discussed above and can be used to effectively detect malicious activities in several cases [56, 57].
– Stateful Protocol Analysis (SPA) is another type of method that identifies deviations of protocol state. This approach is similar to the anomaly-based method; however, it uses predetermined universal profiles of benign protocol activity [54].

Once malicious activities have been detected, an intrusion prevention system (IPS) can be used to avoid and block them. This can be done in many ways, such as manually, by sending notifications, or through automated operation [58]. Among these methods, an automated response system (ARS) may be more effective, because it does not involve a human interface between the detection and response systems.

Cybersecurity Data and Systems

Research that relies on security information gathered from different sources is often problem specific and varies from application to application. A number of studies have been performed on cybersecurity systems and facilities that take into account different sources of security data. For instance, NSL-KDD [59] contains security data related to various types of cyber-attacks, such as denial of service (DoS), remote-to-local (R2L), user-to-root (U2R), and probing attacks. Another popular dataset, UNSW-NB15 [60], consists of different types of attacks. Similarly, several other datasets exist in the domain of cybersecurity, for instance, DARPA [57, 61], CAIDA [62, 63], ISOT’10 [64, 65], ISCX’12 [66, 67], CTU-13 [68], CIC-IDS [69], CIC-DDoS2019 [70], MAWI [71], ADFA IDS [72], CERT [73, 74], EnronSpam [75], SpamAssassin [76], LingSpam [77], DGA [78–81], Malware Genome project [82], VirusShare [83], VirusTotal [84], Comodo [85], Contagio [86], DREBIN [87], Microsoft [88], Bot-IoT [89], etc. A summary of these cybersecurity datasets, highlighting diverse attack types and machine learning-based usage in different cyber applications, is provided in our earlier paper, Sarker et al. [3]. Several works focused on deep learning have recently been published in the field. For example, methods for detecting network attacks based on deep learning techniques are studied in [90]. The researchers of [91] review deep learning for cybersecurity intrusion detection. In [92], the authors review deep learning-based intrusion detection systems. The authors of [93] conducted a study of deep learning methods for cybersecurity. In [13], a survey of machine and deep learning techniques for Internet of Things (IoT) security is presented. We summarize several data-driven tasks and machine learning models used for various purposes in the cybersecurity domain in Table 2.

While the different types of cybersecurity data and techniques mentioned above are used for various purposes in the field of cybersecurity and systems, there is an interest in security intelligence modeling in a broad sense, according to today’s cyber industry needs. Therefore, in this paper, we concentrate on a comprehensive view of “AI-driven cybersecurity” in terms of concepts and security modeling for intelligent cybersecurity services and management, where the most popular AI techniques, such as machine and deep learning methods, natural language processing, knowledge representation and reasoning, and knowledge- or rule-based expert systems modeling, can be used. Security intelligence modeling based on these AI methods can be used to solve various cybersecurity issues and tasks, such as automatic identification of malicious activities, phishing detection, malware detection, prediction of cyber-attacks, fraud detection, access control management, and detection of anomalies or intrusions. Thus, AI-based security intelligence modeling can make the cybersecurity computing process more actionable and intelligent compared to conventional systems.

AI-Based Security Intelligence Modeling

As discussed earlier, intelligent cybersecurity management is based on artificial intelligence and applies various AI methods that ultimately aim at intelligent decision making in cyber applications or services. In our analysis, we have taken into account the most popular AI techniques, which include ML and
DL methods, the concept of NLP, KRR, as well as the concept of knowledge- or rule-based expert systems (ES) modeling, according to today’s needs in the cyber industry. Security intelligence modeling based on these AI methods can potentially be used to make intelligent decisions in cybersecurity tasks, as discussed briefly in the following.

Table 2 A summary of data-driven/machine learning tasks and approaches in the domain of cybersecurity

Used technique and approaches | Purpose | References
--- | --- | ---
Clustering | Intrusion detection analysis | Chandrasekhar et al. [94], Sharifi et al. [95], Lin et al. [96]
Rule-based approach | Network intrusion detection systems | Tajbakhsh et al. [97], Mitchell et al. [98]
Support vector machines | Attack classification; intrusion detection and classification; DDoS detection and analysis; anomaly detection systems | Kotpalliwar et al. [99], Pervez et al. [100], Yan et al. [101], Li et al. [102], Raman et al. [103], Kokila et al. [104], Xie et al. [105], Saxena et al. [106], Chandrasekhar et al. [94]
K-nearest neighbor | Network intrusion detection systems; reducing the false alarm rate; intrusion detection systems | Shapoorifard et al. [107], Vishwakarma et al. [108], Meng et al. [109], Dada et al. [110]
Naive Bayes | Intrusion detection system | Koc et al. [111]
Decision tree | Malicious behavior analysis; intrusion detection systems; anomaly detection systems | Moon et al. [112], Ingre et al. [113], Malik et al. [114], Relan et al. [115], Rai et al. [116], Sarker et al. [117], Puthran et al. [118], Balogun et al. [119], Jo et al. [120]
Random forests | Network intrusion detection systems | Zhang et al. [121]
Adaptive boosting | Network anomaly detection | Yuan et al. [122]
Neural network and deep learning (RNN, LSTM, CNN) | Anomaly intrusion detection; attack classification; malware traffic classification | Jo et al. [120], Alrawashdeh et al. [123], Yin et al. [124], Kim et al. [125], Almiani et al. [126], Kolosnjaji et al. [127], Wang et al. [128]
Genetic algorithm | Preventing cyberterrorism and intrusion detection | Hansen et al. [129], Aslahi et al. [130], Azad et al. [131]
Hidden Markov model | Intrusion detection system | Ariu et al. [132], Aarnes et al. [133]
Reinforcement learning | Detecting malicious activities and intrusions | Alauthman et al. [134], Blanco et al. [135], Lopez et al. [136]

Machine Learning-Based Modeling

Machine learning (ML), including neural network-based deep learning, is an important part of AI that can be used to build effective security models from the historical cybersecurity data summarized in Sect. 2. A machine learning security model typically consists of target security-related data from different relevant sources, such as network behavior, database activity, application activity, or user activity, and the algorithms chosen to operate on that data to deduce the performance [3]. In the following, we list several popular machine learning algorithms [137] that can be used for different purposes within the area of cybersecurity, ranging from exploiting malware to identifying risky behavior that might lead to a phishing attack or malicious code.

– Supervised learning is used when the targets to reach from a certain set of inputs are known, i.e., a task-driven approach [138]. For instance, supervised techniques can be used to classify internal data, spam, and malicious activities. Naive Bayes [139]; various types of decision trees, such as C4.5 [140], IntruDTree [117], or BehavDT [141] for behavioral pattern analysis, which can also generate policy rules; K-nearest neighbors [142]; support vector machines [143]; adaptive boosting [144]; logistic regression [145]; stochastic gradient descent [146]; and ensemble methods such as XGBoost [147] and random forest learning [148] are well-known classification techniques in the area. These techniques can be used for data-driven security modeling according to their ability to learn from security data, e.g., classifying and predicting malware attacks or cyber anomalies. For instance, a decision tree-based machine learning model for detecting cyber anomalies, the IntruDTree model [117], is shown in Fig. 2; it achieves an accuracy of 98% on unseen test cases.
– Unsupervised learning: security data are not always labeled or categorized in real-world scenarios. Thus, unsupervised learning, i.e., a data-driven approach, can be used to find patterns, structures, or knowledge from
unlabeled data [138]. The hidden patterns and struc-
– Supervised learning Supervised learning is performed tures of the datasets can be uncovered by clustering,
when specific target attack-anomaly classes are defined a common form of unsupervised learning. Clustering

SN Computer Science
173 Page 8 of 18 SN Computer Science (2021) 2:173

Fig. 2 An example of detect-

ing cyber anomalies based on
a decision tree-based machine
learning model

techniques can group the security data by taking into account certain measures of similarity in the data. Several clustering algorithms can be used for such purposes, for example, partitioning methods such as K-means [149], K-medoids [150], and CLARA [151]; density-based methods such as DBSCAN [152]; distribution-based clustering such as Gaussian mixture models (GMMs) [147]; and hierarchical methods, either agglomerative or divisive, such as single linkage [153], complete linkage [154], and BOTS [155]. Moreover, recommendation for incident response and risk management is another area that typically relies on association learning techniques. Several methods, such as AIS [156], Apriori [157], FP-Tree [158], RARM [159], Eclat [160], and ABC-RuleMiner [161], can be used for building rule-based machine learning models, e.g., for policy-rule generation.
– Security feature optimization: Today’s cybersecurity datasets may contain high-dimensional security features [117]. Thus, to minimize the complexity of a security model, feature optimization is important. Feature selection or feature engineering, such as selecting a subset of security features according to their importance or significance in modeling, extracting features based on key components, or generating new features, can help simplify and optimize the resulting security model. Several methods, such as variance threshold [147], Pearson’s correlation coefficient for two variables $X$ and $Y$ (Eq. 1) [146], analysis of variance (ANOVA) [147], the chi-squared test with observed values $O_i$ and expected values $E_i$ (Eq. 2) [147], recursive feature elimination (RFE) [147], principal component analysis (PCA) [162], or model-based selection [117, 147], can be used to perform these tasks according to the characteristics or nature of the security data. For example, the authors of [117] rank the security features according to their significance to create an efficient tree-based security model, and the simplified model achieves 98% accuracy on unseen test cases.

$$ r(X, Y) = \frac{\sum_{i=1}^{n} (X_i - \bar{X})(Y_i - \bar{Y})}{\sqrt{\sum_{i=1}^{n} (X_i - \bar{X})^2 \sum_{i=1}^{n} (Y_i - \bar{Y})^2}} \qquad (1) $$

$$ \chi^2 = \sum_{i=1}^{n} \frac{(O_i - E_i)^2}{E_i} \qquad (2) $$

– Deep learning and others: Deep learning is typically considered part of a broader family of machine learning approaches based on artificial neural networks (ANNs). Fig. 3 shows a structure of artificial neural network modeling, with input, hidden, and output layers, for detecting cyber anomalies or attacks. In the domain of cybersecurity, deep learning methods can be used for various purposes, such as detecting network intrusions, detecting and classifying malware traffic, and detecting backdoor attacks [24, 57, 91]. Multi-layer perceptron (MLP) [163], convolutional neural network (CNN) [164], recurrent neural network (RNN), and long short-term memory (LSTM) are popular approaches in deep learning modeling [23, 124, 164]. In these deep learning models, many hidden layers can be used to complete the overall computing process. The strongest aspect of deep learning techniques is their ability to effectively learn feature hierarchies from the patterns in the data [92]. Several unsupervised techniques, such as autoencoders (AE), deep belief networks (DBN), restricted Boltzmann machines (RBMs), and generative adversarial networks (GAN), can also be used in the domain of cybersecurity [90, 92]. Hybrid techniques can also yield significant outcomes in several cases [92]. For instance, an intrusion detection model based on the LSTM architecture with RNN achieved an attack detection rate of 98.8% [125]. A deep learning model based on a stacked autoencoder with a softmax classifier for efficient network intrusion detection is proposed in [165]; it achieves up to 99.99% accuracy on the KDD99 dataset and 89.13% on the UNSW-NB15 dataset. Besides these, semi-supervised learning, which combines the supervised and unsupervised techniques discussed above, and reinforcement learning techniques such as Monte Carlo learning, Q-learning, and Deep Q-Networks [3, 166] can also be used in the area. A brief discussion of these various types of neural network (ANN) and deep learning (DL)-based security modeling is given in our earlier paper, Sarker et al. [167].

Fig. 3 A structure of artificial neural network modeling for detecting cyber anomalies or attacks with multiple processing layers

Thus, the machine and deep learning methods discussed above can play a vital role in understanding and analyzing the actual phenomena in cybersecurity data, depending on the nature or characteristics of the security features and on the availability of sufficient data for learning. These techniques can extract insights or useful knowledge from the given security data and eventually build a data-driven security model. Such models learn from the training data and behave accordingly on unseen test cases. Overall, the resulting machine learning-based security models can make intelligent cybersecurity decisions by analyzing data from huge numbers of cyber events. Therefore, we can conclude that machine learning security models could change the future of cybersecurity applications and industry because of their data learning capabilities, and could be a major part of the domain of AI-driven cybersecurity.

NLP-Based Modeling

Natural language processing (NLP) is considered an important branch of AI that makes it possible for computers to understand human language, interpret it, and eventually determine which parts are important in an intelligent system [168]. Nowadays, NLP is increasingly used by both cybercriminals and security defense tools to understand and process the unstructured data being generated. NLP’s ultimate aim is to extract knowledge from unstructured data or information, i.e., to interpret, decipher, comprehend, and make sense of human languages in a valuable way. In the following, we discuss several parts of NLP that can be used for intelligent cybersecurity modeling when unstructured security content is available.

– Lexical analysis: It usually involves identifying and analyzing the terms in a text. Lexical analysis divides the entire chunk of text, according to given criteria, into paragraphs, sentences, phrases, or tokens such as identifiers, keywords, literals, etc. For example, lexical analysis of domain names [169] can support the development of an NLP-based model that classifies malicious domains by capturing the “malicious nature” of the domain names used by cybercriminals.
– Syntactic analysis: This is one of the key tools for completing NLP tasks and is used to determine how natural language aligns with grammatical rules. The most widely used techniques here are lemmatization, morphological segmentation, word segmentation, part-of-speech tagging, parsing, sentence breaking, stemming, etc. A syntactic analysis, e.g., parsing [170], may contribute to developing an NLP-based


model for cyberattack prediction, for example, to quickly extract useful data from large quantities of public text.
– Semantic analysis: Another key method for completing NLP tasks is semantic analysis, which involves understanding the context and interpretation of words and how sentences are structured. For example, for phishing classification, latent semantic analysis can be used with keyword extraction [171]. The most widely used semantic techniques are named entity recognition (NER), word sense disambiguation, natural language generation, etc. For example, a NER-based automated system [172] can be used to diagnose cybersecurity situations in IoT networks.

Several of the most frequently used algorithms in NLP are Bag-of-Words (BoW), TF-IDF (term frequency-inverse document frequency), tokenization and stop-word removal, stemming, lemmatization, topic modeling, etc. [173]. Most NLP-based modeling relies on the machine and deep learning techniques discussed above to build the resulting data-driven model, which can be used for various purposes in the domain of cybersecurity. In the following, we give examples of NLP-based security modeling.

– Detecting malicious domain names: NLP methods can be used to distinguish malicious domain names (e.g., clbwpvdyztoepfua.lu) from benign domains (e.g., cnn.com). They help build a technique for detecting such malicious domains in DNS traffic based on the patterns inherent in domain names, using a domain dataset collected via a domain crawl.
– Vulnerability analysis: NLP techniques can be used to detect weaknesses and vulnerabilities in code. For instance, n-grams and various smoothing algorithms [174] combined with machine learning can be used to build such a model based on the associated patterns for detecting vulnerabilities. One example could be the detection of zero-day vulnerabilities in the banking sector, where analysts usually study conversations on various web platforms, looking for relevant information.
– Phishing identification: Detecting a phishing attack is a challenging problem because phishing is a semantics-based attack. Phishing falls into several categories, such as web page-based, email content-based, URL-based, etc. A machine learning model with a set of features can be used to detect such phishing [175], and NLP techniques can be used both to effectively extract the features from such content and to build the model.
– Malware family analysis: Modeling behavioral reports as a sequence of words is necessary to detect malware effectively. For formulating behavioral reports [176], a bag-of-words (BoW) NLP model might be helpful. NLP with machine learning techniques can be used for the automated engineering of related security features and to construct the model.

Overall, an NLP-based methodology can be used to enhance cybersecurity operations by automating threat intelligence extracted from unstructured sources. Thus, NLP with machine learning techniques is considered a driver for the automation of security activities, owing to its capabilities in security modeling for the target security application. Therefore, we can conclude that NLP-based security modeling could be another major part of the domain of AI-driven cybersecurity.

Knowledge Representation and Conceptual Modeling

Knowledge representation and reasoning is another field of AI that typically represents real-world information so that an intelligent cybersecurity system can utilize that information to solve complex security problems like a human. In the real world, cybersecurity knowledge is usually regarded as information about a specific security domain. The field studies how an intelligent cybersecurity agent’s views, intentions, and decisions can be adequately articulated for automated reasoning, e.g., by inference engines, classifiers, etc., to solve complex security problems. In this section, we first discuss and summarize the approaches to knowledge representation, and then we discuss a conceptual security model based on knowledge.

Knowledge Representation

Modeling the intelligent actions of a security agent is the key purpose of knowledge representation. In the field of cybersecurity, it enables a computer to benefit from security knowledge and function like a human being accordingly. Instead of bottom-up learning, it takes a top-down approach to building a model that behaves intelligently. As discussed in [168], descriptive knowledge, structural knowledge, procedural knowledge, meta knowledge, heuristic knowledge, etc. are several types of knowledge that can be used in various application areas. In the following, we summarize several knowledge representation methods, namely logical representation, semantic networks, frames, and production rules [177], that can be used to build a knowledge-based conceptual model.

– Logical representation: It represents knowledge with concrete, unambiguous rules and typically deals with propositions. Thus, logic can be used to represent simple facts


that are general statements that may be either ‘True’ or ‘False’. Overall, logical representation means drawing a conclusion based on various conditions. Although logical representation enables us to do logical reasoning, inference may not be efficient because of its restrictions and the challenges of working with it.
– Semantic network representation: Semantic networks represent information in the form of graphical networks. Such a network is made up of nodes that represent objects and arcs that define the relationships between those objects. Overall, semantic networks provide a structural representation of statements about a domain of interest. Although they are a natural representation of information, their intelligence in action depends on the system’s creator.
– Frame representation: A frame, derived from semantic networks, is a record-like structure that consists of a set of attributes and their values to represent an object in the world. In a frame, knowledge about an object or event can be stored together in the knowledge base. Although frame representation is easy to understand and visualize, it does not support the inference mechanism smoothly.
– Production rules: They typically consist of condition-action pairs, i.e., “If condition then action”. Thus, an agent first checks the condition, and the corresponding rule fires if the condition is satisfied. The main advantage of such a rule-based system in cybersecurity is that the “condition” part can determine which rule is suitable to apply to a specific security problem, and the “action” part carries out the solution associated with that problem. Moreover, a rule-based cybersecurity system allows us to remove, add, or modify rules as needed.

Overall, the knowledge for building a knowledge-based conceptual model or system can be represented in multiple ways. However, the effectiveness of these methods in a security system may vary depending on the nature of the data and the target application. In the following, we discuss how security ontologies, a formal way to define the semantics of knowledge and data, can be used to build a conceptual security model.

Security Ontologies and Conceptual Modeling

Ontologies, through information representation techniques, are conceptual models of what exists in some domain, brought into machine-interpretable form. Top-level (upper) ontologies, domain ontologies, and application ontologies are several types of ontologies used in the area [177]. In general, an ontology is “an explicit specification of conceptualization and a formal way to define the semantics of knowledge and data” [178]. According to [178], an ontology is formally represented as “$O = \{C, R, I, H, A\}$, where $C = \{C_1, C_2, \ldots, C_n\}$ represents a set of concepts, and $R = \{R_1, R_2, \ldots, R_m\}$ represents a set of relations defined over the concepts. $I$ represents a set of instances of concepts, and $H$ represents a Directed Acyclic Graph (DAG) defined by the subsumption relation between concepts, and $A$ represents a set of axioms bringing additional constraints on the ontology”. In ontology-based information security, five concepts, namely threat, vulnerability, attack, impact, and control, might be involved [179].

– Concept: Threat represents various types of difficulties or dangers against a given set of security properties.
– Concept: Vulnerability mainly represents the weaknesses of a cybersecurity system.
– Concept: Attack represents various types of security incidents caused by cyber criminals.
– Concept: Impact represents the effects that a security incident can imply.
– Concept: Control represents the relevant mechanisms that can be used to reduce or avoid the effects of a security incident or to protect a vulnerability.

Based on these concepts and their relationships, a conceptual security model can be built to solve complex security problems. The rationale behind the conceptual security model can be structured as follows: a cyber threat may produce an attack or security incident that exploits the vulnerabilities of the system, which may have an impact on that system. A control mechanism that can detect, prevent, or block the attack is thus needed to protect the system and make it secure. Fig. 4 shows a structure of conceptual modeling based on security ontologies in a cybersecurity system and the corresponding information flow from data source to application. According to Fig. 4, automated security policies can also be generated from the relevant security ontologies and used in the eventual security services or applications. Thus, the model is capable of making intelligent decisions according to the concepts and their semantic relationships in the ontologies. Depending on the knowledge representation formalism, various ontology languages can be used. In the area of the Semantic Web, the Web Ontology Language (OWL) [180] is commonly used to formalize and represent these concepts and their semantic relationships in a graphical representation to build an ontology-based security model. Overall, we can conclude that knowledge representation-based conceptual security modeling, with its capability for intelligent decision making, could be another part of the domain of AI-driven cybersecurity.
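The formal definition $O = \{C, R, I, H, A\}$ and the threat, vulnerability, attack, impact, and control rationale above can be made concrete with a small data structure. The following Python sketch (3.9+) is a minimal illustration under stated assumptions, not the authors’ model and not an OWL implementation: the subclasses (Phishing, SQLInjection, UnvalidatedInput, UntrainedStaff, InputValidation), the instance names, and the relation names produces, exploits, impacts, and protects are all hypothetical. It stores the subsumption DAG $H$ as a child-to-parents map, checks one example axiom from $A$ (every attack exploits at least one vulnerability), and derives a simple policy finding: attacks that exploit a vulnerability that no control protects.

```python
from dataclasses import dataclass, field

# Minimal sketch of an ontology O = {C, R, I, H, A} for the five security
# concepts. Every subclass, instance, and relation name here is hypothetical.


@dataclass
class Ontology:
    concepts: set[str] = field(default_factory=set)                # C
    relations: set[str] = field(default_factory=set)               # R (relation names)
    parents: dict[str, set[str]] = field(default_factory=dict)     # H as child -> direct parents
    instances: dict[str, str] = field(default_factory=dict)        # I: instance -> concept
    facts: set[tuple[str, str, str]] = field(default_factory=set)  # (subject, relation, object)

    def add_concept(self, name: str, *parents: str) -> None:
        self.concepts.add(name)
        self.parents.setdefault(name, set()).update(parents)

    def is_a(self, concept: str, ancestor: str) -> bool:
        """True if concept is subsumed by ancestor (depth-first walk up the DAG H)."""
        stack, seen = [concept], set()
        while stack:
            current = stack.pop()
            if current == ancestor:
                return True
            if current not in seen:
                seen.add(current)
                stack.extend(self.parents.get(current, ()))
        return False

    def instances_of(self, concept: str) -> set[str]:
        return {i for i, c in self.instances.items() if self.is_a(c, concept)}

    def objects(self, subject: str, relation: str) -> set[str]:
        return {o for s, r, o in self.facts if s == subject and r == relation}


def build_example() -> Ontology:
    onto = Ontology(relations={"produces", "exploits", "impacts", "protects"})
    for top in ("Threat", "Vulnerability", "Attack", "Impact", "Control"):
        onto.add_concept(top)
    onto.add_concept("Phishing", "Attack")
    onto.add_concept("SQLInjection", "Attack")
    onto.add_concept("UnvalidatedInput", "Vulnerability")
    onto.add_concept("UntrainedStaff", "Vulnerability")
    onto.add_concept("InputValidation", "Control")
    onto.instances.update({
        "crime_group": "Threat",
        "sqli_login": "SQLInjection",
        "phish_campaign": "Phishing",
        "login_form": "UnvalidatedInput",
        "staff_gap": "UntrainedStaff",
        "data_breach": "Impact",
        "input_filter": "InputValidation",
    })
    onto.facts.update({
        ("crime_group", "produces", "sqli_login"),
        ("crime_group", "produces", "phish_campaign"),
        ("sqli_login", "exploits", "login_form"),
        ("phish_campaign", "exploits", "staff_gap"),
        ("sqli_login", "impacts", "data_breach"),
        ("input_filter", "protects", "login_form"),
    })
    return onto


def axiom_violations(onto: Ontology) -> set[str]:
    """Example axiom in A: every Attack instance exploits some Vulnerability instance."""
    vulns = onto.instances_of("Vulnerability")
    return {a for a in onto.instances_of("Attack") if not (onto.objects(a, "exploits") & vulns)}


def unprotected_attacks(onto: Ontology) -> set[str]:
    """Policy finding: attacks exploiting at least one vulnerability that no control protects."""
    protected = {v for c in onto.instances_of("Control") for v in onto.objects(c, "protects")}
    return {a for a in onto.instances_of("Attack") if onto.objects(a, "exploits") - protected}


if __name__ == "__main__":
    onto = build_example()
    print(sorted(axiom_violations(onto)))     # []
    print(sorted(unprotected_attacks(onto)))  # ['phish_campaign']
```

Here the SQL injection instance is covered because its exploited vulnerability has a protecting control, while the phishing instance is flagged; adding a (hypothetical) control instance that protects staff_gap would empty the finding. A deployed system would more plausibly express the same knowledge in OWL [180] and query it with an ontology reasoner; plain Python is used only to keep the sketch self-contained.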


Fig. 4 A structure of conceptual modeling based on security ontologies in a cybersecurity system and the corresponding information flow from data source to application

Cybersecurity Expert System Modeling

In artificial intelligence, an expert system is generally a computer system that emulates the decision-making capacity of a human expert. A cybersecurity expert system is an instance of a knowledge-based or rule-based system in which decisions can be made based on security guidelines. The system is typically split into two subsystems, the inference engine and the knowledge base (represented as security rules), as shown in Fig. 5.

The foundation of this cybersecurity expert framework is the knowledge base shown in Fig. 5, as it consists of knowledge of the domain of the target cybersecurity application as well as operational knowledge of the rules of security decisions. The inference engine shown in Fig. 5, on the other hand, applies the rules to known facts from a security perspective to deduce new facts. The user interface shown in Fig. 5 receives the initial security facts and invokes the inference engine to trigger the decision rules of the knowledge base.

Fig. 5 A structure of a cybersecurity expert system modeling

Usually, a rule consists of two parts: the antecedent (IF part), also called the condition or premise, and the consequent (THEN part), also called the inference or action. Thus, a rule’s basic syntax can be expressed as:
IF <antecedent> THEN <consequent>
For instance, “if the flag value is RSTR, then the outcome is anomaly” is an example of an IF-THEN rule for detecting anomalies. Similarly, a rule with multiple security features, generated from the tree shown in Fig. 2, could be “if the flag value is SF, the service is ftp, and duration <= 4, then the outcome is anomaly”. In addition to human experts, several techniques can be used to generate rules for building a rule-based cybersecurity expert system.

– Classification learning rules: In machine learning, classification is one of the popular techniques used in various application areas. Several popular classification techniques, such as decision trees [140], IntruDTree [117], BehavDT [141], Ripple Down Rule learner (RIDOR) [181], and Repeated Incremental Pruning to Produce Error Reduction (RIPPER) [182], are capable of generating rules.
– Association learning rules: In general, association rules are created by searching data for frequent IF-THEN patterns on the basis of support and confidence values [161]. Common association rule learning techniques, such as AIS [156], Apriori [157], FP-Tree [158], RARM [159], Eclat [160], and ABC-RuleMiner [161], can be used to generate rules from a given dataset.
– Fuzzy logic-based rules: Fuzzy logic is an approach to computing based on “degrees of truth” rather than the usual “true or false” (1 or 0) [183]. Thus, a fuzzy rule-based expert system uses fuzzy logic instead of Boolean logic. In other words,


a fuzzy expert system is a set of membership functions and rules that can provide outputs.
– Conceptual semantic rules: As discussed earlier, an ontology is “an explicit specification of conceptualization and a formal way to define the semantics of knowledge and data” [178]. For instance, security ontologies capture the relationships between entries within an ontology, which can be used to generate such conceptual rules. As each security decision must consider the concrete company environment, a particular domain ontology can help in building an effective semantic cybersecurity application.

Thus, a rule-based cybersecurity expert system model can provide the decision-making capacity of a security expert within an intelligent cybersecurity framework built to solve complex cybersecurity issues through information reasoning. The rule generation methods discussed above can play a major role in generating the IF-THEN rules while developing the knowledge base module. The rules can then be modified and managed according to the requirements by domain experts with knowledge of business rules. Overall, we can conclude that cybersecurity expert system modeling, with its capability for intelligent decision making, could be another important part of the domain of AI-driven cybersecurity.

Research Issues and Future Directions

As discussed throughout this paper, artificial intelligence (AI), known as one of the key technologies of the Fourth Industrial Revolution (Industry 4.0), can play a significant role in intelligent cybersecurity services and management. To intelligently solve today’s various cybersecurity issues, i.e., protecting Internet-connected systems from cyber threats, attacks, damage, or unauthorized access, popular AI methods such as machine and deep learning, natural language processing, knowledge representation and reasoning, as well as knowledge or rule-based expert systems modeling, can be used, as discussed briefly in Sect. 3. However, we have identified several research issues within the area of AI-driven cybersecurity, which are briefly discussed in the following.

According to our study in this paper, cybersecurity source datasets are the primary component, especially for extracting security insights or useful knowledge from security data using the machine and deep learning techniques discussed briefly in Sect. 3. Thus, the primary and most fundamental challenge is to understand real-world security issues and to explore the relevant cybersecurity data in order to extract insights or useful knowledge for future actions. For instance, public text data such as cyber-related webpage text is used to detect and track potential cyber-attacks [170]. However, collecting security data is not straightforward, as the data sources could be multiple and dynamic. Thus, collecting various types of real-world data, such as structured, semi-structured, unstructured, or meta-data [137], relevant to a particular problem domain with legal access, which may vary from application to application, is challenging. Therefore, understanding the security problem and integrating and managing the collected data for effective analysis could be one of the major challenges in the area of AI-driven cybersecurity.

The next challenge could be designing an effective and intelligent solution to tackle the target security problems. Although several machine and deep learning techniques, such as clustering, rule-based approaches, classification, neural networks, etc. [3], are employed to solve several security problems, as summarized in Table 2, these models can be improved with advanced analytics. For instance, observing attack patterns in time series, behavioral analysis, data sparseness in security analysis, the impact of security features in modeling, simplifying and optimizing the security model, advanced feature engineering tasks, and synchronizing temporal patterns in modeling while considering multiple data sources can be considered. Moreover, data aggregation, redundancy in rule generation, the effectiveness of prediction algorithms, data inconsistency, recent pattern analysis for prediction [184–186], etc. might be important issues for effective data-driven modeling. Thus, advanced analytics techniques, improved machine or deep learning techniques, new data-driven algorithms, or hybrid methods could give better results for modeling security intelligence, depending on the nature of the security problems, which could be a potential research direction in the area.

Besides, effectively extracting useful insights from unstructured security data and effectively building an intelligent security model could be another issue. For instance, a large amount of textual content must be analyzed for identifying malicious domains, security incident and event management, malware family analysis, domain classification, phishing, source code vulnerability analysis, spam emails, etc., which are discussed briefly in Sect. 3. Therefore, effectively mining the relevant content using natural language processing (NLP) techniques, or designing a new NLP-based model, could be another research direction in the area of AI-driven cybersecurity. An effective cybersecurity expert system modeling considering IF-THEN policy rules could be another potential research direction in the area. However, the development of large-scale rule-based systems in the area of cybersecurity may face numerous challenges. For instance, the reasoning process in the expert system can be very complex and difficult to manage [168]. Thus, a lightweight rule-based inference engine that supports reasoning for intelligent cybersecurity services is important. Although several rule mining techniques are popular in the area, as mentioned


in Sect. 3, a concise set of security policy rules considering generalization, reliability, non-redundancy, exceptional discovery, etc., could make the expert security system more effective. Therefore, a deeper understanding of these properties and the design of an effective rule-based system that takes them into account could be another research issue in the area of AI-driven cybersecurity. Moreover, designing security ontologies or knowledge representation models according to today’s needs, and eventually building effective conceptual security models, could be another potential research scope in the area.

Overall, the most important task for an intelligent cybersecurity system is to design and build an effective cybersecurity framework that supports the artificial intelligence techniques discussed in Sect. 3. In such a framework, we need to take into account AI-based advanced analytics, so that the security framework is capable of resolving the associated issues intelligently. Therefore, to assess the feasibility and effectiveness of the related AI-based approaches, a well-designed cybersecurity framework and experimental evaluation are required, which is a very important direction and a major challenge as well. Overall, we can conclude that this paper has uncovered many research issues and potential future directions, discussed above, in the area of AI-driven cybersecurity.

Conclusion

Motivated by the growing significance of cybersecurity and artificial intelligence, in this paper, we have studied AI-driven cybersecurity. Our goal was to provide a comprehensive overview of how artificial intelligence can play a significant role in intelligent decision making and in building smart and automated cybersecurity systems. For this, we have presented security intelligence modeling in which various AI-based methods, such as machine and deep learning, the concept of natural language processing, knowledge representation and reasoning, as well as the concept of knowledge or rule-based expert systems modeling, are used to intelligently tackle cybersecurity issues. Such AI-based modeling can be used in various problem domains ranging from malware analysis to risky behavior identification that might lead to

point and guidelines for cybersecurity researchers as well as industry professionals in the area, especially from an intelligent computing or AI-based technical point of view.

Author Contributions The authors present a comprehensive view on “AI-driven Cybersecurity” that can play an important role for intelligent cybersecurity services and management [IHS: conceptualization, research design, and preparation of the original manuscript]. All the authors read and approved the final manuscript.

Declaration

Conflict of interest The authors declare no conflict of interest.

References
1. Li S, Da Li X, Zhao S. The internet of things: a survey. Inf Syst Front. 2015;17(2):243–59.
2. Velte T, Velte A, Elsenpeter R. Cloud computing, a practical approach. New York: McGraw-Hill Inc; 2009.
3. Sarker IH, Kayes ASM, Badsha S, Alqahtani H, Watters P, Ng A. Cybersecurity data science: an overview from machine learning perspective. J Big Data. 2020;7(1):1–29.
4. IBM security report. https://www.ibm.com/security/data-breach. Accessed 20 Oct 2019.
5. Fischer EA. Cybersecurity issues and challenges: in brief. 2014.
6. Anwar S, Mohamad Zain J, Zolkipli MF, Inayat Z, Khan S, Anthony B, Chang V. From intrusion detection to an intrusion response system: fundamentals, requirements, and future directions. Algorithms. 2017;39(2):10.
7. Mohammadi S, Mirvaziri H, Ghazizadeh-Ahsaee M, Karimipour H. Cyber intrusion detection by combined feature selection algorithm. J Inf Secur Appl. 2019;44:80–8.
8. Tapiador JE, Orfila A, Ribagorda A, Ramos B. Key-recovery attacks on KIDS, a keyed anomaly detection system. IEEE Trans Dependable Secur Comput. 2013;12(3):312–25.
9. Tavallaee M, Stakhanova N, Ghorbani AA. Toward credible evaluation of anomaly-based intrusion-detection methods. IEEE Trans Syst Man Cybern Part C (Appl Rev). 2010;40(5):516–24.
10. Foroughi F, Luksch P. Data science methodology for cybersecurity projects. arXiv preprint arXiv:1803.04219. 2018.
11. Saxe J, Sanders H. Malware data science: attack detection and attribution. 2018.
12. Rainie L, Anderson J, Connolly J. Cyber attacks likely to increase. Digit Life. 2014;2025.
13. Al-Garadi MA, Mohamed A, Al-Ali A, Du X, Ali I, Guizani M. A survey of machine and deep learning methods for internet of things (IoT) security. IEEE Commun Surv Tutor. 2020;22:1646–85.
a phishing attack or malicious code, which are discussed 14. Google trends. In https://trends.google.com/trends/. 2019.
15. Craigen D, Diakun-Thibault N, Purse R. Defining cybersecurity.
briefly throughout this paper. Technol Innov Manag Rev. 2014;4(10):13–21.
In the field of AI-driven cybersecurity, the concept of 16. Aftergood S. Cybersecurity: the cold war online. Nature.
AI-based security intelligence modeling discussed in this 2017;547(7661):30.
paper can help the cybersecurity computing process to be 17. National Research Council et al. Toward a safer and more secure
cyberspace. 2007.
more actionable and intelligent. Based on our study, we have 18. Jang-Jaccard J, Nepal S. A survey of emerging threats in cyber-
also highlighted several research issues and potential direc- security. J Comput Syst Sci. 2014;80(5):973–93.
tions that can help researchers do future research in the area.
Overall, we believe this paper can be served as a reference


19. Lahcen RAM, Caulkins B, Mohapatra R, Kumar M. Review and insight on the behavioral aspects of cybersecurity. Cybersecurity. 2020;3:1–18.
20. Mukkamala S, Sung A, Abraham A. Cyber security challenges: designing efficient intrusion detection systems and antivirus tools. In: Vemuri VR editor. Enhancing Computer Security with Smart Technology (Auerbach, 2006). 2005. p. 125–163.
21. Sun N, Zhang J, Rimba P, Gao S, Zhang LY, Xiang Y. Data-driven cybersecurity incident prediction: a survey. IEEE Commun Surv Tutor. 2018;21(2):1744–72.
22. McIntosh T, Jang-Jaccard J, Watters P, Susnjak T. The inadequacy of entropy-based ransomware detection. In: International conference on neural information processing. Springer; 2019. p. 181–189.
23. Dai J, Chen C, Li Y. A backdoor attack against lstm-based text classification systems. IEEE Access. 2019;7:138872–8.
24. Wang B, Yao Y, Shan S, Li H, Viswanath B, Zheng H, Zhao BY. Neural cleanse: Identifying and mitigating backdoor attacks in neural networks. In: 2019 IEEE symposium on security and privacy (SP). IEEE; 2019. p. 707–723.
25. Banerjee A, Rahman MS, Faloutsos M. Sut: quantifying and mitigating url typosquatting. Comput Netw. 2011;55(13):3001–14.
26. Alsayed A, Bilgrami A. E-banking security: internet hacking, phishing attacks, analysis and prevention of fraudulent activities. Int J Emerg Technol Adv Act. 2017;7(1):109–15.
27. Alazab M, Venkatraman S, Watters P, Alazab M, et al. Zero-day malware detection based on supervised learning algorithms of API call signatures. Proceedings of the 9th Australasian Data Mining Conference (AusDM), Ballarat, Australia. Australian Computer Society, CRPIT; 2010, vol 121.
28. Bilge L, Dumitraş T. Before we knew it: an empirical study of zero-day attacks in the real world. In: Proceedings of the 2012 ACM conference on computer and communications security. ACM; 2012. p. 833–844.
29. Moghimi A, Wichelmann J, Eisenbarth T, Sunar B. Memjam: a false dependency attack against constant-time crypto implementations. Int J Parallel Program. 2019;47(4):538–70.
30. Warkentin M, Willison R. Behavioral and policy issues in information systems security: the insider threat. Eur J Inf Syst. 2009;18(2):101–5.
31. Ohm M, Sykosch A, Meier M. Towards detection of software supply chain attacks by forensic artifacts. In: Proceedings of the 15th international conference on availability, reliability and security. 2020. p. 1–6.
32. Eggers S. A novel approach for analyzing the nuclear supply chain cyber-attack surface. Nucl Eng Technol. 2021;53(3):879–887.
33. Kügler D. “man in the middle” attacks on bluetooth. In: International conference on financial cryptography. Springer; 2003. p. 149–161.
34. Shaw A. Data breach: from notification to prevention using pci dss. Colum JL Soc Probs. 2009;43:517.
35. Data breach investigations report 2019. https://enterprise.verizon.com/resources/reports/dbir/. Accessed 20 Oct 2019.
36. Hong S. Survey on analysis and countermeasure for hacking attacks to cryptocurrency exchange. J Korea Converg Soc. 2019;10(10):1–6.
37. Boyd SW, Keromytis AD. Sqlrand: preventing sql injection attacks. In: International conference on applied cryptography and network security. Springer; 2004. p. 292–302.
38. Tong F, Yan Z. A hybrid approach of mobile malware detection in android. J Parallel Distrib Comput. 2017;103:22–31.
39. Shankar VG, Jangid M, Devi B, Kabra S. Mobile big data: malware and its analysis. In: Proceedings of first international conference on smart system, innovations and computing. Springer; 2018. p. 831–842.
40. Davi L, Dmitrienko A, Sadeghi A-R, Winandy M. Privilege escalation attacks on android. In: International conference on information security. Springer; 2010. p. 346–360.
41. Jovičić B, Simić D. Common web application attack types and security using asp .net. ComSIS. December. 2006.
42. Virvilis N, Gritzalis D. The big four-what we did wrong in advanced persistent threat detection. In: 2013 international conference on availability, reliability and security. IEEE; 2013. p. 248–254.
43. Sigler K. Crypto-jacking: how cyber-criminals are exploiting the crypto-currency boom. Comput Fraud Secur. 2018;2018(9):12–4.
44. Khraisat A, Gondal I, Vamplew P, Kamruzzaman J. Survey of intrusion detection systems: techniques, datasets and challenges. Cybersecurity. 2019;2(1):20.
45. Qi H, Di X, Li J. Formal definition and analysis of access control model based on role and attribute. J Inf Secur Appl. 2018;43:53–60.
46. Yin J. Firewall policy management, May 10 2016. US Patent 9,338,134.
47. Xue Y, Meng G, Liu Y, Tan TH, Chen H, Sun J, Zhang J. Auditing anti-malware tools by evolving android malware and dynamic loading technique. IEEE Trans Inf Forensics Secur. 2017;12(7):1529–44.
48. Hunt T, Zhu Z, Yuanzhong X, Peter S, Witchel E. Ryoan: a distributed sandbox for untrusted computation on secret data. ACM Trans Comput Syst (TOCS). 2018;35(4):1–32.
49. Irfan M, Abbas H, Sun Y, Sajid A, Pasha M. A framework for cloud forensics evidence collection and analysis using security information and event management. Secur Commun Netw. 2016;9(16):3790–807.
50. Abood OG, Guirguis SK. A survey on cryptography algorithms. Int J Sci Res Publ. 2018;8(7):410–5.
51. Johnson L. Computer incident response and forensics team management: conducting a successful incident response. 2013.
52. Brahmi I, Brahmi H, Yahia SB. A multi-agents intrusion detection system using ontology and clustering techniques. In: IFIP international conference on computer science and its applications. Springer; 2015. p. 381–393.
53. Qu X, Yang L, Guo K, Ma L, Sun M, Ke M, Li M. A survey on the development of self-organizing maps for unsupervised intrusion detection. Mob Netw Appl. 2019;1–22.
54. Liao H-J, Richard Lin C-H, Lin Y-C, Tung K-Y. Intrusion detection system: a comprehensive review. J Netw Comput Appl. 2013;36(1):16–24.
55. Ammar A, Michael H, Jemal A, Moutaz A. Using feature selection for intrusion detection system. In: 2012 international symposium on communications and information technologies (ISCIT). IEEE; 2012. p. 296–301.
56. Viegas E, Santin AO, Franca A, Jasinski R, Pedroni VA, Oliveira LS. Towards an energy-efficient anomaly-based intrusion detection engine for embedded systems. IEEE Trans Comput. 2016;66(1):163–77.
57. Xin Y, Kong L, Liu Z, Chen Y, Li Y, Zhu H, Gao M, Hou H, Wang C. Machine learning and deep learning methods for cybersecurity. IEEE Access. 2018;6:35365–81.
58. Ragsdale DJ, Carver CA, Humphries JW, Pooch UW. Adaptation techniques for intrusion detection and intrusion response systems. In: Smc 2000 conference proceedings. 2000 IEEE international conference on systems, man and cybernetics. ‘Cybernetics evolving to systems, humans, organizations, and their complex interactions’ (cat. no. 0), vol. 4. IEEE; 2000. p. 2344–2349.
59. Tavallaee M, Bagheri E, Lu W, Ghorbani AA. A detailed analysis of the kdd cup 99 data set. In: 2009 IEEE symposium on computational intelligence for security and defense applications. IEEE; 2009. p. 1–6.


60. Moustafa N, Slay J. Unsw-nb15: a comprehensive data set for network intrusion detection systems (unsw-nb15 network data set). In: 2015 military communications and information systems conference (MilCIS). IEEE; 2015. p. 1–6.
61. Lippmann RP, Fried DJ, Graf I, Haines JW, Kendall KR, McClung D, Weber D, Webster SE, Wyschogrod D, Cunningham RK, et al. Evaluating intrusion detection systems: the 1998 darpa off-line intrusion detection evaluation. In: Proceedings DARPA information survivability conference and exposition. DISCEX’00, vol 2. IEEE; 2000. p. 12–26.
62. Caida ddos attack 2007 dataset. http://www.caida.org/data/passive/ddos-20070804-dataset.xml/. Accessed 20 Oct 2019.
63. Caida anonymized internet traces 2008 dataset. http://www.caida.org/data/passive/passive-2008-dataset.xml/. Accessed 20 Oct 2019.
64. Isot botnet dataset. https://www.uvic.ca/engineering/ece/isot/datasets/index.php/. Accessed 20 Oct 2019.
65. The honeynet project. http://www.honeynet.org/chapters/france/. Accessed 20 Oct 2019.
66. Canadian institute of cybersecurity, university of new brunswick, iscx dataset. http://www.unb.ca/cic/datasets/index.html/. Accessed 20 Oct 2019.
67. Shiravi A, Shiravi H, Tavallaee M, Ghorbani AA. Toward developing a systematic approach to generate benchmark datasets for intrusion detection. Comput Secur. 2012;31(3):357–74.
68. The ctu-13 dataset. https://stratosphereips.org/category/datasets-ctu13. Accessed 20 Oct 2019.
69. Cse-cic-ids2018 [online]. https://www.unb.ca/cic/datasets/ids-2018.html/. Accessed 20 Oct 2019.
70. Cic-ddos2019 [online]. https://www.unb.ca/cic/datasets/ddos-2019.html/. Accessed 28 March 2020.
71. Jing X, Yan Z, Jiang X, Pedrycz W. Network traffic fusion and analysis against ddos flooding attacks with a novel reversible sketch. Inf Fusion. 2019;51:100–13.
72. Xie M, Hu J, Yu X, Chang E. Evaluating host-based anomaly detection systems: application of the frequency-based algorithms to adfa-ld. In: International conference on network and system security. Springer; 2015. p. 542–549.
73. Lindauer B, Glasser J, Rosen M, Wallnau KC, L ExactData. Generating test data for insider threat detectors. JoWUA. 2014;5(2):80–94.
74. Glasser J, Lindauer B. Bridging the gap: a pragmatic approach to generating insider threat data. In: 2013 IEEE security and privacy workshops. IEEE; 2013. p. 98–104.
75. Enronspam. https://labs-repos.iit.demokritos.gr/skel/i-config/downloads/enron-spam/. Accessed 20 Oct 2019.
76. Spamassassin. http://www.spamassassin.org/publiccorpus/. Accessed 20 Oct 2019.
77. Lingspam. https://labs-repos.iit.demokritos.gr/skel/i-config/downloads/lingspampublic.tar.gz/. Accessed 20 Oct 2019.
78. Alexa top sites. https://aws.amazon.com/alexa-top-sites/. Accessed 20 Oct 2019.
79. Bambenek consulting–master feeds. http://osint.bambenekconsulting.com/feeds/. Accessed 20 Oct 2019.
80. Dgarchive. https://dgarchive.caad.fkie.fraunhofer.de/site/. Accessed 20 Oct 2019.
81. Zago M, Pérez MG, Pérez GM. Umudga: a dataset for profiling algorithmically generated domain names in botnet detection. Data in Brief. 2020. p. 105400.
82. Zhou Y, Jiang X. Dissecting android malware: characterization and evolution. In: 2012 IEEE symposium on security and privacy. IEEE; 2012. p. 95–109.
83. Virusshare. http://virusshare.com/. Accessed 20 Oct 2019.
84. Virustotal. https://virustotal.com/. Accessed 20 Oct 2019.
85. Comodo. https://www.comodo.com/home/internet-security/updates/vdp/database.php. Accessed 20 Oct 2019.
86. Contagio. http://contagiodump.blogspot.com/. Accessed 20 Oct 2019.
87. Kumar R, Xiaosong Z, Khan RU, Kumar J, Ahad I. Effective and explainable detection of android malware based on machine learning algorithms. In: Proceedings of the 2018 international conference on computing and artificial intelligence. ACM; 2018. p. 35–40.
88. Microsoft malware classification (big 2015). arXiv:1802.10135. Accessed 20 Oct 2019.
89. Koroniotis N, Moustafa N, Sitnikova E, Turnbull B. Towards the development of realistic botnet dataset in the internet of things for network forensic analytics: bot-iot dataset. Future Gener Comput Syst. 2019;100:779–96.
90. Wu Y, Wei D, Feng J. Network attacks detection methods based on deep learning techniques: a survey. Secur Commun Netw. 2020;2020:17.
91. Ferrag MA, Maglaras L, Moschoyiannis S, Janicke H. Deep learning for cyber security intrusion detection: approaches, datasets, and comparative study. J Inf Secur Appl. 2020;50:102419.
92. Aleesa AM, Zaidan BB, Zaidan AA, Sahar NM. Review of intrusion detection systems based on deep learning techniques: coherent taxonomy, challenges, motivations, recommendations, substantial analysis and future directions. Neural Comput Appl. 2020;32(14):9827–58.
93. Berman DS, Buczak AL, Chavis JS, Corbett CL. A survey of deep learning methods for cyber security. Information. 2019;10(4):122.
94. Chandrasekhar AM, Raghuveer K. Confederation of fcm clustering, ann and svm techniques to implement hybrid nids using corrected kdd cup 99 dataset. In: 2014 international conference on communication and signal processing. IEEE; 2014. p. 672–676.
95. Sharifi AM, Amirgholipour SK, Pourebrahimi A. Intrusion detection based on joint of k-means and knn. J Converg Inf Technol. 2015;10(5):42.
96. Wei-Chao L, Shih-Wen K, Chih-Fong T. Cann: an intrusion detection system based on combining cluster centers and nearest neighbors. Knowl Based Syst. 2015;78:13–21.
97. Tajbakhsh A, Rahmati M, Mirzaei A. Intrusion detection using fuzzy association rules. Appl Soft Comput. 2009;9(2):462–9.
98. Mitchell R, Chen R. Behavior rule specification-based intrusion detection for safety critical medical cyber physical systems. IEEE Trans Dependable Secur Comput. 2014;12(1):16–30.
99. Kotpalliwar MV, Wajgi R. Classification of attacks using support vector machine (svm) on kddcup’99 ids database. In: 2015 fifth international conference on communication systems and network technologies. IEEE; 2015. p. 987–990.
100. Pervez MS, Farid DM. Feature selection and intrusion classification in nsl-kdd cup 99 dataset employing svms. In: The 8th international conference on software, knowledge, information management and applications (SKIMA 2014). IEEE; 2014. p. 1–6.
101. Yan M, Liu Z. A new method of transductive svm-based network intrusion detection. In: International conference on computer and computing technologies in agriculture. Springer; 2010. p. 87–95.
102. Li Y, Xia J, Zhang S, Yan J, Ai X, Dai K. An efficient intrusion detection system based on support vector machines and gradually feature removal method. Expert Syst Appl. 2012;39(1):424–30.
103. Raman MRG, Somu N, Jagarapu S, Manghnani T, Selvam T, Krithivasan K, Sriram VSS. An efficient intrusion detection technique based on support vector machine and improved binary gravitational search algorithm. Artif Intell Rev. 2020;53:3255–3286.


104. Kokila RT, Thamarai Selvi S, Govindarajan K. Ddos detection and analysis in sdn-based environment using support vector machine classifier. In: 2014 sixth international conference on advanced computing (ICoAC). IEEE; 2014. p. 205–210.
105. Xie M, Hu J, Slay J. Evaluating host-based anomaly detection systems: application of the one-class svm algorithm to adfa-ld. In: 2014 11th international conference on fuzzy systems and knowledge discovery (FSKD). IEEE; 2014. p. 978–982.
106. Saxena H, Richariya V. Intrusion detection in kdd99 dataset using svm-pso and feature reduction with information gain. Int J Comput Appl. 2014;98(6):25–29.
107. Shapoorifard H, Shamsinejad P. Intrusion detection using a novel hybrid method incorporating an improved knn. Int J Comput Appl. 2017;173(1):5–9.
108. Vishwakarma S, Sharma V, Tiwari A. An intrusion detection system using knn-aco algorithm. Int J Comput Appl. 2017;171(10):18–23.
109. Meng W, Li W, Kwok L-F. Design of intelligent knn-based alarm filter using knowledge-based alert verification in intrusion detection. Secur Commun Netw. 2015;8(18):3883–95.
110. Dada EG. A hybridized svm-knn-pdapso approach to intrusion detection system. In: Proceedings of Facility Seminar Ser. 2017. p. 14–21.
111. Koc L, Mazzuchi TA, Sarkani S. A network intrusion detection system based on a hidden Naïve Bayes multiclass classifier. Expert Syst Appl. 2012;39(18):13492–500.
112. Moon D, Im H, Kim I, Park JH. Dtb-ids: an intrusion detection system based on decision tree using behavior analysis for preventing apt attacks. J Supercomput. 2017;73(7):2881–95.
113. Ingre B, Yadav A, Soni AK. Decision tree based intrusion detection system for nsl-kdd dataset. In: International conference on information and communication technology for intelligent systems. Springer; 2017. p. 207–218.
114. Malik AJ, Khan FA. A hybrid technique using binary particle swarm optimization and decision tree pruning for network intrusion detection. Cluster Comput. 2018;21(1):667–80.
115. Relan NG, Patil DR. Implementation of network intrusion detection system using variant of decision tree algorithm. In: 2015 international conference on nascent technologies in the engineering field (ICNTE). IEEE; 2015. p. 1–5.
116. Rai K, Syamala Devi M, Guleria A. Decision tree based algorithm for intrusion detection. Int J Adv Netw Appl. 2016;7(4):2828.
117. Sarker IH, Abushark YB, Alsolami F, Khan AI. Intrudtree: a machine learning based cyber security intrusion detection model. Symmetry. 2020;12(5):754.
118. Puthran S, Shah K. Intrusion detection using improved decision tree algorithm with binary and quad split. In: International symposium on security in computing and communication. Springer; 2016. p. 427–438.
119. Balogun AO, Jimoh RG. Anomaly intrusion detection using an hybrid of decision tree and k-nearest neighbor. In: A Multidisciplinary Journal Publication of the Faculty of Science, Adeleke University, Ede, Nigeria, 2015; vol 2.
120. Jo S, Sung H, Ahn B. A comparative study on the performance of intrusion detection using decision tree and artificial neural network models. J Korea Soc Digit Ind Inf Manag. 2015;11(4):33–45.
121. Zhang J, Zulkernine M, Haque A. Random-forests-based network intrusion detection systems. IEEE Trans Syst Man Cybern Part C (Appl Rev). 2008;38(5):649–59.
122. Yuan Y, Kaklamanos G, Hogrefe D. A novel semi-supervised adaboost technique for network anomaly detection. In: Proceedings of the 19th ACM international conference on modeling, analysis and simulation of wireless and mobile systems. ACM; 2016. p. 111–114.
123. Alrawashdeh K, Purdy C. Toward an online anomaly intrusion detection system based on deep learning. In: 2016 15th IEEE international conference on machine learning and applications (ICMLA). IEEE; 2016. p. 195–200.
124. Yin C, Zhu Y, Fei J, He X. A deep learning approach for intrusion detection using recurrent neural networks. IEEE Access. 2017;5:21954–61.
125. Kim J, Kim J, Thi Thu HL, Kim H. Long short term memory recurrent neural network classifier for intrusion detection. In: 2016 international conference on platform technology and service (PlatCon). IEEE; 2016. p. 1–5.
126. Almiani M, AbuGhazleh A, Al-Rahayfeh A, Atiewi S, Razaque A. Deep recurrent neural network for iot intrusion detection system. Simul Model Pract Theory. 2019;101:102031.
127. Kolosnjaji B, Zarras A, Webster G, Eckert C. Deep learning for classification of malware system call sequences. In: Australasian joint conference on artificial intelligence. Springer; 2016. p. 137–149.
128. Wang W, Zhu M, Zeng X, Ye X, Sheng Y. Malware traffic classification using convolutional neural network for representation learning. In: 2017 international conference on information networking (ICOIN). IEEE; 2017. p. 712–717.
129. Hansen JV, Lowry PB, Meservy RD, McDonald DM. Genetic programming for prevention of cyberterrorism through dynamic and evolving intrusion detection. Decis Support Syst. 2007;43(4):1362–74.
130. Aslahi-Shahri BM, Rahmani R, Chizari M, Maralani A, Eslami M, Golkar MJ, Ebrahimi A. A hybrid method consisting of GA and SVM for intrusion detection system. Neural Comput Appl. 2016;27(6):1669–76.
131. Azad C, Jha VK. Genetic algorithm to solve the problem of small disjunct in the decision tree based intrusion detection system. Int J Comput Netw Inf Secur (IJCNIS). 2015;7(8):56.
132. Ariu D, Tronci R, Giacinto G. Hmmpayl: an intrusion detection system based on hidden Markov models. Comput Secur. 2011;30(4):221–41.
133. Årnes A, Valeur F, Vigna G, Kemmerer RA. Using hidden markov models to evaluate the risks of intrusions. In: International workshop on recent advances in intrusion detection. Springer; 2006. p. 145–164.
134. Alauthman M, Aslam N, Al-kasassbeh M, Khan S, Al-Qerem A, Choo K-KR. An efficient reinforcement learning-based botnet detection approach. J Netw Comput Appl. 2020;150:102479.
135. Blanco R, Cilla JJ, Briongos S, Malagón P, Moya JM. Applying cost-sensitive classifiers with reinforcement learning to ids. In: International conference on intelligent data engineering and automated learning. Springer; 2018. p. 531–538.
136. Lopez-Martin M, Carro B, Sanchez-Esguevillas A. Application of deep reinforcement learning to intrusion detection for supervised problems. Expert Syst Appl. 2020;141:112963.
137. Sarker IH. Machine learning: Algorithms, real-world applications and research directions. Preprints. 2021;2021030216:1–23.
138. Sarker IH, Kayes ASM, Watters P. Effectiveness analysis of machine learning classification models for predicting personalized context-aware smartphone usage. J Big Data. 2019;6(1):1–28.
139. John GH, Langley P. Estimating continuous distributions in Bayesian classifiers. In: Proceedings of the eleventh conference on uncertainty in artificial intelligence. Morgan Kaufmann Publishers Inc.; 1995. p. 338–345.
140. Quinlan JR. C4.5: Programs for machine learning. Mach Learn. 2014.
141. Sarker IH, Colman A, Han J, Khan AI, Abushark YB, Salah K. Behavdt: a behavioral decision tree learning to build user-centric context-aware predictive model. Mob Netw Appl. 2020;25:1151–1161.
142. Aha DW, Kibler D, Albert MK. Instance-based learning algorithms. Mach Learn. 1991;6(1):37–66.
143. Keerthi SS, Shevade SK, Bhattacharyya C, Krishna Murthy KR. Improvements to platt’s smo algorithm for svm classifier design. Neural Comput. 2001;13(3):637–49.
144. Freund Y, Schapire RE, et al. Experiments with a new boosting algorithm. In: Icml, vol. 96. Citeseer; 1996. p. 148–156.
145. Le Cessie S, Van Houwelingen JC. Ridge estimators in logistic regression. J R Stat Soc Ser C (Appl Stat). 1992;41(1):191–201.
146. Han J, Pei J, Kamber M. Data mining: concepts and techniques. 2011.
147. Pedregosa F, Varoquaux G, Gramfort A, Michel V, Thirion B, Grisel O, Blondel M, Prettenhofer P, Weiss R, Dubourg V, et al. Scikit-learn: machine learning in python. J Mach Learn Res. 2011;12:2825–30.
148. Breiman L. Random forests. Mach Learn. 2001;45(1):5–32.
149. MacQueen J. Some methods for classification and analysis of multivariate observations. In: Fifth Berkeley symposium on mathematical statistics and probability, vol. 1. 1967.
150. Rokach L. A survey of clustering algorithms. In: Data mining and knowledge discovery handbook. Springer; 2010. p. 269–298.
151. Kaufman L, Rousseeuw PJ. Finding groups in data: an introduction to cluster analysis, vol. 344. New York: Wiley; 2009.
152. Ester M, Kriegel H-P, Sander J, Xiaowei X, et al. A density-based algorithm for discovering clusters in large spatial databases with noise. Kdd. 1996;96:226–31.
153. Sneath PHA. The application of computers to taxonomy. J Gen Microbiol. 1957;17(1):201–26.
154. Sorensen T. Method of establishing groups of equal amplitude in plant sociology based on similarity of species. Biol Skr. 1948;5:1–34.
155. Sarker IH, Colman A, Kabir MA, Han J. Individualized time-series segmentation for mining mobile phone user behavior. Comput J. 2018;61(3):349–68.
156. Agrawal R, Imieliński T, Swami A. Mining association rules between sets of items in large databases. In: ACM SIGMOD Record, vol. 22. ACM; 1993. p. 207–216.
157. Agrawal R, Srikant R, et al. Fast algorithms for mining association rules. In: Proceedings of 20th international conference very large data bases, VLDB, vol. 1215. 1994. p. 487–499.
158. Han J, Pei J, Yin Y. Mining frequent patterns without candidate generation. In: ACM Sigmod Record, vol. 29. ACM; 2000. p. 1–12.
159. Das A, Ng W-K, Woon Y-K. Rapid association rule mining. In: Proceedings of the tenth international conference on Information and knowledge management. ACM; 2001. p. 474–481.
160. Zaki MJ. Scalable algorithms for association mining. IEEE Trans Knowl Data Eng. 2000;12(3):372–90.
161. Sarker IH, Kayes ASM. Abc-ruleminer: user behavioral rule-based machine learning method for context-aware intelligent services. J Netw Comput Appl. 2020;168:102762.
162. Sarker IH, Abushark YB, Khan AI. Contextpca: predicting context-aware smartphone apps usage based on machine learning techniques. Symmetry. 2020;12(4):499.
163. Van Efferen L, Ali-Eldin AMT. A multi-layer perceptron approach for flow-based anomaly detection. In: 2017 international symposium on networks, computers and communications (ISNCC). IEEE; 2017. p. 1–6.
164. Liu H, Lang B, Liu M, Yan H. Cnn and rnn based payload classification methods for attack detection. Knowl Based Syst. 2019;163:332–41.
165. Khan FA, Gumaei A, Derhab A, Hussain A. A novel two-stage deep learning model for efficient network intrusion detection. IEEE Access. 2019;7:30373–85.
166. Kaelbling LP, Littman ML, Moore AW. Reinforcement learning: a survey. J Artif Intell Res. 1996;4:237–85.
167. Sarker IH. Deep cybersecurity: A comprehensive overview from neural network and deep learning perspective. Preprints. 2021;2021020340:1–18.
168. Sarker IH, Hoque MM, Uddin K et al. Mobile data science and intelligent apps: concepts, ai-based modeling and research directions. Mob Netw Appl. 2020;1–19.
169. Kidmose E, Stevanovic M, Pedersen JM. Detection of malicious domains through lexical analysis. In: 2018 international conference on cyber security and protection of digital services (cyber security). IEEE; 2018. p. 1–5.
170. Perera I, Hwang J, Bayas K, Dorr B, Wilks Y. Cyberattack prediction through public text analysis and mini-theories. In: 2018 IEEE international conference on big data (big data). IEEE; 2018. p. 3001–3010.
171. L’Huillier G, Hevia A, Weber R, Rios S. Latent semantic analysis and keyword extraction for phishing classification. In: 2010 IEEE international conference on intelligence and security informatics. IEEE; 2010. p. 129–131.
172. Georgescu T-M, Iancu B, Zurini M. Named-entity-recognition-based automated system for diagnosing cybersecurity situations in iot networks. Sensors. 2019;19(15):3380.
173. Sun S, Luo C, Chen J. A review of natural language processing techniques for opinion mining systems. Inf Fusion. 2017;36:10–25.
174. Mokhov SA, Paquet J, Debbabi M. The use of nlp techniques in static code analysis to detect weaknesses and vulnerabilities. In: Canadian conference on artificial intelligence. Springer; 2014. p. 326–332.
175. Egozi G, Verma R. Phishing email detection using robust nlp techniques. In: 2018 IEEE international conference on data mining workshops (ICDMW). IEEE; 2018. p. 7–12.
176. Karbab EB, Debbabi M. Maldy: portable, data-driven malware detection using natural language processing and machine learning techniques on behavioral analysis reports. Digit Investig. 2019;28:S77–87.
177. Stephan G, Pascal H, Andreas A. Knowledge representation and ontologies. Semantic web services: concepts, technologies, and applications. 2007. p. 51–105.
178. Maedche A, Staab S. Ontology learning for the semantic web. IEEE Intell Syst. 2001;16(2):72–9.
179. Pereira T, Santos H. An ontology based approach to information security. In: Research conference on metadata and semantic research. Springer; 2009. p. 183–192.
180. McGuinness DL, Van Harmelen F, et al. Owl web ontology language overview. W3C Recomm. 2004;10(10):2004.
181. Witten IH, Frank E. Data mining: practical machine learning tools and techniques. Burlington: Morgan Kaufmann; 2005.
182. Witten IH, Frank E, Trigg LE, Hall MA, Holmes G, Cunningham SJ. Weka: practical machine learning tools and techniques with java implementations. 1999.
183. Zadeh LA. Fuzzy logic—a personal perspective. Fuzzy Sets Syst. 2015;281:4–20.
184. Sarker IH. A machine learning based robust prediction model for real-life mobile phone data. Internet Things. 2019;5:180–93.
185. Sarker IH. Context-aware rule learning from smartphone data: survey, challenges and future directions. J Big Data. 2019;6(1):95.
186. Sarker IH, Colman A, Han J. Recencyminer: mining recency-based personalized behavior from contextual smartphone data. J Big Data. 2019;6(1):49.

Publisher’s Note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.