Soft Computing (2021) 25:9731–9763

https://doi.org/10.1007/s00500-021-05893-0

FOUNDATIONS

Machine learning and deep learning methods for intrusion detection systems: recent developments and challenges

Geeta Kocher1 · Gulshan Kumar2

Accepted: 17 May 2021 / Published online: 24 June 2021


© The Author(s), under exclusive licence to Springer-Verlag GmbH Germany, part of Springer Nature 2021

Abstract
Deep learning (DL) is gaining significant prevalence in every field of study because of its strength in training on large data sets. Several applications have used machine learning (ML) methods for years and reported good performance, but the limitations of these methods on complex data gave rise to DL methods. Intrusion detection is one of the prominent areas in which researchers are extending DL methods. Even though several excellent surveys cover the growing body of research on this subject, the literature lacks a detailed comparison of ML methods such as ANN, SVM, fuzzy approaches, swarm intelligence and evolutionary computation methods in intrusion detection, particularly on recent research. In this context, the present paper provides a systematic review of ML and DL methods in intrusion detection. In addition to reviewing ML and DL methods, it also covers benchmark datasets, performance evaluation measures and various applications of DL methods for intrusion detection. The paper summarizes recent work and compares the reported experimental results for detecting network intrusions. Furthermore, current research challenges are identified to help fellow researchers in the era of DL-based intrusion detection.

Keywords Intrusion detection system · Deep learning · Deep belief network · Recurrent neural network · Network intrusion
detection system

✉ Gulshan Kumar
[email protected]

1 Maharaja Ranjit Singh Punjab Technical University, Bathinda, Punjab, India
2 Shaheed Bhagat Singh State Technical Campus, Ferozepur, Punjab, India

1 Introduction

With the growth of the digital world, the Internet has become an integral part of our lives. Dependence on the Internet is growing day by day with the development of smart cities, autonomous cars, health monitoring via smartwatches, mobile banking, etc. (Ziegler 2019; Taddeo et al. 2019; Serrano 2019). Although these technologies bring many benefits to users and society in general, they also pose several risks. Hackers can exploit vulnerabilities, resulting in theft and sabotage that affect the lives of people globally. Figure 1 illustrates the most frequent targeted cyber warfare attacks between 2009 and 2019 (geopolitical-attacks 2019). Cyberattacks can be costly for businesses: besides financial loss, they also lead to loss of reputation (Ghose et al. 2019). Network security has therefore become an important topic. Conventional methods such as firewalls, encryption and anti-virus software packages adopted by organizations play a significant role in securing network infrastructure. Still, these methods provide only the first level of defence and cannot completely protect networks and systems from progressive attacks and malware (Srinivas et al. 2019; Kandan et al. 2019). As a result, some intruders still manage to penetrate, resulting in a breach.

Fig. 1 Most frequent targeted cyber warfare attacks between 2009 and 2019 (geopolitical-attacks 2019)

Organizations use intrusion detection systems (IDSs), which Denning proposed in 1987, as an additional security technique for securing their networks (Pradhan et al. 2020). Denning's research has given direction for constructing detection models effectively and accurately. In the literature, IDS methods are mainly classified as knowledge-based, statistical and ML methods (Kumar et al. 2010), as discussed in Sect. 2.2. Artificial intelligence (AI) and ML methods determine the models from the training dataset (Arrieta et al. 2020).

These ML methods have achieved high detection accuracy. Still, ML methods have limitations, such as difficulty handling raw, unlabelled or high-dimensional
data (Nguyen and Reddi 2019), degraded accuracy on large datasets, manual feature extraction, expensive, time-consuming and tedious data labelling, and an inability to detect multi-class attacks (Alzaylaee et al. 2020; Meng et al. 2020). To combat these limitations, deep learning (DL)-based methods emerged in 2006. DL methods are known for their ability to handle labelled or unlabelled data and to solve complex problems with the help of high-powered GPUs (Nguyen and Reddi 2019).

To simplify the use of ML and DL methods in intrusion detection, it is necessary to understand IDS, standard benchmark datasets, ML methods, their challenges and the reasons behind the evolution of DL methods (Nguyen and Reddi 2019; Chaabouni et al. 2019). A summarized review of ML/DL methods helps researchers explore their advantages and disadvantages in IDS.

This paper has a dual objective. The first objective is to present a survey of recent contributions to ML and DL methods. The second objective is to explore the reasons behind the evolution of DL methods for intrusion detection.

The review paper is organized into different sections. Section 2 discusses IDSs and their taxonomy. Sections 3 and 4 describe various benchmark datasets and the performance evaluation measures of IDS, respectively. ML methods used for intrusion detection are discussed in Sect. 5. Section 6 introduces DL-based intrusion detection. The crucial challenges for accurate intrusion detection are discussed in Sect. 7. Finally, a conclusion is drawn in Sect. 8. To meet this paper's objectives, we attempt to answer the research questions given in Table 1.

2 Background

This section introduces intrusion detection systems, followed by a taxonomy of IDS. Its main purpose is to give an overview of IDS and its taxonomy.

2.1 Intrusion detection systems

An IDS can be a hardware or software system that is used to detect suspicious activity in the network. Monitoring the network, finding breaches and reporting to the administrator are some of the main functions performed by an IDS (Vinayakumar et al. 2019; Almomani et al. 2020). Advanced IDSs can also take action when malicious activities are found, such as blocking traffic from the source IP address (Vinayakumar et al. 2019; Chevalier et al. 2020). IDSs can be divided according to different criteria, such as the technology used or the response of the IDS. As depicted in Fig. 2, they are classified here along three dimensions: the methodology used for intrusion detection, the IDS's reaction, and the IDS's architecture.


Table 1 Description of research questions

| RQ# | Research questions | Motivation |
|---|---|---|
| RQ1 | What is IDS and its taxonomy? | Familiarization of IDS along with its taxonomy |
| RQ2 | What are benchmark datasets used for intrusion detection? | Knowledge of various benchmark datasets used for intrusion detection |
| RQ3 | What are most commonly used performance metrics for evaluating IDSs? | Evaluation of existing IDS using performance metrics |
| RQ4 | What are promising ML methods for accurate intrusion detection? | Identification of ML methods for intrusion detection |
| RQ5 | What are challenges of traditional ML based IDSs? | Finding the reasons for transition from ML to DL methods |
| RQ6 | What are the primary advantages and disadvantages of DL methods in the field of IDS? | Determining the information about DL methods in the field of IDS |
| RQ7 | What are the various challenges of DL methods for intrusion detection? | Identifying the limitations of existing DL methods |

Fig. 2 Classification of IDS

2.1.1 IDSs classification based on detection method

IDSs may be categorized into two classes, namely misuse and anomaly detection. Misuse detection operates with predefined patterns of known attacks, also called signatures. It can be further divided into stateless and stateful IDS (Pandey et al. 2019). Stateless methods use only the existing signature, whereas stateful methods use previous signatures as well as existing ones (Pandey et al. 2019).

This approach provides high accuracy and low false alarm rates for known attacks but is not practical for detecting novel attacks. One known solution to this problem is to update the signature database regularly, which is a time-consuming and costly process and is therefore not considered feasible (Kurniabudi et al. 2019).

In contrast, anomaly detection deals with profiling user behaviour. In this approach, a model of regular user activity is defined, and any deviation from this model is treated as anomalous. Anomaly detection methods can be further categorized into static and dynamic methods (Kurniabudi et al. 2019). The static anomaly detection method works only on the fixed part of the system. The dynamic anomaly detection method extracts patterns (also known as "profiles") from network usage history. This method can detect novel attacks but may lead to a high false alarm rate (FAR) and lower accuracy (Kurniabudi et al. 2019; Mäkelä 2019). Another drawback is that an attacker who senses that he is being profiled can slowly change his behaviour from abnormal to normal. Researchers have also suggested hybrid


approaches in the recent past for further improvement in intrusion detection (Guo et al. 2016; Kim et al. 2014).

2.1.2 IDSs classification based on reaction/response method

IDSs can be classified into passive and active IDSs (Tidjon et al. 2019; Aljumah 2017) based on the type of response. A passive IDS is set up only to monitor and to inform the administrator about intrusions by generating alerts. In contrast, an active IDS can act in real time by blocking the suspected attack/intrusion (Tidjon et al. 2019; Kim et al. 2014).

2.1.3 IDSs classification based on architecture

IDSs can be divided into three categories based on their architecture, viz., host, network and hybrid IDSs (i.e. a mix of host and network). In a host-based IDS, an agent/sensor is installed on each computer system involved (Feng et al. 2019). It identifies intrusions by analyzing application logs, audit trails, system calls and other activities within the host. If additional event information or logs are needed, the developer must modify the operating system kernel code. This increases cost, which might be unacceptable for some customers (Arabo 2019). Deploying the agent across all computer systems can also be cumbersome.

In a network-based system, the IDS is installed on the server. Sensors are deployed to identify intrusions by monitoring network traffic across multiple hosts (Chevalier et al. 2020). They are independent of the operating system, highly portable and easy to implement. However, this approach shows limitations when high peaks in network traffic or high-speed data are involved. In a hybrid system, an IDS is required on the server as well as on each client. It combines the host and network approaches and is considered the most effective and logical approach for intrusion detection (Chevalier et al. 2020; Kurniabudi et al. 2019).

2.2 IDS taxonomy

Figure 3 shows the proposed IDS taxonomy, based on literature analysis. As mentioned in Sect. 2.1, intrusion detection methods are divided into anomaly-based and signature-based methods.

2.2.1 Signature based IDS

Organizations use signature-based IDSs to protect themselves from known attacks whose signatures are available in the database. This type of IDS matches audited patterns against a series of malicious bytes or known patterns. Signature-based IDSs also communicate the cause of an intrusion alert (Jacob and Wanjala 2018). They can easily detect known attacks but fail on new attacks whose patterns are not known or not yet updated in the database. Regular updates of the patterns in the database can deal with this issue. However, when an attacker uses advanced techniques such as no-operation (NOP) generators, payload encoders and encrypted data channels, signature-based detection does not work well. Its efficiency decreases significantly because a new signature must be created for every variation (Rao and Raju 2019). Also, as the number of signatures increases, the performance of the system engine decreases. The failure to detect novel attacks and the need to update the database regularly with new patterns motivate work on anomaly-detection IDS (Rao and Raju 2019; Kang and Kang 2016).

2.2.2 Anomaly based IDS

Anomaly-based IDSs detect both network and computer intrusions by monitoring the system. After monitoring, instead of patterns or signatures, they use heuristics/rules to classify events as either normal or anomalous and attempt to detect abnormal operation (Farzaneh et al. 2019; Worku 2019). Anomaly detection methods can detect novel attacks, but defining their ruleset is a cumbersome task. Anomaly-based IDSs are further classified into three classes: knowledge-based, statistical, and machine learning methods, as depicted in Fig. 3 and described below.

Statistical anomaly IDSs were used earlier for detecting intrusions in information systems. Statistical tests are performed to check whether the observed behaviour differs from the expected behaviour. Statistical approaches require neither previous knowledge nor frequent updates of signatures. They can detect low and slow attacks, especially DoS attacks. Their limitation is the long lead time needed for learning before they deliver accurate and valuable results. The most commonly used methods in this category include the Markov method, deviation method, multivariate method, and time series method.

Knowledge-based IDSs work by gathering knowledge about specific attacks and system vulnerabilities (Hussain and Khan 2020). They consult their knowledge base to identify an attack. Expert systems, Petri nets, signature analysis and state transition are examples of knowledge-based IDS. The accuracy of the results produced by these methods is high, with a low false alarm rate. To keep a knowledge-based IDS effective, attack data needs to be updated regularly. Regularly updating this data is very time-consuming, which is the main limitation of knowledge-based IDS (Hussain and Khan 2020).

Machine learning is a large field of study that overlaps with and inherits ideas from many related fields such as artificial intelligence. The focus of the field is learning, that is, acquiring skills or knowledge from experience.


Fig. 3 IDS taxonomy

Most commonly, this means synthesizing practical concepts from historical data. Nowadays, most researchers focus on ML methods because of their built-in properties such as robustness, resilience to noisy data and adaptability. Interested readers can explore the topic further in Lin et al. (2015), Liao et al. (2013) and Aissa and Guerroumi (2016). ML methods proposed for intrusion detection are depicted in Fig. 3 and explained in Sect. 5.

3 Intrusion detection datasets

Several benchmark datasets have been designed to evaluate and compare the performance of IDSs. This section focuses on the most commonly used datasets for intrusion detection. Training and evaluating an IDS requires data. Data is therefore gathered from different sources, such as network data packets and low-level system information like log files or system dumps, and used as a benchmark dataset. Datasets are categorized into three types: synthetic (self-produced), benchmark and real-life datasets, as shown in Fig. 4.

3.1 Synthetic datasets

Synthetic datasets are used to fulfil particular demands or conditions in evaluating IDS. These datasets are used for designing any model for theoretical analysis. These designs can be refined to test and create various types of test scenarios. They allow designers to construct realistic behaviour profiles of attackers and regular users to test a proposed system. They provide initial validation of a particular method; if the outcome proves satisfactory, the developers then continue to evaluate the method on real-life data from a specific field. The shortcomings of other datasets are the reason self-produced datasets originated. This results in the generation of artificial data and its merger into training sets.

3.2 Benchmark datasets

Benchmark datasets include DARPA 98, KDD Cup99 (Uci 2019), NSL-KDD (Unb 2019a), the DEFCON 2000/2002 dataset (Sharafaldin et al. 2018), the UNM dataset, the CAIDA 2002/2016 datasets, the LBNL dataset (Gharib et al. 2016), the CDX 2009 dataset (Sangster et al. 2009), Twente 2009 (Sperotto et al. 2009), UMASS 2011, ISCX 2012 (Unb 2019b), ADFA 2013 (Creech and Hu 2013) and the CSE-CIC-2018 dataset. These datasets have been commonly used for evaluating IDSs in the literature. Among these, DARPA 98, KDD Cup99 and NSL-KDD are the most commonly used for the evaluation of IDS, as shown in Table 2.

DARPA 98, the base dataset of the KDD 99 dataset, contained raw TCP/IP dump files. This dataset contained 38 attacks. The training set was 6.2 GB and the test set was 3.67 GB. Training and test data were collected over seven weeks and two weeks, respectively.


Fig. 4 Benchmark intrusion detection datasets

Table 2 Information regarding DARPA 98, KDD99 and NSL-KDD datasets

| Name of dataset | DARPA 98 (base dataset) | KDD99 | NSL-KDD |
|---|---|---|---|
| Training size | 6,591,458 KB (6.2 GB) | 4,898,431 | 125,973 |
| Testing size | 3,853,522 KB (3.67 GB) | 311,029 | 22,544 |
| Note | Raw TCP/IP dump files | Features extracted and preprocessed for ML | Reduced size by removing duplicates |

In 1999, the DARPA 98 dataset was summarized into 41 features, producing what is known as the KDD 99 benchmark dataset for intrusion detection. The KDD 99 dataset covers Probing, DoS, U2R and R2L attacks. It was divided into labelled and unlabelled parts containing 4,898,431 and 311,029 records, respectively. The types of attacks in the KDD99 dataset are described in Table 3.

Table 3 KDD 99 dataset distribution

| Attacks | Normal | DoS | Probe | U2R | R2L | Total |
|---|---|---|---|---|---|---|
| Training size | 972,781 | 3,883,390 | 41,102 | 52 | 1,106 | 4,898,431 |
| %age | 19.85 | 79.27 | 0.83 | 0.001 | 0.02 | 100 |
| Test size | 60,593 | 231,455 | 4,166 | 245 | 14,570 | 311,029 |
| %age | 19.48 | 74.41 | 1.33 | 0.07 | 4.68 | 100 |

Here the training and test sizes for the U2R and R2L attacks are very small. The dataset also contains a huge number of redundant records, as shown in Table 4.

Table 4 Statistics of redundant records in KDD Cup 99 training and testing datasets

| Set | Records | Attacks | Normal | Total |
|---|---|---|---|---|
| Training | Original | 3,925,650 | 972,781 | 4,898,431 |
| Training | Distinct | 262,178 | 812,814 | 1,074,992 |
| Training | Redundancy (%) | 93.32 | 16.44 | 78.05 |
| Testing | Original | 250,436 | 60,591 | 311,027 |
| Testing | Distinct | 29,378 | 47,911 | 77,289 |
| Testing | Redundancy (%) | 88.26 | 20.92 | 75.15 |

The shortcomings of KDD99 for IDS evaluation are well documented in the literature (Brugger and Chow 2007; Mahoney and Chan 2003; Sommer and Paxson 2010).

The NSL-KDD dataset is a refined version of KDD'99 (Tavallaee et al. 2009). Most researchers have applied different methods and tools to the NSL-KDD dataset to build effective IDS. An analysis of the NSL-KDD dataset using various ML methods available in the WEKA tool is described in Revathi and Malathi (2013). The NSL-KDD dataset was used with the K-means clustering algorithm to train and test several novel and existing attacks (Kumar et al. 2013). In Sanjaya and Jena (2014), a comparative study of the KDD99 and NSL-KDD datasets was carried out using ANN and SOM. ML algorithms are used to analyse various datasets such as KDD99, NSL-KDD and GureKDD (Sanjaya and Jena 2014). The types of attacks in the NSL-KDD dataset are described in Table 5.

Improvements in KDD'99 dataset (Unb 2019a) The key advantages of the NSL-KDD data set over the original KDD data set are:

1. The classifiers will not be biased toward frequent records, because redundant records are not included in the training set.
2. The performance of the learners is not biased.
3. A reasonable number of records is available in the train and test sets.

Since NSL-KDD is a refined version of KDD-99, the shortcomings of KDD99 are also present in the NSL-KDD dataset. Some of the other benchmark datasets depicted in Fig. 4 are explained here.

LBNL (Lawrence Berkeley National Laboratory and ICSI, 2004/2005) This dataset contains normal user behaviour. It is not labelled and suffers from heavy anonymization (Gharib et al. 2016).

UNM The UNM dataset was proposed in 2004 and is unable to reflect current computer technology trends.

CDX 2009 This dataset was created during a network warfare competition to generate a labelled dataset. Attackers used various tools such as WebScarab, Nikto and Nessus to investigate and detect attacks automatically. It can also be used to test IDS alert rules. Low volume and lack of traffic diversity are the limitations of this dataset (Sangster et al. 2009).

Twente 2009 This is a labelled and more realistic dataset. It captured data from a honeypot network and includes a few unknown and uncorrelated alerts. It also includes OpenSSH, Apache web server and Proftp using auth/ident on port 113. It suffers from a lack of attack diversity and of volume (Sperotto et al. 2009).

UMASS 2011 This dataset includes various trace files from both wireless applications and network packets (U. of massachusetts amherst 2019; Nehinbe 2011). It lacks variety in traffic and attacks, which limits its usefulness for testing IDS methods (Prusty et al. 2011).

ADFA 2013 The KDD and UNM datasets fail to meet the present needs of computer technology. ADFA Linux (ADFA-LD) was proposed as a new dataset by Creech and Hu (2013). This dataset is used for the evaluation of ML-based IDS. The attributes of the ADFA-LD cybersecurity dataset are challenging to understand, so they need to be improved for better understanding (Abubakar et al. 2015).

CSE-CIC-2018 Dataset This dataset uses a systematic approach to generate a benchmark dataset for intrusion detection. It is based on the creation of user profiles and the behaviours seen on the network. This dataset includes seven different attack scenarios.

3.3 Real life datasets

Datasets of this kind contain real-life data records. They include the Kyoto 2006/2009, ISCX2012 and UNSW-NB15 datasets.

Kyoto 2006/2009 This dataset contains 14 statistical features (Kyoto2006+ 2015; Song et al. 2011) derived from the KDD Cup99 dataset, ignoring redundant features. It also includes ten additional features for better evaluation and analysis of NIDS. It tries to overcome the limitations of the KDD Cup99 dataset. In this dataset, only attacks directed at the honeypots are captured, so it provides a limited view of network traffic. The normal traffic used for simulation during the attacks does not represent normal traffic from the real world. There are no false positives, which are essential for reducing the number of alerts (Song et al. 2011; Sato et al. 2012; Chitrakar and Huang 2012). A comparison of various datasets is given in Al-Dhafian et al. (2015).

ISCX2012 A dynamic approach was used to generate this dataset. The authors present guidelines for generating realistic and practical evaluation datasets for IDS. Alpha and beta profiles are the two parts of this approach. The dataset comprises relevant profiles and network traces. New network protocols are not considered in this dataset (Shiravi et al. 2012).

UNSW-NB15 The tcpdump tool was used to capture raw traffic. The dataset was created for academic research purposes and contains a hybrid of normal activities and attack behaviours. Twelve algorithms and tools were used to generate it.

Based on the literature, it is concluded that different researchers used different datasets as per their requirements.

Table 5 Distribution of instances in NSL-KDD dataset

| Attacks | Normal | DoS | Probe | U2R | R2L | Total |
|---|---|---|---|---|---|---|
| Training size | 67,343 | 45,927 | 11,656 | 52 | 995 | 125,973 |
| %age | 53.458 | 36.458 | 9.253 | 0.041 | 0.790 | 100 |
| Test size | 9,711 | 7,456 | 2,421 | 200 | 2,756 | 22,544 |
| %age | 43.076 | 33.073 | 10.739 | 0.887 | 12.225 | 100 |

In the literature, the KDD Cup99 and NSL-KDD datasets are primarily used for evaluating ML-based IDSs. The KDD Cup99 dataset does not represent real traffic data. NSL-KDD is the refined version of KDD Cup99, but the shortcomings of KDD Cup99 are also present in NSL-KDD. Both datasets are very old, so more than one dataset should be used to validate the performance of an IDS (Table 6).

Table 6 Comparison of different datasets and their description

| Dataset | Total instances | Attack type | Total features | Ref | Advantages | Disadvantages |
|---|---|---|---|---|---|---|
| KDD-Cup 99 | 5,000,000; imbalanced classes | Normal, DoS, Probe, U2R, R2L | 41 | (Uci 2019) | Easily available | Imbalanced; does not include modern attacks |
| NSL-KDD | Training set = 489,431; test set = 311,027 | Normal, DoS, Probe, U2R, R2L | 41 | (Unb 2019a) | Removes redundancy, eliminating the unbalancing problem in training and testing dataset | Does not represent modern low-footprint attack scenarios |
| Kyoto 2006+ | Overall traffic (2006-2009) | Multiple | 24 | (Kyoto 2019) | Includes statistical features such as source byte and average count, along with 10 additional features for IDS | Not effective on the hybrid features of the latest honeypot data sets |
| ISCX 2012 | Training dataset = 9; testing dataset = 9 | Normal, Attack | 9 | (Unb 2019b) | Allows dynamic attacks for the hybrid approach, i.e. DoS and SSH brute force | Not able to identify characteristics-based network errors |
| CICIDS2017 | Contains 5 days of data in total, i.e. Monday to Friday | Brute Force FTP, Brute Force SSH, DoS, Heartbleed, Web Attack, Infiltration, Botnet and DDoS | 80 | (Unb 2019c) | Contains traffic in bidirectional flow-based and packet-based formats, with 80 additional attributes | Unusable for application-layer or modern DoS attacks |
| DEFCON | Contains only attack traffic captured during the DEFCON competition | Port scan, buffer overflow attacks | – | (Sharafaldin et al. 2018) | Significant to avoid network interruption. Performed better as compared to CTF and MAC-CDC | Does not cover normal background traffic, only intrusive traffic |
| DARPA | Multiple datasets | DoS, Probe, U2R, R2L | – | (Sharafaldin et al. 2018) | Mainly useful for web application activities such as sending and receiving files through FTP, browsing websites, sending and receiving mails and monitoring routers | Does not deal with artificially injected noisy data or with benign attacks |
| UNSW-NB15 | Training set = 175,341; testing set = 82,332 | Fuzzers, Analysis, Backdoors, DoS, Exploits, Generic | 49 | (Cloudstor 2019) | Comprises major categories of benign attacks such as worms, fuzzers, exploits and DoS. Useful when testing a model against multiple IP addresses | Only functional for TCP and UDP connections. Cannot handle a high number of DNS at a time |

4 Performance metrics

IDS effectiveness can be judged by performance evaluation in terms of metrics. These metrics are computed from the confusion matrix described below.

4.1 Confusion matrix

A confusion matrix is often used to describe the performance of classification models. It summarizes the performance of a classification algorithm by tabulating the different combinations of actual and predicted classifications, as shown in Fig. 5.

Fig. 5 Confusion matrix

There are four components in the confusion matrix: True Positives (TP), False Positives (FP), False Negatives (FN) and True Negatives (TN). TP means that the actual class and the predicted class of a data point are both 1 (true); it represents the attacks that the IDS successfully detects. FP refers to normal behaviour wrongly classified as an attack by the IDS. FN refers to attack events (actual class 1) that the IDS misses, incorrectly classifying them as normal events (0). TN means that the actual class and the predicted class of a data point are both 0 (false). FP is referred to as a Type I error and FN as a Type II error. The confusion matrix is a powerful tool in classification, but on its own it is not suitable for comparing IDSs. To solve this problem, different performance metrics are defined in terms of the confusion matrix variables. These metrics are numeric values, which are easy to compare. The most common metrics are described below.

– Accuracy It describes how often the classifier is correct. It is the ratio of correctly predicted samples to the total number of samples and can be computed as Eq. 1:

$$\text{Accuracy} = \frac{TP + TN}{\text{Total number of instances}} \tag{1}$$

It is a suitable metric for balanced data but loses its value in the case of imbalanced data.

– Detection rate (DR) It is also known as Sensitivity/Recall. It refers to the percentage of actual attacks correctly identified by the system and can be expressed as:

$$\text{DR} = \frac{TP}{TP + FN} \tag{2}$$

It provides information on the classifier's performance concerning false negatives.

– False positive rate (FPR) It is the ratio of normal samples wrongly classified as attacks to the total number of normal samples:

$$\text{FPR} = \frac{FP}{TN + FP} \tag{3}$$

– Specificity It measures the proportion of negatives that are correctly identified by the system. This performance metric can be calculated with the help of Eq. 4:

$$\text{Specificity} = \frac{TN}{TN + FP} \tag{4}$$

– False alarm rate (FAR) The ratio of false-negative samples to total positive samples is known as FAR and can be calculated by Eq. 5:

$$\text{FAR} = \frac{FN}{TP + FN} \tag{5}$$

Note that this ratio is more commonly called the false negative rate (miss rate); in much of the IDS literature, FAR instead denotes the false positive rate of Eq. 3.

– Precision It is an important metric and tells what percentage of the samples predicted as attacks are actual attacks. It helps to evaluate the model better and can be calculated with the help of Eq. 6:

$$\text{Precision} = \frac{TP}{TP + FP} \tag{6}$$
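The confusion-matrix metrics of Eqs. 1 to 6 can be computed directly from the four counts TP, FP, FN and TN. The following is a minimal Python sketch, not code from the paper; the function name `ids_metrics`, the dictionary keys and the example counts are illustrative assumptions. The key `eq5_fn_ratio` follows Eq. 5 exactly as printed, FN / (TP + FN).

```python
def ids_metrics(tp: int, fp: int, fn: int, tn: int) -> dict[str, float]:
    """Compute the confusion-matrix metrics of Eqs. 1-6.

    Assumes every denominator is non-zero, i.e. the evaluation set
    contains both attack and normal samples and at least one alert.
    """
    total = tp + fp + fn + tn
    return {
        "accuracy": (tp + tn) / total,           # Eq. 1
        "detection_rate": tp / (tp + fn),        # Eq. 2 (recall / sensitivity)
        "false_positive_rate": fp / (tn + fp),   # Eq. 3
        "specificity": tn / (tn + fp),           # Eq. 4
        "eq5_fn_ratio": fn / (tp + fn),          # Eq. 5 as printed
        "precision": tp / (tp + fp),             # Eq. 6
    }


# Hypothetical counts for illustration only (not taken from any dataset):
# 100 attack samples and 900 normal samples.
metrics = ids_metrics(tp=90, fp=5, fn=10, tn=895)
for name, value in metrics.items():
    print(f"{name:20s} {value:.3f}")
```

With these hypothetical counts, accuracy is 0.985 while the detection rate is only 0.9, which illustrates why accuracy on its own can look flattering on imbalanced data.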
9740 G. Kocher, G. Kumar

– System utilization It means the amount of CPU and memory utilization required for IDS.

4.2 Receiver operating characteristic (ROC)

ROC analysis originates from a field called “Signal Detection Theory” (Signal detection theory 2019). During World War II, electrical engineers and radar engineers first developed the ROC curve to detect enemy objects on battlefields. The performance of different systems can be compared effectively with ROC curves. An ROC curve is a plot of TPR against FPR for the different possible cut-points of a diagnostic test. It has been used increasingly in ML research for decades. In intrusion detection, the ROC curve is used to account for detection costs and to evaluate various detection learning methods. DR and FAR are the most commonly used performance metrics. A high DR and a low FAR are preferred for an IDS.

5 Machine learning methods for IDSs

ML is a branch of AI in which a system learns from or adapts to a new environment. It allows programs to find and learn the patterns within data. It explores various methods, also called ML methods, that can learn from data and then make predictions on it. ML methods usually operate on features that represent the characteristics of the object.

It is an interdisciplinary field that draws on ideas from various disciplines, including mathematics, science, and engineering. Face recognition, which allows users to tag and post images of their friends on social media, optical character recognition (OCR), recommendation engines, self-driving vehicles, image recognition, speech recognition, medical diagnosis, virtual personal assistants, e-mail spam and malware filtering, online fraud detection, and several other problems have been solved with it.

In general, ML is divided into three sub-domains: supervised, unsupervised, and reinforcement learning, as shown in Fig. 6.

Supervised learning requires labelled data for training (both inputs and desired outputs). It discovers the relationship between data and its class. Unsupervised learning is used when labelled data is not available; these methods find hidden patterns in the data. Reinforcement learning is based on a feedback mechanism: the computer program interacts with the environment and learns by experience. Several ML methods have been proposed for accurate intrusion detection. The most commonly used methods are summarized in the following sub-sections.

5.1 Artificial neural networks (ANN)

ANNs are designed based on biological neural networks. They learn from examples and generalize from noisy and incomplete data to perform tasks. The original aim of the ANN approach was to solve problems in the same way as the human brain. Such systems are successfully employed for data-intensive applications. The various types of ANN and their contributions and performances on intrusion detection are discussed in this section. Several ANN designs have been proposed based on different learning strategies, as depicted in Fig. 7.

5.1.1 Supervised ANN models

In this type of learning, the ANN model is trained using labelled data and a new set of examples. The ANN model analyzes the training data and produces a correct outcome from labelled data. The feedforward neural network and the recurrent neural network (RNN) are examples of supervised learning.

The feedforward neural network was the first and most straightforward type of ANN. In this network, information is transferred in only one direction, forward, from the input nodes to the hidden nodes and through the hidden nodes to the output nodes. This type of network does not form a cycle. The Single Layer Perceptron (SLP) consists of a single neuron with adjustable weights and bias. It is used to classify linearly separable patterns, and the training of the perceptron continues until no error occurs. MLP and RBF are two examples of feedforward ANNs used for modelling patterns. Static backpropagation is used to train MLP networks. The advantage of these networks is that they are easy to handle and can approximate any input/output map. The disadvantages of MLPs are slow training and the need for a lot of training data.

Several researchers used ANN in the supervised mode for detecting intrusions. For instance, Gupta et al. (2012) used a feedforward neural network to predict the number of zombies involved in flooding DDoS attacks. In that paper, the relationship between the number of zombies and sample entropy is identified, and the number of zombies involved in a DDoS attack is predicted with a very small test error. Generalized feedforward (GFF) networks are a generalization of the MLP in which connections can jump over one or more layers. In practice, GFF networks often solve a problem much more efficiently than MLPs. Akilandeswari and Shalinie (2012) used a Radial Basis Function Neural Network (RBFNN) for classifying DDoS attack traffic and regular traffic. This method achieves the highest accuracy for DDoS flooding attacks. RNNs are connectionist models that capture the dynamics of sequences via cycles. An RNN is a sequential learning model and learns features from the memory of previous inputs. It shows promising results in ML tasks where input and output are of variable length.
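To make the single-layer perceptron described in Sect. 5.1.1 concrete, the following minimal Python sketch trains one neuron with adjustable weights and a bias and stops as soon as an epoch produces no classification error. It is an illustration only, not code from any of the cited studies. The function names, the learning rate, the epoch cap and the toy two-feature data are all assumptions introduced here.

```python
from typing import List, Sequence, Tuple


def train_perceptron(
    samples: Sequence[Sequence[float]],
    labels: Sequence[int],
    learning_rate: float = 1.0,   # assumed value, chosen for the toy data
    max_epochs: int = 100,        # safety cap in case the data is not separable
) -> Tuple[List[float], float, int]:
    """Train a single neuron with a step activation; labels are 0 or 1."""
    weights = [0.0] * len(samples[0])
    bias = 0.0
    for epoch in range(1, max_epochs + 1):
        errors = 0
        for x, target in zip(samples, labels):
            activation = sum(w * xi for w, xi in zip(weights, x)) + bias
            predicted = 1 if activation > 0 else 0
            delta = target - predicted
            if delta != 0:
                # Perceptron rule: move the boundary towards the misclassified sample.
                errors += 1
                weights = [w + learning_rate * delta * xi for w, xi in zip(weights, x)]
                bias += learning_rate * delta
        if errors == 0:
            # A full pass without mistakes: training stops.
            return weights, bias, epoch
    return weights, bias, max_epochs


def predict(weights: Sequence[float], bias: float, x: Sequence[float]) -> int:
    """Classify one sample with the trained neuron."""
    return 1 if sum(w * xi for w, xi in zip(weights, x)) + bias > 0 else 0


# Hypothetical, linearly separable toy data (logical AND); label 1 stands for "attack".
X = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (1.0, 1.0)]
y = [0, 0, 0, 1]
weights, bias, epochs = train_perceptron(X, y)
```

On data that is not linearly separable, the loop would run until `max_epochs` without reaching zero errors, which is one reason multi-layer networks are used for harder patterns.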


Fig. 6 Machine learning methods

Fig. 7 ANN models
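The studies reviewed in this section are compared mainly by DR and FPR/FAR, which come from the ROC analysis of Sect. 4.2. The short Python sketch below shows one way to obtain the (FPR, TPR) points of an ROC curve from classifier scores and to summarize the curve by its area. It assumes the usual definitions DR = TPR = TP/(TP+FN) and FAR = FPR = FP/(FP+TN); the function names and the six scored samples are hypothetical.

```python
from typing import List, Sequence, Tuple


def confusion_rates(
    labels: Sequence[int], scores: Sequence[float], threshold: float
) -> Tuple[float, float]:
    """Return (TPR, FPR) when every score >= threshold is flagged as an attack."""
    tp = fp = fn = tn = 0
    for label, score in zip(labels, scores):
        flagged = score >= threshold
        if label == 1 and flagged:
            tp += 1
        elif label == 1:
            fn += 1
        elif flagged:
            fp += 1
        else:
            tn += 1
    tpr = tp / (tp + fn) if tp + fn else 0.0
    fpr = fp / (fp + tn) if fp + tn else 0.0
    return tpr, fpr


def roc_points(labels: Sequence[int], scores: Sequence[float]) -> List[Tuple[float, float]]:
    """One (FPR, TPR) point per distinct cut-point, from strictest to loosest."""
    points = [(0.0, 0.0)]
    for threshold in sorted(set(scores), reverse=True):
        tpr, fpr = confusion_rates(labels, scores, threshold)
        points.append((fpr, tpr))
    return points


def auc(points: Sequence[Tuple[float, float]]) -> float:
    """Area under the ROC curve by the trapezoidal rule."""
    area = 0.0
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        area += (x1 - x0) * (y0 + y1) / 2.0
    return area


# Hypothetical detector output: label 1 = attack, score = estimated attack likelihood.
labels = [1, 1, 0, 1, 0, 0]
scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.2]
points = roc_points(labels, scores)
area = auc(points)
```

Each cut-point trades DR against FAR; an IDS operator would pick the threshold whose point lies closest to the top-left corner for the costs at hand.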


Tong et al. (2009) reported a hybrid RBF/Elman neural network model for anomaly and misuse detection. The Elman network is used to restore the memory of past events, and the RBF network serves as a real-time pattern classifier. The results show that IDSs using this hybrid neural network improve the Detection Rate (DR) and effectively decrease the false positive rate. Aljumah (2017) used a trained ANN algorithm to detect various attacks. A mirror image of a real-life environment was used for learning. The author obtained 98% detection accuracy. Both old and up-to-date datasets were used to train the algorithm for further evaluation. This approach is not able to handle DDoS attacks.

5.1.2 Unsupervised ANN models

Training a machine without a teacher is known as unsupervised learning. It uses information that is neither classified nor labelled. The Kohonen Self-Organizing Map (SOM) and Adaptive Resonance Theory (ART) come under the category of unsupervised learning. SOM is used to build a 2D map of a problem space using unsupervised learning. It can generate a visual representation of data on a rectangular grid. Nonlinearity is the main advantage of SOM networks. A SOM can preserve the topological structure of the data. It clusters the samples into predefined classes and then orders the classes into meaningful maps. It comprises two layers, i.e. an input layer and an output layer.

Several researchers used ANN in an unsupervised mode for detecting intrusions. For instance, Chen et al. (1996) described a multi-layered SOM algorithm, also called M-SOM, which permits unlimited layers of Kohonen maps. This algorithm has been tested in many applications, such as internet entertainment-related home pages and electronic brainstorming comments. According to Kalteh et al. (2008), SOM applications are based on ad hoc approaches and are characterized by trial and error. They perform better than other methods on various problems in areas such as climate and environmental issues.

Ibrahim et al. (2013) implemented a SOM to detect anomalies on the KDD dataset and the NSL-KDD dataset. The authors achieved 92.37% attack detection with the KDD dataset and 75.49% with the NSL-KDD dataset. The advantage of the SOM network is its high speed and fast convergence rate compared with other learning methods.

Adaptive Resonance Theory (ART): These are self-organizing neural architectures. ART clusters the pattern space and produces appropriate weight vector templates. Stephen Grossberg invented it in 1976. The resonance is related to the resonant state of a neural network. Conventional ANNs have failed to solve the stability-plasticity problem. ART algorithms solve the problem of plasticity, which is required to learn new patterns. It is an unsupervised learning model.

Some researchers, such as Aljumah (2017), used ANN to detect DDoS attacks, while others (Tong et al. 2009) used a hybrid neural network for both misuse and anomaly detection. The following points can be concluded on the basis of the contributions of these researchers:

1. Network data traffic can be filtered and modelled more efficiently using ANN. The ANN should always be trained with a new dataset instead of an old one; otherwise, it will give poor results.
2. RBF takes less time to train than MLP.
3. Ad hoc approaches are generally used in SOM applications.
4. SOM has high speed and fast convergence rates compared with other learning methods.
5. Hybrid networks are required to improve the DR and decrease the FPR.

5.2 Support vector machine (SVM)

SVM is a supervised model used for classification, regression and outlier detection. It linearly separates the data using a hyperplane. SVM maps the data into a feature space and divides it into classes using the hyperplane with the largest margin between the instances of the classes. It is a binary classifier that can also be used for multi-class classification. SVM is most useful when dealing with nonlinear data.

Several researchers used SVM for detecting intrusions. For instance, Wang et al. (2017) proposed an SVM model for detecting network intrusions. To improve detection efficiency, the authors emphasized the importance of high-quality training data and proposed an efficient IDS based on enhanced SVMs. To obtain new and better-quality features for SVM detection, they introduced a logarithm marginal density ratio transformation (LMDRT). The empirical results showed practical values such as high DR and good efficiency.

Wang’s work was extended by Gu et al. (2019), who introduced an ensemble-based intrusion detection model based on the LMDRT transformation, which also achieves competitive intrusion detection outcomes. Kabir et al. (2018) proposed an optimum allocation-based least square support vector machine (OA-LS-SVM) based on the idea of sampling. This method can handle both static and incremental data. The suggested technique is explored and validated using the KDD 99 dataset. In terms of accuracy and performance, the proposed method achieves a realistic result. Similarly, Gu and Lu (2021) proposed an efficient IDS based on SVM and naive Bayes feature embedding. Four datasets, UNSW-NB15, NSL-KDD, Kyoto 2006+ and CICIDS2017, were selected for the experiment. The results showed that the proposed detection approach achieved strong and robust results, with an accuracy of 93.75% on the UNSW-NB15 dataset, 98.92% on the CICIDS2017 dataset, 99.35% on the


NSL-KDD dataset, and 98.58% on the Kyoto 2006+ dataset. Key findings of SVM studies include the following:

1. Training time is longer for SVM.
2. SVM is most useful when dealing with nonlinear data.
3. A single SVM still has a significantly higher FAR.
4. Because of its promising performance in classification and prediction, SVM is becoming more popular.

5.3 Naive Bayes (NB)

NB is a classification algorithm based on Bayes' theorem. This classifier assumes that the probability of every feature belonging to a given class value is independent of the other features. A prediction is obtained by calculating the probability of the instance for each class and selecting the class value with the highest probability.

Several researchers used NB for detecting intrusions. For instance, Kevric et al. (2017) developed a combined classifier model using random tree and NBTree algorithms for NIDS. This algorithm was evaluated on the NSL-KDD dataset, and the accuracy achieved was 89.24%. It was also concluded that combining the two best individual classifiers may not result in the best overall performance. Çavuşoğlu (2019) proposed a hybrid layered IDS that employed various ML methods depending on the form of attack. The NSL-KDD dataset was used for training and testing. Transformation and normalization operations were performed on the dataset. For all attack types, the results revealed that the proposed method achieved high accuracy and low FPR. SVM and NB feature embedding was used by Gu and Lu (2021) to develop an efficient intrusion detection system, as discussed in the SVM section. Key findings of NB studies include the following:

1. The NB classifier performs best on real-time datasets.
2. On a variety of classification tasks, NB algorithms were found to be surprisingly accurate on small datasets.
3. The accuracy of NB does not scale up as well as that of decision trees on some larger databases.

5.4 k-nearest neighbour (kNN)

kNN is used for both classification and regression problems, but it is most appropriate for classification problems. It is a lazy learner and simply stores all the training data. It uses this data to find the similarities between the available data and new data. Based on the Euclidean distance, the test data is assigned to the class of its k nearest neighbours. This method is computationally expensive.

Several researchers used kNN for detecting intrusions. For instance, Guo et al. (2016) developed a hybrid method to achieve a high DR with a low FPR. The system was based on a two-tier hybrid approach that includes two anomaly detection components and a misuse detection component. In stage 1, a low-complexity anomaly detection method was built and used to construct the detection component. To construct the two detection components for stage 2, the k-nearest neighbours algorithm was used. The stage 1 detection component was involved in building the two stage 2 detection components, which reduce the number of false positives and false negatives produced by the stage 1 detection component. The experimental results showed that this approach could effectively detect network anomalies with a low FPR on the KDD’99 dataset and the Kyoto University Benchmark dataset.

Saleh et al. (2019) proposed a hybrid IDS (HIDS) to handle the multi-class classification problem. It was based on a triple-edged strategy with three main contributions: (i) NBFS, employed for dimensionality reduction, (ii) OSVM, applied for outlier rejection, and (iii) PKNN, used for detecting input attacks. The KDD Cup ’99, Kyoto 2006+ and NSL-KDD datasets were used to compare the HIDS against recent techniques. It was capable of detecting attacks rapidly and can be employed for real-time intrusion detection.

5.5 Logistic regression (LR)

LR estimates discrete values (0 or 1) from a set of independent variables. It predicts whether an event occurs by fitting the data to the logistic function. A threshold of 0.5 is used: values greater than 0.5 are treated as 1, and values lower than 0.5 are treated as 0.

Several researchers used LR for detecting intrusions. For instance, Palmieri (2019) introduced a novel network anomaly detection approach focused on nonlinear invariant properties of Internet traffic. The overall findings showed that the method effectively isolates a wide range of volumetric DoS attacks in the context of complex traffic flows with high accuracy and precision.

Key findings of LR studies include the following:

1. LR is known for its high performance, low computational burden, and good interpretability.
2. It also produces well-calibrated prediction probabilities without requiring any scaling or tuning of its input features.
3. LR outperforms other probabilistic classifiers by being more tolerant of feature correlation, allowing it to make better predictions even when multiple correlated features are present.

5.6 Decision tree (DT)

DT is used for both regression and classification problems, but it is mainly used for classification problems. A regression tree is one with continuous output values, whereas a classification tree is one with a range of symbolic labels. A DT classifies a sample through a sequence of decisions represented in a tree structure, in which the current decision helps to make the subsequent decision. Classification and Regression Tree (CART) is a popular program for constructing decision trees.

Several researchers used DT for detecting intrusions. For instance, Kim et al. (2014) proposed a hybrid intrusion detection method based on misuse and anomaly detection. The experiment was conducted on the NSL-KDD dataset. The proposed method performed better in terms of DR, FPR and time complexity. However, its ability to reduce time was not as good as it could be, so future research will concentrate on improving the C4.5 decision tree algorithm. Similarly, Mousavi et al. (2019) proposed an IDS based on ant colony optimization and an ensemble of decision trees. In this method, 16 essential features were selected for representing different network visits using a gradual feature removal method. An accuracy of 99.92% was obtained using the proposed method.

5.7 Random forest (RF)

RF, as the name suggests, constructs a forest with several decision trees. It is created by combining several decision trees and predicts by averaging the predictions of each component tree. It is generally much more accurate than a single decision tree. In general, the more trees in a forest, the more robust it appears.

Several researchers used RF for detecting intrusions. For instance, Farnaaz and Jabbar (2016) proposed a model based on an RF classifier for intrusion detection. RF was used as an ensemble classifier and outperformed other conventional classifiers in terms of successful attack classification. The results showed that the proposed model was efficient, with low FAR and high DR. Belavagi and Muniyal (2016) proposed a model for intrusion detection using ML classifiers on the NSL-KDD dataset. The results showed that the RF classifier outperformed the other classifiers, and the accuracy obtained was 99%. Hasan et al. (2019) discussed the accuracy of several ML models for predicting attacks and anomalies on IoT systems. The accuracy obtained for the DT, RF and ANN classifiers was 99.4%, but in terms of other performance metrics, the RF classifier outperformed the other classifiers. Saranya et al. (2020a) carried out a comparative study of ML algorithms used in IDS on the KDD Cup dataset. The accuracy obtained was 99.65%, 98.1% and 98% for the RF, LDA, and CART algorithms, respectively. It was observed from the results that RF outperformed the other classifiers in terms of accuracy, and it was concluded that the classifiers’ performance also depended on the application and the size of the dataset. Key findings of RF studies include the following:

1. While growing the trees, RF adds more randomness to the model. When splitting a node, it looks for the best feature among a random subset of features rather than the most important feature overall. As a consequence, there is a lot of diversity, which generally leads to a better model.
2. Random forest’s versatility is one of its most appealing features. It can be used for both regression and classification tasks, and the relative importance it assigns to the input features can be easily viewed.
3. Overfitting is one of the most common problems in ML, but an RF classifier will not overfit the model if there are enough trees in the forest.

5.8 K-means clustering method

K-means is one of the unsupervised ML algorithms. As with any unsupervised algorithm, there is no labelled data in this method. The algorithm works by finding groups in the data. It groups objects into clusters based on their similarity to each other and their difference from objects in other clusters. The K-means algorithm is widely used for pattern matching in time series data. A disadvantage of the K-means algorithm is that it is not suited to non-spherical clusters.

Several researchers used the K-means method for detecting intrusions. For instance, Mohamad Tahir et al. (2015) proposed a hybrid ML method for NIDS centred on a combination of K-means clustering and SVM classification. The NSL-KDD dataset was used for evaluation, and the results obtained were a good DR and a reduced FAR. In another work, Al-Yaseen et al. (2017) suggested a modified K-means method for reducing the size of the training dataset and balancing the data for training SVMs and Extreme Learning Machines (ELMs). The experimental results obtained were 95.75% accuracy with a FAR of 1.87%.

5.9 Fuzzy systems

In 1965, Zadeh introduced fuzzy set theory to deal with problems such as incomplete information. It is an essential tool for security analysis and for scientific applications. Fuzzy logic was introduced for intrusion detection mainly because many quantitative features are involved and security itself is fuzzy (Luo 1999). Fuzzy set theory assigns membership values ranging from 0 to 1 (Tsoukalas and Uhrig 1997). In fuzzy logic, an object can belong to different classes simultaneously, which is beneficial when the difference between classes is not well defined. Because of this, fuzzy theory can be applied in intrusion detection when the differences between the normal and abnormal classes are not well defined (Gomez and Dasgupta 2002). Fuzzy sets help in recognizing dangerous events and reducing the level of false alarms during intrusion detection.

Several researchers used fuzzy logic in detecting intrusions. For instance, Porras et al. (2002) proposed the EMERALD Mission Impact Intrusion Report Correlation System, or M-Correlator, for alert prioritization and aggregation. It is an alert ranking technique. These methods work better for misuse-based IDSs than for anomaly-based IDSs. Qin and Lee (2003) discussed an alert score to describe the severity of an attack and its applicability. Yu et al. (2004) presented multiple IDSs to detect real-time network intrusions. Alsubhi et al. (2012) discussed a novel IDS alert management system known as FuzMet. It extends the works of Porras et al. (2002) and Yu et al. (2004) and is used for both misuse- and anomaly-based IDSs. Kudłacik et al. (2016) presented a fuzzy-based intrusion detection method. It uses two profiles of the user’s activity, i.e. a local profile and a fuzzy profile. This method has low computational complexity, and because of this, the monitoring server can process a large number of incoming local profiles in real time.

Key findings of the review of fuzzy logic-based methods are that fuzzy logic builds flexible patterns for detecting intrusions. Fuzzy theory can differentiate between the abnormal and normal classes in intrusion detection. It enhances the readability and understandability of some ML algorithms. Various researchers used fuzzy logic or fuzzy sets to recognize dangerous events and reduce false alarm rates.

5.10 Evolutionary computation

Evolutionary computation (EC) is a problem-solving technique of computational intelligence motivated by natural and biological evolution. Traditional systems are unable to solve complex problems, so researchers have been using evolutionary computational methods to solve such problems. EC is an approach in which a computer evolves its own solutions to problems, rather than a programmer writing the program manually through complicated steps. As a result, a computer program could be ready in a matter of minutes. It enables computers to solve complex real-world problems that are difficult for a human being to tackle. Researchers have used EC for automatic model design, optimization, and even learning for classification in intrusion detection. In this section, some critical issues such as the working of EC, EC methods, and the algorithms used in EC are discussed. After the initialization of candidate solutions, new solutions are created by applying mutation and crossover operators. The resulting solutions are evaluated based on their fitness, and after this, selection is applied to find solutions for the next generation. A flow chart of EC is depicted in Fig. 8.

Fig. 8 Flow chart of evolutionary algorithm

Genetic algorithms (GA), genetic programming (GP), grammatical evolution (GE), evolutionary algorithms (EA), evolutionary programming, evolution strategy, learning classifier systems, etc., are examples of EC methods. These methods differ in how they represent individuals: for example, GP uses trees, while GE uses a Backus-Naur Form (BNF) grammar. GA is implemented as chromosome-like data structures and uses parameters, operators and processes such as selection, crossover, mutation and a fitness function to arrive at a particular solution. Several researchers used EC methods for detecting intrusions. For instance, Li (2004) applied GA to identify anomalous network behaviours. Crosbie et al. (1995) applied multiple agent technology and GP to detect network anomalies. The proposed methodology has an advantage when many small autonomous agents are used. The training process can be time-consuming if the agents are not correctly initialized or if communication occurs among the agents. Abdullah et al. (2009) used GAs to obtain classification rules for intrusion detection. Ojugo et al. (2012) applied GAs to build rule-based intrusion detection. Maniyar and Musande (2016) revised the genetic algorithm to generate rules that detect or classify attacks using network audit data, with a fitness function used for the selection of rules. A GA-based IDS can be implemented in two steps, i.e. generating classification rules and using these rules for intrusion detection. For intrusion detection, GA has to go through a series of steps, which are discussed below:

1. Information about the network traffic is collected by the sniffer present in the IDS.
2. On this captured data, the IDS applies GA. The collected information is used to frame classification rules.
3. The set of rules from the previous phase is then applied to the incoming traffic by the IDS, resulting in population initialization. A new population with good qualities is generated as a result. After this, evaluation is performed on this population, and a new generation with better qualities is generated. Then genetic operators are applied to the newly created generation until the most suitable individual is found.

The implementation of GA is depicted in Fig. 9.

In the literature, GP is the most popular technique of EC. GP is an extension of GA and was introduced by Koza in 1992. It is a domain-independent method: to solve a problem, GP genetically breeds a population of computer programs. Le Goues et al. (2011) described and evaluated the Genetic Program Repair (GenProg) technique, which is based on existing test cases. It automatically generates repairs for real-world


bugs in legacy applications. GenProg can efficiently repair programs containing multiple errors drawn from multiple domains.

Fig. 9 Genetic algorithm’s working

Jebur and Nasereddin (2015) introduced a fuzzy-genetic IDS combined with feature selection. It allows the system to select an optimal subset of attributes from within a huge volume of network information. To reduce the training time, the authors use 15 features to describe the rules. Fuzzy logic is used to generate rules. The soft computing approach generates more efficient rules than hard computing. Further, a GA is applied to generate essential rules by tuning. The feature selection strategies perform poorly in the case of unbalanced data.

To solve this problem, Viegas et al. (2018) proposed a new feature selection technique based on genetic programming that works well with balanced and unbalanced data. It is capable of selecting a set of discriminative features. Biological and textual datasets are used for evaluation. The solution proposed by the authors improves the efficiency of the learning process and also reduces the size of the data space. Besides GA and GP, grammatical evolution (GE) is a technique based on a biological process. With GE, complete programs can be generated in an arbitrary language by evolving programs written in a BNF grammar. The evolution process can be performed on variable-length binary strings instead of actual programs. This transformation provides a mapping, which simplifies the application of search. Şen and Clark (2009) applied the GE technique to route disruption and DoS attacks on MANETs. Intrusion detection programs are developed for each attack and distributed to every node on the network. The GE technique shows good performance in evolving efficient detectors for known attacks. Nyathi and Pillay (2018) compared GA with GE for automating the design of GP classification algorithms. This approach is trained and tested using real-world binary and multi-class data. The results show that GE is suitable for binary classification, while GA is suitable for multi-class classification. Evolutionary algorithms (EAs) are population-based black-box search and optimization methods that do not require assumptions such as continuity or differentiability. They are well suited to multi-objective optimization problems (MOPs) (Yang et al. 2013). To summarize the above, our findings are that, owing to its simple structures, EC has excelled at representing possible solutions to a large variety of problems. It plays a vital role in classifier learning, optimization and automatic model design. These algorithms are easily transferable from one application to another. The important application areas of evolutionary algorithms are numerical and combinatorial optimization. Black-box optimization is the most challenging. The following features make EAs attractive:

1. They make no explicit assumptions about the problem. Because of this, they are widely applicable and can be transferred at low cost.
2. They are flexible and can easily be used in collaboration with existing methods.
3. They are robust due to randomized choices.
4. They are also less sensitive to noise.
5. The algorithm terminates with several solutions rather than focusing on a single one.

5.11 Swarm intelligence

Beni and Wang (1993) first coined the term “Swarm Intelligence” (SI) for cellular robotic systems, and it was later applied to problem-solving in AI. SI provides a distributed solution to complex problems through interactions between agents and their environment. Self-organization and division of labour are the two necessary properties of SI. Self-organization is the capability of a system to evolve its agents without any external help. Division of labour refers to the parallel execution of feasible and straightforward tasks, which enables the swarm to solve complex problems. The two popular swarm-inspired methods are ACO and PSO. ACO simulates the behaviour of ants and is suitable for discrete optimization problems, whereas PSO simulates the behaviour of flocks of birds and is used to solve nonlinear optimization problems. ACO is inspired by the foraging behaviour of ants. Indirect communication between the ants by means of chemical pheromone trails enables them to find short paths between their nest and food sources. ACO algorithms are used to solve computational and discrete optimization problems. Researchers have applied ACO algorithms to solve complex problems such as the travelling salesman problem, vehicle routing and telecommunication networks.

Several researchers applied SI for detecting intrusions. For instance, Tabakhi et al. (2014) described an unsupervised feature selection method based on ant colony optimization (UFSACO). This technique is used to find the optimal feature subset over several iterations without using any learning algorithms. Redundancy is minimized by computing feature relevance based on the similarity between features. Hence, it is classified as a filter-based multivariate method. It exhibits low computational complexity. The results indicate that the method outperforms the unsupervised methods and is comparable with the supervised methods. Aghdam and Kabiri (2016) applied ACO to the intrusion detection problem using dimensionality reduction. Owing to its strong search capability, it could efficiently find a minimal feature subset. This technique was evaluated on the KDD Cup 99 and NSL-KDD benchmark data sets for intrusion detection and obtained higher accuracy with a lower false alarm rate. Hajimirzaei and Navimipour (2019) proposed a hybrid approach for intrusion detection. The NSL-KDD dataset and the CloudSim simulator are used, and root mean square error (RMSE), mean absolute error (MAE), and the kappa statistic are chosen as evaluation criteria. This hybrid approach gives better results than earlier methods. Kennedy, Eberhart and Shi introduced PSO as an optimization technique that guides particles to seek globally optimal solutions. Various researchers in intrusion detection use this technique because of several advantages, such as ease of implementation, simplicity, robustness, scalability, fast convergence to an optimal solution, and flexibility. To improve the accuracy of attack detection, Bamakan et al. (2015) presented a new method based on multiple criteria linear programming (MCLP) and PSO. MCLP is a classification method that is capable of solving real-life data mining problems. It is based on mathematical programming. To improve the performance of the MCLP classifier, PSO, a robust and straightforward technique, was used. The KDD CUP 99 benchmark datasets are used to evaluate the performance. The PSO-MCLP model shows a high accuracy of 99.13% and a low FAR of 1.947%.

Similarly, Bamakan et al. (2016) proposed a time-varying chaos particle swarm optimization method (TVCPSO). This technique is based on two conventional classifiers, i.e. MCLP and SVM, to detect intrusion using

5.12 Challenges of ML methods and its remedies

It can be concluded from the literature mentioned above that ML methods have been widely used to detect various types of attacks. They help the network administrator take countermeasures against attacks. However, most conventional ML methods belong to shallow learning (SL) and often focus on feature engineering and selection. The learning capacity of traditional detection approaches is limited, and learning efficiency decreases further as the network structure becomes more complicated. They only represent partial information, i.e. one or two levels of information, and cannot effectively solve real network application problems. The multi-classification task will lead to decreased accuracy due to the dynamic growth of data sets.

Shallow learning methods require a vast quantity of training data for operation, which becomes a challenge in a heterogeneous environment. Besides, shallow learning is expensive and labour intensive and is not suited to forecasting high-dimensional learning requirements with massive data. When dealing with a large number of multi-type variables, logistic regression easily underfits, and its accuracy is low; decision trees are prone to overfitting and neglect the problems caused by inter-data correlation; SVM is inefficient when dealing with large samples, and it can be challenging to find a suitable kernel function that can deal with missing data.

To address these limitations, DL methods, an advanced subset of ML, are receiving interest across multiple domains. DL has attracted researchers due to its several advantages over ML methods like automatic feature learning, flexible adap-
NSL-KDD dataset. This method shows high accuracy in tation to novel problems which make it possible to work
detecting intrusions with a more discriminative feature sub- upon big data etc. Its superior layer feature learning ability
set. can show improved or at least the same ML methods perfor-
Ali et al. (2018) proposed a PSO-FLN for intrusion detection mance, as shown in Table 8.
problem. KDD99 benchmark dataset was used for validation.
This model was compared with algorithms, i.e. ELM, and
FLN classifier. This technique provides high testing accuracy, 6 Deep learning methods for IDSs
which can be further increased by increasing the number of
hidden neurons in the ANN. DL methods come into existence in 2006 and have become
After reviewing ACO and PSO methods, it can be con- a prominent research topic. The word deep stands for many
cluded that discrete and nonlinear optimization problems can hidden layers in the neural network. It is a subcategory of
be easily solved with SI methods. Researchers used these SI ANN and has a more number of hidden layers than tradi-
methods for the generation of classification rules or to dis- tional neural networks, which goes up to 150. Although it is
cover clusters for Anomaly Detection. Some researchers used a branch of ML, complexity in the structure and learning data
hybrid approaches for the enhancement of intrusion detec- representations makes it a broader version of ML. DL deals
tion. These approaches showed better results than traditional with algorithms that learn from examples the same as in ML.
or single approaches. Due to self-organization and division of The performance of ML and DL algorithm varies as the scale
labour like properties, challenging problems can be decom- of the data increases. To find the network patterns, DL algo-
posed into smaller ones and handed over to an agent to work rithms require massive data, whereas ML algorithms require
in parallel. So by adopting SI methods, real-life problems lesser data. The structure can be made deep by adding one or
can be easily solved. The comparative study of ML methods more hidden layers in ANNs, and since the data is processed
is shown in Table 7. at each layer, thus, making the learning task deeper.
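To make the notion of depth concrete, the following minimal Python sketch builds a feed-forward network whose depth is determined only by how many hidden-layer sizes are listed. It is an illustration under stated assumptions rather than a model from any of the surveyed works: the helper names `init_network` and `forward`, the tanh activation, and the layer sizes are all hypothetical choices.

```python
import math
import random


def init_network(sizes, seed=0):
    """Create one (weights, biases) pair per layer for the given layer sizes."""
    rng = random.Random(seed)
    layers = []
    for n_in, n_out in zip(sizes[:-1], sizes[1:]):
        weights = [[rng.uniform(-0.5, 0.5) for _ in range(n_in)]
                   for _ in range(n_out)]
        biases = [0.0] * n_out
        layers.append((weights, biases))
    return layers


def forward(layers, x):
    """Pass x through every layer; each layer re-represents the previous output."""
    activations = [x]
    for weights, biases in layers:
        x = [math.tanh(sum(w * v for w, v in zip(row, x)) + b)
             for row, b in zip(weights, biases)]
        activations.append(x)
    return activations


# Hypothetical sizes: 8 input features and 2 output scores.
shallow = init_network([8, 4, 2])            # one hidden layer
deep = init_network([8, 16, 16, 16, 4, 2])   # four hidden layers

sample = [0.1 * i for i in range(8)]
shallow_acts = forward(shallow, sample)
deep_acts = forward(deep, sample)
print(len(shallow) - 1, "hidden layer(s) ->", len(shallow_acts[-1]), "outputs")
print(len(deep) - 1, "hidden layer(s) ->", len(deep_acts[-1]), "outputs")
```

Both networks share the same forward routine; the deeper one simply applies more successive transformations to the input, which is the sense in which adding hidden layers makes the learning task deeper.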


Table 7 Comparative study of ML methods for IDS (2016-2020)

Study | Dataset | Method used | Advantage(s)/result(s) | Limitation(s)/future scope
Mehmood and Rais (2016) | KDD-99 | SVM, J.48, NB, DT | J48 outperforms other algorithms in terms of accuracy and misclassification rate | Feature selection methods can be used in future
Belavagi and Muniyal (2016) | NSL-KDD | SVM, LR, NB, RF | RF outperforms other algorithms in terms of highest TPR and lowest FPR | Multiclass classification is required in future
Aburomman and Reaz (2016) | KDD-99 | PCA, LDA | Overall accuracy = 0.92162, FP = 0.0196, FN = 0.10849 | –
Ashfaq et al. (2017) | NSL-KDD | Semi supervised learning (SSL) approach based on fuzziness | – | Limited only for binary classification tasks
Al-Yaseen et al. (2017) | KDD-99 | SVM, EVM | 95.75% accuracy, shorter training time | Efficient classifiers are required for novel attacks
Othman et al. (2018) | KDD-99 | Chi-Square, SVM with SGD | High performance, low FPR | Can be extended to multi class model
Gautam and Doegar (2018) | KDD-99 | Naive Bayes, Ensemble methods, Adaptive Boost and PART | Accuracy = 99.97, Recall = 99.98, Precision = 99.99 | Limit to 2 class attack
Hasan et al. (2019) | Kaggle | ANN, SVM, LR, DT, RF | DT, RF and ANN showed accuracy of 99.4% but in terms of other performance metrics RF outperforms other classifiers | More focus is needed on real time data
Saranya et al. (2020b) | KDD-99 | LDA, CART, RF | RF outperforms other classifiers in terms of 99.65% accuracy | Realtime dataset can be used in future

Table 8 Shallow ML versus DL

Sr. no. | Approach | Steps in learning | Result
1 | Shallow ML | Input → Hand-designed features → Mapping from features | Output
2 | DL | Input → Simple features → Complex features → Mapping from features | Output

The DL models are applied in the research of computer vision, audio recognition, natural language processing, speech recognition, face recognition, image recognition, information retrieval, failure prediction, handwriting recognition, feature learning, social network filtering, machine translation, dimensionality reduction, intrusion detection and so on. Table 17 shows the architecture and application areas of DL methods.

DL methods are categorized into supervised learning and unsupervised learning. Convolutional Neural Network (CNN) and Recurrent Neural Network (RNN) come under the category of supervised learning, and Auto-Encoder (AE) and Deep Belief Network (DBN) come under the category of unsupervised learning. There exist many other DL models as variants of these basic models (Fig. 10).

Fig. 10 Deep learning models

6.1 Supervised DL models

In supervised learning, the machine is trained with labelled data. After that, the machine is provided with a new set of examples so that it can analyze the training data and produce a correct result from the labelled data. One popular network under supervised learning is CNN.

6.1.1 Convolutional neural network (CNN)

CNNs are a particular form of feed-forward ANN that works under supervised learning. These networks are made up of neurons with learnable biases and weights. These models process data that comes in multiple arrays and eliminate the need for manual feature extraction. They work by extracting relevant features directly from images without retaining them. Feature identification is the main application of ConvNets. The automated feature extraction of ConvNets makes them highly accurate for computer vision tasks (Tirumala 2014). The architecture of CNN is shown in Fig. 11. A limitation of ConvNets is their limited ability to process natural data in its raw form.

Fig. 11 Architecture of CNN

Several researchers used the CNN method for detecting intrusions. For instance, Shen et al. (2018) proposed a new compressed CNN model for image classification called CS-CNN, which incorporates compressive sensing theory at the input layer of CNN models to both minimize resource consumption and improve accuracy. The MNIST and CIFAR-10 datasets were used for the evaluation. This method improved the training speed and classification accuracy. Praanna et al. (2020) proposed a method that combines the CNN algorithm and the LSTM algorithm. The proposed method was evaluated with KDD99. According to the experiments’ results, the proposed model outperformed SVM, CNN and DBN with 99.78% accuracy. Nguyen and Kim (2020) proposed a novel algorithm for a NIDS based on genetic algorithm (GA)-based exhaustive search and fuzzy C-means. The most successful CNN structure, called the deep feature extractor, was chosen using a GA-based optimization process. It was concluded from the results that deploying the proposed algorithm on real-world internet networks would boost computer network security by clustering criminal activities. The proposed algorithm performed better for multiple classifications. The time needed to implement a GA-based exhaustive search method to select a specific feature subset and an appropriate CNN structure was a limitation of this study.

CNN can accommodate image translation, rotation, size difference, and other types of deformations while providing accurate classification results. In a nutshell, it has good generalization potential when interacting with noisy inputs. The following factors contribute to the performance of CNN models in classification tasks:

1. The availability of comprehensive ground truth training sets with labels, e.g., ImageNet.
2. Implementations of high-speed GPU clusters for training a vast number of parameters.
3. Carefully planned regularization techniques such as dropout, which increase generalization capacity.

6.1.2 Recurrent neural network (RNN)

RNN is an extension of a conventional feed-forward network. It is a sequential learning model and is appropriate for sequential tasks like speech and language. It learns features from the memory of previous inputs and has cyclic connections, making it robust for modelling sequences. RNNs are very good at predicting the next character in a text or the next word in a sequence, but they can also be used for more complex tasks. From a theoretical point of view, the RNN can capture arbitrary-length dependencies, but it is difficult to handle and hard to train: gradients tend to explode or vanish when it is trained with the backpropagation through time (BPTT) algorithm. LSTM models were introduced to mitigate this gradient problem. RNN obtains the best performance in many applications such as speech recognition, natural language processing and machine translation.

Several researchers used the RNN method for detecting intrusions. For instance, Yin et al. (2017) applied a DL-based RNN approach on the NSL-KDD dataset to find various attacks in the network. The results were then compared with traditional classification methods such as SVM and ANN proposed by previous researchers. It was found that RNN-IDS was very appropriate for building a classification model with high accuracy, and its performance was superior to ML classification methods in both binary and multi-class classification. However, reducing the training time through GPU acceleration needs to be addressed in future.

In another research work, Liu et al. (2019) suggested a payload classification approach based on PL-CNN and PL-RNN for use in attack detection. The proposed methods support end-to-end detection by learning feature representations from original payloads without requiring feature engineering. When applied to the DARPA1998 dataset, the PL-CNN and PL-RNN techniques achieved accuracies of 99.36% and 99.98%, respectively. PL-RNN outperformed PL-CNN on a variety of datasets. There were two issues with these models. First, unlike conventional ML models, these models had more parameters. Consequently, changing model parameters was complex, i.e., model training was difficult and required specific skills. Second, these methods were not well-interpreted.

6.2 Unsupervised DL models

In this learning, no teacher is available for guidance or training. Here the machine is trained using information that is neither labelled nor classified. The unsorted information is grouped by the machine according to patterns, similarities and differences without any previous data training. Therefore, the machine can find the hidden structure in unlabeled data by self-learning. AE and DBN come under the category of unsupervised learning.

6.2.1 Auto encoder (AE)

In AE, the input is copied to the output. Dimensionality reduction or feature learning is achieved by transforming high-dimensional data into a lower-dimensional code. The data are recovered from the code by a decoder network. Initially, random weights are assigned to both the encoder and decoder networks. The AE is trained by observing the difference between the input and the output obtained from encoding and decoding. The error is then fed back to the decoder and encoder networks, respectively (Tirumala 2014). A significant change to this model was made by Bengio et al. (2009), who changed the unsupervised training to supervised to identify the significance of the training paradigm. The stacked auto-encoders (SAE) with unsupervised training are more efficient than the SAE with supervised pre-training. The performance of SAE-based deep architecture is slightly lower than that of RBM-based architecture because SAE is unable to ignore random noise in its training data. The architecture of SAE is shown in Fig. 12.

Fig. 12 Architecture of SAE

Several researchers used the AE method for detecting intrusions. For instance, Javaid et al. (2016) described a DL-based approach for developing a flexible and efficient NIDS on the NSL-KDD benchmark dataset. In this paper, an STL scheme based on unsupervised learning was applied to the training data using a sparse auto-encoder. The trained features were used on a labelled test dataset for classification into normal and attack classes. N-fold cross-validation was used for performance evaluation, and the result obtained was reasonable. Accuracy, precision, recall, and F-measure metrics were used for performance evaluation. The results were also compared with soft-max regression (SMR) applied directly to the dataset without feature learning. After evaluation, it was found that the performance of STL was better than that of the previous work.
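The encode, decode and error-feedback loop described for the AE above can be sketched in a few lines of plain Python. This is a minimal sketch under simplifying assumptions that do not come from the surveyed papers: linear units without biases, a squared reconstruction error, per-sample gradient updates, and synthetic four-dimensional data that lies on a two-dimensional subspace. The names `matvec` and `train_autoencoder` are hypothetical.

```python
import random


def matvec(matrix, vector):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(a * b for a, b in zip(row, vector)) for row in matrix]


def train_autoencoder(data, n_hidden=2, lr=0.01, epochs=300, seed=0):
    """Train a linear autoencoder; return encoder, decoder and loss per epoch."""
    rng = random.Random(seed)
    n_in = len(data[0])
    # Random initial weights for both the encoder and the decoder.
    enc = [[rng.uniform(-0.5, 0.5) for _ in range(n_in)] for _ in range(n_hidden)]
    dec = [[rng.uniform(-0.5, 0.5) for _ in range(n_hidden)] for _ in range(n_in)]
    history = []
    for _ in range(epochs):
        total = 0.0
        for x in data:
            code = matvec(enc, x)        # high-dimensional input -> low-dimensional code
            recon = matvec(dec, code)    # code -> reconstructed input
            err = [r - xi for r, xi in zip(recon, x)]
            total += 0.5 * sum(e * e for e in err)
            # Propagate the error back through the decoder to the code layer.
            back = [sum(dec[i][j] * err[i] for i in range(n_in))
                    for j in range(n_hidden)]
            # Update the decoder first, then the encoder.
            for i in range(n_in):
                for j in range(n_hidden):
                    dec[i][j] -= lr * err[i] * code[j]
            for j in range(n_hidden):
                for k in range(n_in):
                    enc[j][k] -= lr * back[j] * x[k]
        history.append(total / len(data))
    return enc, dec, history


# Synthetic data (an assumption): 4 features generated from 2 hidden factors.
rng = random.Random(1)
data = []
for _ in range(20):
    a, b = rng.uniform(-1, 1), rng.uniform(-1, 1)
    data.append([a, b, a + b, a - b])

enc, dec, history = train_autoencoder(data)
print(f"mean reconstruction error: {history[0]:.4f} -> {history[-1]:.4f}")
```

After training, `matvec(enc, x)` yields the two-dimensional code for a record `x`. Codes of this kind are what a downstream classifier would consume in a self-taught learning setup such as the one reported by Javaid et al. (2016), although their sparse auto-encoder differs from this linear sketch.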
Autoencoders were also applied for anomaly detection (Sakurada and Yairi 2014), in which nonlinear feature reduction by autoencoders was used to train a normal network profile. In their study, the authors analyzed the learned features in the hidden layer of the AE. They found that the AE learned the normal state properly and activated differently with anomalous input.

Much evidence shows that SAE can perform classification tasks better and learn multiple levels of higher-quality representation than their shallow counterparts. However, SAE has difficulty in effectively performing feature learning on “Big data” containing a large amount of heterogeneous data, because vectors are used to represent the input data and learned features of every hidden layer. A vector cannot model the highly nonlinear distribution of the input data. To solve this problem, multi-modal DL models have been proposed by Ngiam et al. (2011) and Srivastava and Salakhutdinov (2012). First, feature learning from each modality is performed using conventional DL models, and then the learned features are integrated at different levels as shared representations of multi-modal data. Multi-modal DL models help capture the high-order correlations across multiple modalities to form the hierarchical representations of multi-modal data. However, they cannot model the nonlinear distribution of the heterogeneous input data since they learn features from different modal data independently, leading to failure in learning useful features on big data.

Sakurada and Yairi (2014) proposed a tensor DL model for heterogeneous data. The stacking of multiple tensor auto-encoder models was used to build the data computation model. This model achieved higher classification accuracy for heterogeneous data than multi-modal DL models.

6.2.2 Deep belief network (DBN)

The DBN model was designed by Hinton et al. in 2006. It is based on the MLP model with greedy layer-wise training and can learn feature representations from both labelled and unlabeled data. It comprises many interconnected hidden layers in which each layer acts as an input to the next layer and is visible only to the next layer. Each layer in a DBN has no lateral connections between its own nodes. It first takes advantage of an efficient layer-by-layer greedy learning strategy to initialize the deep network and then fine-tunes all the weights jointly with the desired outputs. It optimizes its weights with time complexity linear in the depth and size of the network. In this model, unsupervised pre-training and supervised fine-tuning strategies are used. Developing a DBN model is computationally expensive. The DBN architecture proposed by Hinton et al. is depicted in Fig. 13. Several researchers used the DBN method for detecting intrusions. For instance, Kang and Kang (2016) used DBN for intrusion detection in an in-vehicular network. DBN-based unsupervised pre-training models could improve intrusion detection accuracy, as demonstrated by various researchers. The limitation of this model is that its centralized approach might limit its practicality in fog networks.

Fig. 13 Architecture of DBN

A Boltzmann Machine (BM) is a form of log-linear Markov Random Field (MRF), in which the energy function is linear in its free parameters. Hidden nodes can be introduced to make it robust enough. The modelling capacity of the BM can be increased by introducing more hidden variables. RBM is the most popular version of BM.

For a regular RBM, the relationship between visible units and hidden units is limited to constants, which certainly downgrades the representation capability of the RBM. To avoid this limitation and enhance DL capability, the fuzzy restricted Boltzmann machine (FRBM) and its learning algorithm were proposed by Chen et al. (2015). Here, the parameters governing the model are replaced by fuzzy numbers. According to the results, the representation capacity of FRBM is better than that of the traditional RBM, and when the training data are contaminated by noise, FRBM shows better robustness than RBM. Training DNNs with many parameters creates an overfitting problem. To solve it, Srivastava et al. (2014) introduced the dropout Restricted Boltzmann Machine model, which performs better than the standard RBM. Dropout is a technique for dropping out units in a neural network. Dropping a unit out means temporarily removing it from the network.

A Denoising AutoEncoder (DAE) uses similar input and output data; its denoising power is produced by adding noise during the training procedure. DAEs are an essential tool for feature selection and extraction. Variational AutoEncoders (VAEs) are a deep learning tool that can be used to learn latent representations. They are appropriate for sensor failure detection, IoT device security applications, and intrusion systems’ security. VAEs perform visualization, recognition, representation, and denoising tasks.
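As a rough illustration of the dropout idea mentioned above, where a unit is temporarily removed from the network, the sketch below applies a random mask to a list of hidden activations. It assumes the commonly used "inverted" variant, in which the surviving activations are rescaled during training so that nothing needs to change at test time. This variant, the function name `dropout`, and the example values are assumptions made for illustration, not details taken from Srivastava et al. (2014).

```python
import random


def dropout(activations, drop_prob, rng, training=True):
    """Inverted dropout: during training, zero each unit with probability
    drop_prob and rescale the survivors by 1 / (1 - drop_prob)."""
    if not training or drop_prob == 0.0:
        return list(activations)          # at test time every unit is kept
    keep_prob = 1.0 - drop_prob
    return [a / keep_prob if rng.random() < keep_prob else 0.0
            for a in activations]


rng = random.Random(42)
hidden = [0.5, 1.2, -0.3, 0.8, 0.1, -0.9]

dropped = dropout(hidden, drop_prob=0.5, rng=rng)                  # training pass
restored = dropout(hidden, drop_prob=0.5, rng=rng, training=False)  # test pass
print("training pass:", dropped)
print("test pass:    ", restored)
```

Because a different random subset of units is silenced on every training pass, no single unit can rely on the presence of particular other units, which is the intuition behind dropout as a remedy for overfitting.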
Li et al. (2015) performed malicious code detection using AE for feature extraction and DBN as a classifier. The KDDCUP’99 benchmark dataset was used for the experiment. The results showed that the hybrid approach is more effective than a single DBN in terms of both time and detection accuracy. The advantage of these networks is that they are more beneficial than shallow ones in cyber-attack detection. Its main drawback is that the research requires a more recent dataset.

Table 9 summarizes DL methods for IDS (2015-2021).

6.3 Comparative analysis of experimental results for intrusion detection

A comparison of several ML and DL algorithms used for the IDS on benchmark datasets is presented in this section. The evaluation metrics used for the comparisons are accuracy, precision and recall. Table 10 shows the evaluation comparison of several ML classifiers using tenfold cross-validation on the KDD99 dataset. Similarly, Table 11 shows the evaluation comparison of several ML classifiers using tenfold cross-validation on the UNSW-NB15 dataset.

We can see from Table 10 that HT, KNN, DT, and RF performed well in terms of classifying normal and abnormal traffic and achieved accuracies of 99.22%, 99.83%, 99.86% and 99.94%, respectively. The same order of superiority is retained (refer to Table 11) by HT, KNN, DT, and RF using tenfold cross-validation, with accuracies of 93.53%, 93.71%, 95.54% and 96.07%, respectively, on the UNSW-NB15 dataset.

Tables 12 and 13 present the evaluation comparison of several ML classifiers using tenfold cross-validation on the supplied KDD99 and UNSW-NB15 datasets of the testing phase.

The empirical analysis of classifying spam traffic using the supplied data sets from Tables 12 and 13 demonstrates that RF outperformed KNN, SMO and DT by a small margin, while it outperformed the other state-of-the-art algorithms by a large margin. The accuracies obtained by SMO, KNN, DT and RF are 95.11%, 96.01%, 96.22% and 96.79%, respectively. The result analysis from Tables 9–12 shows that the RF classifier gives better performance in most cases because, as the number of trees grows, the RF adds more randomness to the model. When dividing a node, it looks for the best feature among a random subset of features rather than the most significant feature overall. As a result, there is a lot of variety, which leads to a better model.

Time complexity of all classifiers used for testing and training on the UNSW-NB15 and KDD99 datasets is given in Table 14. It is observed from the results that the RF classifier requires more time for training because it uses many decision trees to define the class.

The accuracy comparison for all data sets using ML state-of-the-art algorithms can be visualized in Fig. 14. From Fig. 14, it is verified that RF, DT, KNN and HT give high results in the tenfold cross-validation test mode of both datasets compared to the other classifiers, while RF, DT, KNN and SMO achieve high results in the supplied test mode of both datasets compared to the other classifiers.

Fig. 14 Bar chart comparison for state of the art algorithms in terms of accuracy

We carried out a comparison of several DL models used for the IDS based on cybersecurity. The training time and accuracy of DL supervised and unsupervised models with various hidden nodes and learning rates using the CSE-CIC-IDS2018 dataset are presented in Table 15. The presented results are directly taken from Ferrag et al. (2020). Similarly, the training time and accuracy of DL supervised and unsupervised models with various hidden nodes and learning rates using the Bot-IoT dataset are presented in Table 16.

Compared with both the deep neural network and the RNN, the CNN obtains a higher accuracy of 97.38% (refer to Table 15) with 100 hidden nodes and a learning rate of 0.5. Furthermore, Table 15 shows the accuracy and training time of the generative/unsupervised models on the CSE-CIC-IDS2018 dataset with various numbers of hidden nodes and learning rates. The deep autoencoder (DA) obtains a higher accuracy of 97.37% with 100 hidden nodes and a learning rate of 0.5, compared with the other three algorithms, i.e. DBM, DBN, and RBM.

Table 16 presents the accuracy and training time of the deep discriminative models on the Bot-IoT dataset with multiple hidden nodes and learning rates. The CNN obtains a higher accuracy of 98.37% with 100 hidden nodes and a learning rate of 0.5. CNN increases the system performance and accuracy due to its unique features like shared weights and

Table 9 Summarized review of DL methods for NIDS (2015-2021)

Study | Dataset | Method used | Advantage(s)/result(s) | Limitation(s)/future scope
Zhao et al. (2015) | Three real world datasets: Animal-10, NUS-WIDE-Object, MSRA-MM | Multi-modal deep neural networks feature selection with sparse group LASSO | Efficient in selecting the relevant features and attains competitive classification performance | Framework applied on single-label multi-class classification problem, it might be further extended to multi-label categorization or retrieval tasks
Eesa et al. (2015) | KDD-99 | Cuttlefish optimization algorithm | Higher detection rate, higher accuracy, lower false alarm rate | CFA can be used as a rule generator for future work
Srimuang and Intarasothonchun (2015) | KDD-99 | Weighted-ELM | Takes less time for working | R2L and Probing attacks had lower effectiveness
Li et al. (2015) | KDDCUP’99 | Hybrid (AE, DBN) | Improves malicious code detection accuracy, reduces time complexity | –
Zhang et al. (2015a) | STL-10 and NUS-WIDE | Deep computation model uses the BGV encryption scheme to encrypt the private data | Efficiently deal with deep computation model for big data feature learning | Additional overhead to perform the data encryption/decryption and communication between the client and the cloud, little lower performance accuracy. For future work focus should be given to incremental deep computation model
Zhang et al. (2015b) | STL-10, CUAVE, SANE and INEX datasets | Tensor DL model | The model is successful to perform feature learning for heterogeneous data due to its capability of learning abstract representations of multiple modal data | Takes more time to train the parameters than SAE and multimodal DL. The focus is given to improve the efficiency of deep computational model
Tang et al. (2016) | NSL-KDD | Deep Neural Network (DNN) in SDN environment | 75.75% accuracy rate, AOC = .86 | Implement this approach in real SDN environment with real traffic is still needed to improve the performance
Dong and Wang (2016) | KDD-99 | Deep coding/SVM-RBM, SVM, Decision Tree, Naïve Bayes C4.5 | SVM-RBM gets the better precision, gives accurate information on the anomalous behavior | –
Ma et al. (2016) | KDD-99, NSL-KDD | SC-DNN (spectral clustering, deep neural network) | Suitable in complex networks, improves detection accuracy in real security system, more capable of classifying sparse attacks cases | Cluster parameters are to be determined empirically and not through mathematical theory
Aminanto and Kim (2016a) | KDD-99 | ANN (used for feature selection), SAE (used for classifier) | Detection rate: 99.4% (IDS-T), 99.9% (IDS-All); training time: 1 min (IDS-T), 10 min (IDS-All) | Difficult to implement in wireless system
Aminanto and Kim (2016b) | AWID | ANN (used for feature selection), SAE (used for classifier) | Detection rate = 65.18%, FAR = 0.14%, Accuracy = 98.59%, Precision = 94.53%, F1 = 77.16% | Limited to impersonation attacks only
Table 9 continued

Study | Dataset | Method used | Advantage(s)/result(s) | Limitation(s)/future scope
Nskh et al. (2016) | KDD-CUP 1999 | SVM, PCA | RBF kernel exhibits better results with better DR and detection speed is faster in polynomial kernel based SVM | –
Javaid et al. (2016) | NSL-KDD | Self Taught Learning (STL) in association with sparse auto-encoder and soft-max | 2 class: Precision = 85.44%, Recall = 95.95%, F-Measure = 90.4%, Accuracy = 88.39%; 5 class: F-Measure = 75.76%, Accuracy = 79.10%; All class: Accuracy = 98% | Implementation of real time NIDS with more efficient and active feature learning is required
Seo et al. (2016) | KDD 99 | Restricted Boltzmann Machine | Accuracy = 99.4%, Precision = 99.8% | Parameters like batch size, learning rate and no. of iterations can be revised
Aminanto and Kim (2017) | AWID | SAE is used as a feature extraction and selection method | Detection rate = 92.18%, FAR = 4.40%, Accuracy = 94.81%, Precision = 86.15%, F1 = 89.06% | In future SAE will be used as outlier detection for detecting unknown attacks
Effendy et al. (2017) | NSL-KDD | k-means clustering, information gain | Provides high accuracy value | –
Wisesty et al. (2017) | KDD-CUP 1999 | Conjugate gradient algorithm | 93.2% accuracy in two class classification, 54.13% in case of multi class classification | In future, sampling method can be used to enhance the performance of classification system
Yin et al. (2017) | NSL-KDD | Proposed RNN-IDS and compared with ANN, J48, Random Forest, SVM | High accuracy, high DR, low FPR | In future attention will be given to reduce the training time using GPU acceleration
Hodo et al. (2017) | UNB-CIC | CFS-ANN based classifier, ANN, SVM | Detects nonTor traffic with an accuracy of 99.8%, DR 100% and FPR of 1.2% | In future, performance will be analyzed in classification of 8 different types of traffic in UNB-CIC Tor Network Traffic dataset
Zhao et al. (2017) | KDD Cup 99 | PCA model, k-NN, softmax regression | Softmax regression shown better time performance | Calculation of memory size is ignored
Manzoor et al. (2017) | KDD-99 | ANN | Increased DR, reduced FAR | Manually preprocessing work, number of features in reduced feature set can be made optimal
Aminanto et al. (2017) | Aegean Wi-Fi Intrusion Dataset (AWID) | D-FES (Deep Feature Extraction and Selection) used as a clustering | Detection accuracy = 99.918%, FAR = 0.012% | Extend D-FES to all attack classes
Liu et al. (2019) | DARPA-98 | PL-CNN, PL-RNN | Higher DR | Model training is difficult due to more parameters and interpretation of models is difficult
He et al. (2018) | KDD CUP 1999 | Kernel clustering algorithm | Higher DR, lower FAR, fit for most attack types | –
Table 9 continued

Study | Dataset | Method used | Advantage(s)/result(s) | Limitation(s)/future scope
Napiah et al. (2018) | Simulated dataset | CHA-IDS multi-agent system; SVM, J48, MLP, Naïve Bayes, Logistic, Random Forest; feature selection: BPS-CFS, GS-CFS | J48 algorithm shows 99% TPR, consumed low energy overhead and memory, high capability in detecting routing attacks | Unable to precisely identify the attacker
Ali et al. (2018) | KDD-99 | PSO-FLN model (Particle Swarm Optimization, Fast Learning Network) | Outperformed ELM and FNN classifier in the testing accuracy | R2L has obtained less accuracy due to limited samples
Muna et al. (2018) | NSL-KDD and UNSW-NB15 | Deep auto-encoder, deep feed forward neural network | 99% detection rate, 1.8% false alarm rate | Need to train the algorithm on real data collected from IoT systems
Kim et al. (2016) | CSIC-2010 HTTP | LSTM-RNN with Adam optimizer | Accuracy = 99.97%, Recall = .995%, Precision = .995% | To evaluate LSTM performance with different optimizers
Shone et al. (2018) | KDD-99, NSL-KDD | Nonsymmetric deep autoencoder (NDAE), RF classification algorithm, DL, stacked NDAEs | Accuracy = 89.22%, Precision = 92.97%, Recall = 89.22%, F-Score = 90.76%, False alarms = 10.78% | Not perfect to handle zero-day attacks
Caminero et al. (2019) | NSL-KDD, AWID | AE-RL (adversarial environment reinforcement learning) | Accuracy = 80.16%, Precision = 79.74%, Recall = 80.16%, F-Score = 79.40% | –
Yang et al. (2019) | NSL-KDD, UNSW-NB15 | Modified density peak clustering algorithm (MDPCA) and deep belief networks (DBNs) | Accuracy = 82.08%, DR = 70.51%, FPR = 2.62% | To synthesize R2L and U2R attacks for increasing the performance of the model
Feng et al. (2019) | KDD 99 | DNN, CNN and LSTM | Accuracy = 98.5%, Precision = 97.63%, Recall = 99.59%, F-Score = 98.6% | Limited to DoS, XSS and SQL attacks
Gamage and Samarabandu (2020) | KDD 99, NSL-KDD, CIC-IDS2017, CIC-IDS2018 | AE, DBN and LSTM | Empirical results with difference of 2.5 to 3% in comparison to reference papers | –
Zhang et al. (2020) | KDD 99 | AN-LSTM | Improved accuracy and detection performance | More processing time
Sohi et al. (2021) | Publicly available datasets | RNNIDS | Improvement in the detection rate up to 16.67% | To minimize the false positives in the future
9756 G. Kocher, G. Kumar
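Several rows of Table 9 list precision, recall and F-score side by side. As a small illustration of how these measures relate, the sketch below computes the F-score as the harmonic mean of precision and recall and applies it to the values listed for Feng et al. (2019). It assumes the reported F-score is the standard F1 measure; the function name `f_score` is hypothetical, and studies that average per-class scores need not satisfy this identity exactly.

```python
def f_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (F1).

    Inputs and output share one scale (here: percent).
    """
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Precision and recall (in percent) as listed for Feng et al. (2019) in Table 9.
feng_f1 = f_score(97.63, 99.59)
print(round(feng_f1, 2))
```

Under this assumption the computed value agrees with the F-score of 98.6% listed in the same row.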

Table 10 Comparative analysis using KDD99 dataset (Khan and Gumaei 2019)

Classifier SMO DT DS HT RF
Accuracy % 98.8289 99.8661 95.7757 99.2293 99.9437
Precision 0.988 0.999 0.960 0.992 0.999
Recall 0.988 0.999 0.958 0.992 0.999

Classifier KNN NB NB-KE SVM-POLY SVM-RBF
Accuracy % 99.8393 96.589 97.337 97.214 98.4367
Precision 0.998 0.966 0.974 0.973 0.984
Recall 0.998 0.966 0.973 0.972 0.984

Table 11 Evaluation comparison of ML classifiers using tenfold cross validation on the UNSW-NB15 dataset (Khan and Gumaei 2019)

Classifier SMO DT DS HT RF
Accuracy % 83.588 95.5413 92.0629 93.5349 96.0791
Precision 0.837 0.955 0.928 0.935 0.961
Recall 0.836 0.955 0.921 0.935 0.961

Classifier KNN NB NB-KE SVM-POLY SVM-RBF
Accuracy % 93.7134 75.749 79.9157 70.44 81.708
Precision 0.937 0.831 0.848 0.707 0.817
Recall 0.937 0.757 0.799 0.704 0.817

Table 12 Evaluation comparison of ML classifiers using tenfold cross validation on the supplied KDD99 dataset of the testing phase (Khan and Gumaei 2019)

Classifier SMO DT DS HT RF
Accuracy % 95.1125 96.218 93.9811 92.6586 96.7926
Precision 0.952 0.962 0.944 0.926 0.969
Recall 0.951 0.962 0.940 0.927 0.968

Classifier KNN NB NB-KE SVM-POLY SVM-RBF
Accuracy % 96.0065 94.6799 94.4287 94.0411 94.9474
Precision 0.962 0.947 0.948 0.943 0.951
Recall 0.960 0.947 0.944 0.940 0.949

Table 13 Evaluation comparison of ML classifiers using tenfold cross validation on the supplied UNSW-NB15 dataset of the testing phase (Khan and Gumaei 2019)

Classifier SMO DT DS HT RF
Accuracy % 85.3411 84.554 76.6324 59.4423 83.6333
Precision 0.863 0.864 0.835 0.763 0.869
Recall 0.853 0.846 0.766 0.594 0.836

Classifier KNN NB NB-KE SVM-POLY SVM-RBF
Accuracy % 84.4872 76.3907 76.2219 68.3379 83.2216
Precision 0.855 0.782 0.768 0.689 0.835
Recall 0.845 0.764 0.762 0.683 0.832
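
The results in Tables 11 to 13 were obtained with tenfold cross validation. As a rough illustration of how such a procedure partitions a dataset, the following sketch generates train/test index splits for k folds. The function name `k_fold_indices`, the fixed shuffling seed and the unstratified round-robin fold assignment are assumptions made for illustration only; they are not taken from Khan and Gumaei (2019).

```python
import random


def k_fold_indices(n_samples, k=10, seed=0):
    """Yield (train_idx, test_idx) pairs for k-fold cross validation.

    Every sample appears in exactly one test fold; the remaining
    k - 1 folds form the training set for that round.
    """
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)         # shuffle once, reproducibly
    folds = [indices[i::k] for i in range(k)]    # round-robin assignment
    for i in range(k):
        test_idx = folds[i]
        train_idx = [j for fold in folds[:i] + folds[i + 1:] for j in fold]
        yield train_idx, test_idx


# Toy usage: 25 samples split into 10 folds.
splits = list(k_fold_indices(n_samples=25, k=10))
```

In each round a classifier would be trained on `train_idx` and scored on `test_idx`, and the reported figure would be an aggregate over the ten rounds. For imbalanced intrusion datasets a stratified split, which preserves class proportions in each fold, is often preferred over the plain split sketched here.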

Table 14 Time complexity comparison (in seconds) for training phase on KDD99 and UNSW-NB15 dataset (Khan and Gumaei 2019)

Dataset KNN NB NB-KE SVM-Poly SVM-RBF SMO DT DS HT RF
KDD99 0.06 1.03 1.01 228.88 198.69 789.77 40.78 2.15 5.50 128.51
UNSW-NB15 0.18 1.84 3.06 793.48 748.36 531.11 76.13 3.95 8.27 542.97


Table 15 Training time and accuracy of DL supervised and unsupervised models with various hidden nodes and learning rate using the CSE-CIC-2018 dataset (Ferrag et al. 2020)

Parameters Metric CNN RNN DNN DA DBM DBN RBM
LR = 0.5 Time 331.2 334.7 390.2 341.3 351.5 344.7 390.1
HN = 100 ACC 97.38% 97.31% 97.28% 97.37% 97.37% 97.30% 97.28%
LR = 0.1 Time 332.5 336.9 391.1 331.7 330.1 334.8 390
HN = 100 ACC 97.31% 97.23% 97.19% 97.31% 97.30% 97.23% 97.19%
LR = 0.01 Time 338.9 341.5 395.2 337.11 339.1 340.4 394.1
HN = 100 ACC 97.22% 97.11% 97.10% 97.22% 97.21% 97.11% 97.10%
LR = 0.5 Time 182.6 190.6 177.7 181.4 181.4 190.5 177.6
HN = 60 ACC 96.99% 96.96% 96.95% 96.99% 96.99% 96.96% 96.95%
LR = 0.1 Time 189.1 192.2 179.3 189.1 189 192.1 179.1
HN = 60 ACC 96.98% 96.97% 96.92% 96.97% 96.97% 96.97% 96.92%
LR = 0.01 Time 192.2 197.5 180.2 191.4 191.1 196.5 180.1
HN = 60 ACC 96.92% 96.90% 96.70% 96.91% 96.91% 96.88% 96.69%
LR = 0.5 Time 87.9 90.3 86.1 87.1 87.9 90.3 86.1
HN = 30 ACC 96.93% 96.89% 96.66% 96.92% 96.93% 96.89% 96.66%
LR = 0.1 Time 88.5 90.9 87.9 88.2 88.3 90.7 87.4
HN = 30 ACC 96.93% 96.89% 96.66% 96.92% 96.92% 96.88% 96.66%
LR = 0.01 Time 89.6 91.3 88.1 88.6 89.5 90.4 88
HN = 30 ACC 96.92% 96.88% 96.61% 96.92% 96.92% 96.84% 96.60%
LR = 0.5 Time 27.1 29.1 18.9 27.1 26.2 28.1 18.8
HN = 15 ACC 96.91% 96.89% 96.65% 96.91% 96.91% 96.89% 96.65%
LR = 0.1 Time 27.2 29.2 19.1 27.2 27.1 29.1 19
HN = 15 ACC 96.91% 96.88% 96.65% 96.90% 96.90% 96.87% 96.64%
LR = 0.01 Time 28.4 30.3 20.2 28.3 28.3 30.1 20
HN = 15 ACC 96.92% 96.87% 96.55% 96.91% 96.91% 96.85% 96.55%

local connectivity. Besides, the training time of deep neural networks is in most cases lower than that of related techniques (such as CNN and RNN).

Moreover, Table 16 reports the accuracy and training time of the generative/unsupervised models on the Bot-IoT dataset with different hidden nodes and learning rates. The deep autoencoder (DA) achieves the highest accuracy, 98.39%, compared with the other state-of-the-art algorithms. Interval and box plots showing the overall deviation of accuracy for the Bot-IoT and CSE-CIC-2018 datasets are presented in Figs. 15 and 16, respectively.

After reviewing DL methods, several challenges related to them have been identified that need to be resolved (Table 17).

7 Challenges

DL is a powerful tool for intrusion detection, but it also has its fair share of challenges that need to be addressed. One of the challenges in DL is to maintain accuracy while compressing large-scale DL models. Although DL models can learn features from incomplete or noisy data, reliable DL models are still required to handle the many outdated, low-quality objects in “Big data”.

Another challenge in DL is to implement self-learning. New attack scenarios evolve day by day. Therefore, features identified to detect one category of attacks might soon become outdated or insufficient for the others. Thus, there is a need to develop a framework that can automatically learn features, reduce computational time and increase accuracy.

Generalization is a critical challenge in DL systems. It is not possible to give a labelled sample of every problem to a DL algorithm. Therefore, it has to generalize from its previous samples to classify new data. Currently, DL lacks a mechanism for learning abstractions through verbal definitions. It performs well only if billions of training examples are available.

Another challenge of DL systems is the lack of insight into how a neural network arrives at a solution/conclusion. Even when a neural network produces good results, it is hard to predict when a failure will occur because its decision process lacks transparency. DL is therefore not well suited to domains where verification of the process is necessary, such as medicine.

Overfitting the model is another challenge of DL. It refers to a model that fits the training data too well: the algorithm learns the training data to the extent that it negatively affects the model’s performance on new data. When the accuracy stops improving over a certain number of


Table 16 Training time and accuracy of DL supervised and unsupervised models with various hidden nodes and learning rate using the Bot-IoT dataset (Ferrag et al. 2020)

Parameters Metric CNN RNN DNN DA DBM DBN RBM
LR = 0.5 Time 1367.2 1400.6 991.6 2816.2 2800.1 2921.7 2111.9
HN = 100 ACC 98.37% 98.31% 98.22% 98.39% 98.38% 98.31% 98.28%
LR = 0.1 Time 1022.1 1001.8 711.9 2566.9 2531.2 2644.2 1991.6
HN = 100 ACC 98.12% 97.99% 97.50% 98.31% 98.37% 98.12% 98.21%
LR = 0.01 Time 812.2 801.5 600.2 2466.2 2401.1 2521.8 1861.7
HN = 100 ACC 97.99% 97.62% 97.22% 98.32% 98.31% 98.11% 98.20%
LR = 0.5 Time 412.2 451.2 391.1 2101.8 2109.8 2201.9 1771.9
HN = 60 ACC 97.88% 97.29% 97.10% 98.00% 98.00% 97.98% 97.72%
LR = 0.1 Time 366.2 377.1 302.9 1821.1 1811.9 1912.8 1421.1
HN = 60 ACC 97.21% 96.97% 96.92% 98.00% 97.97% 97.96% 97.22%
LR = 0.01 Time 339.6 331.2 250.8 1461.2 1432.6 1461.6 1129.6
HN = 60 ACC 97.10% 96.96% 96.77% 97.93% 97.92% 97.18% 96.87%
LR = 0.5 Time 221.7 222.1 170.3 1266.8 1239.6 1291.6 1022.6
HN = 30 ACC 97.10% 96.90% 96.66% 97.92% 97.93% 96.99% 96.76%
LR = 0.1 Time 144.2 150.4 102.2 791.6 788.1 801.1 701.6
HN = 30 ACC 96.92% 96.88% 96.66% 97.92% 97.91% 96.92% 96.76%
LR = 0.01 Time 101.1 102.5 88.1 524.2 522.1 560.2 400.8
HN = 30 ACC 96.92% 96.88% 96.61% 96.96% 96.94% 96.86% 96.62%
LR = 0.5 Time 101.1 102.5 88.1 210.3 201.9 221.7 150.5
HN = 15 ACC 96.91% 96.88% 96.65% 96.96% 96.91% 96.89% 96.66%
LR = 0.1 Time 91.3 92.6 66.6 133.7 133.1 138.2 100.2
HN = 15 ACC 96.91% 96.88% 96.65% 96.93% 96.92% 96.88% 96.67%
LR = 0.01 Time 65.3 70.7 56.5 60.1 60.2 72.8 50.4
HN = 15 ACC 96.90% 96.77% 96.45% 96.72% 96.41% 96.55% 96.65%

Fig. 15 Interval plot comparison showing overall deviation of accuracy for Bot-IoT dataset
Fig. 16 Box plot comparison showing overall deviation of accuracy for CSE-CIC-2018 dataset

epochs, we can say the model is overtrained or overfitted.

To solve real-world problems, DL models require machines equipped with sufficient processing power, such as GPUs. These processing units consume a lot of power and are costly. Therefore, it is not feasible for small industries to train models with GPUs. Reducing the training time, even with GPU acceleration, is another challenge. Although a lot of research is going on in DL models, they still fail to handle zero-day attacks.


Table 17 Architecture and application areas of DL methods


Architecture Application area

CNN Natural language processing, Image recognition, Face recognition, Document analysis
AE Natural language processing, Compact representation of data
RNN Speech and Handwriting recognition
LSTM Natural language text captioning, Speech and Handwriting recognition, Image captioning
DBN Image recognition, Natural language understanding
RBM Feature learning, dimensionality reduction, classification

8 Summary

DL methods have been applied to several fields for solving complex problems, including intrusion detection. These methods addressed many issues of shallow ML methods like improving the accuracy of detecting intrusions. This paper presented a systematic review of ML and DL methods for IDSs. To that end, we introduced IDS and provided its classification. We presented a review of datasets and performance metrics used for evaluating IDS’ performance. This paper introduces the main ML methods and their applications for detecting intrusions, followed by pros and cons. The paper also provided DL methods and recent advancements for IDSs. Finally, we listed the challenges of ML and DL methods for IDSs and provided clues for future research in this field.

Declaration

Conflict of Interest On behalf of all authors, the corresponding author states that there is no conflict of interest.

References

Abdullah B, Abd-Alghafar I, Salama GI, Abd-Alhafez A (2009) Performance evaluation of a genetic algorithm based approach to network intrusion detection system. In: International conference on aerospace sciences and aviation technology. The Military Technical College, pp 1–17
Abubakar AI, Chiroma H, Muaz SA, Ila LB (2015) A review of the advances in cyber security benchmark datasets for evaluating data-driven based intrusion detection systems. In: SCSE, pp 221–227
Aburomman AA, Reaz MBI (2016) Ensemble of binary SVM classifiers based on pca and lda feature extraction for intrusion detection. In: 2016 IEEE advanced information management, communicates, electronic and automation control conference (IMCEC). IEEE, pp 636–640
Aghdam MH, Kabiri P (2016) Feature selection for intrusion detection system using ant colony optimization. IJ Netw Secur 18(3):420–432
Aissa NB, Guerroumi M (2016) Semi-supervised statistical approach for network anomaly detection. Procedia Comput Sci 83:1090–1095
Akilandeswari V, Shalinie SM (2012) Probabilistic neural network based attack traffic classification. In: 2012 Fourth international conference on advanced computing (ICoAC). IEEE, pp 1–8
Al-Dhafian B, Ahmad I, Al-Ghamid A (2015) An overview of the current classification techniques in intrusion detection. In: Proceedings of the international conference on security and management (SAM). The Steering Committee of The World Congress in Computer Science, Computer, p 82
Al-Yaseen WL, Othman ZA, Nazri MZA (2017) Multi-level hybrid support vector machine and extreme learning machine based on modified k-means for intrusion detection system. Expert Syst Appl 67:296–303
Ali MH, Al Mohammed BAD, Ismail A, Zolkipli MF (2018) A new intrusion detection system based on fast learning network and particle swarm optimization. IEEE Access 6:20255–20261
Aljumah A (2017) Detection of distributed denial of service attacks using artificial neural networks. In: IJACSA international journal of advanced computer science and applications, vol 8(8)
Almomani A, Alauthman M, Albalas F, Dorgham O, Obeidat A (2020) An online intrusion detection system to cloud computing based on neucube algorithms. In: Cognitive analytics: concepts, methodologies, tools, and applications. IGI Global, pp 1042–1059
Alsubhi K, Aib I, Boutaba R (2012) Fuzmet: a fuzzy-logic based alert prioritization engine for intrusion detection systems. Int J Netw Manag 22(4):263–284
Alzaylaee MK, Yerima SY, Sezer S (2020) Dl-droid: deep learning based android malware detection using real devices. Comput Secur 89:101663
Aminanto ME, Kim K (2016a) Deep learning-based feature selection for intrusion detection system in transport layer. In: Proceedings of the Korea Institutes of information security and cryptology conference, pp 740–743
Aminanto ME, Kim K (2016b) Detecting impersonation attack in wifi networks using deep learning approach. In: International workshop on information security applications. Springer, pp 136–147
Aminanto ME, Kim K (2017) Improving detection of wi-fi impersonation by fully unsupervised deep learning. In: International workshop on information security applications. Springer, pp 212–223
Aminanto ME, Choi R, Tanuwidjaja HC, Yoo PD, Kim K (2017) Deep abstraction and weighted feature selection for wi-fi impersonation detection. IEEE Trans Inf Forensics Secur 13(3):621–636
Arabo A (2019) Distributed ids using agents: an agent-based detection system to detect passive and active threats to a network. In: ICCWS 2019 14th international conference on cyber warfare and security: ICCWS 2019. Academic Conferences and Publishing Limited, p 11
Arrieta AB, Díaz-Rodríguez N, Del Ser J, Bennetot A, Tabik S, Barbado A, García S, Gil-López S, Molina D, Benjamins R et al (2020) Explainable artificial intelligence (xai): concepts, taxonomies, opportunities and challenges toward responsible ai. Inf Fusion 58:82–115


Ashfaq RAR, Wang XZ, Huang JZ, Abbas H, He YL (2017) Fuzziness based semi-supervised learning approach for intrusion detection system. Inf Sci 378:484–497
Bamakan SMH, Amiri B, Mirzabagheri M, Shi Y (2015) A new intrusion detection approach using pso based multiple criteria linear programming. Procedia Comput Sci 55:231–237
Bamakan SMH, Wang H, Yingjie T, Shi Y (2016) An effective intrusion detection framework based on MCLP/SVM optimized by time-varying chaos particle swarm optimization. Neurocomputing 199:90–102
Belavagi MC, Muniyal B (2016) Performance evaluation of supervised machine learning algorithms for intrusion detection. Procedia Comput Sci 89:117–123
Bengio Y et al (2009) Learning deep architectures for ai. Found Trends Mach Learn 2(1):1–127
Beni G, Wang J (1993) Swarm intelligence in cellular robotic systems. In: Robots and biological systems: towards a new bionics? Springer, pp 703–712
Brugger ST, Chow J (2007) An assessment of the Darpa ids evaluation dataset using snort. UCDAVIS Dept Comput Sci 1(2007):22
Caminero G, Lopez-Martin M, Carro B (2019) Adversarial environment reinforcement learning algorithm for intrusion detection. Comput Netw 159:96–109
Çavuşoğlu Ü (2019) A new hybrid approach for intrusion detection using machine learning methods. Appl Intell 49(7):2735–2761
Chaabouni N, Mosbah M, Zemmari A, Sauvignac C, Faruki P (2019) Network intrusion detection for iot security based on learning techniques. IEEE Commun Surv Tutor 21(3):2671–2701
Chen H, Schuffels C, Orwig R (1996) Internet categorization and search: a self-organizing approach. J Vis Commun Image Represent 7(1):88–102
Chen CP, Zhang CY, Chen L, Gan M (2015) Fuzzy restricted Boltzmann machine for the enhancement of deep learning. IEEE Trans Fuzzy Syst 23(6):2163–2173
Chevalier R, Plaquin D, Villatel M, Hiet G (2020) Intrusion detection systems. US Patent App. 16/486,331
Chitrakar R, Huang C (2012) Anomaly based intrusion detection using hybrid learning approach of combining k-medoids clustering and Naive Bayes classification. In: 2012 8th International conference on wireless communications, networking and mobile computing. IEEE, pp 1–5
Cloudstor (2019) Cloudstor. https://cloudstor.aarnet.edu.au/plus/index.php/s/2DhnLGDdEECo4ys. Accessed 15 Apr 2020
Creech G, Hu J (2013) Generation of a new ids test dataset: time to retire the kdd collection. In: 2013 IEEE wireless communications and networking conference (WCNC). IEEE, pp 4487–4492
Crosbie M, Spafford G, et al. (1995) Applying genetic programming to intrusion detection. In: Working notes for the AAAI symposium on genetic programming. MIT Press, Cambridge, pp 1–8
Dong B, Wang X (2016) Comparison deep learning method to traditional methods using for network intrusion detection. In: 2016 8th IEEE international conference on communication software and networks (ICCSN). IEEE, pp 581–585
Eesa AS, Orman Z, Brifcani AMA (2015) A novel feature-selection approach based on the cuttlefish optimization algorithm for intrusion detection systems. Expert Syst Appl 42(5):2670–2679
Effendy DA, Kusrini K, Sudarmawan S (2017) Classification of intrusion detection system (ids) based on computer network. In: 2017 2nd International conferences on information technology, information systems and electrical engineering (ICITISEE). IEEE, pp 90–94
Farnaaz N, Jabbar M (2016) Random forest modeling for network intrusion detection system. Procedia Comput Sci 89:213–217
Farzaneh B, Montazeri MA, Jamali S (2019) An anomaly-based ids for detecting attacks in rpl-based internet of things. In: 2019 5th International conference on web research (ICWR). IEEE, pp 61–66
Feng F, Liu X, Yong B, Zhou R, Zhou Q (2019) Anomaly detection in ad-hoc networks based on deep learning model: a plug and play device. Ad Hoc Netw 84:82–89
Ferrag MA, Maglaras L, Moschoyiannis S, Janicke H (2020) Deep learning for cyber security intrusion detection: approaches, datasets, and comparative study. J Inf Secur Appl. https://doi.org/10.1016/j.jisa.2019.102419
Gamage S, Samarabandu J (2020) Deep learning methods in network intrusion detection: a survey and an objective comparison. J Netw Comput Appl 169:102767
Gautam RKS, Doegar EA (2018) An ensemble approach for intrusion detection system using machine learning algorithms. In: 2018 8th International conference on cloud computing, data science & engineering (confluence). IEEE, pp 14–15
geopolitical-attacks (2019) geopolitical-attacks. https://www.privacyaffairs.com/geopolitical-attacks/3. Accessed 15 Apr 2020
Gharib A, Sharafaldin I, Lashkari AH, Ghorbani AA (2016) An evaluation framework for intrusion detection dataset. In: 2016 International conference on information science and security (ICISS). IEEE, pp 1–6
Ghose N, Lazos L, Rozenblit J, Breiger R (2019) Multimodal graph analysis of cyber attacks. In: 2019 Spring simulation conference (SpringSim). IEEE, pp 1–12
Gomez J, Dasgupta D (2002) Evolving fuzzy classifiers for intrusion detection. In: Proceedings of the 2002 IEEE workshop on information assurance, pp 321–323
Gu J, Lu S (2021) An effective intrusion detection approach using SVM with Naïve Bayes feature embedding. Comput Secur 103:102158
Gu J, Wang L, Wang H, Wang S (2019) A novel approach to intrusion detection using SVM ensemble with feature augmentation. Comput Secur 86:53–62
Guo C, Ping Y, Liu N, Luo SS (2016) A two-level hybrid approach for intrusion detection. Neurocomputing 214:391–400
Gupta BB, Joshi RC, Misra M (2012) Ann based scheme to predict number of zombies in a ddos attack. IJ Netw Secur 14(2):61–70
Hajimirzaei B, Navimipour NJ (2019) Intrusion detection for cloud computing using neural networks and artificial bee colony optimization algorithm. ICT Express 5(1):56–59
Hasan M, Islam MM, Zarif MII, Hashem M (2019) Attack and anomaly detection in iot sensors in iot sites using machine learning approaches. Internet Things 7:100059
He D, Chen X, Zou D, Pei L, Jiang L (2018) An improved kernel clustering algorithm used in computer network intrusion detection. In: 2018 IEEE international symposium on circuits and systems (ISCAS). IEEE, pp 1–5
Hodo E, Bellekens X, Iorkyase E, Hamilton A, Tachtatzis C, Atkinson R (2017) Machine learning approach for detection of nontor traffic. In: Proceedings of the 12th international conference on availability, reliability and security, pp 1–6
Hussain MS, Khan KUR (2020) A survey of ids techniques in manets using machine-learning. In: Proceedings of the third international conference on computational intelligence and informatics. Springer, pp 743–751
Ibrahim LM, Basheer DT, Mahmod MS (2013) A comparison study for intrusion database (kdd99, nsl-kdd) based on self organization map (som) artificial neural network. J Eng Sci Technol 8(1):107–119
Jacob NM, Wanjala MY (2018) A review of intrusion detection systems. Glob J Comput Sci Technol 5:66
Javaid A, Niyaz Q, Sun W, Alam M (2016) A deep learning approach for network intrusion detection system. In: Proceedings of the 9th EAI international conference on bio-inspired information and communications technologies (formerly BIONETICS), pp 21–26


Jebur SA, Nasereddin H (2015) Enhanced solutions for misuse network intrusion detection system using sga and ssga. Int J Comput Sci Netw Secur 15(5):66
Kabir E, Hu J, Wang H, Zhuo G (2018) A novel statistical technique for intrusion detection systems. Fut Gener Comput Syst 79:303–318
Kalteh AM, Hjorth P, Berndtsson R (2008) Review of the self-organizing map (som) approach in water resources: analysis, modelling and application. Environ Model Softw 23(7):835–845
Kandan AM, Kathrine GJ, Melvin AR (2019) Network attacks and prevention techniques—a study. In: 2019 IEEE international conference on electrical, computer and communication technologies (ICECCT). IEEE, pp 1–6
Kang MJ, Kang JW (2016) Intrusion detection system using deep neural network for in-vehicle network security. PLoS ONE 11(6):6
Kevric J, Jukic S, Subasi A (2017) An effective combining classifier approach using tree algorithms for network intrusion detection. Neural Comput Appl 28(1):1051–1058
Khan FA, Gumaei A (2019) A comparative study of machine learning classifiers for network intrusion detection. In: International conference on artificial intelligence and security. Springer, pp 75–86
Kim G, Lee S, Kim S (2014) A novel hybrid intrusion detection method integrating anomaly detection with misuse detection. Expert Syst Appl 41(4):1690–1700
Kim J, Kim J, Thu HLT, Kim H (2016) Long short term memory recurrent neural network classifier for intrusion detection. In: 2016 International conference on platform technology and service (PlatCon). IEEE, pp 1–5
Kudłacik P, Porwik P, Wesołowski T (2016) Fuzzy approach for intrusion detection based on users commands. Soft Comput 20(7):2705–2719
Kumar G, Kumar K, Sachdeva M (2010) The use of artificial intelligence based techniques for intrusion detection: a review. Artif Intell Rev 34(4):369–387
Kumar V, Chauhan H, Panwar D (2013) K-means clustering approach to analyze nsl-kdd intrusion detection dataset. Int J Soft Comput Eng 6:66
Kurniabudi K, Purnama B, Sharipuddin S, Darmawijoyo D, Stiawan D, Samsuryadi S, Heryanto A, Budiarto R (2019) Network anomaly detection research: a survey. Indones J Electr Eng Inform 7(1):37–50
Kyoto (2019) Kyoto. http://www.takakura.com/Kyoto_data/. Accessed 15 Apr 2020
Kyoto2006+ (2015) Kyoto2006+ dataset. http://www.takakura.com/Kyoto_data/. Accessed Feb 2015
Le Goues C, Nguyen T, Forrest S, Weimer W (2011) Genprog: a generic method for automatic software repair. IEEE Trans Softw Eng 38(1):54–72
Li W (2004) Using genetic algorithm for network intrusion detection. Proc US Dept Energy Cyber Secur Group 1:1–8
Li Y, Ma R, Jiao R (2015) A hybrid malicious code detection method based on deep learning. Int J Secur Appl 9(5):205–216
Liao HJ, Lin CHR, Lin YC, Tung KY (2013) Intrusion detection system: a comprehensive review. J Netw Comput Appl 36(1):16–24
Lin WC, Ke SW, Tsai CF (2015) Cann: an intrusion detection system based on combining cluster centers and nearest neighbors. Knowl Based Syst 78:13–21
Liu H, Lang B, Liu M, Yan H (2019) Cnn and rnn based payload classification methods for attack detection. Knowl Based Syst 163:332–341
Luo J (1999) Integrating fuzzy logic with data mining methods for intrusion detection. Master’s thesis, Mississippi State University. Department of Computer Science
Ma T, Wang F, Cheng J, Yu Y, Chen X (2016) A hybrid spectral clustering and deep neural network ensemble algorithm for intrusion detection in sensor networks. Sensors 16(10):1701
Mahoney MV, Chan PK (2003) An analysis of the 1999 darpa/lincoln laboratory evaluation data for network anomaly detection. In: International workshop on recent advances in intrusion detection. Springer, pp 220–237
Mäkelä A (2019) Network anomaly detection based on wavenet. In: Internet of things, smart spaces, and next generation networks and systems: 19th international conference, NEW2AN 2019, and 12th conference, ruSMART 2019, St. Petersburg, Russia, August 26–28, 2019, proceedings, vol 11660. Springer, p 424
Maniyar PS, Musande V (2016) Rules based intrusion detection system using genetic algorithm. Int J Comput Sci Netw 5(3):554–558
Manzoor I, Kumar N et al (2017) A feature reduced intrusion detection system using ANN classifier. Expert Syst Appl 88:249–257
Mehmood T, Rais HBM (2016) Machine learning algorithms in context of intrusion detection. In: 2016 3rd International conference on computer and information sciences (ICCOINS). IEEE, pp 369–373
Meng T, Jing X, Yan Z, Pedrycz W (2020) A survey on machine learning for data fusion. Inf Fusion 57:115–129
Mohamad Tahir H, Hasan W, Md Said A, Zakaria NH, Katuk N, Kabir NF, Omar MH, Ghazali O, Yahya NI (2015) Hybrid machine learning technique for intrusion detection system
Mousavi SM, Majidnezhad V, Naghipour A (2019) A new intelligent intrusion detector based on ensemble of decision trees. J Ambient Intell Hum Comput 66:1–13
Muna AH, Moustafa N, Sitnikova E (2018) Identification of malicious activities in industrial internet of things based on deep learning models. J Inf Secur Appl 41:1–11
Napiah MN, Idris MYIB, Ramli R, Ahmedy I (2018) Compression header analyzer intrusion detection system (cha-ids) for 6lowpan communication protocol. IEEE Access 6:16623–16638
Nehinbe JO (2011) A critical evaluation of datasets for investigating idss and ipss researches. In: 2011 IEEE 10th international conference on cybernetic intelligent systems (CIS). IEEE, pp 92–97
Ngiam J, Khosla A, Kim M, Nam J, Lee H, Ng AY (2011) Multimodal deep learning. OpenReview
Nguyen MT, Kim K (2020) Genetic convolutional neural network for intrusion detection systems. Fut Gener Comput Syst 113:418–427
Nguyen TT, Reddi VJ (2019) Deep reinforcement learning for cyber security. arXiv preprint arXiv:1906.05799
Nskh P, Varma MN, Naik RR (2016) Principle component analysis based intrusion detection system using support vector machine. In: 2016 IEEE international conference on recent trends in electronics, information & communication technology (RTEICT). IEEE, pp 1344–1350
Nyathi T, Pillay N (2018) Comparison of a genetic algorithm to grammatical evolution for automated design of genetic programming classification algorithms. Expert Syst Appl 104:213–234
Ojugo A, Eboka A, Okonta O, Yoro R, Aghware F (2012) Genetic algorithm rule-based intrusion detection system (gaids). J Emerg Trends Comput Inf Sci 3(8):1182–1194
Othman SM, Ba-Alwi FM, Alsohybe NT, Al-Hashida AY (2018) Intrusion detection model using machine learning algorithm on big data environment. J Big Data 5(1):1–12
Palmieri F (2019) Network anomaly detection based on logistic regression of nonlinear chaotic invariants. J Netw Comput Appl 148:102460
Pandey A, Sinha A, PS A (2019) Intrusion detection using sequential hybrid model. arXiv preprint arXiv:1910.12074
Porras PA, Fong MW, Valdes A (2002) A mission-impact-based approach to infosec alarm correlation. In: International workshop on recent advances in intrusion detection. Springer, pp 95–114
Praanna K, Sruthi S, Kalyani K, Tejaswi AS (2020) A CNN-LSTM model for intrusion detection system from high dimensional data


Pradhan M, Nayak CK, Pradhan SK (2020) Intrusion detection system Sohi SM, Seifert JP, Ganji F (2021) Rnnids: enhancing network intru-
(ids) and their types. In: Securing the internet of things: concepts, sion detection systems through deep learning. Comput Secur
methodologies, tools, and applications. IGI Global, pp 481–497 102:102151
Prusty S, Levine BN, Liberatore M (2011) Forensic investigation of the Sommer R, Paxson V (2010) Outside the closed world: on using
oneswarm anonymous filesharing system. In: Proceedings of the machine learning for network intrusion detection. In: 2010 IEEE
18th ACM conference on Computer and communications security, symposium on security and privacy. IEEE, pp 305–316
pp 201–214 Song J, Takakura H, Okabe Y, Eto M, Inoue D, Nakao K (2011) Statisti-
Qin X, Lee W (2003) Statistical causality analysis of infosec alert data. cal analysis of honeypot data and building of kyoto 2006+ dataset
In: International workshop on recent advances in intrusion detection. Springer, pp 73–93
for nids evaluation. In: Proceedings of the first workshop on building analysis datasets and gathering experience returns for security, pp 29–36
Rao CS, Raju KB (2019) Mapreduce accelerated signature-based intrusion detection mechanism (idm) with pattern matching mechanism. In: Soft computing in data analytics. Springer, pp 157–164
Revathi S, Malathi A (2013) A detailed analysis on nsl-kdd dataset using various machine learning techniques for intrusion detection. Int J Eng Res Technol 2(12):1848–1853
Sakurada M, Yairi T (2014) Anomaly detection using autoencoders with nonlinear dimensionality reduction. In: Proceedings of the MLSDA 2014 2nd workshop on machine learning for sensory data analysis, pp 4–11
Saleh AI, Talaat FM, Labib LM (2019) A hybrid intrusion detection system (hids) based on prioritized k-nearest neighbors and optimized svm classifiers. Artif Intell Rev 51(3):403–443
Sangster B, O’Connor T, Cook T, Fanelli R, Dean E, Morrell C, Conti GJ (2009) Toward instrumenting network warfare competitions to generate labeled datasets. In: CSET
Sanjaya SKSSS, Jena K (2014) A detail analysis on intrusion detection datasets. In: 2014 IEEE international advance computing conference (IACC)
Saranya T, Sridevi S, Deisy C, Chung TD, Khan MA (2020a) Performance analysis of machine learning algorithms in intrusion detection system: a review. Procedia Comput Sci 171:1251–1260
Saranya T, Sridevi S, Deisy C, Chung TD, Khan MA (2020b) Performance analysis of machine learning algorithms in intrusion detection system: a review. Procedia Comput Sci 171:1251–1260
Sato M, Yamaki H, Takakura H (2012) Unknown attacks detection using feature extraction from anomaly-based ids alerts. In: 2012 IEEE/IPSJ 12th international symposium on applications and the internet. IEEE, pp 273–277
Şen S, Clark JA (2009) A grammatical evolution approach to intrusion detection on mobile ad hoc networks. In: Proceedings of the second ACM conference on Wireless network security, pp 95–102
Seo S, Park S, Kim J (2016) Improvement of network intrusion detection accuracy by using restricted Boltzmann machine. In: 2016 8th International conference on computational intelligence and communication networks (CICN). IEEE, pp 413–417
Serrano W (2019) The blockchain random neural network in cybersecurity and the internet of things. In: IFIP international conference on artificial intelligence applications and innovations. Springer, pp 50–63
Sharafaldin I, Gharib A, Lashkari AH, Ghorbani AA (2018) Towards a reliable intrusion detection benchmark dataset. Softw Networking 2018(1):177–200
Shen Y, Han T, Yang Q, Yang X, Wang Y, Li F, Wen H (2018) Cs-cnn: enabling robust and efficient convolutional neural networks inference for internet-of-things applications. IEEE Access 6:13439–13448
Shiravi A, Shiravi H, Tavallaee M, Ghorbani AA (2012) Toward developing a systematic approach to generate benchmark datasets for intrusion detection. Comput Secur 31(3):357–374
Shone N, Ngoc TN, Phai VD, Shi Q (2018) A deep learning approach to network intrusion detection. IEEE Trans Emerg Top Comput Intell 2(1):41–50
Signal detection theory (2019) Signal detection theory. http://gim.unmc.edu/dxtests/roc2.htm/. Accessed 2019
Sperotto A, Sadre R, Van Vliet F, Pras A (2009) A labeled data set for flow-based intrusion detection. In: International workshop on IP operations and management. Springer, pp 39–50
Srimuang W, Intarasothonchun S (2015) Classification model of network intrusion using weighted extreme learning machine. In: 2015 12th International joint conference on computer science and software engineering (JCSSE). IEEE, pp 190–194
Srinivas J, Das AK, Kumar N (2019) Government regulations in cyber security: framework, standards and recommendations. Fut Gener Comput Syst 92:178–188
Srivastava N, Salakhutdinov RR (2012) Multimodal learning with deep Boltzmann machines. In: Advances in neural information processing systems, pp 2222–2230
Srivastava N, Hinton G, Krizhevsky A, Sutskever I, Salakhutdinov R (2014) Dropout: a simple way to prevent neural networks from overfitting. J Mach Learn Res 15(1):1929–1958
Tabakhi S, Moradi P, Akhlaghian F (2014) An unsupervised feature selection algorithm based on ant colony optimization. Eng Appl Artif Intell 32:112–123
Taddeo M, McCutcheon T, Floridi L (2019) Trusting artificial intelligence in cybersecurity is a double-edged sword. Nat Mach Intell 66:1–4
Tang TA, Mhamdi L, McLernon D, Zaidi SAR, Ghogho M (2016) Deep learning approach for network intrusion detection in software defined networking. In: 2016 International conference on wireless networks and mobile communications (WINCOM). IEEE, pp 258–263
Tavallaee M, Bagheri E, Lu W, Ghorbani AA (2009) A detailed analysis of the kdd cup 99 data set. In: 2009 IEEE symposium on computational intelligence for security and defense applications. IEEE, pp 1–6
Tidjon LN, Frappier M, Mammar A (2019) Intrusion detection systems: a cross-domain overview. IEEE Commun Surv Tutor 21(4):3639–3681
Tirumala SS (2014) Implementation of evolutionary algorithms for deep architectures. In: CEUR workshop proceedings
Tong X, Wang Z, Yu H (2009) A research using hybrid rbf/elman neural networks for intrusion detection system secure model. Comput Phys Commun 180(10):1795–1801
Tsoukalas LH, Uhrig RE (1997) Fuzzy and neural approaches in engineering. 18216097198
U. of massachusetts amherst (2019) U. of massachusetts amherst, optimistic tcp acking. http://traces.cs.umass.edu/. Accessed 12 Feb 2019
Uci (2019) Uci. http://archive.ics.uci.edu/ml/datasets/kdd+cup+1999+data. Accessed 15 Apr 2020
Unb (2019a) Unb. http://www.unb.ca/cic/datasets/nsl.html. Accessed 15 Apr 2020
Unb (2019b) https://www.unb.ca/cic/datasets/ids.html Accessed 15 Apr 2020
Unb (2019c) https://www.unb.ca/cic/datasets/dos-dataset.html. Accessed 15 Apr 2020
Viegas F, Rocha L, Gonçalves M, Mourão F, Sá G, Salles T, Andrade G, Sandin I (2018) A genetic programming approach for feature selection in highly dimensional skewed data. Neurocomputing 273:554–569
Vinayakumar R, Alazab M, Soman K, Poornachandran P, Al-Nemrat A, Venkatraman S (2019) Deep learning approach for intelligent intrusion detection system. IEEE Access 7:41525–41550
Wang H, Gu J, Wang S (2017) An effective intrusion detection framework based on SVM with feature augmentation. Knowl Based Syst 136:130–139
Wisesty UN, et al. (2017) Comparative study of conjugate gradient to optimize learning process of neural network for intrusion detection system (ids). In: 2017 3rd International conference on science in information technology (ICSITech). IEEE, pp 459–464
Worku A (2019) Minimizing black hole attack in mobile ad hoc network with anomaly based ids approach. PhD thesis, ASTU
Yang S, Li M, Liu X, Zheng J (2013) A grid-based evolutionary algorithm for many-objective optimization. IEEE Trans Evol Comput 17(5):721–736
Yang Y, Zheng K, Wu C, Niu X, Yang Y (2019) Building an effective intrusion detection system using the modified density peak clustering algorithm and deep belief networks. Appl Sci 9(2):238
Yin C, Zhu Y, Fei J, He X (2017) A deep learning approach for intrusion detection using recurrent neural networks. IEEE Access 5:21954–21961
Yu J, Reddy YR, Selliah S, Kankanahalli S, Reddy S, Bharadwaj V (2004) Trinetr: an intrusion detection alert management systems. In: 13th IEEE international workshops on enabling technologies: infrastructure for collaborative enterprises. IEEE, pp 235–240
Zhang Q, Yang LT, Chen Z (2015a) Deep computation model for unsupervised feature learning on big data. IEEE Trans Serv Comput 9(1):161–171
Zhang Q, Yang LT, Chen Z (2015b) Privacy preserving deep computation model on cloud for big data feature learning. IEEE Trans Comput 65(5):1351–1362
Zhang Y, Zhang Y, Zhang N, Xiao M (2020) A network intrusion detection method based on deep learning with higher accuracy. Procedia Comput Sci 174:50–54
Zhao L, Hu Q, Wang W (2015) Heterogeneous feature selection with multi-modal deep neural networks and sparse group lasso. IEEE Trans Multimed 17(11):1936–1948
Zhao S, Li W, Zia T, Zomaya AY (2017) A dimension reduction model and classifier for anomaly-based intrusion detection in internet of things. In: 2017 IEEE 15th international conference on dependable, autonomic and secure computing, 15th international conference on pervasive intelligence and computing, 3rd international conference on big data intelligence and computing and cyber science and technology congress (DASC/PiCom/DataCom/CyberSciTech). IEEE, pp 836–843
Ziegler S (2019) Internet of things cybersecurity paradigm shift, threat matrix and practical taxonomy. In: Internet of things security and data protection. Springer, pp 1–7

Publisher’s Note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
