
Disha and Waheed, Cybersecurity (2022) 5:1
https://doi.org/10.1186/s42400-021-00103-8

RESEARCH | Open Access

Performance analysis of machine learning models for intrusion detection system using Gini Impurity-based Weighted Random Forest (GIWRF) feature selection technique

Raisa Abedin Disha1* and Sajjad Waheed2

*Correspondence: [email protected]
1 Department of Information and Communication Technology, Bangladesh University of Professionals, Mirpur Cantonment, Dhaka 1216, Bangladesh. Full list of author information is available at the end of the article.

Abstract
To protect the network, resources, and sensitive data, the intrusion detection system (IDS) has become a fundamental component of organizations that prevents cybercriminal activities. Several approaches have been introduced and implemented to thwart malicious activities so far. Due to the effectiveness of machine learning (ML) methods, the proposed approach applied several ML models for the intrusion detection system. In order to evaluate the performance of the models, the UNSW-NB 15 and Network TON_IoT datasets were used for offline analysis. Both datasets are comparatively newer than the NSL-KDD dataset and represent modern-day attacks. The performance analysis was carried out by training and testing the Decision Tree (DT), Gradient Boosting Tree (GBT), Multilayer Perceptron (MLP), AdaBoost, Long-Short Term Memory (LSTM), and Gated Recurrent Unit (GRU) for the binary classification task. As the performance of an IDS deteriorates with a high dimensional feature vector, an optimum set of features was selected through a Gini Impurity-based Weighted Random Forest (GIWRF) model as the embedded feature selection technique. This technique employed Gini impurity as the splitting criterion of the trees and adjusted the weights of the two classes of the imbalanced data to make the learning algorithm understand the class distribution. Based upon the importance score, 20 features were selected from UNSW-NB 15 and 10 features from the Network TON_IoT dataset. The experimental results revealed that DT performed better with the feature selection technique than the other trained models of this experiment. Moreover, the proposed GIWRF-DT outperformed other existing methods surveyed in the literature in terms of the F1 score.

Keywords: Cyber security, Feature selection, Intrusion Detection System, Machine learning, Network security

Introduction

In this era of modernization, the information system has become a prominent asset for companies and organizations to collect, store, process, manage and distribute information effectively in an organized way. As soon as information system technology was introduced to organizations, the risk of security breaches also emerged. To protect sensitive data as well as the Information Technology (IT) infrastructure from cyber attackers, several information security measures have been taken by organizations. One of the essential components of cybersecurity is the Intrusion Detection System (IDS). Based upon the knowledge it is provided, an IDS can identify anomalous traffic and alert the network administrator accordingly (Scarfone and Mell 2007). In general, the architecture of an intrusion detection system (IDS) is developed based on four functional modules:


(1) E block (Event-boxes), (2) D block (Database-boxes), (3) A block (Analysis-boxes), and (4) R block (Response-boxes) (Garcia-Teodoro et al. 2009). In a nutshell, the E block (Event-boxes) contains sensor elements for monitoring the system, and it acquires information of events for further analysis. This acquired information needs to be stored for processing. For this purpose, D block (Database-boxes) elements store the information that comes from the E block. The A block (Analysis-boxes) is the processing module, where the detection of malicious behavior is carried out by analyzing the events. After detecting the hostile behavior, the most important task is to prevent that threat. So, if any kind of intrusion occurs, the R block (Response-boxes) takes the initiative to thwart the malicious event with appropriate response execution (Chandola et al. 2009). An illustration of the IDS framework is given in Fig. 1.

Fig. 1 Common intrusion detection architecture for IDS

Based on the information source (E block), an IDS can be categorized into Host-based IDS (HIDS) and Network-based IDS (NIDS). HIDS is related to operating system information (system calls and process identifiers), whereas NIDS performs analysis on the network events (IP address, protocols, service ports, traffic volume, etc.). Based on the analysis performed in the A block, an IDS can be classified into Signature-based IDS (misuse-based) and Anomaly-based IDS. In Signature-based IDS (SIDS), a database of known attack signatures is maintained, and the IDS matches the analyzed data with the database to find out the intrusion (Khraisat et al. 2019). It is the best fit for detecting known attacks, but unable to detect new types (previously unseen attacks). The false-positive rate is considerably lower in Signature-based IDS. On the contrary, Anomaly-based IDS (AIDS) tries to understand the normal behavior of the system and sets a threshold value. When the given observation at an instant deviates from normal behavior, exceeding the preset threshold value, an anomaly alarm is raised (Liao et al. 2013). As Anomaly-based IDS tries to find out suspicious events, it is a good solution for detecting previously unseen attacks. However, the false-positive rate of intrusion detection is higher in AIDS than in SIDS.

Due to the tremendous usage of cloud services and the Internet of Things (IoT), network traffic is also increasing every day in an enormous amount. It makes it quite difficult for the IDS to distinguish between normal and anomalous behavior of network traffic, especially while detecting zero-day attacks (previously unseen attacks). To address this issue, Machine Learning (ML) methods have become a convenient and effective technique for identifying and categorizing multiple network attacks. Machine Learning technology makes the IDS able to learn and improve the system's performance by analyzing previous data. Moreover, computer programs that use ML do not require to be explicitly programmed, as they can learn by themselves (Naqa and Murphy 2015). The intrusion detection technique is still the core of many research works, as the detection rate and accuracy score of the machine learning models are not up to the mark for classifying the intrusion. Furthermore, many solutions are not effective enough for a large number of data using the full dataset (Yin et al. 2017). Also, a lot of intrusion detection works have been conducted using the NSL-KDD 99 or KDD 99 dataset, which is now considered outdated to identify modern cyber-attacks (Moustafa and Slay 2015; Labonne 2020; Divekar et al. 2018).

In this research, a binary classification task implementing supervised machine learning models had been conducted. Binary classification happens when the supervised machine learning model is assigned to predict a discrete value as a 'normal' or 'attack' instance (Harrington 2012). The dataset used in this study is relatively large and owns a high dimension of feature space. For a large dataset, it is important to choose a feature selection method for dropping the irrelevant features, and to use only the important set of features in both the training and testing steps (Dong and Liu 2018). Feature selection is the process of choosing a subset of relevant features by reducing the number of input variables while building a predictive model. It is desired in most cases to improve the performance of predictive models.

Also, irrelevant features have a negative impact on the performance of the model. As the IDS deals with numerous data instances which have irrelevant or redundant features, this not only decreases the detection accuracy score but also increases the computational cost (Zaman and Karray 2009). Feature selection gives solutions to some frequently occurring mistakes in IDS by identifying the relevant features which contain the essential information for the classification task (Mohammadi et al. 2019).

In our proposed approach, a Gini Impurity-based Weighted Random Forest (GIWRF) model was developed to select the important and relevant features based on the importance score. The feature importance scores were calculated by adjusting the weights in the Random Forest algorithm (Breiman 2001) for the imbalanced class distribution, and employing the Gini impurity criterion for splitting the trees. The Decision Tree (DT), Gradient Boosting Tree (GBT), Multilayer Perceptron (MLP), Adaptive Boosting (AdaBoost), Long-Short Term Memory (LSTM), and Gated Recurrent Unit (GRU) were developed over two datasets for the ML-based IDS framework. Data preprocessing or data engineering was required before training and testing the models. The data preprocessing phase included three steps, which are discussed later in this paper. Performance analysis for the binary classification task was conducted using the UNSW-NB 15 dataset (Moustafa et al. 2018; Moustafa and Slay 2016) and the Network TON_IoT dataset (Moustafa 2021), which are relatively new datasets containing modern attack patterns. The rest of the paper is organized as follows. First, related research works are discussed in the "Literature review" section. After that, a brief discussion of the other existing datasets for IDS and the experimental datasets (UNSW-NB 15 and Network TON_IoT) is presented in the "Datasets" section. The proposed methodology of the experimental process is explained subsequently in the "Proposed methodology" section. The "Experiments and results" section focuses on the experiment environment and the results of the models. Based on the experimental results, a discussion segment has been added in the "Discussion" section, and finally, the conclusion ("Conclusion" section) ends the paper.

Literature review
Different intrusion detection techniques have been explained and suggested over the last two decades (Catania and Garino 2012). To build robust and efficient IDS, many types of research have been proposed with machine learning approaches and have made some improvements in this area. An approach was introduced by Ingre and Yadav (2015) for ML-based IDS that combined the Decision Tree (DT) classifier and a correlation input selection method for the classification task. A filter-based feature selection method was used to train and test their models. For their experimental process, the NSL-KDD dataset was analyzed. Fourteen significant features had been selected by the filter-based feature selection technique to reduce the time complexity. This study was conducted for both binary and multiclass classification tasks. The results showed that they obtained an accuracy of 83.66% for the multiclass classification task, which successfully identified five different attacks, and 90.30% accuracy for binary classification.

Another filter-based approach was introduced to detect Distributed Denial of Service (DDoS) attacks by Osanaiye et al. (2016). They used multiple filters: Chi-Square, Information Gain, Gain Ratio, and the ReliefF algorithm for selecting an optimum number of features. For performance analysis of the system, they trained and evaluated the models on the NSL-KDD dataset. The classifier used in this approach was DT. The model had been validated with a tenfold cross-validation technique before testing. Their result indicated that DT was capable of detecting DDoS attacks with an accuracy of 99.67% while using 13 important features out of the whole feature set.

Alazzam et al. (2020) compared the performance of IDS using the Decision Tree (DT) as a binary classifier. The Pigeon Inspired Optimizer (PIO) is considered as the feature reduction technique in this study. PIO is a swarm intelligence algorithm inspired by pigeons' homing behavior. When a flock of pigeons flies a long way home, they use different navigation tools in different phases of their flight (Liu et al. 2019). The pigeons continuously change their position according to the navigation tool (map and compass operator), following the best pigeon that has the best position. Based on this philosophy of search and optimization, PIO has been further developed. In this experiment, two types of PIO were used for feature reduction: Cosine PIO and Sigmoid PIO. The authors carried out the simulation on the UNSW-NB 15, NSL-KDD, and KDDCup99 datasets. Using Sigmoid PIO, 10 features from KDDCup99, 14 features from the UNSW-NB dataset, and 18 features from NSL-KDD were selected. Also, they selected 7 features from KDDCup99, and 5 features from UNSW-NB 15 and NSL-KDD, using Cosine PIO for comparing the results. Sigmoid PIO obtained an accuracy of 94.7% for KDDCup99, 86.9% using NSL-KDD, and 91.3% over the UNSW-NB 15 dataset. On the contrary, Cosine PIO achieved 91.7% accuracy over UNSW-NB 15, 88.3% on NSL-KDD, and 96% on KDDCup99.

Another experiment was conducted by Khan et al. (2018) to enhance the performance of Random Forest, Extreme Gradient Boosting (XGBoost), Decision Tree, Bagging Meta Estimator, and K-Nearest Neighbors (KNN). The goal of this study was to decrease the computational time with reduced features while improving the accuracy of the models. A subset of 11 features had been chosen according to the feature importance calculated by the Random Forest model. Random Forest obtained 74.87% accuracy, XGBoost achieved 71.43% detection accuracy, and the Bagging Meta Estimator achieved 74.64% using the UNSW-NB dataset. Also, the Decision Tree and KNN attained an accuracy of 74.22% and 71.10% respectively. However, the training time for XGBoost was comparatively higher than for the other models in this experiment. The Decision Tree required the lowest prediction time, and KNN needed significantly higher prediction time than the other models.

An improved anomaly detection technique was proposed by Tama and Rhee (2019) using the Gradient Boosting Machine (GBM) with the complete features of the NSL-KDD, UNSW-NB 15, and GPRS datasets. The authors statistically assessed the superiority of GBM over other models: Support Vector Machine (SVM), Random Forest, Classification and Regression Tree (CART), and Deep Neural Network (DNN), in terms of the area under the receiver operating characteristic curve (AUC), specificity, false-positive rate, sensitivity, and accuracy. The experiment was simulated in a Python environment with machine learning libraries. GBM outperformed the other methods with an accuracy score of 91.31% for the UNSW-NB dataset, 91.82% for KDDTest+, and 86.51% for KDDTest-21.

An SVM and Artificial Neural Network (ANN) based technique was proposed by Aboueata et al. (2019) for intrusion detection systems in cloud environments. Additionally, they considered Univariate and Principal Component Analysis (PCA) together for choosing an optimal set of features. They trained and tested the models on the UNSW-NB 15 dataset, and performance evaluation was carried out in terms of F1 score, Precision, and Recall as well as Accuracy. Feature engineering along with parameter tuning was performed by the authors to obtain maximum accuracy while reducing the time complexity of the models. The study indicated that, with an appropriate set of features, the ANN and SVM techniques were capable of detecting anomalies with 91% and 92% accuracy scores respectively.

Jing and Chen (2019) introduced another SVM-based anomaly detection approach for both binary classification and multiclass classification. A nonlinear feature scaling method was applied in their simulation. They used the Radial Basis Function (RBF) to map the low dimensional space to a high dimensional space in SVM. The RBF kernel is suitable for large dataset evaluation in terms of computational time and accuracy. They achieved 85.99% testing accuracy for binary classification and 75.77% accuracy for multiclass classification. Their preferred dataset for offline performance analysis was also the UNSW-NB 15 dataset.

Kasongo and Sun (2020) implemented five supervised models: Support Vector Machine (SVM), Logistic Regression (LR), k-Nearest-Neighbour (kNN), Decision Tree (DT), and Artificial Neural Network (ANN), using a filter-based feature reduction technique. The Extreme Gradient Boosting (XGBoost) technique was used for reducing the feature vector from 42 to 19 based on the feature importance score. This study compared the performance of the models by training and testing them on the UNSW-NB 15 dataset. They conducted the work for both binary and multiclass classification. With reduced features, they observed an increase in accuracy from 88.13 to 90.85% for the DT model while performing the binary classification task.

Another intrusion detection framework was proposed by Meftah et al. (2019) that included two stages. In the first stage, their approach performed binary classification. In the second stage, the attack traffic output was fed to multiclass classifiers for identifying each attack. The authors used Random Forest as the feature selection technique for the binary classification task. They trained the Gradient Boost Machine, Logistic Regression, and Support Vector Machine (SVM), and the test results revealed that SVM obtained the maximum detection accuracy of 82.11%. The Gradient Boost Machine and Logistic Regression obtained accuracy scores of 61.83% and 77.21% respectively. For multiclass classification, they applied the multinomial Support Vector Machine, Decision Trees (C5.0), and Naïve Bayes. The Decision Tree achieved the highest accuracy of 74%, and an 86% F1 score for multiclass classification. Moreover, they presented constructive criticism of the UNSW-NB 15 dataset, which had been used in their experimental analysis.

Injadat et al. (2020) proposed an ML-based NIDS framework that studied the oversampling technique on the training data and selected a suitable sample size for training the KNN and RF. They applied correlation-based (CBFS) and information gain (IGBFS) feature selection techniques and compared the performance of the classifiers using the CICIDS 2017 and UNSW-NB 2015 datasets. However, unlike other research works which used the benchmark UNSW-NB 15 training and UNSW-NB 15 test datasets, this study used the full UNSW-NB 15 data (2,540,044 instances) and made a random split for training and testing. Moreover, they used the oversampling technique SMOTE to synthetically produce more instances for the smaller class. They analyzed the effect of feature selection on the training sample and feature set size. The performance of the models was evaluated based on Accuracy, Precision, Recall, and FAR (False Alarm Rate) score.

The oversampling method as well as the selection of an optimum training size helped to achieve an accuracy score of 99% for both datasets.

Belgrana et al. (2021) proposed two approaches for NIDS. Firstly, they applied Condensed Nearest Neighbors (CNN) to reduce the dimensionality of the features as well as the computational time. Secondly, the authors implemented a Radial Basis Function (RBF) neural network to realize the performance, training on the NSL-KDD dataset. Then, they compared the performance of KNN, C4.5, and IBK (a version of KNN in Weka) with the CNN and RBF models, with all features of NSL-KDD and with reduced features, in terms of True Positive Rate (TPR), False Alarm Rate (FAR) and Missed Attacks Rate (MAR). Their proposed RBF and CNN obtained accuracy (success rate) scores of 94.28% and 95.54% respectively. However, they found CNN faster than RBF, though the MAR in CNN was lower than in RBF.

A hybrid approach was proposed by Lee et al. (2020), where the authors proposed a deep sparse autoencoder for feature selection in the data pre-processing step. The autoencoder is widely used in image processing to compress an image without decreasing the number of features, so this research used an autoencoder to compress the features of the training and test datasets without any loss. They compared the performance of a single RF and their proposed Deep Sparse Autoencoder Random Forest (DSAE-RF) to realize the effect of the proposed feature compression technique. The authors evaluated their approach in terms of Accuracy, Precision, Recall, and F1 score by training the DSAE-RF on the CICIDS2017 dataset. The accuracy score for DSAE-RF was 99.83%, which was slightly lower than the single RF (99.86%), but the precision and recall scores were comparatively higher than for the single RF. In addition, they analyzed the performance of the model according to the hidden layer's structure. They claimed that their proposed DSAE-RF model requires less training and testing time than the single RF.

Based on Neural Network approaches, an experimental review for Network Intrusion Management was provided by Mauro et al. (2020). The authors offered a full view of some promising neural networks which are very relevant for applying to ML-based IDS. For their experimental analysis, they developed a Convolutional Neural Network, MLP, Recurrent Neural Network (RNN), Wilkes Stonham and Aleksander Recognition Device (WiSARD), Learning Vector Quantization (LVQ), and Self-Organizing Maps (SOM) using the CIC-IDS-2017/2018 dataset released by the Canadian Institute for Cybersecurity. As per their experimental analysis, MLP is much slower than the other neural networks due to the backpropagation property, though it performed well in terms of detection performance. LVQ performed inferior to the other networks, whereas WiSARD showed the best trade-off between performance and time complexity.

Gu and Lu (2021) proposed an effective approach for IDS using SVM together with the Naïve Bayes algorithm to classify intrusion and normal instances. In that work, the Naïve Bayes algorithm was used to transform the original features into new data. After that, the newly transformed data was used to train the SVM model for the classification task. They applied this approach on the UNSW-NB 15, CICIDS2017, NSL-KDD, and Kyoto 2006+ datasets to evaluate the performance. Their experimental results showed that they achieved good performances with 98.92% accuracy for the CICIDS2017 dataset, 99.35% accuracy for the NSL-KDD dataset, 93.75% accuracy on the UNSW-NB 15 dataset, and 98.58% accuracy applying the Kyoto 2006+ dataset.

Moustafa (2021) created a new dataset called the Network TON_IoT dataset and used a Wrapper Feature Selection technique based on RF to select the important features. They developed GBM, RF, NB, and DNN models to evaluate the performance. After feature selection, the GBM, RF and DNN achieved accuracy scores of 93.83%, 99.98%, and 99.92% respectively, and for NB they obtained an AUC score of 91.28%.

A summary of the related works that used ML techniques for IDS has been presented in Table 1.

Table 1 Related works on IDS using ML models

Proposed by | Used dataset | Feature selection | Algorithm | Accuracy
Khan et al. (2018) | UNSW-NB 15 dataset | Feature importance (RF) | XGBoost, RF, Bagging, KNN, DT | 71.43%, 74.87%, 74.64%, 74.22%, 71.10%
Tama and Rhee (2019) | NSL-KDD, UNSW-NB 15 and GPRS | Complete feature set | GBM | 91.31% (UNSW-NB), 91.82% (KDDTest+), 86.51% (KDDTest-21)
Jing and Chen (2019) | UNSW-NB 15 | All features | SVM | 85.99% (binary classification), 75.77% (multiclass classification)
Kasongo and Sun (2020) | UNSW-NB 15 | XGBoost algorithm | SVM, Logistic Regression, KNN, DT, ANN | 60.89%, 77.64%, 84.46%, 90.85%, 84.39%
Ingre and Yadav (2015) | NSL-KDD | Filter-based | DT | 83.66% (multiclass classification), 90.30% (binary classification)
Osanaiye et al. (2016) | NSL-KDD | Chi-Square, information gain, gain ratio, and ReliefF algorithm | DT | 99.67%
Alazzam et al. (2020) | UNSW-NB 15, NSL-KDD, KDDCup99 | Sigmoid PIO; Cosine PIO | DT | Sigmoid PIO: 91.3% (UNSW-NB 15), 86.9% (NSL-KDD), 94.7% (KDDCup99); Cosine PIO: 91.7% (UNSW-NB 15), 88.3% (NSL-KDD), 96% (KDDCup99)
Aboueata et al. (2019) | UNSW-NB 15 | Univariate and principal component analysis (PCA) | ANN, SVM | 91%, 92%
Meftah et al. (2019) | UNSW-NB 15 | Random forest | GBM, LR, SVM | 61.83%, 77.21%, 82.11%
Injadat et al. (2020) | CICIDS 2017, UNSW-NB 15 | Correlation based (CBFS) and information gain (IGBFS) | KNN, RF | 99% for both datasets
Belgrana et al. (2021) | NSL-KDD | Condensed nearest neighbors (CNN) | Radial basis function (RBF), CNN | 94.28%, 95.54%
Lee et al. (2020) | CICIDS2017 dataset | Autoencoder | Deep sparse autoencoder random forest (DSAE-RF) | 99.83%
Gu and Lu (2021) | UNSW-NB 15, CICIDS2017, NSL-KDD, and Kyoto 2006+ | Feature transformation with Naïve Bayes | SVM | 98.92% (CICIDS2017), 99.35% (NSL-KDD), 93.75% (UNSW-NB 15), 98.58% (Kyoto 2006+)
Moustafa (2021) | Network TON_IoT | Wrapper feature selection technique-based RF | GBM, RF, DNN | 93.83%, 99.98%, 99.92%

Datasets
Existing datasets
Several datasets are available to evaluate the security systems of cybersecurity. However, many datasets do not contain modern cyber-attack patterns for intrusion detection systems, such as NSL-KDD. In this segment, the most recent and widely used datasets for ML-based IDS are discussed as follows:

1. CAIDA datasets CAIDA datasets (Hick et al. 2007) are a set of enormous different data sources for verifying IDS performance with a very small number of attack vectors (e.g., DDoS). The dataset is available as CAIDA 2007, which contains anonymized network data for DDoS attacks without payload. However, the datasets neither possess the audit traces of telemetry data of IoT sensors and operating systems, nor have a ground truth of the security events.
2. The Kyoto dataset It was built at Kyoto University by collecting the network traffic from a honeypot environment. The Kyoto 2006+ (Song et al. 2011) dataset was generated by using the Zeek tool, which extracted 24 features from the original KDD99 dataset. However, this dataset did not have a testbed configuration with an IoT system and network components.

3. ISCX dataset This dataset (Shiravi et al. 2012) was created by capturing normal traffic and a synthetic attack simulator, which was carried out in a real-time simulation environment for more than 1 week. However, there was no ground truth of the intrusions, and the proofing concept requires enormous computational resources, making it unsuitable to be applied in real time.
4. CICIDS 2017 The dataset (Sharafaldin et al. 2018) was created at the Canadian Institute for Cybersecurity (CIC), and it involves several normal and attack (hack) scenarios employing a data profiling concept, similar to the ISCX dataset. The network traffic of this dataset was scrutinized by the CICFlowMeter. The flow identifiers and the time-stamp were utilized by the CICFlowMeter with the labeled data to investigate the network traffic. However, CICIDS 2017 also has the same shortcomings as the ISCX dataset.

5. The UNSW-NB 15 dataset The dataset (Moustafa and Slay 2015) was designed at the University of New South Wales, Canberra, especially to evaluate Network Intrusion Detection Systems. It used the IXIA traffic generator, which captured 2,540,044 observations. Out of the whole data, a portion is available online as two separate .csv files called UNSW_NB15_training-set and UNSW_NB15_test-set. However, this dataset does not contain any security events against the operating system and IoT.
6. TON_IoT datasets It was also created at the University of New South Wales, Canberra, deploying a novel testbed architecture, by Moustafa (2021). The data was collected and labeled by executing real-world normal and intrusion scenarios. The final dataset was named TON_IoT as the data was gathered from different telemetry datasets of IoT services, a dataset of network traffic, and Windows and Linux-based datasets.

Discussion about experimental datasets

UNSW-NB 15 dataset
For our experimental process of the Intrusion Detection System (IDS), the first dataset we used for the offline analysis was UNSW-NB 15 (Moustafa and Slay 2015). The UNSW-NB 15 dataset is comparatively newer than the NSL-KDD 99 or KDD 99, CAIDA, Kyoto 2006+, and ISCX datasets. It contains modern network traffic for both normal and anomalous instances, including present-day low footprint attacks. It is available in a clean format, and there is no redundancy in the data, thus making it more suitable for reliable evaluation of Network Intrusion Detection Systems. The total number of data instances is 2,540,044, which are kept in four .csv files. From these records, a partition of 175,341 instances is considered as the training set and 82,332 data instances as the test set. In the UNSW-NB 15 training set, a significant portion (68.1%) of the data instances are attack types and just less than one-third (31.9%) of the data instances are normal transactions, as shown in Fig. 2. Similarly, in the test set, 55.1% of the total data is attack traffic, and 44.9% of the instances contain normal transactions, as shown in Fig. 3.

Fig. 2 Transaction class distribution of UNSW-NB 15 Train Dataset
Fig. 3 Transaction class distribution of UNSW-NB 15 Test Dataset

Attacks are represented by '1', whereas normal traffic is indicated by '0' in the class label of the dataset for binary classification. The dataset does not have any duplicate records that could make the classifiers biased. The attack types in the UNSW-NB 15 train and test datasets are similar. Figure 4 has illustrated the distribution of attack categories in both datasets.

Fig. 4 Distribution of categories in UNSW-NB 15 Train and Test dataset

In its clean format, the dataset contains 44 features, where 'attack_cat' and 'label' are the output variables (labeled features). The data instances can be categorized into float, categorical and binary formats, as shown in Table 2. Both the train and test sets contain nine different types of attacks (Moustafa and Slay 2016), which can be described as the following points:

1. Fuzzers It is a malicious activity by the perpetrators in which they explore the security vulnerabilities in applications, programs, operating systems, or networks by flooding them with enormous random data with the aim of crashing them.
2. Analysis It involves a wide range of intrusions that penetrate the web application via emails (using spam), web scripts (through HTML files), and ports (via port scan techniques).

3. Backdoor This technique is used to bypass the normal authentication process and let unauthorized users (attackers) access a computer or device, which helps them to execute commands remotely.
4. DoS (Denial of Service) In DoS, the attacker attempts to make the computer resources unavailable to authorized users, temporarily or indefinitely, by shutting down the services provided by the target system or network.
5. Exploit It is a series of instructions given by the perpetrators, which takes advantage of any vulnerability, bug, or glitch that exists in the network or system.
6. Generic It is a malicious activity in which the attackers do not care about the cryptographic implementation of any primitives. It works successfully against all block ciphers, using the hash function to cause a collision, no matter what configurations the block ciphers have.
7. Reconnaissance/Probe This attack collects information about the computer network in order to dodge the security controls.
8. Shellcode For the purpose of taking control over the compromised device or machine, attackers write the code and inject it into the application that activates the command shell.
9. Worms It is a self-replicating code that exploits system vulnerabilities or uses social engineering techniques to gain access to the system. It reduces the availability of the system by consuming memory and network bandwidth.

Network TON_IoT dataset
The TON_IoT network dataset is the most recent dataset, created by Moustafa (2021). This dataset was collected by the Intelligent Security Group of the Cyber Range and IoT Labs of UNSW, and it contains nine very recent types of attack traffic. The full data record contains 22,339,021 instances, from which a portion of 461,043 instances is considered as the 'Train_Test_Network_dataset' for evaluating new Artificial Intelligence-based cybersecurity solutions. The dataset was published for application to different machine learning models and to handle the challenges of class imbalance, which made it suitable for use in this research. The dataset has 44 features, excluding the target variables 'label' and 'type'. As the author suggested in Moustafa (2021), four features, source IP, destination IP, source port and destination port, were removed from the dataset before applying the feature selection technique in the proposed approach, and from this dataset, 80% of the data (368,834 instances) was selected for training the models and the other 20% (92,209 instances) was used for testing the models. As shown in Table 3, the features are categorized into four types of format: String, Number, Boolean, and Time, which belong to four service groups, such as connection, statistics, user attributes (i.e. HTTP, SSL activities and DNS), and violation attributes. Unlike UNSW-NB 15, where the number of attack traffic instances is higher than the normal instances, the TON_Train_Test_Network dataset possesses 65.1% normal instances and 34.9% attack traffic, as shown in Fig. 5.

Table 2 List of features in the UNSW-NB 15 dataset

No. | Format | Feature
1 | Float | dur
2 | Categorical | proto
3 | Categorical | service
4 | Categorical | state
5 | Integer | spkts
6 | Integer | dpkts
7 | Integer | sbytes
8 | Integer | dbytes
9 | Float | rate
10 | Integer | sttl
11 | Integer | dttl
12 | Float | sload
13 | Float | dload
14 | Integer | sloss
15 | Integer | dloss
16 | Float | sinpkt
17 | Float | dinpkt
18 | Float | sjit
19 | Float | djit
20 | Integer | swin
21 | Integer | stcpb
22 | Integer | dtcpb
23 | Integer | dwin
24 | Float | tcprtt
25 | Float | synack
26 | Float | ackdat
27 | Integer | smean
28 | Integer | dmean
29 | Integer | trans_depth
30 | Integer | response_body_len
31 | Integer | ct_srv_src
32 | Integer | ct_state_ttl
33 | Integer | ct_dst_ltm
34 | Integer | ct_src_dport_ltm
35 | Integer | ct_dst_sport_ltm
36 | Integer | ct_dst_src_ltm
37 | Binary | is_ftp_login
38 | Integer | ct_ftp_cmd
39 | Integer | ct_flw_http_mthd
40 | Integer | ct_src_ltm
41 | Integer | ct_srv_dst
42 | Binary | is_sm_ips_ports
43 | Categorical | attack_cat
44 | Binary | label

Table 3 List of features in the TON_IoT Train_Test Network dataset

No. | Feature | Format
1 | dns_AA | Boolean
2 | dns_RD | Boolean
3 | dns_RA | Boolean
4 | dns_rejected | Boolean
5 | ssl_resumed | Boolean
6 | ssl_established | Boolean
7 | weird_notice | Boolean
8 | src_port | Number
9 | dst_port | Number
10 | duration | Number
11 | src_bytes | Number
12 | dst_bytes | Number
13 | missed_bytes | Number
14 | src_pkts | Number
15 | src_ip_bytes | Number
16 | dst_pkts | Number
17 | dst_ip_bytes | Number
18 | dns_qclass | Number
19 | dns_qtype | Number
20 | dns_rcode | Number
21 | http_trans_depth | Number
22 | http_request_body_len | Number
23 | http_status_code | Number
24 | http_response_body_len | Number
25 | http_user_agent | Number
26 | label | Number
27 | src_ip | String
28 | dst_ip | String
29 | proto | String
30 | service | String
31 | conn_state | String
32 | dns_query | String
33 | ssl_version | String
34 | ssl_cipher | String
35 | ssl_subject | String
36 | ssl_issuer | String
37 | http_method | String
38 | http_uri | String
39 | http_referrer | String
40 | http_version | String
41 | http_orig_mime_types | String
42 | http_resp_mime_types | String
43 | weird_name | String
44 | weird_addl | String
45 | type | String
46 | ts | Time

The distribution of attack categories, as shown in Fig. 6, illustrates that, except for Man-in-the-Middle (MITM), the other attacks have an equal distribution of records. The MITM attack has the lowest, and normal traffic has the highest, number of records in the dataset.

Fig. 5 Class distribution in TON_Train_Test Network data
Fig. 6 Distribution of categories in TON_Train_Test Network data

However, the nine different attacks that exist in the dataset can be summarized as the following points:

1. Scanning attack The goal of a scanning attack is to gather information about the target system. Attackers try to find out the active IP addresses and the open ports of the vulnerable target system using Nmap, Nessus, or any other scanning tool.
2. Denial of Service (DoS) A DoS attack is one of the most frequently occurring invasions in the network; it induces fake flooding of the target network to make services unavailable to its users.
3. Distributed Denial of Service (DDoS) In a DDoS attack, multiple DoS attacks are launched together to corrupt the system. Usually, an attacker infects many vulnerable systems with malware and turns each system into a bot or zombie. Then, taking control over the botnet, it attacks the target from all of the compromised bot systems.
4. Ransomware attack It is a complicated malware that infects any system or service by encrypting it and makes the authorized user unable to access it until they pay the attacker ransom money.
5. Backdoor attack It bypasses the normal authentication process and gains high-level user access to the vulnerable system for stealing the user's personal data and financial information, installing other malware, or hijacking the device.
6. Injection attack This attack inserts or injects malicious input or forged data from the client into a web application, and forces it to execute commands that change the operation.
7. Cross-site Scripting (XSS) attack It is a type of vulnerability where the attackers inject client-side scripts into web pages that are accessed by other users, and employ the web pages to transmit the malicious code to the other end user's system in the form of a browser-side script.
8. Password cracking attack Password cracking refers to any hacking technique that continuously tries to discover the correct password by employing brute force or dictionary attacks.
9. Man-In-The-Middle (MITM) attack MITM is a kind of eavesdropping attack, where the attackers put themselves between two parties (e.g. users and applications), and impersonate one of the parties to steal personal information (e.g. login credentials, credit card information, etc.) from them.

Proposed methodology
The main motivation of this study was to apply a suitable feature selection technique for imbalanced data and to compare the performance of the ML models described in the "Machine learning methods under scrutiny" section by training them over the datasets summarized in the "Discussion about experimental datasets" section. The diagram of the proposed approach has been illustrated in Fig. 7.

Fig. 7 Proposed ML-based IDS framework
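As a reading aid, the framework in Fig. 7 can be summarized in a short Python sketch. This is not the authors' code: the helper names encode_categoricals and giwrf_select_features are hypothetical placeholders for the preprocessing and GIWRF steps described in the following subsections, and the Decision Tree shown at the end stands in for any of the six classifiers.

```python
# Minimal outline of the proposed ML-based IDS workflow sketched in Fig. 7.
# Assumptions: pandas DataFrames with a binary 'label' column; the helper
# functions referenced here are hypothetical stand-ins for the steps
# described in the following subsections.
from sklearn.preprocessing import MinMaxScaler
from sklearn.tree import DecisionTreeClassifier
from sklearn.metrics import f1_score

def run_pipeline(train_df, test_df, threshold):
    X_train, y_train = train_df.drop(columns=["label"]), train_df["label"]
    X_test, y_test = test_df.drop(columns=["label"]), test_df["label"]

    # 1) Data preprocessing: categorical encoding (see the next subsection)
    X_train, X_test = encode_categoricals(X_train, X_test, y_train)

    # 2) Feature selection with GIWRF (see the dedicated subsection below)
    selected, _ = giwrf_select_features(X_train, y_train, threshold)
    X_train, X_test = X_train[selected], X_test[selected]

    # 3) Feature scaling (Min-Max normalization, Eq. 2)
    scaler = MinMaxScaler()
    X_train = scaler.fit_transform(X_train)
    X_test = scaler.transform(X_test)

    # 4) Train and evaluate one of the six classifiers (DT shown here)
    model = DecisionTreeClassifier(criterion="entropy", class_weight="balanced")
    model.fit(X_train, y_train)
    return f1_score(y_test, model.predict(X_test))
```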

Data preprocessing
The first step of the experiment was data preprocessing or data engineering. Feeding the models raw data may sometimes give a misleading prediction. As the datasets do not have any null or duplicate values, the data preprocessing step included only Categorical Feature Encoding, Feature Selection, and Feature Scaling.

Coding for categorical features
The UNSW-NB 15 dataset contains three categorical features: State, Proto, and Service, and the Network TON_IoT dataset has nineteen string features, as shown in Table 3. To encode the categorical features of UNSW-NB 15, Response Coding was applied, and to transform the string features of the Network TON_IoT dataset, Label Encoding was used. In Label Encoding, each categorical feature is given a unique integer value based on alphabetical order (Sethi 2020). On the other hand, Response Coding is an encoding technique which represents the probability of a data instance belonging to a specific class given a category. For N-class classification, N new features are generated for each category that insert the probability (P) of the data instance belonging to each class, based upon the value of the categorical data (Dharmik 2019). A mathematical expression can be presented as:

P(Y | A) = P(A ∩ Y) / P(A)    (1)

where Y denotes a class and the category of a feature is represented as A. Response Coding requires considerable memory space, which made it unsuitable for transforming the string features of the Network TON_IoT dataset, as that dataset contains several categorical features.
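A minimal sketch of the two encodings follows, assuming pandas Series for the raw columns; the fallback to the class prior for categories unseen in training is an added assumption rather than something stated in the paper.

```python
# Illustrative sketches of Label Encoding and Response Coding (Eq. 1);
# these are not the authors' implementation.
import pandas as pd
from sklearn.preprocessing import LabelEncoder

def label_encode(train_col: pd.Series, test_col: pd.Series):
    # Each distinct category string is mapped to one integer (alphabetical order).
    le = LabelEncoder()
    le.fit(pd.concat([train_col, test_col]).astype(str))
    return le.transform(train_col.astype(str)), le.transform(test_col.astype(str))

def response_encode(train_col: pd.Series, y_train: pd.Series, test_col: pd.Series):
    # Response coding: replace category A with P(attack | A) estimated on the
    # training split; for binary labels, P(normal | A) is simply 1 - P(attack | A).
    prior = y_train.mean()
    prob_attack = (
        pd.DataFrame({"cat": train_col.astype(str), "y": y_train})
        .groupby("cat")["y"]
        .mean()
    )
    enc_train = train_col.astype(str).map(prob_attack)
    enc_test = test_col.astype(str).map(prob_attack).fillna(prior)  # unseen categories
    return enc_train.to_numpy(), enc_test.to_numpy()
```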

Feature scaling
The datasets contain some features which have highly fluctuating magnitudes, ranges, and units. Due to this highly varying characteristic of the datasets, some algorithms, such as the Multilayer Perceptron, predict wrongfully. Besides, they require high computational resources. So, feature scaling is a commonly used method that normalizes the range of the independent variables. Feature scaling is mandatory for some machine learning models which calculate the distance between data. Normalization and Standardization are two widely applied feature scaling methods in machine learning. In this experiment, Normalization or Min–Max scaling was used to normalize the data to the range between 0 and 1. The general formula of normalization can be shown as below:

f′ = (f − min(f)) / (max(f) − min(f))    (2)

where f is the original feature and f′ is the normalized feature. The minimum and maximum values of the feature are represented as min(f) and max(f) respectively.
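Eq. (2) corresponds to scikit-learn's MinMaxScaler; a toy sketch follows, where fitting on the training data and reusing the learned minima and maxima on the test data is the usual (assumed) practice.

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler

# Toy arrays standing in for the training/test feature matrices.
X_train = np.array([[2.0, 200.0], [4.0, 400.0], [6.0, 800.0]])
X_test = np.array([[3.0, 300.0]])

scaler = MinMaxScaler(feature_range=(0, 1))
X_train_scaled = scaler.fit_transform(X_train)  # learns min(f) and max(f) per feature
X_test_scaled = scaler.transform(X_test)        # reuses the training min/max, as Eq. (2) implies
print(X_test_scaled)                            # [[0.25, 0.16666667]]
```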
Gini Impurity-based Weighted Random Forest (GIWRF) for feature selection
Random Forest (Breiman 2001) is an ensemble classifier that is built on a number of decision trees and supports various feature importance measures. One of those is to derive the importance score by training the classifier. A traditional machine learning classification algorithm expects that all the classes in the training set carry similar importance, and the models are built without considering that there may exist an imbalanced class distribution in the training data (Krawczyk 2016). To understand the relevance of the features to the output of the imbalanced data, this feature selection technique employed a weight adjustment technique in RF once the classifier measured the Gini impurity, i(τ). The Gini impurity reveals how well a split divides the total samples of the binary classes in a specific node. Mathematically it can be written as:

i(τ) = 1 − p_p² − p_n²    (3)

where p_p is the fraction of positive samples and p_n is the fraction of negative samples out of the total number of samples (N) at node τ. The reduction in Gini impurity deriving from any optimal split, Δi_f(τ, M), is gathered together for all the nodes τ in the M weighted trees in the forest, individually for each of the features. Mathematically it can be written as:

I_g(f) = Σ_M Σ_τ w_(p,n) Δi_f(τ, M)    (4)

where I_g is the Gini importance, which specifies the frequency of a particular feature (f) being selected for a split and the significance of the feature's overall discriminative value for the binary classification task. Assigning the weight w_(p,n) defines the imbalanced class distribution in the learning algorithm. The weight adjustment can be written as:

weight for the positive class, w_p = n_n / N    (5)

weight for the negative class, w_n = n_p / N    (6)

where w_p + w_n = 1 and, for imbalanced class data, w_p ≠ w_n. The number of negative instances is represented as n_n and the number of positive instances is denoted as n_p. N is the total number of instances in the training dataset. The pseudo-code for selecting the features using the sklearn library has been explained in Algorithm 1. Considering the threshold as 0.02 for UNSW-NB 15 and 0.030 for Network TON_IoT, the maximum accuracy was observed, as shown in Fig. 8, and the selected features for both datasets have been listed in Table 4.

Fig. 8 Accuracy versus threshold value for UNSW-NB 15 and Network TON_IoT dataset

Table 4 Selected features of the UNSW-NB 15 and Network TON_IoT datasets

UNSW-NB 15 (20 features):
Feature name | Importance score
sttl | 0.123
ct_state_ttl | 0.099
sload | 0.059
rate | 0.058
dload | 0.057
dttl | 0.045
sbytes | 0.039
ct_srv_dst | 0.037
smean | 0.034
dmean | 0.032
dbytes | 0.028
ackdat | 0.027
ct_dst_src_ltm | 0.027
ct_srv_src | 0.026
dur | 0.026
tcprtt | 0.025
synack | 0.024
dinpkt | 0.023
sinpkt | 0.022
dpkts | 0.021

Network TON_IoT (10 features):
Feature name | Importance score
ts | 0.258
proto | 0.146
src_ip_bytes | 0.137
src_pkts | 0.067
dst_ip_bytes | 0.062
dst_pkts | 0.048
conn_state | 0.047
dst_bytes | 0.034
src_bytes | 0.032
duration | 0.030

Machine learning methods under scrutiny
In this section, a summary of the ML techniques that were applied in the experimental analysis has been discussed. The methods were selected in such a way that a comparison between a traditional ML technique (DT), ensemble methods (AdaBoost, GBT), an Artificial Neural Network (MLP), and Deep Neural Networks (LSTM, GRU) would be performed to detect intrusions over the two imbalanced datasets.

Decision tree
Decision Tree (Quinlan 1986) is a non-parametric supervised machine learning model which is implemented for both classification and regression tasks. It repeatedly splits the data following a specific attribute. A Decision Tree learns the decision rules that are deduced from the data attributes. Based on those rules, it predicts the value of the target variable. The decision-making process of this model can be shaped like a tree, which makes it easier for the user to interpret. Numerous ML tools are available to visualize the output of DT (Safavian and Landgrebe 1991). Two units, decision nodes and leaves, are the fundamental concepts of DT. In a decision node, data is split based on a particular parameter, and in the leaf units the outcomes or decisions are obtained. The splitting criterion in this experimental study was entropy (Shannon 1948), which measures the impurity of the split. For every single internal decision node of the Decision Tree, the entropy equation can be given by the following formula:

Entropy, E(j) = − Σ_(k=1..j) P_k log2(P_k)    (7)

where j is the number of unique classes in the dataset and P_k is the probability of each particular class. For binary classification the entropy equation can be written as:

Entropy, E = − P_p log2(P_p) − P_n log2(P_n)    (8)

where P_p is the probability of the positive event and P_n is the probability of the negative event.

Adaptive boosting (AdaBoost)
The AdaBoost algorithm was first introduced by Schapire (2003). It is an ensemble learning method which utilizes an iterative approach to correct the mistakes of weak learners. It calls a given base learning algorithm or weak learner repeatedly in a series of rounds to boost the model's performance. The fundamental concept of AdaBoost is to reassign the weights to each instance, giving higher weights to the misclassified instances. In brief, while training the AdaBoost model, it first trains the base classifier (such as DT) and utilizes that classifier to predict over the training set. Then, increasing the weight of the incorrectly classified training instances, it trains the second classifier and, using the newly updated weights, again makes a prediction on the training set. Then it again updates the weights of the instances, and so on. This process is continued until it reaches the very last base learner.

Gradient boosting tree (GBT)
Gradient Boosting Tree (Mason et al. 1999) is a widely used machine learning model that identifies the shortcomings of weak learners and overcomes their limitations by boosting gradient descent with each weak learner in the loss function. The loss function is calculated from the difference between the true value and the estimated value. It transforms the weak learners into stronger ones by adding the predictors to the ensemble, sequentially as well as gradually. Each predictor in the ensemble corrects the mistakes made by its predecessor. GBT is an effective technique to reduce noise, variance, as well as bias (Felix and Sasipraba 2019).

Multilayer perceptron (MLP)
MLP (Rosenblatt 1961) is a feed-forward Artificial Neural Network (ANN) that passes the information from input to output in a forward direction. As the name suggests, a Multilayer Perceptron contains three layers of nodes: the Input Layer, Hidden Layer, and Output Layer. In MLP, each node except the input unit is a neuron, and these neurons utilize a nonlinear activation function to transform the weighted sum of the input into an output. The input unit receives the input signal, and the desired task of regression or classification is conducted in the output layer (Abirami and Chitra 2020). A mathematical expression of MLP can be written as the following equation:

y = ∅( Σ_(i=1..k) w_i x_i + c )    (9)

where each neuron estimates the weighted sum of the inputs k, then adds a bias c, and after that uses an activation function (∅) to produce the output y. MLP uses back-propagation to train the neurons.

Long Short-Term Memory (LSTM)
LSTM (Hochreiter and Schmidhuber 1997) is a type of artificial Recurrent Neural Network (RNN) that is adapted to store information for a longer period. It was developed to handle the vanishing gradient problem, which can be experienced by RNN during the training phase.

An LSTM unit is a composition of a memory cell and three gates, the input, output and forget gates, which control the flow of information towards the memory cell or out of the cell. The input gate takes the decision of updating the cell state; the forget gate determines what information should be kept or discarded, based on estimating a coefficient computed from the input data and the earlier hidden state. Based on the previous hidden state and the input data, the output gate highlights which information should be conveyed to the next hidden state. Mathematically, if o_j(t) is the output gate and s_j(t) denotes the cell state of the jth LSTM unit, the equation for the hidden state h_j(t) at any time t can be represented as follows:

h_j(t) = o_j(t) · tanh(s_j(t))    (10)

where tanh is the activation function.

Gated Recurrent Unit (GRU)
The Gated Recurrent Unit (Cho et al. 2014) is considered a variation or a lightweight version of LSTM, but it is simpler to implement than LSTM. It has two gates, the update gate and the reset gate, which have been introduced to solve the vanishing gradient problem of RNN. The update gate is similar to the input and forget gates used in the LSTM; it helps the model to decide what information should be passed to the next state. On the other hand, the reset gate determines how much past information should be forgotten. Mathematically, the hidden state of the jth GRU unit can be represented as a linear interpolation between the previous activation at time t and the candidate activation h̃_j(t + 1), where z_j(t + 1) is the update gate and makes a decision regarding the updating amount of the candidate activation:

h_j(t + 1) = (1 − z_j(t + 1)) h_j(t) + z_j(t + 1) h̃_j(t + 1)    (11)
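The paper does not state which deep learning library was used for the LSTM and GRU models. The sketch below assumes a Keras/TensorFlow implementation and feeds each scaled flow record as a single-time-step sequence; the layer size, batch size, and epoch count are illustrative assumptions, not the authors' settings.

```python
# A hedged Keras-style sketch of a binary LSTM/GRU intrusion detector
# for tabular flow records.
import tensorflow as tf

def build_recurrent_ids(n_features: int, cell: str = "lstm") -> tf.keras.Model:
    recurrent = tf.keras.layers.LSTM if cell == "lstm" else tf.keras.layers.GRU
    model = tf.keras.Sequential([
        tf.keras.layers.Input(shape=(1, n_features)),    # one time step per record
        recurrent(64),                                    # recurrent hidden state
        tf.keras.layers.Dense(1, activation="sigmoid"),   # P(attack)
    ])
    model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])
    return model

# X_train_scaled: (n_samples, n_features) after Min-Max scaling
# X_seq = X_train_scaled.reshape(-1, 1, X_train_scaled.shape[1])
# model = build_recurrent_ids(X_train_scaled.shape[1], cell="gru")
# model.fit(X_seq, y_train, epochs=10, batch_size=256, validation_split=0.1)
```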
Evaluation metrics
The goal of this experiment was to increase the number of correct predictions in the test set for binary classification. To evaluate the performance of ML-based IDS, several performance metrics are used in Machine Learning, such as Accuracy, False Positive Rate, Precision, Recall, etc. (Kumar 2014). However, the Accuracy (AC) score is the most commonly used metric to evaluate the performance of models in binary classification problems. It can be defined by the equation below:

AC = (TP + TN) / (TP + TN + FP + FN)    (12)

where TP stands for True Positive and TN represents the count of True Negatives. True Positive is the number of correctly detected attack instances. True Negative counts the correctly classified normal instances. On the other hand, False Positive (denoted as FP) is the number of legitimate traffic instances that are misclassified as attacks. False Negative (denoted by FN) counts the number of attack instances that are wrongfully considered normal instances. In the case of detecting intrusions, fewer FNs are always expected, as an FN is more hazardous than an FP. However, for imbalanced class data, and especially where the aim is to efficiently detect the intrusions (positive instances), four other metrics are also considered in addition to the Accuracy measure: False Positive Rate (FPR), Precision, Recall, and F1 score.

1. False Positive Rate (FPR) The FPR is measured as the ratio of the negative events that are misclassified as positive (FP) to the total number of truly negative events. The expression can be given as below:

   FPR = FP / (FP + TN)    (13)

2. Precision Precision is the measure that evaluates a model's performance by calculating how often the model's prediction is correct when it positively predicts an instance. Mathematically, it can be expressed as the equation below:

   Precision = TP / (TP + FP)    (14)

3. Recall / Detection Rate (DR) It is the measure of the Machine Learning model correctly detecting True Positive instances. Also, Recall measures how accurate the model is at identifying relevant data. This is why it is referred to as Sensitivity or True Positive Rate. The mathematical expression can be shown as below:

   Recall/DR = TP / (TP + FN)    (15)

4. F1 Score It is the trade-off between Recall and Precision that takes both FP and FN into account and measures the overall accuracy of the ML model. The F1 score can be expressed as the equation below:

   F1 score = (2 × Recall × Precision) / (Recall + Precision)    (16)
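Eqs. (12)-(16) can be computed directly from the confusion matrix; a brief scikit-learn sketch (not tied to the authors' code) follows.

```python
# Computing Eqs. (12)-(16) from a confusion matrix with scikit-learn.
from sklearn.metrics import confusion_matrix

def ids_metrics(y_true, y_pred):
    # labels=[0, 1]: 0 = normal (negative), 1 = attack (positive)
    tn, fp, fn, tp = confusion_matrix(y_true, y_pred, labels=[0, 1]).ravel()
    accuracy = (tp + tn) / (tp + tn + fp + fn)          # Eq. (12)
    fpr = fp / (fp + tn)                                # Eq. (13)
    precision = tp / (tp + fp)                          # Eq. (14)
    recall = tp / (tp + fn)                             # Eq. (15)
    f1 = 2 * recall * precision / (recall + precision)  # Eq. (16)
    return {"accuracy": accuracy, "fpr": fpr, "precision": precision,
            "recall": recall, "f1": f1}

# Example: ids_metrics([0, 1, 1, 0, 1], [0, 1, 0, 0, 1])
# -> accuracy 0.8, fpr 0.0, precision 1.0, recall ~0.67, f1 0.8
```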

Table 5 Results of the models with all features (42) for the UNSW-NB 15 dataset

Models | AC. (%) | Precision (%) | Recall (%) | F1 score (%) | FPR (%)
DT | 90.15 | 87.45 | 95.85 | 91.46 | 16.84
AdaBoost | 90.51 | 87.07 | 97.19 | 91.85 | 17.67
GBT | 87.56 | 82.49 | 98.25 | 89.68 | 25.54
MLP | 84.11 | 78.34 | 98.31 | 87.20 | 33.28
LSTM | 87.90 | 85.01 | 94.71 | 89.60 | 20.44
GRU | 82.87 | 76.78 | 98.75 | 86.39 | 36.57
The best result per column has been written in boldface to aid understanding.

Table 6 Results of the models with reduced features (20) for the UNSW-NB 15 dataset

Models | AC. (%) | Precision (%) | Recall (%) | F1 score (%) | FPR (%)
DT | 93.01 | 92.69 | 94.76 | 93.72 | 09.14
AdaBoost | 90.51 | 87.59 | 96.43 | 91.80 | 16.73
GBT | 87.08 | 82.25 | 97.58 | 89.26 | 25.78
MLP | 87.26 | 82.02 | 98.44 | 89.48 | 26.43
LSTM | 88.99 | 87.59 | 93.21 | 90.31 | 16.17
GRU | 90.11 | 86.73 | 96.84 | 91.51 | 18.14
The best result per column has been written in boldface to aid understanding.

Table 7 Results of the models with all features (39) for the Network TON_IoT dataset

Models | AC. (%) | Precision (%) | Recall (%) | F1 score (%) | FPR (%)
DT | 99.50 | 99.83 | 98.74 | 99.28 | 0.09
AdaBoost | 99.88 | 99.99 | 99.67 | 99.83 | 0.001
GBT | 99.98 | 99.98 | 99.95 | 99.97 | 0.006
MLP | 98.35 | 97.60 | 97.68 | 97.64 | 1.2
LSTM | 94.51 | 91.28 | 93.18 | 92.22 | 4.7
GRU | 95.69 | 91.23 | 96.99 | 94.02 | 5.0
The best result per column has been written in boldface to aid understanding.

Table 8 Results of the models with selected features (10) for the Network TON_IoT dataset

Models | AC. (%) | Precision (%) | Recall (%) | F1 score (%) | FPR (%)
DT | 99.90 | 99.84 | 99.87 | 99.85 | 0.08
AdaBoost | 99.98 | 99.98 | 99.96 | 99.97 | 0.006
GBT | 99.98 | 99.98 | 99.97 | 99.97 | 0.008
MLP | 97.13 | 93.64 | 98.49 | 96.00 | 3.5
LSTM | 88.99 | 76.44 | 99.00 | 86.27 | 16.37
GRU | 95.02 | 90.95 | 95.24 | 93.04 | 5.08
The best result per column has been written in boldface to aid understanding.

Experiments and results
The experiment was conducted on an HP Pavilion 14-AL143TX loaded with the Windows 10 Operating System with the following processor: Intel(R) Core(TM) i5-7200U CPU @ 2.5–3.1 GHz. The building, training, and evaluation of the Machine Learning models were performed with Pandas, NumPy, Scikit-Learn (sklearn), and other Machine Learning libraries in the Python environment of Jupyter Notebook, an open-source tool.

Considering six ML models: Decision Tree (DT), Adaptive Boosting (AdaBoost), Gradient Boosting (GBT), ANN (Multilayer Perceptron), Long-Short Term Memory (LSTM) and Gated Recurrent Unit (GRU), this experiment was carried out in two states. In the first state, all the features of the UNSW-NB 15 and Network TON_IoT datasets were utilized, and the performance of the ML models was evaluated for binary classification. In the second state, using the proposed feature selection method, only 20 features for UNSW-NB 15 and 10 features from the Network TON_IoT dataset were chosen and the binary classification was performed again with the six models. For training all the models, parameter tuning was one common approach that assisted this experiment to improve the accuracy and F1 score as well as decrease the FPR score. Table 5 (for UNSW-NB 15) and Table 7 (for Network TON_IoT) show the results obtained by the machine learning algorithms for binary classification using the whole feature set, and Table 6 (for UNSW-NB 15) and Table 8 (for Network TON_IoT) indicate the performance of the same models applying only the selected features. The best result per column has been written in boldface. A visual representation of the performance comparison for Network TON_IoT and UNSW-NB 15 is illustrated in Figs. 9 and 10 respectively in terms of Accuracy, F1 score and FPR.
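The two-state procedure described above can be sketched as a small loop that scores every model once on the full feature set and once on the GIWRF-selected columns. This is an illustration only, not the authors' code: the models dictionary, the data splits, and reporting only the F1 score are assumptions.

```python
# Illustrative sketch of the two-state evaluation protocol described in the text.
# `models` maps model names to untrained scikit-learn estimators; the splits are
# placeholder arrays; feature_idx is the list of GIWRF-selected column indices.
from sklearn.metrics import f1_score

def run_state(models, X_train, X_test, y_train, y_test, feature_idx=None):
    """Train and score every model, on all features or on a selected subset."""
    if feature_idx is not None:                    # second state: reduced features
        X_train, X_test = X_train[:, feature_idx], X_test[:, feature_idx]
    scores = {}
    for name, model in models.items():
        model.fit(X_train, y_train)
        scores[name] = f1_score(y_test, model.predict(X_test))
    return scores

# state 1: run_state(models, X_train, X_test, y_train, y_test)
# state 2: run_state(models, X_train, X_test, y_train, y_test, feature_idx=selected)
```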

Fig. 9 The performance comparison between ML models with full features and with reduced features for the Network TON_IoT dataset

For the Decision Tree (DT) classifier, entropy was selected as the criterion during parameter tuning. Entropy is the measure of information gain, based on which DT decides how to split the data. To weight the class labels, another important parameter, class_weight, was set to 'balanced'. Other tuned parameters of DT were set as follows: random_state = 10, max_depth = 11, max_leaf_nodes = 162, min_samples_leaf = 20, and min_impurity_decrease = 0.00006. For UNSW-NB 15, the result indicated that DT achieved a test accuracy score of 90.15% while applying the full feature space (42 features), whereas it achieved an accuracy of 93.01% using the 20 selected features. A 16.84% FPR score and 91.46% F1 score were achieved applying the full features, and a 09.14% FPR score and 93.72% F1 score were observed while using only the reduced features. For the Network TON_IoT dataset, the accuracy score for DT was 99.50%, the F1 score was 99.28% and the FPR score was 0.09% using the full features, and using only the selected features the accuracy, F1 and FPR scores became 99.90%, 99.85% and 0.08% respectively.
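A minimal sketch of a DecisionTreeClassifier configured with the hyperparameters reported above is given below; the synthetic data is only a stand-in for the preprocessed UNSW-NB 15 or Network TON_IoT splits and is not meant to reproduce the reported scores.

```python
# Sketch only: Decision Tree with the tuned parameters quoted in the text,
# fitted on synthetic stand-in data rather than the real datasets.
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=5000, n_features=20, weights=[0.35, 0.65],
                           random_state=10)        # stand-in for the real data
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=10)

dt = DecisionTreeClassifier(
    criterion="entropy",             # split on information gain
    class_weight="balanced",         # compensate for the skewed class distribution
    random_state=10,
    max_depth=11,
    max_leaf_nodes=162,
    min_samples_leaf=20,
    min_impurity_decrease=0.00006,
)
dt.fit(X_train, y_train)
y_pred = dt.predict(X_test)
```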
During hyperparameter tuning, DecisionTreeClassifier was selected as the base estimator of the AdaBoost classifier. In addition, the number of trees (n_estimators) chosen for the AdaBoost model was 3300 with a learning rate of 0.3. Other than that, the 'SAMME.R' algorithm was used as another important parameter, which achieves comparatively lower test error with fewer boosting iterations. The base estimator, Decision Tree, was also tuned with some important attributes as follows: criterion = 'gini', random_state = 10, class_weight = 'balanced', max_depth = 11, max_leaf_nodes = 162, min_samples_leaf = 20, and min_impurity_decrease = 0.00006. The criterion 'gini' is the measure of impurity that calculates the probability of incorrect classification of an observation; based on this criterion, DT decides how to split the data. For the binary classification task on UNSW-NB 15, the results for the AdaBoost classifier showed that the accuracy score was unchanged (90.51%) in both cases. However, the F1 score and FPR of detection were 91.85% and 17.67% respectively using the full features, and became 91.80% and 16.73% respectively applying the selected features. In the case of the Network TON_IoT dataset, the accuracy, F1 score and FPR score were 99.88%, 99.83%, and 0.001% respectively using the full features, whereas they became 99.98%, 99.97% and 0.006% using the selected features.
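The AdaBoost configuration above can be sketched as follows. This is an illustration rather than the authors' code: newer scikit-learn releases name the base-learner argument estimator (older releases used base_estimator), the 'SAMME.R' algorithm is deprecated in recent versions, and the training call is left as a comment because the prepared data splits are not shown here.

```python
# Sketch only: AdaBoost with a tuned Decision Tree as its base learner,
# using the parameter values quoted in the text.
from sklearn.ensemble import AdaBoostClassifier
from sklearn.tree import DecisionTreeClassifier

base_dt = DecisionTreeClassifier(
    criterion="gini",                # Gini impurity for the base trees
    class_weight="balanced",
    random_state=10,
    max_depth=11,
    max_leaf_nodes=162,
    min_samples_leaf=20,
    min_impurity_decrease=0.00006,
)

ada = AdaBoostClassifier(
    estimator=base_dt,               # use base_estimator=base_dt on scikit-learn < 1.2
    n_estimators=3300,
    learning_rate=0.3,
    algorithm="SAMME.R",             # deprecated in recent scikit-learn releases
)
# ada.fit(X_train, y_train); ada.predict(X_test)  with the prepared splits
```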

Fig. 10 The performance comparison between ML models with full features and with reduced features for the UNSW-NB 15 dataset

The Gradient Boosting (GBT) model used 'deviance' as the loss function. Multiple models were trained with different numbers of trees and different learning rates as follows: n_estimators = {2500, 2700, 3000, 3200, 3500} and learning_rate = {0.1, 0.08, 0.07, 0.05, 0.01}. There is always a tradeoff between n_estimators and learning_rate. The experimental result revealed that n_estimators set to 3200 with a learning rate of 0.05 provided the highest test accuracy score for GBT. Using the full features of UNSW-NB 15, the accuracy score for detecting intrusion was 87.56%, the F1 score was 89.68% and the FPR was 25.54%. In contrast, the accuracy score was 87.08%, the F1 score was 89.26% and the FPR score became 25.78% with only the selected features.

For the Network TON_IoT dataset, in both cases of whole features and selected features the accuracy and F1 score did not change, but the FPR was increased to 0.008% from 0.006% using the selected features.
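The small grid described above can be sketched as a double loop; this is illustrative only. The synthetic data is a stand-in for the real splits, random_state is an assumption, and the loss argument is omitted because scikit-learn's default log-loss corresponds to the 'deviance' loss named in the text.

```python
# Sketch only: exhaustive search over the n_estimators / learning_rate grid
# reported in the text, scored by test accuracy on stand-in data.
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.ensemble import GradientBoostingClassifier

X, y = make_classification(n_samples=1000, n_features=20, weights=[0.35, 0.65], random_state=10)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=10)

best = None
for n_estimators in (2500, 2700, 3000, 3200, 3500):
    for learning_rate in (0.1, 0.08, 0.07, 0.05, 0.01):
        gbt = GradientBoostingClassifier(n_estimators=n_estimators,
                                         learning_rate=learning_rate,
                                         random_state=10)
        acc = gbt.fit(X_train, y_train).score(X_test, y_test)
        if best is None or acc > best[0]:
            best = (acc, n_estimators, learning_rate)

print(best)   # the paper reports n_estimators=3200, learning_rate=0.05 as the best pair
```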
In the case of the Multilayer Perceptron (MLP), the 'Adam' solver was used in the experiments for weight optimization. 'Adam' is a stochastic gradient-based optimizer that works efficiently for large datasets, which is why it was selected instead of the Stochastic Gradient Descent (SGD) or Limited-memory Broyden–Fletcher–Goldfarb–Shanno (LBFGS) solver. The MLP was developed with only one hidden layer with the following number of neurons: hidden_layer_sizes = (15, 30, 60). The activation function, which decides whether a neuron will be triggered or not, was set to 'Relu' for the hidden layer in all of our experiments. The maximum number of iterations for the solver was set to 2000 and the learning rate was 'adaptive'. In addition, learning_rate_init was set to 0.002 to control the step size for updating the weights. For the binary classification task using the whole feature set of UNSW-NB 15, the accuracy score for detection was 84.11%, and it became 87.26% while using the subset of features selected through the feature selection method. In the case of the F1 score and FPR, the scores were 87.20% and 33.28% respectively when all of the features were used, and those became 89.48% and 20.86% respectively using only the selected features. For the Network TON_IoT dataset, the accuracy, F1 score and FPR were respectively 98.35%, 97.64% and 1.2% with the whole features, and became 97.13%, 96.00% and 3.5% with the selected features.
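The MLP settings above can be sketched as below. Two caveats: the text mentions a single hidden layer while the tuple (15, 30, 60) would configure three hidden layers in scikit-learn, so the sketch simply follows the tuple; and in scikit-learn the 'adaptive' schedule only takes effect with the SGD solver, so with 'adam' the initial learning rate of 0.002 is the operative setting. The random_state and the commented training call are assumptions.

```python
# Sketch only: MLPClassifier with the settings quoted in the text.
from sklearn.neural_network import MLPClassifier

mlp = MLPClassifier(
    hidden_layer_sizes=(15, 30, 60),   # as written in the text
    activation="relu",
    solver="adam",                     # stochastic gradient-based optimizer
    max_iter=2000,
    learning_rate="adaptive",          # only used by the 'sgd' solver in scikit-learn
    learning_rate_init=0.002,
    random_state=10,                   # illustrative; not stated in the text
)
# mlp.fit(X_train, y_train); mlp.predict(X_test)  with the prepared splits
```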
A stacked LSTM model with four hidden layers and one output layer was trained on both the UNSW-NB 15 and Network TON_IoT datasets. In the hidden layers, tanh was used as the activation function, whereas sigmoid was employed as the activation function in the output layer. Other selected parameters were optimizer = 'adam', loss = 'binary_crossentropy', epochs = 50, and batch_size = 64. The layers contained 300, 200, 100, and 80 LSTM units sequentially, with a dropout of 0.4. Using the full feature set of UNSW-NB 15, the accuracy score was 87.90%, the F1 score was 89.60% and the FPR score was 20.44%, and after using the selected features the scores became 88.99%, 90.31%, and 16.17% respectively. In the case of the Network TON_IoT dataset, the accuracy, F1 score and FPR were respectively 94.51%, 92.22% and 4.7% with the full features, and those became 88.99%, 86.27% and 16.37% with the reduced features.
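A minimal Keras sketch of the stacked LSTM described above is given below (the text does not name the deep-learning library used, so Keras is an assumption). The reshaping of each sample into a length-one sequence, the use of separate Dropout layers for the stated dropout of 0.4, and the synthetic stand-in data are also assumptions; the GRU model discussed next is assumed to use the same layout with GRU layers in place of LSTM.

```python
# Sketch only: four stacked LSTM layers (300/200/100/80 units, tanh) with dropout 0.4,
# a sigmoid output, and the compile/fit settings quoted in the text.
import numpy as np
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import LSTM, Dropout, Dense

n_features = 20                                    # e.g. the reduced UNSW-NB 15 feature set
X = np.random.rand(1000, 1, n_features)            # stand-in data: (samples, timesteps, features)
y = np.random.randint(0, 2, size=1000)             # binary labels (attack / normal)

model = Sequential([
    LSTM(300, activation="tanh", return_sequences=True, input_shape=(1, n_features)),
    Dropout(0.4),
    LSTM(200, activation="tanh", return_sequences=True),
    Dropout(0.4),
    LSTM(100, activation="tanh", return_sequences=True),
    Dropout(0.4),
    LSTM(80, activation="tanh"),
    Dropout(0.4),
    Dense(1, activation="sigmoid"),                # output layer
])
model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])
model.fit(X, y, epochs=50, batch_size=64, verbose=0)
```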

Similar to the LSTM model, the same settings were also considered for the GRU model. While employing the full features of UNSW-NB 15, the obtained accuracy, F1 and FPR scores were 82.87%, 86.39% and 36.57% respectively, which became 90.11%, 91.51%, and 18.14% after applying the selected features. In the case of the Network TON_IoT dataset, with all features the accuracy score was 95.69%, the F1 score was 94.02% and the FPR was 5.0%, which became 95.02%, 93.04% and 5.08% respectively.

Discussion
Observation for UNSW-NB 15 dataset
After numerous trials and errors, the experimental result revealed that with the GIWRF feature selection technique, the Decision Tree outperformed all other models, obtaining an accuracy score of 93.01% and an F1 score of 93.72%. As the accuracy score climbed to 93.01% from 90.15% using the 20 features, it is clear that the feature selection method had a great impact on the detection process of the DT-based IDS. Also, the feature selection technique helped to reduce the FPR by 7.7%. There was a minor decrement in the accuracy score of GBT while using the reduced features for binary classification. In the case of GBT, the accuracy score was 0.48% higher when we used the full feature set, and it dropped to 87.08% using 20 features. Similarly, the F1 score was decreased by 0.42% and the FPR score was increased by 0.24% using the reduced features. The result indicated that the feature selection technique did not improve the prediction capability of GBT. Surprisingly, the accuracy of the AdaBoost model remained at 90.51% for both cases of selected features and full features; though the FPR was decreased by 0.94% after feature selection, the F1 score was reduced by a negligible 0.05%.

For UNSW-NB 15, the feature selection technique significantly improved the performance of the neural networks. In the case of using full features, MLP, LSTM and GRU classified the intrusion and normal instances with accuracy scores of 84.11%, 87.9% and 82.87% respectively, whereas the feature selection technique boosted these accuracy scores to 87.26%, 88.99% and 90.11% respectively. Likewise, the F1 scores for the aforementioned models were improved by 2.28%, 0.71% and 5.12% respectively with the reduced features. Similarly, the FPR scores for MLP, LSTM and GRU were decreased by 6.85%, 4.27% and 18.43% respectively with the selected features. The GRU came out as the best performer of the three neural networks.

Observation for network TON_IoT dataset
In the case of the Network TON_IoT dataset, the experimental result showed that the DT and AdaBoost models performed better with the feature selection technique than with the full features. Training the models on selected features, the accuracy score was improved for DT by 0.40% and became 99.90%, the F1 score was increased by 0.5% and the FPR was decreased by 0.001%. In the case of AdaBoost, the accuracy score was boosted by 0.1% and became 99.98% with the selected features, and the F1 score was also improved by 0.14%, but the FPR was increased by 0.005%. Surprisingly, no change in accuracy or F1 score was observed for GBT with feature selection, but the FPR score was increased to 0.008% from 0.006%.

Feature selection did not improve the prediction capability of the neural networks in the case of the Network TON_IoT dataset. The accuracy scores for MLP, LSTM and GRU were respectively 98.35%, 94.51% and 95.69% without the feature selection technique, which were reduced to 97.13%, 88.99% and 95.02% respectively. Similarly, the F1 score was reduced by 1.64%, 5.95% and 0.98% respectively for the aforementioned models. Feature selection did not decrease the FPR score for the neural networks either; it increased the score by 2.3%, 11.67% and 0.08% respectively for the above-mentioned models.

General consideration
Overall, it is clear that the weight adjustment technique, along with the Gini impurity criterion that split the trees in RF, picked a set of important features considering the skewed class distribution of the training data, and it enhanced the learning ability of DT over the two imbalanced datasets. As DT obtained improved Accuracy and F1 scores for both datasets with the selected features, it can be claimed as the best model of the proposed approach to detect intrusions. Another observation worth pointing out is that during parameter tuning, setting the class weight to 'balanced' improved the prediction ability of DT. As this work did not consider any oversampling or undersampling technique to handle the skewed class distribution of the training data, setting the class weight to 'balanced' implicitly put a higher weight on the minor class and a lower weight on the major class while training the model.
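The weighting idea described above can be illustrated with a minimal sketch. This is not the authors' exact GIWRF implementation: it simply fits a Gini-based random forest with class_weight='balanced' on stand-in data and keeps the top-k features by impurity-based importance; the forest size, the synthetic data, and the selection threshold are all assumptions.

```python
# Sketch only: class-weighted, Gini-based random forest whose impurity importances
# are used to rank features and keep the top-k, mimicking the idea described above.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

X, y = make_classification(n_samples=5000, n_features=42, weights=[0.35, 0.65],
                           random_state=10)       # stand-in for the real training split

rf = RandomForestClassifier(
    n_estimators=100,              # illustrative value, not taken from the paper
    criterion="gini",              # Gini impurity as the splitting criterion
    class_weight="balanced",       # weights inversely proportional to class frequencies
    random_state=10,
)
rf.fit(X, y)

k = 20                                            # e.g. 20 features kept for UNSW-NB 15
top_k = np.argsort(rf.feature_importances_)[::-1][:k]
X_selected = X[:, top_k]                          # reduced training matrix
```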
While feature selection significantly improved the performance of the neural networks over the UNSW-NB 15 dataset, it showed the opposite behavior when they were trained on the Network TON_IoT dataset, where normal traffic is higher than the attack traffic. For this dataset, without feature selection, the ANN and DNN learned well, but in the case of UNSW-NB 15, where attack traffic is higher than the normal instances, the ANN and DNN required important features to train on. Similarly, AdaBoost performed better in terms of accuracy, F1 score and FPR with the selected features from the Network TON_IoT dataset, but in the case of UNSW-NB 15 feature selection did not make any negative or positive impact on the prediction ability of AdaBoost. Also, feature selection did not improve the performance of GBT for either dataset.

Table 9 Comparison between different existing methods and our proposed approach for UNSW-NB 15

Proposed by              ML methods used for IDS   Accuracy (%)   Recall/detection rate (DR) (%)   F1 score (%)
Alazzam et al. (2020)    Cosine-PIO DT             91.7           –                                90
Khan et al. (2018)       RF                        75.65          76                               73
Jing and Chen (2019)     SVM                       85.99          –                                –
Kasongo and Sun (2020)   DT                        90.85          98.38                            88.45
Meftah et al. (2019)     SVM                       82.11          –                                –
Tama and Rhee (2019)     GBM                       91.31          –                                –
Aboueata et al. (2019)   SVM                       92             92                               91
Gu and Lu (2021)         NB-SVM                    93.75          94.73                            –
Proposed approach        GIWRF-DT                  93.01          94.76                            93.72

Table 10 Comparison between existing methods and our proposed approach for Network TON_IoT

Proposed by         ML methods used for IDS   Accuracy (%)   Recall/detection rate (DR) (%)   F1 score (%)
Moustafa (2021)     RF                        99.98          99.99                            99.97
Proposed approach   GIWRF-DT                  99.90          99.87                            99.85

Choosing the best model from each work in the literature that used the UNSW-NB 15 dataset, Table 9 compares their performance with the best model (DT) presented in this paper. It affirms that DT with our proposed feature selection (GIWRF) achieved more satisfactory performance than the other works in terms of the F1 score, which is a suitable evaluation criterion for imbalanced data. Also, the accuracy score was found to be higher than the other existing works except the one proposed by Gu and Lu (2021). In this case, the proposed GIWRF-DT obtained a slightly lower accuracy score (by 0.74%) than the NB-SVM (Gu and Lu 2021), but the Recall/Detection Rate was still higher (by 0.03%) than theirs, which demonstrates the effective intrusion detection capability of the proposed DT. However, they used a different training size than the one used in the proposed work.

To expand the comparison scope, Table 10 depicts the performance of the best model proposed by Moustafa (2021) versus our proposed DT using the newest dataset, Network TON_IoT. The models developed in the proposed approach were also capable of classifying normal and malicious traffic effectively compared to the work by Moustafa (2021). The DT developed in the proposed study obtained a slightly lower accuracy score (by 0.08%) than their proposed RF in Moustafa (2021), which was quite expected, as those authors used the source IP, destination IP, source port, and destination port attributes for developing their models but suggested that future works remove those features to demonstrate the difficulty of the security event.

It should be mentioned that Tables 9 and 10 only illustrate a comparison between the proposed intrusion detection technique and other existing frameworks. Hence, it should not be claimed that the proposed framework is superior to other intrusion detection techniques, but the effectiveness of this study may stimulate future work in the active research area of IDS.

Conclusion
The motivation of this study was to train and evaluate the ML models DT, AdaBoost, GBT, MLP, LSTM, and GRU for performing the binary classification task of an ML-based IDS. To select a suitable set of features from two imbalanced datasets, UNSW-NB 15 and Network TON_IoT, a Gini Impurity-based Weighted RF was introduced as the feature selection process, based upon the assumption that the imbalanced class distribution might have an influence on the feature selection process. The feature selection technique reduced the features of the UNSW-NB 15 and Network TON_IoT datasets to 20 (from 42) and 10 (from 41) respectively. The performance of the models was evaluated in terms of Accuracy, FPR, Precision, Recall, and F1 score to detect the intrusion. In the beginning, the experiment was conducted using the whole feature set over both datasets. After that, the experiment was carried out again using only the selected features extracted through the feature selection method.

A comparison of the models' performance with the full feature space and with the reduced features was performed for both datasets. In both cases, the Decision Tree performed better with the feature selection technique. However, this work did not perform multiclass classification or time complexity analysis; hence, a multiclass classification scheme for IDS, together with a time complexity analysis, can be carried out as future work.

Abbreviations
IDS: Intrusion Detection System; ML: Machine learning; RF: Random forest; DT: Decision tree; GBT: Gradient Boosting Tree; MLP: Multilayer Perceptron; FPR: False positive rate; HIDS: Host-based IDS; NIDS: Network-based IDS; SIDS: Signature-based IDS; AIDS: Anomaly-based IDS; IoT: Internet of Things; DDoS: Distributed Denial of Service; PIO: Pigeon Inspired Optimizer; XGBoost: Extreme Gradient Boosting; LSTM: Long-Short Term Memory; GRU: Gated Recurrent Unit; KNN: K-Nearest Neighbors; GBM: Gradient Boosting Machine; RBF: Radial Basis Function; SVM: Support Vector Machine; CART: Classification and Regression Tree; DNN: Deep Neural Network; AUC: Area under the receiver operating characteristic curve; ANN: Artificial Neural Network; PCA: Principal Component Analysis; DoS: Denial of Service; SGD: Stochastic Gradient Descent; LBFGS: Limited memory Broyden–Fletcher–Goldfarb–Shanno.

Acknowledgements
Not applicable.

Authors' contributions
RAD designed the feature selection technique, performed the experiments, interpreted the results, and drafted the manuscript. SW participated in problem discussions and improvements of the manuscript. The author(s) read and approved the final manuscript.

Funding
Not applicable.

Availability of data and materials
The UNSW-NB 15 dataset can be found at the website of the University of New South Wales (https://research.unsw.edu.au/projects/unsw-nb15-dataset). The Network TON_IoT dataset can be found at the website of the University of New South Wales (https://cloudstor.aarnet.edu.au/plus/s/ds5zW91vdgjEj9i?path=%2FTrain_Test_datasets%2FTrain_Test_Network_dataset).

Declaration

Competing interests
The authors declare that they have no competing interests.

Author details
1 Department of Information and Communication Technology, Bangladesh University of Professionals, Mirpur Cantonment, Dhaka 1216, Bangladesh. 2 Department of Information and Communication Technology, Mawlana Bhashani Science and Technology University, Santosh, Tangail 1902, Bangladesh.

Received: 2 August 2021  Accepted: 17 November 2021

References
Abirami S, Chitra P (2020) Energy-efficient edge based real-time healthcare support system. In: Advances in computers. Elsevier, pp 339–368
Aboueata N, Alrasbi S, Erbad A, Kassler A, Bhamare D (2019) Supervised machine learning techniques for efficient network intrusion detection. In: 2019 28th international conference on computer communication and networks (ICCCN). IEEE, pp 1–8
Alazzam H, Sharieh A, Sabri KE (2020) A feature selection algorithm for intrusion detection system based on pigeon inspired optimizer. Expert Syst Appl 148:113249
Belgrana FZ, Benamrane N, Hamaida MA et al (2021) Network intrusion detection system using neural network and condensed nearest neighbors with selection of NSL-KDD influencing features. In: 2020 IEEE international conference on internet of things and intelligence system (IoTaIS). IEEE, pp 23–29
Breiman L (2001) Random forests. Mach Learn 45:5–32
Catania CA, Garino CG (2012) Automatic network intrusion detection: current techniques and open issues. Comput Electr Eng 38:1062–1072
Chandola V, Banerjee A, Kumar V (2009) Anomaly detection: a survey. ACM Comput Surv 41:1–58
Cho K, Van Merriënboer B, Gulcehre C et al (2014) Learning phrase representations using RNN encoder-decoder for statistical machine translation. arXiv:1406.1078
Dharmik (2019) Response coding for categorical data. https://medium.com/@thewingedwolf.winterfell/response-coding-for-categorical-data-7bb8916c6dc. Accessed 23 July 2021
Di Mauro M, Galatro G, Liotta A (2020) Experimental review of neural-based approaches for network intrusion management. IEEE Trans Netw Serv Manag 17:2480–2495
Divekar A, Parekh M, Savla V, et al (2018) Benchmarking datasets for anomaly-based network intrusion detection: KDD CUP 99 alternatives. In: 2018 IEEE 3rd international conference on computing, communication and security (ICCCS). IEEE, pp 1–8
Dong G, Liu H (2018) Feature engineering for machine learning and data analytics. CRC Press
Felix AY, Sasipraba T (2019) Flood detection using gradient boost machine learning approach. In: 2019 international conference on computational intelligence and knowledge economy (ICCIKE). IEEE, pp 779–783
Garcia-Teodoro P, Diaz-Verdejo J, Maciá-Fernández G, Vázquez E (2009) Anomaly-based network intrusion detection: techniques, systems and challenges. Comput Secur 28:18–28
Gu J, Lu S (2021) An effective intrusion detection approach using SVM with naïve Bayes feature embedding. Comput Secur 103:102158
Harrington P (2012) Machine learning in action. Simon and Schuster
Hick P, Aben E, Claffy K, Polterock J (2007) The CAIDA DDoS attack 2007 dataset (2012) [2015-07-10]. http://www.caida.org
Hochreiter S, Schmidhuber J (1997) Long short-term memory. Neural Comput 9:1735–1780
Ingre B, Yadav A (2015) Performance analysis of NSL-KDD dataset using ANN. In: 2015 international conference on signal processing and communication engineering systems. IEEE, pp 92–96
Injadat M, Moubayed A, Nassif AB, Shami A (2020) Multi-stage optimized machine learning framework for network intrusion detection. IEEE Trans Netw Serv Manag
Jing D, Chen H-B (2019) SVM based network intrusion detection for the UNSW-NB15 dataset. In: 2019 IEEE 13th international conference on ASIC (ASICON). IEEE, pp 1–4
Kasongo SM, Sun Y (2020) Performance analysis of intrusion detection systems using a feature selection method on the UNSW-NB15 dataset. J Big Data 7:1–20
Khan NM, Negi A, Thaseen IS (2018) Analysis on improving the performance of machine learning models using feature selection technique. In: International conference on intelligent systems design and applications. Springer, pp 69–77
Khraisat A, Gondal I, Vamplew P, Kamruzzaman J (2019) Survey of intrusion detection systems: techniques, datasets and challenges. Cybersecurity 2:1–22
Krawczyk B (2016) Learning from imbalanced data: open challenges and future directions. Prog Artif Intell 5:221–232
Kumar G (2014) Evaluation metrics for intrusion detection systems-a study. Evaluation 2:11–17
Labonne M (2020) Anomaly-based network intrusion detection using machine learning. https://tel.archives-ouvertes.fr/tel-02988296/. Accessed 30 Sept 2021
Lee J, Pak J, Lee M (2020) Network intrusion detection system using feature extraction based on deep sparse autoencoder. In: 2020 international conference on information and communication technology convergence (ICTC). IEEE, pp 1282–1287

Liao H-J, Lin C-HR, Lin Y-C, Tung K-Y (2013) Intrusion detection system: a com-
prehensive review. J Netw Comput Appl 36:16–24
Liu H, Yan X, Wu Q (2019) An improved pigeon-inspired optimisation
algorithm and its application in parameter inversion. Symmetry (basel)
11:1291
Mason L, Baxter J, Bartlett P, Frean M (1999) Boosting algorithms as gradient
descent in function space. In: Proc. NIPS, pp 512–518
Meftah S, Rachidi T, Assem N (2019) Network based intrusion detection using
the UNSW-NB15 dataset. Int J Comput Digit Syst 8:478–487
Mohammadi S, Mirvaziri H, Ghazizadeh-Ahsaee M, Karimipour H (2019) Cyber
intrusion detection by combined feature selection algorithm. J Inf Secur
Appl 44:80–88
Moustafa N (2021) A new distributed architecture for evaluating AI-based
security systems at the edge: network TON_IoT datasets. Sustain Cities
Soc 72:102994
Moustafa N, Slay J (2015) UNSW-NB15: a comprehensive data set for network
intrusion detection systems (UNSW-NB15 network data set). In: 2015
military communications and information systems conference (MilCIS).
IEEE, pp 1–6
Moustafa N, Slay J (2016) The evaluation of network anomaly detection sys-
tems: statistical analysis of the UNSW-NB15 data set and the comparison
with the KDD99 data set. Inf Secur J A Glob Perspect 25:18–31
Moustafa N, Turnbull B, Choo K-KR (2018) An ensemble intrusion detection
technique based on proposed statistical flow features for protecting
network traffic of internet of things. IEEE Internet Things J 6:4815–4830
El Naqa I, Murphy MJ (2015) What is machine learning? In: Machine learning in
radiation oncology. Springer, pp 3–11
Osanaiye O, Cai H, Choo K-KR, Dehghantanha A, Xu Z, Dlodlo M (2016)
Ensemble-based multi-filter feature selection method for DDoS detection
in cloud computing. EURASIP J Wirel Commun Netw 2016:1–10
Quinlan JR (1986) Induction of decision trees. Mach Learn 1:81–106
Rosenblatt F (1961) Principles of neurodynamics. Perceptrons and the theory
of brain mechanisms. Cornell Aeronautical Lab Inc, Buffalo
Safavian SR, Landgrebe D (1991) A survey of decision tree classifier methodol-
ogy. IEEE Trans Syst Man Cybern 21:660–674
Scarfone K, Mell P (2007) Guide to intrusion detection and prevention systems
(idps). NIST Spec Publ 800:94
Schapire RE (2003) The boosting approach to machine learning: an overview.
Nonlinear Estim Classif 149–171
Scikit Learn, Machine Learning in Python. https://scikit-learn.org/stable. Accessed 6 July 2021
Sethi (2020) One-hot encoding vs. label encoding using scikit-learn. https://
www.analyticsvidhya.com/blog/2020/03/one-hot-encoding-vs-label-
encoding-using-scikit-learn/. Accessed 30 Sept 2021
Shannon CE (1948) A mathematical theory of communication. Bell Syst Tech
J 27:379–423
Sharafaldin I, Lashkari AH, Ghorbani AA (2018) Toward generating a new
intrusion detection dataset and intrusion traffic characterization. Icissp
1:108–116
Shiravi A, Shiravi H, Tavallaee M, Ghorbani AA (2012) Toward developing a
systematic approach to generate benchmark datasets for intrusion
detection. Comput Secur 31:357–374
Song J, Takakura H, Okabe Y, et al (2011) Statistical analysis of honeypot data
and building of Kyoto 2006+ dataset for NIDS evaluation. In: Proceedings
of the first workshop on building analysis datasets and gathering experi-
ence returns for security, pp 29–36
Tama BA, Rhee K-H (2019) An in-depth experimental study of anomaly detec-
tion using gradient boosted machine. Neural Comput Appl 31:955–965
Yin C, Zhu Y, Fei J, He X (2017) A deep learning approach for intrusion detec-
tion using recurrent neural networks. IEEE Access 5:21954–21961
Zaman S, Karray F (2009) Features selection for intrusion detection systems
based on support vector machines. In: 2009 6th IEEE consumer com-
munications and networking conference. IEEE, pp 1–8

Publisher’s Note
Springer Nature remains neutral with regard to jurisdictional claims in pub-
lished maps and institutional affiliations.
