
Intrusion Detection System: A Comparative Study of

Machine Learning-based IDS


Amit Singh
Government of India
Jay Prakash
Dr Ram Manohar Lohia Avadh University
Gaurav Kumar ([email protected])
GLA University

Research Article

Keywords: Intrusion Detection Systems, Cybersecurity, Cyberattack, Anomaly-based Intrusion Detection


Systems, Machine Learning

Posted Date: May 25th, 2022

DOI: https://doi.org/10.21203/rs.3.rs-1634802/v1

License: This work is licensed under a Creative Commons Attribution 4.0 International License.
Read Full License

Abstract
Due to the Covid-19 pandemic, there has been a significant rise in the amount of data processed and
transferred over communication networks. The use of encrypted data, the diversity of new protocols, and
the surge in the number of malicious activities worldwide have posed new challenges for Intrusion
Detection Systems (IDS). In this scenario, existing signature-based IDS do not perform well. Various
researchers have proposed machine learning-based IDS to detect unknown malicious activities based on
behaviour patterns. Results have shown that machine learning-based IDS perform better than signature-
based IDS (SIDS) in identifying new malicious activities in the communication network. In this paper, we
analyze an IDS dataset that contains the most common current attacks and evaluate the performance of
network intrusion detection systems by adopting two data resampling techniques and ten machine
learning classifiers. It is observed that the top three IDS models, KNeighbors, XGBoost and AdaBoost,
perform best in binary-class classification with 99.49%, 99.14% and 98.75% accuracy, and XGBoost,
KNeighbors and GaussianNB perform best in multi-class classification with 99.30%, 98.88% and 96.66%
accuracy, respectively.

1. Introduction
The Covid-19 pandemic made people stay at home and avoid physical congregation, and social distancing
became the new normal. New paradigms in education delivery, business transactions, and work-from-home
culture have increased the dependency of individuals, governments, and businesses on mobile and
electronic gadgets. The usage of communication networks and cloud-based processing systems has
increased manifold. This change in the pandemic era promotes new threats and lures intruders to exploit
vulnerabilities in the data communication network. Organizations usually use diversified protocols to
encrypt their data and maintain confidentiality. Volume, heterogeneity of protocols, and encryption have
posed several new challenges for IDS in detecting malicious activities (Resende &
Drummond, 2018). An intruder attempts to gain unauthorized access to a system or network with
mala fide intent and disrupts its normal execution (Butun et al., 2014; Liao et al., 2013; Low, 2005;
Mitchell & Chen, 2014). Intruders often aim to steal or corrupt sensitive data. In 2020, Emsisoft
reported that local governments, universities, and private organizations had spent $144 million in
response to the year's worst ransomware attacks (Novinson, 2020). The WHO reported that cyber-attacks
increased fivefold during the Covid-19 pandemic (WHO, 2020). According to the McAfee quarterly threat
report 2020, fraudsters are taking advantage of the pandemic by using Covid-19-themed malicious apps,
phishing campaigns, and malware (McAfee, 2020). In quarter one (Q1), new malware targeting mobile
devices surged by 71%, with overall malware increasing by roughly 12% over the previous four quarters
(McAfee, 2020).

IDS provides security solutions against malicious attacks or security breaches. It is a software or
hardware device that detects harmful activity on a computer system to maintain system security (Liao et
al., 2013). It identifies all forms of suspicious network traffic and malicious computer activity that a
firewall might miss. Signature-based Intrusion Detection System (SIDS) and Anomaly-based Intrusion
Detection System (AIDS) are two popular categories of IDS that have widely been used to provide security
solutions (Axelsson, 2000; Hodo et al., 2017).

The SIDS relies on previously known signatures and faces challenges in identifying an unknown and
obfuscated malicious attack (Amouri et al., 2020; Atli, 2017; Khraisat et al., 2019; Lin et al., 2015; Low,
2005; Vinayakumar et al., 2019; Wu & Banzhaf, 2010). Therefore, SIDS cannot prevent every intrusion
based on previously learned indicators of compromise; however, it can detect and prevent similar
attacks from happening in the future. As the number of cyber-attacks has increased exponentially and attackers
are using evolved techniques to conceal attack patterns, it becomes almost infeasible to identify intruders
using SIDS (Amouri et al., 2020; Khraisat et al., 2019; Vimala et al., 2019; Warsi & Dubey, 2019; Wu &
Banzhaf, 2010).

Many scholars use AIDS because of its ability to overcome the limitations of SIDS. An AIDS builds a
model of typical computer system behaviour using statistical methods, machine learning algorithms, or
knowledge-based methods. These methods are designed to detect abnormal behaviour in computer
systems: the typical usage pattern is baselined, and when usage deviates from the expected
behaviour, alarms are generated.
does not rely on a signature database to detect abnormal user behaviour (Alazab et al., 2012). AIDS is
further categorized into three main groups: Statistics-based, Knowledge-based, and Machine learning-
based. Researchers have investigated many approaches to improve intrusion detection in the last few
decades, from data mining and machine learning to time series modelling. The machine learning-based
IDS can learn the attacks' behaviour and pattern, and future attacks can be predicted using trained
machine learning models. The researchers have explored and proposed many machine learning-based
IDS. Popular techniques such as decision trees, naïve Bayes, support vector machines, logistic regression,
k-nearest-neighbour, and ensemble methods are employed to design machine learning-based IDS. This paper
focuses on machine learning-based IDS and comparative analysis among popular machine learning-
based IDS.

2. Machine Learning-based IDS


Machine Learning is a technique of extracting knowledge from massive amounts of data. It comprises a
set of rules, methods, or complex "transfer functions" that can be used to discover intriguing patterns or
estimate behaviour (Dua & Du, 2016). The machine-learning techniques use training data to acquire
complex pattern-matching capabilities. Researchers widely use the support vector machine (SVM) for
Network Intrusion Detection Systems (NIDS) (Niyaz et al., 2015), and different clustering algorithms such
as K-means and Expectation Maximization (EM) for both NIDS and anomaly detection (Bennett & Demiriz,
1999; Syarif et al., 2012). Gaussian Fields and Spectral Graph Transducer techniques have widely been
used for anomaly detection (Chen et al., 2008; Lecun et al., 2015). The process flow of creating a machine
learning-based IDS is shown in Fig. 1. Several algorithms and techniques, such as support vector machines, naïve
Bayes, decision trees, logistic regression, k-nearest-neighbour, clustering, and different ensemble methods,
are being used by various researchers and organizations for discovering the knowledge from intrusion
datasets. They all are mainly concerned with the detection effect and practical issues such as detection
efficiency and data management. The recent development in machine learning-based intrusion detection
is discussed in the next section.

2.1 Related work


Recently, many research and practical ideas based on artificial intelligence and machine learning have
been published to overcome the challenges in intrusion detection systems. The authors (Ashraf & Latif,
2014; Sharafaldin et al., 2018) used machine learning methods such as Bayesian networks, support vector
machines, neural networks and genetic algorithms to detect Distributed Denial of Service (DDoS) attacks in
SDN (Software-Defined Networking) environments. The authors (Sharafaldin et al.,
2018) used the CICIDS2017 dataset and examined the performance of the selected features with Naive-
Bayes, KNN, ID3, RF, Adaboost, MLP and QDA. Utimura and Costa (Utimura & Costa, 2018) used 10% of
the ISCXIDS2012 dataset to analyze the performance using Multilayer Perceptron (MLP) and the
Optimum-Path Forest (OPF) classifiers.

Feature selection is an important process to build IDS systems. Varghese and Muniyal (Varghese &
Muniyal, 2017) studied the efficacy of seven different algorithms concerning two different feature
selection strategies on the NSLKDD dataset. The authors used Principal Component Analysis (PCA) and
Correlation-based Feature Selection (CFS) for selecting features. Then, the performance of j48, NBTree,
Random Forest, LibSVM, Bagging with REPTree, PART, and Multilayer Perceptron (MLP) classifiers using
ten-fold cross-validation was evaluated. Effendy et al. (Effendy et al., 2017) also used the NSL-KDD
dataset and Information Gain Ratio (IGR) for selecting features. The authors assessed the Naïve-Bayes
classifier with accuracy as the key performance indicator. The authors (Acharya & Singh, 2018) used the
nature-inspired intelligent water drops (IWD) algorithm to select features and a support vector machine
as a classifier to evaluate the selected features. Alazzam et al. (Alazzam et al., 2020) used the pigeon
inspired optimizer technique, and Tawil et al. (Tawil & Sabri, 2021) used the Moth Flame Optimization
technique to choose the relevant features in designing the IDS system. The authors (Naseri &
Gharehchopogh, 2022) presented a Farmland Fertility Algorithm (FFA)-based method, called BFFA, to
select the features used for IDS classification.

The authors (Biswas, 2018) considered the amalgamation of feature selection techniques and classifiers
to design an accurate network intrusion detection system. They used the NSL-KDD dataset and applied
four feature selection methods to evaluate the performance of five classifiers using a five-fold cross-
validation strategy. The authors (Imrana et al., 2021) proposed a bidirectional Long-Short-Term-Memory
(BiDLSTM) based intrusion detection system to handle especially User-to-Root (U2R) and Remote-to-
Local (R2L) attacks. Their proposed model improves the detection accuracy of U2R and R2L attacks
compared with the conventional LSTM.

Ammar and Faisal (Aldallal & Alisa, 2021) proposed a hybrid model of Support Vector Machine (SVM)
and genetic algorithm (GA) intrusion detection system with innovative fitness functions to evaluate the
system accuracy in the cloud computing environment. The proposed approach was evaluated on the
CICIDS2017 dataset and benchmarked with KDD CUP 99 and NSL-KDD. The results showed that the
proposed model remarkably outperformed these benchmarks by 5.74%. The authors (Imran et al., 2021)
proposed an ensemble of automated machine learning and Kalman filter prediction approaches to
improve the anomaly detection accuracy in a network intrusion environment. The proposed model was
evaluated on the UNSW-NB15 and CICIDS2017 datasets and achieved intrusion detection accuracies of
98.80% on the UNSW-NB15 dataset and 97.02% on the CICIDS2017 dataset.

The authors (Al-Omari et al., 2021; Sarker et al., 2020) presented a machine learning-based security model
called Intrusion Detection Tree (IntruDTree) that considers the importance of security features and then
builds a tree-based generalized intrusion detection model based on the selected essential features.

A survey on machine learning approaches for Cyber Security Intrusion Detection was published in 2016
using KDD 1999 and DARPA 1998 datasets (Buczak & Guven, 2016). Similar work was also published in
(Sultana et al., 2019) and (da Costa et al., 2019), focusing only on reviewing current literature. All these
works are related to ours, but our work uses different machine learning-based IDS models and executes
them on a recently available dataset. The results are then compared with the existing work to
assess and analyze the performance.

In this paper, we analyze a new and still under-analyzed IDS dataset that contains the most recent
common attacks and evaluate the performance of network intrusion detection by adopting two data
resampling techniques and ten classifiers on the entire dataset. Some preprocessing steps are required to
fix the problems that may exist in the dataset. These data preprocessing steps are discussed next.

2.2 The data preprocessing steps


In machine learning, data preprocessing is the process of transforming or encoding raw data into suitable
formats so that machines can parse it efficiently. The datasets may require treating missing or inconsistent
values, feature scaling, feature selection, and data imbalance problems.

2.2.1 Missing or Inconsistent values


The presence of missing values in a dataset is quite common, whether they arise during data collection or
during data validation, and they must be evaluated for rectification. The problem can be solved by
eliminating rows with missing data or by filling them with estimated values. Inconsistent values also
occur; for example, a phone number stored in the 'Address' field may result from a human mistake or from
information being misread while scanning a handwritten form.
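
As a minimal illustration of these steps (not the authors' exact pipeline; the file name and the median-fill
choice are assumptions), one of the dataset's CSV files could be treated with pandas as follows:

```python
import numpy as np
import pandas as pd

# Hypothetical file name; one of the CICIDS2018 CSV files would be loaded here
df = pd.read_csv("network_flows.csv")

# Flow features sometimes contain infinite values; treat them as missing as well
df = df.replace([np.inf, -np.inf], np.nan)
print(df.isnull().sum())                      # count of missing values per column

# Option 1: drop the rows that contain missing values
df_clean = df.dropna()

# Option 2: fill missing numeric values with an estimate (here, the column median)
df_filled = df.fillna(df.median(numeric_only=True))
```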

2.2.2 Feature Scaling


Feature Scaling is a part of data preprocessing. It normalizes the independent features to a defined range
to handle highly fluctuating magnitudes or values. There are different strategies to perform feature
scaling.

Min-Max Normalization

This approach re-scales a feature or observation value into a range between zero and one. Its formula is

X_new = (X_i − min(X)) / (max(X) − min(X))

Standardization

It is a very effective re-scaling strategy in which the re-scaled feature values have a distribution with zero
mean and unit variance.

X_new = (X_i − X_mean) / StandardDeviation(X)

If we do not perform feature scaling, a machine learning model tends to give more weight to features with
larger magnitudes and less weight to features with smaller magnitudes, regardless of the unit of
measurement.
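
Both strategies are available in scikit-learn. A minimal sketch, assuming a cleaned numeric feature matrix
X (the variable name is illustrative), is:

```python
from sklearn.preprocessing import MinMaxScaler, StandardScaler

# Min-Max normalization: X_new = (X - min(X)) / (max(X) - min(X)), range [0, 1]
X_minmax = MinMaxScaler().fit_transform(X)

# Standardization: X_new = (X - mean(X)) / std(X), zero mean and unit variance
X_standard = StandardScaler().fit_transform(X)
```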

2.2.3 Feature selection


"Feature selection" is also named "Feature Learning" or "Feature Engineering", which is the most crucial
stage during preprocessing. It simplifies the data, eliminates redundancy, reduces computational
difficulty, improves the detection rate, and reduces the false alarms of machine learning models. Only
essential features are selected, based on their correlation scores with the outcome (target) variable.
Feature selection plays a critical role in building any IDS because the chosen features strongly affect the
accuracy and the false alarm rate. Each feature has specific characteristics for addressing different areas
of threat detection: features containing basic information about the software or network are considered
naïve, and features that represent deeper details are considered rich. Three approaches, Filter, Wrapper,
and Embedded, are used for feature selection, as shown in Table 1 and in the short sketch that follows it.

Table 1
Feature Selection Approaches

Filter (Hamon, 2013) - Selects the top essential features regardless of the model. Advantage: low
execution time and no over-fitting. Disadvantage: may choose redundant variables.

Wrapper (Phuong et al., 2006) - Creates subsets by combining related variables. Advantage: considers
feature interactions. Disadvantage: overfitting risk and high execution time.

Embedded (Hernandez et al., 2007) - Examines interactions in more depth than Wrapper. Advantage:
optimal subset results. Disadvantage: none noted.
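
As a hedged illustration of the wrapper and embedded approaches (a filter example with SelectKBest and
the Chi-Square test is sketched later in the paper), the estimators and parameter values below are
illustrative only, and X and y denote the cleaned features and labels:

```python
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_selection import RFE, SelectFromModel
from sklearn.linear_model import LogisticRegression

# Wrapper approach: recursive feature elimination driven by a model's coefficients
wrapper = RFE(estimator=LogisticRegression(max_iter=1000), n_features_to_select=20)
X_wrapper = wrapper.fit_transform(X, y)

# Embedded approach: keep features whose importance, learned while fitting the
# model itself, is above the default threshold
embedded = SelectFromModel(RandomForestClassifier(n_estimators=100, random_state=42))
X_embedded = embedded.fit_transform(X, y)
```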

2.2.4 Imbalanced Learning
Most machine learning predictive models assume a roughly equal number of samples in each class. When
the class distribution is imbalanced, for example when the minority class contains a hundred samples and
the majority class contains hundreds of thousands of samples, the models perform poorly on the minority
class, and their apparent performance on the majority class can be misleading. The imbalanced-learn
(imblearn) open-source Python toolbox provides a wide range of techniques to handle imbalanced data
classification. Some of the strategies for handling imbalanced data are Random Under-Sampling (RUS),
which removes samples from the majority class; Random Over-Sampling (ROS), which creates duplicate
copies of samples from the minority class; the Synthetic Minority Oversampling Technique (SMOTE),
which creates synthetic minority samples; and Tomek Links, which removes noise from the data. Their
advantages and limitations are discussed in the following subsections. These strategies are used to
fine-tune the class distribution of a dataset.

Let the imbalanced dataset be represented by x, the number of minority class samples by x_min, and the
number of majority class samples by x_max. The balancing ratio of the dataset x is defined as:

r_x = x_min / x_max

The balancing process is equivalent to resampling x into a new dataset x_res such that r_x_res > r_x.

i) Random Under Sampling (RUS): In RUS, the number of samples of the majority class (x_max) is reduced,
i.e. observations are removed from the majority class until the majority and minority classes are balanced.
The drawback of under-sampling is that data that may be valuable is removed.

ii) Random Over Sampling (ROS): Contrary to under-sampling, copies of existing data are added to the
minority class, i.e. new samples are generated in x_min until the balancing ratio r_x_res is reached. It is a
worthy choice when little data is available, but at the same time it can cause overfitting and poor
generalization on the minority samples.

iii) Synthetic Minority Oversampling Technique (SMOTE): Plain over-sampling creates duplicated samples
in the minority class that do not add any new information to the existing dataset. SMOTE addresses this
by creating synthetic samples. It chooses a sample from the minority class at random and finds its k
nearest minority class neighbours. A synthetic sample is then created at a random point between two
such samples in feature space. This technique can create as many synthetic minority class (x_min)
examples as needed to reach the balancing ratio r_x_res. This strategy may, however, produce noisy
samples by inserting new points between marginal outliers and inliers.

iv) Tomek's Links: This is a cleaning method that eliminates the noise introduced into the majority class
while new samples are created in the minority class. It is an under-sampling strategy that removes
unwanted samples from the majority class.

In this paper, we have used the SMOTE oversampling strategy to balance the CICIDS2018 dataset and
Tomek's links to clean the unwanted samples.
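
A minimal sketch of this combined resampling, assuming a feature matrix X and binary labels y have
already been extracted from the dataset (variable names are illustrative):

```python
from collections import Counter
from imblearn.combine import SMOTETomek

print("Original class distribution:", Counter(y))

# SMOTE over-samples the minority (Malware) class with synthetic points,
# then Tomek links remove noisy, borderline majority/minority pairs
resampler = SMOTETomek(random_state=42)
X_res, y_res = resampler.fit_resample(X, y)

print("Resampled class distribution:", Counter(y_res))
```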

3. Machine Learning-based IDS Models


Different supervised machine learning methods such as support vector machine, decision trees, naïve
Bayes, k-nearest-neighbor and logistic regression are applied to address the intrusion detection problem.
In supervised learning IDS, each record contains a pair of a network or host data source and an
associated labelled output value such as Malicious or Benign. A model is trained using supervised
learning techniques to learn the intrinsic relationship between the input data and the labelled output value
for the selected features. Then, the trained model classifies the unknown data into Malicious or Benign
classes in the testing stages. Each classifier has its strengths and weaknesses. A natural way to create a
robust classifier is to combine many weak classifiers. Multiple classifiers are trained using ensemble
techniques, and the classifiers then vote to determine the final results. Boosting, Bagging, and Stacking
are just a few of the ensemble approaches that have been proposed to improve performance.

The term "boosting" refers to a group of algorithms that can improve the performance of weak learners.
Bagging trains the same classifier on different subsets of the same dataset and aggregates the results.
Stacking combines various classifiers via a meta-classifier (Aburomman & Ibne Reaz, 2016). According to
Jabbar et al., a combination of Random Forest and the Average One-Dependence Estimator (AODE) may be
used to overcome the issue of attribute dependence in Naïve Bayes. Random Forest enhances precision
and reduces false alarms (Jabbar et al., 2017). Hybrid models are designed in many stages in
combination with different classification models. Ensemble and hybrid classifiers tend to outperform
single classifiers in terms of performance. The key points lie in selecting which classifiers to combine and
how they are connected. The present work analyzes ten popular machine learning classifiers, namely
AdaBoost, Decision Tree, GaussianNB, KNeighbors, Logistic Regression, MultinomialNB, RandomForest,
SGDClassifier, SVM, and XGBoost, on intrusion detection to find the best model; a minimal sketch of such
a comparison is shown below.
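
The following sketch uses default hyper-parameters for brevity and is not the authors' exact training
pipeline; X and y denote the preprocessed features and labels:

```python
from sklearn.ensemble import AdaBoostClassifier, RandomForestClassifier
from sklearn.linear_model import LogisticRegression, SGDClassifier
from sklearn.model_selection import cross_val_score
from sklearn.naive_bayes import GaussianNB, MultinomialNB
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier
from xgboost import XGBClassifier

models = {
    "Adaboost": AdaBoostClassifier(),
    "Decision Tree": DecisionTreeClassifier(),
    "GaussianNB": GaussianNB(),
    "KNeighbors": KNeighborsClassifier(),
    "Logistic": LogisticRegression(max_iter=1000),
    "MultinomialNB": MultinomialNB(),        # requires non-negative (e.g. MinMax-scaled) features
    "RandomForest": RandomForestClassifier(),
    "SGDClassifier": SGDClassifier(),
    "SVM": SVC(),
    "XGBoost": XGBClassifier(),
}

for name, model in models.items():
    scores = cross_val_score(model, X, y, cv=5, scoring="accuracy")
    print(f"{name}: mean accuracy {scores.mean():.4f}")
```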

4. Experimental Setup And Results


The experimental setup and execution are performed on a Microsoft Windows 10 environment with an Intel
Core i5 2.2 GHz CPU, 4 GB RAM and a 500 GB HDD. All models are implemented in Python 3.7.x with the help of
Scikit-learn (v0.22.X), Pandas (v1.0.3), Numpy (v1.18.2), Matplotlib (v3.2.1), Seaborn (v0.10.0), Xgboost
(v0.90), Scipy (v1.4.1), and Imblearn (v0.4.3). All the models are trained and tested against the
CICIDS2018 IDS dataset. A detailed description of the CICIDS2018 dataset is discussed next.

4.1 The CICIDS2018 dataset

Sharafaldin et al. (Sharafaldin et al., 2018) analyzed the properties of eleven IDS datasets since 1998 and
showed that most are outdated and unreliable. Some of the issues found are: i) existing datasets suffer
from a lack of traffic diversity and volume, and ii) datasets do not cover the diversity of known attacks.

The CICIDS2018 dataset is publicly available for network security and intrusion detection research
from the Canadian Institute for Cybersecurity. More than 80 network flow features are extracted from the
network traffic data generated over five days. The network flow dataset is also delivered as CSV files
with 85 features and class labels. Seven different attack scenarios such as Brute-force, Heartbleed, Web
attacks, DoS, DDoS, Botnet and infiltration of the network from inside are included in the final dataset.
There are 50 machines in the attacker's infrastructure, while 420 machines and 30 servers in the victim
organization's infrastructure are spread across five departments. The CICIDS2018 dataset consists of
corresponding profiles and labelled network flows, including full packet payloads in PCAP format, and
CSV files for the machine and deep learning purposes, as shown in Table 2.

Table 2
CICIDS2018 dataset CSV files

File Name                                                       Class
Monday-WorkingHours.pcap_ISCX.csv                               BENIGN
Tuesday-WorkingHours.pcap_ISCX.csv                              BENIGN, SSH-Patator, FTP-Patator
Wednesday-workingHours.pcap_ISCX.csv                            BENIGN, DoS slowloris, DoS Slowhttptest, DoS Hulk, DoS GoldenEye, Heartbleed
Thursday-WorkingHours-Morning-WebAttacks.pcap_ISCX.csv          BENIGN, Web Attack – Brute Force, Web Attack – XSS, Web Attack – Sql Injection
Thursday-WorkingHours-Afternoon-Infilteration.pcap_ISCX.csv     BENIGN, Infiltration
Friday-WorkingHours-Morning.pcap_ISCX.csv                       BENIGN, Bot
Friday-WorkingHours-Afternoon-DDos.pcap_ISCX.csv                BENIGN, DDoS
Friday-WorkingHours-Afternoon-PortScan.pcap_ISCX.csv            BENIGN, PortScan
The timestamp, source and destination ports, source and destination IPs, protocols and attacks are all
labelled in this dataset. The dataset also covers a complete network architecture, including a modem, a
firewall, routers, switches, and nodes with different operating systems (the open-source Linux operating
system, Apple's macOS and iOS, Microsoft Windows 10, Windows 8, Windows 7, and Windows XP). The
dataset is captured daily from the network traffic as PCAP files, which are then converted into CSV files.
The five days of CSV files are analyzed, containing 3119345 rows and 85 columns in total. Some of the
column names are shown in Figure 2.

The dataset contains NULL values, as shown in Figure 3. The data is preprocessed and the NULL values
are removed from the CICIDS2018 dataset by dropping the affected rows, as shown in Figure 4.

The correlation map is created using Pearson's correlation coefficient (r) between each feature and the
target variable, as shown in Figure 5. A correlation map is a visual representation of the relationships of
the variables with each other and with the target variable. If an increase in a feature's value is associated
with an increase in the target variable's value, the correlation is positive; if an increase in a feature's value
is associated with a decrease in the target variable's value, the correlation is negative.
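
A minimal sketch of such a correlation map, assuming the preprocessed data is held in a fully numeric
pandas DataFrame df whose target column is named 'Label' (both names are assumptions), is:

```python
import matplotlib.pyplot as plt
import seaborn as sns

corr = df.corr(method="pearson")               # pairwise Pearson correlation of all columns

plt.figure(figsize=(12, 10))
sns.heatmap(corr, cmap="coolwarm", center=0)   # visual correlation map
plt.title("Pearson correlation map")
plt.tight_layout()
plt.show()

# Features most strongly correlated (positively or negatively) with the target
print(corr["Label"].abs().sort_values(ascending=False).head(10))
```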

The feature scores are computed using the univariate selection method, as shown in Figure 6. A subset of
features is selected based on their scores.

The CICIDS2018 dataset is imbalanced, as it has far more Benign samples than Malware samples. It can
be seen in Figure 7 that the count of Malware samples is much smaller than the count of Benign samples.
The SMOTETomek method is applied to the imbalanced CICIDS2018 data to convert it into balanced data
for the binary classifiers, as shown in Figure 8.

Similarly, the CICIDS2018 dataset is imbalanced at the multi-class level, as shown in Figure 9. The random
oversampling technique is applied to the imbalanced CICIDS2018 data to convert it into balanced data for
the multi-class classifiers, as shown in Figure 10.

4.2 Performance Measure

This section discusses the classification metrics for IDS. Table 3 shows the confusion matrix for a two-
class classifier that can be used to evaluate an IDS's performance. Each column of the confusion matrix
indicates the samples in a predicted class, while each row indicates the samples in an actual class. The
diagonal of the confusion matrix represents the correct classification of samples, while the non-diagonal
represents the incorrect classification. The main aspects to consider when measuring the accuracy are

True Positive (TP): The classifier correctly predicts an intrusion (attack) instance
True Negative (TN): The classifier correctly predicts a non-intrusion (normal) instance
False Positive (FP): The classifier incorrectly predicts a non-intrusion instance as an intrusion (false alarm)
False Negative (FN): The classifier incorrectly predicts an intrusion instance as a non-intrusion (missed attack)

Table 3 Confusion Matrix

Actual Class      Predicted Normal    Predicted Attack
Normal            TN                  FP
Attack            FN                  TP

This paper uses popular performance measures, including overall accuracy, detection rates, precision,
recall, and F1-score (Aminanto et al., 2017; Atli, 2017; Hodo et al., 2017), which are briefly discussed below.

Accuracy: Accuracy is the most intuitive performance measure of a classification model. It is the ratio of
total correctly predicted samples and the total number of samples in the dataset, as shown in equation 1.
High accuracy means the model is performing well. Accuracy is a valuable measurement only when the
dataset is well balanced.

• Precision: Precision measures the fraction of data points predicted as positive by the classification
model that are actually positive, as shown in equation 2. A higher precision value indicates better
performance of the model. Precision is also known as the positive predictive value (PPV). Precision is a
good measure when the cost of a false positive is high.

• Recall: Recall measures the sensitivity of the model, i.e. how well it retrieves the relevant data points. In
other words, recall is the ratio of correctly predicted positive samples to all actual positive samples in the
dataset, as shown in equation 3. Recall is also known as the true positive rate (TPR). A higher recall value
indicates better performance of the model. It is a good measure when there is a high cost associated with
a false negative.

• F1–Score: The F1-score is an instrumental performance measure widely used when the model produces
high recall and low precision or low recall and high precision, i.e. with an uneven class distribution (a large
number of actual negative classes). The F1-score uses the harmonic mean instead of the arithmetic mean
to punish extreme values, as shown in equation 4.

AUC-ROC Curves: The Area Under the Curve (AUC) of the Receiver Operating Characteristic (ROC) curve is
an approach for measuring the performance of a classification model at different threshold settings. The
curve is plotted with the TPR on the y-axis and the FPR on the x-axis. A higher AUC means the classifier
separates the classes better; it is used to assess the capability of a classification model to separate the
classes.
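
Since equations 1-4 are not reproduced here, the standard definitions of these metrics, together with their
scikit-learn counterparts, are sketched below for a binary setting (y_test, y_pred and y_score are assumed
to come from a previously trained classifier):

```python
from sklearn.metrics import (accuracy_score, f1_score, precision_score,
                             recall_score, roc_auc_score)

# Standard definitions, presumably what equations 1-4 correspond to:
#   Accuracy  = (TP + TN) / (TP + TN + FP + FN)
#   Precision = TP / (TP + FP)
#   Recall    = TP / (TP + FN)
#   F1-score  = 2 * Precision * Recall / (Precision + Recall)
print("Accuracy :", accuracy_score(y_test, y_pred))
print("Precision:", precision_score(y_test, y_pred))
print("Recall   :", recall_score(y_test, y_pred))
print("F1-score :", f1_score(y_test, y_pred))
print("ROC-AUC  :", roc_auc_score(y_test, y_score))   # y_score: predicted probability of the attack class
```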

The metrics mentioned above can be used to measure the performance of both binary and multi-class
IDS, in which incidents are classified as Benign, Malicious, or a specific family of Malicious traffic.

4.3 Results Analysis

The final sample count for each class label after data clean-up is shown in Table 4. It can be observed
from Table 4 that the dataset is highly unbalanced. The NULL values are removed from the dataset, and
the remaining missing values are treated carefully and filled with valid data. Then feature
scaling/transformation is performed using the MinMaxScaler technique because the dataset contains
features with varying magnitudes, values, and units; after scaling, each column of the dataset lies within a
fixed range.

Table 4 Data Clean-Up

Class Name                    Value assigned for class    Original Samples    Final Samples
BENIGN                        0                           2273097             2273097
DoS Hulk                      4                           231073              231073
PortScan                      10                          158930              158930
DDoS                          2                           128027              128027
DoS GoldenEye                 3                           10293               10293
FTP-Patator                   7                           7938                7938
SSH-Patator                   11                          5897                5897
DoS slowloris                 6                           5796                5796
DoS Slowhttptest              5                           5499                5499
Bot                           1                           1966                1966
Web Attack – Brute Force      12                          1507                1507
Web Attack – XSS              14                          652                 652
Infiltration                  9                           36                  36
Web Attack – Sql Injection    13                          21                  21
Heartbleed                    8                           11                  11
Null Value                    -                           288602              -
Total                         -                           3119345             2830743

The Univariate Selection method is used to compute the score of each feature on the whole dataset, and
the top 50 features are selected based on the scores shown in Table 5. The scikit-learn library provides the
SelectKBest class to extract the best features of a given dataset. The SelectKBest class performs
statistical tests to select the features with the strongest relationship with the output or target variable.
Here, the Chi-Square test is applied to the groups of categorical features to evaluate the likelihood of
correlation or association between them using their frequency distribution. Table 5 lists the 50 selected
features (a short sketch of this step follows the table). The final considered dataset has 50 feature
columns and one column with class labels.

Table 5 Top-50 Features Selected

The top 50 features based on their high score are arranged in descending order

Total Length of Bwd Packets, SubflowBwd Bytes, Fwd PSH Flags, SYN Flag Count, URG Flag Count,
Timestamp, Init_Win_bytes_backward, Average Packet Size, Fwd IAT Total, Packet Length Mean, Flow
Duration, Bwd Packet Length Mean, AvgBwd Segment Size, Bwd Packet Length Std, Destination Port,
Idle Max, Packet Length Std, Bwd IAT Max, Fwd IAT Max, Flow IAT Max, Bwd IAT Total, Bwd Packet
Length Max, Bwd IAT Mean, Fwd Header Length, Fwd Header Length.1, Idle Mean, Bwd IAT Min, ACK
Flag Count, Flow IAT Std, Flow IAT Mean, Idle Min, Max Packet Length, Bwd IAT Std, Total Fwd
Packets, SubFlowFwd Packets, Packet Length Variance, Bwd Header Length, Bwd Packet Length Min,
Down/Up Ratio, Fwd IAT Std, Fwd IAT Mean, Active Min, Fwd IAT Min, Total Backward Packets,
SubflowBwd Packets, Init_Win_bytes_forward, Idle Std, Active Mean, PSH Flag Count, Total Length of
Fwd Packets
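
A minimal sketch of this selection step, assuming a MinMax-scaled feature matrix X_scaled (non-negative,
as required by the Chi-Square test), labels y, and a list feature_names of column names (all three names
are assumptions), is:

```python
from sklearn.feature_selection import SelectKBest, chi2

selector = SelectKBest(score_func=chi2, k=50)       # keep the 50 highest-scoring features
X_selected = selector.fit_transform(X_scaled, y)

# Rank the features by their Chi-Square score, highest first
ranked = sorted(zip(feature_names, selector.scores_), key=lambda t: t[1], reverse=True)
for name, score in ranked[:10]:
    print(f"{name}: {score:.2f}")
```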

Imbalanced Learning: Imbalanced classification is a classification problem in which the classes in the
training dataset are unequally represented. The degree of imbalance may vary, but modelling severely
imbalanced data may require more specialized techniques. The classification task is divided into two
settings, binary classification and multi-class classification, and the dataset is imbalanced in both.

a) Binary Classification: Binary or binomial classification uses classification rules to classify elements of
a given set into two groups. For binary classification, the IDS dataset target is labelled as either Benign or
Malware. This target is imbalanced, so the SMOTETomek method from the imblearn.combine library is
used to balance the dataset.

b) Multi-class Classification: In machine learning, multi-class or multinomial classification is the problem
of classifying instances into one of three or more classes, i.e. datasets that contain more than two target
labels. The IDS dataset labels include Benign and several malware families, so the problem can also be
framed as multi-class classification, where the classes are again imbalanced. To balance the dataset, the
RandomOverSampler method from the imblearn.over_sampling library is used, as sketched below.
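
A minimal sketch of this multi-class balancing step (X and y_multi are illustrative names for the features
and the multi-class labels):

```python
from collections import Counter
from imblearn.over_sampling import RandomOverSampler

print("Original distribution:", Counter(y_multi))

# Randomly duplicates samples of the minority classes until all classes are balanced
ros = RandomOverSampler(random_state=42)
X_bal, y_bal = ros.fit_resample(X, y_multi)

print("Balanced distribution:", Counter(y_bal))
```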

The hyper-parameter tuning techniques GridSearchCV and RandomizedSearchCV are employed to search
for the best parameters for each classifier on this dataset. The target in the dataset is classified using
both a binary-class classifier and a multi-class classifier, so the ten popular machine learning
classification models are applied in both settings. The results of these models are evaluated on various
measures such as Score, Precision, Recall, F1_score, Accuracy, and the total time (in seconds) taken by
each algorithm.
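
As a hedged sketch of this tuning step for one of the classifiers (the parameter grid below is illustrative;
the actual search spaces used in the paper are not specified):

```python
from sklearn.model_selection import GridSearchCV
from sklearn.neighbors import KNeighborsClassifier

param_grid = {"n_neighbors": [3, 5, 7, 9], "weights": ["uniform", "distance"]}

search = GridSearchCV(KNeighborsClassifier(), param_grid,
                      scoring="accuracy", cv=5, n_jobs=-1)
search.fit(X_train, y_train)        # X_train, y_train assumed to come from a train/test split

print("Best parameters:", search.best_params_)
print("Best cross-validated accuracy:", search.best_score_)
```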

4.3.1 Binary Classifier

In the Binary classifier, target samples in the CICIDS2018 dataset are classified into two classes. All
classifiers and their accuracy, precision, recall, f1 score and time are shown in Table 6. It can be observed
from Table 6 that the top three classifiers are KNeighbors (99.49%), XGBoost (99.14%) and
AdaBoost (98.75 %).

Table 6 Performance Comparison of Binary Classifiers

Classifier        Score       Precision   Recall      F1_Score    Accuracy (%)   Time (s)

Adaboost 0.987587 0.987587 0.987587 0.987587 98.75873 6.26027

Decision Tree 0.943674 0.943674 0.943674 0.943674 94.36737 0.35466

GaussianNB 0.863565 0.863565 0.863565 0.863565 86.35652 0.02468

KNeighbors 0.994993 0.994993 0.994993 0.994993 99.49932 2.03291

Logistic 0.927402 0.927402 0.927402 0.927402 92.74016 0.45754

MultinomialNB 0.649630 0.649630 0.649630 0.649630 64.96297 0.01359

RandomForest 0.951393 0.951393 0.951393 0.951393 95.13925 2.58871

SGDClassifier 0.924272 0.924272 0.924272 0.924272 92.42724 0.06629

SVM 0.935851 0.935851 0.935851 0.935851 93.58506 13.5986

XGBoost 0.991447 0.991447 0.991447 0.991447 99.14467 3.85169

Box plotting is an excellent tool for identifying outliers and comparing distributions. The Box plots chart
is shown in Figure 11. It helps us better understand and visualize how values are spaced out in different
data sets.

The ROC curve of the binary classifiers is shown in Figure 12. The quality of a model is evaluated based
on how well it distinguishes between Malware and Benign. The ROC curve is plotted using the sensitivity
(TPR) against the FPR, and the colour denotes the threshold value for each TPR/FPR pair: the higher an
instance's affinity for the class, the closer the threshold is to one and the darker the colour in the ROC plot.

The AUC (Area Under the Curve) summarizes how well the classifier separates the two classes. An AUC
value of one represents a perfect test, whereas 0.5 represents a test no better than random guessing. In
Figure 12, KNeighbors, XGBoost and AdaBoost are close to 1 and have a larger area under the curve than
all other classifiers. It can be observed from Figure 12 that KNeighbors, XGBoost and AdaBoost classified
most of the samples correctly and achieve a higher accuracy than the other classifiers.
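
A minimal sketch of producing such a ROC curve for one trained binary classifier (y_test and y_score are
assumed to be the true labels and the predicted probability of the Malicious class):

```python
import matplotlib.pyplot as plt
from sklearn.metrics import auc, roc_curve

fpr, tpr, thresholds = roc_curve(y_test, y_score)
roc_auc = auc(fpr, tpr)

plt.plot(fpr, tpr, label=f"classifier (AUC = {roc_auc:.3f})")
plt.plot([0, 1], [0, 1], linestyle="--", label="random guess (AUC = 0.5)")
plt.xlabel("False Positive Rate")
plt.ylabel("True Positive Rate (Sensitivity)")
plt.legend()
plt.show()
```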

Figure 13 depicts histograms, a type of bar chart used to show the frequency distribution of continuous
data; a class or bin indicates the number of observations within a range of values. These histograms
represent the test score, precision, recall and f1_score of each classifier.

4.3.2 Multi-class classifier

In the multi-class classifier, target samples in the CICIDS2018 dataset are classified into multiple classes.
The multi-class classifiers and their accuracy, precision, recall, f1_score and time are shown in Table 7.
The box plots and histogram plots of the multi-class classifiers are shown in Figure 14 and Figure 15.

Table 7 Performance Comparison of Multi-class Classifiers

Classifier        Score       Precision   Recall      F1_Score    Accuracy (%)   Time (s)

Adaboost 0.949037 0.949037 0.949037 0.949037 94.903704 4.45012

Decision Tree 0.928148 0.928148 0.928148 0.928148 92.814815 0.02235

GaussianNB 0.966667 0.966667 0.966667 0.966667 96.666667 0.03283

KNeighbors 0.988889 0.988889 0.988889 0.988889 98.888889 0.93272

Logistic 0.903407 0.903407 0.903407 0.903407 90.340741 2.39280

MultinomialNB 0.614963 0.614963 0.614963 0.614963 61.496296 0.01185

RandomForest 0.947556 0.947556 0.947556 0.947556 94.755556 2.67216

SVM 0.914370 0.914370 0.914370 0.914370 91.437037 3.47171

XGBoost 0.993037 0.993037 0.993037 0.993037 99.303704 53.44061

It can be observed that the XGBoost, KNeighbors and GaussianNB models perform better than the other
multi-class classifiers, with 99.30%, 98.88% and 96.66% accuracy, respectively.

5. Conclusion And Future Work


In this paper, various intrusion detection systems (IDSs) are reviewed. It is observed that signature-based
IDS are incapable of protecting network resources as the size of communication networks and the
complexity and diversity of protocols increase. In this scenario, machine learning-based IDSs outperform
signature-based IDSs. Ten popular machine learning-based IDS models, i.e., AdaBoost, Decision Tree,
GaussianNB, KNeighbors, Logistic Regression, MultinomialNB, RandomForest, SGDClassifier, SVM, and
XGBoost, are chosen, implemented and trained on the CICIDS2018 dataset to identify the best machine learning-based
IDS. The various techniques regarding the application problem domain, i.e., feature selection techniques,
tuning parameters, and evaluation criteria, are used to fine-tune the models and optimize the results. After
that, results were also compared between the Binary class classifier and the Multi-class classifier on the
CICIDS2018 dataset. It is observed that the top three IDS models, i.e., KNeighbors, XGBoost and AdaBoost,
perform best in binary-class classification with 99.49%, 99.14% and 98.75% accuracy, respectively. It is
also observed that the top three IDS models, i.e., XGBoost, KNeighbors and GaussianNB, perform best in
multi-class classification with 99.30%, 98.88% and 96.66% accuracy, respectively. Overall, the XGBoost-based
IDS model outperforms all other multi-class classification models, and the KNeighbors-based IDS model
outperforms all other binary-class classification models on the CICIDS2018 dataset, with more than 99%
accuracy.

In future work, adversarial examples that an attacker has intentionally designed to cause the model to
make a mistake can be provided as inputs to the different machine learning models to understand the
vulnerability of machine learning classifiers.

Declarations
The authors declare no conflict of interest. This research received no specific grant from any funding
agency in the public, commercial, or not-for-profit sectors.

Ethical approval

This article does not contain any studies with human participants or animals performed by any authors.

Data availability

The datasets analyzed during the current study are available in the Canadian Institute for Cybersecurity
repository.

https://www.unb.ca/cic/datasets/ids-2018.html

Contribution of Authors

The first author helped in conceptualizing the problem for the analysis and drafting the flow of the paper.
The second and third authors implemented the machine learning models with detailed analysis.

References
1. Aburomman, A.A., Ibne Reaz, M., Bin: A novel SVM-kNN-PSO ensemble method for intrusion detection
system. Appl. Soft Comput. J. 38, 360–372 (2016). https://doi.org/10.1016/j.asoc.2015.10.011
2. Acharya, N., Singh, S.: An IWD-based feature selection method for intrusion detection system. Soft.
Comput. 22(13), 4407–4416 (2018). https://doi.org/10.1007/s00500-017-2635-2
3. Al-Omari, M., Rawashdeh, M., Qutaishat, F., Alshira'H, M., Ababneh, N.: An Intelligent Tree-Based
Intrusion Detection Model for Cyber Security. J. Netw. Syst. Manage. 29(2), 20 (2021).
https://doi.org/10.1007/s10922-021-09591-y
4. Alazab, A., Hobbs, M., Abawajy, J., Alazab, M.: Using feature selection for intrusion detection system.
2012 International Symposium on Communications and Information Technologies, ISCIT 2012, 296–
301. (2012). https://doi.org/10.1109/ISCIT.2012.6380910
5. Alazzam, H., Sharieh, A., Sabri, K.E.: A feature selection algorithm for intrusion detection system
based on Pigeon Inspired Optimizer. Expert Systems with Applications, 148. (2020).
https://doi.org/10.1016/j.eswa.2020.113249
6. Aldallal, A., Alisa, F.: Effective intrusion detection system to secure data in cloud using machine
learning. Symmetry. 13(12) (2021). https://doi.org/10.3390/sym13122306
7. Aminanto, M.E., Choi, R., Tanuwidjaja, H.C., Yoo, P.D., Kim, K.: Deep abstraction and weighted feature
selection for Wi-Fi impersonation detection. IEEE Trans. Inf. Forensics Secur. (2017).
https://doi.org/10.1109/TIFS.2017.2762828
8. Amouri, A., Alaparthy, V.T., Morgera, S.D.: A machine learning based intrusion detection system for
mobile internet of things. Sens. (Switzerland). 20(2), 1–6 (2020).
https://doi.org/10.3390/s20020461
9. Ashraf, J., Latif, S.: Handling intrusion and DDoS attacks in Software Defined Networks using
machine learning techniques. National Software Engineering Conference, NSEC 2014. (2014).
https://doi.org/10.1109/NSEC.2014.6998241
10. Atli, B.G.: Anomaly-Based Intrusion Detection by Modeling Probability Distributions of Flow
Characteristics. Aalto University (2017)
11. Axelsson, S.: Intrusion Detection Systems: A Survey and Taxonomy. Technical Report, 99, 1–15.
(2000). https://doi.org/10.1.1.1.6603
12. Bennett, K.P., Demiriz, A.: Semi-supervised support vector machines. Advances in Neural Information
Processing Systems (1999)
13. Biswas, S.K.: Intrusion Detection Using Machine Learning: A Comparison Study. International Journal
of Pure and Applied Mathematics (2018)
14. Buczak, A.L., Guven, E.: A Survey of Data Mining and Machine Learning Methods for Cyber Security
Intrusion Detection. IEEE Commun. Surv. Tutorials. 18(2), 1153–1176 (2016).
https://doi.org/10.1109/COMST.2015.2494502
15. Butun, I., Morgera, S.D., Sankar, R.: A survey of intrusion detection systems in wireless sensor
networks. IEEE Commun. Surv. Tutorials. 16(1), 266–282 (2014).
https://doi.org/10.1109/SURV.2013.050113.00191
16. Chen, C., Gong, Y., Tian, Y.: Semi-supervised learning methods for network intrusion detection.
Conference Proceedings - IEEE International Conference on Systems, Man and Cybernetics. (2008).
https://doi.org/10.1109/ICSMC.2008.4811688
17. da Costa, K.A.P., Papa, J.P., Lisboa, C.O., Munoz, R., de Albuquerque, V.H.C.: Internet of Things: A
survey on machine learning-based intrusion detection approaches. Comput. Netw. 151, 147–157
(2019). https://doi.org/10.1016/j.comnet.2019.01.023

18. Dua, S., Du, X.: Data Mining and Machine Learning in Cybersecurity. In Data Mining and Machine
Learning in Cybersecurity. (2016). https://doi.org/10.1201/b10867
19. Effendy, D.A., Kusrini, K., Sudarmawan, S.: Classification of intrusion detection system (IDS) based
on computer network. 2017 2nd International Conferences on Information Technology, Information
Systems and Electrical Engineering (ICITISEE), 90–94. (2017).
https://doi.org/10.1109/ICITISEE.2017.8285566
20. Hamon, J.: Combinatorial optimization for variable selection in high dimensional regression:
Application in animal genetic. Université des Sciences et Technologie de Lille, Lille I (2013)
21. Hernandez, J.C., Duval, B., Hao, J.K.: A genetic embedded approach for gene selection and
classification of microarray data. Lecture Notes in Computer Science (Including Subseries Lecture
Notes in Artificial Intelligence and Lecture Notes in Bioinformatics), 4447 LNCS, 90–101. (2007).
https://doi.org/10.1007/978-3-540-71783-6_9
22. Hodo, E., Bellekens, X., Hamilton, A., Tachtatzis, C., Atkinson, R.: Shallow and Deep Networks Intrusion
Detection System: A Taxonomy and Survey. CoRR, abs/1701.0. (2017)
23. Imran, Jamil, F., Kim, D.: An ensemble of a prediction and learning mechanism for improving
accuracy of anomaly detection in network intrusion environments. Sustain. (Switzerland). 13(18)
(2021). https://doi.org/10.3390/su131810057
24. Imrana, Y., Xiang, Y., Ali, L., Abdul-Rauf, Z.: A bidirectional LSTM deep learning approach for intrusion
detection. Expert Syst. Appl. 185(July), 115524 (2021). https://doi.org/10.1016/j.eswa.2021.115524
25. Jabbar, M.A., Aluvalu, R., Reddy, S.S.: RFAODE: A Novel Ensemble Intrusion Detection System.
Procedia Comput. Sci. 115, 226–234 (2017). https://doi.org/10.1016/j.procs.2017.09.129
26. Khraisat, A., Gondal, I., Vamplew, P., Kamruzzaman, J.: Survey of intrusion detection systems:
techniques, datasets and challenges. Cybersecurity. 2(1), 1–22 (2019).
https://doi.org/10.1186/s42400-019-0038-7
27. Lecun, Y., Bengio, Y., Hinton, G.: Deep learning. In Nature. (2015).
https://doi.org/10.1038/nature14539
28. Liao, H.J., Lin, R., Lin, C.H., Tung, K.Y.: Intrusion detection system: A comprehensive review. In Journal
of Network and Computer Applications. (2013). https://doi.org/10.1016/j.jnca.2012.09.004
29. Lin, W.C., Ke, S.W., Tsai, C.F.: CANN: An intrusion detection system based on combining cluster
centers and nearest neighbors. Knowl. Based Syst. 78(1), 13–21 (2015).
https://doi.org/10.1016/j.knosys.2015.01.009
30. Low, C.: Understanding Wireless Attacks and Detection. In: SANS Institute InfoSec Reading Room, pp.
1–22. SANS Institute InfoSec Reading Room (2005)
31. McAfee: McAfee Labs COVID-19 Threats Report (2020)
32. Mitchell, R., Chen, I.-R.: A survey of intrusion detection techniques for cyber-physical systems. ACM
Comput. Surveys. 46(4), 1–29 (2014). https://doi.org/10.1145/2542049

33. Naseri, T.S., Gharehchopogh, F.S.: A Feature Selection Based on the Farmland Fertility Algorithm for
Improved Intrusion Detection Systems. J. Netw. Syst. Manage. 30(3) (2022).
https://doi.org/10.1007/s10922-022-09653-9
34. Niyaz, Q., Sun, W., Javaid, A.Y., Alam, M. A deep learning approach for network intrusion detection
system. EAI International Conference on Bio-Inspired Information and Communications Technologies
(BICT). (2015). https://doi.org/10.4108/eai.3-12-2015.2262516
35. Novinson, M.: The 11 Biggest Ransomware Attacks Of 2020. www.crn.com, 1–2 (2020), June
36. Phuong, T.M., Lin, Z., Altman, R.B.: Choosing SNPs using feature selection. J. Bioinform. Comput.
Biol. 4(2), 241–257 (2006). https://doi.org/10.1142/S0219720006001941
37. Resende, P.A.A., Drummond, A.C.: A survey of random forest based methods for intrusion detection
systems. ACM Comput. Surveys. 51(3) (2018). https://doi.org/10.1145/3178582
38. Sarker, I.H., Abushark, Y.B., Alsolami, F., Khan, A.I.: IntruDTree: A machine learning based cyber
security intrusion detection model. Symmetry. 12(5) (2020). https://doi.org/10.3390/SYM12050754
39. Sharafaldin, I., Lashkari, A.H., Ghorbani, A.A. Toward generating a new intrusion detection dataset
and intrusion traffic characterization. ICISSP 2018 - Proceedings of the 4th International Conference
on Information Systems Security and Privacy. (2018). https://doi.org/10.5220/0006639801080116
40. Sultana, N., Chilamkurti, N., Peng, W., Alhadad, R.: Survey on SDN based network intrusion detection
system using machine learning approaches. Peer-to-Peer Netw. Appl. 12(2), 493–501 (2019).
https://doi.org/10.1007/s12083-017-0630-0
41. Syarif, I., Prugel-Bennett, A., Wills, G.: Unsupervised Clustering Approach for Network Anomaly
Detection. Commun. Comput. Inform. Sci. (2012). https://doi.org/10.1007/978-3-642-30507-8_7
42. Tawil, A.A., Sabri, K.E. A feature selection algorithm for intrusion detection system based on Moth
Flame Optimization. 2021 International Conference on Information Technology, ICIT 2021 -
Proceedings, 377–381. (2021). https://doi.org/10.1109/ICIT52682.2021.9491690
43. Utimura, L.N., Costa, K.A. Aplicação e Análise Comparativa do Desempenho de Classificadores de
Padrões para o Sistema de Detecção de Intrusão Snort. Anais Do XXXVI Simpósio Brasileiro de
Redes de Computadores e Sistemas Distribuídos. (2018)
44. Varghese, J.E., Muniyal, B. An investigation of classification algorithms for intrusion detection
system - A quantitative approach. 2017 International Conference on Advances in Computing,
Communications and Informatics, ICACCI 2017. (2017).
https://doi.org/10.1109/ICACCI.2017.8126146
45. Vimala, S., Khanaa, V., Nalini, C.: A study on supervised machine learning algorithm to improvise
intrusion detection systems for mobile ad hoc networks. Cluster Comput. 22(s2), 4065–4074 (2019).
https://doi.org/10.1007/s10586-018-2686-x
46. Vinayakumar, R., Alazab, M., Soman, K.P., Poornachandran, P., Al-Nemrat, A., Venkatraman, S.: Deep
Learning Approach for Intelligent Intrusion Detection System. IEEE Access. 7(c), 41525–41550
(2019). https://doi.org/10.1109/ACCESS.2019.2895334

47. Warsi, S., Dubey, P.P.: Literature Review of Various Data Mining Based Techniques for Ids Data
Classification. Int. J. Innovative Res. Technol. 5(12), 68–70 (2019)
48. WHO: WHO reports fivefold increase in cyber attacks, urges vigilance. Coronavirus Disease (COVID-19)
Pandemic (2020)
49. Wu, S.X., Banzhaf, W.: The use of computational intelligence in intrusion detection systems: A review.
Appl. Soft Comput. J. 10(1), 1–35 (2010). https://doi.org/10.1016/j.asoc.2009.06.019

Figures

Figure 1

The life cycle of Machine learning-based IDS

Figure 2

CICDS2018 dataset Columns

Figure 3

CICDS2018 with NULL Values

Figure 4

CICDS2018 without NULL values

Figure 5

CICDS2018 with Correlation Map

Figure 6

CICDS2018 with Feature Score

Figure 7

Imbalance Data Binary Class

Figure 8

Balanced Data Binary Class

Figure 9

Imbalance Data Multi-Class

Figure 10
Balanced Data Multi-Class

Figure 11

Box-Plot for Binary Classifier

Figure 12

ROC Curve of Binary Classifier

Figure 13

Test Score of Binary Classifiers

Figure 14

Box-Plot of Multi-class Classifier

Figure 15

Test Score of Multi-Class Classifiers

