21st OITS International Conference on Information Technology (OCIT 2023)

Intrusion Detection using Explainable Machine Learning Techniques

1st Rishikant Mallick, Department of Computer Science, Kalinga Institute of Industrial Technology, Bhubaneswar, India
2nd Smriti Rout, Department of Computer Science, Siksha 'O' Anusandhan, Bhubaneswar, India
3rd Soumyabrata Biswas, Department of Computer Science, Kalinga Institute of Industrial Technology, Bhubaneswar, India
4th Lalit Vashishtha, Department of Computer Science, Kalinga Institute of Industrial Technology, Bhubaneswar, India
5th Santosh Kumar Sahu, GEOPIC, Oil and Natural Gas Corporation, Dehradun, India

979-8-3503-5823-0/23/$31.00 ©2023 IEEE | DOI: 10.1109/OCIT59427.2023.10430641

Abstract—With the rise in complex cyber threats, there is a pressing need for accurate and interpretable methods to detect intrusions effectively. This research investigates the fusion of explainable machine learning techniques with intrusion detection, aiming to improve both detection accuracy and the ability to interpret model decisions. The study involves the utilization of various explainable machine learning algorithms, such as LIME (Local Interpretable Model-Agnostic Explanations) and SHAP (SHapley Additive exPlanations), to create models that not only predict intrusions but also provide insights into the features influencing these predictions. The proposed approach is tested on benchmark intrusion detection datasets, and its performance is compared with traditional machine learning methods. The results demonstrate that the explainable machine learning-based intrusion detection approach achieves competitive detection accuracy while also offering valuable explanations for each prediction. This enhanced interpretability aids cybersecurity experts in comprehending why certain instances are flagged as intrusions. By combining accuracy and transparency, this research contributes to the development of more reliable and understandable intrusion detection systems, thereby bolstering cyber defense strategies in an increasingly intricate digital landscape.

Index Terms—Explainable Artificial Intelligence, Intrusion Detection, Cyber Security

I. INTRODUCTION

In the realm of modern cybersecurity, the integration of artificial intelligence (AI) has ushered in a new era of threat detection and mitigation. However, as AI algorithms become increasingly intricate, their decision-making processes often retreat into obscurity, raising concerns about accountability and interpretability. This has given rise to the burgeoning field of Explainable AI (XAI), which seeks to illuminate the inner workings of these algorithms and enable informed decision-making within the context of cybersecurity. This research paper embarks on a comprehensive exploration of the pivotal role that XAI plays in enhancing the effectiveness of diverse classification and regression methods within the cybersecurity domain. In an era where cyber threats have grown in sophistication, understanding the underpinnings of AI-generated predictions becomes paramount. To this end, our study encompasses a diverse array of classification and regression techniques, ranging from traditional models to state-of-the-art deep learning approaches. Each method is meticulously dissected, not only for its predictive prowess but also for its interpretability. By employing various XAI techniques, we aim to shed light on the decision-making processes of these AI models, unraveling the complex features and data interactions that drive their predictions. Furthermore, we delve into the practical implications of such interpretability, focusing on the empowerment it grants cybersecurity professionals in rapidly identifying and mitigating emerging threats.

II. LITERATURE REVIEW

A novel approach employing a deep quantum neural network based on single-qubit encoding for efficient quantum image classification was proposed in [1]. This approach mirrors the traditional convolutional neural networks (CNNs) used in classical deep learning while reducing parameter requirements compared to previous quantum image classification models. The study showcases the viability of this proposal by achieving classification accuracies of 94.6%, 89.5%, and 82.5% on subsets of the MNIST, Fashion-MNIST, and ORL face datasets, respectively, in noisy simulation environments similar to the NISQ era. Similarly, Artificial Intelligence (AI) pervades daily life, but its opacity raises concerns, particularly in cybersecurity, where trusting unexplainable AI decisions poses risks.
[2] presents a survey on the integration of Explainable Artificial Intelligence (XAI) in the realm of cyber security. It examines how AI, specifically Machine Learning (ML) and Deep Learning (DL), is employed for tasks like intrusion detection and malware identification. However, the opacity of many AI models hinders understanding of their decisions. XAI principles aim to make AI more transparent and interpretable, enhancing user trust. This survey fills a gap by focusing on XAI's application in cyber security, offering insights into challenges, frameworks, and datasets.

[3] proposes a two-stage pipeline for network intrusion detection, aiming to enhance the system's accuracy and interpretability. In the first stage, the authors used an XGBoost model for supervised detection of malicious network traffic. To explain this model's decisions, they employed the SHAP framework. In the second stage, they designed an anomaly-based system using a deep autoencoder to identify deviations from the model's behavior during training.

[4] reviews the integration of quantum and classical machine learning while addressing the rising importance of network security due to increased cyber network usage. Intrusion detection systems (IDSs) are essential for securing networks, and the paper proposes a SHAP-based framework to enhance their interpretability. This framework provides local and global explanations for IDS predictions, aiding cybersecurity experts in understanding and building trust.

The study in [5] compares quantum and classical machine learning algorithms and explores the potential of quantum computing for improving machine learning tasks. Additionally, the paper proposes an XAI model for in-vehicle intrusion detection systems (IV-IDS) to enhance trust and transparency. The proposed model, VisExp, uses the SHAP method for explanation, and a user survey shows increased trust in the AI-based IV-IDS with explanatory insights.

The study in [6] introduces a Quantum Support Vector Machine (QSVM) model for optimizing urban services like mobility, security, and healthcare. This model utilizes quantum computing capabilities to enhance tasks such as identifying DDoS attacks in smart micro-grids. Real DDoS attack data validates the model's effectiveness. The paper concludes by discussing the potential of merging quantum computing and machine learning, while acknowledging existing challenges.

The cybersecurity community is adopting Machine Learning (ML) to counter evolving threats [7]. To ensure the successful integration of these models, it is crucial for domain experts and users to understand and trust their functioning. As black-box models are increasingly used for critical predictions, the demand for transparency and explainability grows. This is especially important in cybersecurity, where detailed insights are needed beyond binary outputs. Recent research has focused on enhancing explainability methods, attacking interpreters in white-box settings, and defining explanation properties.

The study in [8] investigates quantum computing's parallelism for faster machine learning, focusing on quantum algorithms' potential in real-world applications like classification and clustering. Explainable AI (XAI) is explored in diverse disciplines, with historical shifts in explanation focus. XAI methods are classified by timing and provide global or local insights. Security and reliability are crucial for XAI adoption. Real-world XAI tests reveal the importance of explanations, despite challenges including security risks and adversarial attacks. The study offers a comprehensive security analysis of explanation use, covering attacks like membership inference, model extraction, and poisoning. A unified framework and real-world datasets and models validate the findings, highlighting cybersecurity risks tied to counterfactual explanations.

[9] reviews literature from 2017 to 2022 exploring quantum machine learning in intrusion detection systems (IDS). The authors focused on quantum algorithms, especially hybrid models like quantum support vector machines and quantum neural networks. Their findings showcased quantum's advantages, like quicker training and better accuracy in spotting malicious network activity, indicating quantum computing's potential for improving machine learning in intrusion detection. As internet complexity grows, cyberattacks on DNS rise. Traditional methods fall short, leading to AI solutions. Initially, rule-based, case-based, and ML approaches were used. Advanced ML models improved predictions but lacked transparency.

As IoT becomes more pervasive, cybersecurity challenges escalate due to constant connectivity and resource limitations [10]. The research proposes an XAI-powered framework combining Deep Learning (DL) and XAI techniques (SHAP, RuleFit, LIME) to improve the interpretability of DL-based Intrusion Detection Systems for IoT. Testing on real datasets validates its effectiveness against various attacks. Key contributions include a novel framework, a DL-based architecture for IoT security, integration of XAI methods, and validation on the NSL-KDD and UNSW-NB15 datasets. The paper's structure encompasses related works, the proposed framework, performance evaluation, and conclusions.

III. METHODOLOGY

Explainable AI (XAI) is imperative in cybersecurity to unravel the intricate decisions made by AI systems. As AI increasingly drives threat detection and response, XAI offers transparency, helping cybersecurity experts understand, validate, and trust these automated processes. The transparency provided by XAI enhances accountability, aids in detecting potential biases or vulnerabilities in models, and empowers decision-making. In a landscape where rapid, accurate responses are vital, XAI bridges the gap between advanced AI technologies and the need for comprehensible, reliable cybersecurity measures.

1) LIME (Local Interpretable Model-Agnostic Explanations): LIME is a machine learning technique that provides local, easily understandable explanations for model predictions by approximating a complex model's behavior in a simplified manner, helping to make black-box models more interpretable.
clustering [8]. Explainable AI (XAI) is explored in diverse 2) SHAP (SHapley Additive exPlanations): It is a technique
disciplines, with historical shifts in explanation focus. XAI in Explainable Artificial Intelligence (XAI) that quantifies the
methods are classified by timing and provide global/local contribution of each feature to a model’s prediction. It offers
insights. Security and reliability are crucial for XAI adoption. a unified, game-theoretic approach for explaining complex
Real-world XAI tests reveal the importance of explanations, model outcomes in a comprehensible manner.
despite challenges including security risks and adversarial
attacks. The study offers a comprehensive security analysis of Machine learning techniques involve creating algorithms
explanation use, covering attacks like membership inference, that enable computers to learn and make predictions from
model extraction, and poisoning. Unified framework, data. Explainable AI (XAI) focuses on making AI decisions
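A corresponding sketch using the shap package follows. Again, rf_model, X_test, and feature_names are assumed placeholders, and TreeExplainer is only one of several possible explainer choices (it suits the tree ensembles used later):

```python
import shap

# TreeExplainer computes Shapley values efficiently for tree
# ensembles such as the random forest assumed here.
explainer = shap.TreeExplainer(rf_model)
shap_values = explainer.shap_values(X_test)

# Global view: mean absolute SHAP value per feature across the
# test set, ranking the features the model relies on most.
shap.summary_plot(shap_values, X_test, feature_names=feature_names)
```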

565

Authorized licensed use limited to: Staffordshire University. Downloaded on March 28,2024 at 16:28:22 UTC from IEEE Xplore. Restrictions apply.
Machine learning techniques involve creating algorithms that enable computers to learn and make predictions from data. Explainable AI (XAI) focuses on making AI decisions interpretable to humans. This is crucial due to the complexity of modern AI models, ensuring accountability, complying with regulations, detecting biases, improving models, fostering human-AI collaboration, and facilitating education. XAI techniques simplify complex models, reveal decision factors, and enhance trust, making AI systems transparent, ethical, and effective across various domains.

3) Support Vector Machines (SVM): Support Vector Machines (SVM) is a supervised machine learning technique suitable for classification and regression. It excels at binary classification by determining the optimal decision boundary, or hyperplane, between data points. Especially effective in high-dimensional spaces, SVM identifies support vectors, the points closest to the decision boundary. The margin, or space around the hyperplane, is established based on these vectors.
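A minimal scikit-learn sketch of fitting such a classifier on normalized intrusion features (X_train, y_train, X_test, y_test are assumed preprocessed splits, as produced in Section IV):

```python
from sklearn.svm import SVC

# RBF-kernel SVM; feature scaling matters because the margin is
# distance-based. C trades margin width against training errors,
# and probability=True enables predict_proba for LIME.
svm_clf = SVC(kernel="rbf", C=1.0, gamma="scale", probability=True)
svm_clf.fit(X_train, y_train)
print(svm_clf.score(X_test, y_test))   # mean accuracy on held-out flows
```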
4) K-Nearest Neighbours (KNN): KNN is a supervised machine learning algorithm that can be used for classification and regression tasks. It works by finding the k most similar data points to a new data point and then assigning the new data point to the class of the majority of the k nearest neighbors. KNN is a simple and effective machine learning algorithm that can be used for a variety of tasks. It is particularly well-suited for problems where the data is not linearly separable or where the use of other machine learning algorithms is not feasible.
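An illustrative sketch under the same assumed splits:

```python
from sklearn.neighbors import KNeighborsClassifier

# Each test flow is labeled by majority vote among its 5 nearest
# training flows; normalization keeps any single feature from
# dominating the distance computation.
knn_clf = KNeighborsClassifier(n_neighbors=5)
knn_clf.fit(X_train, y_train)
y_pred = knn_clf.predict(X_test)
```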
5) Random Forest: Random Forest is a machine learning ensemble method. It combines multiple decision trees to enhance accuracy and mitigate overfitting. Each tree is trained on a random subset of the data and provides a prediction. The final outcome is determined by aggregating the predictions of all trees. This approach is valuable for classification and regression tasks due to its robustness and generalization ability.
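A sketch of this model, which is also the kind of classifier assumed as rf_model in the LIME and SHAP examples above (hyperparameters are illustrative):

```python
from sklearn.ensemble import RandomForestClassifier

# 100 trees, each grown on a bootstrap sample with random feature
# subsets at every split; class predictions are aggregated by vote.
rf_model = RandomForestClassifier(n_estimators=100, random_state=42)
rf_model.fit(X_train, y_train)
```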
6) Gradient Boosting: Gradient Boosting is a machine learning ensemble technique that sequentially builds a strong predictive model by combining the outputs of multiple weak models. It does this by emphasizing the correct prediction of instances that previous models struggled with. It minimizes errors by adjusting weights during training, creating a robust final model. Gradient boosting is widely used for tasks like classification and regression due to its ability to handle complex relationships within data.
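A corresponding sketch (again with illustrative hyperparameters):

```python
from sklearn.ensemble import GradientBoostingClassifier

# Each successive shallow tree fits the errors of the ensemble so
# far; the learning rate damps every tree's contribution.
gb_clf = GradientBoostingClassifier(
    n_estimators=100, learning_rate=0.1, max_depth=3
)
gb_clf.fit(X_train, y_train)
```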
7) Adaptive Boosting: AdaBoost is a boosting algorithm that creates a strong classifier by combining weak classifiers. It works by training a series of weak classifiers and then weighting the predictions of each classifier according to its accuracy. The final prediction is made by combining the predictions of all the classifiers.
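A matching sketch under the same assumptions:

```python
from sklearn.ensemble import AdaBoostClassifier

# Training instances are reweighted after every round so that each
# new weak learner focuses on previously misclassified flows.
ada_clf = AdaBoostClassifier(n_estimators=100, random_state=42)
ada_clf.fit(X_train, y_train)
```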
results, enhancing insights into the decision-making processes
A. NSL-KDD Dataset

In the field of network security and intrusion detection research, the NSL-KDD dataset is a crucial resource. NSL-KDD, which is derived from the KDD Cup 1999 dataset, addresses the shortcomings of its forerunner by getting rid of redundant records and adding a wider variety of attack scenarios. It includes information about network traffic that reflects both legitimate and illicit activity and is essential for the creation and evaluation of intrusion detection systems. The dataset is divided into training and testing subsets and provided with binary labels indicating the presence of attacks.

B. CICIDS 2017 Dataset

The CIC IDS 2017 dataset is derived from actual network activity and is divided into two sections: one with four attack-launching machines and the other with ten victim machines. 50 GBytes of raw data are included in this dataset in PCAP files, along with 84 features that are listed in CSV files. It includes a sizable 2,830,743 instances and captures network activity over a 5-day period. The dataset divides network traffic into 15 groups, including both regular traffic and 14 different attack methods.

C. UNSW-NB15 Dataset

Raw network packets from the UNSW-NB15 dataset were painstakingly assembled in the Cyber Range Lab of UNSW Canberra, fusing real-world contemporary activities with artificially generated attack behaviors using the IXIA PerfectStorm program. The tcpdump tool was used to capture 100 GB of raw traffic, producing PCAP files. The dataset includes class labels produced by twelve algorithms and 49 features. A segmentation produces a training set and a testing set, with 175,341 and 82,332 records respectively, that include various attack and typical instance records.

IV. IMPLEMENTATION

In the implementation phase of our research, we utilized three distinct datasets: CICIDS 17, NSL-KDD, and UNSW-NB15. To ensure the reliability of our subsequent analyses, we initiated the process with comprehensive data cleaning to remove any inconsistencies or errors present in the datasets. Subsequently, we performed rigorous normalization to standardize the data, ensuring that it is on a consistent scale and format. Additionally, we strategically partitioned the data, possibly into training, validation, and testing sets, to ensure that the model's performance assessment is accurate and representative of real-world scenarios.
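The paper does not publish its exact preprocessing code; a plausible sketch of the described steps (cleaning, normalization, stratified splitting) might look as follows. The file name and label column are assumptions:

```python
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import MinMaxScaler

# Load one benchmark dataset (the file name is an assumption; the
# datasets ship as CSV exports with a label column).
df = pd.read_csv("nsl_kdd.csv")

# Data cleaning: drop duplicate records and rows with missing values.
df = df.drop_duplicates().dropna()

X = df.drop(columns=["label"])   # assumed binary label column
y = df["label"]

# Stratified split preserves the benign/malicious ratio in both sets.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=42
)

# Normalization: fit min-max scaling on the training portion only,
# then apply the same transform to the test portion.
scaler = MinMaxScaler()
X_train = scaler.fit_transform(X_train)
X_test = scaler.transform(X_test)
```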

Within this framework, we incorporated eXplainable Artificial Intelligence (XAI) techniques, including LIME and SHAP. LIME allowed us to provide local, interpretable explanations for individual model predictions, helping us understand how our machine learning algorithms arrived at specific outcomes. SHAP, on the other hand, offered a holistic view by quantifying the feature contributions to model predictions. These steps ensured our research produced interpretable and transparent results, enhancing insights into the decision-making processes of these models. This comprehensive approach exemplifies the sequence of steps that our study followed, showcasing how these initial data preparation steps, coupled with XAI techniques, fit within the broader process.

In our pursuit of meaningful results, we tailored our approach to evaluation techniques. Specifically, we selected evaluation methods that align with the distinctive characteristics of each dataset. This strategy ensured that our findings are both statistically robust and contextually relevant within the context of our data cleaning, normalization, and incorporation of XAI techniques such as LIME and SHAP.
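The specific evaluation methods are not enumerated in the text; a hedged sketch of a standard evaluation pass over the five classifiers fitted above (the metric choice is an assumption, with binary labels presumed) could be:

```python
from sklearn.metrics import (
    accuracy_score, precision_score, recall_score, f1_score
)

# Score every fitted classifier on the held-out split of a dataset.
models = {"RF": rf_model, "GB": gb_clf, "kNN": knn_clf,
          "AdaBoost": ada_clf, "SVM": svm_clf}
for name, model in models.items():
    y_pred = model.predict(X_test)
    print(f"{name}: acc={accuracy_score(y_test, y_pred):.3f}  "
          f"prec={precision_score(y_test, y_pred):.3f}  "
          f"rec={recall_score(y_test, y_pred):.3f}  "
          f"f1={f1_score(y_test, y_pred):.3f}")
```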

[Fig. 1: Data flow diagram of the study. Intrusion datasets (NSL-KDD, UNSW and CICIDS) pass through data pre-processing and feature reduction (data cleaning, outlier removal, normalization, feature reduction), then through conventional learning models (Random Forest, Gradient Boosting, kNN, AdaBoost, SVM) and model explanations, yielding benign-class or malicious-class decisions.]

V. RESULTS AND DISCUSSIONS

In light of the growing threat of cyberattacks, our research has effectively tackled the vital requirement for precise and transparent intrusion detection systems. Our work provides a promising path in the field of cybersecurity by fusing the capabilities of interpretable models like LIME and SHAP with the power of machine learning. In addition to maintaining high detection accuracy, this integration outperforms traditional black-box techniques by providing insight into the variables affecting model predictions. It highlights the possibility of a paradigm change in intrusion detection and stresses the significance of accuracy and openness. Although there are still issues with scalability and model complexity, our work advances our understanding of intrusion detection. The combination of transparency and accuracy can yield more effective cyber defense measures, which in turn can strengthen cybersecurity and allow for a proactive response to the increasingly complex and dynamic character of contemporary cyber threats.

[Fig. 2: Results of (a) RF, (b) GB, (c) kNN, (d) AdaBoost, (e) SVM using the NSL-KDD dataset.]

[Fig. 3: Results of (a) RF, (b) GB, (c) kNN, (d) AdaBoost, (e) SVM using the UNSW-NB15 dataset.]

[Fig. 4: Results of (a) RF, (b) GB, (c) kNN, (d) AdaBoost, (e) SVM using the CICIDS-17 dataset.]
VI. CONCLUSION

This study addressed the imperative need for accurate, reliable, and transparent intrusion detection systems in the face of escalating cyber threats. By marrying the power of machine learning with the interpretability of explainable models, this research has illuminated a promising path forward in the field of cybersecurity.
The investigation showcased the successful integration of explainable machine learning algorithms, such as LIME and SHAP, into intrusion detection models. This integration not only maintained competitive levels of detection accuracy but also transcended traditional black-box approaches by providing insights into the factors driving model predictions. This breakthrough significantly enhances the trustworthiness of intrusion detection systems, as cybersecurity experts can now comprehend the decision-making process behind flagged intrusions.

The results underscore the potential of explainable machine learning techniques to revolutionize intrusion detection strategies. However, it is important to acknowledge the ongoing challenges, including fine-tuning model complexity, scalability, and adaptation to evolving threat landscapes.

Ultimately, this study contributes to the paradigm shift from mere detection to comprehensive understanding. The synergy between accuracy and transparency holds the key to more effective cyber defense mechanisms. As organizations strive to safeguard their digital assets, the adoption of these explainable machine learning techniques could play a pivotal role in fortifying their cybersecurity posture and fostering a proactive response to the dynamic and sophisticated nature of modern cyber threats.

REFERENCES

[1] Nicola Capuano, Giuseppe Fenza, Vincenzo Loia and Claudio Stanzione, "Explainable Artificial Intelligence in CyberSecurity," IEEE Access, vol. 10, pp. 93575-93600, 2022, doi: 10.1109/ACCESS.2022.3204171.
[2] Zhibo Zhang, Hussam Al Hamadi, Ernesto Damiani and Chan Yeob Yeun, "Explainable Artificial Intelligence Applications in Cyber Security," IEEE Access, vol. 10, pp. 93104-93139, 2022, doi: 10.1109/ACCESS.2022.3204051.
[3] Pieter Barnard, Nicola Marchetti and Luiz A. DaSilva, "Robust Network Intrusion Detection Through Explainable Artificial Intelligence (XAI)," IEEE Networking Letters, vol. 4, pp. 167-171, doi: 10.1109/LNET.2022.3186589.
[4] E. H. Houssein, Z. Abohashima, M. Elhoseny and W. M. Mohamed, "An Explainable Machine Learning Framework for Intrusion Detection Systems," IEEE Access, vol. 10, Oct. 2022, doi: 10.1109/ACCESS.2022.3208573.
[5] Carolina Sanchez Hernandez, Samuel Ayo and Dimitrios Panagiotakopoulos, "Experimental Analysis of Trustworthy In-Vehicle Intrusion Detection System Using eXplainable Artificial Intelligence (XAI)," 2021 IEEE/AIAA 40th Digital Avionics Systems Conference (DASC), doi: 10.1109/DASC52595.2021.9594341.
[6] Carolina Sanchez Hernandez, Samuel Ayo and Dimitrios Panagiotakopoulos, "An Explainable Artificial Intelligence (xAI) Framework for Improving Trust in Automated ATM Tools," Energies (Basel), Nov. 2021, doi: 10.1109/DASC52595.2021.9594341.
[7] Aditya Kuppa and Nhien-An Le-Khac, "Black Box Attacks on Explainable Artificial Intelligence (XAI) Methods in Cyber Security," 2020, doi: 10.1109/IJCNN48605.2020.9206780.
[8] Aditya Kuppa and Nhien-An Le-Khac, "Adversarial XAI Methods in Cybersecurity," IEEE Transactions on Information Forensics and Security, vol. 16, pp. 4924-4938, 2021, doi: 10.1109/TIFS.2021.3117075.
[9] Nida Aslam, Fatima M. Anis, Irfan Ullah Khan, Samiha Mirza, Alanoud AlOwayed, Reef M. Aljuaid and Reham Baageel, "Interpretable Machine Learning Models for Malicious Domains Detection Using Explainable Artificial Intelligence (XAI)," 2022, doi: 10.3390/su14127375.
[10] Zakaria Abou El Houda, Bouziane Brik and Lyes Khoukhi, "Why Should I Trust Your IDS?," IEEE Open Journal of the Communications Society, vol. 3, pp. 1164-1176, July 2022, doi: 10.1109/OJCOMS.2022.3188750.

