AI Based Threat Detection System - IEEE Report
Appikonda Shyam Sai Venkata Agastya, Bandi Rishikesh Kumar, Dandu Sasi Sathvik Varma, Gangu Chirudeep,
Kavitha C. R.*
Department of Computer Science & Engineering
Amrita School of Computing, Bengaluru
Amrita Vishwa Vidyapeetham, India
*Corresponding Author: [email protected],
[email protected], [email protected],
[email protected], [email protected].
Abstract— With the rapid growth of network cyber threats, there is a growing need for advanced, scalable, and highly accurate threat detection mechanisms. This paper presents an AI-based threat detection system that uses machine learning and feature engineering techniques to classify network traffic as either normal or malicious. The system incorporates state-of-the-art algorithms, including Gradient Boosted Trees and a Multi-Layer Perceptron, and achieves high accuracy through optimized preprocessing steps such as Principal Component Analysis and Chi-Square feature selection. A Flask application and a Python GUI provide user-friendly interfaces for testing the system through real-time prediction and validation.

Keywords— Network Threat Detection, Machine Learning, Gradient Boosted Trees, Multi-Layer Perceptron, Feature Engineering, Principal Component Analysis, Cybersecurity.

I. INTRODUCTION

Digital networks and interconnected systems have expanded rapidly, bringing advances in communications and devices. Unfortunately, this growth has also resulted in an alarming rise in the sophistication of cyber threats, from malware and phishing to large-scale Distributed Denial of Service (DDoS) attacks and advanced persistent threats (APTs). These attacks are potentially very damaging to organizations, governments, and individuals, and they are meant to compromise the confidentiality, integrity, and availability of critical data and systems.

Current rule-based intrusion detection systems (IDS) struggle to adapt to rapidly changing attack patterns and to the sheer breadth of incoming network traffic. As attackers continue to innovate, these systems face challenges in detecting novel threats, handling large-scale data, and responding quickly. In response to these limitations, Artificial Intelligence (AI) and Machine Learning (ML) offer powerful tools that can analyze complex patterns, spot anomalies, and adapt to new attack strategies.

In this paper, we present an AI Based Threat Detection System that analyzes network traffic and classifies it as normal or malicious. A range of machine learning algorithms, including ensemble models and neural networks, is used to identify known and unknown threats. To improve model performance and computational efficiency, advanced feature engineering techniques are applied, including Principal Component Analysis (PCA) for dimensionality reduction and Chi-Square feature selection. These methods allow the system to process large datasets effectively and to identify the feature characteristics that are critical to threat identification.

The system is built to be both user-friendly and scalable, and it integrates easily with testing interfaces, including a Flask-based web application and a Python GUI. These interfaces are useful in practical cybersecurity scenarios because they let the user enter feature values manually and obtain real-time predictions.

By utilizing AI and ML techniques, this work seeks to address some of the challenges of the modern cybersecurity landscape by introducing a new, robust, and scalable threat detection system.

II. LITERATURE SURVEY

Growing cyber threats have led to extensive research on AI-driven and integrated systems for cyber threat detection. These works cover diverse areas, such as financial networks and smart infrastructures, and apply novel methodologies and technologies to enhance cybersecurity. This section reviews several important works in this domain and discusses the research gaps that motivate the proposed project.

Kuldeep Singh and Lakshmi Sevukamoorthy [1] suggested strengthening the security of financial networks by combining blockchain technology with AI. The authors addressed the challenge of increasing cyber threats to financial institutions and pointed out that cybersecurity frameworks must be robust. One of their findings is the absence of comprehensive frameworks that exploit the advantages of both blockchain and AI in threat detection. Their research fills this gap by demonstrating that a secure and resilient system can be built using immutable blockchain ledgers and intelligent AI-based threat detection mechanisms.

Marc Schmitt [2] investigated AI-based malware and intrusion detection in smart infrastructures and digital industries. The study highlighted the urgent need to protect increasingly interconnected environments against sophisticated cyber threats. However, Schmitt pointed out the difficulty of bringing AI/ML models into complex digital ecosystems. The identified gap is the need for solutions that improve detection accuracy and interface seamlessly with existing infrastructures without disrupting their operations.

Yisroel Mirsky et al. [3] examined the threat of offensive AI, highlighting how AI-capable adversaries might exploit vulnerabilities in organizational systems. This research presents a structured perspective on offensive AI tactics, framed around the cyber kill chain and its implications for security. The most glaring gap the authors identified is that this emerging threat lacks effective detection solutions. The study offers strategic insights into how to defend against offensive AI, proposing ways to assess and mitigate these threats when they emerge.

Bo-Xiang Wang and Jiann-Liang Chen [4] built an AI-powered network threat detection system with 52 features derived from network interactions, including message-based, host-based, and geography-based data. The aim was to prevent command-line-based threats at the remote network connection and to improve detection accuracy and effectiveness. Although this comprehensive system was successful, the authors noted that further gains would be possible with additional optimization and refined detection algorithms.

In Software-Defined Networking (SDN), Francesco Salatino et al. [5] proposed an AI-based intrusion detection system for detecting Distributed Denial of Service (DDoS) attacks. The authors combine advanced Machine Learning (ML) and Deep Learning (DL) methods to improve detection accuracy without increasing computational complexity. The research gap is the need for better feature analysis and selection to further reduce computational requirements and improve the scalability of the system.

Shamsher Ullah et al. [6] focused on Android malware detection, noting that cyber threats against Android devices are increasing exponentially. They also pointed out that the shortcomings of current machine learning models in transparency and interpretability are especially glaring from the perspective of explainable AI (XAI). According to their study, XAI techniques help demystify the decision-making processes of ML models and supply actionable insights for end users and stakeholders.

Sonu Preetam et al. [7] proposed a behaviour-based threat modelling approach with explanations for intelligent decision making. To overcome the issues associated with traditional intrusion detection systems, the authors aimed to make them scalable and real-time. The gap they identified is the development of models that integrate diverse data sources, correlate tactics, techniques, and procedures (TTPs) with advanced AI techniques, and ultimately deliver real-time, explainable threat detection.

In the context of 5G networks, Thulitha Senevirathna et al. [8] investigated the vulnerabilities of Explainable AI (XAI) methods in Network Intrusion Detection Systems (NIDS). Their results showed that XAI methods are vulnerable to scaffolding attacks and revealed a lack of defenses against sophisticated adversarial attacks.

Jonghoon Lee et al.
[9] proposed an artificial-neural-network-based cyber threat detection system using event profiles. The work addressed the challenge of analyzing vast amounts of security event data, in which false positives are frequent and real threats are difficult to detect. The identified gap is that existing methods generally fail to generalize well across multiple datasets and do not adequately reduce false alarms.

Viraj Rathod et al. [10] applied an AI- and ML-based anomaly detection system to detect adversarial behaviors, using the EMBER dataset. They identified the need for systems that couple real-time response mechanisms with AI-driven anomaly detection, a gap that, if properly filled, would greatly improve system responsiveness and accuracy in a dynamic threat environment.

The literature shows excellent progress in the development of AI-based threat detection systems, but critical gaps remain. The limitations include a lack of integrated blockchain and AI frameworks [1], the difficulty of seamlessly deploying AI/ML models in complex ecosystems [2], and the demand for transparency and interpretability in AI models [6,8]. Further work is needed on feature analysis and selection [5], generalization across different datasets [9], and real-time response mechanisms [10] to alleviate current limitations. The proposed system addresses these gaps by developing an AI-based threat detection system that emphasizes scalability, interpretability, and real-time response. It builds on the strengths and lessons of existing works and extends existing tools with new ideas to overcome the limitations of present approaches.

III. METHODOLOGY

The proposed AI Based Threat Detection System follows a systematic methodology comprising data preprocessing, feature engineering, machine learning model training, and evaluation. Figure 1 illustrates the architecture of this methodology, showing its modular design and the flow between components. This section describes each step and the role it plays in achieving accurate and efficient threat detection.

Figure 1 Architecture

Implementation Flow

The architecture of the proposed methodology is shown in Figure 1. It starts with raw data collection and proceeds through preprocessing, feature engineering, and model training. The best-performing model is integrated into the testing interfaces, maintaining a complete workflow from data input to threat prediction.

Dataset Description

The dataset contains network traffic data with both categorical and numerical features, such as protocol types, service ports, and connection flags. Each instance is labeled as normal or malicious so that supervised machine learning models can classify threats. The challenges include high-dimensional data, imbalanced class distributions, and heterogeneous feature types.
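The dataset properties mentioned in this section (mixed categorical and numerical features, binary labels, and class imbalance) can be checked before any modelling. The following is a minimal sketch in plain Python; the records, field names, and helper functions are hypothetical illustrations and are not taken from the dataset or code used in this work.

```python
from collections import Counter

# Hypothetical records: field names and values are illustrative only.
records = [
    {"protocol_type": "tcp", "service_port": 80, "flag": "SF", "label": "normal"},
    {"protocol_type": "udp", "service_port": 53, "flag": "SF", "label": "normal"},
    {"protocol_type": "tcp", "service_port": 22, "flag": "REJ", "label": "malicious"},
    {"protocol_type": "icmp", "service_port": 0, "flag": "SF", "label": "normal"},
]


def split_feature_types(rows, label_key="label"):
    """Return (categorical, numerical) feature names, judged by value type."""
    categorical, numerical = [], []
    for name, value in rows[0].items():
        if name == label_key:
            continue
        if isinstance(value, str):
            categorical.append(name)
        else:
            numerical.append(name)
    return categorical, numerical


def class_distribution(rows, label_key="label"):
    """Return the fraction of instances that carry each class label."""
    counts = Counter(row[label_key] for row in rows)
    total = len(rows)
    return {label: count / total for label, count in counts.items()}


categorical, numerical = split_feature_types(records)
distribution = class_distribution(records)
print("categorical:", categorical)
print("numerical:", numerical)
print("class shares:", distribution)
```

A strongly skewed class share at this stage is the kind of imbalance that the dataset challenges above refer to, and it is one reason to report precision, recall, and F1 score alongside accuracy.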
Data Preprocessing

Preprocessing transforms raw or unstructured data into a structured format suitable for machine learning algorithms. Tasks include:

1. String Indexing: Transforms categorical attributes (e.g., protocol type, service flags) into numerical indices that machine learning models can handle.

2. Handling Missing Values: Data integrity is maintained by imputing or removing missing entries.

Feature Engineering

As outlined in the Introduction, feature engineering applies Principal Component Analysis (PCA) for dimensionality reduction and Chi-Square feature selection.

A variety of machine learning models are implemented to classify network traffic accurately:

3. Support Vector Machines (SVM): A binary classifier that constructs separating hyperplanes in high-dimensional data.

4. Random Forest: An ensemble of multiple decision trees for robust classification.

5. Decision Tree: A simple and interpretable model that splits the data on the most important feature at each step.

6. Naive Bayes: A fast probabilistic baseline classifier that is effective for categorical data.

To validate the system, two user-friendly testing interfaces are developed: a Flask-based web application and a Python GUI.
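The two preprocessing tasks described in this section, string indexing and missing-value handling, can be illustrated with a minimal sketch in plain Python. The function names, the frequency-descending index order, and the choice of mean imputation are assumptions made for illustration; the paper does not state which variants were used.

```python
from collections import Counter


def fit_string_index(values):
    """Map each distinct string to an integer index, most frequent first.

    Ties are broken alphabetically so that the mapping is deterministic.
    The frequency-based ordering is an assumption, not the paper's rule.
    """
    counts = Counter(v for v in values if v is not None)
    ordered = sorted(counts, key=lambda v: (-counts[v], v))
    return {value: index for index, value in enumerate(ordered)}


def impute_numeric(values):
    """Replace None entries with the mean of the observed values."""
    observed = [v for v in values if v is not None]
    mean = sum(observed) / len(observed)
    return [mean if v is None else v for v in values]


# Hypothetical columns, for illustration only.
protocols = ["tcp", "udp", "tcp", "icmp", "tcp", "udp"]
durations = [1.0, None, 3.0, 2.0, None, 2.0]

index = fit_string_index(protocols)
encoded = [index[p] for p in protocols]
filled = impute_numeric(durations)

print(index)
print(encoded)
print(filled)
```

Removing rows with missing entries, the other option named above, would simply filter out records that contain None instead of filling them.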
Figure 4 Comparison graph for accuracies on machine learning algorithms

Figure 5 Comparison graph for Precision scores on machine learning algorithms

Results when PCA is performed

Table 2 PCA Results

Algorithms               Accuracy    Precision   Recall      F1 Score
SVM                      94.536489   94.536688   94.536489   94.536585
Decision Tree            96.844181   96.850596   96.844181   96.841893
Random Forest            96.469428   96.583230   96.469428   96.458673
MLP                      99.408284   99.408357   99.408284   99.408305
Logistic Regression      94.902066   94.902875   94.902066   94.900487
Gradient-Boosted Tree    98.121814   98.133233   98.121814   98.120693

MLP: Achieved high scores across all metrics, including precision, recall, and F1-score, demonstrating its ability to generalize well with reduced dimensionality.
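To make the idea of reduced dimensionality concrete, the sketch below computes the first principal component of a small dataset with power iteration and projects the data onto it. It is a plain-Python illustration under stated assumptions: the toy data, the function names, and the use of power iteration are all hypothetical, and this is not the PCA routine used to produce the results above. In practice a library implementation would normally be used.

```python
import math


def center(rows):
    """Subtract the per-feature mean from every row."""
    n, d = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(d)]
    return [[r[j] - means[j] for j in range(d)] for r in rows]


def covariance(centered):
    """Sample covariance matrix of already-centered rows."""
    n, d = len(centered), len(centered[0])
    return [
        [sum(r[i] * r[j] for r in centered) / (n - 1) for j in range(d)]
        for i in range(d)
    ]


def first_component(cov, iterations=100):
    """Approximate the dominant eigenvector of cov by power iteration.

    Assumes cov is non-zero and the start vector is not orthogonal
    to the dominant direction.
    """
    d = len(cov)
    v = [1.0] * d
    for _ in range(iterations):
        w = [sum(cov[i][j] * v[j] for j in range(d)) for i in range(d)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    return v


def project(centered, v):
    """Project each centered row onto the unit vector v."""
    return [sum(r[j] * v[j] for j in range(len(v))) for r in centered]


# Toy data lying exactly on the line y = 2x (illustrative only).
data = [[0.0, 0.0], [1.0, 2.0], [2.0, 4.0], [3.0, 6.0]]
centered = center(data)
component = first_component(covariance(centered))
projected = project(centered, component)

print(component)
print(projected)
```

Here two correlated features collapse into one coordinate per record without losing any variance, which is the effect PCA exploits on a larger scale when features are redundant.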
Algorithms               Accuracy    Precision   Recall      F1 Score
SVM                      94.930966   94.936025   94.930966   94.932422
Decision Tree            97.100592   97.160715   97.100592   97.094559
Random Forest            96.646943   96.749411   96.646943   96.637386
MLP                      99.447732   99.448701   99.447732   99.447827
Logistic Regression      94.990138   94.994401   94.990138   94.986298
Gradient-Boosted Tree    98.560158   98.560378   98.560158   98.560233
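In the table above, the Accuracy and Recall values coincide in every row. This is what one would expect if precision, recall, and F1 score were averaged over classes with weights proportional to class support, although the averaging scheme is not stated and is assumed here. The following plain-Python sketch, with hypothetical labels and predictions, shows how the four metrics can be computed under that assumption.

```python
def weighted_metrics(y_true, y_pred):
    """Return (accuracy, precision, recall, f1) with support-weighted averaging."""
    total = len(y_true)
    pairs = list(zip(y_true, y_pred))
    accuracy = sum(t == p for t, p in pairs) / total
    precision = recall = f1 = 0.0
    for label in sorted(set(y_true)):
        tp = sum(t == label and p == label for t, p in pairs)
        predicted = sum(p == label for p in y_pred)
        support = sum(t == label for t in y_true)
        p_c = tp / predicted if predicted else 0.0
        r_c = tp / support
        f_c = 2 * p_c * r_c / (p_c + r_c) if (p_c + r_c) else 0.0
        weight = support / total  # share of instances belonging to this class
        precision += weight * p_c
        recall += weight * r_c
        f1 += weight * f_c
    return accuracy, precision, recall, f1


# Hypothetical ground truth and predictions, for illustration only.
y_true = ["normal"] * 6 + ["malicious"] * 4
y_pred = ["normal"] * 5 + ["malicious"] + ["malicious"] * 2 + ["normal"] * 2

accuracy, precision, recall, f1 = weighted_metrics(y_true, y_pred)
print(accuracy, precision, recall, f1)
```

With support weighting, the weighted recall reduces to the total number of correct predictions divided by the number of instances, so it always equals accuracy, while weighted precision and F1 score can differ from it.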
The AI Based Threat Detection System was developed to address the need for advanced, scalable, and accurate mechanisms for detecting and classifying network traffic anomalies in the presence of evolving cyber threats. The system employs state-of-the-art machine learning algorithms and robust feature engineering methods, which yield high accuracy across numerous configurations and show the system to be an effective and flexible tool for real-world operation.

Key results include Gradient Boosted Trees (GBT) performing best on raw data, with 99.6% accuracy, and the Multi-Layer Perceptron (MLP) model remaining robust under Principal Component Analysis (PCA) and Chi-Square feature selection. These results demonstrate the improvement in model accuracy, precision, recall, and F1 scores achieved by feature optimization.

The usability of the system was further validated in the development of Flask and Python GUI interfaces

[4] Wang, Bo-Xiang, Jiann-Liang Chen, and Chiao-Lin Yu. "An AI-powered network threat detection system." IEEE Access 10 (2022): 54029-54037.

[5] Salatino, Francesco, et al. "Detecting DDoS Attacks Through AI driven SDN Intrusion Detection System." 2024 IEEE 21st Consumer Communications & Networking Conference (CCNC). IEEE, 2024.

[6] Ullah, Shamsher, et al. "The revolution and vision of explainable AI for Android malware detection and protection." Internet of Things (2024): 101320.

[7] Preetam, Sonu, et al. "An Approach for Intelligent Behaviour-Based Threat Modelling with Explanations." 2023 IEEE Conference on Network Function Virtualization and Software Defined Networks (NFV-SDN). IEEE, 2023.

[8] Senevirathna, Thulitha, et al. "Deceiving Post-hoc Explainable AI (XAI) Methods in Network Intrusion Detection." 2024 IEEE 21st Consumer Communications & Networking Conference (CCNC). IEEE, 2024.