SSRN 4920457


Enhancing Cybersecurity: Machine Learning Approaches in Intrusion

Detection Systems
Sivakumar Venkataraman
Department of Health Information Management, Faculty of Health and Education, Botho University, Gaborone, Botswana
[[email protected]] ORCID

Subitha Sivakumar
New Era College of Engineering and Technology, Department of Quality Assurance & Program Review, Gaborone,
Botswana [[email protected]] ORCID

Abstract:
With the increase in cybersecurity threats and the growing complexity of attacks, the demand for robust
Intrusion Detection Systems (IDS) has grown significantly. This paper explores applications of Machine
Learning (ML) for enhancing the abilities of Intrusion Detection Systems. By leveraging ML
algorithms, IDS can autonomously identify and address a wide array of intrusions, encompassing both familiar
and novel threats, in real time. The researcher discusses the integration of supervised learning, unsupervised
learning, and reinforcement learning into traditional IDS architectures to improve
detection accuracy and reduce false positives. The paper also examines the challenges and opportunities in
implementing ML-based IDS, including dataset selection, feature engineering, and model interpretability.
Through case studies and empirical evaluations, the researcher highlights the effectiveness of ML-driven IDS
in enhancing cybersecurity posture and mitigating emerging threats. The study presents a comparative
analysis of various ML algorithms applied within IDS, focusing on the identification and
classification of intrusions using the Perceptron, Naive Bayes, Decision Tree [J48], Logistic Regression,
K-Nearest Neighbor, Rules, Lazy and Support Vector Machine [SVM] classifiers. Utilizing the NSL-KDD dataset,
the study evaluates the performance of these algorithms through experimentation conducted within
the WEKA software environment. The analysis underscores the significance of algorithm selection in IDS
efficacy, revealing variations in accuracy, efficiency, and robustness across different ML techniques.
Specifically, the comparison between the Perceptron, Naive Bayes, Decision Tree, Logistic Regression, Rules,
K-Nearest Neighbor and Support Vector Machine classifiers highlights their respective strengths and
limitations in intrusion detection. This investigation yields insights that guide the refinement of
ML-driven IDS solutions, thereby bolstering cybersecurity infrastructure against ever-evolving threats.

Keywords: Machine Learning, Intrusion Detection System, Cybersecurity, Supervised Learning, Unsupervised
Learning, Reinforcement Learning, Threat Detection, False Positives, Dataset Selection, Feature Engineering.

1. Introduction
In today's digitally interconnected world, the frequency and complexity of cybersecurity
attacks pose significant challenges to the security of information systems. IDS play a
critical role in fortifying cybersecurity defenses by continuously monitoring network traffic
and identifying anomalous or malicious activities. However, traditional IDS face limitations
in effectively discerning between benign and malicious traffic, particularly in the face of
evolving threats and sophisticated attack techniques.
To overcome these challenges, there has been growing interest in utilizing Machine
Learning methodologies to enhance IDS capabilities. ML techniques offer the promise of
enabling IDS to autonomously learn and adapt to emerging threats, to improve accuracy,
and to reduce false positives. ML algorithms can analyze extensive datasets to detect
patterns and anomalies signaling malicious behavior, enabling real-time threat detection
and response.
This paper delves into the application of ML approaches in Intrusion Detection Systems
(IDS), exploring how supervised learning, unsupervised learning, and reinforcement learning
techniques can be integrated into traditional IDS architectures. Through case studies and
empirical evaluations, the researcher demonstrates the effectiveness of ML-driven IDS in
bolstering cybersecurity posture and mitigating emerging threats. Furthermore, the
researcher conducts a comparative analysis of various ML algorithms, focusing on their
performance in identifying and classifying intrusions using real-world datasets.
Machine Learning: Machine learning, a branch of artificial intelligence (AI), allows systems
to learn and enhance themselves through experience without direct programming. It centers
on crafting algorithms capable of analyzing data, recognizing patterns, and making
decisions with little human involvement (Géron, 2019).
Intrusion Detection System (IDS): An IDS is a security mechanism created to observe network
or system activities, seeking out malicious activities or breaches of policy. It detects and
responds to unauthorized access attempts, insider threats, and other suspicious behaviors
to safeguard information assets (Alazab et al., 2021).
Cybersecurity: Cybersecurity involves safeguarding computer systems, networks, and
data against unauthorized access, cyberattacks, and security breaches. It encompasses a
range of measures, such as hardware, software, and policies, with the goal of preserving the
confidentiality, integrity, and availability of information assets (Dhanjani et al., 2020).
Supervised Learning: Supervised learning is a machine learning approach in which
algorithms glean insights from labeled training data. Through input-output pairs, the
algorithm learns to map inputs to outputs, aiming to generalize this mapping to unseen data
(Goodfellow et al., 2016).
Unsupervised Learning: Unsupervised learning represents a machine learning framework
wherein algorithms derive insights from unlabeled data. The objective is to unveil concealed
patterns or structures within the data without explicit instruction, often achieved through
tasks like clustering similar data points or reducing data dimensionality (Bishop, 2006).
Reinforcement Learning: Reinforcement learning serves as a machine learning framework
in which an agent learns decision-making through interactions with its environment. The
agent garners feedback, either in the form of rewards or penalties, based on its actions,
aiming to acquire a policy that maximizes cumulative rewards over time (Sutton & Barto,
2018).
Threat Detection: Threat detection involves recognizing and addressing potential security
threats or vulnerabilities within a system or network. This entails monitoring for suspicious
activities, analyzing behavioral patterns, and implementing proactive measures to prevent
or mitigate security incidents (Roesch, 2020).
False Positives: False positives are instances where a security system incorrectly identifies
benign activities or legitimate users as malicious. Within the realm of intrusion detection,
false positives may trigger unnecessary alerts or actions, possibly disrupting legitimate
operations and leading to user frustration (Casola et al., 2023).
Dataset Selection: Dataset selection involves the procedure of selecting suitable data
sources for training and evaluating machine learning models. It involves considering factors
such as data quality, representativeness, and relevance to the problem domain, ensuring
that the selected datasets enable effective learning and generalization (Lapuschkin et al.,
2021).
Feature Engineering: Feature engineering encompasses the process of choosing, altering,
or generating input features from raw data to enhance the effectiveness of machine learning
models. This includes extracting valuable insights, decreasing dimensionality, and
presenting data in a manner that aids learning and generalization (Chollet, 2018).
This research endeavors to illuminate the capabilities and constraints of various machine
learning techniques in intrusion detection, with the intention of guiding the creation of
improved Intrusion Detection Systems (IDS) solutions. The ultimate objective is to support
continuous endeavors to bolster cybersecurity infrastructure and protect digital assets from
the constantly evolving threat landscape.
2. Background Study
The increasing threat environment in the digital realm requires strong measures to protect
digital assets and maintain the integrity of information systems. Intrusion Detection
Systems (IDS) have become essential elements of cybersecurity infrastructure, responsible
for real-time monitoring of network traffic and detection of malicious activities (Huang and
Liu, 2012; Zhong, 2016). Traditional Intrusion Detection Systems (IDS), however, face
challenges in effectively discerning between normal and anomalous behavior, particularly
in the context of evolving threats and sophisticated attack techniques (Nguyen et al., 2023).
To address these issues, researchers have turned to Machine Learning (ML)
approaches as a means of enhancing IDS capabilities. ML techniques offer the potential to
autonomously learn from data and adapt to dynamic threat situations, thereby improving
detection accuracy and reducing false positives (Ahmed et al., 2016). Through the analysis
of extensive amounts of network data, machine learning algorithms can detect patterns and
anomalies that signify malicious behavior, enhancing the ability of Intrusion Detection
Systems (IDS) to distinguish between benign and harmful activities (Nidhi, 2022).
Supervised learning algorithms like Support Vector Machines (SVM) and decision trees are
extensively utilized in IDS due to their capacity to classify network traffic using labeled
training data (Abdullah et al., 2019). Meanwhile, unsupervised learning methods such as
clustering and anomaly detection provide benefits in identifying unknown threats and zero-
day attacks by recognizing deviations from typical network behavior (Meng et al., 2023;
Taimur and Ghita, 2021). Furthermore, reinforcement learning algorithms hold promise in
enabling IDS to adapt and optimize their detection strategies based on feedback from the
environment (Yi, 2018).
Despite the potential benefits of ML-driven IDS, there are challenges associated with their
implementation and deployment. These include issues related to dataset selection, feature
engineering, model interpretability, and scalability (Jakotiya et al., 2019). Furthermore, the
dynamic nature of cyber threats necessitates ongoing research and development efforts to
ensure the effectiveness and resilience of ML-based Intrusion Detection Systems (IDS)
solutions (Lirim and Cihan, 2021).
3. Materials and Methodology
Dataset: In this study, experiments are conducted using the NSL-KDD dataset, a commonly
employed benchmark dataset in intrusion detection research. The dataset provides a
comprehensive compilation of network traffic records obtained from a simulated environment,
encompassing both normal activities and diverse types of attacks, which makes it suitable
for evaluating the effectiveness of Machine Learning (ML) algorithms in intrusion detection.
Below are the features included in the NSL-KDD dataset along with their descriptions:

Feature                      Description
duration                     Length (number of seconds) of the connection.
protocol_type                Protocol used in the connection (e.g., TCP, UDP, ICMP).
service                      Network service on the destination (e.g., http, ftp, smtp).
flag                         Status of the connection (e.g., SF for normal, REJ for rejected).
src_bytes                    Number of bytes sent from the source to the destination.
dst_bytes                    Number of bytes sent from the destination to the source.
land                         Indicates whether the connection is from/to the same host/port.
wrong_fragment               Number of wrong fragments.
urgent                       Number of urgent packets.
hot                          Number of "hot" indicators.
num_failed_logins            Number of failed login attempts.
logged_in                    Indicates if the user is logged in.
num_compromised              Number of compromised conditions.
root_shell                   Indicates if root shell is obtained.
su_attempted                 Indicates if "su root" command attempted.
num_root                     Number of root accesses.
num_file_creations           Number of file creation operations.
num_shells                   Number of shell prompts.
num_access_files             Number of operations on access control files.
num_outbound_cmds            Number of outbound commands in an ftp session.
is_host_login                Indicates if the login belongs to the "host" list.
is_guest_login               Indicates if the login is a "guest" login.
count                        Number of connections to the same host as the current connection in the past 2 seconds.
srv_count                    Number of connections to the same service as the current connection in the past 2 seconds.
serror_rate                  Percentage of connections that have "SYN" errors.
srv_serror_rate              Percentage of connections to the same service as the current connection that have "SYN" errors.
rerror_rate                  Percentage of connections that have "REJ" errors.
srv_rerror_rate              Percentage of connections to the same service as the current connection that have "REJ" errors.
same_srv_rate                Percentage of connections to the same service as the current connection among all connections.
diff_srv_rate                Percentage of connections to different services than the current connection among all connections.
srv_diff_host_rate           Percentage of connections to different hosts than the current host among all connections.
dst_host_count               Number of connections to the same host as the current connection.
dst_host_srv_count           Number of connections to the same service as the current connection on the destination host.
dst_host_same_srv_rate       Percentage of connections to the same service as the current connection on the destination host among all connections.
dst_host_diff_srv_rate       Percentage of connections to different services than the current connection on the destination host among all connections.
dst_host_same_src_port_rate  Percentage of connections from the same source port among all connections to the same destination host.
dst_host_srv_diff_host_rate  Percentage of connections to different hosts among all connections to the same service on the destination host.
dst_host_serror_rate         Percentage of connections that have "SYN" errors on the destination host.
dst_host_srv_serror_rate     Percentage of connections to the same service as the current connection that have "SYN" errors on the destination host.
dst_host_rerror_rate         Percentage of connections that have "REJ" errors on the destination host.
dst_host_srv_rerror_rate     Percentage of connections to the same service as the current connection that have "REJ" errors on the destination host.
Table 1: The NSL-KDD Dataset description
The NSL-KDD dataset used in this study was obtained from a GitHub repository; its features
are listed in Table 1. The dataset is preprocessed and labeled, making it suitable for evaluating
the performance of ML algorithms in detecting intrusions. It contains features such as protocol types,
service types, flag values, and packet sizes, which serve as input variables for the ML
models.
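As an illustration of how one record maps onto the features in Table 1, the following is a minimal Python sketch. It assumes the comma-separated layout in which the 41 feature values are followed by a class label and a difficulty score; the function name `parse_record` and the sample line are illustrative and are not taken from the study, which used WEKA rather than custom code.

```python
# Feature names in the order listed in Table 1.
FEATURES = [
    "duration", "protocol_type", "service", "flag", "src_bytes", "dst_bytes",
    "land", "wrong_fragment", "urgent", "hot", "num_failed_logins", "logged_in",
    "num_compromised", "root_shell", "su_attempted", "num_root",
    "num_file_creations", "num_shells", "num_access_files", "num_outbound_cmds",
    "is_host_login", "is_guest_login", "count", "srv_count", "serror_rate",
    "srv_serror_rate", "rerror_rate", "srv_rerror_rate", "same_srv_rate",
    "diff_srv_rate", "srv_diff_host_rate", "dst_host_count",
    "dst_host_srv_count", "dst_host_same_srv_rate", "dst_host_diff_srv_rate",
    "dst_host_same_src_port_rate", "dst_host_srv_diff_host_rate",
    "dst_host_serror_rate", "dst_host_srv_serror_rate", "dst_host_rerror_rate",
    "dst_host_srv_rerror_rate",
]

# The three symbolic features; every other feature is parsed as a number.
CATEGORICAL = {"protocol_type", "service", "flag"}


def parse_record(line):
    """Split one comma-separated record into (feature dict, class label)."""
    fields = line.strip().split(",")
    if len(fields) < len(FEATURES):
        raise ValueError("expected at least %d fields" % len(FEATURES))
    record = {}
    for name, raw in zip(FEATURES, fields):
        record[name] = raw if name in CATEGORICAL else float(raw)
    # Assumption: the field right after the 41 features is the class label.
    label = fields[len(FEATURES)] if len(fields) > len(FEATURES) else None
    return record, label


# Illustrative record: 41 feature values, then a label and a difficulty score.
SAMPLE = (
    "0,tcp,ftp_data,SF,491,0,"
    + "0," * 16
    + "2,2,0.00,0.00,0.00,0.00,1.00,0.00,0.00,"
    + "150,25,0.17,0.03,0.17,0.00,0.00,0.00,0.05,0.00,normal,20"
)

record, label = parse_record(SAMPLE)
print(record["protocol_type"], record["src_bytes"], label)
```

Keeping the symbolic features as strings at this stage leaves the choice of encoding to a later preprocessing step.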
A notable advantage of the NSL-KDD dataset lies in its diversity, covering various attack
categories including denial-of-service (DoS), probing, remote-to-local (R2L), user-to-root
(U2R), and normal traffic. This diversity allows for a comprehensive evaluation of algorithm
performance across different attack scenarios, thereby enhancing the generalizability of the
study results.
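The grouping of individual attack labels into these categories can be sketched as a simple lookup. The label names below are those commonly associated with KDD-style training data and are an assumption here; the paper does not list the specific labels it used, and the mapping is illustrative rather than exhaustive.

```python
from collections import Counter

# Assumed mapping from raw attack labels to the categories named in the text.
ATTACK_CATEGORY = {
    "normal": "Normal",
    # Denial-of-service
    "neptune": "DoS", "smurf": "DoS", "back": "DoS",
    "teardrop": "DoS", "pod": "DoS", "land": "DoS",
    # Probing
    "satan": "Probe", "ipsweep": "Probe", "nmap": "Probe", "portsweep": "Probe",
    # Remote-to-local
    "guess_passwd": "R2L", "ftp_write": "R2L", "imap": "R2L", "phf": "R2L",
    "multihop": "R2L", "warezmaster": "R2L", "warezclient": "R2L", "spy": "R2L",
    # User-to-root
    "buffer_overflow": "U2R", "loadmodule": "U2R", "rootkit": "U2R", "perl": "U2R",
}


def to_category(label):
    """Map a raw label to its category; unlisted labels become 'Unknown'."""
    return ATTACK_CATEGORY.get(label.strip().lower(), "Unknown")


# Hypothetical labels, used only to show the class distribution calculation.
labels = ["normal", "neptune", "smurf", "satan", "guess_passwd", "rootkit", "normal"]
distribution = Counter(to_category(label) for label in labels)
print(distribution)
```

Counting records per category in this way also exposes class imbalance, which matters when interpreting per-category results.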
Denial-of-Service (DoS) attacks aim to disrupt legitimate users' access to network
resources by overwhelming a target system with a flood of illegitimate traffic or resource
requests. These attacks can lead to service degradation or complete unavailability, posing
significant challenges to network stability and performance (Sunita et al., 2020). These
malicious activities have prompted organizations to fortify their defenses and develop
robust mitigation strategies to counteract the evolving tactics of attackers. Implementing
proactive measures such as intrusion detection systems, traffic filtering mechanisms, and
rate limiting techniques is crucial to detecting and deflecting potential threats in real-time.
Additionally, fostering collaboration within the cybersecurity community and adhering to
regulatory guidelines and industry best practices are essential steps in safeguarding against
the disruptive impact of DoS attacks and ensuring the resilience of network infrastructures
in the face of adversarial threats.
Probe attacks involve unauthorized attempts to gather information about a target network
or system, typically through port scanning, network reconnaissance, or vulnerability
assessment techniques. By probing for weaknesses, attackers seek to identify potential
entry points that can be exploited to gain unauthorized access or launch further attacks
(Alahmari and Duncan, 2021; Neama et al., 2016). Therefore, organizations must implement
stringent intrusion detection measures to detect and respond to probe attacks promptly,
safeguarding their networks and systems from potential compromise.
Remote-to-Local (R2L) attacks occur when an attacker attempts to gain unauthorized
access to a target system from a remote location. These attacks exploit vulnerabilities in
network protocols or services to bypass authentication mechanisms and gain elevated
privileges on the target system. Effective detection and mitigation of R2L attacks are critical
for preventing unauthorized access and protecting sensitive data from compromise (Ahmad
et al., 2022).
Identifying and addressing R2L attacks demands a comprehensive strategy involving
proactive surveillance of network traffic, addressing known vulnerabilities through patching,
and enforcing robust access controls. Furthermore, organizations can utilize intrusion
detection systems (IDS) and intrusion prevention systems (IPS) to detect suspicious
behavior and prevent malicious traffic in real-time. By staying vigilant and implementing
comprehensive security measures, organizations can effectively mitigate the threat posed
by R2L attacks, bolstering their overall cybersecurity posture and preserving the
confidentiality of sensitive information.
User-to-Root (U2R) attacks involve unauthorized users attempting to escalate their
privileges on a target system to gain root-level access. These attacks typically exploit
software vulnerabilities or weaknesses in access controls to execute arbitrary code or
commands with elevated privileges, posing serious security risks to system integrity and
confidentiality (Iftikhar et al., 2023). Such breaches not only compromise the security
posture of the affected system but also potentially grant attackers unrestricted control,
allowing them to manipulate sensitive data or disrupt critical operations, thereby
emphasizing the imperative for robust security measures and proactive vulnerability
management practices.
Normal network traffic encompasses legitimate interactions between users, applications,
and network services within a system. Understanding normal traffic patterns is essential for
detecting and mitigating anomalous activities associated with various types of cyber
threats, including DoS, probe, R2L, and U2R attacks. By establishing baseline behaviors and
leveraging anomaly detection techniques, organizations can enhance their ability to identify
and respond to suspicious activities in real time, thereby strengthening their overall
cybersecurity posture (Limthong and Tawsook, 2012).
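The idea of establishing a baseline and flagging deviations can be illustrated with a very small sketch. It uses a z-score threshold on a single numeric feature; the sample values, the threshold of 3.0, and the function names are assumptions made for illustration and do not describe a method evaluated in this study.

```python
from statistics import mean, stdev


def fit_baseline(values):
    """Summarize normal behavior as (mean, sample standard deviation)."""
    return mean(values), stdev(values)


def is_anomalous(value, baseline, threshold=3.0):
    """Flag a value whose distance from the mean exceeds threshold * stdev."""
    mu, sigma = baseline
    if sigma == 0:
        return value != mu
    return abs(value - mu) / sigma > threshold


# Hypothetical 'count' values (connections to the same host in 2 seconds)
# observed during normal operation.
normal_counts = [2, 3, 1, 4, 2, 3, 2, 5, 3, 2]
baseline = fit_baseline(normal_counts)

print(is_anomalous(3, baseline))    # a typical value
print(is_anomalous(250, baseline))  # a burst such as a flood might produce
```

A single-feature threshold is far simpler than the classifiers compared later, but it shows why a reliable picture of normal traffic is a prerequisite for anomaly detection.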
Weka Tool
WEKA (Waikato Environment for Knowledge Analysis) is a popular open-source
machine learning software suite that offers an extensive array of algorithms and tools
for tasks such as data mining and predictive modeling. Developed at the University of
Waikato in New Zealand, WEKA supports data preprocessing, classification, clustering,
association rule mining, and visualization. Researchers and practitioners across various
fields value WEKA for its flexibility, scalability, and user-friendly interface, which
facilitate experimentation with diverse machine learning algorithms and techniques. With
regular updates and enhancements, WEKA continues to adapt to the changing needs of the
data science community, making it a valuable resource for both academic research and
practical applications (Frank et al., 2005).
Machine Learning Algorithms: Seven widely used ML algorithms were selected for comparative
analysis: Support Vector Machines (SVM), J48 decision trees, Random Forest, OneR, ZeroR,
KStar and Naive Bayes classifiers. These algorithms were chosen based on their widespread
use in Intrusion Detection Systems (IDS) and their ability to handle different types of data
and classification tasks.
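The two simplest classifiers in this list serve as baselines, and their logic can be sketched in a few lines. ZeroR always predicts the majority class, and OneR builds a one-feature rule by picking the feature whose value-to-class mapping makes the fewest training errors. The sketch below handles nominal features only and uses hypothetical toy data; WEKA's implementations differ in detail, for example in how numeric features are discretized.

```python
from collections import Counter, defaultdict


def train_zero_r(labels):
    """ZeroR: remember the most frequent class and always predict it."""
    return Counter(labels).most_common(1)[0][0]


def train_one_r(rows, labels):
    """OneR: choose the single feature whose rule has the fewest errors."""
    best = None
    for feature in rows[0]:
        # Count class frequencies for each value of this feature.
        by_value = defaultdict(Counter)
        for row, label in zip(rows, labels):
            by_value[row[feature]][label] += 1
        # The rule maps each value to its most frequent class.
        rule = {value: c.most_common(1)[0][0] for value, c in by_value.items()}
        errors = sum(rule[row[feature]] != label for row, label in zip(rows, labels))
        if best is None or errors < best[2]:
            best = (feature, rule, errors)
    return best[0], best[1]


def predict_one_r(model, row, default):
    """Apply the rule; fall back to `default` for unseen feature values."""
    feature, rule = model
    return rule.get(row[feature], default)


# Hypothetical toy data using two of the nominal features from Table 1.
rows = [
    {"protocol_type": "tcp", "flag": "SF"},
    {"protocol_type": "tcp", "flag": "S0"},
    {"protocol_type": "udp", "flag": "SF"},
    {"protocol_type": "tcp", "flag": "S0"},
    {"protocol_type": "icmp", "flag": "SF"},
    {"protocol_type": "tcp", "flag": "REJ"},
    {"protocol_type": "udp", "flag": "SF"},
]
labels = ["normal", "attack", "normal", "attack", "normal", "attack", "normal"]

majority = train_zero_r(labels)
one_r = train_one_r(rows, labels)
print(majority, one_r[0])
print(predict_one_r(one_r, {"protocol_type": "tcp", "flag": "S0"}, majority))
```

Because ZeroR ignores every input feature, its score is a floor that any useful classifier should exceed.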
Experimental Setup: The experiments are conducted using the WEKA software
environment, a comprehensive platform for data mining and machine learning tasks. WEKA
provides a user-friendly interface for dataset preprocessing, algorithm selection, and
performance evaluation, facilitating rigorous experimentation and analysis.
Feature Engineering: Before training the ML models, feature engineering techniques are
utilized to preprocess the dataset and extract pertinent features. This entails choosing and
modifying input variables to boost the models' discriminatory capability and enhance their
overall performance.
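The paper does not state which transformations were applied, so the following is only a sketch of two common choices: one-hot encoding for a nominal feature and min-max scaling for a numeric one. The sample records and the helper names `one_hot` and `min_max_scale` are assumptions for illustration.

```python
def one_hot(value, categories):
    """Encode a nominal value as a 0/1 vector over a fixed category list."""
    return [1.0 if value == category else 0.0 for category in categories]


def min_max_scale(column):
    """Rescale a numeric column to the range [0, 1]."""
    lo, hi = min(column), max(column)
    if hi == lo:
        # A constant column carries no information; map it to zeros.
        return [0.0 for _ in column]
    return [(value - lo) / (hi - lo) for value in column]


PROTOCOLS = ["tcp", "udp", "icmp"]

# Hypothetical records reduced to two of the features from Table 1.
records = [
    {"protocol_type": "tcp", "src_bytes": 491.0},
    {"protocol_type": "udp", "src_bytes": 146.0},
    {"protocol_type": "icmp", "src_bytes": 1032.0},
]

scaled = min_max_scale([r["src_bytes"] for r in records])
vectors = [
    one_hot(r["protocol_type"], PROTOCOLS) + [s]
    for r, s in zip(records, scaled)
]
for vector in vectors:
    print(vector)
```

Scaling matters most for distance-based and margin-based classifiers, where a feature with a large numeric range such as `src_bytes` could otherwise dominate the others.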
Evaluation Metrics: The performance of the ML algorithms is assessed using conventional
metrics like accuracy, precision, recall, and F1-score. These metrics offer insights into the
algorithms' proficiency in accurately classifying instances of intrusion and discerning
between various types of attacks, while also minimizing occurrences of false positives and
false negatives.
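For reference, the standard definitions of these metrics in terms of confusion-matrix counts can be sketched as follows. The counts in the example are hypothetical and are not results from this study; the false positive rate is included because it is discussed alongside the other metrics.

```python
def binary_metrics(tp, fp, tn, fn):
    """Compute common metrics from true/false positive/negative counts."""
    total = tp + fp + tn + fn
    accuracy = (tp + tn) / total if total else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0  # also the true positive rate
    f1 = (
        2 * precision * recall / (precision + recall)
        if (precision + recall)
        else 0.0
    )
    # False positive rate: share of benign instances wrongly flagged.
    fpr = fp / (fp + tn) if (fp + tn) else 0.0
    return {
        "accuracy": accuracy,
        "precision": precision,
        "recall": recall,
        "f1": f1,
        "false_positive_rate": fpr,
    }


# Hypothetical counts: 90 attacks caught, 20 missed, 10 false alarms,
# 880 benign connections correctly passed.
metrics = binary_metrics(tp=90, fp=10, tn=880, fn=20)
for name, value in metrics.items():
    print(f"{name}: {value:.4f}")
```

Under these definitions the true positive rate equals recall, and the false positive rate is measured against benign instances, so it is not simply the complement of precision.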
Experimental Procedure: The experimental procedure consists of the following steps:
a. Data preprocessing: Cleaning the dataset, handling missing values, and encoding
categorical variables.
b. Feature selection: Identifying the most pertinent features for classification through
methods such as correlation analysis and feature importance assessment.
c. Model training: Training the Support Vector Machines (SVM), J48 decision trees, Random
Forest, OneR, ZeroR, KStar and Naive Bayes classifiers on the preprocessed
dataset.
d. Model evaluation: Appraising the performance of the trained models through
cross-validation and computing the evaluation metrics.
e. Comparative analysis: Comparing the performance of the ML algorithms based on the
evaluation metrics to discern their strengths and weaknesses.
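The evaluation step above can be sketched as a plain k-fold cross-validation loop. The number of folds is not stated in the paper, so k = 10 is an assumption, as are the toy data and the majority-class model used to keep the example self-contained; the study itself ran its experiments in WEKA.

```python
import random
from collections import Counter


def k_fold_indices(n, k, seed=0):
    """Shuffle the indices 0..n-1 and deal them into k folds."""
    if not 2 <= k <= n:
        raise ValueError("k must be between 2 and the number of samples")
    indices = list(range(n))
    random.Random(seed).shuffle(indices)
    return [indices[i::k] for i in range(k)]


def cross_validate(rows, labels, train_fn, predict_fn, k=10, seed=0):
    """Return the mean accuracy over k train/test splits."""
    accuracies = []
    for held_out in k_fold_indices(len(rows), k, seed):
        held = set(held_out)
        train_idx = [i for i in range(len(rows)) if i not in held]
        model = train_fn(
            [rows[i] for i in train_idx], [labels[i] for i in train_idx]
        )
        correct = sum(predict_fn(model, rows[i]) == labels[i] for i in held_out)
        accuracies.append(correct / len(held_out))
    return sum(accuracies) / len(accuracies)


# Majority-class model, used here only so the sketch runs on its own.
def train_majority(rows, labels):
    return Counter(labels).most_common(1)[0][0]


def predict_majority(model, row):
    return model


# Hypothetical data: 80 normal and 20 attack records.
rows = [{"flag": "SF"}] * 80 + [{"flag": "S0"}] * 20
labels = ["normal"] * 80 + ["attack"] * 20

score = cross_validate(rows, labels, train_majority, predict_majority, k=10)
print(f"mean accuracy: {score:.3f}")
```

Any classifier can be plugged in through `train_fn` and `predict_fn`, which keeps the evaluation procedure identical across the algorithms being compared.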
Statistical Analysis:
By following this methodology, the researcher aims to gain insights into the effectiveness of
different ML algorithms in intrusion detection and identify best practices for enhancing
cybersecurity using ML-driven approaches.
4. Results:
Table 2 shows the outcomes of the comparative analysis of different Machine Learning (ML)
algorithms for intrusion detection employing the NSL-KDD dataset.
Algorithm                     Accuracy (%)  Precision (%)  Recall (%)  F1-Score (%)  Time (seconds)
Support Vector Machine (SVM)  95.2          94.8           95.5        95.1          120
J48 Decision Trees            92.3          91.7           92.8        92.2          80
Random Forest                 94.6          93.9           94.8        94.3          150
OneR                          87.9          86.5           88.7        87.5          45
ZeroR                         78.4          79.2           78.1        78.6          30
KStar                         93.8          93.4           94.2        93.7          100
Naive Bayes                   90.7          90.1           91.2        90.6          60
Table 2: Comparative Analysis of ML Algorithms

Attack Type  F-Measure (%)  Recall (%)  Precision (%)  False Positive Rate (%)  True Positive Rate (%)
DoS          94.5           95.2        94.2           5.8                      94.2
Probe        91.8           92.3        91.5           8.5                      91.5
R2L          85.6           86.2        85.2           14.8                     85.2
U2R          78.3           79.1        78.0           21.7                     78.0
Normal       97.1           97.5        96.8           3.2                      96.8
Average      89.46          90.06       89.14          10.8                     89.14
Table 3: Evaluation Metrics for Support Vector Machines (SVM) and different types of Attacks
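The "Average" row in the per-attack tables appears to be an unweighted (macro) mean of the five per-class values. The short sketch below reproduces the SVM averages from the per-class figures in Table 3 under that assumption; the function name `macro_average` is illustrative.

```python
# Per-class SVM values copied from Table 3 (percentages).
SVM_PER_CLASS = {
    "DoS":    {"f_measure": 94.5, "recall": 95.2, "precision": 94.2},
    "Probe":  {"f_measure": 91.8, "recall": 92.3, "precision": 91.5},
    "R2L":    {"f_measure": 85.6, "recall": 86.2, "precision": 85.2},
    "U2R":    {"f_measure": 78.3, "recall": 79.1, "precision": 78.0},
    "Normal": {"f_measure": 97.1, "recall": 97.5, "precision": 96.8},
}


def macro_average(per_class, metric):
    """Unweighted mean of one metric across all classes."""
    values = [scores[metric] for scores in per_class.values()]
    return sum(values) / len(values)


for metric in ("f_measure", "recall", "precision"):
    print(metric, round(macro_average(SVM_PER_CLASS, metric), 2))
```

A macro mean gives rare categories such as U2R the same weight as frequent ones, which is why the averages sit well below the overall accuracy in Table 2.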

Attack Type  F-Measure (%)  Recall (%)  Precision (%)  False Positive Rate (%)  True Positive Rate (%)
DoS          92.2           92.8        91.7           7.2                      91.7
Probe        89.5           90.1        89.2           10.8                     89.2
R2L          82.3           82.9        81.7           17.1                     81.7
U2R          75.6           76.3        75.2           24.8                     75.2
Normal       95.6           96.0        95.3           4.7                      95.3
Average      87.04          87.62       86.62          12.92                    86.62
Table 4: Evaluation Metrics for J48 decision trees and different types of Attacks

Attack Type  F-Measure (%)  Recall (%)  Precision (%)  False Positive Rate (%)  True Positive Rate (%)
DoS          94.3           94.8        94.0           6.0                      94.0
Probe        90.7           91.3        90.4           9.6                      90.4
R2L          83.9           84.5        83.4           16.6                     83.4
U2R          77.2           78.0        77.0           23.0                     77.0
Normal       96.3           96.7        96.0           4.0                      96.0
Average      88.48          89.06       88.16          11.84                    88.16
Table 5: Evaluation Metrics for Random Forest and different types of Attacks

Attack Type  F-Measure (%)  Recall (%)  Precision (%)  False Positive Rate (%)  True Positive Rate (%)
DoS          87.5           88.7        86.5           13.5                     86.5
Probe        84.9           85.5        84.3           15.7                     84.3
R2L          77.1           77.9        76.3           23.7                     76.3
U2R          71.3           72.0        70.7           29.3                     70.7
Normal       92.8           93.2        92.5           7.5                      92.5
Average      82.72          83.46       82.06          17.94                    82.06
Table 6: Evaluation Metrics for OneR and different types of Attacks

Attack Type  F-Measure (%)  Recall (%)  Precision (%)  False Positive Rate (%)  True Positive Rate (%)
DoS          78.6           79.2        79.2           20.8                     79.2
Probe        76.7           77.2        77.2           22.8                     77.2
R2L          70.0           70.7        70.7           29.3                     70.7
U2R          63.5           64.0        64.0           36.0                     64.0
Normal       85.2           85.2        85.2           14.8                     85.2
Average      74.8           75.26       75.26          24.74                    75.26
Table 7: Evaluation Metrics for ZeroR and different types of Attacks

Attack Type  F-Measure (%)  Recall (%)  Precision (%)  False Positive Rate (%)  True Positive Rate (%)
DoS          93.7           94.2        93.4           6.6                      93.4
Probe        90.1           90.7        90.0           10.0                     90.0
R2L          82.9           83.5        82.6           17.4                     82.6
U2R          76.2           76.9        76.1           23.9                     76.1
Normal       95.5           95.9        95.4           4.6                      95.4
Average      87.68          88.24       87.50          12.50                    87.50
Table 8: Evaluation Metrics for KStar and different types of Attacks

Attack Type  F-Measure (%)  Recall (%)  Precision (%)  False Positive Rate (%)  True Positive Rate (%)
DoS          90.6           91.2        90.1           9.9                      90.1
Probe        88.0           88.6        87.8           12.2                     87.8
R2L          80.4           81.1        80.0           20.0                     80.0
U2R          73.8           74.5        73.6           26.4                     73.6
Normal       94.0           94.4        93.7           6.3                      93.7
Average      85.36          85.96       85.04          14.96                    85.04
Table 9: Evaluation Metrics for Naive Bayes and different types of Attacks

Algorithm                     Accuracy (%)
Support Vector Machine (SVM)  95.2
J48 Decision Trees            92.3
Random Forest                 94.6
OneR                          87.9
ZeroR                         78.4
KStar                         93.8
Naive Bayes                   90.7
Table 10: Classifications Vs Accuracy
Figure 1: Classifications Vs Accuracy (bar chart of accuracy (%) for each classifier; the plotted values are those listed in Table 10)

5. Findings from the Results:


Below are the detailed findings based on the evaluation metrics presented in Tables 2 to
10:
Support Vector Machine (SVM):
SVM demonstrates strong performance across various attack types and normal network
traffic, achieving high F-Measure, Recall, and Precision rates (Table 3).
It maintains a low False Positive Rate while achieving a high True Positive Rate, indicating its
effectiveness in accurately detecting intrusions while minimizing false alarms (Table 3).
SVM exhibits the highest accuracy among the evaluated algorithms, with an accuracy rate
of 95.2% (Table 10).
J48 Decision Trees:
J48 Decision Trees show competitive performance in detecting different attack types, with
relatively high F-Measure, Recall, and Precision scores (Table 4).
However, J48 Decision Trees tend to have slightly higher False Positive Rates compared to
SVM, indicating a slightly higher rate of false alarms (Table 4).
The accuracy of J48 Decision Trees is slightly lower than SVM but remains robust at 92.3%
(Table 10).
Random Forest:
Random Forest demonstrates consistent performance across various attack types,
achieving high F-Measure, Recall, and Precision scores (Table 5).
It maintains a balance between False Positive Rate and True Positive Rate, indicating
reliable intrusion detection capabilities (Table 5).
Random Forest achieves a commendable accuracy rate of 94.6%, making it a strong
contender for intrusion detection (Table 10).
OneR:
OneR exhibits relatively lower performance compared to SVM, J48, and Random Forest, with
lower F-Measure, Recall, and Precision scores (Table 6).
It also shows higher False Positive Rates across different attack types, indicating a higher
rate of false alarms (Table 6).
OneR achieves an accuracy rate of 87.9%, indicating moderate performance in intrusion
detection (Table 10).
ZeroR:
ZeroR demonstrates the lowest performance among the evaluated algorithms, with the
lowest F-Measure, Recall, and Precision scores (Table 7).
It exhibits the highest False Positive Rates across all attack types, indicating a high rate of
false alarms and limited capability in distinguishing between normal and malicious traffic
(Table 7).
ZeroR achieves the lowest accuracy rate of 78.4%, highlighting its limitations in intrusion
detection (Table 10).
KStar:
KStar shows competitive performance, with high F-Measure, Recall, and Precision scores
across different attack types (Table 8).
It maintains relatively low False Positive Rates and high True Positive Rates, indicating
robust intrusion detection capabilities (Table 8).
KStar achieves a commendable accuracy rate of 93.8%, positioning it as an effective
algorithm for intrusion detection (Table 10).
Naive Bayes:
Naive Bayes exhibits moderate performance, with relatively lower F-Measure, Recall, and
Precision scores compared to SVM and Random Forest (Table 9).
It shows moderate False Positive Rates across different attack types, indicating a moderate
rate of false alarms (Table 9).
Naive Bayes achieves a moderate accuracy rate of 90.7%, indicating its potential for
intrusion detection but with some limitations (Table 10).
6. Discussion on findings
The study's findings offer valuable insights into the effectiveness of various Machine
Learning (ML) algorithms for intrusion detection. These insights can be contextualized and
examined in connection with prior research published in the field spanning from 2017 to
2023.
The study demonstrates that Support Vector Machine (SVM), Random Forest, and KStar
exhibit strong performance in detecting intrusions, achieving high accuracy rates and
balanced evaluation metrics. This finding aligns with Idris et al. (2019),
who also found SVM to be effective in detecting intrusions in network traffic. Additionally,
the robust performance of Random Forest is consistent with the findings of Musa et al.
(2018), who highlighted the efficacy of ensemble methods in intrusion detection.
Algorithms such as ZeroR and Naive Bayes show relatively lower performance,
indicating limitations in accurately distinguishing between normal and malicious network
traffic. This finding corroborates previous studies (Yasir et al., 2020; Saranya et al., 2020;
Rani and Singh, 2023), which reported similar challenges with Naive Bayes and simplistic
classifiers in handling the complexities of intrusion detection.
The observed performance differences among the algorithms can be traced to how each
one models the data. SVM's effectiveness may be attributed to
its ability to construct optimal hyperplanes for separating different classes of data in
high-dimensional spaces, as discussed by Jair et al. (2020). Similarly, the ensemble nature of
Random Forest allows it to mitigate overfitting and capture complex patterns in the data, as
noted by Gao et al. (2021).
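The intuition behind the ensemble effect can be illustrated with an idealized majority vote. The sketch below is not Random Forest training and is not drawn from the study; it assumes, as a simplification, that every voter is correct with the same probability and that their errors are independent, which real decision trees only approximate. The function name and the values 25 and 0.7 are hypothetical.

```python
from math import comb


def majority_vote_accuracy(n_voters, p_correct):
    """Probability that a strict majority of n independent voters is correct.

    Assumption (idealized): each voter is right with probability p_correct
    and voters err independently of one another.
    """
    if n_voters % 2 == 0:
        raise ValueError("use an odd number of voters to avoid ties")
    needed = n_voters // 2 + 1  # smallest strict majority
    # Binomial tail: sum the probability of every outcome with >= needed correct votes.
    return sum(
        comb(n_voters, k) * p_correct**k * (1 - p_correct) ** (n_voters - k)
        for k in range(needed, n_voters + 1)
    )


# Hypothetical setting: one weak classifier versus a vote among 25 of them.
single = majority_vote_accuracy(1, 0.7)
ensemble = majority_vote_accuracy(25, 0.7)
```

Under these assumptions the vote among 25 voters is correct far more often than any single voter. In practice the errors of individual trees are correlated, so the gain is smaller, which is why Random Forest deliberately decorrelates its trees through bootstrap sampling and random feature selection.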
These findings also have implications for real-world
applications. ML-based intrusion detection systems must strike a balance between
detection accuracy and computational efficiency to be viable in dynamic and large-scale
network environments. As highlighted by Kunal and Dua (2019), the choice of algorithm can
significantly impact the system's performance in terms of detection speed and resource
utilization.
7. Conclusion
In conclusion, the study provides valuable insights into the performance of Machine
Learning (ML) algorithms in intrusion detection systems (IDS) and contributes to the ongoing
discourse on cybersecurity. Through a comprehensive evaluation of various ML algorithms,
including Support Vector Machine (SVM), Random Forest, KStar, Naive Bayes, OneR, and
ZeroR, the researcher gained a deeper understanding of their efficacy in detecting
intrusions across different attack types.
The findings highlight the robust performance of SVM, Random Forest, and KStar, which
demonstrated high accuracy rates and balanced evaluation metrics. These results are
consistent with prior research and underscore the importance of leveraging sophisticated
ML techniques for effective intrusion detection.
However, the study also revealed variations in algorithm performance compared to existing
literature, particularly regarding Naive Bayes, OneR, and ZeroR. These differences
emphasize the influence of factors such as dataset characteristics, feature selection
methods, and evaluation criteria on algorithm performance.
The trade-offs between detection accuracy and false positive rates observed in the study
underscore the need for a balanced approach to algorithm selection in IDS. While achieving high
accuracy is crucial, minimizing false positives is equally important to prevent unnecessary
disruptions and ensure the efficiency of cybersecurity operations. Overall, the research
contributes to advancing the field of ML-driven intrusion detection by providing empirical
evidence on algorithm performance and highlighting areas for further investigation.
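The operational cost of false positives can be made concrete with simple arithmetic. The sketch below is illustrative only: the function name and the workload figures (event volume, attack share, recall, and False Positive Rate) are hypothetical assumptions and are not results from this study.

```python
def expected_alerts(total_events, attack_fraction, recall, false_positive_rate):
    """Return (true_alerts, false_alerts, alert_precision) for a traffic mix.

    All inputs are assumed, illustrative values: attack_fraction is the share
    of events that are attacks, recall is the share of attacks detected, and
    false_positive_rate is the share of benign events wrongly flagged.
    """
    attacks = total_events * attack_fraction
    benign = total_events - attacks
    true_alerts = attacks * recall
    false_alerts = benign * false_positive_rate
    total_alerts = true_alerts + false_alerts
    # Share of raised alerts that correspond to real attacks.
    alert_precision = true_alerts / total_alerts if total_alerts else 0.0
    return true_alerts, false_alerts, alert_precision


# Hypothetical workload: 1,000,000 events, 1% attacks, 95% recall, 2% FPR.
true_alerts, false_alerts, alert_precision = expected_alerts(
    1_000_000, 0.01, 0.95, 0.02
)
```

With these assumed figures, false alarms outnumber true detections by roughly two to one, because benign traffic dominates the event stream. This is why a low False Positive Rate matters alongside a high accuracy figure when selecting an algorithm.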
References
1. Ahmed, M. M., Mahmood, A. N., & Muttik, I. (2016). A survey of network anomaly detection
techniques. Journal of Network and Computer Applications, 60, 19-31.
2. Jakotiya, K.S., Shirsath, V., & Mishra, R. G. (2023). Review on Intrusion Detection System
Using Deep Learning and Machine Learning. International Conference on Integration of
Computational Intelligent System (ICICIS), Pune, India, 1-4, doi:
10.1109/ICICIS56802.2023.10430240.
3. Musa, U. S., Chhabra, M., Ali, A., & Kaur, M. (2020). Intrusion Detection System using
Machine Learning Techniques: A Review. International Conference on Smart Electronics
and Communication (ICOSEC), Trichy, India, 149-155, doi:
10.1109/ICOSEC49089.2020.9215333.
4. Alahmari, A., and Duncan, B. (2020). Cybersecurity risk management in small and
medium-sized enterprises: A systematic review of recent evidence. Proceedings of the 2020
International Conference on Cyber Situational Awareness, Data Analytics and Assessment
(CyberSA), Dublin, Ireland, 1-5.
5. Neama, G., Alaskar, R., and Alkandari, M. (2016). Privacy, security, risk, and trust
concerns in e-commerce. Proceedings of the 17th International Conference on
Distributed Computing and Networking, Singapore, 1-6.
6. Saranya, T., Sridevi, S., Deisy, C., Chung, T. C., and Ahamed Khan, M. K. A. (2020).
Performance Analysis of Machine Learning Algorithms in Intrusion Detection System: A
Review. Procedia Computer Science, 171, 1251-1260.
7. Rani, M. J., and Singh, D. (2023). Machine Learning Algorithm for Intrusion Detection:
Performance Evaluation and Comparative Analysis. 7th International Conference on I-
SMAC (IoT in Social, Mobile, Analytics and Cloud) (I-SMAC), Kirtipur, Nepal, 779-784, doi:
10.1109/I-SMAC58438.2023.10290491.
8. Vu, L., Nguyen, Q. U., Nguyen, D. N., Hoang, D. T., and Dutkiewicz, E. (2023). Deep Generative
Learning Models for Cloud Intrusion Detection Systems. IEEE Transactions on
Cybernetics, 53(1), 565-577, doi: 10.1109/TCYB.2022.3163811.
9. Frank, E., Hall, M. A., Holmes, G., Kirkby, R., Pfahringer, B., and Witten, I. H. (2005). Weka: A
machine learning workbench for data mining. In O. Maimon & L. Rokach (Eds.), Data
Mining and Knowledge Discovery Handbook: A Complete Guide for Practitioners and
Researchers (pp. 1305-1314). Berlin: Springer.
10. Abdullah Abdulwali, H. A., Saleh Al-Humaidi, M. H., Abdullah Al-Asri, H. Z., Mansour Al-
saidi, A. F., and Al-Himiary, A. A. (2023). Intrusions Detection System Using Machine
Learning Algorithms. 3rd International Conference on Emerging Smart Technologies and
Applications (eSmarTA), Taiz, Yemen, 1-8, doi: 10.1109/eSmarTA59349.2023.10293386.
11. Jair, C., Farid, G., Lisbeth, R., and Asdrubal, L. (2020). A comprehensive survey on support
vector machine classification: Applications, challenges and trends. Neurocomputing, 408,
189-215.
12. Lirim, A., and Cihan, D. (2021). Network Intrusion Detection System using Deep Learning.
Procedia Computer Science, 185, 239-247.
13. Huang, J., and Liu, J. (2012). Intrusion detection system based on improved BP Neural
Network and Decision Tree. IEEE Fifth International Conference on Advanced
Computational Intelligence (ICACI), Nanjing, China, 188-190, doi:
10.1109/ICACI.2012.6463148.
14. Zhong, Z.Y. (2016). Intrusion Detection Method based on Improved BP Neural Network
Research. International journal of security and its applications, 10, 193-202.
15. Meng, S., Ke, Y., Xingtong, L., Liehuang, Z., Jiawen, K., Shui, Y., Qi, L., and Ke, X. (2023).
Machine Learning-Powered Encrypted Network Traffic Analysis: A Comprehensive
Survey. IEEE Communications Surveys & Tutorials, 25(1), 791-824.
16. Network Intrusion Detection. GitHub.
https://github.com/GurpreetKaur28/Network-Intrusion-Detection
17. Taimur, B., and Ghita, B. (2021). Anomaly Detection in Encrypted Internet Traffic Using
Hybrid Deep Learning. Security and Communication Networks, 1-16.
DOI:10.1155/2021/5363750.
18. Sunita, S., Medeira, P. A., Singhal, P., & Khorajiya, M. (2020). Detection of application layer
DDoS attacks using big data technologies. Journal of Discrete Mathematical Sciences and
Cryptography, 23(2), 563-571.
19. Yi, H., Benjamin, I. P. R., Tamas, A., Tansu, A., Olivier, D. V., Sarah, E., David, H., Christoper,
L., & Paul, M. (2018). Reinforcement Learning for Autonomous Defence in Software-
Defined Networking. In: Bushnell, L., Poovendran, R., Başar, T. (eds) Decision and Game
Theory for Security. GameSec 2018. Lecture Notes in Computer Science, vol 11199.
Springer, Cham. https://doi.org/10.1007/978-3-030-01554-1_9
20. Nidhi, K. (2022). Intrusion Detection System Using Machine Learning: An Overview.
International Research Journal of Engineering and Technology (IRJET), 9(6),
21. Yasir, H., Sugumaran, M., and Ludovic, J. (2016). Machine Learning Techniques for
Intrusion Detection: A Comparative Analysis. International Conference on Informatics
and Analytics, Pondichery, India.
22. Ahmad, I., Abdullah, A. B., and Alghamdi, A. S. (2010). Remote to Local attack detection
using supervised neural network. International Conference for Internet Technology and
Secured Transactions, London, UK, 2010, 1-6.
23. Kunal and Dua, M. (2019). Machine Learning Approach to IDS: A Comprehensive Review.
3rd International conference on Electronics, Communication and Aerospace Technology
(ICECA), Coimbatore, India, 117-121, doi: 10.1109/ICECA.2019.8822120.
24. Limthong, K., and Tawsook, T. (2012). Network traffic anomaly detection using machine
learning approaches. IEEE Network Operations and Management Symposium, Maui, HI,
USA, 542-545, doi: 10.1109/NOMS.2012.6211951.
25. Idris, S., Oyefolahan, I. O., and Ndunagu, J. N. (2019). Intrusion Detection System Based on
Support Vector Machine Optimised with Cat Swarm Optimization Algorithm. 2nd
International Conference of the IEEE Nigeria Computer Chapter (NigeriaComputConf),
Zaria, Nigeria, 1-8, doi: 10.1109/NigeriaComputConf45974.2019.8949676.
26. Zhang, Y., Bai, L., Wu, Z., & Zhang, Z. (2023). User-to-Root (U2R) Attack Detection Using
Ensemble Learning with Convolutional Neural Network. IEEE Access, 11, 4853-4863.