Leveraging Metaheuristics For Feature Selection With Machine Learning Classification For Malicious Packet Detection in Computer Networks

Received 9 January 2024, accepted 30 January 2024, date of publication 5 February 2024, date of current version 14 February 2024.

Digital Object Identifier 10.1109/ACCESS.2024.3362246

Leveraging Metaheuristics for Feature Selection With Machine Learning Classification for Malicious Packet Detection in Computer Networks

AGANITH SHANBHAG 1, SHWETA VINCENT 2 (Member, IEEE), S. B. BORE GOWDA 1, OM PRAKASH KUMAR 1, AND SHARMILA ANAND JOHN FRANCIS 3

1 Department of Electronics and Communication Engineering, Manipal Institute of Technology, Manipal Academy of Higher Education, Manipal, Karnataka 576104, India
2 Department of Mechatronics, Manipal Institute of Technology, Manipal Academy of Higher Education, Manipal, Karnataka 576104, India
3 Department of Computer Science, King Khalid University, Abha 61421, Saudi Arabia

Corresponding authors: Shweta Vincent ([email protected]) and Om Prakash Kumar ([email protected])

ABSTRACT Robust Intrusion Detection Systems (IDS) are increasingly necessary in the age of big data
due to the growing volume, velocity, and variety of data generated by modern networks. Metaheuristic
algorithms offer a promising way to enhance IDS performance through optimal feature selection.
Combining these algorithms with machine learning (ML) to build an IDS makes it possible
to improve detection accuracy, reduce false positives and negatives, and make network
monitoring more efficient. Our study proposes using metaheuristic algorithms together with ML classifiers for
feature selection, to reduce the number of features drawn from a computer network traffic data set. We have
tested several combinations of metaheuristic algorithms, viz., Genetic Algorithm (GA), Particle Swarm Optimization
(PSO) and Grey Wolf Optimizer (GWO), with ML algorithms, viz., Decision Tree (DT), Random
Forest (RF), Gaussian Naïve Bayes (GNB) and Logistic Regression (LR). The combinations
have been tested on the NSL-KDD and kddcupdata_10% data sets. We have drawn several insights on
feature selection scores with respect to test scores, F1 scores, recall and precision for the various algorithm
combinations. The feature selection time has also been highlighted to showcase the fastest-performing
combinations. Ultimately, we present three combinations of algorithms, each suited to a different set of
organizational IDS requirements.

INDEX TERMS Feature selection, intrusion detection system, metaheuristic algorithms, space complexity,
time complexity.
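
The abstract describes pairing a metaheuristic search with an ML classifier to pick a feature subset. The sketch below is only a rough illustration of that wrapper idea and is not the authors' implementation. It assumes a small synthetic two-class data set in place of NSL-KDD, a hand-written Gaussian Naive Bayes as the classifier, and a basic genetic algorithm with truncation selection, one-point crossover and bit-flip mutation. The names `make_data`, `gnb_accuracy` and `ga_select`, every parameter value, and the per-feature penalty in the fitness function are hypothetical choices made for this example.

```python
import math
import random

def make_data(n_samples=200, n_features=10, n_informative=3, seed=0):
    """Synthetic stand-in data: only the first n_informative features carry signal."""
    rng = random.Random(seed)
    X, y = [], []
    for _ in range(n_samples):
        label = rng.randint(0, 1)
        X.append([rng.gauss(2.0 * label if j < n_informative else 0.0, 1.0)
                  for j in range(n_features)])
        y.append(label)
    return X, y

def gnb_accuracy(X_train, y_train, X_test, y_test, mask):
    """Fit a tiny Gaussian Naive Bayes on the masked features; return test accuracy."""
    cols = [j for j, keep in enumerate(mask) if keep]
    if not cols:
        return 0.0  # an empty feature subset cannot classify anything
    stats = {}
    for c in set(y_train):
        rows = [x for x, t in zip(X_train, y_train) if t == c]
        means = [sum(r[j] for r in rows) / len(rows) for j in cols]
        variances = [sum((r[j] - m) ** 2 for r in rows) / len(rows) + 1e-9
                     for j, m in zip(cols, means)]
        stats[c] = (math.log(len(rows) / len(y_train)), means, variances)

    def log_posterior(c, x):
        prior, means, variances = stats[c]
        return prior + sum(-0.5 * math.log(2 * math.pi * v) - (x[j] - m) ** 2 / (2 * v)
                           for j, m, v in zip(cols, means, variances))

    correct = sum(max(stats, key=lambda c: log_posterior(c, x)) == t
                  for x, t in zip(X_test, y_test))
    return correct / len(y_test)

def ga_select(X_train, y_train, X_val, y_val, pop_size=20, generations=15,
              mutation_rate=0.1, penalty=0.01, seed=1):
    """Search over binary feature masks with a basic genetic algorithm."""
    rng = random.Random(seed)
    n = len(X_train[0])

    def fitness(mask):
        # Assumed trade-off: validation accuracy minus a small cost per kept feature.
        return gnb_accuracy(X_train, y_train, X_val, y_val, mask) - penalty * sum(mask)

    pop = [[rng.randint(0, 1) for _ in range(n)] for _ in range(pop_size)]
    for _ in range(generations):
        pop.sort(key=fitness, reverse=True)
        parents = pop[:pop_size // 2]            # truncation selection, keeps the best
        children = []
        while len(parents) + len(children) < pop_size:
            a, b = rng.sample(parents, 2)
            cut = rng.randrange(1, n)            # one-point crossover
            child = a[:cut] + b[cut:]
            children.append([bit ^ (rng.random() < mutation_rate) for bit in child])
        pop = parents + children
    return max(pop, key=fitness)

X, y = make_data()
X_train, y_train = X[:120], y[:120]
X_val, y_val = X[120:160], y[120:160]
X_test, y_test = X[160:], y[160:]

best_mask = ga_select(X_train, y_train, X_val, y_val)
selected = [j for j, keep in enumerate(best_mask) if keep]
full_acc = gnb_accuracy(X_train, y_train, X_test, y_test, [1] * len(best_mask))
sel_acc = gnb_accuracy(X_train, y_train, X_test, y_test, best_mask)
print("selected feature indices:", selected)
print(f"test accuracy with all features: {full_acc:.3f}, with selected: {sel_acc:.3f}")
```

Scoring each mask on a held-out validation split, rather than on the test split, keeps the final test accuracy an honest estimate. The penalty term is one simple way to favour shorter feature subsets; the paper's own fitness function may differ.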

I. INTRODUCTION

The internet today has changed rapidly from what it initially was. Even with increased attention to protecting electronic information, there are ample reasons for business organizations, institutions, and the general public to be concerned. More malware is being launched than ever before. Cybersecurity is now a global priority as cybercrime and digital threats have rapidly increased in frequency and complexity. Robust Intrusion Detection Systems (IDS) that can handle big data are essential in today's cybersecurity landscape to ensure the accurate and efficient detection of security threats in large and complex networks. IDS protect computer networks from malicious attacks. Traditional IDS faced limitations in their detection accuracy and efficiency. Network-based Intrusion Detection Systems (NIDS) are resource intensive. Therefore, an organization must plan for the additional hardware needed to deploy them and run them smoothly in the network. They are resource-intensive and require additional hardware primarily because they must build complex, time-intensive data models [1].

The associate editor coordinating the review of this manuscript and approving it for publication was Vicente Alarcon-Aquino.

2024 The Authors. This work is licensed under a Creative Commons Attribution 4.0 License. For more information, see https://creativecommons.org/licenses/by/4.0/
VOLUME 12, 2024

An IDS is only as good as its signature library. If the library is not updated frequently, the IDS will not register the latest attacks and cannot raise an alert [2]. Network security engineers who monitor the network traffic frequently update the classifier model of an IDS. When new threats emerge, they update the model by incorporating new rules, algorithms, or machine learning techniques to enhance its detection capabilities. These models, which are trained on massive network-based datasets, are generally resource intensive and have time and space complexity issues [1]. Therefore, optimal feature selection to reduce the dimensionality of the datasets is of prime importance for any IDS to be able to detect and thwart threats in real time.

This study proposes integrating metaheuristic algorithms into an Intrusion Detection System (IDS), potentially improving its performance and accuracy in detecting different types of attacks. Metaheuristic algorithms are search-based optimization techniques that can look for the best solution in a large and complex search space. They are inspired by natural processes such as evolution, swarm behavior, and genetics [3]. The authors of [4] have proposed the use of grey wolf and dipper throat optimization for feature selection for IDS. Their results show an increase in classification accuracy across the different types of attacks, which would be beneficial for IoT systems. The authors of [5] have proposed the use of statistical measures such as the Chi-squared test and the Pearson correlation coefficient in tandem with a modified Genetic Algorithm for feature selection when creating the IDS. They have achieved high accuracy with a minimal number of selected features. Along the same lines as [5], the authors of [6] have proposed a hybrid metaheuristic algorithm that combines the artificial bee colony and dragonfly algorithms for feature selection when creating the IDS. They also report considerable success in classifying attack and non-attack packets. The Tabu search metaheuristic algorithm for feature selection, together with Random Forest for classification, has been proposed by the authors of [7]. They claim that their approach reduces the false positive rate considerably. Further, the authors of [8] have proposed a novel metaheuristic algorithm termed the Operational Crow Search algorithm for dimensionality reduction of the feature space and have used Recurrent Neural Networks (RNN) for attack classification.

Our paper proposes an optimized approach for detecting malicious packets by integrating metaheuristic algorithms into an Intrusion Detection System. The proposed approach aims to improve accuracy and precision while reducing space and time complexity by combining metaheuristic algorithms with existing machine learning classifier techniques. The experimental results demonstrate that this hybrid approach outperforms existing classifiers, making it a promising solution for IDS optimization.

A. AUTHORS' CONTRIBUTIONS
• Firstly, our research presents various combinations of metaheuristic and ML algorithms for feature selection from a computer network traffic dataset, for optimal detection of intruders.
• Secondly, our research presents different ML classifiers in tandem with the feature selection algorithms for classification of intruder data. Several factors, such as mean feature length and mean feature selection time, have been extensively explored and presented.
• Lastly, our research presents three use cases of combinations of algorithms based on test score, F1 score, recall and precision, which could be used by three types of organizations based on their needs.

The remainder of the paper is organized as follows. Section II describes related work. Section III introduces the three metaheuristic algorithms used in this study, namely Genetic Algorithm, Particle Swarm Optimization, and Grey Wolf Optimization Algorithm, with a brief discussion of how each works. Section IV presents an improved intrusion detection method based on the selection of the optimal feature subset and feature weighting. Section V verifies the effectiveness of the proposed algorithms by comparing the experimental results with other methods of intrusion detection, and Section VI presents conclusions.

II. RELATED WORK

Researchers in [9] worked towards finding the most relevant features to be used as essential features in a new IDS dataset using six feature selection methods, namely, Information Gain (IG), Gain Ratio (GR), Symmetrical Uncertainty (SU), Relief-F (R-F), One-R (OR) and Chi-Square (CS).

In 2016, a study [10] highlighted the importance of feature selection in intrusion detection systems (IDS) to improve accuracy and performance. The study proposes a recursive feature elimination mechanism and a decision tree-based classifier to identify and eliminate irrelevant features. Applying this approach to the NSL-KDD dataset, a benchmark for intrusion detection systems, results in significant accuracy improvements. These findings emphasize the value of feature selection in designing effective IDS.

An adaptive ensemble learning model named the MultiTree algorithm is proposed in [11], focusing on the NSL-KDD dataset. The MultiTree algorithm adjusts the training data proportion and constructs multiple decision trees. A selection of base classifiers such as decision tree, random forest, kNN, and DNN is employed to enhance the overall detection effectiveness. An ensemble adaptive voting algorithm is also designed to improve detection accuracy further. Notably, the data analysis reveals the critical role of data feature quality in determining detection effectiveness. The identified limitation of the study conducted in this paper pertains to the training and modeling process on