Received 9 January 2024, accepted 30 January 2024, date of publication 5 February 2024, date of current version 14 February 2024.

Digital Object Identifier 10.1109/ACCESS.2024.3362246

Leveraging Metaheuristics for Feature Selection With Machine Learning Classification for Malicious Packet Detection in Computer Networks

AGANITH SHANBHAG 1, SHWETA VINCENT 2 (Member, IEEE), S. B. BORE GOWDA 1, OM PRAKASH KUMAR 1, AND SHARMILA ANAND JOHN FRANCIS 3

1 Department of Electronics and Communication Engineering, Manipal Institute of Technology, Manipal Academy of Higher Education, Manipal, Karnataka 576104, India
2 Department of Mechatronics, Manipal Institute of Technology, Manipal Academy of Higher Education, Manipal, Karnataka 576104, India
3 Department of Computer Science, King Khalid University, Abha 61421, Saudi Arabia

Corresponding authors: Shweta Vincent ([email protected]) and Om Prakash Kumar ([email protected])

ABSTRACT Robust Intrusion Detection Systems (IDS) are increasingly necessary in the age of big data due to the growing volume, velocity, and variety of data generated by modern networks. Metaheuristic algorithms offer a promising approach to enhance IDS performance through optimal feature selection. Combining these algorithms with Machine Learning (ML) to create an IDS makes it possible to improve detection accuracy, reduce false positives and negatives, and enhance the efficiency of network monitoring. Our study proposes using metaheuristic algorithms along with machine learning classifiers for feature selection to optimize the number of features from a data set of computer network traffic. We have tested several combinations of metaheuristic algorithms, viz., Genetic Algorithm (GA), Particle Swarm Optimization (PSO) and Grey Wolf Optimizer (GWO), with ML algorithms, viz., Decision Tree (DT), Random Forest (RF), Gaussian Naïve Bayes (GNB) and Logistic Regression (LR). The combinations of algorithms have been tested over the NSL-KDD and kddcup.data_10% data sets. We have drawn several insights on feature selection scores with respect to test scores, F1 scores, recall and precision for various algorithm combinations. The feature selection time has also been highlighted to showcase the fastest-performing algorithm combinations. Ultimately, we have presented three combinations of algorithms depending on organizational IDS requirements and provided separate solutions for each.

INDEX TERMS Feature selection, intrusion detection system, metaheuristic algorithms, space complexity,
time complexity.

The associate editor coordinating the review of this manuscript and approving it for publication was Vicente Alarcon-Aquino.

2024 The Authors. This work is licensed under a Creative Commons Attribution 4.0 License. For more information, see https://creativecommons.org/licenses/by/4.0/

VOLUME 12, 2024

I. INTRODUCTION
The internet today has changed rapidly from what it was when it started. Even with increased attention to protecting electronic information, there are ample reasons for business organizations, institutions, and the general public to be concerned. More malware is being launched than ever before. Cybersecurity is now a global priority as cybercrime and digital threats have rapidly increased in frequency and complexity. Robust Intrusion Detection Systems (IDS) that can handle big data are essential in today's cybersecurity landscape to ensure the accurate and efficient detection of security threats in large and complex networks. IDS protect computer networks from malicious attacks. Traditional IDSs faced limitations in their detection accuracy and efficiency. Network-based Intrusion Detection Systems (NIDS) are resource intensive. Therefore, an organization must plan for the additional hardware needed to deploy and smoothly run them in the network. The primary reason for being resource-intensive and requiring additional hardware is the need to build complex, time-intensive data models [1].

An IDS is only as good as its signature library. If it is not updated frequently, it will not register the latest attacks and cannot raise an alert [2]. Network Security Engineers who monitor the network traffic frequently update the classifier model of an IDS. When new threats emerge, they update the classifier model of the IDS by incorporating new rules, algorithms, or machine learning techniques to enhance its detection capabilities. These models, which are trained on massive network-based datasets, are generally resource intensive and have time and space complexity issues [1]. Therefore, optimal feature selection to reduce the dimensionality of the datasets is of prime importance for any IDS to be able to detect and thwart threats in real time.

This study proposes integrating metaheuristic algorithms into an Intrusion Detection System (IDS), potentially improving its performance and accuracy in detecting different types of attacks. Metaheuristic algorithms are optimization techniques that can search for the best solution in a large and complex search space. These algorithms are search-based optimization techniques inspired by natural processes such as evolution, swarm behavior, and genetics [3]. The authors of [4] have proposed the use of grey wolf and dipper throat optimization for feature selection for IDS. Their results show an increase in classification accuracy between the different types of attacks, which would be beneficial for IoT systems. The authors of [5] have proposed the use of statistical measures such as the Chi-squared test and the Pearson correlation coefficient in tandem with a modified Genetic Algorithm for feature selection for the creation of the IDS. They have achieved a high accuracy with a minimum number of features selected for the IDS using their algorithm. Along the same lines as [5], the authors of [6] have proposed a hybrid metaheuristic algorithm which uses artificial bee colony along with the dragonfly algorithm for feature selection for the creation of the IDS. They have also obtained considerable results in classifying the attack and non-attack packets. The Tabu search metaheuristic algorithm for feature selection along with Random Forest for classification has been proposed by the authors of [7]. They claim to have reduced the false positive rate considerably with their approach. Further, the authors of [8] have proposed a novel metaheuristic algorithm termed the Operational Crow Search algorithm for dimensionality reduction of the feature space and have used Recurrent Neural Networks (RNN) for attack classification.

Our paper proposes an optimized approach for detecting malicious packets by integrating metaheuristic algorithms into an Intrusion Detection System. The proposed algorithm aims to improve accuracy and precision while reducing space and time complexity by integrating metaheuristic algorithms with existing machine learning classifier techniques. The experimental results demonstrate that this hybrid approach outperforms existing classifiers, making it a promising solution for IDS optimization.

A. AUTHORS' CONTRIBUTIONS
• Firstly, our research presents various combinations of metaheuristic and ML algorithms for feature selection from a computer network traffic dataset, for optimal detection of intruders.
• Secondly, our research presents different ML classifiers in tandem with the feature selection algorithms for classification of intruder data. Several factors such as mean feature length and mean feature selection time have been extensively explored and presented.
• Lastly, our research presents three use cases of combinations of algorithms based on test score, F1 score, recall and precision, which could be used by three types of organizations based on their needs.

The remainder of the paper is organized as follows. Section II describes related work. Section III introduces the three metaheuristic algorithms used in this study, Genetic Algorithm, Particle Swarm Optimization, and Grey Wolf Optimization Algorithm, with a brief discussion of how these algorithms work. Section IV presents an improved intrusion detection method based on the selection of the optimal feature subset and feature weighting. Section V verifies the effectiveness of the proposed algorithms by comparing the experimental results with other methods of intrusion detection, and Section VI presents conclusions.

II. RELATED WORK
Researchers in [9] worked towards finding the most relevant features to be used as essential features in a new IDS dataset using six feature selection methods, namely, Information Gain (IG), Gain Ratio (GR), Symmetrical Uncertainty (SU), Relief-F (R-F), One-R (OR) and Chi-Square (CS).

In 2016, a study [10] highlighted the importance of feature selection in intrusion detection systems (IDS) to improve accuracy and performance. The study proposes a recursive feature elimination mechanism and a decision tree-based classifier to identify and eliminate irrelevant features. Applying this approach to the NSL-KDD dataset results in significant accuracy improvements. The NSL-KDD dataset is a benchmark for intrusion detection systems. These findings emphasize the value of feature selection in designing effective IDS.

An adaptive ensemble learning model named the MultiTree algorithm is proposed in [11], focusing on the NSL-KDD dataset. The MultiTree algorithm adjusts the training data proportion and constructs multiple decision trees. A selection of base classifiers such as decision tree, random forest, kNN, and DNN is employed to enhance the overall detection effectiveness. An ensemble adaptive voting algorithm is also designed to improve detection accuracy further. It is important to note that data analysis reveals the critical role of data feature quality in determining detection effectiveness. The identified limitation of the study conducted in this paper pertains to the training and modeling process on


noisy data, which needs a comprehensive feature selection approach. This deficiency has been specifically addressed in our research.

The work on machine learning and metaheuristic algorithms for anomaly-based intrusion detection in IoT-based healthcare applications by [12] employs algorithms like Particle Swarm Optimization (PSO), Genetic Algorithm (GA), and Differential Evolution (DE) for feature selection and uses k-Nearest Neighbour (kNN) and Decision Tree (DT) for classification. The proposed hybrid approach combines these techniques to improve performance. The paper also presents an IoT-based healthcare architecture using the best-performing algorithm to detect and prevent malicious traffic. The authors of [13] recommend a novel feature selection method using GA to determine the optimal feature subsets from the NSL-KDD dataset. The results of the proposed work were then compared with the existing feature selection methods to verify improved performance.

The authors of [14] introduce a novel approach for network intrusion detection using the Horse herd optimization algorithm (HOA) and Quantum-inspired optimization. The proposed algorithm, MQBHOA, leverages horses' behavior in a herd to select effective features and enhance social behaviors for intrusion detection. The K-Nearest Neighbor (KNN) classifier is employed for classification. The performance of MQBHOA is evaluated on the NSL-KDD and CSE-CIC-IDS2018 datasets. The results demonstrate that MQBHOA outperforms other metaheuristic algorithms, achieving higher feature selection and classification accuracy success rates.

The authors of [15] conduct a comprehensive investigation of the impact of feature selection on the performance of intrusion detection systems. They employ the Random Forest (RF) algorithm to select pertinent attributes, aiming to enhance the effectiveness of IDSs. The study includes a comparative analysis involving diverse classifiers, including k-NN, DT, Support Vector Machine (SVM), Logistic Regression (LR), and Naïve Bayes (NB) classifiers, for the NSL-KDD dataset. The findings demonstrate notable improvements in detection rate, accuracy, and false alarm reduction compared to existing state-of-the-art classifiers.

The authors of [16] propose a high-performance classification algorithm, SEKS, and SEIDS, for improving attack detection in an IDS. Their approach combines clustering, classification, and metaheuristic algorithms to enhance accuracy and detect unfamiliar attacks. The research shows that their method outperforms previous classification methods in accuracy. Table 1 outlines the summary of the reviewed literature.

Additionally, research has been conducted on optimized feature selection for IDS. However, there is a need for more in-depth exploration and optimization of IDS using metaheuristic algorithms, considering various combinations derived from different metaheuristic algorithms and machine learning classifiers.

Our article addresses this gap by emphasizing the development and comparison of different machine learning-assisted metaheuristic algorithms. The primary objective is to comprehensively analyze these algorithms and evaluate their performance in different scenarios. The aim is to provide a set of well-suited algorithms for various use cases and practical requirements. The study aims to bridge the gap in IDS optimization by delivering a comprehensive range of algorithm combinations capable of meeting diverse use case requirements and achieving desirable performance outcomes.

TABLE 1. Summary of literature review.

III. MACHINE LEARNING-ASSISTED METAHEURISTICS
A. OPTIMIZATION ALGORITHMS AND METAHEURISTICS
Optimization algorithms use mathematical procedures to achieve the best possible solution within constraints. These algorithms iteratively modify the parameters of a system or function to minimize or maximize a specific objective function by exploring a large solution space to find the optimal solution according to predefined criteria. Optimization techniques are categorized as Deterministic vs. Stochastic according to their behavior, Unconstrained vs. Constrained according to whether they are subject to constraints, and Linear vs. Nonlinear according to the objective function [13]. Optimization techniques are further distinguished as Local vs. Global according to the solutions they seek, and First-Order vs. Second-Order according to the derivative of the objective function they use.

Metaheuristic algorithms are characterized by their stochastic and iterative nature, utilizing randomness to

explore the solution space [14]. One of the key advantages of metaheuristics is their ability to quickly find satisfactory solutions for complex problems, even when traditional optimization algorithms fail to do so. Some well-known examples of metaheuristic algorithms include Genetic Algorithm (GA), Simulated Annealing, Particle Swarm Optimization (PSO), Grey Wolf Optimization (GWO), and Ant Colony Optimization (ACO). Our study explores three prominent metaheuristic algorithms: Genetic Algorithm, Particle Swarm Optimization, and Grey Wolf Optimization. These algorithms have been selected based on their established effectiveness in addressing the study's research objectives.

B. METAHEURISTICS-BASED FEATURE SELECTION
The technique used in our research applies machine learning-assisted metaheuristics to feature selection. Exhaustive feature selection can be effective for small datasets with few features, but it becomes computationally infeasible for high-dimensional datasets. Heuristic methods and metaheuristic algorithms can be used to reduce the search space and improve the efficiency of feature selection. The approach chosen depends on the problem's specific requirements and the dataset's characteristics. When machine learning is used in the fitness function of metaheuristic algorithms, the result is called "machine learning-assisted metaheuristics". In this approach, the fitness function of the metaheuristic algorithm is enhanced with a machine learning model that can estimate the fitness value of a candidate solution. The machine learning model is trained on a set of labeled data, which contains the fitness values of previously evaluated solutions. The machine learning-assisted metaheuristics approach can improve the optimization performance of the algorithm by reducing the number of function evaluations required to find the optimal solution. This technique is depicted in Figure 1.

The technique uses the machine learning model to predict the fitness value of candidate solutions, eliminating the need to evaluate all solutions in the search space. The machine learning model can also capture the underlying patterns and relationships in the search space, leading to more efficient and effective optimization.

FIGURE 1. Feature selection using hybrid metaheuristic algorithms.

C. GENETIC ALGORITHM, PARTICLE SWARM OPTIMIZATION, AND GREY WOLF OPTIMIZATION FOR FEATURE SELECTION
Genetic Algorithm (GA) is a metaheuristic optimization algorithm inspired by natural selection and evolution. GA mimics the process of natural selection by iteratively evolving a population of candidate solutions to find the optimal solution [18].

In the context of this study, we modify the GA for feature selection. The population array represents a group of potential solutions, where each solution is represented as a genomic sequence. The length of the genomic sequence corresponds to the number of features in the dataset. The population array is initially randomly generated, with each gene (element) being a number between 0 and 1. In this case, the fitness function is modified to evaluate the accuracy obtained by a particular organism in the population array. The genomic sequence of an organism represents the presence or absence of specific features from the feature set. Each gene in the genomic sequence corresponds to a particular feature, indicating whether it is selected. The fitness function determines how well an organism solves the problem by evaluating the accuracy using this genomic sequence. The GA works through successive generations, aiming to improve accuracy over time. The algorithm applies genetic operators, such as selection, crossover, and mutation, to simulate natural selection and evolution principles. The modified GA utilized in our research for feature selection is shown in Figure 2.

Particle Swarm Optimization (PSO) is a population-based optimization algorithm inspired by the social behavior of bird flocking or fish schooling. PSO is a heuristic search algorithm that aims to find the optimal solution in a search space by iteratively updating a group of particles representing potential solutions [19]. The modified PSO-based feature selection algorithm begins by randomly initializing a population of particles within the search space.

The particles are stored in a 2D array called "population", where the number of rows corresponds to the number of particles, and the number of columns represents the number of features in the dataset. The population matrix is initialized with random numbers. A 2D array called "velocity" is also created with the same dimensions as the population matrix. It is initialized with zeros. Each particle in the population thus has an associated position and velocity vector. The algorithm optimizes the fitness scores throughout the iterations by adjusting the particle positions and velocities. Each particle compares its fitness with its personal best and with the global best, and updates its velocity and position vectors based on these comparisons to move towards better solutions in the search space. The velocity update is influenced by three factors: its previous velocity, personal best position, and global best position. The particle adjusts its velocity vector to balance exploration (following the global best) and exploitation (following its personal best). The update rule for the velocity of a particle in the PSO algorithm is given in equation (1):

$$v(t+1) = w\,v(t) + c_1 r_1\,(p_{best} - x(t)) + c_2 r_2\,(g_{best} - x(t)) \qquad (1)$$

where v(t + 1) is the updated velocity of the particle at time (t + 1). v(t) is the particle's current velocity at time t. w is the inertia weight, controlling the impact of the particle's previous velocity. c1 and c2 are the cognitive and social coefficients, determining the influence of the particle's best position (pbest) and the global best position (gbest) on the velocity update. r1 and r2 are random numbers in the range [0, 1], introducing stochasticity into the algorithm. x(t) is the current position of the particle at time t. After updating the velocity, the particle updates its position using equation (2):

$$x(t+1) = x(t) + v(t+1) \qquad (2)$$

The iterations continue until a termination criterion is met, which could be a maximum number of iterations, reaching a predefined fitness threshold, or other stopping conditions. Figure 3 showcases the usage of the PSO algorithm for feature selection.
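The two PSO update rules can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names `pso_step` and `selected_features`, the default values of `w`, `c1` and `c2`, and the 0.5 threshold used to turn a position into a feature subset are all hypothetical choices that the paper does not specify.

```python
import random


def pso_step(position, velocity, pbest, gbest, w=0.7, c1=1.5, c2=1.5, rng=random):
    """Apply equations (1) and (2) to one particle.

    All arguments are lists with one entry per feature. The defaults for
    w, c1 and c2 are assumptions, not values taken from the paper.
    """
    new_position, new_velocity = [], []
    for x, v, p, g in zip(position, velocity, pbest, gbest):
        # random() draws from [0, 1); fresh draws per dimension are an assumption.
        r1, r2 = rng.random(), rng.random()
        v_next = w * v + c1 * r1 * (p - x) + c2 * r2 * (g - x)  # equation (1)
        new_velocity.append(v_next)
        new_position.append(x + v_next)                          # equation (2)
    return new_position, new_velocity


def selected_features(position, threshold=0.5):
    """Hypothetical decoding: keep feature j when position[j] exceeds the threshold."""
    return [j for j, value in enumerate(position) if value > threshold]


# One update for a single particle with three features.
rng = random.Random(0)
x = [0.2, 0.8, 0.4]
v = [0.0, 0.0, 0.0]
new_x, new_v = pso_step(x, v, pbest=x, gbest=[0.9, 0.1, 0.4], rng=rng)
print(new_x, selected_features(new_x))
```

In this example the third coordinate does not move, because the particle already coincides with both its personal best and the global best in that dimension and starts with zero velocity.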
FIGURE 2. Flowchart for the modified genetic algorithm for feature selection.

At the core of the GWO algorithm is a population of candidate solutions, represented as a pack of grey wolves. The algorithm iteratively updates the positions of the wolves to explore and search for the optimal solution within the given search space. The wolves' social hierarchy and hunting behavior guide this exploration [21]. The Modified Grey Wolf Optimization (GWO) algorithm for feature selection involves a step-by-step procedure to identify the most relevant features in a given dataset. Algorithm 1 gives the pseudocode explaining the flow of the Modified GWO algorithm. The algorithm begins by initializing a population of search agents, representing potential feature subsets. These search agents are randomly positioned within the boundaries of the search space.

Next, the fitness of each search agent is evaluated using a fitness function that measures the performance of the selected features. The fitness function typically considers metrics such as accuracy, error rate, or other performance measures specific to the problem domain. The three best search agents are identified based on their fitness values: Alpha, Beta, and Delta. These agents represent the leaders of the population and have the highest fitness values. Their positions serve as references for updating the positions of other search agents. The algorithm then proceeds to iterate for a specified number of times. In each iteration, the positions of the three leaders (Alpha, Beta, and Delta) are updated using equations (3) and (4) below. These equations involve parameters such as A, C, and D, which control the movement of the leaders within the search space.

$$X_i = \mathrm{Alpha\_position} - A(i) \times D_{alpha}(i) \qquad (3)$$

$$\mathrm{Positions}[i, j] = (X_1 + X_2 + X_3)/3 \qquad (4)$$

Here,
Xi is the position of the wolf.
X1 is the position estimate derived from the best wolf (Alpha).
X2 is the position estimate derived from the second-best wolf (Beta).
X3 is the position estimate derived from the third-best wolf (Delta).
A(i) is a randomly generated number for the exploration around the current position of the alpha wolf.
D_alpha is the distance between the current wolf (Positions[i, j]) and the alpha wolf.
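Equations (3) and (4) can be illustrated with a short Python sketch of the position update for a single wolf. It follows the inner loop of Algorithm 1 as printed, where one pair of random numbers r1 and r2 is drawn per dimension and reused for all three leaders. The function name `gwo_update` and the way `a` is supplied by the caller are assumptions made for illustration; the paper only states that `a` is updated according to the current iteration.

```python
import random


def gwo_update(position, alpha, beta, delta, a, rng=random):
    """Move one wolf toward the Alpha, Beta and Delta leaders.

    position, alpha, beta and delta are lists with one entry per feature.
    `a` is the control parameter; how it is scheduled is left to the caller
    (a linear decrease from 2 to 0 is a common choice, assumed here).
    """
    new_position = []
    for j, x in enumerate(position):
        r1, r2 = rng.random(), rng.random()
        A = 2 * a * r1 - a   # lies in [-a, a]
        C = 2 * r2
        estimates = []
        for leader in (alpha, beta, delta):
            D = abs(C * leader[j] - x)           # distance to this leader
            estimates.append(leader[j] - A * D)  # equation (3)
        new_position.append(sum(estimates) / 3)  # equation (4)
    return new_position


# With a = 0 the step collapses to the mean of the three leaders.
wolf = [0.5, 0.5]
print(gwo_update(wolf, alpha=[1.0, 0.0], beta=[0.0, 0.0], delta=[0.5, 0.3], a=0.0))
```

Large values of `a` allow |A| > 1, which pushes wolves away from the leaders (exploration); as `a` shrinks, the update contracts toward the leaders' average (exploitation).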


FIGURE 3. Flowchart for modified PSO algorithm for feature selection.

The updated positions of the leaders are used as references for updating the positions of the remaining search agents. The positions of the remaining search agents are also updated using similar equations, considering the values of A, C, and D. This ensures exploration and exploitation of the search space to find optimal feature subsets. Additionally, the values of parameters A and C are updated throughout the iterations to control the search behavior of the algorithm. The iterations continue until the maximum number of iterations is reached. At the end of the algorithm, the position of the Alpha search agent, representing the best-selected features, is returned as the final outcome of the feature selection process.

Algorithm 1 GWO Algorithm

Function fitness_function_gwo(positions, X_train, X_test, y_train, y_test, classifier):
    features ← extract_features(positions)
    train_xf ← X_train[:, features]
    test_xf ← X_test[:, features]
    classifier.fit(train_xf, y_train)
    accuracy ← classifier.score(test_xf, y_test)
    w ← 0.9
    return -(w ∗ accuracy + (1 - w) / len(features))

Function GWO(X_train, y_train, X_test, y_test, classifier, lb, ub, dim, SearchAgents_no, Max_iter):
    Initialize Alpha_pos, Beta_pos, and Delta_pos
    Initialize Positions with random values within the search space
    Initialize Convergence_curve
    for each iteration from 0 to Max_iter do
        for each search agent in Positions do
            Update the search agent's position within the search space boundaries
            Calculate the fitness of the search agent
            Update Alpha_pos, Beta_pos, and Delta_pos based on the search agent's fitness
        end for
        Update the parameter 'a' based on the current iteration
        for each search agent i in Positions do
            for each dimension j in the search agent's position do
                Generate random numbers r1 and r2
                Update A1, C1, D_alpha, and X1:
                A1 ← 2 ∗ a ∗ r1 - a
                C1 ← 2 ∗ r2
                D_alpha ← abs(C1 ∗ Alpha_pos[j] - Positions[i, j])
                X1 ← Alpha_pos[j] - A1 ∗ D_alpha
                Update A2, C2, D_beta, and X2:
                A2 ← 2 ∗ a ∗ r1 - a
                C2 ← 2 ∗ r2
                D_beta ← abs(C2 ∗ Beta_pos[j] - Positions[i, j])
                X2 ← Beta_pos[j] - A2 ∗ D_beta
                Update A3, C3, D_delta, and X3:
                A3 ← 2 ∗ a ∗ r1 - a
                C3 ← 2 ∗ r2
                D_delta ← abs(C3 ∗ Delta_pos[j] - Positions[i, j])
                X3 ← Delta_pos[j] - A3 ∗ D_delta
                Update the search agent's position:
                Positions[i, j] ← (X1 + X2 + X3) / 3
            end for
        end for
        Update the convergence curve with the best fitness value (Alpha_score) of this iteration
        if the iteration is a multiple of 1 then
            Print the current iteration and the best fitness value (Alpha_score)
        end if
    end for
    return Alpha_pos

IV. DATASET DESCRIPTION
This study used the NSL-KDD and kddcup.data_10% datasets for testing and evaluating the algorithms.

A. NSL-KDD
The NSL-KDD dataset is a network intrusion detection dataset that was created by the University of New Brunswick Canadian Institute for Cybersecurity in response to some of the inherent problems of the KDD'99 dataset. These problems include:
• The presence of redundant records in the train set can bias the classifiers towards more frequent records.
• The presence of duplicate records in the test sets can bias the learners' performance towards methods with better detection rates on the frequent records.
• The imbalance of the classes in the dataset can make it difficult to train accurate classifiers.
The NSL-KDD dataset addresses these problems by:
• Removing redundant records from the train set and removing duplicate records from the test sets.


• Balancing the classes in the dataset by oversampling the TABLE 2. Dataset information.
minority classes.
The NSL-KDD dataset contains 41 features, which are
divided into four categories:
• Basic features: These features provide basic informa-
tion about the network traffic, such as the source and 3) Concatenating Data: The concatenation of the train
destination IP addresses, the protocol used, and the and test DataFrames into a single DataFrame is per-
number of bytes transferred. formed. This is done to ensure consistent preprocessing
• Data subject features: These features provide informa- steps are applied to both the training and test data.
tion about the content of the network traffic, such as the 4) Attack Classification: A new column called ‘‘at-
number of packets, the number of connections, and the tack_check’’ is introduced in the DataFrame. This col-
number of TCP flags set. umn is derived from the ‘‘attack_type’’ column and
• State features: These features provide information classifies instances as either attacks or nor- mal activi-
about the state of the network, such as the number of ties. Instances that have a value other than ‘‘normal’’
connections in progress and the number of connections in the ‘‘attack_type’’ column are marked as ‘‘True’’
that have been closed. in the corresponding ‘‘attack_check’’ column, indicat-
• Timing features: These features provide information ing an attack. Conversely, instances with the value
about the timing of the network traffic, such as the start ‘‘normal’’ in the ‘‘attack_type’’ column are marked
and end times of the connection. as ‘‘False’’ in the ‘‘attack_check’’ column, indicating
The NSL-KDD dataset contains 148,515 records, which are divided into a train set of 125,972 records and a test set of 22,543 records. The dimension of the train set is (125,972, 43), and the dimension of the test set is (22,543, 43).

B. KDDCUP.DATA_10% DATASET
The kddcup.data_10% dataset has dimension (494,021, 43). The difference in the number of features between the NSL-KDD dataset and the kddcup.data_10% dataset arises because the NSL-KDD dataset includes an additional feature called ‘label’. The ‘label’ feature indicates whether the record represents normal or anomalous network traffic. The kddcup.data_10% dataset is a good resource for researchers interested in network intrusion detection. The dataset is well-balanced and contains various features that can be used to train classifiers. The NSL-KDD and kddcup.data_10% datasets are considered benchmark datasets in the realm of Intrusion Detection Systems, as researchers and practitioners widely use them to evaluate the performance of new intrusion detection techniques.

V. METHODOLOGY
This section of the article presents the overall methodology followed for data cleaning, feature selection using metaheuristic algorithms, and classification using ML classifiers.

A. DATA CLEANING AND PRE-PROCESSING
1) Loading the Data: The training and test data files are read. The loaded data is stored in the train and test DataFrames.
2) Column Labels: Column labels are assigned to the train and test DataFrames using the labels list. These labels represent the different attributes or features of the data. Their definitions are available in the documentation or publications associated with the NSL-KDD [22] and KDD Cup 1999 [23], [24] datasets.

normal network activities. This study identifies and classifies attacks, irrespective of specific attack types such as DOS, Probe, U2R, or R2L. The objective is to distinguish between normal network activities and any form of attack without delving into the specific attack categories. By adopting this approach and considering only the binary classification of attacks versus normal activities, the study simplifies the task of detecting and identifying any attack without the need to differentiate between the various attack types.
5) Encoding Categorical Variables: Label encoding is performed on the categorical variables in the DataFrame using LabelEncoder from the preprocessing module. This ensures that categorical variables are represented as numerical values for model training.

B. FEATURE AND TARGET SPLIT
1) Feature split: The features (X) and the target variable (Y) are separated from the data frame. The ‘outcome’, ‘attack_check’, and ‘attack_type’ columns are dropped from the feature set (X), while the ‘attack_check’ column is used as the target variable (Y). The columns ‘outcome’, ‘attack_check’, and ‘attack_type’ in the NSL-KDD dataset, as well as the columns ‘target’, ‘Attack Type’, and ‘attack_check’ in the kddcup.data_10% dataset, provide direct information about the classification or the nature of the instances. Including these columns in the feature set would trivialize the classification task, as the class label would already be explicitly stated. These columns are therefore dropped from the feature set to ensure a meaningful and challenging classification task.
2) Splitting method: The feature and target data are split into training and testing sets using the train_test_split method from the Sklearn model selection module. The data is split into ‘X_train’, ‘X_test’, ‘y_train’, and ‘y_test’, with a test size of 0.30 (30% of the data is used for testing). Table 1 presents the details of the dataset used.
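The pre-processing and split steps described above can be sketched as follows. This is a minimal illustration on a made-up three-column frame, not the authors' code: the toy values, the 0/1 coding of ‘attack_check’, the `categorical_cols` list, and `random_state=0` are assumptions introduced for the example. Only the column names ‘outcome’, ‘attack_check’, ‘attack_type’ and the 0.30 test size come from the text.

```python
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import LabelEncoder

# Toy stand-in for the loaded NSL-KDD frame (all values are made up).
df = pd.DataFrame({
    "protocol_type": ["tcp", "udp", "tcp", "icmp", "tcp",
                      "udp", "tcp", "icmp", "tcp", "udp"],
    "src_bytes": [181, 239, 0, 1032, 45, 0, 520, 8, 0, 300],
    "outcome": ["normal", "normal", "neptune", "smurf", "normal",
                "satan", "normal", "ipsweep", "neptune", "normal"],
})

# Binary target (assumed coding): 0 for normal traffic, 1 for any attack.
df["attack_check"] = (df["outcome"] != "normal").astype(int)

# Label-encode the categorical feature columns (hypothetical list).
categorical_cols = ["protocol_type"]
for col in categorical_cols:
    df[col] = LabelEncoder().fit_transform(df[col])

# Drop every column that states the class label, then split 70/30.
leak_cols = [c for c in ("outcome", "attack_check", "attack_type") if c in df.columns]
X = df.drop(columns=leak_cols)
y = df["attack_check"]
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.30, random_state=0
)
```

Dropping the label-bearing columns before the split keeps the target out of the feature matrix, which is the leakage concern raised in the feature-split step.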

C. FEATURE SELECTION USING METAHEURISTIC ALGORITHMS AND CLASSIFICATION USING ML
A suitable ML classifier, such as Random Forest, Support Vector Machine (SVM), or Neural Network, is selected based on the specific requirements. In this research study, the focus was on four primary Machine Learning classifiers: Gaussian Naive Bayes (GNB), Decision Tree (DT), Logistic Regression (LR), and Random Forest (RF). The chosen ML classifier is trained using the training dataset (Xtrain, Ytrain) within the fitness function. Figure 4 gives one of the fitness functions used for the GA algorithm. Here, the features of the GA algorithm, the X and y variables, and the ML classifier are passed as arguments to the fitness function. The classifier is fitted on the X features and the y targets. Finally, the accuracy score is calculated by comparing the actual data with the predictions made by the classifier.

FIGURE 4. Sample fitness function of GA algorithm.

In this manner, various fitness functions based on the metaheuristic algorithm and the ML classifier are generated and tested. The feature set is generated randomly following the working principles of the metaheuristic algorithm. The overall methodology is presented in Figure 5. The classifier’s performance on the validation or test dataset (Xtest, Ytest) is tested to obtain the fitness score. The study’s objective is to maximize the accuracy that serves as the fitness score. Once the maximum number of iterations or generations is reached, the feature set with the highest fitness score (global maximum) is selected as the optimal solution. After training with the selected features, the model uses the four primary classifiers (Gaussian Naive Bayes, Decision Tree, Logistic Regression, and Random Forest) to classify the data as malicious or non-malicious.

VI. EXPERIMENTAL SET-UP, RESULTS, AND ANALYSIS
The experimentation has been conducted using Python with the Scikit-learn library on a Windows 11 system. The test computer has an Intel(R) Xeon(R) E-2124 CPU @ 3.30GHz (3.31 GHz) processor, 32 GB of installed RAM (31.9 GB usable), and a 64-bit operating system on an x64-based processor.
Due to the inherent stochastic nature of the developed algorithms, they were executed multiple times to obtain a consensus by aggregating the results. Stochastic algorithms are valuable in solving complex problems that deterministic techniques may struggle with. These algorithms leverage randomization to explore a vast search space and discover high-accuracy solutions that deterministic methods might overlook. By incorporating probabilistic methods, stochastic algorithms identify accurate solutions and potentially find optimal solutions [25]. This study bridges the gap in IDS optimization by delivering a comprehensive set of algorithm combinations. These algorithmic combinations can address diverse use case requirements and achieve desired performance outcomes. The primary objective is to thoroughly analyze these algorithms and assess their performance in different scenarios. Consequently, this research offers a selection of well-suited algorithms for various use cases and practical requirements.
For instance, organizations with substantial computational resources may prioritize algorithms that demonstrate high performance, regardless of their time and space complexity. On the other hand, organizations with limited computational capabilities and resources may opt for algorithms that exhibit lower asymptotic complexities during the modeling phase while still achieving satisfactory performance levels. Our approach enables the selection of algorithms based on specific computational constraints and operational requirements, facilitating the practical implementation of IDS solutions.

A. ALGORITHM NOMENCLATURE
The algorithms developed in this study, combining machine learning classifiers and metaheuristic algorithms, are assigned specific names for ease of identification. The nomenclature used for these algorithms can be found in Table 3.
Each algorithm name consists of three distinct parts. The first part represents the machine learning classifier that classifies network flows as malicious or benign. The second part, separated by a hyphen symbol, denotes the employed feature selection technique. The first part of the second half signifies the name of the metaheuristic algorithm utilized, while the latter part represents the machine learning classifier incorporated in the fitness function of the metaheuristic algorithm. For instance, an algorithm named DT-GA_RF indicates that the final classification after feature selection uses a Decision Tree (DT). The feature selection process is conducted using a Genetic Algorithm (GA), with Random Forest (RF) employed within the Genetic Algorithm’s fitness function.

B. RESULTS AND DISCUSSION
All 48 developed machine learning-assisted metaheuristic algorithms were evaluated against each other, and the results are summarized. Table 4 presents the top 10 algorithms with the highest test scores, arranged in descending order.
The confusion matrix scores used for computing the performance of the proposed algorithms are given in the
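equations (5), (6), (7), (8) below:

$$\text{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN} \tag{5}$$

$$\text{Precision} = \frac{TP}{TP + FP} \tag{6}$$

$$\text{Recall} = \frac{TP}{TP + FN} \tag{7}$$

$$\text{F1score} = 2 \times \frac{\text{Precision} \times \text{Recall}}{\text{Precision} + \text{Recall}} \tag{8}$$

where TP = True Positive, TN = True Negative, FP = False Positive, and FN = False Negative.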
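Equations (5) to (8) translate directly into code. The sketch below is illustrative only: the function name `classification_scores`, the example counts, and the choice to return 0.0 when a denominator is zero are assumptions, not details taken from the study.

```python
def classification_scores(tp: int, tn: int, fp: int, fn: int) -> dict:
    """Compute Equations (5)-(8) from confusion-matrix counts."""
    accuracy = (tp + tn) / (tp + tn + fp + fn)
    # Assumed convention: a score with an empty denominator is reported as 0.0.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denominator = precision + recall
    f1 = 2 * precision * recall / denominator if denominator else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


# Made-up counts for illustration: 200 flows, 100 of them attacks.
scores = classification_scores(tp=90, tn=95, fp=5, fn=10)
print(scores)
```

For these counts the accuracy is 185/200 = 0.925 and the recall is 90/100 = 0.9, which can be checked by hand against Equations (5) and (7).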
An important observation derived from the results is the
exceptional performance of the Random Forest (RF) clas-
sifier. The Random Forest Classifier is renowned for its
ability to deliver robust performance even in the presence
of noisy data. Notably, all the top 10 algorithms in the table
employ the Random Forest Classifier as their final classifier,
underscoring its effectiveness in intrusion detection.
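Observations of this kind can be read directly off the algorithm names, since the nomenclature puts the final classifier before the hyphen. The following sketch assumes every name follows the pattern CLASSIFIER-METAHEURISTIC_CLASSIFIER; the helper `parse_algorithm_name` and the `AlgorithmName` tuple are hypothetical names introduced for illustration.

```python
from itertools import product
from typing import NamedTuple


class AlgorithmName(NamedTuple):
    final_classifier: str    # classifier used for the final prediction
    metaheuristic: str       # GA, PSO, or GWO
    fitness_classifier: str  # classifier inside the fitness function


def parse_algorithm_name(name: str) -> AlgorithmName:
    """Split a name such as 'DT-GA_RF' into its three parts."""
    final_classifier, selector = name.split("-", 1)
    metaheuristic, fitness_classifier = selector.split("_", 1)
    return AlgorithmName(final_classifier, metaheuristic, fitness_classifier)


CLASSIFIERS = ("GNB", "LR", "DT", "RF")
METAHEURISTICS = ("GA", "PSO", "GWO")

# 4 final classifiers x 3 metaheuristics x 4 fitness classifiers.
all_names = [
    f"{final}-{meta}_{fit}"
    for final, meta, fit in product(CLASSIFIERS, METAHEURISTICS, CLASSIFIERS)
]
techniques = sorted({name.split("-", 1)[1] for name in all_names})

parsed = parse_algorithm_name("DT-GA_RF")
print(parsed, len(all_names), len(techniques))
```

Enumerating the combinations this way reproduces the counts stated in the study: 48 algorithms built on 12 feature selection techniques.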
The algorithm that achieved the highest test score, with
an accuracy of 99.5787%, is one that incorporates feature
selection through Particle Swarm Optimization (PSO) with
Random Forest in the fitness function. Following closely
is the algorithm that utilizes Random Forest as the classifier, paired with Particle Swarm Optimization with Decision
Tree (PSO_DT) as the feature selection technique. The third
highest-scoring algorithm employs Genetic Algorithm (GA)
with Gaussian Naive Bayes (GNB) as its feature selec-
tion technique. These findings highlight the prowess of the
Random Forest classifier and emphasize the successful uti-
lization of various metaheuristic algorithms combined with
feature selection techniques for achieving high accuracy in
intrusion detection tasks.
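The wrapper-style fitness evaluation that these feature selection techniques rely on can be sketched as below. This is an assumption-laden illustration rather than the function shown in Figure 4: the data are synthetic (generated with `make_classification`, using 41 features only to mirror the feature count mentioned in the study), the function signature is hypothetical, and a plain random search stands in for the GA, PSO, and GWO candidate-generation steps.

```python
import numpy as np
from sklearn.base import clone
from sklearn.datasets import make_classification
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier


def fitness(mask, X_train, X_test, y_train, y_test, classifier):
    """Accuracy of `classifier` trained on the columns selected by `mask`."""
    mask = np.asarray(mask, dtype=bool)
    if not mask.any():         # an empty feature subset cannot be scored
        return 0.0
    model = clone(classifier)  # fresh, unfitted copy for every candidate
    model.fit(X_train[:, mask], y_train)
    return accuracy_score(y_test, model.predict(X_test[:, mask]))


# Synthetic stand-in data, not NSL-KDD.
X, y = make_classification(n_samples=400, n_features=41, n_informative=8, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.30, random_state=0)

# Random search as a placeholder for the metaheuristic's candidate generation.
rng = np.random.default_rng(0)
best_mask, best_score = None, -1.0
for _ in range(20):
    candidate = rng.random(X.shape[1]) < 0.5
    score = fitness(candidate, X_train, X_test, y_train, y_test,
                    DecisionTreeClassifier(random_state=0))
    if score > best_score:
        best_mask, best_score = candidate, score

print(f"best accuracy {best_score:.3f} with {int(best_mask.sum())} features")
```

Swapping the classifier passed to `fitness` gives the different fitness variants (for example a Decision Tree versus a Random Forest inside the search), while the final classifier is trained separately on the columns kept in `best_mask`.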
The top 10 algorithms based on test scores may differ
from the top 10 algorithms based on the F1 score. Test
scores reflect the accuracy and generalization ability of the
algorithm on new, unseen data. In contrast, F1 scores com-
bine precision and recall to provide a balanced performance
measure, particularly in imbalanced datasets. As a result,
algorithms with high accuracy may not necessarily have the
highest F1 scores, and vice versa. The ranking of algorithms
will depend on the IDS dataset, problem domain, and specific
evaluation criteria employed.
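A small worked example makes the divergence between the two rankings concrete. The predictions below are fabricated purely for illustration (1,000 flows, 50 of them attacks) and do not come from the study's results; the names `pred_a` and `pred_b` denote two hypothetical detectors.

```python
import numpy as np
from sklearn.metrics import accuracy_score, f1_score

# 50 attacks (label 1) followed by 950 normal flows (label 0).
y_true = np.array([1] * 50 + [0] * 950)

# Detector A: cautious, flags only 10 attacks and raises no false alarms.
pred_a = np.array([1] * 10 + [0] * 40 + [0] * 950)

# Detector B: catches 45 attacks but raises 60 false alarms.
pred_b = np.array([1] * 45 + [0] * 5 + [1] * 60 + [0] * 890)

acc_a, f1_a = accuracy_score(y_true, pred_a), f1_score(y_true, pred_a)
acc_b, f1_b = accuracy_score(y_true, pred_b), f1_score(y_true, pred_b)

print(f"A: accuracy={acc_a:.3f}, F1={f1_a:.3f}")
print(f"B: accuracy={acc_b:.3f}, F1={f1_b:.3f}")
```

Detector A wins on accuracy (0.960 against 0.935) because the normal class dominates, while detector B wins on F1 (about 0.581 against 0.333) because it misses far fewer attacks.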
A high recall score is crucial in Intrusion Detection Systems (IDS) for effectively identifying and detecting malicious activities. Recall, also known as sensitivity or true positive rate, measures the proportion of actual positive instances (intrusions) that the IDS correctly identifies [26]. Table 5 presents the top 10 recall scores in descending order. Among these algorithms, RF-PSO_RF achieved the highest recall score, indicating its ability to detect a large proportion of actual intrusions accurately. This algorithm is followed by RF-PSO_DT and RF-GA_GNB, which also demonstrate strong performance in terms of recall.

FIGURE 5. Machine learning assisted metaheuristic technique for feature selection and classification.

A high recall score is desirable in IDS because it helps minimize the risk of false negatives, where actual intrusions go undetected. By effectively capturing a significant number of true positive instances, IDS algorithms with high recall scores
can enhance the overall security of a system by promptly identifying potential threats and enabling timely response measures. Therefore, algorithms with high recall scores, such as RF-PSO_RF, RF-PSO_DT, and RF-GA_GNB, play a vital role in ensuring the effectiveness and reliability of IDS in detecting and mitigating security breaches. In Figure 6, the comparison of algorithms against the test score and corresponding F1 score is presented. When choosing an Intrusion Detection System (IDS) algorithm, it is crucial to consider both the test score and the F1 score as they provide complementary information about the algorithm’s performance. An algorithm with a high test score but a low F1 score may indicate many false negatives, meaning it fails to detect actual threats. On the other hand, an algorithm with a high F1 score but a low test score may generate many false positives, triggering alerts for benign traffic. Hence, selecting an IDS algorithm that balances a high test score and a high F1 score is vital. This balance helps minimize false negatives and false positives, ensuring accurate detection rates. The plot in Figure 7 serves as a tool to compare the performance of different IDS algorithms. The stacked plot in Figure 7 visually represents the algorithms’ performance in different IDS classes across multiple metrics. The “all-rounders” or best-performing algorithms can be identified by observing the bars with high stacks across all performance categories, including Test Score, F1-Score, Recall-Score, and Precision-Score. These algorithms demonstrate consistent and balanced performance across various aspects of IDS evaluation. Choosing algorithms that excel in all performance metrics helps ensure a comprehensive and robust IDS solution. By considering the overall performance rather than emphasizing a single metric, the selected algorithms are more likely to provide accurate detection, minimize false negatives and false positives, and maintain high levels of precision and recall.
By examining the stacked bar chart in Figure 7, we can identify several algorithms demonstrating promising performance across multiple performance metrics. These algorithms include RF-GWO_DT, RF-GA_GNB, RF-GA_LR, DT-GWO_DT, LR-GWO_LR and so on. These algorithms exhibit high stacks in Test Score, F1-Score, Recall-Score, and Precision-Score, indicating strong performance across different metrics.
Feature selection is crucial in building an effective machine learning-based IDS. Figure 8 shows subplots organized in a grid, with each row representing a different type of optimization algorithm (Genetic Algorithm, Particle Swarm Optimization, Grey Wolf Optimization) and each column representing a different machine learning model (Gaussian Naive Bayes, Logistic Regression, Decision Tree, Random Forest). The purpose of organizing the subplots in this way is to compare the performance of different IDS algorithms within each optimization and model category and across different categories. Each plot in the grid corresponds to a different feature selection technique. This indicates that the data has been preprocessed using different feature selection methods, and the resulting features are being used to train the IDS model. By visualizing the performance of different algorithms in each plot, the efficacy of the selected feature set can be evaluated. Feature selection techniques are used to identify the most relevant features (or attributes) that can provide maximum information gain for the model while reducing the dimensionality of the input data. It is important to note that the choice of feature selection technique can significantly impact the performance of an IDS model.
The study analyzed 48 unique machine learning-assisted metaheuristic algorithms. These algorithms incorporated a machine learning model within their fitness function. The study categorized these algorithms based on the feature selection technique employed, resulting in 12 distinct techniques, namely GA_GNB, GA_LR, GA_DT, GA_RF, PSO_GNB, PSO_LR, PSO_DT, PSO_RF, GWO_GNB, GWO_LR, GWO_DT and GWO_RF. Following the feature selection process, the study focused on four prominent machine learning models (GNB, RF, LR, and DT) for classification purposes. All results obtained under these classifiers were normalized, and each group’s mean score was calculated. For instance, the feature selection technique GWO_LR was utilized by RF-GWO_LR, LR-GWO_LR, DT-GWO_LR, and GNB-GWO_LR. The results of these four algorithms were normalized and averaged, and the result was grouped under GWO_LR. This same process was applied to the remaining 11 feature selection techniques. To evaluate the performance of the algorithms, four vector arrays were maintained to capture the mean test score, mean F1 score, mean precision score, and mean recall score. Each array contained 12 scores corresponding to the 12 feature selection techniques. To provide a clear understanding of the overall performance of the feature selection techniques, a 3D bar plot is generated. This visualization enables comparing the 12 developed algorithms across the four performance metrics. Figure 9 depicts the 3D bar plot. This analysis makes it possible to determine the superior machine learning-assisted metaheuristic technique based on the overall performance observed in the bar plot. The findings from the 3D bar plot in Figure 9 are further examined and represented in a 2D version combining a bar plot and multiple line plots in Figure 10. This visualization aims to provide a more comprehensive understanding of the performance of different feature selection techniques. Upon analyzing the combined plot, a key inference is drawn regarding the superiority of two specific feature selection techniques over the others. The algorithms GWO_LR and GWO_GNB consistently demonstrate higher scores across all performance metrics. These techniques exhibit strong performance in terms of the mean test score, mean F1 score, mean precision score, and mean recall score.
While some other algorithms display high scores in the mean test score metric, they exhibit relatively poorer performance in the remaining evaluation metrics. This observation highlights the importance of considering multiple performance metrics to assess the overall effectiveness of a feature selection technique. The combined 2D plot allows for a more nuanced understanding of the comparative performance of
the feature selection techniques. It enables researchers and practitioners to identify the algorithms that excel across multiple evaluation metrics, such as GWO_LR and GWO_GNB, thereby making informed decisions regarding adopting the most effective technique for their specific requirements.

TABLE 3. Algorithm nomenclature.

TABLE 4. Top 10 high test-score.

TABLE 5. Top 10 high recall-score algorithms.

Dimensionality reduction is significant while modeling an IDS using ML because it can significantly reduce the space and time complexity of the model [27]. High-dimensional
data can be very computationally expensive to process. It may also lead to overfitting, where the model is too complex and learns the noise in the data instead of the underlying patterns. Feature selection is a technique used for dimensionality reduction, which aims to identify the most relevant features for the model while removing redundant or irrelevant features [22]. By reducing the number of features, feature selection can improve the model’s accuracy, reduce overfitting, and speed up the training and prediction time. Striking a balance between lowering the data size (space complexity) and performance is crucial when modelling an IDS using machine learning. Reducing data size can be beneficial as it reduces the space complexity of the model and leads to faster and more efficient computations.
However, reducing the data size too much can lead to a loss of critical information and patterns, which can negatively impact the performance of the IDS. On the other hand, performance is vital as it determines the effectiveness of the IDS in detecting and preventing intrusions. A highly accurate model will detect attacks and minimize false positives and negatives. However, achieving high performance often requires more data and complex algorithms, which increases space complexity and slows computation. Therefore, a balance between space complexity and performance must be struck to ensure that the IDS is efficient and effective. This balance can be achieved through feature selection, dimensionality reduction, and model optimization. These techniques aim to reduce the data size while retaining critical information and optimizing the model’s performance.
Figure 11 provides valuable information for modelling an Intrusion Detection System (IDS) using machine learning by considering the number of features to select before training the model. Specifically, the graph shows each algorithm group’s mean length of selected features and the mean test score.
The mean length of selected features is an essential IDS feature selection metric. The number of features selected can impact the model’s performance, as too many features can result in overfitting, and too few features can result in underfitting. Therefore, having an idea of the mean length of selected features can help select the ideal set of features for the model. The mean test score, on the other hand, provides information about the model’s performance.
Achieving a balance between test scores and the number of selected features is crucial. An exceptional algorithm would select minimal features while achieving high scores. Figure 11 presents a bar plot of the algorithms, showcasing their corresponding lengths of selected features. Additionally, a red line plot indicates the corresponding test scores. This approach can be extended to other performance metrics, such as F1-score, recall, and precision. Among the algorithms, GWO_LR and GWO_GNB stand out for their remarkable results. Despite significantly reducing the NSL-KDD dataset, GWO_LR maintains a mean test score above 94%, while GWO_GNB achieves a score of 92.5%. GWO_LR selects fewer than 20 features, whereas GWO_GNB chooses approximately 15 features from the original set of 41 features in the training dataset.
This substantial reduction in the training data has notable implications, including reduced space and time complexity. The scatter plot in Figure 12 shows the relationship between the number of selected features and the test score for various IDS models, where the feature selection was made through different metaheuristic algorithms. The plot shows that some algorithms have achieved high test scores with very few features selected. This suggests that these algorithms could select the most relevant features for the model, resulting in a more efficient and accurate model.
Moreover, the color coding of the scatter plot indicates that different metaheuristic algorithms have resulted in different numbers of features selected for each IDS model. This suggests that the choice of metaheuristic algorithm used for feature selection can significantly impact the resulting feature set and the overall performance of the IDS model. Therefore, selecting an appropriate metaheuristic algorithm for feature selection is essential in developing an effective IDS model. By analyzing the scatter plot, we can observe that there are specific algorithm groups, such as RF-GWO_RF, RF-GWO_DT, DT-GWO_RF, and GNB-GWO_DT, which exhibit both high test scores and a consistent feature selection length below 10.
The exact performance metrics of the aforementioned algorithms are provided in Table 6 for further evaluation and reference. The algorithms are grouped by distinct colors and symbols, making them easily identifiable. These algorithms demonstrate excellent performance with high test scores and consistently high metrics such as F1-Score, Recall, and Precision Scores. Additionally, they exhibit the ability to maintain a relatively low number of features selected, which can be advantageous in terms of the efficiency and interpretability of the IDS. Overall, these results indicate the potential effectiveness of these algorithms for intrusion detection tasks.
One of the key challenges in building an effective Intrusion Detection System (IDS) is accurately identifying the subset of features most relevant for detecting malicious activities while maintaining computational efficiency. The feature selection process is crucial in IDS modeling as it directly impacts performance. Including irrelevant or redundant features can lead to overfitting and reduced generalization capability, while selecting too few features may result in a lack of discriminatory power to detect malicious activities. Additionally, the computational cost of feature selection becomes a significant concern, particularly for large datasets, as time-consuming algorithms may not be practical for real-time IDS applications. Thus, finding a balance between the number of features and the computational cost of feature selection is essential when building an effective IDS.
The plot in Figure 13 provides valuable insights into the relationship between feature selection time and the test score for different feature selection algorithms in the context of IDS. It is a valuable tool for selecting an appropriate feature selection technique based on a given dataset. The
plot demonstrates a trade-off between feature selection time and test score, highlighting that different algorithms exhibit varying levels of computational complexity and effectiveness in selecting relevant features. Some algorithms may require more time for feature selection but yield higher test scores, indicating better performance, while others may have shorter selection times but result in lower test scores. Therefore, when designing an IDS, careful consideration of both feature selection time and test score is essential, as the computational cost significantly impacts the overall system performance. Selecting an algorithm that balances these factors is crucial to optimizing the IDS performance, reducing false alarms, and improving detection accuracy. By thoroughly evaluating the feature selection time and test scores of different algorithms, IDS developers can make informed decisions to enhance the effectiveness and efficiency of their systems, ultimately contributing to the improved security of computer networks and systems.

FIGURE 6. Algorithms compared against their test scores and f1 scores.

FIGURE 7. Algorithms vs test-score vs f1-score vs recall vs precision.

The algorithms used in the study were categorized into 12 groups based on the feature selection techniques they employed. The average test score and feature selection time were computed for each group, and the results are presented in Figure 13. Finding the most optimized algorithm involves balancing two goals: minimizing time complexity and maximizing test scores. Some algorithms achieve high
accuracy but suffer from excessive time complexity, while others deliver reasonable accuracy in far less time. Figure 13 illustrates the relationship between feature selection time, test score, and the algorithms employed. The blue bars represent the feature selection time, while the yellow line plot represents the mean test score. The objective is to achieve a low feature selection time while maintaining a high test score. Among the algorithms, GWO_GNB stands out with the lowest feature selection time of 34.18 seconds and a respectable test score of 93%. On the other hand, GWO_LR demonstrates an impressive test score of 98%, but it is accompanied by one of the longest feature selection times, clocking in at 5529.18 seconds.

FIGURE 8. Feature selection techniques and test-scores.

FIGURE 9. Feature selection techniques compared.

Among the algorithms analyzed, PSO_LR demonstrates a decent test score of 87% with a relatively low feature selection time of 1427.73 seconds. On the other hand, PSO_GNB achieves a test score of 85% with an even lower feature selection time of 59.13 seconds. These instances highlight the variations in performance among the different algorithms.
Furthermore, our research explores all possible combinations of weights (trade-offs) for each criterion (Test Score, F1 Score, Recall, Precision, Length-Selected-Features, and Run-Time-FS) and then calculates the weighted score for each algorithm using the current combination of weights. It then ranks the algorithms based on their weighted scores and prints the combination of weights and the ranking of algorithms for each combination of weights.
The weights allotted to the different evaluation criteria to calculate each algorithm’s overall score are listed in Table 7. This can be useful for comparing and selecting the best algorithm based on multiple criteria rather than just one. The weights determine the relative importance of each criterion
in the overall score calculation. For example, suppose we believe that test scores and length-selected-features are essential criteria. In that case, we can assign them higher weights than other criteria like recall and feature selection time.

FIGURE 10. Feature selection techniques vs test-scores | F1 score | recall | precision.

FIGURE 11. Feature selection techniques vs the mean length of selected features and test score.

During the computation of the final score, a weighted sum is calculated by multiplying each weight with its corresponding value and summing up the results.
This process allows for combining different weighted factors to determine the overall score. For example, the equation shown below is used to calculate the weighted score:

$$
\begin{aligned}
\texttt{df[Weighted\_score]} ={} & (\texttt{combination[weights][0]} \times \texttt{df[Normalized\_Test\_score]}) \\
& + (\texttt{combination[weights][1]} \times \texttt{df[Normalized\_F1\_score]}) \\
& + (\texttt{combination[weights][2]} \times \texttt{df[Normalized\_Recall\_score]}) \\
& + (\texttt{combination[weights][3]} \times \texttt{df[Normalized\_Precision\_score]}) \\
& + (\texttt{combination[weights][4]} \times \texttt{df[Normalized\_Length\_Selected\_features]}) \\
& + (\texttt{combination[weights][5]} \times \texttt{df[Normalized\_Run\_time\_FS]})
\end{aligned} \tag{9}
$$

In Equation (9), each weight (obtained from the weight combinations) is multiplied by its respective normalized value (e.g., normalized test score, normalized F1 score). df stands for the ‘data frame’ chosen for the respective
normalized value. The results are then summed together to obtain the final weighted score. Using this weighted sum approach, different weight combinations can be applied to emphasize certain factors over others in the computation of the final score. This allows for flexibility in determining the importance of each factor and enables the evaluation of different scoring criteria based on their respective weights. The products involving the weights of ‘Normalized-Length-Selected-Features’ and ‘Normalized-Run-Time-FS’ are effectively subtracted from the sum of the other weighted products, since these two weights are negative in every combination. The subtraction assigns a higher importance or preference to algorithms with a lower value for ‘Normalized-Length-Selected-Features’ or ‘Normalized-Run-Time-FS’. By subtracting these products, we effectively penalize algorithms with a higher value for these features.

FIGURE 12. Test score vs the length of selected features.

FIGURE 13. Feature selection time and test score vs the algorithms.

Algorithms with smaller values for ‘Normalized-Length-Selected-Features’ or ‘Normalized-Run-Time-FS’ will have a higher weighted score than those with larger values, promoting the selection of algorithms that balance performance and efficiency. This subtraction term helps create a comprehensive evaluation metric that considers the positive aspects of various criteria and the potential drawbacks associated with longer feature selection time or a larger number of selected features. Three combinations of weights are presented below.

1) COMBINATION 1
This combination of ([0.5, 0.3, 0.1] and [−0.5, −0.1]), as seen in the first row of Table 7, prioritizes Test Score and Length-Selected-Features, assigning them the highest
TABLE 6. Performance metrics for IDS algorithms.

TABLE 7. The three weight combinations.

FIGURE 14. Weighted score of algorithms across the three combinations.

weights. Test Score measures a model’s accuracy in predicting unseen data, so unsurprisingly, it has been given high priority. Meanwhile, Length-Selected-Features refers to the number of features selected by the feature selection algorithm, and placing a high weight on this suggests a desire for a high-performing and efficient model.
Additionally, Recall and Precision are given lower weights in this combination. Recall measures a model’s ability to correctly identify positive instances (i.e. true positives), while Precision measures the proportion of true positives among all positive predictions. Giving these lower weights suggests that the user may value a balanced model that does not prioritize one of them over the other. Finally, Run-Time-FS has the lowest weight, 0.1 in magnitude, which suggests that model efficiency is still important but is not the highest priority. This weight value indicates that the user wants to balance model performance with the resources required to train and run the model.

2) COMBINATION 2
This combination of ([0.4, 0.4, 0.1, 0.1] and [−0.3, −0.2]), as seen in the second row of Table 7, places equal weight on Test Score and F1 Score, which indicates a preference for a model with good overall performance. F1 Score measures a model’s balance between Precision and Recall, which suggests that the user values a model that is good at identifying true positives and avoiding false positives. In this combination, Length-Selected-Features is given a lower weight, which may indicate a desire to keep the feature space relatively small to improve efficiency and avoid overfitting. Additionally, Run-Time-FS has a weight of 0.2 in magnitude, higher than in Combination 1. This suggests that while model efficiency is still essential, achieving good overall performance is a higher priority.

3) COMBINATION 3
This combination of ([0.3, 0.2, 0.2, 0.2] and [−0.6, −0.1]), as seen in the third row of Table 7, places the highest weight on Length-Selected-Features, indicating a preference for a simple and efficient model. This suggests that the user values a model that is easy to understand and use, even if it sacrifices some performance. Test Score and F1 Score are given lower weights, which indicates a willingness to sacrifice a small amount of accuracy in favor of simplicity. Recall and Precision are given equal weight in this combination, suggesting a desire for a balanced model that performs well on both. Finally, Run-Time-FS has the lowest weight, 0.1 in magnitude, indicating that model efficiency is still important but not the highest priority.
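The weighted score described in this section can be illustrated with a minimal Python sketch. It is not the authors' implementation: the min-max normalization of the two penalized criteria, the dictionary keys, and the three algorithms `ALG_A`, `ALG_B`, and `ALG_C` with their values are assumptions made for illustration only. The weights for Combinations 2 and 3 are taken from the text; Combination 1 is left out because the text lists only three positive weights for it.

```python
from typing import Dict, List, Tuple

# Order of the positive weights follows the criteria list in the text:
# Test Score, F1 Score, Recall, Precision. The key names are hypothetical.
POSITIVE_KEYS = ["test_score", "f1", "recall", "precision"]
# Penalized criteria: number of selected features and feature-selection time.
PENALTY_KEYS = ["n_features", "fs_seconds"]

# Combinations 2 and 3 as given in the text; negative weights act as penalties.
COMBINATIONS: Dict[str, Tuple[List[float], List[float]]] = {
    "Combination 2": ([0.4, 0.4, 0.1, 0.1], [-0.3, -0.2]),
    "Combination 3": ([0.3, 0.2, 0.2, 0.2], [-0.6, -0.1]),
}


def min_max(values: List[float]) -> List[float]:
    """Scale values to [0, 1]. Assumed normalization, not confirmed by the source."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


def weighted_scores(
    results: Dict[str, Dict[str, float]],
    pos_w: List[float],
    neg_w: List[float],
) -> Dict[str, float]:
    """Return one composite score per algorithm for a given weight combination."""
    names = list(results)
    # Normalize each penalized criterion across all algorithms.
    normalized = {
        key: dict(zip(names, min_max([results[n][key] for n in names])))
        for key in PENALTY_KEYS
    }
    scores = {}
    for name in names:
        reward = sum(w * results[name][k] for w, k in zip(pos_w, POSITIVE_KEYS))
        # The weights are negative, so adding these products subtracts the penalty.
        penalty = sum(w * normalized[k][name] for w, k in zip(neg_w, PENALTY_KEYS))
        scores[name] = reward + penalty
    return scores


# Hypothetical results for three made-up algorithms (not taken from Table 6).
RESULTS: Dict[str, Dict[str, float]] = {
    "ALG_A": {"test_score": 0.99, "f1": 0.99, "recall": 0.98,
              "precision": 0.99, "n_features": 8, "fs_seconds": 1200.0},
    "ALG_B": {"test_score": 0.98, "f1": 0.97, "recall": 0.97,
              "precision": 0.98, "n_features": 20, "fs_seconds": 5000.0},
    "ALG_C": {"test_score": 0.93, "f1": 0.92, "recall": 0.90,
              "precision": 0.94, "n_features": 12, "fs_seconds": 40.0},
}

if __name__ == "__main__":
    for label, (pos_w, neg_w) in COMBINATIONS.items():
        scores = weighted_scores(RESULTS, pos_w, neg_w)
        ranking = sorted(scores, key=scores.get, reverse=True)
        print(label, [(name, round(scores[name], 3)) for name in ranking])
```

Storing the penalties as negative weights keeps the score a single weighted sum, which matches the way the combinations are written in the text. Sorting the resulting scores gives a per-combination ranking of the algorithms.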


TABLE 8. High performers in each combination.

These three model combinations suggest that the user is considering trade-offs between model performance, efficiency, and simplicity and is trying to find the best balance for their specific use case. The plot in Figure 14 shows the results of applying different weight combinations to performance metrics for different algorithms. Each line represents a combination of weights, and the x-axis shows the algorithms ranked in order of their weighted score. The y-axis shows the weighted score, a composite score calculated based on the specified weights for each performance metric.
The significance of this plot is to provide a way to evaluate and compare different algorithms based on multiple criteria simultaneously. By assigning weights to different performance metrics, we can prioritize certain aspects of the model’s performance over others, and the weighted score reflects this overall evaluation. This plot can help us decide which algorithm to choose based on our priorities and preferences.
By examining the plot, we can assess the performance of the algorithms for the three use cases indicated by the combinations. RF-GWO_DT consistently achieves the highest scores across all three combinations. For Combination 1, RF-GWO_DT obtains a score of 97%, while for Combination 2 and Combination 3, it achieves 96% and 85%, respectively. This information allows us to compare the performance of different algorithms under different weight configurations. RF-GWO_DT stands out as a high-performing algorithm across all three use cases, demonstrating its consistency and effectiveness in various scenarios.

VII. CONCLUSION AND FUTURE WORK
This research study explored various combinations of machine learning-assisted metaheuristics for modeling an intrusion detection system (IDS). By leveraging different algorithms and their combinations, the study aimed to provide organizations with various options to meet their specific requirements.
Large organizations with ample resources and computational capabilities have the flexibility to choose the highest-performing algorithm regardless of its time and space complexity. These organizations can invest in high-end computation facilities and leverage complex algorithms that offer superior performance in desired metrics. On the other hand, the study recognizes that not all organizations have the same resources and capital, particularly those just starting up. These organizations may lack sufficient computational resources and funding. Therefore, the study recommends that such organizations consider algorithms with lower asymptotic complexity, which are computationally efficient while still achieving satisfactory results regarding the desired metrics. The research study provides a diverse set of algorithms, offering multiple options that cater to the specific requirements of different organizations.
The algorithm that achieved the highest test score, with an accuracy of 99.5787%, incorporates feature selection through Particle Swarm Optimization (PSO) with Random Forest in the fitness function. RF-PSO_RF achieved the highest recall score, indicating its ability to detect a large proportion of actual intrusions accurately. The algorithms RF-GWO_DT, RF-GA_GNB, RF-GA_LR, DT-GWO_DT, and LR-GWO_LR score highly on Test Score, F1-Score, Recall-Score, and Precision-Score, indicating strong performance across different metrics. The algorithms RF-GWO_RF, RF-GWO_DT, DT-GWO_RF, and GNB-GWO_DT exhibit high test scores and a consistent feature selection length below 10. Regarding time complexity, GWO_GNB stands out with the lowest feature selection time of 34.18 seconds and a respectable test score of 93%. On the other hand, GWO_LR demonstrates an impressive test score of 98%, but it is accompanied by one of the longest feature selection times, clocking in at 5529.18 seconds.
Furthermore, our research explores all possible combinations of weights (trade-offs) for each criterion (Test Score, F1 Score, Recall, Precision, Length-Selected-Features, and Run-Time-FS) and then calculates the weighted score for each algorithm using each combination of weights. By examining the plot, we can assess the performance of the algorithms for the three use cases indicated by the combinations. RF-GWO_DT consistently achieves the highest scores across all three combinations.
For Combination 1, RF-GWO_DT obtains a score of 97%, while for Combination 2 and Combination 3, it achieves 96% and 85%, respectively. This information allows us to compare the performance of different algorithms under different weight configurations. RF-GWO_DT stands out as a high-performing algorithm across all three use cases.
While the current research has addressed many optimization areas, several potential avenues for future work could further enhance IDSs. These areas of future scope aim to explore additional dimensions and challenges to continue refining and expanding the capabilities of IDSs. The following points highlight some of these areas:
• Efficient search-related metaheuristic algorithm: One avenue for future work involves developing more


efficient search-related metaheuristic algorithms. These algorithms would focus on finding optimal hyperparameters in machine learning models. By improving the search process, the accuracy and computation time of the models could be significantly enhanced. This would contribute to more optimized IDS performance.
• Global Maxima for F1 Score in the Fitness Function: Another area of exploration is redefining the fitness function used in metaheuristic algorithms. Instead of solely optimizing for accuracy, the fitness function could be modified to optimize for the global maxima of the F1 score or other case-specific objectives. This would allow the IDS to prioritize detection and classification performance beyond simple accuracy measurements, resulting in more robust and reliable intrusion detection.
• Ensemble Learning Techniques: Investigating ensemble learning techniques could be another area of future work. Ensemble methods, such as bagging, boosting, or stacking, can combine multiple classifiers to improve overall prediction accuracy and robustness. Exploring the effectiveness of ensemble methods in the context of the IDS being studied could lead to even better detection and classification results.
Further research can aid in developing intrusion detection systems by investigating these different aspects. This includes improving ensemble learning, real-time detection, scalability, explainability, and detecting adversarial attacks, among other things. These explorations can improve the IDS’s abilities and usefulness in effectively identifying and preventing intrusions in demanding situations.
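The second future-work item above, scoring candidate feature subsets by F1 instead of accuracy, can be sketched as follows. This is only an illustration under stated assumptions: the nearest-centroid classifier is a toy stand-in for the classifiers used in the study, the function names (`fitness`, `nearest_centroid`) are hypothetical, and the data are made up. A metaheuristic such as PSO or GWO would call `fitness` on each candidate mask.

```python
from typing import Callable, List, Sequence

Rows = List[List[float]]


def accuracy(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Fraction of predictions that match the true labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def f1_score(y_true: Sequence[int], y_pred: Sequence[int], positive: int = 1) -> float:
    """Harmonic mean of precision and recall for the positive (intrusion) class."""
    tp = sum(t == positive and p == positive for t, p in zip(y_true, y_pred))
    fp = sum(t != positive and p == positive for t, p in zip(y_true, y_pred))
    fn = sum(t == positive and p != positive for t, p in zip(y_true, y_pred))
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def nearest_centroid(X_train: Rows, y_train: Sequence[int], X_val: Rows) -> List[int]:
    """Toy stand-in classifier: predict the label of the closest class centroid."""
    centroids = {}
    for label in sorted(set(y_train)):
        rows = [x for x, y in zip(X_train, y_train) if y == label]
        centroids[label] = [sum(col) / len(rows) for col in zip(*rows)]

    def sq_dist(a: Sequence[float], b: Sequence[float]) -> float:
        return sum((u - v) ** 2 for u, v in zip(a, b))

    return [min(centroids, key=lambda lab: sq_dist(x, centroids[lab])) for x in X_val]


def fitness(
    mask: Sequence[int],
    X_train: Rows,
    y_train: Sequence[int],
    X_val: Rows,
    y_val: Sequence[int],
    metric: Callable[[Sequence[int], Sequence[int]], float] = f1_score,
) -> float:
    """Score a binary feature mask; an empty mask gets the worst fitness."""
    selected = [i for i, keep in enumerate(mask) if keep]
    if not selected:
        return 0.0

    def project(rows: Rows) -> Rows:
        return [[row[i] for i in selected] for row in rows]

    y_pred = nearest_centroid(project(X_train), y_train, project(X_val))
    return metric(y_val, y_pred)


# Made-up data: feature 0 separates the classes, feature 1 is mostly noise.
X_TRAIN = [[0.1, 5.0], [0.2, 1.0], [0.0, 3.0], [0.9, 4.0], [1.0, 1.0]]
Y_TRAIN = [0, 0, 0, 1, 1]
X_VAL = [[0.15, 2.0], [0.05, 4.0], [0.95, 3.0], [0.8, 1.0]]
Y_VAL = [0, 0, 1, 1]

if __name__ == "__main__":
    for mask in ([1, 0], [0, 1], [1, 1]):
        print(mask, "F1 fitness:", fitness(mask, X_TRAIN, Y_TRAIN, X_VAL, Y_VAL))
```

Passing `metric=accuracy` reproduces an accuracy-based fitness, so the two objectives can be compared on the same candidate masks. On imbalanced traffic the two can diverge sharply, since a model that misses most intrusions may still score a high accuracy.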
of the KDD CUP 99 data set,’’ in Proc. IEEE Symp. Comput. Intell. Secur.
Defense Appl., Jul. 2009, pp. 1–6.
REFERENCES [23] S. Stolfo, D. W. Fan, W. Lee, A. Prodromidis, and P. Chan, ‘‘Credit card
fraud detection using meta-learning: Issues and initial results,’’ in Proc.
[1] I. Butun, S. D. Morgera, and R. Sankar, ‘‘A survey of intrusion detec-
AAAI Workshop Fraud Detection Risk Manage., 1997, pp. 83–90.
tion systems in wireless sensor networks,’’ IEEE Commun. Surveys Tuts.,
[24] L. Dhanabal and S. P. Shantharajah, ‘‘A study of NSL-KDD dataset for
vol. 16, no. 1, pp. 266–282, 1st Quart., 2014.
intrusion detection system based on classification algorithms,’’ Int. J. Adv.
[2] E. Gyamfi and A. D. Jurcut, ‘‘Novel online network intrusion detection
Res. Comput. Commun. Eng., vol. 4, no. 6, pp. 446–452, 2015.
system for industrial IoT based on OI-SVDD and AS-ELM,’’ IEEE Internet
[25] W. Long, J. Jiao, X. Liang, and M. Tang, ‘‘Inspired grey wolf optimizer for
Things J., vol. 10, no. 5, pp. 3827–3839, Mar. 2023.
solving large-scale function optimization problems,’’ Appl. Math. Model.,
[3] X.-S. Yang, Nature-Inspired Metaheuristic Algorithms, 2nd ed. U.K.: Univ.
vol. 60, pp. 112–126, Jan. 2018.
of Cambridge, 2010.
[26] P. Bedi, N. Gupta, and V. Jindal, ‘‘I-SiamIDS: An improved siam-IDS for
[4] R. Alkanhel, E.-S. M. El-kenawy, A. A. Abdelhamid, A. Ibrahim, handling class imbalance in network-based intrusion detection systems,’’
M. A. Alohali, M. Abotaleb, and D. S. Khafaga, ‘‘Network intrusion detec- Appl. Intell., vol. 51, no. 2, pp. 1133–1151, 2021.
tion based on feature selection and hybrid metaheuristic optimization,’’
[27] A. Majeed, ‘‘Improving time complexity and accuracy of the machine
Comput., Mater. Continua, vol. 74, no. 2, pp. 2677–2693, 2023.
learning algorithms through selection of highly weighted top k features
[5] A. K. Dey, G. P. Gupta, and S. P. Sahu, ‘‘Hybrid meta-heuristic based from complex datasets,’’ Ann. Data Sci., vol. 6, no. 4, pp. 599–621,
feature selection mechanism for cyber-attack detection in IoT-enabled Dec. 2019.
networks,’’ Proc. Comput. Sci., vol. 218, pp. 318–327, Jan. 2023.
[6] W. A. H. M. Ghanem, Y. A. B. El-Ebiary, M. Abdulnab, M. Tubishat,
N. A. M. Alduais, A. B. Nasser, N. Abdullah, and O. A. Alwesabi,
‘‘Metaheuristic based IDS using multi-objective wrapper feature selection
and neural network classification,’’ in Advances in Cyber Security. Cham,
Switzerland: Springer, Dec. 2021, pp. 384–401.
AGANITH SHANBHAG received the Graduate degree in electronics and communication engineering from the Manipal Institute of Technology, Manipal. He is currently the Manager of Analytics with ACT Fibernet, Bengaluru. His research interests include design development and data analytics.


SHWETA VINCENT (Member, IEEE) received the Ph.D. degree in computer science engineering from the Karunya Institute of Technology and Sciences. She has over 15 years of work experience in academia and industry. She is a passionate Researcher and an Educator in the field of machine learning, optimization, and wireless communications. Currently, she is an Associate Professor with the Manipal Institute of Technology (MIT), where she teaches and mentors undergraduate and postgraduate students, conducts research projects and publishes scholarly articles. She also has expertise in geographic information systems (GIS), artificial intelligence (AI), and C programming.

S. B. BORE GOWDA is currently a Professor with the Department of Electronics and Communication Engineering, Manipal Institute of Technology, Manipal. His research interests include wireless sensor networks, embedded design, cryptography, and network security. He has published several journal and conference papers in reputed venues.

OM PRAKASH KUMAR is currently a Senior Assistant Professor with the Department of Electronics and Communication Engineering, Manipal Institute of Technology, Manipal Academy of Higher Education, Manipal. He is an active Researcher in RF and microwave technology. He has published around 38 peer-reviewed articles in reputable international journals and conferences. He serves as a Reviewer for journals, such as the AEU-International Journal of Electronics and Communications, Engineering Science and Technology, an International Journal (JESTECH), Ain Shams Engineering Journal, International Journal of Electronics (Taylor & Francis), and Heliyon (Elsevier).

SHARMILA ANAND JOHN FRANCIS received the Doctorate of Philosophy degree in computer science and engineering/computer applications, in 2010. She is currently the Head of the Scientific Research Unit, Rajal Almaa Campus, King Khalid University, Saudi Arabia. Her specialized area is mobile ad hoc networks, sensor networks, mesh networks, and cloud computing. She is also guiding seven Ph.D. students and some master’s degree students in the areas of computer networks.
