0% found this document useful (0 votes)

8 views

Predictive Analysis of Network-Based Attacks by Hybrid Machine Learning Algorithms Utilizing Bayesian Optimization Logistic Regression and Random Forest Algorithm (1)

This study presents a predictive analysis of network-based attacks using hybrid machine learning algorithms, specifically focusing on Bayesian Optimization-enhanced Random Forest for binary classification and a hybrid Logistic Regression and Random Forest for multiclass classification. The research evaluates these models on the KDD99 and NSL-KDD datasets, demonstrating their effectiveness in improving intrusion detection accuracy and adaptability. The findings contribute to the advancement of cybersecurity methodologies and offer practical solutions for real-world intrusion detection scenarios.

Uploaded by

viji

We take content rights seriously. If you suspect this is your content, claim it here.

Available Formats

Download as PDF, TXT or read online on Scribd

0% found this document useful (0 votes)

8 views

Predictive Analysis of Network-Based Attacks by Hybrid Machine Learning Algorithms Utilizing Bayesian Optimization Logistic Regression and Random Forest Algorithm (1)

Uploaded by

viji

We take content rights seriously. If you suspect this is your content, claim it here.

Available Formats

Download as PDF, TXT or read online on Scribd

You are on page 1/ 12

Received 25 August 2024, accepted 17 September 2024, date of publication 20 September 2024, date of current version 9 October 2024.

Digital Object Identifier 10.1109/ACCESS.2024.3464866

Predictive Analysis of Network-Based Attacks by

Hybrid Machine Learning Algorithms Utilizing
Bayesian Optimization, Logistic Regression,
and Random Forest Algorithm
MANISANKAR SANNIGRAHI AND R. THANDEESWARAN
School of Computer Science Engineering and Information Systems, Vellore Institute of Technology, Vellore, Tamil Nadu 632014, India
Corresponding author: R. Thandeeswaran ([email protected])

ABSTRACT These days, intrusion detection systems are one of the newest trends in society. These
technologies serve as a defense against a variety of security breaches, the number of which has been
rising recently. The need for adaptive security solutions is pressing since the sorts of attacks that arise are
ever-changing. This study aims to enhance the performance of intrusion detection models on the KDD99
and NSL-KDD datasets through advanced optimization techniques. By addressing challenges related to
evolving attack strategies and intricate tasks, the research introduces innovative machine learning approaches
tailored for intrusion detection, focusing on both binary and multiclass classification scenarios. The study
employs a Bayesian Optimization-enhanced Random Forest (BO_RF) algorithm for binary classification
and a hybrid Logistic Regression and Random Forest (LR_RF) algorithm for multiclass classification. Our
models were implemented and evaluated in a Jupyter Notebook environment using key metrics: Accuracy,
Precision, Recall, and F1-Score. For binary classification, eight metrics were assessed, while twenty-six
were analyzed for multiclass classification across both datasets. The results demonstrate the effectiveness
of the proposed approaches in both classification types, highlighting their potential for robust and adaptable
intrusion detection. Theoretical contributions include advancing the understanding of intrusion detection
methodologies and the effectiveness of machine learning algorithms in cybersecurity. From a practical
perspective, the proposed model can offers a robust and adaptable solution for real-world intrusion detection
scenarios, potentially minimizing security breaches and enhancing overall cyber security posture.

INDEX TERMS Machine learning (ML), network attacks, classification, intrusion detection system (IDS).

I. INTRODUCTION the largest barrier to network communication these days

In recent years, a lot of important data has been transmitted is infiltration. An intrusion detection system scans network
via the Internet to computers or portable devices. It ulti- traffic for malicious activity and notifies the administrator
mately results in a rise in attacks against vital information when it finds it. Intrusion Detection System architectures
network systems. Thus, network security has emerged as the primarily fall into three categories: machine learning-based
key concern in thwarting attacks against confidential data techniques [2], statistical methods [3], and knowledge-based
on networks [1]. Various techniques and strategies related to systems [4]. Machine learning-based techniques are primarily
network security are examined. Gaining access control over employed to detect and prevent attacks by sensing irregu-
the network is the most popular method of security. However, larities in network traffic [5]. There are two methods for
detecting network attacks: anomaly detection and misuse
The associate editor coordinating the review of this manuscript and detection. The process of anomaly detection involves con-
approving it for publication was Hosam El-Ocla . trasting regular network traffic with system notifications.
2024 The Authors. This work is licensed under a Creative Commons Attribution 4.0 License.
VOLUME 12, 2024 For more information, see https://creativecommons.org/licenses/by/4.0/ 142721
M. Sannigrahi, R. Thandeeswaran: Predictive Analysis of Network-Based Attacks

Anomaly detection establishes a typical pattern of network and limitations. Section IV introduces the datasets utilized in
traffic to identify new assaults, while misuse detection relies the research, including details about the KDD99 cup dataset
on already-registered indications in the system to identify and the NSLKDD dataset. Data visualization and preprocess-
known attacks ing techniques are elaborated upon in Section V, aimed at
An intrusion detection system built using machine learning facilitating a better understanding of the datasets. Section VI,
techniques makes use of logs of standard network activity presents the proposed framework designed to enhance the
as well as data collected from network traffic. It assists in interpretability of any Intrusion Detection System (IDS).
locating known attacks as well as unidentified ones that have This section is further divided into two parts, addressing
not yet been located. Machine learning models can detect binary classification and multiclass classification, respec-
attacks or respond to threats in real time. Machine learn- tively. It encompasses the experiments conducted using the
ing is suited for both large-scale network system analysis NSL-KDD and KDD99 datasets, along with a detailed pre-
and monitoring, as well as the efficient handling of large sentation of the results. Finally, Section VII offers concluding
volumes of data [6]. It can be applied to analyze user and remarks and outlines potential future research directions.
system behavior to spot insider threats or unapproved access.
Machine learning techniques can be used to connect complex II. RELATED WORK
attacks with numerous stages or to dynamically alter network There are a variety of reasons behind the significance of
segments in response to a recognized danger. These models Machine learning in intrusion detection. It is useful for
have numerous drawbacks in addition to their advantages. securing the integrity of computer networks and systems.
The report omits essential material. Certain models merely Cavusoglu [8], proposed a hybrid IDS system by combining
forecast the existence of an assault, not its kind. The second different feature selection and machine learning algorithms
is the restriction of poor detection performance for low data over the NSL-KDD dataset. The performance is demon-
frequency. The primary cause of this is an unbalanced dataset, strated by comparing the proposed system with other studies,
wherein certain attacks have significantly more cases than it is shown that the proposed system has a low false positive
others. As such, it is exceedingly challenging for any model rate and high accuracy. Li et al. [9], proposed a hybrid method
to detect attacks with low frequency [7]. based on K-NN and binary classification which achieved
The research work makes several notable contributions to suitable results over the NSL-KDD dataset. The model is
the area of intrusion detection and cyber security. evaluated by comparing five other learning techniques. The
• Given the increasing complexity and diversity of cyber result showed that the proposed method performs better than
threats, there’s a crucial need for security solutions that all other baselines in different evaluation criteria.
can adapt effectively. To address this demand, the pro- Al-Khassawneh [10] in this study, the effectiveness of
posed methods utilize advanced optimization techniques different classification algorithms in detecting anomalies in
and hybrid algorithms capable of adjusting dynamically network traffic patterns is evaluated using the NSL-KDD
to evolving attack scenarios. dataset. Additionally, the relationship between hacker attacks
• The study introduces innovative intrusion detection and commonly used network protocols is investigated to
approaches based on machine learning algorithms, understand how attackers generate abnormal network traffic.
to improve detection accuracy and adaptability. Through The proposed model enhances IDS precision and suggests
the utilization of a Random Forest with Bayesian opti- new research directions in the field. Fuhnwi et al. [11] in
mization (BO_RF) algorithm for binary classification this study, an approach for Network Intrusion Detection
and a hybrid Logistic Regression and Random For- Systems (NIDS) using XGBoost and Recursive Feature Elim-
est (LR_RF) algorithm for multiclass classification, the ination (RFE) is proposed. Evaluation of the NSL-KDD
research presents fresh strategies for addressing intru- dataset demonstrates superior performance in detecting vari-
sion detection challenges. ous attack types, with XGBoost outperforming other machine
• Extensive evaluation and comparison of the proposed learning algorithms and achieving high classification accu-
algorithms with existing methods, such as Random For- racy. Vibhute et al. [12] the study focuses on developing a
est, Logistic Regression, Naïve Bayes, and SVM, are network based IDS using NSL-KDD datasets. Utilizing an
conducted using the KDD99 and NSL-KDD datasets ensemble learning-enabled random forest algorithm, features
over Jupyter Notebook. This thorough analysis provides were selected. Three machine learning models - KNN, logis-
valuable insights into the effectiveness of the proposed tic regression, SVM and - achieved validation accuracies
models over traditional models. of 98.24%, 88.86%, and 87.58%, respectively, indicating
Overall, the research facilitates the development of more applicability for real-time monitoring and detection of cyber-
efficient, adaptable, and effective intrusion detection systems, attack. Shehadeh et al. [13] the study evaluate intrusion
thereby bolstering overall cybersecurity resilience. detection using Random Forest, KNN, and Naïve Bayes
The subsequent sections of the paper are structured as on three datasets (KDDCUP-99, UNSW-NB15, and NSL-
follows. In Section II, outlines the prior research conducted KDD). Random Forest proves the most reliable. Limitations
on this subject. In Section III, different machine learning include dataset quantity and focusing solely on classification
algorithms are discussed, along with their respective strengths algorithms. Future research could explore additional data

142722 VOLUME 12, 2024

M. Sannigrahi, R. Thandeeswaran: Predictive Analysis of Network-Based Attacks

mining techniques like Neural Networks and analyze spe- Naive Bayes operates under the assumption of feature
cific dataset intricacies for improved algorithm performance. independence, which may not be valid for intricate relation-
Unlike previous studies that primarily relied on traditional ships within network data. Its performance diminishes when
algorithms, this paper introduces the application of hybrid dealing with highly correlated features, potentially result-
algorithms. Earlier research often focused exclusively on ing in inflated probability estimates. Moreover, Naive Bayes
binary classification and limited their evaluation to one or two encounters difficulties with continuous features, particularly
metrics. In contrast, this study incorporates both binary and if the underlying distribution is not accurately represented by
multiclass classification, and conducts a more comprehensive the selected probability distribution [19].
analysis by considering all four evaluation metrics.
B. SUPPORT VECTOR MACHINE (SVM)
III. MACHINE LEARNING ALGORITHMS SVM is an abstract algorithm of Machine Learning that learns
Machine Learning is a branch of Artificial Intelligence that is by training over a specific type of dataset for making accurate
the best way to accomplish specific goals by simulation. It can predictions and conceptualizing the remaining data. SVMs
take the result of previous experiences as instructions for belong to the supervised class of Machine Learning that
future operations without being explicitly programmed [14]. are mainly utilized for analyzing data, pattern recognition,
Machine Learning can be classified into three major type’s regression, and classification analysis [20]. The main objec-
supervised, unsupervised, and semi-supervised learning [15]. tive of a SVM is to find a hyperplane in an N-dimensional
If the target labels and classes are known before execution space where data points can distinctly be divided into two
then it is called Supervised Learning. If the target class is categories.
Linear kernel function of SVM—— x,x′

unknown then this learning is called Unsupervised. The learn-
Polynomial function of SVM————, γ x,x′ +r where

ing which is a combination of supervised and unsupervised
methods of learning is called Semi-supervised Learning. d= specified parameter degree.
In this study, supervised learning algorithms are employed Radial basis function of SVM———–exp(−γ ||x − x ′ ),
due to the availability of labeled datasets, facilitating the where γ = specified parameter gamma that is always more
classification task. The proposed techniques are hybrids, than zero.
Sigmoid function of SVM———tanh(γ x, x ′ + r), r =

combining the strengths of multiple algorithms to overcome
limitations observed in existing approaches. Each existing Coefficient of θ
algorithm brings its unique advantages and drawbacks to the The general working structure of SVM is shown in
table, which are carefully considered and addressed [16]. figure (1).
Below is a concise analysis of the methods evaluated in this
study.

A. NAÏVE BAYES
This algorithm can be explained as a probabilistic classifier
which is obtained from the application of Bayes Theorem,
which is an equation of statistical quantities that explains
the relationship between conditional probabilities [17]. The
naïve Bayes classification is very useful because they are very
fast and simple classification algorithm in datasets of high
dimension.
The likelihood of an outcome based on the previous out-
come that has occurred in similar circumstances is known as FIGURE 1. SVM classification.

Bayes theorem [18]. In equation (1), a relationship between

y = Class variable, & x1 . . . . . . xn = Dependent Vector of Support Vector Machines (SVMs) might face scalabil-
features stated by Bayes theorem. ity issues, particularly with large datasets, due to quadratic
optimization requirements relative to the number of training
P(y)P(x1 , . . . . . . ., xn ) examples. Their effectiveness relies heavily on selecting the
P(y|x1 , . . . . . . ., xn ) = (1)
P(x1 , . . . . . . ., xn ) appropriate kernel function, which can be difficult, partic-
ularly in high-dimensional spaces. Additionally, SVMs are
In equation (2), the estimation of P(x, | y) and P(y) is done
susceptible to noisy data and outliers, potentially impacting
by using maximum a priori (MAP)
their ability to generalize effectively [21].
Yn
ŷ = arg maxy P(y) P (xi | y) (2)
i=1 C. LOGISTIC REGRESSION
This algorithm can be explained as a probabilistic classifier Logistic regression accomplishes the tasks by predicting the
which is actually obtained from the application of Bayes probability of events, outcomes, or observations. It’s a statis-
Theorem. tical model which is used to build model-dependent variables

VOLUME 12, 2024 142723

M. Sannigrahi, R. Thandeeswaran: Predictive Analysis of Network-Based Attacks

by using a logistic function [22]. Given data (x, y) where x D. RANDOM FOREST
is (m x n) matrix with m samples and n attributes and y is a This algorithm creates and makes a forest random; this for-
vector of m examples. The weight matrix to create random est is a group of decision trees which are trained with a
initialization defined in equation (3). bugging method. The bagging method is the amalgamation
of the learning models to increase overall results [25]. This
a = w0 + w′1 x1 + w′1 x2 + . . . w′1 xn (3)
algorithm is useful for both regression and classification
Then pass the output a to link function which is formulated problems which are used to form most of the current systems
in equation (4). of machine Learning. The greater number of trees leads to
high accuracy and also prevents the overfitting problem.
ŷi = 1/(1 + e−a ) (4) There are formulas for the evaluation of random forest is
Then the cost function is calculated which derived in shown in equation (9) and (10)
equation (5). Xn
Gini Index = 1 − p2 (9)
i=1 i
1 Xi=m Xn
cost (w) = − yi log (yi ) Entropy = − pi log(pi ) (10)
m i=1 i=1
+ (1 − yi ) log 1 − ŷi

(5) Information Gain = E (Parent) – E (Parent | Child) or Gini
(Parent) – Gini (Parent | Child), Where Gini= Gini Index,
The updating of weights is done as per the derivative of the
E= Entropy, p= probability.
cost, the formula is shown in equation (6) and (7)
For the final evaluation majority/hard voting method is
used, the formula of this method is given in equation (11).
Xi=n
dwj = (ŷ − ŷi )xji (6)
i=1 ŷ = mode{C1 (x) , C2 (x) , . . . . . . .., Cm (x)} (11)
wi = wj − (adwj ) (7)
where ŷ = class label, Cm = set of classifier, the class label
Logistic regression is used to calculate the probability of of each classifier is predicted by majority voting.
a given data point set belonging to either class ‘0’ or ‘1’ a Random Forest models, despite their effectiveness, can
given value of w and x. The exponent function in the sigmoid pose challenges in interpretation due to their complexity,
function is justified because the probability must be more particularly with numerous trees and features. Although they
than zero. But to make the value less than one the numerator offer reduced overfitting risks compared to single decision
needs to be divided by a value bigger than it [23]. This trees, they still need careful tuning to avoid overfitting noisy
equation is divided by the numerator term then the sigmoid data [26]. Additionally, training a Random Forest model
function is obtained. The sigmoid function is expressed in can be computationally demanding, especially with sizable
equation (8). Figure (2) shows the curve of logistic regression. datasets or when employing a high number of trees.
P = 1/1+e−(w1 x1 +w2 x2 +...+wn xn ) (8) E. BAYESIAN OPTIMIZATION
Optimization is considered the heart of a machine learn-
ing model. Bayesian Optimization constructs a model of an
objective function based on probability for selecting hyper-
parameters for the evaluation of the objective function [27].
Bayesian Optimization varies from Grid and Random search
because it enhances the search speed by using past perfor-
mances where other methods are independent of previous
evaluations. It has two components one is the probabilistic
model another is the acquisition function. The probabilis-
tic model starts with the prior probability distribution for
optimization of the function. The acquisition function com-
FIGURE 2. Curve of logistic regression.
putes the posterior distribution of the function [28]. The next
sampling point is determined by maximizing the acquisition
Logistic Regression relies on the assumption of a function. The objective function is defined in equation (12).
linear relationship between features and the log odds Xt = argmaxx u(X|D1:t−1 ) (12)
of the target variable, potentially overlooking complex
non-linear relationships within the data. Challenges arise with where, u= acquisition function, D1:t−1 = the total t samples
high-dimensional or feature-rich datasets, as Logistic Regres- There are mainly three main types of acquisition functions,
sion may struggle to achieve optimal performance [24]. first upper confidence bound is defined in equation (13).
Moreover, Logistic Regression may exhibit poor results when
UCB[x∗ ] = µ x∗ + β 1/2 σ [x ∗ ]

dealing with imbalanced datasets. (13)

142724 VOLUME 12, 2024

M. Sannigrahi, R. Thandeeswaran: Predictive Analysis of Network-Based Attacks

Second, probability of improvement which is defined in TABLE 1. Classification of the KDD99 dataset.
equation (14).
Z ∞
PI x∗ = Normf[x ∗ ] µ x ∗ σ x ∗ df [x ∗ ]

(14)
f [ẋ]
Third, expected optimization is defined in equation (15)
Z ∞
(f x∗ − f [ẋ])Normf[x∗ ] [µ x∗ σ x∗ df[x∗ ]
∗
EI x =
f[ẋ]
(15)

(14), and (15), µ x = mean value of

∗
In equation (13),
data point x, σ x ∗ = variance value of data point x, β =
controlling parameter of the degree of exploration, f x ∗ =
normal distribution, f [ẋ] = current maxima.
Bayesian optimization offers strengths such as efficient
optimization of black-box functions and the ability to han-
dle noisy or expensive objective functions. However, it may
face limitations in high-dimensional search spaces and when
dealing with discontinuous or non-smooth functions [29].
Additionally, Bayesian optimization’s performance heavily
relies on the choice of surrogate model and acquisition
function, which may impact its effectiveness in different
scenarios.
TABLE 2. Classification of the NSLKDD dataset.
IV. DATA DESCRIPTION
There are many datasets available that are considered as a
benchmark and are used for analysis of different Machine
Learning models and security solutions [30]. In this paper,
we have used KDD99 and NSLKDD datasets.

A. KDD99 DATASET
In 1999, the International Knowledge Discovery and Data
Mining Tools Competition was organized to collect traffic
records, an environment had been set up by Lincoln Labs
for acquiring raw TCP dump of 9 weeks from a LAN of
the US Air Force [31]. The size of the training data is 4 GB
collected from 7 weeks, which then processed into 5 million
connections and the test data is around 2 million connections
collected from 2 weeks. The attack types explored here are
R2L. U2R, DoS, Probing. The description of the dataset is
given in table 1.

B. NSLKDD DATASET
In 2009, the NSL-KDD dataset was brought as a cleaned and
revised version of the KDD99 dataset by the University of
New Brunswick. The dataset contains 43 features for each
record, where 41 are traffic input data, the other two are labels
i.e. traffic is normal or malicious, and Scores i.e. severity
of traffic input [32]. There are 4 different attack classes in
the dataset: DoS, R2L, U2R, and Probing. There are four of the data provides information about the data. Two cru-
types of features in the dataset: 4 Categorical, 6 Binary, 10 cial elements of data processing are normalization and
Continuous, and 23 Discrete. The features of the dataset is standardization [33]. To get more significant results, the
explained in table 2. size of a dataset grows exponentially as the number of
characteristics increases. This causes an overfitting issue,
V. DATA EXPLORATION which lengthens computation times and lowers model accu-
Visualization is the most effective method for comprehend- racy. The following describes the methods applied in this
ing data and how it functions. Knowing the distribution paper.

VOLUME 12, 2024 142725

M. Sannigrahi, R. Thandeeswaran: Predictive Analysis of Network-Based Attacks

A. NORMALIZATION AND STANDARDIZATION the Jupyter Notebook environment [38]. There are mainly two
Normalization is a method of arranging data to reduce types of classification techniques used in machine learning
duplication where data points are scaled between zero and one binary and the other multiclass classification.
one [34]. It is utilized to remove undesired characteris-
tics from a dataset. Mathematically it is represented in
equation (16). A. BINARY CLASSIFICATION
The process of classification where the data is divided
Xnew = X−Xmin Xmax − Xmin (16) into two classes or groups is called as binary classifica-
The method of restructuring data in uniform format is tion [39]. The two classes are labelled as either 0 or 1.
called data standardization. It compares the data points by There are many machine learning algorithms used to do
putting them on the same scale. This process is also called binary classification like SVM, logistic regression, random
Z-score. Mathematically it is represented in equation (17). forest, etc. PCA is utilized to reduce the dimensional-
ity of the dataset, the Bayesian optimization technique is
Z =X−µ σ

(17) employed to identify the optimal parameter configuration
where, µ = Mean of the data points, σ = Standard Deviation for improved results. Bayesian techniques employ histor-
ical evaluation data to construct a probabilistic model,
B. PRINCIPAL COMPONENT ANALYSIS (PCA) termed a ‘‘surrogate,’’ mapping hyperparameters to the
It is a method to examine interrelations between variables of a likelihood of achieving a score on the objective function.
dataset [35]. The algorithm uses an orthogonal transformation This surrogate function simplifies optimization compared
to convert correlated variables into uncorrelated variables. to the actual objective function. By iteratively updat-
Principal component analysis is used to avoid overfitting ing the surrogate probability model with each evaluation,
problems by reducing the dimensionality of the original Bayesian reasoning aims to refine predictions and improve
dataset and transforming it to lower lower-dimensional accuracy.
dataset preserving most of the actual sample information [36]. The random forest technique has eighteen hyperparam-
The variance of the low-dimensional dataset is greater than eters, three of which—max_smaples, max_features, and
the higher-dimensional dataset. Figure (3) gives a general n_estimators—are selected for optimization. In the end, the
overview of how PCA works. attack kinds are classified using an optimized Random Forest
approach. The Random Forest method heavily relies on key
hyperparameters such as max_samples, max_features, and
n_estimators to govern the model’s performance, complexity,
and generalization capability. These hyperparameters control
the bootstrap sample size, feature selection randomness, and
forest count, respectively. Fine-tuning these hyperparame-
ters is crucial for achieving optimal performance, balancing
model complexity, and ensuring generalization ability in
Random Forest models [40].
The framework for binary classification is illustrated
in figure (4). Initially, the datasets undergo standardiza-
tion and normalization, following the processes outlined in
equations (16) and (17). Then, PCA is employed for dimen-
sionality reduction. Following these preparatory steps, the
FIGURE 3. Principal component analysis.
proposed algorithm, along with other traditional algorithms,
is applied for final analysis.
VI. PROPOSED FRAMEWORK Traditional algorithms like SVM and Logistic Regression
A computer system will always have vulnerabilities, which may struggle with imbalanced datasets, where one class dom-
opens the door to different network attacks attempting to inates the other. In contrast, the Random Forest method,
compromise system integrity. For network security purposes, optimized with Bayesian optimization, can address this issue
it is more crucial to identify the sort of attack than whether by dynamically balancing class distribution during training,
an assault has taken place. Any network administrator must ensuring adequate representation of rare attack types. While
constantly have accurate information to take the neces- Random Forest models are robust to noise, they may overfit
sary precautions to safeguard computer infrastructure [37]. without proper tuning [41]. Bayesian optimization helps opti-
We presented a hybrid approach in this paper that outperforms mize Random Forest hyperparameters, such as tree count and
existing approaches in identifying network threats. The pro- depth, mitigating overfitting and enhancing generalization,
posed algorithm experiments over the KDD99 and NSLKDD particularly in noisy environments. Moreover, Random Forest
datasets to evaluate detection performance. The ratio of train- with Bayesian optimization, offers adaptability by updating
ing and testing dataset is 70:30, the implementation is done in model parameters over time, enabling continuous learning

142726 VOLUME 12, 2024

M. Sannigrahi, R. Thandeeswaran: Predictive Analysis of Network-Based Attacks

FIGURE 4. Framework of the Proposed algorithm for binary classification.

Algorithm 1 The Proposed Bayesian-Based Random Forest TABLE 3. Classification of KDD99 dataset.
(BO_RF) Algorithm
Input: The dataset be X = [X1 , X2 . . . . . . .Xn ]
The Target variables Y = [Y1 , Y2 . . . Ym ]
Output: Classification report for each target variable.

1: Initialize the dataset X = [X1 , X2 . . . . . . .Xn ],

Target variables Y = [Y1 , Y2 . . . Ym ],
iteration i
2: Compute objective function by using Bayesian
optimization
Xt = argmaxx u(X|D1:t−1 )
3: Compute acquisition function to select best
parameters
R ∞ ∗
EI x∗ = f[ẋ] (f x − f [ẋ])Normf[x∗ ] [µ x∗ , σ x∗ df[x∗ ]

4: Set i = 0 proposed algorithm as shown in algorithm (1). Traditional

5: while i < n do methods use a series of equations to carry out their analy-
6. Compute entropy and information gain
Pn ses which are mentioned in Section III, thereby establishing
Entropy = − pi log (pi ) a foundational benchmark for comparison. In contrast, the
i=1
Information Gain = E (Parent) – proposed algorithm introduced an advanced optimization
E (Parent | Child) process, utilizing equations (12) and (15) to refine classifica-
7: Set i = i + 1 tion accuracy and efficiency. The final classification process,
8: Select the most voted prediction by using the which involved equations (9)-(11), presents a comprehen-
majority/hard voting technique
ŷ = mode C1 (x) , C2 (x) , . . . . . . .., Cm (x) sive analysis that integrates standard methodologies with
9: end while innovative optimization techniques, ultimately achieving
10: Return the classification report for each target superior performance in detecting and categorizing network
variable intrusions.
The classification report obtained by the proposed
algorithm over the KDD99 dataset is given in table 3.
and adaptation to evolving data distributions and new attack The comparison of different machine learning algorithms
patterns [42]. to the proposed algorithm based on different evaluation crite-
The classification of the KDD99 and NSL-KDD datasets ria is tabulated in table 3 and shown in figure (5). From these,
was conducted using both traditional algorithms and the we can easily conclude the superiority of BO_RF algorithm

VOLUME 12, 2024 142727

M. Sannigrahi, R. Thandeeswaran: Predictive Analysis of Network-Based Attacks

over traditional methods in all the aspects of evaluation crite- work together effectively, leveraging optimization efficiency
ria. and model robustness to develop efficient machine learning
models for real-time scenarios [44].

Algorithm 2 The Proposed Grid Search-Based Hybrid Logis-

tic Regression and Random Forest (LR_RF) Algorithm
Input: The dataset be X = [X1 , X2 . . . . . . .Xn ]
The Target variables Y = [Y1 , Y2 . . . Ym ]

Output: Classification report for each target variable.

1: Initialize the dataset X = [X1 , X2 . . . . . . .Xn ],

Target variables Y = [Y1 , Y2 . . . Ym ],
FIGURE 5. Comparison over the KDD99 dataset. iteration i, j
2: Utilize GridSearchCV to find optimal parameters
The experimental results over the NSLKDD dataset are
3: Set i = 0
compared between existing techniques and the proposed tech-
4: while i < n do
nique is shown in table 4.
5: Compute weight matrix and link function
TABLE 4. Classification of NSLKDD dataset.
a =w0+ w‘1 x1 + w2‘ x2 + . . .w‘n xn
ŷi = 1 (1+e−a )
6. Compute cost function by utilizing link function
and weight matrixP
i=m
cost (w) = − m1 i=1 yi log (yi ) +
(1−yi ) log(1 − ŷi )
7: Calculate the sigmoid function
P =1 1 + e−(w1 x1 +w2 x2 +...+wn xn )
8: Set j = 0
9: while j < m do
10: Compute entropy
P and information gain
Entropy = − ni=1 pi log (pi )
The comparison of different machine learning algorithm Information Gain = E (Parent) –
to the proposed algorithm over NSLDD dataset is given in E(Parent | Child)
figure (6). Similar improvement over traditional methods is 11: Set j = j + 1
also observed here with BO_RF algorithm. 12: Select the most voted prediction by using hard or
majority voting technique
ŷ = mode{C1 (x) , C2 (x) , . . . . . . .., Cm (x)}
13: end while
14: Set i = i + 1
15: end while
16: Return the classification report for each target variable.

B. MULTICLASS CLASSIFICATION
The process of classification where data is divided into three
or more classes or categories is called multiclass classifica-
tion [45]. The classes are labelled from 0 to n-1 where n is
the total no of classes. A hybrid algorithm using Logistic
FIGURE 6. Comparison over the NSLKDD dataset. regression and Random forest is proposed. The proposed
Bayesian optimization enhances Random Forest models algorithm is implemented using Grid search which can search
by efficiently tuning hyperparameters like the number of for optimal parameters from a great number of parameters.
trees and maximum depth. This optimization reduces com- The optimal parameters are used to improve the performance
putational costs and improves predictive performance by of the algorithm. The combination of Logistic Regression
adapting to changes in the performance landscape during and Random Forest in a hybrid approach offers comple-
training [43]. This adaptability is particularly beneficial in mentary benefits. While Logistic Regression may overlook
real-time applications where model performance may fluctu- complex data patterns, Random Forest can capture them
ate. In summary, Bayesian optimization and Random Forest effectively by aggregating predictions from multiple decision

142728 VOLUME 12, 2024

M. Sannigrahi, R. Thandeeswaran: Predictive Analysis of Network-Based Attacks

FIGURE 7. Framework of the proposed algorithm for multiclass classification.

trees. Moreover, Random Forest’s robustness to noise and described previously in Section III. The proposed algorithm,
outliers enhances model performance, mitigating the impact however, introduced more advanced and nuanced approach.
of outliers. On the other hand, Logistic Regression pro- In the initial phase, it applied equations (3) - (8) (which
vides interpretable coefficients that elucidate the relationship are rooted in logistic regression) to optimize the model’s
between features and the target variable, aiding in understand- parameters and enhance its predictive capabilities. This phase
ing model predictions. By leveraging the strengths of both laid the groundwork for a more precise classification. The
methods, hybrid algorithms achieve improved performance second phase of the proposed algorithm employed equa-
and prediction accuracy. tions (9) - (11) (which utilize the random forest technique)
Figure 5 depicts the multiclass classification framework. to further refine the classification process. This dual-phase
The datasets are initially subjected to standardization, nor- methodology which combines logistic regression with ran-
malization, and dimensionality reduction. Once these steps dom forest, represents a significant improvement over tradi-
are completed, the proposed algorithm, along with other tra- tional approaches. It not only ensures greater accuracy and
ditional algorithms, is implemented for the final analysis. efficiency in the detection of network intrusions but also
Traditional algorithms such as Logistic Regression and underscores the innovative and robust nature of the proposed
Naive Bayes may struggle to capture intricate data rela- algorithm in addressing complex cybersecurity challenges.
tionships. However, by integrating Logistic Regression The multiclass classification of KDD99 dataset by using
with Random Forest in a hybrid model and optimizing proposed LR_RF algorithm is described in table 5.
hyperparameters through grid search, the model can effec- The comparison of experimental results over KDD99
tively address non-linear relationships and high-dimensional dataset is shown below in figure (8) and (9).
feature spaces, ultimately enhancing classification accu- The result of multiclass classification over four different
racy [46]. Although Random Forest models are potent, attack classes of the NSLKDD dataset is explored below in
they are often perceived as ‘‘black box’’ models, hindering table 6.
interpretability. Yet, by combining Logistic Regression in a The comparison between different machine learning algo-
hybrid approach, interpretability can be enhanced, as Logistic rithms and proposed algorithm over four attack classes of
Regression offers coefficients that signify the significance of NSLKDD dataset is shown in the figures (10) & (11).
each feature in the classification process. In deploying hybrid models combining Logistic Regres-
The classification of the KDD99 and NSL-KDD datasets sion and Random Forest for Intrusion Detection Systems
was carried out using both traditional algorithms and the (IDS), practical considerations such as computational costs
proposed algorithm. Traditional methods leveraged a com- and real-time applicability need to be addressed. Random
prehensive set of equations to conduct their analyses, estab- Forest typically demands more computational resources and
lishing a foundational benchmark for comparison which are training time, especially with large datasets or numerous

VOLUME 12, 2024 142729

M. Sannigrahi, R. Thandeeswaran: Predictive Analysis of Network-Based Attacks

TABLE 5. Classification of KDD99 dataset. TABLE 6. Classification of NSLKDD dataset.

FIGURE 8. Comparison over the KDD99 dataset.

FIGURE 10. Comparison over the NDSLKDD dataset.

FIGURE 11. Comparison of accuracy of the models.

FIGURE 9. Comparison of accuracy of the models.

costs, and performance. By amalgamating the strengths of
decision trees. Grid search, commonly used for hyperpa- both approaches and mitigating their weaknesses, these mod-
rameter tuning, further escalates computational expenses. els strive for an optimal compromise. Such hybrid models
Hybrid Logistic Regression and Random Forest models aim can leverage Random Forest’s higher predictive accuracy
to strike a balance between model complexity, computational while harnessing Logistic Regression’s real-time capabilities

142730 VOLUME 12, 2024

M. Sannigrahi, R. Thandeeswaran: Predictive Analysis of Network-Based Attacks

for swift decision-making, particularly in scenarios where [11] G. S. Fuhnwi, M. Revelle, and C. Izurieta, ‘‘Improving network intrusion
computational resources are limited [47]. detection performance: An empirical evaluation using extreme gradi-
ent boosting (XGBoost) with recursive feature elimination,’’ in Proc.
IEEE 3rd Int. Conf. AI Cybersecur. (ICAIC), Feb. 2024, pp. 1–8, doi:
VII. CONCLUSION 10.1109/ICAIC60265.2024.10433805.
In summary, the experiments conducted with the KDD99 and [12] A. D. Vibhute, C. H. Patil, A. V. Mane, and K. V. Kale, ‘‘Towards detection
of network anomalies using machine learning algorithms on the NSL-
NSLKDD datasets, combined with PCA for dimensionality KDD benchmark datasets,’’ Proc. Comput. Sci., vol. 233, pp. 960–969,
reduction, confirm the efficacy of ML techniques in intru- Jan. 2024, doi: 10.1016/j.procs.2024.03.285.
sion detection. Through the use of hybrid machine learning [13] A. Shehadeh, H. ALTaweel, and A. Qusef, ‘‘Analysis of data mining
techniques on KDD-cup’99, NSL-KDD and UNSW-NB15 datasets for
methods, the study effectively tackles challenges posed by intrusion detection,’’ in Proc. 24th Int. Arab Conf. Inf. Technol. (ACIT),
attack types with high prediction rates across diverse eval- Dec. 2023, pp. 1–6, doi: 10.1109/ACIT58888.2023.10453884.
uation criteria. These findings underscore the importance [14] T. Mehmood and H. B. Md Rais, ‘‘Machine learning algorithms
in context of intrusion detection,’’ in Proc. 3rd Int. Conf.
of flexible and resilient intrusion detection approaches in Comput. Inf. Sci. (ICCOINS), Aug. 2016, pp. 369–373, doi:
the face of constantly evolving cyber threats. The suggested 10.1109/ICCOINS.2016.7783243.
method, which integrates PCA for dimensionality reduction [15] N. A. Solekha, ‘‘Analysis of NSL-KDD dataset for classification of attacks
based on intrusion detection system using binary logistics and multinomial
and hybrid machine learning techniques, surpasses alternative
logistics,’’ Seminar Nasional Off. Statist., vol. 2022, no. 1, pp. 507–520,
algorithms in both binary and multi-class classification sce- Nov. 2022, doi: 10.34123/semnasoffstat.v2022i1.1138.
narios. Looking ahead, future research efforts will focus on [16] S. K. Mehak, Z. Rasheed, N. A. Ibupoto, and S. Ashraf, ‘‘Machine
advancing intrusion detection capabilities by leveraging deep learning algorithms for prediction of thyroid syndrome at initial stages
in females,’’ Kurdish Stud., vol. 12, no. 5, pp. 466–470, Jul. 2024, doi:
learning methods. These methods have the potential to further 10.53555/ks.v12i5.3247.
enhance the accuracy and efficiency of IDS by automatically [17] N. Wattanapongsakorn, S. Srakaew, E. Wonghirunsombat,
learning intricate patterns and representations from the data. C. Sribavonmongkol, T. Junhom, P. Jongsubsook, and C. Charnsripinyo,
‘‘A practical network-based intrusion detection and prevention system,’’
By exploring deep learning approaches, our aim is to develop in Proc. IEEE 11th Int. Conf. Trust, Secur. Privacy Comput. Commun.,
even more effective and reliable security solutions capable of Jun. 2012, pp. 209–214, doi: 10.1109/TRUSTCOM.2012.46.
addressing the evolving landscape of cyber threats. [18] T. Alves, R. Das, and T. Morris, ‘‘Embedding encryption and machine
learning intrusion prevention systems on programmable logic controllers,’’
IEEE Embedded Syst. Lett., vol. 10, no. 3, pp. 99–102, Sep. 2018, doi:
REFERENCES 10.1109/LES.2018.2823906.
[1] R. D. Ravipati and M. Abualkibash, ‘‘Intrusion detection system clas- [19] S. A. Repalle and V. R. Kolluru, ‘‘Intrusion detection system
sification using different machine learning algorithms on KDD-99 and using AI and machine learning algorithm,’’ Int. Res. J. Eng.
NSL-KDD datasets—A review paper,’’ Int. J. Comput. Sci. Inf. Technol., Technol., vol. 4, no. 12, pp. 1709–1715, 2017. [Online]. Available:
vol. 11, pp. 1–16, Jun. 2019, doi: 10.2139/ssrn.3428211. https://d1wqtxts1xzle7.cloudfront.net/55496979/IRJET-V4I12314
[2] S. Ganesan, G. Shanmugaraj, and A. Indumathi, ‘‘A survey of data mining [20] N. K. Trivedi, R. G. Tiwari, A. K. Agarwal, and V. Gautam, ‘‘A detailed
and machine learning-based intrusion detection system for cyber security,’’ investigation and analysis of using machine learning techniques for thyroid
in Risk Detection and Cyber Security for the Success of Contemporary diagnosis,’’ in Proc. Int. Conf. Emerg. Smart Comput. Informat. (ESCI),
Computing, 2023, pp. 52–74, doi: 10.4018/978-1-6684-9317-5.ch004. Mar. 2023, pp. 1–5, doi: 10.1109/ESCI56872.2023.10099542.
[3] K. Ashok and S. Gopikrishnan, ‘‘Statistical analysis of remote health [21] K. A. Taher, B. Mohammed Yasin Jisan, and Md. M. Rahman, ‘‘Network
monitoring based IoT security models & deployments from a prag- intrusion detection using supervised machine learning technique with fea-
matic perspective,’’ IEEE Access, vol. 11, pp. 2621–2651, 2023, doi: ture selection,’’ in Proc. Int. Conf. Robot., Elect. Signal Process. Techn.
10.1109/ACCESS.2023.3234632. (ICREST), Jan. 2019, pp. 643–646, doi: 10.1109/ICREST.2019.8644161.
[4] M. Rampavan and E. P. Ijjina, ‘‘Genetic brake-net: Deep learn- [22] K. Shaukat, S. Luo, V. Varadharajan, I. Hameed, S. Chen, D. Liu, and
ing based brake light detection for collision avoidance using genetic J. Li, ‘‘Performance comparison and current challenges of using machine
algorithm,’’ Knowl.-Based Syst., vol. 264, Mar. 2023, Art. no. 110338, doi: learning techniques in cybersecurity,’’ Energies, vol. 13, no. 10, p. 2509,
10.1016/j.knosys.2023.110338. May 2020, doi: 10.3390/en13102509.
[5] Z. Ahmad, A. Shahid Khan, C. Wai Shiang, J. Abdullah, and F. Ahmad, [23] S. A. R. Shah and B. Issac, ‘‘Performance comparison of intrusion
‘‘Network intrusion detection system: A systematic study of machine detection systems and application of machine learning to snort system,’’
learning and deep learning approaches,’’ Trans. Emerg. Telecommun. Tech- Future Gener. Comput. Syst., vol. 80, pp. 157–170, Mar. 2018, doi:
nol., vol. 32, no. 1, p. e4150, Jan. 2021, doi: 10.1002/ett.4150. 10.1016/j.future.2017.10.016.
[6] L. Cui, Y. Qu, L. Gao, G. Xie, and S. Yu, ‘‘Detecting false data [24] W. Seo and W. Pak, ‘‘Real-time network intrusion prevention system based
attacks using machine learning techniques in smart grid: A survey,’’ on hybrid machine learning,’’ IEEE Access, vol. 9, pp. 46386–46397, 2021,
J. Netw. Comput. Appl., vol. 170, Nov. 2020, Art. no. 102808, doi: doi: 10.1109/ACCESS.2021.3066620.
10.1016/j.jnca.2020.102808. [25] J. Ribeiro, F. B. Saghezchi, G. Mantas, J. Rodriguez, and
[7] T. Meng, X. Jing, Z. Yan, and W. Pedrycz, ‘‘A survey on machine learn- R. A. Abd-Alhameed, ‘‘HIDROID: Prototyping a behavioral host-based
ing for data fusion,’’ Inf. Fusion, vol. 57, pp. 115–129, May 2020, doi: intrusion detection and prevention system for Android,’’ IEEE Access,
10.1016/j.inffus.2019.12.001. vol. 8, pp. 23154–23168, 2020, doi: 10.1109/ACCESS.2020.2969626.
[8] Ü. Çavuşoğlu, ‘‘A new hybrid approach for intrusion detection using [26] M. A. Al-Naeem, ‘‘Prediction of re-occurrences of spoofed ACK
machine learning methods,’’ Appl. Intell., vol. 49, pp. 2735–2761, packets sent to deflate a target wireless sensor network node
Feb. 2019. [Online]. Available: https://link.springer.com/article/ by DDOS,’’ IEEE Access, vol. 9, pp. 87070–87078, 2021, doi:
10.1007/s10489-018-01408-x 10.1109/ACCESS.2021.3089683.
[9] L. Li, Y. Yu, S. Bai, Y. Hou, and X. Chen, ‘‘An effective two- [27] A. H. Azizan, S. A. Mostafa, A. Mustapha, C. F. M. Foozy,
step intrusion detection approach based on binary classification M. H. A. Wahab, M. A. Mohammed, and B. A. Khalaf, ‘‘A machine learn-
and k-NN,’’ IEEE Access, vol. 6, pp. 12060–12073, 2018, doi: ing approach for improving the performance of network intrusion detection
10.1109/ACCESS.2017.2787719. systems,’’ Ann. Emerg. Technol. Comput., vol. 5, no. 5, pp. 201–208,
[10] Y. A. Al-Khassawneh, ‘‘An investigation of the Intrusion detection system Mar. 2021, doi: 10.33166/aetic.2021.05.025.
for the NSL-KDD dataset using machine-learning algorithms,’’ in Proc. [28] S. Asiri, Y. Xiao, S. Alzahrani, S. Li, and T. Li, ‘‘A survey of intelligent
IEEE Int. Conf. Electro Inf. Technol. (eIT), May 2023, pp. 518–523, doi: detection designs of HTML URL phishing attacks,’’ IEEE Access, vol. 11,
10.1109/eIT57321.2023.10187360. pp. 6421–6443, 2023, doi: 10.1109/ACCESS.2023.3237798.

VOLUME 12, 2024 142731

M. Sannigrahi, R. Thandeeswaran: Predictive Analysis of Network-Based Attacks

[29] E. N. Crothers, N. Japkowicz, and H. L. Viktor, ‘‘Machine-generated [43] A. Shokeen, N. Yadav, and V. Sisaudia, ‘‘Performance analysis of different
text: A comprehensive survey of threat models and detection machine learning algorithms for intrusion detection on KDD-CUP-99
methods,’’ IEEE Access, vol. 11, pp. 70977–71002, 2023, doi: dataset,’’ in Proc. AIP Conf., 2024, vol. 3072, no. 1, Art. no. 020010,
10.1109/ACCESS.2023.3294090. doi: 10.1063/5.0203394
[30] Q. Xiong, C. Yuan, B. He, H. Xiong, and Q. Kong, ‘‘GTRF: A general [44] S. M. Kasongo, ‘‘A deep learning technique for intrusion detection system
deep learning framework for tuples recognition towards supervised, semi- using a recurrent neural networks based framework,’’ Comput. Commun.,
supervised and unsupervised paradigms,’’ Eng. Appl. Artif. Intell., vol. 124, vol. 199, pp. 113–125, Feb. 2023, doi: 10.1016/j.comcom.2022.12.010.
Sep. 2023, Art. no. 106500, doi: 10.1016/j.engappai.2023.106500. [45] A. O. Alzahrani and M. J. F. Alenazi, ‘‘ML-IDSDN: Machine learning
[31] S. Mohanty and M. Agarwal, ‘‘Recursive feature selection and intrusion based intrusion detection system for software-defined network,’’ Concur-
classification in NSL-KDD dataset using multiple machine learning meth- rency Comput., Pract. Exper., vol. 35, no. 1, p. e7438, Jan. 2023, doi:
ods,’’ in Proc. Int. Conf. Comput., Commun. Learn. Cham, Switzerland: 10.1002/cpe.7438.
Springer, 2023, pp. 3–14, doi: 10.1007/978-3-031-56998-2_1. [46] K. Johnson Singh, D. Maisnam, and U. S. Chanu, ‘‘Intrusion detection sys-
[32] M. Zakariah, S. A. AlQahtani, A. M. Alawwad, and A. A. Alotaibi, tem with SVM and ensemble learning algorithms,’’ Social Netw. Comput.
‘‘Intrusion detection system with customized machine learning tech- Sci., vol. 4, no. 5, p. 517, Jul. 2023, doi: 10.1007/s42979-023-01954-3.
niques for NSL-KDD dataset,’’ Comput., Mater. Continua, vol. 77, no. 3, [47] S. Ahmadi, ‘‘Network intrusion detection in cloud environments: A com-
pp. 4025–4054, 2023, doi: 10.32604/cmc.2023.043752. parative analysis of approaches,’’ Int. J. Adv. Comput. Sci. Appl., vol. 15,
[33] M. Wang, K. Zheng, Y. Yang, and X. Wang, ‘‘An explainable machine no. 3, pp. 1–9, 2024, doi: 10.14569/IJACSA.2024.0150301.
learning framework for intrusion detection systems,’’ IEEE Access, vol. 8,
pp. 73127–73141, 2020, doi: 10.1109/ACCESS.2020.2988359.
[34] S. Neupane, J. Ables, W. Anderson, S. Mittal, S. Rahimi, I. Banicescu,
and M. Seale, ‘‘Explainable intrusion detection systems (X-IDS): A survey
of current methods, challenges, and opportunities,’’ IEEE Access, vol. 10,
pp. 112392–112415, 2022, doi: 10.1109/ACCESS.2022.3216617.
[35] E. K. Boahen, W. Changda, and B.-M. Brunel Elvire, ‘‘Detection of
compromised online social network account with an enhanced Knn,’’ MANISANKAR SANNIGRAHI received the
Appl. Artif. Intell., vol. 34, no. 11, pp. 777–791, Sep. 2020, doi: M.Tech. degree in computer science and infor-
10.1080/08839514.2020.1782002. mation security from the Kalinga Institute of
[36] E. K. Boahen, B. E. Bouya-Moko, F. Qamar, and C. Wang, ‘‘A deep learn- Industrial Technology, Bhubaneswar, Odisha,
ing approach to online social network account compromisation,’’ IEEE India, in 2020. He is currently pursuing the Ph.D.
Trans. Computat. Social Syst., vol. 10, no. 6, pp. 3204–3216, Dec. 2023, degree with the School of Computer Engineer-
doi: 10.1109/tcss.2022.3199080. ing and Information Systems, Vellore Institute of
[37] B. Sharma, L. Sharma, C. Lal, and S. Roy, ‘‘Explainable artificial intel- Technology, Vellore, India. His research interests
ligence for intrusion detection in IoT networks: A deep learning based
include machine learning, network security, and
approach,’’ Expert Syst. Appl., vol. 238, Mar. 2024, Art. no. 121751, doi:
cryptography.
10.1016/j.eswa.2023.121751.
[38] E. K. Boahen, S. A. Frimpong, M. M. Ujakpa, R. N. A. Sosu,
O. Larbi-Siaw, E. Owusu, J. K. Appati, and E. Acheampong, ‘‘A deep
multi-architectural approach for online social network intrusion detection
system,’’ in Proc. IEEE World Conf. Appl. Intell. Comput. (AIC), Jun. 2022,
pp. 919–924, doi: 10.1109/AIC55036.2022.9848865.
[39] H. Attou, A. Guezzaz, S. Benkirane, M. Azrour, and Y. Farhaoui, ‘‘Cloud-
based intrusion detection approach using machine learning techniques,’’ R. THANDEESWARAN received the B.E.,
Big Data Mining Anal., vol. 6, no. 3, pp. 311–320, Sep. 2023, doi: M.Tech., and Ph.D. degrees from Vellore Institute
10.26599/BDMA.2022.9020038. of Technology, Vellore. He is currently a Professor
[40] J. P. Bharadiya, ‘‘A tutorial on principal component analysis for dimen- with the School of Computer Engineering and
sionality reduction in machine learning,’’ Int. J. Innov. Sci. Res. Technol.,
Information Systems, Vellore Institute of Tech-
vol. 8, no. 5, pp. 2028–2032, 2023. [Online]. Available: https://www.
nology. He has 25 years of teaching experience
researchgate.net/profile/Jasmin-Bharadiya-4/publication/371306692
and expertise in computer and communication
[41] A. Verma and V. Ranga, ‘‘On evaluation of network intrusion
detection systems: Statistical analysis of CIDDS-001 dataset using networks, data and information security, network
machine learning techniques,’’ Authorea Preprints, 2023, doi: protocols, traffic analysis, and the IoT security
10.36227/techrxiv.11454276.v1. domains. He has published 27 research articles in
[42] P. Dini, A. Elhanashi, A. Begni, S. Saponara, Q. Zheng, and K. Gasmi, SCI, Scopus, and highly reputed journals, and also published several books
‘‘Overview on intrusion detection systems design exploiting machine and completed a funded project by the Government of India. He is a member
learning for networking cybersecurity,’’ Appl. Sci., vol. 13, no. 13, p. 7507, of CSI and the Soft Computing Research Society.
Jun. 2023, doi: 10.3390/app13137507.

142732 VOLUME 12, 2024

Batch 1_4 CSE C
No ratings yet
Batch 1_4 CSE C
9 pages
Enhancing Intrusion Detection Systems Using Machine Learning Techniques and Rule
No ratings yet
Enhancing Intrusion Detection Systems Using Machine Learning Techniques and Rule
109 pages
Elevating Cybersecurity Using AI and Deep Learning for Intrusion Detection Reinforcement Ppt
No ratings yet
Elevating Cybersecurity Using AI and Deep Learning for Intrusion Detection Reinforcement Ppt
9 pages
Intrusion Detection
No ratings yet
Intrusion Detection
76 pages
MachineLearningAlgorithmsforIntrusionDetectio
No ratings yet
MachineLearningAlgorithmsforIntrusionDetectio
23 pages
Elevating Cybersecurity Using AI and Deep Learning for Intrusion Detection Reinforcement
No ratings yet
Elevating Cybersecurity Using AI and Deep Learning for Intrusion Detection Reinforcement
27 pages
Final Progress
No ratings yet
Final Progress
22 pages
ramaiah2021
No ratings yet
ramaiah2021
17 pages
Applsci 13 07507 v4
No ratings yet
Applsci 13 07507 v4
34 pages
Applying Machine Learning To Cyber Security
No ratings yet
Applying Machine Learning To Cyber Security
117 pages
Project Paper Publication
No ratings yet
Project Paper Publication
10 pages
4768-Article Text-8889-1-10-20210501
No ratings yet
4768-Article Text-8889-1-10-20210501
8 pages
IDS Theisis (1)
No ratings yet
IDS Theisis (1)
30 pages
01-JCCE2202270 Online
No ratings yet
01-JCCE2202270 Online
10 pages
Deep Learning vs Machine Learning for Intrusion De
No ratings yet
Deep Learning vs Machine Learning for Intrusion De
19 pages
Machine Learning Based Network Intrusion Detection For Big and Imbalanced Data Using Oversampling, Stacking Feature Embedding and Feature Extraction
No ratings yet
Machine Learning Based Network Intrusion Detection For Big and Imbalanced Data Using Oversampling, Stacking Feature Embedding and Feature Extraction
44 pages
s42400-021-00103-8
No ratings yet
s42400-021-00103-8
22 pages
Supervised Machine Learning Algorithms For Intrusion Detection
No ratings yet
Supervised Machine Learning Algorithms For Intrusion Detection
14 pages
10 1016@j Jnca 2005 06 003 PDF
No ratings yet
10 1016@j Jnca 2005 06 003 PDF
19 pages
Introductory Chapter Machine Learning in
No ratings yet
Introductory Chapter Machine Learning in
8 pages
10.1007@978-981-13-9710-344
No ratings yet
10.1007@978-981-13-9710-344
13 pages
631eaa91dbcfb7 78471842
No ratings yet
631eaa91dbcfb7 78471842
13 pages
Acharya, Toya Khatri, Ishan Annamalai, Annamalai Chouikha, Mohamed F (2021)
No ratings yet
Acharya, Toya Khatri, Ishan Annamalai, Annamalai Chouikha, Mohamed F (2021)
7 pages
Cyber Intrusion Detection Using Machine Learning Classification Techniques
No ratings yet
Cyber Intrusion Detection Using Machine Learning Classification Techniques
11 pages
[FREE PDF sample] Intrusion Detection A Machine Learning Approach 3rd Edition Zhenwei Yu ebooks
100% (1)
[FREE PDF sample] Intrusion Detection A Machine Learning Approach 3rd Edition Zhenwei Yu ebooks
61 pages
2.2(4)
No ratings yet
2.2(4)
20 pages
Intrusion Detection System Based on Pattern Recognition
No ratings yet
Intrusion Detection System Based on Pattern Recognition
9 pages
Research on Network Intrusion Detection Technology Based on Machine Learning
No ratings yet
Research on Network Intrusion Detection Technology Based on Machine Learning
14 pages
Performance Analysis of Machine Learning
No ratings yet
Performance Analysis of Machine Learning
22 pages
Download full Intrusion Detection A Machine Learning Approach 3rd Edition Zhenwei Yu ebook all chapters
100% (4)
Download full Intrusion Detection A Machine Learning Approach 3rd Edition Zhenwei Yu ebook all chapters
60 pages
s40537-023-00694-8
No ratings yet
s40537-023-00694-8
26 pages
Ids
No ratings yet
Ids
22 pages
s10586-024-04673-3
No ratings yet
s10586-024-04673-3
56 pages
HDLNIDS Hybrid Deep-Learning
No ratings yet
HDLNIDS Hybrid Deep-Learning
17 pages
Batch 13(Pptx)
No ratings yet
Batch 13(Pptx)
27 pages
8499-Article Text-9477-2-10-20231102
No ratings yet
8499-Article Text-9477-2-10-20231102
12 pages
Chapter1 2 3 Guidelines 2025 Format
No ratings yet
Chapter1 2 3 Guidelines 2025 Format
118 pages
Machine Learning Technical Report
No ratings yet
Machine Learning Technical Report
12 pages
Network Intrusion Detection Using Supervised Machine Learnin (3) )
No ratings yet
Network Intrusion Detection Using Supervised Machine Learnin (3) )
24 pages
IEEE_Conference_Template
No ratings yet
IEEE_Conference_Template
4 pages
Machine Learning Algorithms For Intrusion Detection in Cybersecurity
No ratings yet
Machine Learning Algorithms For Intrusion Detection in Cybersecurity
9 pages
Intrusion Detection System Using Machine Learning Techniques A Review
No ratings yet
Intrusion Detection System Using Machine Learning Techniques A Review
8 pages
s00521-023-09309-y
No ratings yet
s00521-023-09309-y
19 pages
Paper 2-Application of Machine Learning Approaches in Intrusion Detection System
No ratings yet
Paper 2-Application of Machine Learning Approaches in Intrusion Detection System
10 pages
Smart Intrusion Detection System Compris
No ratings yet
Smart Intrusion Detection System Compris
6 pages
Signature-based Intrusion Detection Using Machine Learning and Deep Learning Approaches Empowered With Fuzzy Clustering
No ratings yet
Signature-based Intrusion Detection Using Machine Learning and Deep Learning Approaches Empowered With Fuzzy Clustering
33 pages
Intrusion Detection System for Proactive Cyber Threat Detection
No ratings yet
Intrusion Detection System for Proactive Cyber Threat Detection
15 pages
2.1(1)
No ratings yet
2.1(1)
8 pages
Machine Learning Techniques For Intrusion Detection: December 2013
No ratings yet
Machine Learning Techniques For Intrusion Detection: December 2013
11 pages
Corrected Intrusion Dection 1-3
No ratings yet
Corrected Intrusion Dection 1-3
51 pages
Capstone_Complete_report
No ratings yet
Capstone_Complete_report
49 pages
Effectiveness of Machine Learning Based Intrusion Detection Systems
No ratings yet
Effectiveness of Machine Learning Based Intrusion Detection Systems
12 pages
Network Intrussion Etection System
No ratings yet
Network Intrussion Etection System
31 pages
SSRN 4920457
No ratings yet
SSRN 4920457
14 pages
Machine Learning For Intrusion Detection in Cyber Security: Applications, Challenges, and Recommendations
No ratings yet
Machine Learning For Intrusion Detection in Cyber Security: Applications, Challenges, and Recommendations
24 pages
IntruDTree - A Machine Learning Based Cyber
No ratings yet
IntruDTree - A Machine Learning Based Cyber
16 pages
Instant Download Intrusion Detection A Machine Learning Approach 3rd Edition Zhenwei Yu PDF All Chapter
100% (3)
Instant Download Intrusion Detection A Machine Learning Approach 3rd Edition Zhenwei Yu PDF All Chapter
84 pages
Comparison of Single and Ensemble Intrusion Detection Techniques Using Multiple Datasets
No ratings yet
Comparison of Single and Ensemble Intrusion Detection Techniques Using Multiple Datasets
10 pages
Ids Ae 2
No ratings yet
Ids Ae 2
9 pages
Honeypot Systems and Techniques: Definitive Reference for Developers and Engineers
From Everand
Honeypot Systems and Techniques: Definitive Reference for Developers and Engineers
Richard Johnson
No ratings yet
outliers in machine learning - Google Search
No ratings yet
outliers in machine learning - Google Search
2 pages
how to install matplotlib in python - Google Search
No ratings yet
how to install matplotlib in python - Google Search
7 pages
DAA ICT QB
No ratings yet
DAA ICT QB
1 page
CS3352 FDS QB
No ratings yet
CS3352 FDS QB
6 pages
boxblot in fods
No ratings yet
boxblot in fods
5 pages
Listening 4.4 Vv
No ratings yet
Listening 4.4 Vv
3 pages
Group 2 Functions of Campus Paper
No ratings yet
Group 2 Functions of Campus Paper
2 pages
Introduction
No ratings yet
Introduction
13 pages
2024 LEARNER GUIDE Reporting Format (1) (1)
No ratings yet
2024 LEARNER GUIDE Reporting Format (1) (1)
6 pages
PHYSICS-IB Course Definitions 2024-25 .Docx
No ratings yet
PHYSICS-IB Course Definitions 2024-25 .Docx
4 pages
MOR Report Qualitative
No ratings yet
MOR Report Qualitative
38 pages
Review Seven Jump
No ratings yet
Review Seven Jump
21 pages
ANT1215 Indigenous Values Course Notes
No ratings yet
ANT1215 Indigenous Values Course Notes
2 pages
Introduction To Inverse Kinematics With Jacobian Transpose, Pseudo Inverse and Damped Least Squares Methods
No ratings yet
Introduction To Inverse Kinematics With Jacobian Transpose, Pseudo Inverse and Damped Least Squares Methods
19 pages
C B Macpherson And The Problem Of Liberal Democracy Jules Townshend pdf download
No ratings yet
C B Macpherson And The Problem Of Liberal Democracy Jules Townshend pdf download
81 pages
OJT-MOA - Sample
No ratings yet
OJT-MOA - Sample
6 pages
Expository Writing and Its Types
No ratings yet
Expository Writing and Its Types
2 pages
Shadow Work Journal Prompts
No ratings yet
Shadow Work Journal Prompts
4 pages
ST John Passion
No ratings yet
ST John Passion
38 pages
Avoiding Predatory Journals
No ratings yet
Avoiding Predatory Journals
20 pages
Course Name: Signal & System (Course Code), EE-320 Semester, Batch 2018
No ratings yet
Course Name: Signal & System (Course Code), EE-320 Semester, Batch 2018
7 pages
530 Coursework Assignment 3 2015
No ratings yet
530 Coursework Assignment 3 2015
1 page
2 - Phillips Academy PDF
100% (1)
2 - Phillips Academy PDF
4 pages
5.+Evi+dan+Aziz KIA
No ratings yet
5.+Evi+dan+Aziz KIA
18 pages
Week 6
No ratings yet
Week 6
7 pages
EdRev-Pro Ed
No ratings yet
EdRev-Pro Ed
80 pages
Unit 2 - Topic 2
No ratings yet
Unit 2 - Topic 2
7 pages
Grade 9 Verbal and Non Verbal Communication
No ratings yet
Grade 9 Verbal and Non Verbal Communication
8 pages
11.11.24 FN Boys
No ratings yet
11.11.24 FN Boys
7 pages
Pre Diploma Level Scholarship Result Published From Ctevt Sudurpashchim Pradesh
No ratings yet
Pre Diploma Level Scholarship Result Published From Ctevt Sudurpashchim Pradesh
15 pages
Bed Semester 4
100% (1)
Bed Semester 4
34 pages
Notice Writing
No ratings yet
Notice Writing
2 pages
Information - FNP Student Skills Check Off 0
No ratings yet
Information - FNP Student Skills Check Off 0
9 pages
Brief Game Concept Creation
No ratings yet
Brief Game Concept Creation
2 pages
COA Unit II
No ratings yet
COA Unit II
54 pages

Predictive Analysis of Network-Based Attacks by Hybrid Machine Learning Algorithms Utilizing Bayesian Optimization Logistic Regression and Random Forest Algorithm (1)

Uploaded by

Predictive Analysis of Network-Based Attacks by Hybrid Machine Learning Algorithms Utilizing Bayesian Optimization Logistic Regression and Random Forest Algorithm (1)

Uploaded by

Received 25 August 2024, accepted 17 September 2024, date of publication 20 September 2024, date of current version 9 October 2024.

Digital Object Identifier 10.1109/ACCESS.2024.3464866

Predictive Analysis of Network-Based Attacks by

I. INTRODUCTION the largest barrier to network communication these days

142722 VOLUME 12, 2024

Bayes theorem [18]. In equation (1), a relationship between

VOLUME 12, 2024 142723

142724 VOLUME 12, 2024

(14), and (15), µ x = mean value of

VOLUME 12, 2024 142725

142726 VOLUME 12, 2024

FIGURE 4. Framework of the Proposed algorithm for binary classification.

1: Initialize the dataset X = [X1 , X2 . . . . . . .Xn ],

4: Set i = 0 proposed algorithm as shown in algorithm (1). Traditional

VOLUME 12, 2024 142727

Algorithm 2 The Proposed Grid Search-Based Hybrid Logis-

Output: Classification report for each target variable.

1: Initialize the dataset X = [X1 , X2 . . . . . . .Xn ],

142728 VOLUME 12, 2024

FIGURE 7. Framework of the proposed algorithm for multiclass classification.

VOLUME 12, 2024 142729

TABLE 5. Classification of KDD99 dataset. TABLE 6. Classification of NSLKDD dataset.

FIGURE 8. Comparison over the KDD99 dataset.

FIGURE 11. Comparison of accuracy of the models.

FIGURE 9. Comparison of accuracy of the models.

142730 VOLUME 12, 2024

VOLUME 12, 2024 142731

142732 VOLUME 12, 2024

You might also like