
2023 10th International Conference on Future Internet of Things and Cloud (FiCloud)

Hybrid Feature Selection (RFEMI) Techniques and Intrusion Detection Systems for Web Attacks Detection Using Supervised Machine Learning Algorithms

Ibrahim Abobaker, IEEE Student Member
Department of Computer Science
University of Bradford
Bradford, UK
[email protected]

Abstract— In the realm of business, it is crucial to establish robust security mechanisms for identifying web attacks. Advanced Intrusion Detection Systems (IDSs) can effectively fortify network security by operating at the network's perimeter. However, the presence of a vast number of features within the system presents a significant challenge in accurately detecting web attackers. To address this issue, a technique known as Recursive Feature Elimination and Mutual Information (RFEMI) is proposed. This technique aims to select the most essential features while eliminating duplicates, thereby simplifying attack detection and reducing computational time. The study conducted experiments to assess the effectiveness of the proposed feature selection technique in detecting web attacks using various Machine Learning Algorithms (MLAs), namely the Decision Tree Classifier (DTC), XGB Classifier (XGB), Gradient Boosting Classifier (GBC) and K-Nearest Neighbor (KNN), for Network-based Intrusion Detection Systems (NIDS). The results demonstrate that the proposed system can accurately identify web attacks using the CICIDS-2017 dataset. Among the classifiers, the DTC classifier exhibited the highest accuracy at 0.9961, with a False Positive Rate (FPR) of 0.054.

Keywords— computer networks, intrusion detection systems, machine learning algorithms, high dimensionality, web attack detection.

I. INTRODUCTION

Security mechanisms to detect web attacks are an essential part of the design of network devices; new technologies such as IoT and mobile devices pose new threats and challenges to web security. This is because web applications are widely used in our daily lives and across communication networks for client-server communication. Despite the efforts of researchers and developers who have implemented security measures like firewalls, data encryption, and user authentication, web attacks still pose a significant challenge to web applications. A study done by Symantec in 2019 found that, every month, around 4,818 websites were hacked with formjacking code, and cybercriminals could make up to $2.2 million per month by stealing credit card information from those compromised websites [3].

Furthermore, almost one million new threats are distributed into the network daily, as stated by Symantec [5], and many of them target the web application. At the same time, the WannaCry attack on the UK National Health Service in 2017 is an excellent example of a significant cyber-attack. The attack used a Windows Server Message Block protocol vulnerability to install backdoor tools and execute a ransomware package [6]. Another attack, on Tesco Bank in the UK, was carried out by an online fraudster on November 7, 2016, resulting in cash being taken from approximately 40,000 accounts and causing a loss of about £600 per account [7].

A web application uses a web browser and its technology to complete precise tasks across the internet. Web applications use a combination of client-side scripts and servers to distribute data to legitimate users through a web browser. The client-side scripts are used to present specific data to users, while the server-side scripts help handle, store, and retrieve the data. This model allows for the efficient and effective sharing of data with people and businesses [2]. The web traffic is exchanged through requests between servers and clients over the HTTPS and HTTP protocols. Web application security is essential, as a web application stores large amounts of user data and provides the means to access many assets such as online banking, online purchasing, and other services provided by financial organizations. Such expansive applications need web systems to handle and store an enormous amount of data, including personally sensitive data such as usernames, passwords, dates of birth, and credit card information, thereby eliciting varying degrees of cyber security threats. For example, the service cannot be trusted if it fails to defend sensitive data such as bank details. Consequently, web applications are vulnerable to a wide range of security attacks that can compromise Confidentiality, Integrity, and Availability, collectively known as the CIA. These attacks can occur at different levels of the web application's architecture, posing a significant challenge to its security.

Even though web applications have similarities with traditional IT systems, they have unique characteristics that make them more susceptible to attacks. This vulnerability requires special attention to secure web applications from various security threats [8].

For instance, remote code execution attacks target file systems, whereas SQL injection targets databases and, therefore, causes immeasurable destruction and loss to governments and companies [9]. In addition, data encryption has added an extra layer of sophistication to the analysis task. It has therefore placed an enormous burden on security analysts to examine traffic and classify threats in order to identify suitable anomalies; the IDS is often designed to detect this kind of attack [10].

Much work has been done on the impact of web attacks on web applications; however, the majority of the literature focuses on increasing the detection rate, and most of the work has been published with outdated datasets such as KDD99CUP, NSL-KDD and the Advanced Research Projects Agency (DARPA) dataset. Nevertheless, the Open Web Application Security Project (OWASP) has identified the top 10 security risks for web applications, including injection flaws, Broken Access Control, and XSS attacks, which have been major threats to web applications and servers for many years. This work focuses on understanding the effects of web attacks on HTTP traffic, such as SQL injection and Cross-Site Scripting (XSS). It proposes a system for IDS using supervised ML and evaluates the proposed approach with reliable datasets that contain these kinds of attacks, such as CICIDS-2017 and UNSW-NB15 [11]. The purpose is to inspire comprehensive studies that prioritize finding solutions to detect and prevent web attacks in HTTP traffic.

An unknown attack, also known as a zero-day attack, is a type of computer security weakness that has not been publicly disclosed or fixed by the responsible vendor. Practitioners in this field commonly use the term to refer to unpublished vulnerabilities that are actively exploited in the wild. This creates fear because IT professionals cannot protect against something they do not know exists [12]. To this end, the proposed system includes a method for selecting the most important features to build a lightweight intrusion-detection model that can accurately detect web and zero-day attacks. This approach was evaluated using the CICIDS-2017 dataset and has the potential to reduce the problems caused by new attacks.

A. MOTIVATION AND CONTRIBUTION
The research is driven by the increasing security threats faced by web applications and web servers due to their extensive and widespread use. The focus is primarily on web attacks and their impact on web application vulnerabilities. Anomaly detection (AD) techniques serve as the key focus area of this study, aiming to develop an innovative approach that emphasizes early detection and prevention of web attacks.

Within this context, the main tasks of the research paper are as follows:
• Addressing feature selection uncertainty and reducing the number of features: this work utilizes a Hybrid Feature Selection (FS) method called Recursive Feature Elimination and Mutual Information (RFEML). This technique selects ten features from each method, eliminates duplicates, and reduces the feature count from 78 to 13 in the CICIDS-2017 dataset. This reduction helps reduce training and testing time.
• Proposing an Intrusion Detection System (IDS) that effectively identifies and prevents both known and novel attacks: the system aids in the classification of network traffic, facilitating the identification of various types of attacks.
• Designing, implementing, and testing an Anomaly-Based Intrusion Detection System (ABIDS) approach: this system utilizes the identified anomalies and employs four different classifiers to detect web attacks, including zero-day attacks. Evaluation is conducted using the CICIDS-2017 dataset.
• Revealing that specific features, namely Destination Port, Packet Length Mean, and Average Packet Size, exhibit significance and commonality across both methods when analyzing the CICIDS-2017 dataset. This insight highlights the importance of these features in detecting web attacks, indicating their relevance and potential as key indicators of malicious activity. Understanding the shared significance of these features across multiple methods provides valuable knowledge for future research and the development of more robust intrusion detection systems.

The performance of the classifiers in detecting intrusions is assessed to evaluate the tasks of the proposed system. Various metrics such as False Negative Rate, True Positive Rate, False Positive Rate, Recall, and accuracy of the different MLAs are measured and analyzed.

The rest of this research work is structured as follows. Section II discusses the background of detecting web attacks through the use of machine learning. Section III presents the proposed method and the process of feature selection. Numerical experiments and results are presented in Section IV, followed by the conclusion of the findings and suggestions for future work in Section V.

II. LITERATURE REVIEW
This section reviews literature relevant to web attacks, IDS methods for the proposed system, and related studies.

B. Web Attacks
While the web attack threat has increased sharply as a result of the use of web applications in our everyday lives, such as online banking, purchasing, and selling, so too have the methods to detect and deal with it. Amongst the studies on the threat of web attacks, several ways have been suggested to prevent the conduct of potential web attacks.

For instance, SQL injection is a commonly utilised attack that exploits a command injection weakness; it is used to inject false data into an SQL query over the web page for malicious purposes in order to alter, delete, or insert information [13]. The banking sector is a popular target for hackers, given the essential data, including full names, current home addresses, and dates of birth, stored in the banking server's database. This data can be gathered from the server through social engineering and identity fraud [12].

On the other hand, Cross-Site Scripting (XSS) is another kind of web attack. Improper coding of the system gives the hacker an excellent opportunity to exploit known weaknesses. This attack takes advantage of the victim's exposure when visiting a vulnerable website, and the weak website works as a medium through which the hackers deliver malicious code to the target's browser [13].

The exposure of weak web applications and the essential data they hold has highlighted the necessity of exploring network security, as it has caused a sharp increase in the number of attacks targeting web-based systems [14].

Among the works done to deal with the web attack issue, several techniques have been suggested to identify web attacks, mainly using IDS. For example, reference [7] applied end-to-end deep learning to identify attacks autonomously in real time. They explore the potential of end-to-end deep learning in IDS and use deep learning in the whole process, starting with feature engineering and ending with prediction. Their approach works without users manually selecting features or constructing large labelled training sets. The authors evaluate unsupervised and semi-supervised learning methods to identify web attacks. Their methodology is based on the Robust Software Modelling Tool (RSMT), and their results show that RSMT can effectively identify web attacks.

The reference work [15] proposed a detection method using a new ensemble deep learning approach to detect web attacks; they first built three models to identify the attacks and then utilised an ensemble classifier to make the final determination from the results achieved by the three models. They performed experiments with real-world datasets running in a distributed environment and with CSIC 2010 as a benchmark to evaluate the proposed Web attack Detection System (WADS). The investigation shows the suggested method can identify web attacks correctly, with a performance of 99.47% accuracy and 99.29% and 99.70% precision. Although the work achieved good accuracy, it seemed to overlook the performance cost of the ensemble methods, which consumed more time classifying the data.

As recommended by [9], a single classifier can detect the anomaly with a reduced attack sample in the training dataset; the authors evaluated different classification algorithms, including the Random Forest (RF) classifier, Logistic Regression (LR) and Naïve Bayes (NB) algorithms. The results showed that NB skilfully identified attacks with few samples in training, like R2L and U2R attacks, whereas RF and J48 recognized attacks like DoS and Probe, with J48 showing slightly lower results than RF. They initially evaluated the result using the complete NSL-KDD dataset and then using 20% of it in a second stage, comparing the results using Precision, Recall, and F-Score. Although the result shows a high precision score of 98.28% using 20% of the data, it may be inaccurate, as there is no evidence this model will work with other data or real-time traffic, and it was not tested on novel attacks.

On the other hand, the researchers in [16] introduced an architectural scheme to develop a threat intelligence strategy to detect web attacks using a four-step approach, as follows: 1. gathering web attack data by crawling websites; 2. extracting essential features using Association Rule Mining; 3. utilizing the obtained features to simulate web attack data; and 4. offering a new Outlier Gaussian Mixture (OGM) method to detect known and novel attacks using an AD method. They propose a method to capture network traffic data automatically and use the UNSW-NB15 dataset to compare results. Their result achieves a detection rate of 97.28 using the OGM method on the web attack dataset and 95.56 using the UNSW-NB15 dataset.

As previously mentioned, a majority of the studies and related research have failed to consider computer system utilization or the influence of big data on the detection of web attacks or zero-day attacks. Utilizing Machine Learning (ML) algorithms could prove advantageous, not only for traffic classification but also for investigating the underlying causes of features to determine their normality or anomaly. Initially, intrusion detection systems (IDS) operated on the principle of "deep packet inspection", involving the examination of packet contents. The limited studies discussing web and zero-day attacks have primarily focused on achieving high accuracy and assessing the vulnerability to such attacks. Consequently, there exists a significant gap in the existing literature regarding solutions for detecting and preventing web attacks [12]. This article aims to bridge this knowledge gap, which is crucial for the development of an efficient system, since most existing works concentrate on analyzing the content of payloads to identify web attacks.

C. Anomaly Detection
Anomaly detection (AD) is a technique used to identify patterns in the data that do not conform to the defined features of standard patterns in the data. An anomaly is defined as "an observation that deviates so much from other observations as to arouse suspicions that a different mechanism generated it" [14]. Driven by the various unusual activities arising from web attacks, credit card fraud, and numerous other kinds of attacks, AD techniques essentially point out unusual events and can thus trigger key actions in vast application areas [15]. The concept of anomaly detection was introduced by Denning in [16], and since then, many researchers worldwide have presented a lot of work on it.

In other words, AD is a data analysis technique that identifies data patterns that do not conform to the standard patterns [17, 18]. This technique is a wide and dominant category of IDS; it helps to detect unusual events, such as web attacks or credit card fraud. AD works with a "norm profile", a baseline or standard behavior pattern created for an entity, which is used to compare and identify any observed behaviors that deviate from the norm profile [19]. It looks for patterns in the data that do not match the usual or normal patterns as a way to identify unusual or suspicious events, comparing the observed behaviors to the normal behaviour profile and identifying data entries that do not fit with the rest of the dataset. Figure 1 shows some of the ML techniques used in AD.

Figure 1: Anomaly-based IDS using Machine Learning
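As a minimal illustration of the norm-profile idea only (not the system proposed in this paper), a baseline can be fitted on benign traffic and a flow flagged when it deviates strongly from that baseline. The feature dimensionality and the z-score threshold below are assumptions made purely for the sketch.

```python
# Minimal sketch of a "norm profile": fit a baseline on benign flows only and
# flag observations that deviate strongly from it. Sizes and the threshold are
# illustrative assumptions, not values from the paper.
import numpy as np

def fit_norm_profile(benign: np.ndarray):
    """Store per-feature mean and standard deviation of benign traffic."""
    mean = benign.mean(axis=0)
    std = benign.std(axis=0) + 1e-9          # avoid division by zero
    return mean, std

def is_anomalous(x: np.ndarray, profile, threshold: float = 3.0) -> np.ndarray:
    """Flag a flow as anomalous if any feature lies more than `threshold`
    standard deviations away from the benign baseline."""
    mean, std = profile
    z = np.abs((x - mean) / std)
    return (z > threshold).any(axis=1)

# Example: 1,000 benign flows with 13 features, then score a small batch.
rng = np.random.default_rng(0)
benign_flows = rng.normal(size=(1000, 13))
profile = fit_norm_profile(benign_flows)
batch = rng.normal(size=(5, 13))
print(is_anomalous(batch, profile))
```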

D. Supervised ML and Classification Techniques
Classification is a technique used to differentiate unusual patterns; a fully labelled training dataset and a test dataset are needed. The classifier is trained first and then tested with the test dataset, making it a good choice for detecting unseen attack patterns. This approach is effective and has a high detection rate for known attacks [20]. The methods used for IDS can be updated with new information and strategies, making them adaptable. They can also identify unusual data patterns and are well suited to detecting new or "zero-day" attacks. Most IDS models use a single classifier, such as Support Vector Machines, Genetic Algorithms, Logistic Regression, KNN, or Random Forest. This research uses four machine learning algorithms, which are discussed in detail in the following subsections.

1) Decision Tree Classifier
Decision Tree learning is a commonly used method for categorizing data based on different attributes. Decision trees are useful for processing large amounts of data and are often used in data mining applications. They do not require any prior knowledge or specific settings and are suitable for exploratory knowledge discovery. Decision trees are represented in a tree-like structure, which makes the acquired knowledge easy to understand [21].

2) XGB Classifier
XGBoost is a popular boosting algorithm that aims to achieve high efficiency, flexibility, and portability. The algorithm generates decision trees sequentially and assigns weights to all independent variables. The model then combines the various classifiers/predictors to form a more powerful and precise model. XGBoost can solve problems including regression, classification, ranking, and user-defined prediction. It includes a sparsity-aware split discovery algorithm to handle different forms of sparsity patterns in the data, and a distributed weighted quantile sketch approach to determine the optimal split points across weighted datasets [22].

3) Gradient Boosting Classifier
The Gradient Boosting Classifier (GBC) is a machine learning algorithm used for classification and regression models. It builds a gradual sequence of weak prediction models, such as regression decision trees, to optimize the learning process. These models are combined in an ensemble to improve their accuracy, with each new member correcting the errors of the previous ones. The nodes and leaves in the model make predictions based on decision nodes, and the accuracy of the model improves as more weak models are added to the ensemble [23].

4) K-Nearest Neighbour
The k-Nearest Neighbor algorithm is a machine learning tool that predicts class labels for different instances by measuring the shortest Euclidean distance to other instances. This is done by considering all the features or attributes as dimensions in the calculation of the Euclidean distances [5].
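As a sketch only, the four classifiers above can be instantiated with scikit-learn and the xgboost package as shown below. The hyper-parameter values are illustrative defaults; the settings actually used in the paper are not stated.

```python
# Sketch: instantiating the four classifiers discussed above. Hyper-parameters
# are illustrative defaults, not the configuration reported in the paper.
from sklearn.tree import DecisionTreeClassifier
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.neighbors import KNeighborsClassifier
from xgboost import XGBClassifier

classifiers = {
    "DTC": DecisionTreeClassifier(random_state=42),
    "XGB": XGBClassifier(n_estimators=100, eval_metric="logloss"),
    "GBC": GradientBoostingClassifier(n_estimators=100, random_state=42),
    "KNN": KNeighborsClassifier(n_neighbors=5),  # Euclidean distance by default
}
```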
By leveraging these methods, their individual strengths in pattern recognition, classification, and anomaly detection can be combined, which enhances the system's ability to accurately detect web attacks. The evaluation of the proposed system on the CICIDS-2017 dataset demonstrates promising results, indicating improved classification accuracy and reduced model building time.

E. The CICIDS-2017 dataset
Many published works on AD and feature selection use DARPA'98 and the KDD'99 Cup dataset. However, well-known critiques advise against their use, because they are out of date and do not represent actual network traffic [24]. In this work, the CICIDS-2017 dataset, a more recent dataset proposed by the Canadian Institute for Cybersecurity in 2017 for cybersecurity research, is used; it includes normal traffic as well as attacks that were new at the time the data was collected (Catillo et al., 2021). The data is available as labelled bidirectional flows (CSV) and in packet format (pcap).

The dataset contains massive traffic and a large number of features for anomaly detection. It includes a recent and challenging distribution of attacks, including Brute Force, Infiltration, Botnet, DDoS, DoS, Web, and PortScan (Cybersecurity, 2017). The data capture period is five days. Monday is the "normal day" and contains only benign traffic, whereas on Thursday web attacks occurred: Brute Force (9:20-10:00 a.m.), XSS (10:15-10:35 a.m.), and SQL Injection (10:40-10:42 a.m.). The attacker was a Kali Linux node, and the victim was an Ubuntu web server [25].
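A minimal sketch of how one of the Thursday CICIDS-2017 CSV files could be loaded and cleaned in the way Stage 1 of the methodology later describes (duplicate, missing and infinite values). The file name and the "Label" column follow the public CSV release and are assumptions that may need adjusting to the local copy.

```python
# Sketch: load one CICIDS-2017 CSV file and apply Stage-1-style clean-up.
# File name and 'Label' column are assumptions based on the public release.
import numpy as np
import pandas as pd

df = pd.read_csv("Thursday-WorkingHours-Morning-WebAttacks.pcap_ISCX.csv")
df.columns = df.columns.str.strip()                  # some column names carry leading spaces

df = df.drop_duplicates()                            # remove duplicate flows
df = df.replace([np.inf, -np.inf], np.nan).dropna()  # drop infinite / missing values

# Binary target: benign traffic vs. any web attack label
y = (df["Label"] != "BENIGN").astype(int)
X = df.drop(columns=["Label"])
print(X.shape, y.value_counts().to_dict())
```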

Feature Selection and data pre-processing
In machine learning, the process of isolating predictive features from unwanted ones is known as Feature Selection (FS). It is widely agreed that information theory has become a powerful concept for undertaking this strategy; this is justified by the fact that correlation can yield predictive power, namely mutual information. Within the available literature there are numerous FS techniques, and a considerable number of researchers have used approaches employing filtering and wrapping methods. The difficulty of keeping the data's dimensionality manageable and targeting the most suitable features on which to drive the learning process grows with the ongoing increase in dataset size, both in terms of the number of features and the number of samples. It nevertheless remains necessary to enhance prediction accuracy, to contain the otherwise unstoppable growth of training complexity, and to understand the model more deeply.

The two best-known categories of dimensionality reduction either transform the feature space through feature extraction, which maps the original features into new ones, or follow the different strategy of FS by choosing a subset of the features. The second category branches into three methods: embedding, filtering, and wrapping approaches. Notably, the filtering strategy is more advantageous since it does not depend on the classifier, deals more dynamically with overfitting risks, and is more responsive to a structured method. More interestingly, many researchers have successfully applied information-theoretic techniques and concepts, for example [26, 27].

As datasets become larger, it becomes more difficult to manage data dimensionality and select suitable features for the learning process; this has directed our focus onto classifiers and, in particular, a filtering method that can potentially classify these samples. Since the classes have a fixed entropy, the main target is formulated in relation to the conditional entropy of the classification given the set of features. The goal is then to select the minimal set of features that reduces the classification uncertainty to the required level. For this reason, the Random Forest classifier (RFC) has been used to find the most critical eight features, as shown in Fig. 2, and the proposed FS method is used to confirm the results. This problem has attracted the attention of many researchers, for example [28], [29].

Figure 2: Feature importance according to the Random Forest classifier
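A sketch of how a Random Forest ranking of the kind shown in Figure 2 can be produced; X and y are assumed to be the cleaned feature matrix and binary label from the pre-processing sketch above, and the number of trees is an arbitrary choice.

```python
# Sketch: rank features with a Random Forest and list the eight most critical
# ones. X and y are assumed to come from the pre-processing step above.
from sklearn.ensemble import RandomForestClassifier

rf = RandomForestClassifier(n_estimators=100, n_jobs=-1, random_state=42)
rf.fit(X, y)

importances = sorted(zip(X.columns, rf.feature_importances_),
                     key=lambda item: item[1], reverse=True)
for name, score in importances[:8]:        # eight most critical features
    print(f"{name:30s} {score:.4f}")
```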

III. METHODOLOGY, APPROACH, AND PROPOSED MODELS

The methodology, approach, and proposed models for building a hybrid feature selection method to detect web attacks are illustrated in Figure 3 and Figure 4. The process consists of three stages, outlined as follows.

Stage 1 involves a pre-processing step to handle duplicate, missing, and infinite values. This step aims to identify the most relevant features that contribute significantly to the classification task.

In Stage 2, a combination of Recursive Feature Elimination (RFE) and Mutual Information (MI) techniques is applied to generate the final optimal set of features. This selection process helps reduce the training and testing time while maintaining the accuracy of the detection rates. For example, on the CICIDS-2017 dataset, the number of features is reduced from 78 to 20, and further removal of duplicate features results in a final set of 13 features. Figures 2 and 3 reveal that certain features, namely Destination Port, Packet Length Mean, and Average Packet Size, are both common and significant across the two methods; the Random Forest Classifier and the proposed system selected these features independently.

Stage 3 employs the four Machine Learning Algorithms (MLAs) to classify the final optimized subset C, generated by the RFEML method. This step focuses on improving the overall performance of the computer network, in addition to enhancing the accuracy of attack detection. Previous studies in this field have primarily concentrated on enhancing the detection accuracy without adequate consideration of the network's overall performance.

Overall, this hybrid approach combines the strengths of both the RFE and MI methods, leading to more accurate results than using either method alone. It is particularly effective when dealing with large datasets containing numerous potential variables or factors that influence outcomes such as classification accuracy.
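The following sketch shows how the Stage 2 selection described above could be realised with scikit-learn: RFE and mutual information each nominate ten features, and the union with duplicates removed forms the final subset. The Decision Tree estimator inside RFE and the variables X and y (from the pre-processing sketch) are assumptions, not details stated in the paper.

```python
# Sketch of the hybrid RFE + MI selection: each method nominates ten features
# and the union (duplicates removed) forms the final subset.
from sklearn.feature_selection import RFE, SelectKBest, mutual_info_classif
from sklearn.tree import DecisionTreeClassifier

# Wrapper part: recursive feature elimination down to ten features
rfe = RFE(estimator=DecisionTreeClassifier(random_state=42),
          n_features_to_select=10)
rfe.fit(X, y)
rfe_features = set(X.columns[rfe.support_])

# Filter part: ten features with the highest mutual information with the label
mi = SelectKBest(score_func=mutual_info_classif, k=10)
mi.fit(X, y)
mi_features = set(X.columns[mi.get_support()])

# Union of both lists, duplicates removed
selected = sorted(rfe_features | mi_features)
print(len(selected), selected)
```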

Figure 3 depicts the first part of the proposed system, the described (RFEML) feature selection method, while Table 1 lists the features selected by each of the two methods separately.

Figure 3: First stage of the proposed method (RFEML) for FS

Table 1: Features selected by (RFEML)

Selected by Recursive Feature Elimination | Selected by Mutual Information
Destination Port | Destination Port
Total Length of Bwd Packets | Total Length of Bwd Packets
Bwd Packet Length Mean | Bwd Packet Length Mean
Bwd Packet Length Std | Packet Length Mean
Packet Length Mean | Packet Length Std
Average Packet Size | Packet Length Variance
Avg Bwd Segment Size | Average Packet Size
Fwd Header Length.1 | Avg Bwd Segment Size
Subflow Fwd Bytes | Subflow Bwd Bytes
Init_Win_bytes_forward | Destination Port

The second part of the framework (Fig. 4) applies the four MLAs to classify the dataset, first using the whole feature set and then using the thirteen features produced by the (RFEML) system. The accuracy, precision, FNR, TPR, FPR, TNR, and the percentages of incorrectly classified and correctly identified web attacks are then compared and analyzed for each classifier algorithm.

Figure 4: The second part of the proposed method (RFEML) for FS
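A sketch of this two-stage evaluation is given below: each classifier is trained once on the full feature set and once on the reduced subset, and the accuracies are compared. The 70/30 split is an assumption; `classifiers` and `selected` refer to the earlier sketches and are not names from the paper.

```python
# Sketch of the two-stage evaluation: train each classifier on the whole
# feature set and on the reduced subset, then compare accuracies.
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score

for label, features in [("whole feature set", list(X.columns)),
                        ("reduced subset", selected)]:
    X_tr, X_te, y_tr, y_te = train_test_split(
        X[features], y, test_size=0.3, stratify=y, random_state=42)
    for name, clf in classifiers.items():
        clf.fit(X_tr, y_tr)
        acc = accuracy_score(y_te, clf.predict(X_te))
        print(f"{label:18s} {name}: accuracy = {acc:.4f}")
```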
IV. NUMERICAL EXPERIMENTS AND RESULTS

The confusion matrix is commonly utilized in predictive analytics to visualize the performance of the experiments using the proposed method and the whole dataset, and the following rates and metrics are considered.

The False Negative Rate (FNR) refers to situations where an attack has occurred but the system fails to identify it as such, giving a wrong prediction [5]:

FNR = FN / (FN + TP)    (1)

The True Positive Rate (TPR) is a measure of how accurately a system can identify attacks. It measures the proportion of correctly identified attack instances out of all actual attack instances:

TPR = TP / (TP + FN)    (2)

The False Positive Rate (FPR) refers to the situation where the system detects an attack when there is actually no attack; in other words, the system identifies a data point as an attack, but it is not one:

FPR = FP / (FP + TN)    (3)

The True Negative Rate (TNR) refers to the proportion of cases where the model correctly predicts a negative outcome when the actual outcome is also negative. In other words, TNR considers the cases where both the prediction and the actual outcome are negative and the model makes a correct prediction:

TNR = TN / (FP + TN)    (4)

The performance of the anomaly-based IDS is further evaluated using the following set of metrics.

Precision measures the accuracy of identifying positive cases by calculating the proportion of correctly identified positive cases among all predicted positives:

Precision = TP / (TP + FP)    (5)

Recall is a measure of how well the model correctly identifies all positive cases, and it is defined as follows:

Recall = TP / (TP + FN)    (6)

Accuracy is a metric used to measure how close the predicted values are to the actual values; it can be calculated by

Accuracy = (TP + TN) / (TP + FP + TN + FN)    (7)
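Equations (1)-(7) follow directly from the binary confusion matrix, as the short sketch below illustrates; y_te and the fitted "DTC" classifier are assumed to come from the evaluation sketch above.

```python
# Sketch: compute Equations (1)-(7) from the binary confusion matrix.
from sklearn.metrics import confusion_matrix

y_pred = classifiers["DTC"].predict(X_te)
tn, fp, fn, tp = confusion_matrix(y_te, y_pred).ravel()

fnr = fn / (fn + tp)                        # Eq. (1)
tpr = tp / (tp + fn)                        # Eq. (2)
fpr = fp / (fp + tn)                        # Eq. (3)
tnr = tn / (fp + tn)                        # Eq. (4)
precision = tp / (tp + fp)                  # Eq. (5)
recall = tp / (tp + fn)                     # Eq. (6)
accuracy = (tp + tn) / (tp + tn + fp + fn)  # Eq. (7)

print(fnr, tpr, fpr, tnr, precision, recall, accuracy)
```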

F. Results Evaluation
Table 2 presents the results obtained using the whole feature set, while Table 3 displays the results obtained using the proposed method. Both tables compare the performance of the different classifiers based on the metrics False Negative Rate (FNR), True Positive Rate (TPR), False Positive Rate (FPR), True Negative Rate (TNR), Accuracy (AC), Precision (PC), and Recall (RC).

Table 2: Results using the whole feature set

MLA | FNR | FPR | TPR | TNR | AC | PC | RC
DTC | 0.00196 | 0.1203 | 0.998 | 0.8796 | 0.9961 | 0.88 | 0.88
XGB | 0.00729 | 0.00438 | 0.9956 | 0.9835 | 0.9924 | 0.994 | 0.83
KNN | 0.00259 | 0.1035 | 0.9974 | 0.8964 | 0.9952 | 0.98 | 0.84
GBC | 0.00665 | 0.0663 | 0.9933 | 0.9336 | 0.9506 | 0.93 | 0.58

Table 3: Results using the proposed method

MLA | FNR | FPR | TPR | TNR | AC | PC | RC
DTC | 0.0015 | 0.054 | 0.9984 | 0.946 | 0.999 | 1.0 | 0.99
XGB | 0.0073 | 0.0091 | 0.9927 | 0.9908 | 0.9924 | 0.94 | 0.96
KNN | 0.003 | 0.0149 | 0.997 | 0.9851 | 0.9952 | 1.0 | 0.98
GBC | 0.0549 | 0.0004 | 0.945 | 0.9995 | 0.9506 | 1.0 | 0.68

FNR = False Negative Rate, TPR = True Positive Rate, FPR = False Positive Rate, TNR = True Negative Rate, AC = Accuracy, PC = Precision, RC = Recall.

Comparing the two tables, it can be observed that the classifiers' performance generally improves when using the proposed method (Table 3) compared to using the whole feature set (Table 2); in particular, the FPR values fall markedly for most classifiers while the accuracy is maintained.

In Table 3, the Decision Tree Classifier shows a decrease in FNR from 0.00196 to 0.0015, while its TPR increases from 0.998 to 0.9984. The XGB Classifier's FNR moves from 0.00729 to 0.0073 and its TPR from 0.9956 to 0.9927; the KNeighbors Classifier's FNR moves from 0.00259 to 0.003 and its TPR from 0.9974 to 0.997; and the Gradient Boosting Classifier's FNR moves from 0.00665 to 0.0549 and its TPR from 0.9933 to 0.945, while its FPR drops from 0.0663 to 0.0004.

These results suggest that the proposed method preserves the classifiers' ability to detect web attacks accurately while using far fewer features. Additionally, the Accuracy (AC) values generally remain high or improve slightly in Table 3 compared to Table 2, indicating that the proposed method maintains or improves the overall classification accuracy.

Figure 5 displays the accuracy of each classifier when the whole feature set is used, while Figure 6 illustrates the accuracy of each classifier when the proposed method is used. Comparing the two figures, the classifiers' accuracy tends to improve or remain consistent when employing the proposed method (Figure 6) compared to using the whole feature set (Figure 5).

Figure 5: Accuracy of each classifier (DTC, XGB, KNN, GBC) using the whole feature set

Figure 6: Accuracy of each classifier (DTC, XGB, KNN, GBC) with the proposed method

Overall, the results in Table 3, as well as Figures 5 and 6, suggest that the proposed method enhances the performance of the classifiers, leading to improved accuracy and detection rates for web attacks compared to using the whole feature set.

G. Discussion
In contrast to other works that primarily focused on achieving high accuracy and low false alarm rates while neglecting the performance of the network and devices, this study takes the overall performance into account by selecting an optimal number of essential features.

For instance, in the work conducted by Moustafa et al. (2018), the Outlier Gaussian Mixture (OGM) method was examined, resulting in an improved accuracy of 97.28% on the web attack dataset and 95.56% on the UNSW-NB15 dataset. However, this method showed limitations in detecting unknown attacks.

On the other hand, the work done by Kshirsagar and Kumar (2022) investigated various feature selection methods such as Gain, Gain Ratio, Chi-squared, and Relief, along with the J48, Random Forest, Naïve Bayes, and KNN algorithms. Their system achieved an enhanced detection rate of 99.9909% with an 11.08 s model building time using relevant features. However, it is worth noting that their system needs to be tested with different datasets to ensure its generalizability.

Comparing the present work with the study conducted by Popov et al. (2020), where Univariate Feature Selection and Principal Component Analysis (PCA) were employed, the proposed method showed improved results. The test accuracy score obtained by the proposed method was 0.95, compared to 0.7502 achieved by the LR algorithm using 19 features from the dataset. This indicates that the proposed method outperforms the previous work in terms of classification accuracy.

With respect to this objective, upon examining the comparison presented in Table 4, it becomes evident that the suggested system, which employs the RFEML technique and identifies the 13 most influential features, exhibits the highest level of classification accuracy when compared to similar studies. The proposed approach not only enhances the accuracy of classification but also significantly diminishes the time required for model development, thereby delivering an overall superior performance in contrast to previous methodologies. Therefore, through careful feature selection and the RFEML approach, this work achieves a notable improvement in classification accuracy and model building time compared to other relevant studies; the comparison presented in Table 4 supports this conclusion.

Table 4: Comparison with other work

Reference | Dataset used | Classifiers | Advantages / Results | Limitation
[1] | UNSW-NB15 dataset | Outlier Gaussian Mixture (OGM) method | Achieved 97.28% accuracy using the OGM method on the web attack dataset and 95.56% using the UNSW-NB15 dataset. | Requires validation with another dataset; not effective for detecting unknown attacks.
[2] | CICIDS-2017 | RF, Decision Stump, J48, Hoeffding tree, and REP | Attained 99.6161% accuracy with J48. | The result needs to be validated with another dataset.
[4] | NSL-KDD dataset | NB, RF, J48, and other ML algorithms | Precision of 98.28% using 20% of the data. | Only 20% of the data was used for testing against the training data, which can affect the accuracy.
The proposed system | CICIDS-2017 | DTC, XGB, GBC, and KNN | Achieved high accuracy with less computation. | Needs to be extended to detect a wider variety of web attacks.

V. CONCLUSIONS AND FUTURE WORK

The findings of this study demonstrate that utilizing the proposed method to select 13 key features from the dataset significantly enhances the performance of the classifiers by eliminating data redundancy. The primary objective of this work was to achieve optimal performance and security within a system. By focusing on these significant features and employing supervised machine learning classifiers such as the DTC, XGB, GBC, and KNN algorithms, web attacks can be efficiently and effectively detected without incurring unnecessary computational costs.

Table 2 and Table 3 present the comparative performances of the classifiers based on metrics such as False Negative Rate, True Negative Rate, True Positive Rate, False Positive Rate, Recall, and accuracy. Among the classifiers, XGB and KNN exhibited superior performance in detecting web attacks on the CICIDS-2017 dataset, warranting further investigation, while the DTC classifier achieved the highest accuracy at 0.999, surpassing the other methods. The XGB and GBC algorithms also demonstrated good accuracy, at 0.992 and 0.950, respectively. The proposed system, employing the (RFEML) method, effectively reduces training and testing time while improving overall accuracy. Furthermore, the proposed approach can be enhanced in future research to detect a wider range of web attacks. Additionally, this technique holds potential for deployment in IoT environments, where different feature selection techniques can be employed to construct an optimal feature selection mechanism.

REFERENCES
[1] N. Moustafa, G. Misra, and J. Slay, "Generalized outlier gaussian mixture technique based on automated association features for simulating and detecting web application attacks," IEEE Transactions on Sustainable Computing, 2018.
[2] D. Kshirsagar and S. Kumar, "Towards an intrusion detection system for detecting web attacks based on an ensemble of filter feature selection techniques," Cyber-Physical Systems, pp. 1-16, 2022, doi: 10.1080/23335777.2021.2023651.
[3] Symantec, "ISTR Internet Security Threat Report," 2019.
[4] A. S. A. Aziz, E. Sanaa, and A. E. Hassanien, "Comparison of classification techniques applied for network intrusion detection and classification," Journal of Applied Logic, vol. 24, pp. 109-118, 2017.
[5] I. Abobaker and A. Musa, "Machine Learning for Intrusion Detection and Network Performance," in 2021 8th International Conference on Future Internet of Things and Cloud (FiCloud), 23-25 Aug. 2021, pp. 86-91, doi: 10.1109/FiCloud49777.2021.00020.
[6] M. H. Kamarudin, C. Maple, T. Watson, and N. S. Safa, "A LogitBoost-Based Algorithm for Detecting Known and Unknown Web Attacks," IEEE Access, vol. 5, pp. 26190-26200, 2017, doi: 10.1109/access.2017.2766844.
[7] C. Agrawal and Z. Hasan, "Analysis of Major Security Attacks in Recent Years."
[8] Quartz, "Data is expected to double every two years for the next decade," https://qz.com/472292/data-is-expected-to-double-every-two-years-for-the-next-decade/ (accessed).
[9] Y. Pan et al., "Detecting web attacks with end-to-end deep learning," Journal of Internet Services and Applications, vol. 10, no. 1, pp. 1-22, 2019.
[10] M. Ahmad, Q. Riaz, M. Zeeshan, H. Tahir, S. A. Haider, and M. S. Khan, "Intrusion detection in internet of things using supervised machine learning based on application and transport layer features using UNSW-NB15 data-set," EURASIP Journal on Wireless Communications and Networking, vol. 2021, no. 1, 2021, doi: 10.1186/s13638-021-01893-8.
[11] S. M. Kasongo and Y. Sun, "Performance analysis of intrusion detection systems using a feature selection method on the UNSW-NB15 dataset," Journal of Big Data, vol. 7, pp. 1-20, 2020.
[12] E. Levy, "Approaching Zero," IEEE Security & Privacy, vol. 2, no. 4, pp. 65-66, 2004, doi: 10.1109/MSP.2004.33.
[13] P. R. McWhirter, K. Kifayat, Q. Shi, and B. Askwith, "SQL Injection Attack classification through the feature extraction of SQL query strings using a Gap-Weighted String Subsequence Kernel," Journal of Information Security and Applications, vol. 40, pp. 199-216, 2018.
[14] F. Cavallin and R. Mayer, "Anomaly Detection from Distributed Data Sources via Federated Learning," in Advanced Information Networking and Applications: Proceedings of the 36th International Conference on Advanced Information Networking and Applications (AINA-2022), Volume 2, 2022: Springer, pp. 317-328.

[15] A. Smiti, "A critical overview of outlier detection methods," Computer Science Review, vol. 38, p. 100306, 2020.
[16] D. E. Denning, "An intrusion-detection model," IEEE Transactions on Software Engineering, no. 2, pp. 222-232, 1987.
[17] Y. M. Tukur, D. Thakker, and I. U. Awan, "Edge-based blockchain enabled anomaly detection for insider attack prevention in Internet of Things," Transactions on Emerging Telecommunications Technologies, vol. 32, no. 6, p. e4158, 2021.
[18] S. Garg and S. Batra, "A novel ensembled technique for anomaly detection," International Journal of Communication Systems, vol. 30, no. 11, p. e3248, 2017.
[19] M. H. Kamarudin, C. Maple, T. Watson, and N. S. Safa, "A New Unified Intrusion Anomaly Detection in Identifying Unseen Web Attacks," Security and Communication Networks, vol. 2017, pp. 1-18, 2017, doi: 10.1155/2017/2539034.
[20] S. Bahl and S. K. Sharma, "Improving Classification Accuracy of Intrusion Detection System Using Feature Subset Selection," presented at the 2015 Fifth International Conference on Advanced Computing & Communication Technologies, 2015.
[21] B. Gupta, A. Rawat, A. Jain, A. Arora, and N. Dhami, "Analysis of various decision tree algorithms for classification in data mining," International Journal of Computer Applications, vol. 163, no. 8, pp. 15-19, 2017.
[22] K. Konar, S. Das, and S. Das, "Employee attrition prediction for imbalanced data using genetic algorithm-based parameter optimization of XGB Classifier," in 2023 International Conference on Computer, Electrical & Communication Engineering (ICCECE), 2023: IEEE, pp. 1-6.
[23] N. Chakrabarty, T. Kundu, S. Dandapat, A. Sarkar, and D. K. Kole, "Flight arrival delay prediction using gradient boosting classifier," in Emerging Technologies in Data Mining and Information Security: Proceedings of IEMIS 2018, Volume 2, 2019: Springer, pp. 651-659.
[24] J. McHugh, "Testing intrusion detection systems: a critique of the 1998 and 1999 DARPA intrusion detection system evaluations as performed by Lincoln Laboratory," ACM Transactions on Information and System Security (TISSEC), vol. 3, no. 4, pp. 262-294, 2000.
[25] UNB, "Canadian Institute for Cybersecurity," https://www.unb.ca/cic/datasets/ids-2017.html (accessed 15/05/2023).
[26] V. Bolon-Canedo, N. Sanchez-Marono, and A. Alonso-Betanzos, "Feature selection and classification in multiple class datasets: An application to KDD Cup 99 dataset," Expert Systems with Applications, vol. 38, no. 5, pp. 5947-5957, 2011.
[27] N. Acharya and S. Singh, "An IWD-based feature selection method for intrusion detection system," Soft Computing, vol. 22, no. 13, pp. 4407-4416, 2017, doi: 10.1007/s00500-017-2635-2.
[28] E. Jaw and X. Wang, "Feature Selection and Ensemble-Based Intrusion Detection System: An Efficient and Comprehensive Approach," Symmetry, vol. 13, no. 10, p. 1764, 2021, doi: 10.3390/sym13101764.
[29] H. B. M. Rais and T. Mehmood, "Feature selection in intrusion detection, state of the art: A review," Journal of Theoretical and Applied Information Technology, vol. 94, no. 1, pp. 30-43, 2016. [Online]. Available: https://go.exlibris.link/yjmgnwPK.
