
Cyber Security and Applications 2 (2024) 100033

Enhancing intrusion detection systems through dimensionality reduction: A comparative study of machine learning techniques for cyber security

Faisal Nabi a,∗, Xujuan Zhou b

a Muhammad Ali Jinnah University, Karachi, Pakistan
b University of Southern Queensland, Toowoomba, Australia 4350

Keywords: Cyber security; Intrusion detection system; Supervised machine learning; Anomaly detection; PCA; Random projection

Abstract

Our research aims to improve automated intrusion detection by developing a highly accurate classifier with minimal false alarms. The motivation behind our work is to tackle the challenges of high dimensionality in intrusion detection and enhance the classification performance of classifiers, ultimately leading to more accurate and efficient detection of intrusions. To achieve this, we conduct experiments using the NSL-KDD data set, a widely used benchmark in this domain. This data set comprises approximately 126,000 samples of normal and abnormal network traffic for training and 23,000 samples for testing. Initially, we employ the entire feature set to train classifiers, and the outcomes are promising. Among the classifiers tested, the J48 tree achieves the highest reported accuracy of 79.1 percent. To enhance classifier performance, we explore two projection approaches: Random Projection and PCA. Random Projection yields notable improvements, with the PART algorithm achieving the best-reported accuracy of 82.0 %, outperforming the original feature set. Moreover, random projection proves to be more time-efficient than PCA across most classifiers. Our findings demonstrate the effectiveness of random projection in improving intrusion detection accuracy while reducing training time. This research contributes valuable insights to the cybersecurity field and fosters potential advancements in intrusion detection systems.

Introduction

Due to the increasing frequency and sophistication of cyber-attacks across various domains, network security has become a critical area of research garnering global attention. Cybercriminals employ diverse techniques to breach users' security, gaining unauthorized access to sensitive data and profiting from activities like eavesdropping [1]. Conventional firewalls and anti-virus software, unfortunately, fall short in detecting zero-day attacks, denial of service attacks, data theft, and other sophisticated attack types. As a result, cyber security crimes continue to rise due to vulnerabilities in computer systems, ineffective security policies, and a lack of awareness about cybercrime [2]. In 2016 alone, over three billion zero-day attacks were reported, necessitating urgent and effective solutions to combat these threats [3].

In response to these challenges, intrusion detection systems (IDSs) have garnered significant attention from cyber security researchers. IDSs are software products designed to automate the process of monitoring and analyzing intrusions. An intrusion is defined as any attempt to compromise the confidentiality, integrity, or availability of a network or a computer system, or to bypass its security mechanisms [4]. Unlike traditional firewalls, the primary objective of an intrusion detection system is to detect various signs of attacks as early as possible.

By proactively identifying and responding to potential intrusions, IDSs play a crucial role in enhancing network security and safeguarding against evolving cyber threats. Their ability to detect and mitigate attacks in real time is vital in maintaining the integrity and confidentiality of sensitive data, thereby making them an indispensable component in modern cybersecurity strategies. As cyber-attacks continue to evolve, ongoing research and advancements in intrusion detection systems will remain essential in ensuring the resilience and security of our interconnected digital world.

In the realm of intrusion detection systems (IDSs), there are two main types: signature-based IDSs and anomaly-based IDSs. Signature-based IDSs analyze incoming traffic by comparing it to predefined patterns representing known attacks. They are effective at detecting attacks with high accuracy and low false alarms but are limited to recognizing only attacks stored in their database, necessitating constant updates with new attack signatures [4]. On the other hand, anomaly-based IDSs continuously monitor incoming traffic, raising an alarm if any deviation from normal behavior exceeds a certain threshold. These systems can detect novel attack types but may generate a larger number of false alarms [5].

∗ Corresponding author. E-mail address: [email protected] (F. Nabi).
Peer review under responsibility of KeAi Communications Co., Ltd.
https://doi.org/10.1016/j.csa.2023.100033
Received 4 August 2023; Received in revised form 28 November 2023; Accepted 14 December 2023; Available online 11 January 2024
2772-9184/© 2023 The Authors. Publishing services by Elsevier B.V. on behalf of KeAi Communications Co., Ltd. This is an open access article under the CC BY license (http://creativecommons.org/licenses/by/4.0/).
Anomaly-based IDSs learn normal behavior using machine learning (ML) algorithms with various data instances characterizing network traffic. ML techniques are divided into unsupervised (no labeled classes) and supervised (labeled classes) learning, with the latter being common in anomaly detection. Supervised algorithms utilize labeled data sets representing normal and anomaly behaviors as features, training a classification model to detect new attack patterns and raise alarms [1].

Many anomaly-based IDSs employing ML algorithms have been proposed (e.g., [6–14]). However, a key challenge is the high dimensionality of the data sets used for training the classification models, leading to increased training time. This is crucial for the effectiveness of online IDSs. Additionally, redundant information may exist, reducing classification accuracy and increasing false alarms. To address this, dimensionality reduction approaches are used, transforming the high-dimensional feature space into a lower-dimensional space. Techniques like Principal Component Analysis (PCA) preserve the variance between data instances, while faster solutions like random projection (RP) use a random matrix based on a certain distribution, such as Gaussian, to reduce dimensionality [9].

This paper aims to investigate the performance of common supervised machine learning algorithms for anomaly-based intrusion detection. Additionally, the impact of two dimensionality reduction techniques, namely Principal Component Analysis (PCA) and Random Projection (RP), on classification performance is explored. While PCA is a well-known method in this domain, its time-consuming nature prompts us to assess the performance of RP, which offers a faster alternative. As a result, the main contributions of this work are as follows:

• Analyzing the performance of commonly used supervised machine learning algorithms for anomaly-based intrusion detection. This analysis involves training a classification model using approximately 126,000 samples of normal and anomaly patterns from the NSL-KDD data set.
• Examining the effect of applying the PCA dimensionality reduction algorithm on classification performance.
• Examining the effect of applying the RP dimensionality reduction algorithm on classification performance.
• Comparing the classification performance achieved by PCA and RP to identify potential advantages and trade-offs of each approach.

By addressing these aspects, this study seeks to provide valuable insights into the effectiveness of supervised machine learning algorithms for anomaly-based intrusion detection and the impact of dimensionality reduction techniques on classification performance. The findings will contribute to a better understanding of which methods are more suitable for efficient and accurate intrusion detection in practical applications.

The remainder of this paper is structured as follows: Section 2 reviews related work in this field. Section 3 discusses the methodology employed in this study. Section 4 delves into the experimental findings. Finally, Section 5 concludes the paper and makes recommendations for future research.

Related work

The area of supervised learning and intrusion detection has garnered significant attention among cyber security researchers. Numerous studies focus on applying common supervised ML techniques and evaluating their performance on popular intrusion datasets. Examples of these techniques include decision trees, random forests, Bayes methods, support vector machines (SVM), neural networks, ensemble classifiers, and more.

Supervised ML for intrusion detection

Recently, the authors in [6] experimented with four supervised machine-learning algorithms for intrusion detection: logistic regression, SVM, naïve Bayes, and random forest. Training was conducted on the NSL-KDD dataset, covering four attack types (DOS, Probe, user to root, root to local). Reported accuracy results are 84 % (logistic regression), 79 % (naïve Bayes), 75 % (SVM), and 99 % (random forest). Random forest's near-perfect accuracy raises overfitting concerns. In [7], the same problem was addressed with cross-validation as the validation method and feature selection applied before feeding the data to three classifiers: J48, naïve Bayes, and REPTree. Feature selection proved effective in enhancing classification performance. In [8], SVM and k-nearest neighbor were tested on the KDD CUP99 dataset (32,000 samples) for normal and four attack types. Two experiments were conducted: one using the full feature set and the other with PCA for dimensionality reduction. PCA improved accuracy to around 90 % in both cases. Similarly, in [9], SVM with different kernels was experimented with for intrusion detection. PCA was effective in enhancing classification performance, with the RBF kernel SVM achieving over 99 % accuracy, though overfitting concerns remain. A similar approach was applied in [10], yielding improved classification performance with PCA.

In [11], the authors focused on detecting distributed DOS (DDoS) attacks using machine learning algorithms on the CICIDS2017 dataset. Feature selection reduced the feature set from 85 to 12 features, and random forest achieved the best results with around 96 % accuracy. High training time raised concerns. In [12], SVM and artificial neural networks were experimented with for intrusion detection on the UNSW-NB-15 dataset. Feature reduction methods (categorization, univariate feature selection, PCA) were employed, and categorization yielded the best results with over 90 % accuracy, outperforming PCA. In [13], k-means clustering with feature selection was proposed for intrusion prediction on the KYOTO dataset. Clustering significantly improved classification performance, achieving very high accuracy rates.

In [14], a different approach using random projection for intrusion detection based on Apache web server log data was explored. The approach showed potential for effective intrusion identification through visualization. Lastly, in [15], an end-to-end system was proposed for intrusion detection using novel data sets simulating intrusion in LAN and cloud environments. Decision tree and regression showed good results in LAN and cloud environments, respectively.

In [19], the authors used the KDD'99 and the NSL-KDD datasets to train decision tree (DT), multi-layer perceptron (MLP), random forest (RF), and a stacked autoencoder (SAE) model for detecting network intrusion. In their comparative study, they claimed that the random forest classifier showed the most consistent and accurate results. Similarly, the authors of [21] also used the benchmarking dataset NSL-KDD to conduct a comparative study for intrusion detection using four ML techniques including Random Forest, J48, ZeroR, and Naïve Bayes. However, they did not involve data dimensionality reduction techniques in their study.

Summary for identifying the research gaps

The problem of intrusion detection and supervised learning has garnered global attention, leading to numerous studies using various ML algorithms and validation methods. Notably, the choice of validation method can significantly impact classification performance, with cross-validation often providing better results than independent testing data sets. Additionally, the size of the testing data set has a bearing on the classification outcomes. While some works report very high classification results, concerns arise about potential overfitting issues. Moreover, it is observed that normal behavior yields better accuracy measures compared to intrusion behaviors, an aspect often overlooked in overall accuracy reporting across all classes.

Feature selection and PCA are frequently utilized to reduce dimensionality and generally enhance classification performance. However, PCA comes with a substantial training time cost due to matrix calculations. In contrast, our work proposes a novel approach by employing random projection combined with machine learning for intrusion detection, a highly efficient and rapid method in comparison to PCA. The results demonstrate the superiority of the random projection approach over both PCA and the full feature set, as illustrated in subsequent sections.

To assess its performance, we evaluate the proposed approach on an independent data set from NSL-KDD comprising over 22,000 samples representing normal and anomaly behaviors.

Methodology

In this section, the methodology for building the classifier and the feature selection analysis is presented.

Study dataset

The data set used in this study is the full NSL-KDD dataset available from the UNB data sets repository [16]. The dataset is collected from diverse sources, such as network traffic flows, and contains valuable information about user behavior, host configurations, and system settings. Analyzing this information is crucial for studying attack patterns and identifying abnormal behaviors. It consists of a diverse range of intrusions simulated in a military network environment. A typical US Air Force LAN was simulated to create an environment for acquiring raw TCP dump data for a network; the LAN was simulated like a real environment and breached with multiple attacks. A connection is a sequence of TCP packets starting and ending at some time duration, between which data flows to and from a source IP address to a target IP address under some defined protocol. There are 125,973 TCP/IP connections (instances), characterized by 41 features extracted from normal and anomaly data, for training the model, and 22,544 for testing the model, as illustrated in Table 1. Moreover, a sample of the feature set is illustrated in Table 2. According to [16], it does not contain any redundancy in the training records. Moreover, there are no duplicates in the testing data. We normalize the data before feeding it to the dimensionality reduction algorithms.

Table 1
Number of instances per class.

Class   | Training data (# instances) | Testing data (# instances)
Normal  | 67,343                      | 9711
Anomaly | 58,630                      | 12,833

Some well-known datasets in this domain include DARPA, KDD CUP99, NSL-KDD, KYOTO, CICIDS2017, and UNSW-NB-15, among others. For comprehensive details regarding these datasets, including feature sets, classes, and other relevant information, the authors in [2] provide a detailed discussion. The NSL-KDD dataset was chosen as it is a widely recognized benchmark for intrusion detection, providing diverse attack samples. Using a consistent dataset allows a fair comparison of ML algorithms. The research focused on the impact of dimensionality reduction on classification using NSL-KDD. Evaluating multiple algorithms on this dataset ensures reliable conclusions. Future work can explore different datasets to assess algorithm performance in various scenarios.
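The preparation step just described can be sketched as follows. This is an illustrative Python sketch rather than the authors' code: the NSL-KDD file names, the trailing difficulty column, and the one-hot handling of the three nominal features are assumptions, since the paper only states that the data are normalized before dimensionality reduction.

```python
"""Illustrative sketch (not from the paper): load NSL-KDD and min-max normalize
the features before dimensionality reduction. File layout and categorical
handling are assumptions."""
import pandas as pd
from sklearn.preprocessing import MinMaxScaler

def load_nsl_kdd(train_path="KDDTrain+.txt", test_path="KDDTest+.txt"):
    train = pd.read_csv(train_path, header=None)
    test = pd.read_csv(test_path, header=None)
    # Assumed layout: 41 feature columns, then the class label (then a difficulty score).
    X_train, y_train = train.iloc[:, :41], train.iloc[:, 41]
    X_test, y_test = test.iloc[:, :41], test.iloc[:, 41]
    # Collapse the multi-class labels into the two classes used in the paper.
    y_train = (y_train != "normal").astype(int)   # 0 = normal, 1 = anomaly
    y_test = (y_test != "normal").astype(int)
    # One-hot encode the nominal features (protocol type, service, flag) -- an
    # assumption, since Weka-style classifiers can consume nominal attributes directly.
    X_train = pd.get_dummies(X_train, columns=[1, 2, 3])
    X_test = pd.get_dummies(X_test, columns=[1, 2, 3])
    X_test = X_test.reindex(columns=X_train.columns, fill_value=0)
    # Min-max normalization, fitted on the training split only.
    scaler = MinMaxScaler().fit(X_train)
    return scaler.transform(X_train), y_train.values, scaler.transform(X_test), y_test.values
```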
Dimensionality reduction

As the dimensionality of the feature set is relatively high (41 features), we experiment with two projection approaches for reducing the dimensionality of the feature set: 1) principal component analysis (PCA) and 2) random projection (RP).

In the first approach, PCA, the high-dimensional feature space is reduced into a lower-dimensional feature space using an orthogonal projection that maximizes the variance and separation between data instances and can lead to better classification performance. Given a P-dimensional observed data vector y, PCA transforms the data observation into a lower-dimensional space of dimension D, where each observation x in this lower-dimensional space can be expressed as

x = W(y − μ)    (1)

where W is a D × P matrix achieving the desired linear transformation of the data and μ is the mean of the data. The P-dimensional row vectors of the matrix W are given by the D dominant eigenvectors v, associated with the highest eigenvalues λ, of the sample covariance matrix

S = (1/N) ∑_{i=1}^{N} (y_i − μ)(y_i − μ)^T    (2)

such that Sv = λv, where N is the number of observations. The data in the reduced space are uncorrelated, such that their covariance matrix S_x = (1/N) ∑_{i=1}^{N} x_i x_i^T is diagonal and its elements are the eigenvalues λ [17].
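A minimal NumPy sketch of the projection defined by Eqs. (1) and (2) is shown below; it illustrates the transform itself and is not the implementation used in the experiments.

```python
"""Minimal sketch of the PCA projection in Eqs. (1)-(2) using NumPy."""
import numpy as np

def pca_fit(Y, D):
    """Y: N x P data matrix; returns the mean and the D x P projection matrix W."""
    mu = Y.mean(axis=0)
    centered = Y - mu
    S = centered.T @ centered / Y.shape[0]      # P x P sample covariance (Eq. 2)
    eigvals, eigvecs = np.linalg.eigh(S)        # eigenvalues in ascending order
    order = np.argsort(eigvals)[::-1][:D]       # indices of the D dominant eigenvectors
    W = eigvecs[:, order].T                     # D x P projection matrix
    return mu, W

def pca_transform(Y, mu, W):
    """Project each observation into the D-dimensional space: x = W (y - mu) (Eq. 1)."""
    return (Y - mu) @ W.T
```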
On the other hand, there is a simple yet efficient method for dimensionality reduction based on random projections. In this method, the original data Y in a higher-dimensional space is transformed into a lower-dimensional space X via X = WY, where W is a D × P random matrix, D is of a very small dimensionality compared to P, and its columns are realizations of independent and identically distributed zero-mean normal variables that are scaled to have unit length. This idea is motivated by the Johnson–Lindenstrauss lemma, which states that if points in a high-dimensional feature space of dimension P are projected onto a randomly selected lower-dimensional space of suitable dimension D, then the distances between points are approximately preserved if D is large enough:

⟨(‖φ(y_i) − φ(y_j)‖²_D − ‖y_i − y_j‖²_P)²⟩_∅ ≤ (2/D) ‖y_i − y_j‖⁴_P    (3)

where ‖·‖_P and ‖·‖_D denote the Euclidean distance norms in V_P and V_D, respectively, and ⟨·⟩_∅ is the average over all possible isotropic random choices for the unit vectors defining the random mapping ∅ [17].

In our experiments, we evaluate the performance of random projection based on two different choices for the elements of the matrix W:

- The first choice is generated using a Gaussian distribution, satisfying two main properties: orthogonality and normality.
- The Gaussian distribution can be replaced by a simpler distribution, which we refer to as Sparse, such as:

w_ij = √3 × { −1 with probability 1/2; +1 with probability 1/2 }    (4)
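The two choices for W can be sketched as follows. This is an illustrative Python sketch under stated assumptions: the exact construction used by the authors is not published, so the column normalization and the sign convention follow the text and Eq. (4) above.

```python
"""Sketch of the two random projection matrices (Gaussian and Sparse) described above."""
import numpy as np

rng = np.random.default_rng(0)

def gaussian_projection_matrix(D, P):
    """D x P matrix with i.i.d. zero-mean Gaussian entries, columns scaled to unit length."""
    W = rng.standard_normal((D, P))
    return W / np.linalg.norm(W, axis=0, keepdims=True)

def sparse_projection_matrix(D, P):
    """D x P matrix with entries +/- sqrt(3), each sign with probability 1/2 (Eq. 4)."""
    signs = rng.choice([-1.0, 1.0], size=(D, P))
    return np.sqrt(3.0) * signs

def random_project(X, W):
    """Project row-wise data X (N x P) into the D-dimensional space: X_new = X W^T."""
    return X @ W.T
```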
Classification

Once the data set is prepared, it is fed to the chosen supervised machine learning algorithms to experiment with their classification performance, i.e., their ability to differentiate between the two classes: normal and anomalous. We experiment before and after dimensionality reduction and use five well-known classification algorithms: BayesNet, Naïve Bayes, J48, PART, and Random Forest. BayesNet is a classification technique based on a probabilistic graphical model that uses a directed acyclic graph to represent the feature set and its conditional dependencies. Naïve Bayes is a classification technique based on Bayes' theorem [17]; this theorem describes the probability of an event based on previous knowledge of conditions related to that event. The Naïve Bayes classifier, whose task is to assign a new object to a specific class, assumes that the features within a class are not directly related. The J48 algorithm is the Java implementation of the C4.5 algorithm, which builds decision trees from the training data. The PART algorithm runs for several iterations, building a partial decision tree with the C4.5 algorithm at each iteration and turning the best leaf into a rule. Finally, the random forest algorithm is a classification technique that constructs multiple decision trees and outputs the class that represents the average prediction across the multiple trees [18].
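For illustration only, a rough scikit-learn analogue of this pipeline is sketched below. The paper uses the Weka implementations of the five algorithms; scikit-learn has no direct BayesNet or PART counterpart, so the sketch trains approximate stand-ins for three of the five classifiers on an already prepared train/test split.

```python
"""Illustrative sketch: scikit-learn analogues of three of the paper's five classifiers."""
from sklearn.naive_bayes import GaussianNB
from sklearn.tree import DecisionTreeClassifier          # rough analogue of J48 (C4.5)
from sklearn.ensemble import RandomForestClassifier

classifiers = {
    "Naive Bayes": GaussianNB(),
    "Decision tree (J48 analogue)": DecisionTreeClassifier(),
    "Random Forest": RandomForestClassifier(n_estimators=100),
}

def run_classifiers(X_train, y_train, X_test, y_test):
    results = {}
    for name, clf in classifiers.items():
        clf.fit(X_train, y_train)
        results[name] = clf.score(X_test, y_test)   # accuracy on the supplied test set
    return results
```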
Performance evaluation

To evaluate the performance of the classifiers, we build the classifier using the training data set and test its performance on the supplied testing data set (a data set separate from the training data set). Classification accuracy is calculated on the tested data as the ratio between the number of correctly classified samples and the total number of tested samples. Another performance evaluation measure is the false positive rate (FPR), which is calculated as the number of false positives divided by the total number of true negatives and false positives.
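Both measures can be computed from a 2 × 2 confusion matrix as in the following sketch, which mirrors the definitions above rather than any particular toolkit's implementation.

```python
"""Minimal sketch of the two evaluation measures defined above (accuracy and FPR)."""
from sklearn.metrics import confusion_matrix

def accuracy_and_fpr(y_true, y_pred, positive_label=1):
    """positive_label marks the anomaly class; returns (accuracy, false positive rate)."""
    cm = confusion_matrix(y_true, y_pred, labels=[1 - positive_label, positive_label])
    tn, fp, fn, tp = cm.ravel()
    accuracy = (tp + tn) / cm.sum()   # correctly classified / all tested samples
    fpr = fp / (fp + tn)              # false positives / (true negatives + false positives)
    return accuracy, fpr
```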

Table 2
Description of the features.

Feature            | Description
Duration           | length (number of seconds) of the connection
protocol_type      | type of the protocol, e.g., tcp, udp, etc.
Service            | network service on the destination, e.g., http, telnet, etc.
src_bytes          | number of data bytes from source to destination
dst_bytes          | number of data bytes from destination to source
Flag               | normal or error status of the connection
Land               | 1 if connection is from/to the same host/port; 0 otherwise
wrong_fragment     | number of "wrong" fragments
Urgent             | number of urgent packets

Content features
Hot                | number of "hot" indicators
num_failed_logins  | number of failed login attempts
logged_in          | 1 if successfully logged in; 0 otherwise
num_compromised    | number of "compromised" conditions
root_shell         | 1 if root shell is obtained; 0 otherwise
su_attempted       | 1 if "su root" command attempted; 0 otherwise
num_root           | number of "root" accesses
num_file_creations | number of file creation operations
num_shells         | number of shell prompts
num_access_files   | number of operations on access control files
num_outbound_cmds  | number of outbound commands in an ftp session
is_hot_login       | 1 if the login belongs to the "hot" list; 0 otherwise
is_guest_login     | 1 if the login is a "guest" login; 0 otherwise

Traffic features using a 2-second time window
count              | number of connections to the same host as the current connection in the past two seconds
serror_rate        | % of connections that have "SYN" errors
rerror_rate        | % of connections that have "REJ" errors
same_srv_rate      | % of connections to the same service
diff_srv_rate      | % of connections to different services
srv_count          | number of connections to the same service as the current connection in the past two seconds
srv_serror_rate    | % of connections that have "SYN" errors
srv_rerror_rate    | % of connections that have "REJ" errors
srv_diff_host_rate | % of connections to different hosts

Experiments results and discussions

This section presents the experiment results in three parts. Firstly, we display the classification outcomes across the five classifiers without dimensionality reduction. Next, we present the classification results after applying PCA. Lastly, we analyze the impact of random projection on the classification performance.

Experiment 1: experiments using the full training data set

In this experiment, the five supervised learning algorithms discussed earlier are utilized to build the classifier using the full feature set of 41 features from the training data set. Subsequently, the model is tested on the testing data set. Table 3 displays the accuracy and FPR results obtained from the five supervised machine learning algorithms before any dimensionality reduction is applied.

Table 3
Algorithms' classification results before dimensionality reduction.

Classification Algorithm | Accuracy (%) | FPR (%)
Bayes-Net                | 71.4         | 25.5
Naïve Bayes              | 73.1         | 23.5
J48                      | 79.1         | 18.5
PART                     | 73.9         | 24.0
Random Forest            | 77.8         | 20.1

It is clear from Table 3 that the highest accuracy and the lowest FPR are obtained using the J48 classification algorithm, with an accuracy of 79.1 % and a false positive rate of 18.5 %. Generally, accuracy and FPR results are stable across the five algorithms, with no dramatic changes. For further analysis of the performance results of the best classifier (J48), Table 4 presents the confusion matrix for this classifier. The rows represent the ground-truth classes and the columns represent the predicted classes.

Table 4
Confusion matrix for J48 classifier.

a    | b    | ← classified as
9240 | 471  | a = normal
4231 | 8602 | b = anomaly

It is noted from this table that the normal instances are classified correctly with a higher percentage than the anomaly instances; we refer to this percentage as the true positive rate. As is clear from the table, the true positive rate for the normal class is 9240/9711 = 95.1 %, compared to a true positive rate of 8602/12,833 = 67.0 % for the anomaly class, which indicates the difficulty in predicting new intrusions.
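As a quick consistency check (not part of the paper), the overall accuracy in Table 3 and the two per-class true positive rates can be recovered directly from the counts in Table 4:

```python
# Arithmetic check using the counts from Table 4.
tn, fp = 9240, 471        # normal instances: correctly classified / flagged as anomaly
fn, tp = 4231, 8602       # anomaly instances: missed / correctly classified
accuracy = (tn + tp) / (tn + fp + fn + tp)   # = 17842 / 22544 ~= 0.791 (79.1 %)
tpr_normal = tn / (tn + fp)                  # ~= 0.951
tpr_anomaly = tp / (fn + tp)                 # ~= 0.670
```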
Experiment 2: experiments using classifiers after PCA

In this experiment, we applied PCA to the data set to reduce its dimensionality, and then we assessed the performance of the five classification algorithms in the reduced feature space. Our experiments revealed that approximately 92 % of the data variance could be explained by the first eigenvector of the covariance matrix (the first principal component). Fig. 1 illustrates the PCA projection of 6000 samples, while Fig. 2 displays accuracy results across the first five eigenvectors (reduced dimensions). Notably, the highest accuracy was achieved when projecting the data into a one-dimensional feature space. Consequently, we projected the data into a one-dimensional feature space and evaluated its performance using the five aforementioned classification algorithms. The results are summarized in Table 5.

Fig. 1. The PCA projection of 3000 samples from the Normal class (circles) and 3000 samples from the Anomaly class (triangles).

Fig. 2. Classification accuracy results across the first 5 eigenvectors (dimensions).

Table 5
Algorithms' classification results after projecting the data with PCA.

Classification Algorithm | Accuracy (%) | FPR (%)
Bayes-Net                | 76.2         | 21.5
Naïve Bayes              | 78.8         | 19.0
J48                      | 76.7         | 21.0
PART                     | 77.0         | 20.5
Random Forest            | 74.1         | 24.0

Table 5 clearly indicates that the Naïve Bayes algorithm achieved the highest reported accuracy of 78.8 %. The corresponding confusion matrix, presented in Table 6, reveals a true positive rate of 95.6 % for the normal class, while the anomaly class has a true positive rate of 66.2 %.

Table 6
Confusion matrix for Naïve Bayes classifier.

a    | b    | ← classified as
9281 | 430  | a = normal
4343 | 8490 | b = anomaly

In general, PCA proved to be effective in enhancing the performance of three out of the five tested classifiers. However, it is worth noting that the best reported accuracy obtained with the full training data set (J48 algorithm) surpassed that of the reduced data set (Naïve Bayes algorithm). For a visual representation of the effect of applying PCA across the five classifiers, refer to Fig. 3.

Fig. 3. Classification accuracy results before and after PCA.
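An illustrative sketch of this experiment is given below, assuming scikit-learn's PCA in place of the toolkit actually used and a Naïve Bayes stand-in for the best-performing classifier; it inspects the explained variance of the leading components and then evaluates a classifier on the one-dimensional projection.

```python
"""Illustrative sketch of Experiment 2 (PCA projection followed by classification)."""
from sklearn.decomposition import PCA
from sklearn.naive_bayes import GaussianNB

def pca_experiment(X_train, y_train, X_test, y_test, n_components=1):
    # Inspect how much variance the leading components explain.
    probe = PCA(n_components=5).fit(X_train)
    print("explained variance ratios:", probe.explained_variance_ratio_)
    # Project both splits onto the chosen number of principal components (1 in the paper).
    reducer = PCA(n_components=n_components).fit(X_train)
    Z_train, Z_test = reducer.transform(X_train), reducer.transform(X_test)
    clf = GaussianNB().fit(Z_train, y_train)
    return clf.score(Z_test, y_test)   # accuracy on the test split
```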

Experiment 3: experiments using classifiers after random projection

In this experiment, our initial focus is on evaluating the performance of the Gaussian random matrix across the five classification algorithms while varying the reduced dimensions from one up to five. Fig. 4 visually presents a random projection of 6000 samples, offering valuable insights. As demonstrated in Fig. 5, the optimal classification performance is achieved when the data is reduced into a 5-dimensional feature space. Consequently, we proceed to transform the data into this reduced space using both the Gaussian and Sparse matrices, applying them across the five classification algorithms. The comprehensive results of this process are illustrated in Table 7.

Fig. 4. A random projection of 3000 samples from the Normal class, represented by circles, and 3000 samples from the Anomaly class, represented by triangles.

Fig. 5. Classification accuracy results across five dimensions with random projection.

Table 7
Algorithms' classification results after random projection.

Classification Algorithm | Gaussian Accuracy (%) | Gaussian FPR (%) | Sparse Accuracy (%) | Sparse FPR (%)
Bayes-Net                | 78.4                  | 16.0             | 75.5                | 22.0
Naïve Bayes              | 77.3                  | 20.5             | 71.6                | 25.0
J48                      | 79.6                  | 17.0             | 77.2                | 20.5
PART                     | 82.0                  | 16.2             | 75.3                | 21.5
Random Forest            | 77.5                  | 20.0             | 77.3                | 20.5

The results presented in Table 7 highlight the superiority of the Gaussian matrix over the Sparse matrix in terms of providing better accuracy and false-positive rates. This is attributed to the Gaussian matrix's ability to achieve a more effective dimensionality reduction, preserving the underlying data structure and relationships more efficiently. The even spread of projected data points in the lower-dimensional space contributes to higher accuracy levels for classifiers trained on the Gaussian matrix compared to the Sparse matrix.


Additionally, the Gaussian matrix outperforms the Sparse matrix in terms of false-positive rates, a crucial metric for intrusion detection systems, ensuring fewer normal instances are misclassified as anomalies. In the absence of the Gaussian matrix, the alternative option is to use the Sparse matrix for dimensionality reduction. However, this choice may come with drawbacks. The Sparse matrix might not perform as effectively as the Gaussian matrix, leading to less accurate classification and a reduced ability to discriminate between normal and anomalous instances. The clustering of data points in the lower-dimensional space could result in a loss of relevant information, hindering effective data representation.

Furthermore, using the Sparse matrix may increase the risk of overfitting, particularly with high-dimensional data. The Gaussian matrix's capacity to provide a more generalized representation helps mitigate this risk, while the Sparse matrix might struggle to maintain generalization capability.

In conclusion, the Gaussian matrix emerges as the preferred option for enhancing intrusion detection systems and cyber security due to its ability to retain essential data characteristics, improve accuracy, and reduce false-positive rates. On the other hand, using the Sparse matrix might result in decreased classification performance and an increased risk of overfitting. The selection of the Gaussian matrix ensures a more robust and reliable intrusion detection system, making it a valuable dimensionality reduction technique for practical implementation. Therefore, for further comparison with the original high-dimensional data set and the PCA results, we consider the outcomes associated with the Gaussian matrix projection.

Table 7 highlights that the PART algorithm achieved the highest reported accuracy of 82.0 %, making it the best-performing approach in this study. The associated confusion matrix in Table 8 allows us to calculate precision, recall, and F1 measures. Precision is calculated by dividing true positives by the sum of true positives and false positives, while recall is the ratio of true positives to the sum of true positives and false negatives. The F1 measure is calculated as the harmonic mean of precision and recall. Table 9 presents these measures for both classes.

Table 8
Confusion matrix for PART classifier.

a    | b    | ← classified as
9438 | 273  | a = normal
3784 | 9049 | b = anomaly

Table 9
Precision, recall, and F1 measures for PART classifier.

Class   | Precision (%) | Recall (%) | F1 (%)
Normal  | 71.4          | 97.2       | 82.3
Anomaly | 97.1          | 70.5       | 81.7
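The entries of Table 9 follow directly from the counts in Table 8 and the definitions above; a small arithmetic check (not from the paper):

```python
"""Reproduce Table 9 from the PART confusion matrix in Table 8."""
def prf(tp, fp, fn):
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)   # harmonic mean
    return precision, recall, f1

# Normal as the positive class: TP = 9438, FP = 3784 (anomalies called normal), FN = 273.
print(prf(tp=9438, fp=3784, fn=273))   # ~ (0.714, 0.972, 0.823)
# Anomaly as the positive class: TP = 9049, FP = 273, FN = 3784.
print(prf(tp=9049, fp=273, fn=3784))   # ~ (0.971, 0.705, 0.817)
```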


The high precision rate for the anomaly class (97.1 %) indicates that among all predicted instances classified as intrusions, 97.1 % are genuine intrusions. Conversely, the high recall measure for the normal class (97.2 %) indicates that 97.2 % of the instances in the normal class are correctly classified as normal. The overall F1 measure shows a balanced performance for both classes.

Comparing the results after random projection (Gaussian) with the original data set, we observe that it has effectively enhanced the majority of classifiers, with improvements ranging from 0.5 % to 8.1 %. These enhancements are illustrated in Fig. 6, demonstrating the efficacy of random projection in improving classification performance across various classifiers.

Fig. 6. Classification accuracy results before and after random projection.

Experiment 4: comparison between PCA and random projection

Table 10 presents a comparison of accuracy results after applying the two projection techniques, PCA and random projection (Gaussian), on the NSL-KDD data set. The table showcases the impact of these dimensionality reduction methods on the classification performance of various supervised machine learning algorithms. The accuracy values for each algorithm are reported, allowing a direct comparison between the two projection approaches.

Table 10
Comparison between PCA and random projection accuracy results.

Classification Algorithm | PCA (%) | Random projection (%)
Bayes-Net                | 76.2    | 78.4
Naïve Bayes              | 78.8    | 77.3
J48                      | 76.7    | 79.6
PART                     | 77.0    | 82.0
Random Forest            | 74.1    | 77.5

From the table, it can be observed how PCA and random projection (Gaussian) influence the performance of the classifiers. The comparison provides insights into the effectiveness of each technique in enhancing the accuracy of intrusion detection systems. The results shed light on which dimensionality reduction approach yields better performance for each specific classifier, enabling the selection of the most suitable technique based on the desired classification outcome. Overall, this comparison aids in understanding the trade-offs and benefits of using PCA and random projection (Gaussian) for intrusion detection tasks, offering valuable guidance for building robust and efficient cyber security systems.

From the table, several noteworthy observations can be made:

- Random projection (Gaussian) outperforms PCA: across the majority of the classification algorithms, random projection yields better accuracy results compared to PCA. This suggests that random projection is more effective in preserving the essential data characteristics and improving classification performance for intrusion detection.
- PART classifier with random projection achieves the highest accuracy: among all the experiments conducted, the best-reported accuracy on the data set is achieved by the PART classifier after applying random projection. This highlights the effectiveness of random projection in enhancing the performance of this specific classifier for intrusion detection.
- Encouraging results for random projection: the results indicate that random projection is a promising dimensionality reduction technique for intrusion detection. Its simplicity, power, and faster implementation make it a viable alternative to PCA in enhancing the accuracy of classifiers for cyber security tasks.

Overall, the comparison demonstrates that random projection is a valuable technique for improving the performance of intrusion detection systems. Its advantages over PCA in terms of accuracy and computational efficiency make it an appealing choice for real-world applications. The encouraging results from these experiments further motivate researchers and practitioners to explore and leverage random projection as an effective tool in the field of cyber security and intrusion detection.

Experiments summary

In summary, the experiments reveal the following key points:

- The full training data set demonstrates effectiveness for classification, achieving 79.1 % accuracy and an 18.5 % false-positive rate (FPR) on the testing data set using the J48 algorithm. However, due to its large size (around 126,000 instances), training with the full data set requires a significant amount of time.
- PCA has been effective in enhancing the performance of three classifiers, showing promise in reducing dimensionality and improving classification results. However, the best-reported accuracy achieved using the full data set surpasses the accuracy attained with PCA.
- Random projection is highly effective in enhancing the performance of the majority of classifiers, with accuracy improvements of more than 8.0 % observed with the PART algorithm.
- Applying random projection to the data set provides better accuracy results than using the full training data set, offering a more efficient dimensionality reduction technique.
- Random projection outperforms PCA with the majority of classifiers and requires much less time for computation, making it a more favorable option in terms of both accuracy and efficiency.

These findings suggest that while the full training data set demonstrates strong classification performance, its large size poses computational challenges. PCA and random projection provide effective dimensionality reduction techniques, with random projection showing particular promise in achieving improved accuracy and efficiency across various classifiers. As a result, random projection emerges as a viable and valuable approach for intrusion detection systems and cyber security applications, offering a powerful alternative to traditional methods for enhancing classification performance.

Conclusion and future work

In this paper, we addressed the problem of automated intrusion detection and utilized the widely used NSL-KDD data set, which contains approximately 126,000 instances for training and 23,000 samples for testing. We applied five popular classification algorithms to the full training data set, namely Bayes Net, Naïve Bayes, J48, PART, and Random Forest. The best-reported results were achieved with the J48 algorithm, attaining a relatively good accuracy of 79.1 %.

To tackle the high dimensionality issue of the 41-dimensional feature vector, we experimented with two projection approaches: PCA and random projection. PCA demonstrated effectiveness in enhancing the performance of three out of the five tested classifiers, resulting in improvements ranging from 3.1 % to 5.7 %. The success of PCA can be attributed to its ability to transform the feature space into a lower-dimensional subspace while retaining crucial information. This led to a more efficient and informative data representation, thereby improving classifier performance. Moreover, PCA's noise reduction capability contributed to more accurate and robust classifiers by emphasizing essential data patterns while reducing noise and irrelevant information. Additionally, PCA's ability to prevent overfitting was valuable for high-dimensional datasets, as it provided a more generalized representation of the data. Furthermore, the computational efficiency gained through PCA was beneficial, as it reduced the computational burden on classifiers, making them suitable for real-time or large-scale applications. Additionally, we found that random projection was also effective, improving the performance of the majority of classifiers compared to the original data set. The best-reported accuracy after applying random projection was 82.0 %, outperforming the accuracy achieved before using this technique. Moreover, random projection proved to be more efficient than PCA, requiring less training time for most classifiers.

For future work, we intend to explore other dimensionality reduction techniques, such as LDA and Kernel PCA, and other state-of-the-art methods, such as the new method developed in [20], to assess their impact on classification performance. Conducting experiments on various data sets will allow us to identify the most effective approach for enhancing intrusion detection systems' accuracy and efficiency. A combination of supervised, unsupervised, and semi-supervised techniques can also be employed in future work to enhance the overall effectiveness of the intrusion detection system in a dynamic and evolving cyber threat landscape. Recently, deep learning methods have also been applied to intrusion detection [22,23]. In the future, deep learning will be explored to facilitate intrusion detection systems. By gaining insights into the strengths and limitations of different techniques, we aim to develop more robust and reliable intrusion detection systems capable of effectively countering evolving cyber threats in practical scenarios.

Declaration of competing interest

The authors declare the following financial interests/personal relationships which may be considered as potential competing interests: Faisal Nabi reports that financial support and writing assistance were provided by the University of Southern Queensland. Faisal Nabi reports a relationship with USQ that includes non-financial support. Faisal Nabi has patent pending to n/a.

CRediT authorship contribution statement

Faisal Nabi: Writing – original draft. Xujuan Zhou: Formal analysis, Supervision.

References

[1] A. Verma, V. Ranga, Machine learning based intrusion detection systems for IoT applications, Wirel. Person. Commun. 111 (4) (2020) 2287–2310.
[2] A. Thakkar, R. Lohiya, A review of the advancement in intrusion detection datasets, Procedia Comput. Sci. 167 (2020) 636–645.


[3] A. Khraisat, I. Gondal, P. Vamplew, J. Kamruzzaman, Survey of intrusion detection systems: techniques, datasets and challenges, Cyber Secur. 2 (1) (2019) 1–22.
[4] R. Bace, P. Mell, NIST Special Publication on Intrusion Detection Systems, Booz-Allen and Hamilton Inc., McLean, VA, 2001.
[5] H. Liu, B. Lang, Machine learning and deep learning methods for intrusion detection systems: a survey, Appl. Sci. 9 (20) (2019) 4396.
[6] M.C. Belavagi, B. Muniyal, Performance evaluation of supervised machine learning algorithms for intrusion detection, Procedia Comput. Sci. 89 (2016) 117–123.
[7] K. Kumar, J.S. Batth, Network intrusion detection with feature selection techniques using machine-learning algorithms, Int. J. Comput. Appl. 150 (12) (2016).
[8] I. Kumar, N. Mohd, C. Bhatt, S.K. Sharma, Development of IDS using supervised machine learning, in: Soft Computing: Theories and Applications, Springer, Singapore, 2020, pp. 565–577.
[9] P. Nskh, M.N. Varma, R.R. Naik, Principle component analysis based intrusion detection system using support vector machine, in: 2016 IEEE International Conference on Recent Trends in Electronics, Information & Communication Technology (RTEICT), IEEE, 2016, pp. 1344–1350.
[10] S. Waskle, L. Parashar, U. Singh, Intrusion detection system using PCA with random forest approach, in: 2020 International Conference on Electronics and Sustainable Communication Systems (ICESC), IEEE, 2020, pp. 803–808.
[11] N. Bindra, M. Sood, Detecting DDoS attacks using machine learning techniques and contemporary intrusion detection dataset, Autom. Control Comput. Sci. 53 (5) (2019) 419–428.
[12] N. Aboueata, S. Alrasbi, A. Erbad, A. Kassler, D. Bhamare, Supervised machine learning techniques for efficient network intrusion detection, in: 2019 28th International Conference on Computer Communication and Networks (ICCCN), IEEE, 2019, pp. 1–8.
[13] F. Salo, M. Injadat, A. Moubayed, A.B. Nassif, A. Essex, Clustering enabled classification using ensemble feature selection for intrusion detection, in: 2019 International Conference on Computing, Networking and Communications (ICNC), IEEE, 2019, pp. 276–281.
[14] A. Juvonen, T. Hamalainen, An efficient network log anomaly detection system using random projection dimensionality reduction, in: 2014 6th International Conference on New Technologies, Mobility and Security (NTMS), IEEE, 2014, pp. 1–5.
[15] G.D.C. Bertoli, L.A.P. Júnior, O. Saotome, A.L. Dos Santos, F.A.N. Verri, C.A.C. Marcondes, ... J.M.P. De Oliveira, An end-to-end framework for machine learning-based network intrusion detection system, IEEE Access 9 (2021) 106790–106805.
[16] NSL-KDD dataset, https://www.unb.ca/cic/datasets/nsl.html, accessed 29-4-2021.
[17] C.M. Bishop, Pattern Recognition and Machine Learning, Springer, 2006.
[18] I.H. Witten, E. Frank, Data Mining: Practical Machine Learning Tools and Techniques with Java Implementations, 2002.
[19] A. Devarakonda, N. Sharma, P. Saha, S. Ramya, Network intrusion detection: a comparative study of four classifiers using the NSL-KDD and KDD'99 datasets, Journal of Physics: Conference Series 2161, IOP Publishing, 2022.
[20] S. Anita, S.M. Hadi, N.H. Nosrati, Network intrusion detection using data dimensions reduction techniques, J. Big Data 10 (1) (2023).
[21] K. Arunesh, M. Manoj Kumar, A comparative study of classification techniques for intrusion detection using NSL-KDD data sets, Int. J. Adv. Technol. Eng. Sci. 5 (2) (2017).
[22] L. Ashiku, C. Dagli, Network intrusion detection system using deep learning, Procedia Comput. Sci. 185 (2021) 239–247.
[23] Z. Ahmad, A. Shahid Khan, C. Wai Shiang, J. Abdullah, F. Ahmad, Network intrusion detection system: a systematic study of machine learning and deep learning approaches, Transact. Emerg. Telecommun. Technolog. 32 (1) (2021) e4150.
