TSP JCS 46915
DOI: 10.32604/jcs.2023.046915
ARTICLE
Adil Hussain1, Amna Khatoon2,*, Ayesha Aslam2, Tariq1 and Muhammad Asif Khosa1
1 School of Electronics and Control Engineering, Chang’an University, Xi’an, 710000, China
2 School of Information Engineering, Chang’an University, Xi’an, 710000, China
*Corresponding Author: Amna Khatoon. Email: [email protected]
Received: 19 October 2023 Accepted: 21 November 2023 Published: 03 January 2024
ABSTRACT
The importance of cybersecurity in contemporary society cannot be overstated, given the substantial impact of
networks on various aspects of daily life. Traditional cybersecurity measures, such as anti-virus software and
firewalls, safeguard networks against potential threats. In network security, using Intrusion Detection Systems
(IDSs) is vital for effectively monitoring the various software and hardware components inside a given network.
However, they may encounter difficulties when it comes to detecting solitary attacks. Machine Learning (ML)
models are widely implemented in intrusion detection because of their high accuracy. The present work aims to
assess the performance of machine learning algorithms in the context of intrusion detection, providing valuable
insights into their efficacy and potential for enhancing cybersecurity measures. The main objective is to compare
the performance of the well-known ML models using the UNSW-NB15 dataset. The performance of the models is
discussed in detail with a comparison using evaluation metrics and computational performance.
KEYWORDS
Intrusion detection; machine learning; performance analysis
1 Introduction
Cybersecurity is an essential area of study due to the expanding influence of networks on modern
life. Cybersecurity approaches primarily utilize anti-virus software, firewalls, and intrusion detection
systems (IDSs). These strategies serve to safeguard networks against both internal and external
threats. One of the integral components in safeguarding cyber security is an IDS, which assumes a
critical function in monitoring the operational status of software and hardware within a network.
In recent years, IDS has been implemented with the primary goal of examining network traffic and
promptly detecting any instances of malicious activities or potential threats [1]. An IDS, like a firewall,
is designed to safeguard the principles of confidentiality, integrity, and availability, which are the
primary objectives that potential attackers want to compromise [2]. To assess the efficacy of an IDS,
specific criteria have been established to define the desirable attributes that a proficient IDS should
exhibit. These attributes include precision, limited overhead, satisfactory efficiency, and comprehensiveness.
This work is licensed under a Creative Commons Attribution 4.0 International License,
which permits unrestricted use, distribution, and reproduction in any medium, provided the
original work is properly cited.
2 JCS, 2024, vol.6
IDSs encompass two primary approaches to detecting attacks: anomaly-based detection and signature-based detection (also called misuse detection). The former method is employed to assess and contrast the
alterations in the system’s behaviour to identify any irregularities in the actions or activities of the
system. As an illustration, the system will generate an alarm for anomaly detection in the event
of an excessive number of user queries to the database compared to the historical data. It also
presents a drawback, as it is imperative to acknowledge that not all users should be seen as identical.
Furthermore, it should be noted that each day is characterized by its unique circumstances. As
a result, implementing a dynamic query restriction would prove to be more advantageous in this
context [3]. Another drawback is implementing anomaly detection in a dynamic context, wherein
user requirements evolve and may not align with past data [2]. Therefore, it can be concluded that
anomaly detection is prone to generating false positive results [3]. In the subsequent approach of IDS,
data referred to as fingerprints or signatures about prior instances of successful assaults are retained,
and regulations are formulated within the IDS. The rules are then juxtaposed with incoming packets
introduced into the network [2]. One primary drawback of this methodology is its inherent limitation
to pre-existing knowledge, rendering it inefficient at identifying novel attacks. Furthermore, the IDS must be
kept current with the latest information on emerging attack techniques [4–6].
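The signature-matching step described above can be sketched in a few lines. The rule format and packet fields below are hypothetical, invented purely for illustration; real IDS rule languages (e.g., Snort's) are far richer.

```python
# Hypothetical signature store: each rule pairs a destination port with a
# byte pattern that must appear in the payload.
SIGNATURES = [
    {"name": "known-exploit-A", "dst_port": 445, "payload_marker": b"\x90\x90\x90"},
    {"name": "telnet-default-creds", "dst_port": 23, "payload_marker": b"admin:admin"},
]

def match_signatures(packet):
    """Return the names of all stored signatures that match the packet."""
    hits = []
    for sig in SIGNATURES:
        if (packet.get("dst_port") == sig["dst_port"]
                and sig["payload_marker"] in packet.get("payload", b"")):
            hits.append(sig["name"])
    return hits

# A packet carrying a stored fingerprint triggers an alert; anything not
# covered by a rule (e.g., a novel attack) passes through silently.
print(match_signatures({"dst_port": 445, "payload": b"..\x90\x90\x90.."}))
```

This also makes the stated limitation concrete: a packet whose pattern is not in the store matches nothing, no matter how malicious it is.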
An additional aspect that should be acknowledged in this context is the categorization of IDS
into two types: Host-based IDS (HIDS) and Network-based IDS (NIDS). The HIDS is responsible
for monitoring the activities of a specific device within a network, specifically focusing on user actions
such as log file analysis and software installations [2,7]. HIDS have demonstrated high efficacy in
detecting attacks and facilitating the surveillance of individual user behaviour. However, previous
research has demonstrated that it is ineffective in countering significant isolated attacks [2]. In contrast,
the NIDS is strategically positioned near the firewall to effectively oversee the entirety of the network
and its interconnected devices, therefore scrutinizing the ingress and egress traffic [7]. The contents of
each packet are subjected to analysis to identify protocols that may indicate the presence of malicious
activity. One of the primary benefits of NIDS is their ability to provide faster response times, enabling
more efficient mitigation of potential threats.
Furthermore, NIDS are commonly implemented as independent entities that do not exert any
influence on the power consumption of the network devices, unlike HIDS. Nevertheless, one notable
limitation of NIDS is their handling of encrypted data. The inability of the IDS to analyze the data
contained within an encrypted packet hinders its ability to accurately identify potential attack risks [2].
In the security realm, ML is already employed in various domains to safeguard people and
networks from potential security breaches. A prevalent domain in which many consumers engage with
machine learning, often without awareness, is the utilization of spam detection filters in e-mail systems.
This instance is sometimes used as an illustration of pattern recognition, owing to the extensive array of
features or attributes that can be linked to spam [3]. For many users, spam is perceived as an annoyance
rather than anything else. However, regrettably, it can have a somewhat more malevolent connotation,
such as the act of perpetrating identity theft, achieved through the manipulation of a user’s behaviour
to induce them into engaging with a hyperlink. The problem of spam e-mails has persisted for a
considerable duration, and implementing ML in spam detection has significantly diminished the
volume of spam messages received by users. This phenomenon is observable in the context of Gmail
[8]. In 2015, Google stated that they successfully prevented 99.9% of spam e-mails by implementing
ANN into their spam filtering system. Subsequently, they reported blocking over 100 million spam
e-mails daily after implementing their open-source machine learning framework, TensorFlow [9].
The study evaluates existing ML algorithms and assesses how well they perform across a range of
performance measures using the UNSW-NB15 dataset. The algorithms have undergone training
and testing procedures for both binary and multiclass categorization. To gain a deeper
understanding of the optimal performance of each algorithm, the present study focuses on the training
and testing of the algorithms. The findings derived from this research will provide a more comprehensive
understanding of the suitability of these existing algorithms, including performance results, training,
and prediction time in a particular context to attain the most effective outcomes. Furthermore, this
study establishes a solid foundation for future investigations on integrating ML techniques with
intrusion detection systems.
This paper is organized as follows: Section 2 contains the Machine Learning models used for
this study’s performance analysis. Section 3 contains the methodology, dataset, and feature extraction
process. Section 4 contains the performance metrics, and Section 5 contains the results of the machine
learning models, including performance metrics and confusion matrix for each model. Section 6 is the
discussion section, which contains the comparison and analysis of the models. Section 7 concludes the
study.
2 Literature Survey
In recent years, Machine Learning (ML) has been a prominent technology that is also effective for
designing new anomaly-based intrusion detection systems [10–14]. Most of the experiments carried out
in IDS have used one or more benchmark datasets [15]. KDD99, NSL-KDD, UNSW-NB 15, CIDDS-
001, CICIDS2017, CSE-CIC-IDS2018, and several other datasets are examples of well-known IDS
collections. Each of these datasets includes both normal and malicious traffic. To generate the
features, data relating to network packets were preprocessed, and each
entry was either categorized as a typical activity or as a specific kind of attack on the network. On
such datasets, various machine learning methods were utilized to differentiate between normal and
malicious traffic (a task known as binary classification problems) or identify specific attack types (a
task known as multiclass classification issues). Authors in [16] thoroughly examined the fundamentals
of intrusion detection systems, supervised machine learning techniques, cyber-security assaults, and
datasets.
Many studies in this field investigate the degree of accuracy achieved by various machine learning
algorithms when applied to benchmark datasets. Authors in [17] evaluated the accuracy of the
IDS classification results generated utilizing the NSL-KDD dataset using three different machine
learning approaches, including Random Forest (RF), Support Vector Machine (SVM), and K-Nearest
Neighbours (KNN). Authors in [18] presented a thorough experimental evaluation of various ML
techniques such as Logistic Regression (LR), Decision Tree (DT), Random Forest (RF), and Artificial
Neural Network (ANN) on a recently published benchmark dataset known as UNSW-NB15, taking
into account both binary and multi-class classification. Tuan et al. [19] investigated the performance
of SVM, ANN, NB, Decision Tree, and unsupervised ML. They only looked at Distributed Denial of
Service (DDoS) attacks. They found that unsupervised machine learning performed best at distinguishing
DDoS attacks from normal network traffic. However, the specific unsupervised ML
techniques that were utilized were not revealed. Several articles test a single machine learning method
on one or more benchmark datasets, such as a variety of neural network types [13,20–23], RF [24],
SVM [25], K-means [26], etc.
Existing research mostly compares machine learning algorithms by measuring their performance
using accuracy and confusion matrix. The study compares ML algorithms using various performance
criteria, including accuracy, precision, recall, and F1 score, coupled with the confusion matrix. In
addition, the models’ computational performance is examined by considering the training time,
prediction time, and total time. The next part provides a brief explanation of the algorithms that were
utilized in this work.
This study uses both supervised and unsupervised learning techniques. Supervised learning is
predicated on utilizing valuable information included within labelled data. The goal of classification
is widely recognized as the predominant objective in supervised learning, and it is also extensively
employed in IDS. Nonetheless, manually assigning labels to data is known to be both costly and
time-intensive. Therefore, the primary limitation of supervised learning is the insufficiency of properly
tagged data. On the other hand, unsupervised learning can extract significant feature information
from unlabeled data, simplifying training data acquisition. Nevertheless, it is commonly observed
that unsupervised learning approaches tend to exhibit lower detection performance compared to their
supervised learning counterparts.
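The contrast between the two regimes can be illustrated on toy data. This is a sketch assuming scikit-learn and NumPy are available; the data below is synthetic, not the UNSW-NB15 features.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)
# Two well-separated groups standing in for "normal" and "attack" traffic.
X = np.vstack([rng.normal(0.0, 1.0, (100, 4)),
               rng.normal(4.0, 1.0, (100, 4))])
y = np.array([0] * 100 + [1] * 100)  # labels: 0 = normal, 1 = attack

# Supervised: learns directly from the labels.
clf = LogisticRegression().fit(X, y)
print("supervised training accuracy:", clf.score(X, y))

# Unsupervised: ignores y entirely and groups the traffic into two clusters;
# an analyst must still decide which cluster corresponds to "attack".
clusters = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(X)
```

The unsupervised route needs no labels, which matches the cheaper data acquisition noted above, but its cluster assignments still have to be mapped to attack categories by hand.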
Extremely Randomised Trees (Extra Trees) builds an ensemble of decision trees characterised by a distinctive approach. At every node, a random subset of features
is selected to determine the optimal split, introducing an additional element of randomness into
the decision-making procedure. In the intrusion detection domain, the application of Extremely
Randomised Trees is feasible for examining network traffic data. The features correspond to the
characteristics of network packets or system events, while the labels indicate whether an event is
considered benign or qualifies as an intrusion. One illustrative instance involves utilising Extremely
Randomised Trees to differentiate between typical network operations and potentially malicious
behaviour by examining network packet attributes such as source-destination information, protocol,
and packet size.
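The idea above can be sketched with scikit-learn's ExtraTreesClassifier. The packet-style features and the labeling rule are invented for illustration and are not the UNSW-NB15 schema.

```python
import numpy as np
from sklearn.ensemble import ExtraTreesClassifier

rng = np.random.default_rng(42)
n = 600
size = rng.normal(500, 150, n)        # synthetic packet sizes (bytes)
dport = rng.integers(1, 65535, n)     # synthetic destination ports
X = np.column_stack([size, dport])

# Toy labeling rule: unusually large packets, or traffic aimed at
# privileged ports, are marked as intrusions. Purely illustrative.
y = ((size > 600) | (dport < 1024)).astype(int)

# Extra Trees considers a random feature subset at every split, adding the
# extra randomness described above.
model = ExtraTreesClassifier(n_estimators=100, random_state=0)
model.fit(X[:480], y[:480])
print("held-out accuracy:", model.score(X[480:], y[480:]))
```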
3 Methodology
This research uses supervised and unsupervised machine learning models for intrusion detection
using the UNSW-NB15 dataset. The dataset is split into train and test sets used for the models’
training and testing. The steps for the model
evaluation are shown in Fig. 2. Once the model is trained using the training set, the test set is used
for the performance evaluation, and the performance of the model is highlighted in terms of accuracy,
precision, recall, F1 score, along with the Confusion Matrix. For the Neural Network MLP, GRU
and LSTM models, the dataset is split into training and test sets with proportions of 0.8 and 0.2, respectively, and the
models are trained for 200 epochs each. For every model, both the evaluation metrics and the runtime,
comprising training, prediction, and total time, are reported. The
methodology is shown in Fig. 2: the dataset, comprising labelled and unlabeled data,
is split into training and test sets. The training set is used to train the machine learning models,
which are then used to predict on the test set. Finally, each model’s performance is evaluated
in terms of accuracy, recall, precision, and F1 score, together with the confusion matrix. The computational
time of the models is also evaluated.
Figure 2: Methodology
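The loop just described, split, train, predict, score, and time, can be sketched as follows. This assumes scikit-learn and uses synthetic stand-in data; the study itself uses UNSW-NB15.

```python
import time
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier
from sklearn.metrics import (accuracy_score, precision_score, recall_score,
                             f1_score, confusion_matrix)

rng = np.random.default_rng(1)
X = rng.normal(size=(1000, 10))
y = (X[:, 0] + X[:, 1] > 0).astype(int)   # synthetic binary labels

# 0.8 / 0.2 train-test split, as in the study.
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2, random_state=0)

t0 = time.perf_counter()
model = DecisionTreeClassifier(random_state=0).fit(X_tr, y_tr)
train_time = time.perf_counter() - t0

t0 = time.perf_counter()
pred = model.predict(X_te)
predict_time = time.perf_counter() - t0

print("accuracy :", accuracy_score(y_te, pred))
print("precision:", precision_score(y_te, pred))
print("recall   :", recall_score(y_te, pred))
print("F1 score :", f1_score(y_te, pred))
print("confusion matrix:\n", confusion_matrix(y_te, pred))
print(f"train {train_time:.3f} s, predict {predict_time:.3f} s, "
      f"total {train_time + predict_time:.3f} s")
```

Swapping `DecisionTreeClassifier` for any of the other estimators evaluated in the study reuses the same loop unchanged.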
3.1 Dataset
The UNSW-NB15 [27] dataset was compiled by the University of New South Wales, where researchers
utilized the Bro tool to extract 49-dimensional features from network traffic using three virtual servers.
The UNSW-NB 15 dataset is selected as it encompasses a broader range of attack types than the
KDD99 dataset and exhibits a richer set of features. The data categories encompass normal data and
a collection of nine distinct types of intrusion. The features include flow features, fundamental features,
content features, temporal features, supplementary features, and annotated features. The UNSW-
NB15 dataset is a novel IDS dataset utilized in several contemporary research investigations. While the
impact of UNSW-NB15 is presently deemed less significant than that of KDD99, it is imperative to
create novel datasets to advance the development of new IDS that rely on machine learning techniques.
The TCPdump utility can capture a substantial amount of raw network traffic, namely 100
gigabytes, often stored in Pcap file format. The dataset consists of nine distinct categories of attacks:
Fuzzers, Analysis, Backdoors, Denial of Service (DoS), Exploits, Generic, Reconnaissance, Shellcode,
and Worms. The use of the Argus and Bro-IDS tools is observed, alongside the development of twelve
algorithms to generate a comprehensive set of 49 features accompanied by a class label.
4 Evaluation Metrics
True Negative (TN), False Negative (FN), True Positive (TP), and False Positive (FP) are the
primary components of the evaluation metrics that are employed to assess the performance of ML
algorithms. To assess the efficacy of the models, this study employs the confusion matrix, which offers
a comprehensive perspective on the algorithm’s performance. The evaluation metrics are shown in
Fig. 3.
4.1 Accuracy
When analyzing the performance of ML algorithms, accuracy is one of the most widely used assessment
measures, partly because of its simplicity and ease of implementation [19]. Accuracy can
be conceptualized as quantifying the extent to which test data points have been correctly classified,
typically represented as a percentage. Nevertheless, it is advisable to refrain from measuring accuracy
solely on the training data due to the potential for overfitting. This is because the accuracy rate
may frequently appear greater than its true value, leading to an incorrect outcome. Mathematically,
accuracy can be quantified as:
Accuracy = (TP + TN) / (P + N) (1)
One significant limitation of accuracy is its inability to distinguish between false positives (FP)
and false negatives (FN) [19]. Consequently, it becomes challenging to identify the areas where the
algorithm makes errors, potentially resulting in more severe issues depending on the context in which
the method is applied.
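Eq. (1) can be computed directly from confusion-matrix counts; the numbers below are toy values for illustration.

```python
def accuracy(tp, tn, fp, fn):
    """Eq. (1): correct predictions over all positives P and negatives N."""
    p = tp + fn   # P: all actual positives
    n = tn + fp   # N: all actual negatives
    return (tp + tn) / (p + n)

# Toy counts: 175 of 200 samples classified correctly.
print(accuracy(tp=90, tn=85, fp=15, fn=10))  # 0.875
```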
4.2 Precision and Recall
Precision and recall, like accuracy, are frequently employed in several domains because of their
ease of implementation and comprehensibility, which is their primary advantage. One of the primary
limitations associated with precision and recall is the omission of true negatives (TN). This implies
that the correctly categorised negatives do not influence the overall score of either criterion. Hence,
the omission of true negative (TN) ratings in evaluating algorithm performance can result in an overall
skewed perspective, and it is imperative to use TN scores only when their relevance is deemed necessary.
4.3 F1 Score
The F1 score, sometimes called the F-Measure [28], is a composite metric that combines precision
and recall through their harmonic mean. This metric yields a single comprehensive score for evaluating
the performance of a classification model. The F1 score can be formally described in mathematical
terms as shown in Eq. (4):
F1 = 2(Precision × Recall) / (Precision + Recall) (4)
Since an algorithm may have great precision but low recall, or vice versa, it is regarded as a
common statistic since it can provide an overall picture of both precision and recall. The F1 score,
a commonly employed evaluation statistic that aims to provide a more equitable assessment of
precision and recall, should be utilised judiciously [21]. This phenomenon can be attributed to the
metric’s calculation method, which involves computing the weighted average of precision and recall.
According to the source, the weighted value is subject to variation between different calculations,
as the specific weights assigned are contingent upon the nature of the evaluation. According to the
authors in reference [21], it is recommended to establish a uniform weight that is applied to all metric
computations to address this concern.
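Eq. (4), together with the precision and recall it combines, can be computed from raw counts. The counts below are toy numbers, and the standard unweighted harmonic mean is used.

```python
def precision(tp, fp):
    return tp / (tp + fp)

def recall(tp, fn):
    return tp / (tp + fn)

def f1(tp, fp, fn):
    """Eq. (4): harmonic mean of precision and recall."""
    p, r = precision(tp, fp), recall(tp, fn)
    return 2 * p * r / (p + r)

# High recall does not guarantee a high F1 if precision lags, and vice versa.
tp, fp, fn = 90, 15, 10
print(round(precision(tp, fp), 3))  # 0.857
print(round(recall(tp, fn), 3))     # 0.9
print(round(f1(tp, fp, fn), 3))     # 0.878
```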
5 Performance Evaluation
In this section, the performance of each model, including the evaluation metrics and confusion
matrix, is discussed, along with the computational time.
Table 1 (continued)
Metrics Results
Precision 92.83%
F1 score 92.80%
Training time 1.68 s
Prediction time 0.01 s
Total time 1.69 s
Table 2 (continued)
Metrics Results
Precision 95.09%
F1 score 95.05%
Training time 0.01 s
Prediction time 20.04 s
Total time 20.05 s
Table 3 (continued)
Metrics Results
Precision 96.38%
F1 score 96.38%
Training time 1.38 s
Prediction time 0.01 s
Total time 1.39 s
The performance of all the models, considering the metrics of accuracy, precision, recall, and F1 score, is compared in Table 11.
Table 11: Performance metrics comparison of machine learning models
Model Accuracy Recall Precision F1 score
Logistic regression 92.80% 92.80% 92.83% 92.80%
KNN 95.04% 95.04% 95.09% 95.05%
Decision tree 96.38% 96.38% 96.38% 96.38%
Extra trees 97.53% 97.53% 97.55% 97.53%
Random forest 97.68% 97.68% 97.69% 97.68%
Gradient boosting 95.85% 95.85% 95.86% 95.85%
MLP 96.25% 96.25% 96.26% 96.25%
MLP (Keras) 96.10% 96.10% 96.10% 96.10%
GRU (Keras) 96.33% 96.33% 99.33% 96.33%
LSTM 96.48% 96.48% 96.48% 96.48%
In conclusion, the optimal model selection is contingent upon the task’s specific demands. Models
such as Extremely Randomized Trees and Random Forest demonstrate exceptional performance
across multiple evaluation metrics, rendering them highly appropriate for various applications. Nevertheless, models like KNN, Decision Trees, and LSTM exhibit commendable performance and possess
prospective benefits in interpretability and resource utilization. Moreover, although the MLP Neural
Network demonstrated flawless recall, there is room for enhancing its precision to achieve a more
equitable performance. The context and specific priorities of the task should guide the model selection
process.
Table 12 (continued)
Model Training time (s) Prediction time (s) Total time (s)
Decision tree 1.4 0.0 1.4
Extra trees 3.7 0.2 3.9
Random forest 5.5 0.2 5.7
Gradient boosting 46.9 0.0 47.0
MLP 43.9 0.0 43.9
MLP (Keras) 41.3 1.7 43.0
GRU (Keras) 82.6 4.1 86.7
LSTM 86.0 3.0 89.0
The Gradient Boosting Classifier exhibits commendable performance but necessitates a considerable training period of 46.92 s. However, it demonstrates exceptional efficiency once trained, with a
prediction time of under 0.04 s. In terms of neural networks, the MLP Neural Network necessitates a
training duration of 43.92 s, a relatively long period that nonetheless remains feasible for numerous applications.
However, it boasts a noteworthy prediction time of 0.02 s. The computational efficiency of the LSTM
and GRU models is a challenge because of their extended training and prediction timeframes.
If real-time or near-real-time processing is a priority, models like Logistic Regression or KNN
might be preferred. However, if computational efficiency can be traded off for stronger predictive
performance, ensemble methods or neural networks like MLP may be the way to go. The context and
objectives of your project will play a significant role in determining the most suitable model.
7 Conclusion
The research focuses on applying machine learning and neural network models for intrusion
detection, employing the UNSW-NB15 dataset. The dataset provided by the University of New South
Wales is a comprehensive compilation of information. It encompasses many attack types and extracts
49-dimensional features from network traffic. As a result, this dataset holds significant value for
current research in intrusion detection. The UNSW-NB15 dataset is critical in enhancing intrusion
detection systems relying on machine learning techniques. The study encompassed meticulous data
preparation and feature selection techniques to optimize the dataset for model training and evaluation.
The process included in this study encompassed several techniques for data preprocessing, such
as removing irrelevant or redundant features, using clamping to limit extreme values, applying a
logarithmic function to address skewed data, and reducing the number of labels in categorical
features. These preprocessing steps are crucial in enhancing the models’ performance by
refining input data. The study utilized a complete range of evaluation criteria, encompassing accuracy,
precision, recall, F1 score, and specificity, to assess the effectiveness of the machine learning and
neural network models. These metrics provide a comprehensive viewpoint on the performance of
the models across several facets of classification. The study examined multiple models, classifying
their performance according to accuracy, efficiency, and a trade-off between them. Several models
exhibited exceptional performance, including Extremely Randomized Trees and Random Forest
ensemble techniques, which revealed incredible accuracy and precision.
The performance of models such as KNN and Decision Trees was moderate, demonstrating
a balanced compromise between accuracy and efficiency. The models that exhibited potential for
enhancement encompassed Logistic Regression and the Gradient Boosting Classifier; both demonstrated commendable performance, albeit necessitating lengthier training durations. The MLP neural
network exhibited flawless recall; nevertheless, there was scope for enhancement in terms of precision.
The selection of the most appropriate model is contingent upon the objectives of the intrusion
detection system. Logistic Regression and KNN models have demonstrated their suitability for real-
time processing and great efficiency. Ensemble approaches and neural networks, including LSTM and
GRU, exhibit robust predictive capabilities but may necessitate extended training durations.
The determination ought to be influenced by contextual factors and project priorities. Finally, the
consideration of computational efficiency plays a crucial role in the process of selecting a model.
Certain models, such as Logistic Regression and KNN, exhibit notable efficiency and are well-suited
for real-time processing. Alternative approaches, such as ensemble methods and neural networks,
have longer training durations, although they exhibit robust predictive skills. The optimization of
performance and efficiency must be congruent with the requirements and limitations of the project.
Acknowledgement: None.
Funding Statement: The authors received no specific funding for this study.
Author Contributions: The authors confirm contribution to the paper as follows: models’ implementation: Adil Hussain, Ayesha Aslam; data collection: Tariq; analysis and interpretation of results:
Adil Hussain, Amna Khatoon, Tariq; draft manuscript preparation: Adil Hussain, Muhammad Asif
Khosa. All authors reviewed the results and approved the final version of the manuscript.
Conflicts of Interest: The authors declare that they have no conflicts of interest to report regarding the
present study.
References
[1] D. E. Denning, “An intrusion-detection model,” IEEE Transactions on Software Engineering, vol. 13, no.
2, pp. 222–232, 1987.
[2] G. G. Liu, “Intrusion detection systems,” in Applied Mechanics and Materials, vol. 596. Switzerland: Trans
Tech Publications Ltd., pp. 852–855, 2014. https://doi.org/10.4028/www.scientific.net/AMM.596.852
[3] C. Chio and D. Freeman, Machine Learning and Security: Protecting Systems with Data and Algo-
rithms. O’Reilly Media, Inc., 2018. [Online]. Available: https://www.oreilly.com/library/view/machine-
learningand/9781491979891/ (accessed on 13/11/2023)
[4] A. Ali, S. Shaukat, M. Tayyab, M. A. Khan, J. S. Khan et al., “Network intrusion detection leveraging
machine learning and feature selection,” in Proc. of IEEE ICAIS, New Jersey, USA, pp. 49–53, 2020.
[5] S. Shaukat, A. Ali, A. Batool, F. Alqahtani, J. S. Khan et al., “Intrusion detection and attack classification
leveraging machine learning technique,” in Proc. of IEEE IIT, Al Ain, UAE, pp. 198–202, 2020.
[6] L. N. Tidjon, M. Frappier and A. Mammar, “Intrusion detection systems: A cross-domain overview,” IEEE
Communications Surveys & Tutorials, vol. 21, no. 4, pp. 3639–3681, 2019.
[7] M. Rege and R. B. K. Mbah, “Machine learning for cyber defense and attack,” in Int. Conf. on Data
Analytics, Athens, Greece, pp. 83, 2018.
[8] The Verge. [Online]. Available: https://www.theverge.com/2019/2/6/18213453/gmail-tensorflow-machine-
learning-spam-100-million (accessed on 13/11/2023).
[9] N. Japkowicz, “Why question machine learning evaluation methods,” in Proc. of AAAI, Massachusetts,
USA, pp. 6–11, 2006.
[10] S. A. Raza, S. Shamim, A. H. Khan and A. Anwar, “Intrusion detection using decision tree classifier with
feature reduction technique,” Mehran University Research Journal of Engineering & Technology, vol. 42, no.
2, pp. 30–37, 2023.
[11] P. Dixit and S. Sanjay, “Analysis of state-of-art attack detection methods using recurrent neural network,”
in Proc. of PCCDS, Springer Singapore, pp. 795–804, 2022.
[12] A. Devarakonda, S. Nilesh, S. Prita and S. Ramya, “Network intrusion detection: A comparative study
of four classifiers using the NSL-KDD and KDD’99 datasets,” Journal of Physics: Conference Series, vol.
2161, no. 1, pp. 012043, 2022.
[13] S. P. Thirimanne, L. Jayawardana, L. Yasakethu, P. Liyanaarachchi and C. Hewage, “Deep neural network
based real-time intrusion detection system,” SN Computer Science, vol. 3, no. 2, pp. 145, 2022.
[14] S. Wang, J. F. Balarezo, S. Kandeepan, A. Al-Hourani, K. G. Chavez et al., “Machine learning in network
anomaly detection: A survey,” IEEE Access, vol. 9, pp. 152379–152396, 2021.
[15] M. Ghurab, G. Gaphari, F. Alshami, R. Alshamy and S. Othman, “A detailed analysis of benchmark
datasets for network intrusion detection system,” Asian Journal of Research in Computer Science, vol. 7,
no. 4, pp. 14–33, 2021.
[16] E. E. Abdallah and A. F. Otoom, “Intrusion detection systems using supervised machine learning
techniques: A survey,” Procedia Computer Science, vol. 201, pp. 205–212, 2022.
[17] Y. A. Al-Khassawneh, “An investigation of the intrusion detection system for the NSL-KDD dataset using
machine-learning algorithms,” in Proc. of IEEE eIT, Romeoville, USA, pp. 518–523, 2023.
[18] A. R. Bahlali and A. Bachir, “Machine learning anomaly-based network intrusion detection: Experimental
evaluation,” in Proc. of AINA, Juiz de Fora, Brazil, pp. 392–403, 2023.
[19] T. A. Tuan, H. V. Long, L. H. Son, R. Kumar, I. Priyadarshini et al., “Performance evaluation of Botnet
DDoS attack detection using machine learning,” Evolutionary Intelligence, vol. 13, no. 2, pp. 283–294, 2020.
[20] C. Yin, Y. Zhu, J. Fei and X. He, “A deep learning approach for intrusion detection using recurrent neural
networks,” IEEE Access, vol. 5, pp. 21954–21961, 2017.
[21] B. Ingre and A. Yadav, “Performance analysis of NSL-KDD dataset using ANN,” in Proc. of IEEE
SPACES, Guntur, India, pp. 92–96, 2015.
[22] J. Kim, J. Kim, H. Kim, M. Shim and E. Choi, “CNN-based network intrusion detection against denial-
of-service attacks,” Electronics, vol. 9, no. 6, pp. 916, 2020.
[23] B. Roy and H. Cheung, “A deep learning approach for intrusion detection in Internet of Things using bi-
directional long short-term memory recurrent neural network,” in Proc. of ITNAC, Sydney, Australia, pp.
1–6, 2018.
[24] N. Farnaaz and M. Jabbar, “Random forest modeling for network intrusion detection system,” Procedia
Computer Science, vol. 89, pp. 213–217, 2016.
[25] M. S. Pervez and D. M. Farid, “Feature selection and intrusion classification in NSL-KDD cup 99 dataset
employing SVMs,” in Proc. of IEEE SKIMA, Dhaka, Bangladesh, pp. 1–6, 2014.
[26] V. Kumar, H. Chauhan and D. Panwar, “K-means clustering approach to analyze NSL-KDD intrusion
detection dataset,” International Journal of Soft Computing and Engineering, vol. 4, pp. 2231–2307, 2013.
[27] M. Nour and J. Slay, “UNSW-NB15: A comprehensive data set for network intrusion detection systems
(UNSW-NB15 network data set),” in Proc. of MilCIS, Canberra, Australia, pp. 1–6, 2015.
[28] D. Hand and P. Christen, “A note on using the f-measure for evaluating record linkage algorithms,”
Statistics and Computing, vol. 28, no. 3, pp. 539–547, 2018.