Towards Autonomous Cybersecurity: An Intelligent AutoML Framework for Autonomous Intrusion Detection

Li Yang, Ontario Tech University, Oshawa, Canada
Abdallah Shami, Western University, London, Canada

ABSTRACT
The rapid evolution of mobile networks from 5G to 6G has necessitated the development of autonomous network management systems, such as Zero-Touch Networks (ZTNs). However, the increased complexity and automation of these networks have also escalated cybersecurity risks. Existing Intrusion Detection Systems (IDSs) leveraging traditional Machine Learning (ML) techniques have shown effectiveness in mitigating these risks, but they often require extensive manual effort and expert knowledge. To address these challenges, this paper proposes an Automated Machine Learning (AutoML)-based autonomous IDS framework towards achieving autonomous cybersecurity for next-generation networks. To achieve autonomous intrusion detection, the proposed AutoML framework automates all critical procedures of the data analytics pipeline, including data pre-processing, feature engineering, model selection, hyperparameter tuning, and model ensemble. Specifically, it utilizes a Tabular Variational Auto-Encoder (TVAE) method for automated data balancing, tree-based ML models for automated feature selection and base model learning, Bayesian Optimization (BO) for hyperparameter optimization, and a novel Optimized Confidence-based Stacking Ensemble (OCSE) method for automated model ensemble. The proposed AutoML-based IDS was evaluated on two public benchmark network security datasets, CICIDS2017 and 5G-NIDD, and demonstrated improved performance compared to state-of-the-art cybersecurity methods. This research marks a significant step towards fully autonomous cybersecurity in next-generation networks, potentially revolutionizing network security applications.

CCS CONCEPTS
• Security and privacy → Network security; Intrusion/anomaly detection and malware mitigation; • Computing methodologies → Machine learning.

KEYWORDS
Autonomous Cybersecurity; Intrusion Detection System; Zero-Touch Network; AutoML; Machine Learning; Ensemble learning.

ACM Reference Format:
Li Yang and Abdallah Shami. 2024. Towards Autonomous Cybersecurity: An Intelligent AutoML Framework for Autonomous Intrusion Detection. In Proceedings of the Workshop on Autonomous Cybersecurity (AutonomousCyber ’24), October 14–18, 2024, Salt Lake City, UT, USA. ACM, New York, NY, USA, 11 pages. https://doi.org/10.1145/3689933.3690833

arXiv:2409.03141v1 [cs.LG] 5 Sep 2024

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.
AutonomousCyber ’24, October 14–18, 2024, Salt Lake City, UT, USA
© 2024 Copyright held by the owner/author(s). Publication rights licensed to ACM.
ACM ISBN 979-8-4007-1229-6/24/10
https://doi.org/10.1145/3689933.3690833

1 INTRODUCTION
The progression of mobile networks has played a pivotal role in the digital revolution, with each generation bringing forth new technologies and capabilities. Fifth-generation (5G) networks have significantly enhanced mobile broadband, enabled massive machine-type communications, and supported ultra-reliable low-latency communications [23]. 5G networks leverage abstraction and virtualization techniques, such as Software-Defined Networking (SDN), Network Function Virtualization (NFV), and Network Slicing (NS), to provide flexible, efficient, and automated network management and services [18].

For the evolution from 5G to sixth-generation (6G) networks, network automation has become a necessity to meet the unprecedented demand for future network applications. 6G networks are expected to leverage Artificial Intelligence (AI), Machine Learning (ML), and automation techniques to provide functional modules and operational services, leading to self-organizing and autonomous networks [23]. Previous research has devoted extensive effort to developing network automation architectures, including Intent-Based Network Management (IBN), Self-Organizing Network Management (SON), and Autonomic Network Management (ANM) [9]. Recently, the European Telecommunications Standards Institute (ETSI) proposed Zero-Touch Networks (ZTNs) as a fully autonomous network management architecture with minimal human involvement [36]. Network automation solutions, including ZTNs, can effectively decrease network operational costs, enhance resource utilization efficiency, and mitigate the risks associated with human errors.

On the other hand, as network and service management requires a trustworthy and reliable system, cybersecurity has become a critical component of next-generation networks. Modern networks are vulnerable to various cyber-attacks, such as Denial of Service (DoS), sniffing/eavesdropping, spoofing, web attacks, and botnets [32]. These threats can lead to severe consequences, including financial losses, disruption of critical services, compromise of sensitive information, and reputational damage [4]. Therefore, effective cybersecurity measures should be developed to enhance the security of modern networks, and autonomous cybersecurity solutions are essential for safeguarding future networks with high automation requirements.

AI/ML techniques are widely used in network applications to develop data-driven cybersecurity mechanisms such as Intrusion
Detection Systems (IDSs) and anomaly detection systems, which can analyze network traffic patterns and identify anomalies or cyber-attacks [8]. AI/ML models have shown effectiveness in network data analytics and IDS development due to their capability to process large volumes of network data, identify complex patterns, and adapt to evolving threats. ML-based IDSs can detect malicious attacks and predict potential threats based on historical data, thereby triggering countermeasures or response mechanisms to safeguard against the detected attacks [32].

To ensure robust cybersecurity in next-generation networks, such as ZTNs, it is crucial to incorporate self-management functionalities that address security concerns, such as self-configuration, self-monitoring, self-healing, self-protection, and self-optimization [2]. To meet these requirements, autonomous cybersecurity solutions, such as autonomous IDSs, should be developed to automatically monitor network activities, detect network anomalies, and identify potential attacks.

Automated ML (AutoML) techniques, which automate the design and implementation of ML models, are promising solutions for realizing network automation in ZTNs and future networks [33]. AutoML techniques automate laborious and repetitive tasks in the ML and data analytics pipeline, such as data pre-processing, feature engineering, model selection, and hyperparameter tuning [33]. This automation can effectively reduce human effort, minimize human errors, and alleviate the need for extensive expert knowledge. In the cybersecurity domain, autonomous IDSs can be developed with AutoML techniques that automatically design, tune, and optimize ML models capable of detecting cyber-attacks, thereby achieving self-monitoring and self-protection.

Therefore, this paper proposes an AutoML-based autonomous IDS framework to automatically detect malicious cyber-attacks for safeguarding 5G and potential 6G networks. The proposed AutoML framework automates the critical procedures of the ML/data analytics pipeline for intrusion detection. Specifically, it consists of five components: an Automated Data Pre-processing (AutoDP) component that performs automated data balancing using the Tabular Variational Auto-Encoder (TVAE) [31] method to address class-imbalance issues and improve data quality; an Automated Feature Engineering (AutoFE) component that automatically selects the most relevant features based on their average importance scores calculated using the Gini index and entropy metrics; an automated base model learning and selection component that automatically trains six tree-based machine learning models (Decision Tree (DT) [32], Random Forest (RF) [29], Extra Trees (ET) [12], Extreme Gradient Boosting (XGBoost) [7], Light Gradient Boosting Machine (LightGBM) [16], and Categorical Boosting (CatBoost) [24]) and selects the top three best-performing models; a Hyper-Parameter Optimization (HPO) component that automatically tunes the hyperparameters of the selected ML models using Bayesian Optimization with Tree-structured Parzen Estimator (BO-TPE) [10] to obtain optimized base models; and an automated model ensemble component that employs the proposed novel Optimized Confidence-based Stacking Ensemble (OCSE) method to generate the meta-learner for final intrusion detection. Overall, the proposed AutoML-based IDS can automatically process network data and generate optimized ML models capable of detecting various types of cyber-attacks to safeguard current and future networks.

This paper presents the following key contributions:
(1) It proposes a novel and comprehensive AutoML framework¹ that enables fully autonomous intrusion detection in next-generation networks, holding the potential to achieve fully autonomous cybersecurity.
(2) It proposes a novel automated data balancing method based on TVAE and class distribution exploration.
(3) It proposes a novel ensemble learning method, OCSE, which extends the traditional stacking ensemble method by incorporating the confidence values of classes and the BO-TPE method for model optimization.
(4) It assesses the proposed AutoML-based IDS model using two public benchmark network security datasets, CICIDS2017 [26] and 5G-NIDD [25], which contain state-of-the-art cyber-attack scenarios.
(5) It compares the performance of the proposed AutoML-based IDS model with state-of-the-art methods.

To the best of our knowledge, no previous research has proposed such a comprehensive autonomous IDS model that leverages AutoML to automate all essential network data analytics procedures, ensuring efficient and automatic detection of diverse cyber-attacks for safeguarding 5G and next-generation networks.

The paper is structured as follows: Section 2 introduces related work using AI/ML and AutoML-based methods for developing IDSs and cybersecurity mechanisms. Section 3 presents a detailed description of the proposed AutoML-based IDS framework, including AutoDP, AutoFE, automated base model selection, HPO, and automated model ensemble. Section 4 presents and discusses the experimental results of evaluating the proposed framework on benchmark network datasets. Finally, Section 5 summarizes the paper.

¹ Code for this paper is publicly available at: https://github.com/Western-OC2-Lab/AutonomousCyber-AutoML-based-Autonomous-Intrusion-Detection-System

2 RELATED WORK
AI/ML models have been extensively applied in recent years to the development of IDSs for modern networks. This section provides an overview of the critical studies that have contributed to the development and advancement of IDSs using ML and AutoML models for future networks.

2.1 AI/ML-based IDSs
Research on developing IDSs using AI/ML models has gained significant attention, as threat hunting and cyber-attack detection are critical components of cybersecurity systems for modern networks.

Traditional AI/ML algorithms, especially tree-based algorithms such as DT and RF, have demonstrated their effectiveness in intrusion detection. Sharafaldin et al. [26] created the benchmark network security dataset CICIDS2017 and observed that the DT and RF algorithms outperformed the other compared ML models on this dataset. Maseer et al. [19] proposed an ML-based benchmarking approach for Anomaly-based IDS (AIDS) that develops ten typical supervised and unsupervised ML models and evaluates their performance on the CICIDS2017 dataset. The experimental results illustrate that the DT
and K-Nearest Neighbor (KNN) based AIDS models perform the best on the CICIDS2017 dataset among the evaluated ML models. Yang et al. [32] proposed a Multi-Tiered Hybrid IDS (MTH-IDS) framework for intrusion detection in vehicular networks. It incorporates both supervised learning algorithms (DT, RF, ET, and XGBoost) and unsupervised learning methods (k-means) to detect multiple types of cyber-attacks. They evaluated their framework on the CAN-intrusion dataset and the CICIDS2017 dataset to demonstrate the model's effectiveness.

The utilization of Deep Learning (DL) methods in IDS development has become prevalent due to their effectiveness in handling high-dimensional network traffic data. Agrafiotis et al. [1] proposed an embeddings and Fully-Connected network (Embeddings & FC) model to detect malware traffic in 5G networks. This IDS model employs Long Short-Term Memory Autoencoders (LSTM-AE) to transform packets into embeddings and uses a Fully-Connected (FC) network to identify attacks. The Embeddings & FC IDS demonstrates improved accuracy on the 5G-NIDD dataset, a dataset dedicated to 5G networks. Tayfour et al. [28] proposed a DL-LSTM method supported by SDN to detect cyber-attacks in Internet of Things (IoT) and 5G networks. The DL-LSTM model achieved high accuracy on the CICIDS2017 dataset, demonstrating the effectiveness of deep learning in network intrusion detection. He et al. [13] proposed a Pyramid Depthwise Separable Convolution neural network-based IDS (PyDSC-IDS) for network intrusion detection. The PyDSC-IDS model uses Pyramid Convolution (PyConv) to extract features from data and Depthwise Separable Convolution (DSC) to reduce model complexity. Compared with other DL models, PyDSC-IDS achieves higher detection accuracy with only a small increase in complexity on the NSL-KDD, UNSW-NB15, and CICIDS2017 datasets.

Due to the robustness of tree-based ML algorithms in handling large-scale, high-dimensional, and non-linear network data, they are utilized as base models for intrusion detection in the proposed framework. While DL models offer powerful data analysis capabilities, they often come with higher computational complexity than traditional ML algorithms. To mitigate this challenge, the proposed framework employs the TVAE method, a DL model, only for synthesizing minority class samples. This is more efficient than using TVAE for intrusion detection itself, as processing only the minority class samples is significantly faster than processing the entire large dataset. Furthermore, the development of traditional ML/DL models for intrusion detection poses several critical challenges, such as manual effort, human bias and errors, and expertise requirements. These challenges underscore the importance of automating AI/ML models and developing autonomous IDSs.

2.2 AutoML-based IDSs
AutoML techniques are promising solutions for developing autonomous IDSs by automating the tedious procedures in the data analytics/ML pipeline. Although AutoML is a relatively new research area in IDS development, several recent works have already employed AutoML techniques to create autonomous IDSs for modern networks. Yang et al. [33] provided a comprehensive discussion of the general and specific procedures of applying AutoML techniques to IoT data analytics and conducted a case study employing AutoML for IoT intrusion detection tasks. Khan et al. [17] proposed an Optimized Ensemble IDS (OE-IDS) for intrusion detection in network environments. It automates the hyperparameter tuning process of four supervised ML algorithms and uses them to develop an ensemble model based on soft voting. The OE-IDS model achieved better accuracy and F1-scores than most other compared traditional ML models on the CICIDS2017 and UNSW-NB15 datasets. Elmasry et al. [11] proposed a double PSO and DL-based IDS for network intrusion detection. It uses the Particle Swarm Optimization (PSO) method to select features and tune the hyperparameters of three DL methods: Deep Neural Networks (DNN), LSTM, and Deep Belief Networks (DBN). This IDS model outperforms other compared methods in terms of accuracy and detection rate on the CICIDS2017 dataset. Singh et al. [27] proposed AutoML-ID, an AutoML-based IDS designed for Wireless Sensor Networks (WSNs). The AutoML-ID approach focuses on simple automated ML model selection and hyperparameter optimization using Bayesian Optimization (BO). AutoML-ID was tested on a public IDS dataset, Intrusion-Data-WSN, and achieved better performance than traditional ML models.

The existing literature has demonstrated the advantages of AutoML-based IDSs in improving performance and reducing human effort in intrusion detection and cybersecurity applications. However, many current AutoML-based IDS models focus only on automated model selection and hyperparameter optimization, leaving significant potential for improvement in other crucial stages of the AutoML pipeline.

In the proposed AutoML framework, we develop techniques to automate every critical step in the data analytics pipeline, including: the TVAE-based automated data balancing method in the AutoDP process to handle class imbalance issues; the tree-based importance-averaging method in the AutoFE process to reduce noise and data complexity; automated model selection and BO-based hyperparameter optimization on tree-based ML algorithms to obtain optimized base models; and the proposed OCSE method for automated model ensemble to further enhance model performance. Table 1 summarizes and compares the contributions of the existing literature reviewed in this section and of the proposed framework.

Table 1: Comparison of Various IDS Approaches with Emphasis on ML and AutoML Components.
Columns: Paper | Benchmark Dataset Evaluation | Traditional ML Models | DL Models | AutoDP | AutoFE | Automated Model Selection | Model Optimization | Model Ensemble
Sharafaldin et al. [26]: ✓ ✓ ✓
Maseer et al. [19]: ✓ ✓ ✓
Yang et al. [32]: ✓ ✓ ✓ ✓
Agrafiotis et al. [1]: ✓ ✓
Tayfour et al. [28]: ✓ ✓
He et al. [13]: ✓ ✓ ✓ ✓
Khan et al. [17]: ✓ ✓ ✓ ✓ ✓ ✓ ✓
Elmasry et al. [11]: ✓ ✓ ✓ ✓ ✓
Singh et al. [27]: ✓ ✓ ✓ ✓ ✓
Proposed AutoML Framework: ✓ ✓ ✓ ✓ ✓ ✓ ✓ ✓

Overall, this paper presents a generic, comprehensive, and fully automated AutoML framework for future networks with high automation requirements.

3 PROPOSED FRAMEWORK
3.1 System Overview
The objective of this study is to develop an autonomous IDS model capable of detecting various cyber-attacks to safeguard 5G and potential 6G networks. The overall framework of the proposed AutoML-based IDS is shown in Fig. 1 and comprises five stages: AutoDP, AutoFE, automated base model learning and selection, HPO, and automated model ensemble. In the initial AutoDP stage, the input network traffic data is pre-processed, and the proposed automated data balancing method identifies and addresses class-imbalance issues using the TVAE model to improve data quality. In the AutoFE stage, the most relevant features are automatically selected based on their importance scores, which tree-based algorithms calculate using the Gini index and entropy metrics. This AutoFE process reduces data complexity and improves the generalization ability of the IDS model by removing noisy and redundant features. Subsequently, in the automated base model learning and selection stage, six tree-based ML algorithms (i.e., DT, RF, ET, XGBoost, LightGBM, and CatBoost) are trained and evaluated on the training set, and the top three best-performing models are automatically selected as the base models for further processing. In the HPO stage, the three selected ML models are further optimized through automated hyperparameter tuning using the BO-TPE method. In the automated model ensemble stage, the confidence values of all classes generated by the three optimized base models are integrated using the proposed OCSE model to obtain the final ensemble IDS model for intrusion detection.
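The selection and ensemble stages described above can be illustrated with a minimal Python sketch. It assumes each fitted base model is exposed as a function that returns per-class confidence values; `select_top_k`, `stack_confidences`, and the `PredictProba` type are hypothetical helpers, and the scores in the usage lines are placeholders rather than results from the paper. The sketch only builds ordinary stacking meta-features from the top three models and does not reproduce the proposed OCSE method.

```python
from typing import Callable, Dict, List, Sequence

# Assumption: a fitted base model is represented as a function that maps one
# feature vector to its per-class confidence values (e.g. class probabilities).
PredictProba = Callable[[Sequence[float]], List[float]]


def select_top_k(scores: Dict[str, float], k: int = 3) -> List[str]:
    """Return the names of the k models with the highest validation score."""
    return sorted(scores, key=lambda name: scores[name], reverse=True)[:k]


def stack_confidences(
    models: Dict[str, PredictProba], selected: List[str], x: Sequence[float]
) -> List[float]:
    """Concatenate the class-confidence vectors of the selected base models.

    The result is the meta-feature vector a stacking meta-learner would use.
    """
    meta_features: List[float] = []
    for name in selected:
        meta_features.extend(models[name](x))
    return meta_features


if __name__ == "__main__":
    # Placeholder validation scores, not results from the paper.
    scores = {"DT": 0.95, "RF": 0.97, "ET": 0.96,
              "XGBoost": 0.98, "LightGBM": 0.99, "CatBoost": 0.94}
    print(select_top_k(scores))  # ['LightGBM', 'XGBoost', 'RF']
```

Feeding per-class confidences rather than hard labels to the meta-learner lets it weigh how certain each base model is about each class.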
Overall, this comprehensive AutoML-based framework enables the integration of advanced AutoML techniques across multiple stages, collectively enhancing the detection capabilities and robustness of the proposed autonomous IDS model against various cyber threats for safeguarding next-generation networks.

Figure 1: The proposed AutoML-based IDS framework.

3.2 Automated Data Pre-Processing (AutoDP)
Data pre-processing is an essential stage in the ML and data analytics pipeline, since it directly influences the quality of input data and, consequently, the performance of ML models [6]. However, data pre-processing procedures can be tedious and time-consuming, often requiring massive human effort and expert knowledge. To address these challenges, Automated Data Pre-processing (AutoDP) has emerged as a critical component of AutoML that aims to automatically identify and address data quality issues in datasets, thereby ensuring that ML models can learn meaningful patterns from high-quality data [33].
In the proposed AutoML framework, the AutoDP component focuses on automated data balancing, a crucial aspect of data pre-processing that addresses class imbalance issues. Class imbalance is a common data quality issue in network data analytics and intrusion detection problems, as cyber-attacks or anomalies usually occur less frequently than benign or normal events, leading to a significant imbalance in class distribution. Class imbalance often biases ML models in IDS development toward detecting normal samples and common attacks while neglecting less common but critical threats [33].

Data balancing techniques are designed to address class imbalance issues and can be classified into under-sampling and over-sampling techniques. Under-sampling methods alter the class distribution by eliminating instances from the majority classes, which can result in the loss of critical patterns of normal network activities [15]. On the other hand, over-sampling methods balance data by creating synthetic samples for the minority classes, which may slightly increase model training time but often outperform under-sampling methods. Therefore, over-sampling methods are considered in this research.

Over-sampling methods can be broadly categorized into random and informed over-sampling methods [33]. Random over-sampling randomly replicates samples from the minority classes, while informed methods aim to generate higher-quality samples to improve data balancing performance. In the proposed framework, the Tabular Variational Auto-Encoder (TVAE) [31] method is used as an informed over-sampling method to synthesize minority class samples to balance tabular network data. TVAE is a DL model that extends the capabilities of the traditional autoencoder by incorporating probabilistic modeling and adapting to tabular data [31]. A TVAE incorporates an encoder that projects the input data $\mathbf{x}$ onto a probability distribution in the latent space $\mathbf{z}$, denoted as $q(\mathbf{z} \mid \mathbf{x})$, and a decoder that models the data generation process as a probability distribution $p(\mathbf{x} \mid \mathbf{z})$. TVAE is trained to maximize the Evidence Lower Bound (ELBO) on the log-likelihood of the data [5]:

$$ \mathrm{ELBO} = \mathbb{E}\left[\log p(\mathbf{x} \mid \mathbf{z})\right] - D_{KL}\left(q(\mathbf{z} \mid \mathbf{x}) \,\|\, p(\mathbf{z})\right) \tag{1} $$

where $D_{KL}$ is the Kullback-Leibler (KL) divergence, a measure of the difference between two probability distributions, and $\mathbb{E}$ denotes the expectation.

To automate the data balancing procedure, the proposed AutoDP method consists of two procedures: automated class-imbalance detection and automated data synthesis. In the automated class-imbalance detection procedure, the system calculates three metrics on the training set: the number of classes, the number of samples in each class, and the average number of samples per class. Based on these three metrics, the system identifies minority classes as those whose number of samples is below a threshold, defined in the proposed framework as half of the average number of samples per class in the training set. If minority classes exist, indicating class imbalance, the system automatically performs data synthesis, generating minority class samples with the TVAE method until the number of samples in each minority class reaches the threshold (half of the average number of samples). The details of the proposed TVAE-based automated data balancing method are provided in Algorithm 1. Finally, a balanced training dataset is automatically obtained using the proposed AutoDP method, which generates high-quality minority class samples that better represent the patterns of less common attacks, thereby assisting in effective IDS model development.

Algorithm 1: Automated Data Balancing Using Tabular Variational Auto-Encoder (TVAE)
Input: D_train: the original training set of the dataset
Output: D_train^bal: the balanced training set
1  average_samples = AverageNumberOfSamplesPerClass(D_train)  // Calculate the average number of samples per class
2  minority_classes = IdentifyMinorityClasses(D_train, average_samples)  // Identify classes with less than half the average samples
3  synthetic_data = []  // Initialize an empty list to store synthetic data
4  for each class cls in minority_classes do
5      cls_samples = ExtractClassInstances(D_train, cls)  // Extract instances of the minority class
6      num_samples = average_samples - Count(cls_samples)  // Calculate the deficit in samples for the class
7      TVAE_model = TrainTVAE(cls_samples)  // Train the TVAE model on the minority class samples
8      new_samples = GenerateSyntheticInstances(TVAE_model, num_samples)  // Generate synthetic instances to match the average class sample size
9      synthetic_data.append(new_samples)  // Append the new synthetic instances to the synthetic data list
10 end
11 D_train^bal = Concatenate(D_train, synthetic_data)  // Concatenate the original and synthetic data to form a balanced dataset
12 return D_train^bal

3.3 Automated Feature Engineering (AutoFE)
After the AutoDP stage, Automated Feature Engineering (AutoFE) is another crucial component of the proposed AutoML framework. Feature Engineering (FE) involves extracting and selecting the most informative and relevant features from a dataset, as the original features are often suboptimal for specific datasets [14]. This process enhances the performance of ML models. AutoFE aims to automate the traditional FE process, minimizing human effort on FE tasks. In the proposed framework, AutoFE focuses on the Feature Selection (FS) process, aiming to identify and select the most relevant features to construct a highly efficient and accurate ML model. The proposed Automated FS (AutoFS) method is based on the feature importance scores generated by the tree-based algorithms used in the automated model learning process.

When a DT is constructed in tree-based algorithms, features that yield significant reductions in Gini impurity or entropy are assigned higher importance scores, as they strongly influence the node-splitting process. Gini impurity and entropy are two common metrics for measuring node impurity in DTs for classification problems, which include intrusion detection problems [35]. The Gini index quantifies the impurity of a node by evaluating the probability of misclassifying a randomly selected element within that node. The Gini index is calculated as follows for a multi-class
problem with $K$ classes [35]:

$$ \mathrm{Gini}(p_1, p_2, \ldots, p_K) = 1 - \sum_{i=1}^{K} p_i^2 \tag{2} $$

where $p_1, p_2, \ldots, p_K$ indicate the proportions of classes $1, 2, \ldots, K$.

Entropy is another impurity measure that calculates how much information is required to identify the class of a randomly selected element within a node [35]. The entropy for a multi-class problem with $K$ classes can be denoted by:

$$ \mathrm{Entropy}(p_1, p_2, \ldots, p_K) = -\sum_{i=1}^{K} p_i \log_2(p_i) \tag{3} $$

In the proposed AutoFE framework, the FS process is automated by leveraging tree-based models, as they automatically calculate feature importance scores during training. The specific procedures of this AutoFS process are as follows:
(1) Train the six tree-based ML models (DT, RF, ET, XGBoost, LightGBM, and CatBoost) on the training set and evaluate their performance.
(2) Obtain the feature importance scores generated from the top three best-performing models.
(3) Calculate the average relative importance score for each feature across the top three best-performing models.
(4) Rank the features from highest to lowest based on these average importance scores.
(5) Select features from the top of this ranked list, accumulating their importance scores until the sum reaches a predefined threshold, $\alpha$ (default value 0.9).
(6) Generate the updated training and test sets using the new feature set that comprises the selected features.

By implementing the proposed AutoFE process, the most relevant features are selected so that their cumulative relative importance reaches 90%. The remaining features, which together account for less than 10% of the total importance, are discarded, effectively reducing noise and computational complexity. The cumulative feature importance threshold, $\alpha$, can be tuned using the optimization method presented in Section 3.5 to customize it for specific tasks or problems. The proposed AutoFE process helps simplify the model, reduce the risk of overfitting, improve computational efficiency, and increase model interpretability.

child nodes based on the chosen criterion, and repeating this process until a stopping condition is met.

RF [29] is an ensemble learning model constructed from multiple DT models. RF generates a set of DTs from randomly selected subsets of the training set and then aggregates the votes of the base DTs to decide the final result by majority voting.

ET [12] is another tree-based ensemble method constructed by combining multiple DTs. However, it randomizes both feature and cut-point choices to construct completely randomized trees. Because the splitting points in ETs are chosen randomly, the trees constructed in ETs are more diverse and less prone to overfitting than those in RF.

XGBoost [7] is an ensemble model that improves the speed and performance of the Gradient-Boosted Decision Trees (GBDT) model. XGBoost distinguishes itself from traditional gradient boosting methods by incorporating a regularization term into the objective function, which effectively controls the model's complexity, smooths the final weights, and mitigates overfitting [33]. Additionally, XGBoost uses a second-order Taylor expansion to approximate the loss function, enabling accurate model updates and fast convergence.

LightGBM [16] is another improved version of the GBDT model with enhanced performance and efficiency. Similar to other tree-based algorithms, LightGBM is constructed on an ensemble of DTs, but it introduces two advanced techniques: Gradient-based One-Side Sampling (GOSS) and Exclusive Feature Bundling (EFB) [33]. GOSS is a down-sampling approach that keeps instances with large gradients and randomly samples instances with small gradients to save training time and memory. EFB bundles mutually exclusive features into single features, reducing feature space dimensionality and improving training efficiency. By utilizing GOSS and EFB, LightGBM significantly reduces data size without losing critical information, preserving accuracy while reducing computational cost.

CatBoost [24] is another GBDT-based algorithm that is particularly effective for datasets with categorical features. CatBoost distinguishes itself from traditional GBDT algorithms through three major innovations: symmetric trees, ordered boosting, and native categorical feature support. Symmetric trees apply the same splitting condition to all nodes at a given depth, which simplifies the model and reduces the risk of overfitting. Ordered boosting is a boosting scheme that helps prevent overfitting on small datasets. Native categorical feature support allows CatBoost to handle categorical features directly, improving model performance.
The primary reasons for choosing these six tree-based algorithms
3.4 Automated Base Model Learning and as candidate base models are as follows:
Selection
After the AutoDP and AutoFE procedures, the improved network (1) RF, ET, XGBoost, LightGBM, and CatBoost are all ensemble
traffic datasets are learners by supervised ML algorithms to train models that combine multiple base DTs to improve model
ML-based IDS that can detect various types of cyber-attacks. Six performance and robustness, and DT can serve as the baseline
tree-based ML models, i.e., DT, RF, ET, XGBoost, LightGBM, and model for comparison.
CatBoost, are built as base models to perform the initial intrusion (2) These methods are proficient at handling non-linear and high-
detection. dimensional data to which 5G network data belongs.
DT [32] is a fundamental ML algorithm that makes predictions (3) They support parallel computation, which can significantly
by learning decision rules inferred from the input features. The improve the training efficiency on large network datasets.
decision rules are formed in a tree structure, where each internal node (4) These tree-based ML algorithms offer the advantage of au-
represents a test on a feature, each branch denotes a test outcome, tomatically calculating feature importance in their training
and each leaf node contains a class label [32]. The DT algorithm process, which assists in efficient feature selection process in
recursively partitions data by selecting the best splitting rule, creating the proposed AutoFE method.
(5) These tree-based models incorporate randomness during their tree construction process, which enables the proposed framework to build a robust ensemble model with diverse base models and increased generalizability.
After training and evaluating the performance of these six tree-based models on the training set, the three best-performing models, based on their cross-validated F1-scores, are automatically selected as the three base models to construct the ensemble model discussed in Section 3.6.
3.5 Hyper-Parameter Optimization (HPO)
Tuning hyperparameters is a crucial step in deploying an effective ML model for a particular problem or dataset. The hyperparameters of an ML model determine its architecture and directly affect its performance. The process of using optimization methods to automatically tune these hyperparameters is known as Hyper-Parameter Optimization (HPO). In HPO or AutoML tasks, given the hyperparameter search space $X$, the goal is to find the optimal hyperparameter value or configuration $\boldsymbol{x}^{*}$ that minimizes the objective function $f(\boldsymbol{x})$ [30]:

$$\boldsymbol{x}^{*} = \arg\min_{\boldsymbol{x} \in X} f(\boldsymbol{x}) \qquad (4)$$

In the proposed AutoML framework, the important hyperparameters of the six tree-based algorithms are optimized during the HPO process. Using the terminology of the Scikit-Learn library, key hyperparameters for the Decision Tree (DT) include 'max_depth', which sets the maximum tree depth; 'min_samples_split', specifying the minimum number of samples required to split a node; and 'min_samples_leaf', defining the minimum number of samples required at a leaf node. The 'criterion' hyperparameter selects either Gini impurity or entropy to measure split quality.
As RF and ET are ensemble models built using DTs, they inherit these four critical hyperparameters from DT. Additionally, RF and ET include the 'n_estimators' hyperparameter, which determines the number of trees in the ensemble and significantly influences model performance and efficiency.
The number of base trees and the maximum tree depth are two crucial hyperparameters shared by XGBoost, LightGBM, and CatBoost. Additionally, since these three algorithms are gradient-boosting models, the learning rate is another critical hyperparameter that significantly impacts their learning speed and overall performance.
Among the various optimization methods for HPO tasks, Bayesian Optimization (BO) methods have proven to be efficient [10]. BO uses a posterior distribution, known as the surrogate, to describe the function under optimization. As more observations are made, the posterior distribution improves, increasing the certainty about which regions of the search space are promising and worth exploring and which are not. Therefore, BO methods can determine future hyperparameter evaluations based on the results of previous evaluations to avoid unnecessary model assessments [32].
The Tree Parzen Estimator (TPE) is a common surrogate for BO to model the evaluated configurations [10]. BO with TPE, or BO-TPE, can handle a tree-structured hyperparameter search space using Parzen estimators, also known as kernel density estimators (KDEs) [30]. In BO-TPE, the set of evaluated hyperparameter configurations $D$ is split into the better group $D^{(l)}$ and the worse group $D^{(g)}$ based on a top quantile $y^{*}$ of the objective values. The KDEs are estimated using a kernel function whose bandwidth adapts to the observed data. The density functions of the configurations, modeled by TPE, can be denoted by [30]:

$$p(\boldsymbol{x} \mid y, D) = \begin{cases} p\left(\boldsymbol{x} \mid y, D^{(l)}\right), & \text{if } y \le y^{*}, \\ p\left(\boldsymbol{x} \mid y, D^{(g)}\right), & \text{if } y > y^{*}, \end{cases} \qquad (5)$$

The ratio between the two probability density functions, $p(\boldsymbol{x} \mid y, D^{(l)}) / p(\boldsymbol{x} \mid y, D^{(g)})$, is used to choose new configurations for evaluation: candidates with a high ratio are more likely to belong to the better group, which gradually leads the search toward optimal configurations. BO-TPE is selected as the HPO method to tune and optimize the hyperparameters of the ML models in the proposed framework for the following reasons [32] [33]:
(1) BO-TPE is effective at handling high-dimensional variables of multiple types, rendering it suitable for the tree-based ML methods utilized in the proposed framework, which involve numerous hyperparameters.
(2) BO-TPE can handle tree-structured search spaces, in which some hyperparameters are only relevant for certain values of others, enabling flexible and complex hyperparameter optimization for the tree-based ML models employed in the proposed framework.
(3) Unlike other HPO methods, such as grid search, which evaluates each hyperparameter configuration independently and therefore performs many unnecessary evaluations, BO-TPE enables more efficient HPO by exploring promising regions and determining new hyperparameter configurations based on previous evaluation results.
(4) BO-TPE has a low time complexity of $O(n \log n)$, where $n$ is the number of hyperparameter configurations, which is much lower than that of other HPO methods, such as grid search, whose time complexity is $O(n^k)$ [33].
By automatically tuning the hyperparameters of the three best-performing base ML models using BO-TPE, three optimized ML models with improved intrusion detection effectiveness are obtained for further analysis.
3.6 Automated Model Ensemble
After selecting the top three best-performing tree-based models and optimizing their hyperparameters using BO-TPE, the three optimized models are utilized as base models to construct an ensemble model for further performance enhancement. Ensemble learning is an advanced technique that combines the prediction outcomes of multiple individual ML models to make final predictions [34]. Ensemble learning aims to improve model performance and generalizability by leveraging the collective knowledge of multiple models.
In the last stage of the proposed AutoML framework, a novel Optimized Confidence-based Stacking Ensemble (OCSE) method is proposed to construct the final ensemble model by extending the traditional stacking ensemble strategy. Stacking is a widely used ensemble learning method that comprises two layers of models. The first layer of stacking contains multiple trained base learners, and their output labels serve as the input for training a robust meta-learner in the second layer [20].
Compared with the traditional stacking method, the proposed OCSE method introduces two additional strategies: confidence inputs and optimization. Firstly, the three optimized base ML models provide
a probability distribution over the target classes for each sample, which indicates the confidence of the models' prediction for this sample. The confidence values output by the base models are used as the input to the meta-learner in the second layer of the proposed OCSE model.
Secondly, the best-performing base model among the six tree-based models, based on the cross-validated F1-scores, is selected to construct the meta-learner. Its hyperparameters are optimized by BO-TPE to obtain the optimized meta-learner, following the same HPO process used for the base models, as described in Section 3.5. The specifications of the proposed OCSE method and the entire AutoML framework are illustrated in Algorithm 2.
The computational complexity of the proposed OCSE model is $O(ncmh)$, where $n$ is the number of samples, $c$ is the number of unique classes, $m$ is the number of base models, and $h$ is the number of hyperparameter configurations of the meta-learner. In the proposed framework, $c$, $m$, and $h$ are all relatively small numbers, so the cost grows approximately linearly with $n$.
Compared with other ensemble techniques, the proposed OCSE method presents the following advantages:
(1) Utilization of Confidence: Unlike many existing ensemble techniques, such as bagging, boosting, and traditional stacking, which rely solely on the predicted labels to construct the ensemble model, the proposed OCSE method utilizes the confidence of all classes as input features, which provides more comprehensive information about the certainty of the base models' predictions, resulting in more informed and robust ensemble predictions.
(2) Automated and Optimized Models: The proposed OCSE method automatically selects the best-performing base model as the second-layer meta-learner and tunes its hyperparameters, resulting in an optimized final learner that improves overall performance. The automation process also reduces the need for manual effort and saves time in model development.
(3) Flexibility: OCSE is a flexible method in which both the base models and the meta-learner can be replaced with other ML algorithms to adapt to a wide range of tasks.
Overall, with the use of the novel OCSE method and all the other critical components described in this section, the proposed AutoML-based IDS framework can automatically generate an optimized ensemble model for effective and robust intrusion detection, serving as a key component of autonomous cybersecurity solutions.

Algorithm 2: The Proposed AutoML-based IDS Framework Involving the OCSE Algorithm
Input: $D_{train}$, $D_{test}$
Output: $L_{pred}$: predicted labels for normal and attack samples
1  Stage 1: Automated Data Preprocessing (AutoDP)
2    $D_{train}^{bal}$ = BalanceData($D_{train}$) using Algorithm 1  // Address class imbalance using Algorithm 1 and TVAE
3  Stage 2: Automated Feature Engineering and Base Model Selection (AutoFE)
4    Train and evaluate $BM_i$ (DT, RF, ET, XGBoost, LightGBM, CatBoost) on $D_{train}^{bal}$  // Train and evaluate base models using cross-validation
5    $m_i$ = Metrics($BM_i$) for each model $i \in \{1, 2, 3, 4, 5, 6\}$  // Compute performance metrics for model evaluation
6    $M_1, M_2, M_3$ = SelectTopModels($m_i$, 3)  // Select the three best-performing models based on metrics
7    $F_1, F_2, F_3$ = FeatureImportances($M_1, M_2, M_3$)  // Calculate feature importance for selected models
8    $F_{avg} = (F_1 + F_2 + F_3)/3$  // Average the feature importance scores to improve generalization
9    $F_s$ = SelectFeatures($F_{avg}$, $\alpha = 90\%$)  // Select the most important features that cumulatively meet importance threshold $\alpha$
10   $M_1', M_2', M_3'$ = RetrainModels($M_1, M_2, M_3, F_s$)  // Retrain models using selected features
11 Stage 3: Hyperparameter Optimization (HPO) using BO-TPE
12   $M_1 = (H_1, S_1)$, $M_2 = (H_2, S_2)$, $M_3 = (H_3, S_3)$  // Configure search spaces for the hyperparameters of the three selected base models
13   $h_1^*, h_2^*, h_3^*$ = OptimizeHP($S_1, S_2, S_3$) using BO-TPE  // Optimize hyperparameters using Bayesian Optimization
14   $M_1'', M_2'', M_3''$ = GenerateOptimizedModels($h_1^*, h_2^*, h_3^*$)  // Generate models with optimized hyperparameters
15 Stage 4: Model Ensemble using Optimized Confidence-based Stacking Ensemble (OCSE)
16   $P_1, P_2, P_3$ = ConfidenceValues($M_1'', M_2'', M_3''$, $D_{train}^{bal}$, $F_s$)  // Retrieve confidence values from optimized models
17   $M$ = TrainMetaLearner($P_1, P_2, P_3, M_1$)  // Use the best-performing ML model to train a meta-learner on model confidence values
18   $M'$ = OptimizeMetaLearner($M$, BO-TPE)  // Optimize the meta-learner using BO-TPE
19   $P_1', P_2', P_3'$ = TestConfidences($M_1'', M_2'', M_3''$, $D_{test}$)  // Retrieve confidence values for test data
20   $L_{pred}$ = FinalPredictions($M'$, $P_1', P_2', P_3'$)  // Predict final labels using the optimized meta-learner based on confidence values
21 return $L_{pred}$
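The confidence-based stacking idea of Stage 4 in Algorithm 2 (lines 16 to 20) can be sketched with scikit-learn as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the data are synthetic, three scikit-learn tree models stand in for the optimized base models, the meta-learner reuses one of their model types (assumed here to be RF) and skips the BO-TPE tuning step, and the meta-learner is trained on out-of-fold probabilities from `cross_val_predict`, which is a choice of this sketch to limit leakage (Algorithm 2 obtains the training confidences directly from the optimized models). The helper names `fit_confidence_stacking` and `predict_confidence_stacking` are hypothetical.

```python
import numpy as np
from sklearn.base import clone
from sklearn.datasets import make_classification
from sklearn.ensemble import ExtraTreesClassifier, RandomForestClassifier
from sklearn.metrics import f1_score
from sklearn.model_selection import StratifiedKFold, cross_val_predict, train_test_split
from sklearn.tree import DecisionTreeClassifier


def fit_confidence_stacking(base_models, meta_learner, X, y, n_splits=5):
    """Two-layer stack whose meta-learner is trained on the class probabilities
    (confidence values) of every base model rather than on their hard labels."""
    cv = StratifiedKFold(n_splits=n_splits, shuffle=True, random_state=0)
    # Out-of-fold probabilities: each training row is scored by models that did
    # not see it during fitting (a choice of this sketch to limit leakage).
    meta_X = np.hstack([
        cross_val_predict(clone(m), X, y, cv=cv, method="predict_proba")
        for m in base_models
    ])
    fitted_bases = [clone(m).fit(X, y) for m in base_models]  # refit on all rows
    fitted_meta = clone(meta_learner).fit(meta_X, y)
    return fitted_bases, fitted_meta


def predict_confidence_stacking(fitted_bases, fitted_meta, X):
    """Concatenate the base models' probabilities for X and let the meta-learner decide."""
    meta_X = np.hstack([m.predict_proba(X) for m in fitted_bases])
    return fitted_meta.predict(meta_X)


# Synthetic 3-class data and an 80%/20% stratified split, for illustration only.
X, y = make_classification(n_samples=1000, n_features=20, n_informative=8,
                           n_classes=3, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=0)

# Stand-ins for the three optimized base models; the meta-learner reuses the
# model type assumed to perform best (RF here), without BO-TPE tuning.
bases = [RandomForestClassifier(n_estimators=100, random_state=0),
         ExtraTreesClassifier(n_estimators=100, random_state=0),
         DecisionTreeClassifier(random_state=0)]
meta = RandomForestClassifier(n_estimators=100, random_state=0)

fitted_bases, fitted_meta = fit_confidence_stacking(bases, meta, X_train, y_train)
y_pred = predict_confidence_stacking(fitted_bases, fitted_meta, X_test)
print("weighted F1 on the hold-out set:", f1_score(y_test, y_pred, average="weighted"))
```

With three base models and three classes, the meta-learner receives $m \cdot c = 9$ confidence features per sample instead of the original 20 input features. scikit-learn's `StackingClassifier` with `stack_method="predict_proba"` builds a similar probability-based stack; the explicit version above simply keeps each step visible.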

4 PERFORMANCE EVALUATION
4.1 Experimental Setup
The proposed AutoML-based IDS framework was developed in Python by extending the Scikit-Learn [22], XGBoost [7], LightGBM [16], CatBoost [24], Synthetic Data Vault (SDV) [21], and Hyperopt [3] libraries. The experiments were performed on a Dell Precision 3630 computer equipped with an i7-8700 processor and 16 GB of memory, which represents a server machine in 5G and potential 6G networks.
To evaluate the proposed AutoML-based IDS framework, two public benchmark network traffic datasets, namely CICIDS2017 [26] and 5G-NIDD [25], are utilized in the experiments. The CICIDS2017 dataset is one of the most comprehensive public cybersecurity datasets; it was created by simulating a real-world network environment and involves six primary types of attacks: DoS, botnets, brute force, infiltration, port scan, and web attacks [32]. The diverse attack scenarios and comprehensive feature set of the CICIDS2017 dataset make it suitable for network security applications. The 5G-NIDD dataset is one of the most recent network security datasets, developed in December 2022 [25]. This dataset was generated by capturing network traffic within a 5G testbed under diverse DoS and port scan cyber-attacks. The 5G-NIDD dataset is particularly suitable for our research, as it specifically targets 5G networks and enables the detection of new and sophisticated cyber-attacks.
To develop and evaluate the proposed AutoML-based IDS model, both cross-validation and hold-out validation methods are used in the
experiments. The model training and optimization process utilizes five-fold cross-validation to automatically generate the optimized ensemble model, while the final model produced by the proposed AutoML framework is evaluated on an unseen test set, which is held out through an 80%/20% train-test split during the initial data pre-processing stage.
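A minimal sketch of this protocol, combined with the performance-based model selection of Section 3.4 and the cumulative-importance feature selection of Section 3.3 (Algorithm 2, lines 4 to 9), is shown below. It is illustrative only and rests on several assumptions: synthetic data from `make_classification`, scikit-learn's `GradientBoostingClassifier` as a single stand-in for XGBoost, LightGBM, and CatBoost (so four candidates instead of six), and support-weighted F1 (`"f1_weighted"`) as the cross-validated F1-score, since the averaging mode is not stated in the text. `cumulative_importance_selection` is a hypothetical helper.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import (ExtraTreesClassifier, GradientBoostingClassifier,
                              RandomForestClassifier)
from sklearn.model_selection import StratifiedKFold, cross_val_score, train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=600, n_features=20, n_informative=6,
                           n_classes=3, random_state=0)

# 80%/20% hold-out split, made once; the test part is never used for selection.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=0)

# scikit-learn stand-ins for the tree-based candidates (XGBoost, LightGBM and
# CatBoost are left out only to keep this sketch dependency-free).
candidates = {
    "DT": DecisionTreeClassifier(random_state=0),
    "RF": RandomForestClassifier(n_estimators=50, random_state=0),
    "ET": ExtraTreesClassifier(n_estimators=50, random_state=0),
    "GBDT": GradientBoostingClassifier(n_estimators=50, random_state=0),
}

# Five-fold cross-validated weighted F1, computed on the training part only.
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)
cv_f1 = {name: cross_val_score(model, X_train, y_train, cv=cv,
                               scoring="f1_weighted").mean()
         for name, model in candidates.items()}
top3 = sorted(cv_f1, key=cv_f1.get, reverse=True)[:3]


def cumulative_importance_selection(models, alpha=0.9):
    """Average the normalized importances of fitted tree models, rank features,
    and keep the top-ranked ones until their cumulative share reaches alpha."""
    avg = np.mean([m.feature_importances_ for m in models], axis=0)
    order = np.argsort(avg)[::-1]                      # most important first
    share = np.cumsum(avg[order]) / avg.sum()          # cumulative importance share
    k = min(int(np.searchsorted(share, alpha)) + 1, len(order))
    return np.sort(order[:k]), avg


top_models = [candidates[name].fit(X_train, y_train) for name in top3]
selected, avg_importance = cumulative_importance_selection(top_models, alpha=0.9)
X_train_fs, X_test_fs = X_train[:, selected], X_test[:, selected]
print("top-3 by CV F1:", top3, "| features kept:", len(selected), "of", X.shape[1])
```

Only the training portion enters cross-validation and feature ranking; `X_test_fs` would be used once, for the final hold-out evaluation.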
Due to the inherent class imbalance in network intrusion detection datasets, four model performance metrics (accuracy, precision, recall, and F1-score) are considered collectively in the experiments. The F1-score is used as the primary performance metric in the performance-based automated model selection and tuning process of the proposed AutoML framework, as it offers a balanced view of intrusion detection results by calculating the harmonic mean of precision and recall. Additionally, the model execution time, including the training and inference time of the final OCSE model, is used to assess the model's efficiency.
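The four metrics can be computed from scratch in a few lines of standard-library Python, which makes the harmonic-mean definition of F1 explicit. The sketch below assumes support-weighted averaging over classes (the averaging mode is not stated in the text); `weighted_metrics` is a hypothetical helper and the toy labels are illustrative only.

```python
from collections import Counter


def weighted_metrics(y_true, y_pred):
    """Return accuracy and support-weighted precision, recall and F1."""
    n = len(y_true)
    support = Counter(y_true)      # number of true samples per class
    predicted = Counter(y_pred)    # number of predicted samples per class
    tp = Counter(t for t, p in zip(y_true, y_pred) if t == p)  # correct hits per class

    precision = recall = f1 = 0.0
    for label, n_label in support.items():
        p = tp[label] / predicted[label] if predicted[label] else 0.0
        r = tp[label] / n_label
        # Per-class F1 is the harmonic mean of per-class precision and recall.
        f = 2 * p * r / (p + r) if p + r else 0.0
        weight = n_label / n       # each class counts by its share of the samples
        precision += weight * p
        recall += weight * r
        f1 += weight * f

    accuracy = sum(tp.values()) / n
    return accuracy, precision, recall, f1


# Toy labels (0 = normal, 1 and 2 = two attack types), for illustration only.
y_true = [0, 0, 0, 0, 1, 1, 1, 2, 2, 2]
y_pred = [0, 0, 0, 1, 1, 1, 2, 2, 2, 2]
acc, prec, rec, f1 = weighted_metrics(y_true, y_pred)
print(f"accuracy={acc:.3f} precision={prec:.3f} recall={rec:.3f} f1={f1:.3f}")
```

With weighted averaging, recall always equals accuracy, because each class's recall is weighted by its share of the samples: $\sum_k \frac{n_k}{n} \cdot \frac{TP_k}{n_k} = \frac{1}{n}\sum_k TP_k$. This is consistent with the identical accuracy and recall values reported for the base models and AutoML-OCSE in Tables 3 and 4.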

4.2 Experimental Results and Discussion
As described in Section 3, the proposed AutoML-based IDS comprises five critical stages: AutoDP, AutoFE, automated base model learning and selection, HPO, and automated model ensemble. Initially, the proposed TVAE-based automated data balancing method applied in the AutoDP stage automatically balances the class distributions of the two datasets, CICIDS2017 and 5G-NIDD, to prevent model bias. During the AutoFE stage, important features are selected based on the average importance scores obtained from the three best-performing ML models, until their cumulative importance score reaches the threshold $\alpha = 90\%$. These selected features, along with their relative importance scores for the CICIDS2017 and 5G-NIDD datasets, are illustrated in Figs. 2 and 3, respectively. Subsequently, the three top-performing ML models (RF, XGBoost, and LightGBM) are optimized by tuning their hyperparameters using BO-TPE. The tuned hyperparameters, their search spaces, and the optimal values obtained for both datasets are detailed in Table 2. Finally, the three optimized base ML models are integrated using the proposed OCSE model to automate the model ensemble, improving the decision-making effectiveness of intrusion detection.

Figure 2: The average importance scores of the selected features in the CICIDS2017 dataset (cumulative relative importance reaching 90%).

Figure 3: The average importance scores of the selected features in the 5G-NIDD dataset (cumulative relative importance reaching 90%).

The performance of the proposed AutoML-OCSE model and several state-of-the-art methods in the literature is provided in Table 3 for the CICIDS2017 dataset and Table 4 for the 5G-NIDD dataset. The performance is evaluated based on the following metrics: accuracy, precision, recall, F1-score, training time, and average test time per sample. Firstly, as shown in Table 3, the results on the CICIDS2017 dataset indicate that RF, XGBoost, and LightGBM perform better than the other three base models, DT, ET, and CatBoost. Hence, these three ML models are selected as the base models of the proposed AutoML-OCSE framework. After optimizing the hyperparameters of the three selected base models (as detailed in Table 2) and integrating their outputs using the proposed OCSE ensemble method, the final AutoML-OCSE model outperforms all the compared methods in the literature [32] [26] [19] [28] - [11]. The proposed method achieves the highest metrics on the CICIDS2017 dataset, with accuracy, precision, recall, and F1-score of 99.806%, 99.806%, 99.806%, and 99.804%, respectively. Furthermore, the average test time per sample of the proposed AutoML-OCSE method is the fastest among the compared methods [32] [26] [19] [28] - [11], highlighting its efficiency on network traffic datasets. Compared with state-of-the-art methods on the CICIDS2017 dataset, AutoML-OCSE demonstrates notable improvements in both accuracy and inference efficiency. These enhancements are attributed primarily to the AutoDP, AutoFE, HPO, and automated model ensemble procedures, which collectively enhance data quality, optimize the machine learning models, and reduce feature and model complexity.
Similarly, as indicated in Table 4, the proposed AutoML-OCSE method outperforms all other compared methods [32] [29] - [24] [26] [1] on the 5G-NIDD dataset, achieving the highest accuracy, precision, recall, and F1-score, all at 99.956%. In terms of average test time per sample, the AutoML-OCSE method matches the fastest time set by the DT method (0.0006 ms), demonstrating an exceptional balance between performance and efficiency.
Regarding the intrusion detection performance of the proposed AutoML-OCSE on the CICIDS2017 and 5G-NIDD datasets, while
its accuracy and F1-score are only slightly higher than those of the best-performing base ML models, the small margin is primarily attributable to the simplicity of these datasets, on which many base ML models can achieve over 99% accuracy and F1-score. In real-world scenarios, where the complexity and variability of network traffic data are usually higher than those of public benchmarks, the proposed AutoML-based IDS is expected to demonstrate more significant improvements, since every component of the proposed framework, including AutoDP, AutoFE, automated model selection, HPO, and automated model ensemble, contributes to the overall enhancement of intrusion detection performance. Additionally, in terms of relative error rate, the proposed AutoML-OCSE method reduces the error rate of the best-performing base model (LightGBM on CICIDS2017; RF, XGBoost, and LightGBM on 5G-NIDD) by approximately 15.65% and 24.13% on the CICIDS2017 and 5G-NIDD datasets, respectively, calculated as:

$$\frac{99.806\% - 99.770\%}{100\% - 99.770\%} \quad \text{and} \quad \frac{99.956\% - 99.942\%}{100\% - 99.942\%}.$$

Furthermore, without the proposed automated model selection method, researchers might resort to selecting ML models either randomly or based on personal experience, which may not always lead to choosing the best-performing base ML model. This limitation further highlights the potential for improvement offered by the proposed AutoML method, which can automatically select, optimize, and integrate the best-performing ML models. Consequently, the proposed AutoML-based IDS can achieve substantial improvements over traditional cybersecurity methods through its autonomous cybersecurity strategies.
On the other hand, although the AutoML-OCSE model takes longer to train than certain base models, such as DT, ET, and LightGBM, its training time is still shorter than that of some other models, such as RF on CICIDS2017 (38.5 s vs. 35.6 s) and CatBoost on 5G-NIDD (29.7 s vs. 24.3 s). This efficiency is due in part to its AutoFE process, which reduces data dimensionality and model complexity. Moreover, the improvement in accuracy, precision, recall, and F1-score justifies the additional training time. Furthermore, the AutoML-OCSE model achieves the fastest average inference time per sample (tied with DT on 5G-NIDD) by constructing its stacking ensemble on confidence values rather than on the original high-dimensional dataset, making it highly suitable for real-time network data analytics and intrusion detection applications. In network applications, low inference time is often more crucial than low training time, as model training typically occurs on cloud servers with ample computational resources, while model predictions are performed on edge or local devices with limited computational capabilities in many scenarios.
Overall, the performance results demonstrate the effectiveness and efficiency of the proposed AutoML-OCSE method. It integrates the strengths of various base ML models, automates tedious ML tasks through AutoML, and achieves high detection performance via an optimized ensemble strategy. Therefore, the proposed AutoML-OCSE method and the AutoML framework can serve as powerful autonomous cybersecurity solutions for intrusion detection in 5G and potential 6G networks.

Table 2: The HPO configuration of the three best-performing models.
Model | Hyperparameter | Configuration Space | Optimal Value on CICIDS2017 | Optimal Value on 5G-NIDD
RF | n_estimators | [50, 500] | 420 | 370
RF | max_depth | [5, 50] | 36 | 42
RF | min_samples_split | [2, 11] | 7 | 4
RF | min_samples_leaf | [1, 11] | 2 | 5
RF | criterion | ['gini', 'entropy'] | 'entropy' | 'gini'
XGBoost | n_estimators | [50, 500] | 450 | 310
XGBoost | max_depth | [5, 50] | 36 | 27
XGBoost | learning_rate | (0, 1) | 0.78 | 0.67
XGBoost | gamma | (0, 5) | 0.4 | 0.3
XGBoost | subsample | (0.5, 1) | 0.7 | 0.75
LightGBM | n_estimators | [50, 500] | 380 | 340
LightGBM | max_depth | [5, 50] | 42 | 34
LightGBM | learning_rate | (0, 1) | 0.918 | 0.784
LightGBM | num_leaves | [100, 2000] | 900 | 1100
LightGBM | min_child_samples | [10, 50] | 42 | 28

Table 3: Model Performance Comparison on CICIDS2017.
Method | Accuracy (%) | Precision (%) | Recall (%) | F1 (%) | Training Time (s) | Avg Test Time Per Sample (ms)
KNN [26] | 96.3 | 96.2 | 93.7 | 96.3 | 0.007 | 0.678
DT [32] | 99.612 | 99.612 | 99.612 | 99.608 | 0.6 | 0.0008
RF [29] | 99.718 | 99.718 | 99.718 | 99.714 | 38.5 | 0.014
ET [12] | 99.245 | 99.252 | 99.245 | 99.243 | 3.5 | 0.012
XGBoost [7] | 99.757 | 99.757 | 99.757 | 99.755 | 11.0 | 0.001
LightGBM [16] | 99.770 | 99.770 | 99.770 | 99.769 | 2.0 | 0.004
CatBoost [24] | 99.559 | 99.559 | 99.559 | 99.553 | 4.6 | 0.008
KNN-AIDS [19] | 99.52 | 99.49 | 99.52 | 99.49 | - | -
DL-LSTM [28] | 99.32 | 99.32 | 99.32 | 99.32 | - | -
PyDSC-IDS [13] | 97.60 | 90.73 | 97.81 | 94.13 | - | -
OE-IDS [17] | 98.0 | 97.3 | 96.0 | 96.7 | - | -
PSO-DL [11] | 98.95 | 95.82 | 95.81 | 95.81 | - | -
Proposed AutoML-OCSE | 99.806 | 99.806 | 99.806 | 99.804 | 35.6 | 0.0007

Table 4: Model Performance Comparison on 5G-NIDD.
Method | Accuracy (%) | Precision (%) | Recall (%) | F1 (%) | Training Time (s) | Avg Test Time Per Sample (ms)
KNN [26] | 99.007 | 99.008 | 99.007 | 99.007 | 0.006 | 0.704
DT [32] | 99.926 | 99.926 | 99.926 | 99.926 | 0.25 | 0.0006
RF [29] | 99.942 | 99.942 | 99.942 | 99.942 | 3.2 | 0.009
ET [12] | 99.926 | 99.926 | 99.926 | 99.926 | 2.3 | 0.010
XGBoost [7] | 99.942 | 99.942 | 99.942 | 99.942 | 7.1 | 0.0009
LightGBM [16] | 99.942 | 99.942 | 99.942 | 99.942 | 1.9 | 0.008
CatBoost [24] | 99.918 | 99.918 | 99.918 | 99.918 | 29.7 | 0.009
Embeddings & FC [1] | 99.123 | 99.019 | 98.316 | 98.666 | - | -
Proposed AutoML-OCSE | 99.956 | 99.956 | 99.956 | 99.956 | 24.3 | 0.0006

5 CONCLUSION
The advent of 5G and the impending transition to 6G networks have underscored the importance of ZTNs in achieving network automation. However, the increased connectivity and complexity of these networks have also escalated cybersecurity risks, making the development of effective and autonomous cybersecurity mechanisms a
critical necessity. In this paper, we propose an AutoML-based Intrusion Detection System (IDS) to achieve autonomous cybersecurity for future networks. By introducing AutoDP, AutoFE, automated base model learning and selection, HPO, and automated model ensemble components, the proposed AutoML-based IDS can automatically generate an optimized ensemble model for accurate intrusion detection. This paper also proposes a novel TVAE-based automated data balancing method and a novel OCSE model to improve the AutoML procedures. In the experiments, the proposed AutoML-based IDS achieves high F1-scores of 99.804% and 99.956% on two public benchmark network security datasets, CICIDS2017 and 5G-NIDD. This illustrates the effectiveness of the proposed IDS framework in achieving autonomous cybersecurity. In future work, the IDS framework will be extended to involve automated model updating using continual learning and drift-adaptive methods in dynamic networking environments.

REFERENCES
[1] Georgios Agrafiotis, Eftychia Makri, Antonios Lalas, Konstantinos Votis, Dimitrios Tzovaras, and Nikolaos Tsampieris. 2023. A Deep Learning-based Malware Traffic Classifier for 5G Networks Employing Protocol-Agnostic and PCAP-to-Embeddings Techniques. In ACM International Conference Proceeding Series. 193–194. https://doi.org/10.1145/3590777.3590807
[2] Chafika Benzaid and Tarik Taleb. 2020. ZSM Security: Threat Surface and Best Practices. IEEE Network 34, 3 (2020), 124–133. https://doi.org/10.1109/MNET.001.1900273
[3] James Bergstra, Brent Komer, Chris Eliasmith, Dan Yamins, and David D. Cox. 2015. Hyperopt: A Python library for model selection and hyperparameter optimization. Computational Science and Discovery 8, 1 (2015). https://doi.org/10.1088/1749-4699/8/1/014008
[4] Hala Strohmier Berry. 2023. The Importance of Cybersecurity in Supply Chain. In ISDFS 2023 - 11th International Symposium on Digital Forensics and Security. https://doi.org/10.1109/ISDFS58141.2023.10131834
[5] David M Blei, Alp Kucukelbir, and Jon D Mcauliffe. 2018. Variational Inference: A Review for Statisticians. arXiv preprint arXiv:1801.09808.
[6] Karansingh Chauhan, Shreena Jani, Dhrumin Thakkar, Riddham Dave, Jitendra Bhatia, Sudeep Tanwar, and Mohammad S. Obaidat. 2020. Automated Machine Learning: The New Wave of Machine Learning. In 2nd International Conference on Innovative Mechanisms for Industry Applications (ICIMIA 2020). 205–212. https://doi.org/10.1109/ICIMIA48430.2020.9074859
[7] Tianqi Chen and Carlos Guestrin. 2016. XGBoost: A Scalable Tree Boosting System. In Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD '16). 785–794. https://doi.org/10.1145/2939672.2939785
[8] Dylan Chou and Meng Jiang. 2021. A Survey on Data-driven Network Intrusion Detection. ACM Computing Surveys (CSUR) 54, 9 (2021). https://doi.org/10.1145/3472753
[16] Guolin Ke, Qi Meng, Thomas Finley, Taifeng Wang, Wei Chen, Weidong Ma, Qiwei Ye, and Tie Yan Liu. 2017. LightGBM: A highly efficient gradient boosting decision tree. In Advances in Neural Information Processing Systems, Vol. 2017-Decem. 3147–3155.
[17] Murad Ali Khan, Naeem Iqbal, Imran, Harun Jamil, and Do Hyeun Kim. 2023. An optimized ensemble prediction model using AutoML based on soft voting classifier for network intrusion detection. Journal of Network and Computer Applications 212 (2023), 103560. https://doi.org/10.1016/j.jnca.2022.103560
[18] Madhusanka Liyanage, Quoc Viet Pham, Kapal Dev, Sweta Bhattacharya, Praveen Kumar Reddy Maddikunta, Thippa Reddy Gadekallu, and Gokul Yenduri. 2022. A survey on Zero Touch Network and Service Management (ZSM) for 5G and beyond networks. Journal of Network and Computer Applications 203 (Jul 2022), 103362. https://doi.org/10.1016/j.jnca.2022.103362
[19] Ziadoon Kamil Maseer, Robiah Yusof, Nazrulazhar Bahaman, Salama A. Mostafa, and Cik Feresa Mohd Foozy. 2021. Benchmarking of Machine Learning for Anomaly Based Intrusion Detection Systems in the CICIDS2017 Dataset. IEEE Access 9 (2021), 22351–22370. https://doi.org/10.1109/ACCESS.2021.3056614
[20] Mohanad Mohammed, Henry Mwambi, Bernard Omolo, and Murtada Khalafallah Elbashir. 2018. Using stacking ensemble for microarray-based cancer classification. In International Conference on Computer, Control, Electrical, and Electronics Engineering (ICCCEEE 2018). 1–8. https://doi.org/10.1109/ICCCEEE.2018.8515872
[21] Neha Patki, Roy Wedge, and Kalyan Veeramachaneni. 2016. The Synthetic data vault. In IEEE International Conference on Data Science and Advanced Analytics (DSAA 2016). 399–410. https://doi.org/10.1109/DSAA.2016.49
[22] Fabian Pedregosa et al. 2011. Scikit-learn: Machine Learning in Python. Journal of Machine Learning Research 12 (2011), 2825–2830. http://scikit-learn.sourceforge.net
[23] Pawani Porambage, Gürkan Gür, Diana Pamela Moya Osorio, Madhusanka Liyanage, Andrei Gurtov, and Mika Ylianttila. 2021. The Roadmap to 6G Security and Privacy. IEEE Open Journal of the Communications Society 2 (2021), 1094–1122. https://doi.org/10.1109/OJCOMS.2021.3078081
[24] Liudmila Prokhorenkova, Gleb Gusev, Aleksandr Vorobev, Anna Veronika Dorogush, and Andrey Gulin. 2018. CatBoost: Unbiased boosting with categorical features. In Advances in Neural Information Processing Systems, Vol. 2018-Decem. 6638–6648.
[25] Sehan Samarakoon, Yushan Siriwardhana, Pawani Porambage, Madhusanka Liyanage, Sang-Yoon Chang, Jinoh Kim, Jonghyun Kim, and Mika Ylianttila. 2022. 5G-NIDD: A Comprehensive Network Intrusion Detection Dataset Generated over 5G Wireless Network. https://doi.org/10.21227/xtep-hv36
[26] Iman Sharafaldin, Arash Habibi Lashkari, and Ali A Ghorbani. 2018. Toward Generating a New Intrusion Detection Dataset and Intrusion Traffic Characterization. In Proc. Int. Conf. Inf. Syst. Secur. Privacy. 108–116. https://doi.org/10.5220/0006639801080116
[27] Abhilash Singh, J. Amutha, Jaiprakash Nagar, Sandeep Sharma, and Cheng Chi Lee. 2022. AutoML-ID: automated machine learning model for intrusion detection using wireless sensor network. Scientific Reports 12, 1 (2022), 1–14. https://doi.org/10.1038/s41598-022-13061-z
[28] Omer Elsier Tayfour, Azath Mubarakali, Amira Elsir Tayfour, Muhammad Nadzir Marsono, Entisar Hassan, and Ashraf M. Abdelrahman. 2023. Adapting deep learning-LSTM method using optimized dataset in SDN controller for secure IoT. Soft Computing (2023), 1–9. https://doi.org/10.1007/s00500-023-08348-w
[29] Abebe Tesfahun and D. Lalitha Bhaskari. 2013. Intrusion detection using random forests classifier with SMOTE and feature reduction. In Proceedings - 2013 International Conference on Cloud and Ubiquitous Computing and Emerging Technologies (CUBE). 127–132. https://doi.org/10.1109/CUBE.2013.31
[9] Estefania Coronado, Rasoul Behravesh, Tejas Subramanya, Adriana Fernandez- [30] Shuhei Watanabe. 2023. Tree-Structured Parzen Estimator: A Tutorial Tree-
Fernandez, Shuaib Siddiqui, Xavier Costa-Perez, and Roberto Riggio. 2022. Zero Structured Parzen Estimator: Understanding Its Algorithm Components and Their
Touch Management: A Survey of Network Automation Solutions for 5G and 6G Roles for Better Empirical Performance. arXiv preprint arXiv:2301.00099.
Networks. IEEE Communications Surveys & Tutorials (2022). https://doi.org/10. [31] Lei Xu, Maria Skoularidou, Alfredo Cuesta-Infante, and Kalyan Veeramachaneni.
1109/COMST.2022.3212586 2019. Modeling Tabular data using Conditional GAN. In Advances in Neural
[10] K Eggensperger, M Feurer, F Hutter, J Bergstra, J Snoek, H Hoos, and K Leyton- Information Processing Systems, Vol. 32.
Brown. 2013. Towards an Empirical Foundation for Assessing Bayesian Optimiza- [32] Li Yang, Abdallah Moubayed, and Abdallah Shami. 2022. MTH-IDS: A Multitiered
tion of Hyperparameters. In BayesOpt workshop (NIPS). 1–5. Hybrid Intrusion Detection System for Internet of Vehicles. IEEE Internet of Things
[11] Wisam Elmasry, Akhan Akbulut, and Abdul Halim Zaim. 2020. Evolving deep Journal 9, 1 (2022), 616–632. https://doi.org/10.1109/JIOT.2021.3084796
learning architectures for network intrusion detection using a double PSO meta- [33] Li Yang and Abdallah Shami. 2022. IoT Data Analytics in Dynamic Environments:
heuristic. Computer Networks 168 (2020), 107042. https://doi.org/10.1016/j.comnet. From An Automated Machine Learning Perspective. Engineering Applications of
2019.107042 Artificial Intelligence 116 (2022), 1–33. https://doi.org/10.1016/j.engappai.2022.
[12] Pierre Geurts, Damien Ernst, and Louis Wehenkel. 2006. Extremely randomized 105366
trees. Machine Learning 63, 1 (2006), 3–42. https://doi.org/10.1007/s10994-006- [34] Abbas Yazdinejad, Mostafa Kazemi, Reza M. Parizi, Ali Dehghantanha, and Hadis
6226-1 Karimipour. 2023. An ensemble deep learning model for cyber threat hunting in
[13] Jiaxing He, Xiaodan Wang, Yafei Song, and Qian Xiang. 2023. A multiscale intru- industrial internet of things. Digital Communications and Networks 9, 1 (2023),
sion detection system based on pyramid depthwise separable convolution neural 101–110. https://doi.org/10.1016/j.dcan.2022.09.008
network. Neurocomputing 530 (2023), 48–59. https://doi.org/10.1016/j.neucom. [35] Guangyi Zhang and Aristides Gionis. 2023. Regularized impurity reduction: ac-
2023.01.072 curate decision trees with complexity guarantees. Data Mining and Knowledge
[14] Xin He, Kaiyong Zhao, and Xiaowen Chu. 2021. AutoML: A survey of the state- Discovery 37, 1 (2023), 434–475. https://doi.org/10.1007/s10618-022-00884-7
of-the-art. Knowledge-Based Systems 212 (2021), 106622. https://doi.org/10.1016/j. [36] ETSI GS ZSM. 2019. Zero-touch network and service management (zsm); reference
knosys.2020.106622 architecture. Group Specification ETSI GS ZSM.
[15] Qi Kang, Xiao Shuang Chen, Si Si Li, and Meng Chu Zhou. 2017. A Noise-Filtered
Under-Sampling Scheme for Imbalanced Classification. IEEE Transactions on Cy-
bernetics 47, 12 (2017), 4263–4274. https://doi.org/10.1109/TCYB.2016.2606104