
Zero-Touch Networks: Towards Next-Generation Network Automation

Mirna El Rajab, Li Yang, Abdallah Shami
Department of Electrical and Computer Engineering, Western University,
1151 Richmond St., London, Ontario, Canada N6A 3K7
Email addresses: [email protected] (Mirna El Rajab), [email protected] (Li Yang),
[email protected] (Abdallah Shami)

arXiv:2312.04159v1 [cs.NI] 7 Dec 2023
Preprint submitted to Computer Networks

Abstract
The Zero-touch network and Service Management (ZSM) framework repre-
sents an emerging paradigm in the management of the fifth-generation (5G)
and Beyond (5G+) networks, offering automated self-management and self-
healing capabilities to address the escalating complexity and the growing data
volume of modern networks. ZSM frameworks leverage advanced technolo-
gies such as Machine Learning (ML) to enable intelligent decision-making
and reduce human intervention. This paper presents a comprehensive survey
of Zero-Touch Networks (ZTNs) within the ZSM framework, covering net-
work optimization, traffic monitoring, energy efficiency, and security aspects
of next-generation networks. The paper explores the challenges associated
with ZSM, particularly those related to ML, which necessitate exploring
diverse network automation solutions. In this context, the study in-
vestigates the application of Automated ML (AutoML) in ZTNs, to reduce
network management costs and enhance performance. AutoML automates
the selection and tuning process of an ML model for a given task. Specif-
ically, the focus is on AutoML’s ability to predict application throughput
and autonomously adapt to data drift. Experimental results demonstrate
the superiority of the proposed AutoML pipeline over traditional ML in
terms of prediction accuracy. Integrating AutoML and ZSM concepts sig-
nificantly reduces network configuration and management efforts, allowing
operators to allocate more time and resources to other important tasks. The
paper also provides a high-level 5G system architecture incorporating Au-
toML and ZSM concepts. This research highlights the potential of ZTNs
and AutoML to revolutionize the management of 5G+ networks, enabling
automated decision-making and empowering network operators to achieve
higher efficiency, improved performance, and enhanced user experience.
Keywords: ZSM, 5G+ Networks, Network Optimization, Network Security,
Traffic Control, AutoML

1. Introduction
In today’s digital world, we rely on telecommunication networks for more
than just phone calls. From streaming movies to controlling smart home
devices, these networks have revolutionized the way we live, work, and com-
municate. As we move towards a world where the Internet of Things (IoT) is
becoming increasingly widespread, Next-Generation Networks (NGNs) have
grown ever more indispensable.
NGNs, such as the fifth-generation (5G) and the upcoming sixth-generation
(6G) networks, mark a milestone in telecommunications history; these net-
works represent not only an upgrade from their predecessors but also a
paradigm shift in terms of speed, latency, capacity, and reliability - un-
locking new possibilities for emerging applications and service areas. Ac-
cording to the International Telecommunication Union's IMT-2020 vision, the
three core service areas for 5G networks are enhanced Mobile Broadband (eMBB),
ultra-Reliable Low-Latency Communication (uRLLC), and massive Machine-
Type Communication (mMTC) [1]. Each service area addresses specific use
cases such as multimedia content access (eMBB), mission-critical applica-
tions (uRLLC), or smart cities (mMTC).
NGNs have the potential to meet the challenging requirements of future
use cases, but to fully realize this potential, they must be designed as
highly flexible and programmable infrastructures
that are context-aware and service-aware. Advancements such as Software
Defined Networking (SDN), Network Function Virtualization (NFV), Multi-
access Edge Computing (MEC), and network slicing play a pivotal role in
the network architecture [2]. These technologies will open up new business
models, such as multi-domain, multi-service, and multi-tenancy models, to
support new markets.
The growth of NGNs has brought with it new challenges, particularly
in terms of network management. As networks become more intricate, tra-

ditional manual methods for configuring, deploying, and maintaining them
become cumbersome, time-consuming, and error-prone [3]. To tackle this
issue, various efforts have been made to introduce intelligence and reason-
ing into mobile networks for automation and optimization purposes. These
efforts include active networks [4], self-organizing networks [5], autonomic
network management [6], and Zero-Touch Networks (ZTNs) [7].
The ZTN approach has emerged as a fully automated management so-
lution, enabling the network to analyze its current state, interpret it, and
provide suggestions for possible reconfigurations - while leaving validation
and acceptance up to a human operator [8]. Implementing ZTN concepts
and technologies will be essential for operators to achieve greater levels of
automation, improve network performance, and reduce time-to-market for
new features. ZTN-based solutions are available for a diverse set of prob-
lems, from managing resources to ensuring network security and privacy.
The European Telecommunications Standards Institute (ETSI) has taken an in-
terest in shifting towards ZTN-based solutions. In 2017, ETSI created a
Zero-touch network and Service Management (ZSM) Industry Specification
Group (ISG), to define the requirements and architecture for a network au-
tomation framework based on ZTN concepts [9]. ZSM will ensure that NGNs
remain responsive to evolving user needs and demands via proactive network
management techniques. These techniques leverage the power of Artificial
Intelligence (AI) and Machine Learning (ML) to automate and optimize net-
work operations, enabling efficient resource allocation, dynamic service pro-
visioning, and predictive maintenance. AI and ML are technologies that
enable systems to simulate human intelligence, learn from data, and make
intelligent decisions or predictions.
ZSM still faces significant ML challenges, such as the need for effec-
tive feature engineering, algorithm selection, and hyperparameter tuning.
Thus, there is a need to explore other network automation solutions that can
complement ZSM in order to achieve higher levels of automation and effi-
ciency. One such solution is Automated ML (AutoML), which helps address
these challenges by automating the ML pipeline and improving the efficiency
and effectiveness of the ZSM solution. AutoML handles crucial tasks such
as data preprocessing, feature engineering, model selection, hyperparameter
tuning, model evaluation, and even model updating. By automating these
processes, AutoML significantly reduces the manual effort needed to develop
high-performing ML models.
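
To make the scope of this automation concrete, the snippet below gives a minimal, self-contained sketch of only the model-selection and hyperparameter-tuning stages using scikit-learn. The synthetic dataset, the two candidate models, and their search grids are illustrative placeholders, not the AutoML pipeline evaluated later in this paper.

# Minimal sketch of automated model selection and hyperparameter tuning.
# Real AutoML frameworks also automate preprocessing, feature engineering,
# model evaluation, and model updating.
from sklearn.datasets import make_regression
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import Ridge

X, y = make_regression(n_samples=500, n_features=10, noise=0.1, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Candidate models and their (placeholder) hyperparameter grids.
search_space = [
    (Ridge(), {"alpha": [0.1, 1.0, 10.0]}),
    (RandomForestRegressor(random_state=0), {"n_estimators": [50, 100]}),
]

best_score, best_model = float("-inf"), None
for model, grid in search_space:
    search = GridSearchCV(model, grid, cv=3, scoring="r2")
    search.fit(X_train, y_train)
    if search.best_score_ > best_score:
        best_score, best_model = search.best_score_, search.best_estimator_

print("selected:", best_model, "test R2:", best_model.score(X_test, y_test))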
Accordingly, this survey aims to provide a comprehensive overview of

ZSM in 5G and Beyond (5G+) networks, with a focus on network optimiza-
tion, energy efficiency, network security, and traffic control. By highlighting
the ML challenges in ZSM and exploring the potential of AutoML, this survey
aims to contribute to the development of more effective network automation
solutions. In particular, this survey offers the following notable contributions:

1. Review of current standards, architectures, and projects: The paper
examines the existing ZSM standard, its reference architecture, and
other related projects. This analysis serves as a valuable reference and
enhances the understanding of the current landscape in ZSM.

2. In-depth analysis of ZSM in 5G+ networks: The paper comprehensively
examines various aspects of ZSM in NGNs. It delves into network
optimization, energy efficiency, network security, and traffic control,
providing a thorough understanding of zero-touch applications.

3. Identification of ML challenges in ZSM: The paper highlights the chal-
lenges associated with applying ML techniques in the context of ZSM.

4. Exploration of network automation solutions: The paper explores the
potential of AutoML and Digital Twins (DTs) as viable solutions for
addressing the ML-related challenges in ZSM.

5. Detailed breakdown of the AutoML pipeline: The paper provides a
comprehensive understanding of the steps involved in the AutoML
pipeline and reviews existing techniques for each step. This breakdown
aids in the implementation and application of AutoML in NGNs.

6. Application of AutoML in ZSM: Through a detailed case study, the
paper demonstrates the real-world application of an online AutoML
pipeline for network traffic tasks within the ZSM framework.

7. Discussion of research challenges and future directions: The paper dis-
cusses the research challenges in ZSM and AutoML, suggesting future
directions for further exploration and advancement.

In comparison to earlier review papers on the subjects of ZSM [10, 7]
and network automation solutions [8, 11], this survey stands out with the
following differences and improvements:
1. Comprehensive survey of zero-touch applications in NGNs: It is the
first paper to comprehensively explore Zero-Touch Network Operation
(ZNO) applications, spanning network optimization, traffic control, en-
ergy efficiency, and security in 5G+ networks.

2. Addressing ML challenges with innovative solutions: Unlike previous
papers, this paper tackles ML challenges in ZSM and proposes AutoML
and DTs as potential automation solutions.

3. Practical case study of online AutoML: This paper presents the first
case study applying an online AutoML pipeline to a real network traffic
task within the ZSM context. Additionally, it outlines a high-level
architecture integrating AutoML and ZSM concepts into a 5G system.

The remainder of this paper is organized as illustrated in Figure 1,
providing a comprehensive analysis of ZSM in NGNs. Section 2
lists all the acronyms used in this paper. Section 3 lays the foundation for
ZSM by exploring the pillars of the framework, including the ML paradigm
and 5G+ networks along with their enablers, SDN and NFV. Section 4 takes
a deep dive into ZSM, examining its role in managing the complexities of
5G+ networks. We explore the current standard’s requirements, reference
architecture, and use of intents, as well as other related projects. In Section
5, we analyze network resource management and optimization, looking at
dynamic resource allocation, network slicing, and MEC. Section 6 focuses
on network traffic control, from traffic prediction and classification to intelli-
gent routing. Additionally, in Sections 7 and 8, we analyze ZSM’s potential
in terms of energy efficiency and network security, respectively, addressing
5G+ security measures and weaknesses, ZSM security threats, and recent
advances in 5G+ network trust management in the latter section. In Section
9, we explore different network automation solutions to tackle certain ML
challenges. In particular, automation solutions, such as DTs and AutoML,
have significant importance in a world where communication networks are
integral to daily life. Section 10 further studies AutoML through a use-case
to predict application throughput and see how it fits in a ZSM framework.
In Section 11, we discuss the research challenges in this field and future lines
of work. Finally, Section 12 concludes the survey.

Figure 1: Survey Outline

2. List of Acronyms
Artificial Intelligence Acronyms
AI Artificial Intelligence
AutoML Automated Machine Learning
ANN Artificial Neural Network
CNN Convolutional Neural Network
DDPG Deep Deterministic Policy Gradient
DL Deep learning
DRL Deep Reinforcement Learning
DQN Deep Q-Network
GRU Gated Recurrent Unit
LSTM Long Short-Term Memory
MAE Mean Absolute Error
MAPE Mean Absolute Percentage Error
ML Machine Learning
MLP Multi-Layer Perceptron
NAS Neural Architecture Search
PCA Principal Component Analysis
Seq2Seq Sequence-to-Sequence
SL Supervised Learning
SVM Support Vector Machine
RL Reinforcement Learning
RNN Recurrent Neural Network
XAI Explainable Artificial Intelligence

Network Management Acronyms
AREL3P Adapted REinforcement Learning VNF Performance
Prediction module for Autonomous VNF Placement

CLARA Closed Loop-based zero-touch network mAnagement fRAmework

DASA Dynamic Auto-Scaling Algorithm

DLT Distributed Ledger Technology

DT Digital Twin

HARNESS High Availability supportive self-Reliant NEtwork Slicing System

MANO Management and Orchestration

MD Management Domain

MonB5G Distributed Management of Network Slices in Beyond 5G

NAP Novel Autonomous Profiling

NSOS Network Slicing Orchestration System

PFR Proactive Failure Recovery

RIRM Reliable Intelligent Routing Mechanism

TM TeleManagement

ZNO Zero-Touch Network Operation

ZOOM Zero-touch Orchestration, Operations and Management

ZSM Zero-Touch Network and Service Management

ZTN Zero-Touch Network

ZT-PFR Zero-Touch PFR

Performance Metrics Acronyms
CAPEX Capital Expenditure

CPU Central Processing Unit

E2E End-to-End

KPI Key Performance Indicator

OPEX Operational Expenditure

QoE Quality of Experience

QoS Quality of Service

SLA Service Level Agreement

Next-Generation Networks Acronyms


5G+ 5G and Beyond

AES Application Edge Slice

AMF Access and Mobility Management Function

B5G Beyond 5G

CN Core Network

eMBB enhanced Mobile Broadband

MEC Multi-access Edge Computing

mMTC massive Machine-Type Communication

gNB gNodeB

NFV Network Function Virtualization

NGN Next-Generation Network

NSSF Network Slice Selection Function

NSSMF Network Slice Subnet Management Function
NWDAF Network Data Analytics Function
RAN Radio Access Network
SDN Software Defined Network
UE User Equipment
UPF User Plane Function
uRLLC ultra-Reliable Low-Latency Communication
VNF Virtual Network Function
mmWave millimeter wave

Network Security Acronyms


DDoS Distributed Denial of Service
DOS Denial of Service
IDS Intrusion Detection System
MitM Man in the Middle
MUD Manufacturer Usage Description
TEE Trusted Execution Environment
TRM Trust and Reputation Manager

Organizations and Programs


ENI Experiential Network Intelligence
ETSI European Telecommunications Standards Institute
H2020 Horizon 2020
ISG Industry Specification Group

General Telecommunication Acronyms
API Application Programming Interface
CSP Communication Service Provider
IoT Internet of Things
RSRP Reference Signal Received Power
RSRQ Reference Signal Received Quality
RSSI Received Signal Strength Indicator
FANET Flying Ad Hoc Network
V2X Vehicle-to-Everything
VPN Virtual Private Network
VM Virtual Machine
VR Virtual Reality
USR User Service Request
WLAN Wireless Local Area Network

3. Background
As the telecommunications industry moves towards the deployment of
5G+ networks and the implementation of ZSM, advanced technologies such
as AI/ML, SDN, and NFV are becoming essential components of network
infrastructure. Such advancements support intelligent, flexible, and auto-
mated network management which in turn allow for efficient and scalable
operations for network operators and service providers. AI/ML optimizes
network performance, predicts and prevents network failures, and automates
network management tasks. SDN provides a simpler approach to network
management by dividing the control and data planes and enabling the dy-
namic allocation of network resources. NFV enables the deployment of net-
work functions as software on general-purpose hardware, eliminating the
need for proprietary hardware by decoupling network functions from it.

3.1. Artificial Intelligence and Machine Learning Paradigms
AI refers to “the simulation of human intelligence in machines that are
programmed to think like humans and mimic their actions” [12]. In other
words, AI refers to the ability of systems to imitate human cognitive func-
tions such as learning. ML is an application of AI that enables machines to
learn from large volumes of data and make predictions without directly being
instructed. ML is considered a subset of AI. ML can be further divided into
three main categories: supervised, unsupervised, and reinforcement learning
[13]. Table 1 provides an overview of common techniques used in each ML
category, outlining their strengths and weaknesses to provide a comprehen-
sive understanding of each approach [14, 15, 16, 17, 18, 19].

Table 1: An Overview of Traditional ML Algorithms [14, 15, 16, 17, 18, 19]

Linear Regression
  Description: A statistical model for predicting continuous variables based on one or more independent variables.
  Advantages: Easy to understand; avoids overfitting through regularization.
  Limitations: Oversimplification of real-world problems; limited to linear relationships.

Logistic Regression
  Description: A statistical model for binary classification based on input features.
  Advantages: Simple to implement; computationally efficient; no need for feature scaling.
  Limitations: High reliance on proper presentation of data; inability to solve non-linear problems.

Decision Trees
  Description: A tree-based model that partitions data into subsets based on feature values and predicts the class/value of a new instance by traversing the tree from the root to a leaf node.
  Advantages: Easy to interpret and visualize; processes categorical and continuous features; handles missing values.
  Limitations: Prone to overfitting; sensitive to data; may produce a locally optimal solution.

Random Forest
  Description: An ensemble method combining multiple decision trees through averaging to improve the accuracy and robustness of classification and regression tasks.
  Advantages: Robust performance with high-dimensional datasets; deals with imbalanced datasets; extracts feature importance.
  Limitations: Slow in predictions; appears as a black box.

Naïve Bayes
  Description: A probabilistic algorithm that uses Bayes' theorem to predict the class of new data based on the conditional probability of features given a class.
  Advantages: Addresses multi-class classification problems; insensitive to irrelevant features.
  Limitations: Assumes independent features; handles discrete datasets better than continuous datasets.

Support Vector Machine
  Description: A linear model for classification and regression that finds the best hyperplane to separate data points into different classes in a high-dimensional feature space.
  Advantages: Works with high-dimensional data; handles non-linear relationships through kernel functions.
  Limitations: Slow training with large datasets; poor performance with noisy data.

K-Nearest Neighbors
  Description: A non-parametric method for classifying data points based on the majority class among the K training instances closest to it in the feature space.
  Advantages: Simple; handles multi-class problems; makes no assumption about the data.
  Limitations: Slow for large datasets; suffers the curse of dimensionality; sensitive to outliers.

K-Means
  Description: A clustering algorithm that partitions n observations into k clusters based on the similarity of features.
  Advantages: Simple and easy to implement; computationally efficient, making it suitable for large datasets.
  Limitations: Assumes spherical, equally-sized clusters; sensitive to the initial position of centroids.

Principal Component Analysis
  Description: An orthogonal linear transformation technique that transforms the data into a lower-dimensional space.
  Advantages: Effective feature extraction; reduces dimensionality for better analysis.
  Limitations: May result in loss of information; assumes a linear relationship between features.

Feed-Forward Neural Networks
  Description: A network of interconnected processing nodes that learns to recognize patterns in data by adjusting the inter-node weights during training.
  Advantages: Can handle complex non-linear relationships; work well with large datasets.
  Limitations: Need abundant training data; prone to overfitting; difficult to interpret.

Convolutional Neural Networks
  Description: An ANN that uses convolutional layers to automatically learn spatial hierarchies and extract features from image and video data.
  Advantages: Effective in image and video analysis; reduce the number of parameters through weight sharing; learn hierarchical representations.
  Limitations: Complex computations for convolutional and pooling operations; require large amounts of data.

Recurrent Neural Network
  Description: An ANN that can handle sequential data by using loops to maintain a hidden state that incorporates past information.
  Advantages: Well-suited for sequence-related tasks and time-series data; can handle variable-length inputs and outputs.
  Limitations: Incapable of capturing long-term dependencies; sensitive to the exploding and vanishing gradient problems.

Long Short-Term Memory RNN
  Description: A RNN that can handle long-term dependencies by using a memory cell to selectively forget and remember information.
  Advantages: Can handle long-term dependencies; addresses the exploding and vanishing gradient problems.
  Limitations: Computationally expensive in terms of memory bandwidth; not well-suited for parallelization.

Autoencoders
  Description: Unsupervised NNs that learn data representations by compressing high-dimensional data into a lower-dimensional latent space and then reconstructing it with a decoder.
  Advantages: Can handle unlabeled data; low complexity; preserve useful patterns.
  Limitations: High reconstruction error for complex and noisy data; prone to layer-by-layer errors; limited interpretability.

Q-Learning
  Description: A value-based approach based on the Q-Table, which calculates the maximum expected future reward for each action at each state, to later learn an optimal action-value function.
  Advantages: Suitable when a training set is not available; model-free algorithm.
  Limitations: Only suitable for problems with small state spaces; slow convergence due to initial zero Q-values.

Supervised Learning (SL) involves training a model on labeled data to
make predictions based on input-output relationships [13]. Common SL al-
gorithms include linear and logistic regression, K-nearest neighbors, Support
Vector Machine (SVM), naïve Bayes, decision trees, and random forests.
Deep learning (DL), a subset of SL, involves the use of Artificial Neural Net-
works (ANNs) to model intricate input-output mappings. DL models, such
as Convolutional Neural Networks (CNNs), are particularly effective for im-
age processing, whereas Recurrent Neural Networks (RNNs), such as Long
Short-Term Memory (LSTM), are commonly used for sequence modeling in
natural language processing tasks.
Unsupervised learning involves training a model on unlabeled data where
the algorithm learns to identify patterns and structure within the data [13].
Common techniques used in unsupervised learning include K-means cluster-
ing and Principal Component Analysis (PCA) for dimensionality reduction.

DL also has applications in unsupervised learning, such as leveraging autoen-
coders to learn the underlying representation of data.
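
As a minimal illustration of this category, the sketch below combines the two techniques just mentioned, PCA for dimensionality reduction followed by K-means clustering, on a synthetic unlabeled dataset generated with scikit-learn; the data and parameter choices are placeholders.

# Unsupervised learning sketch: reduce dimensionality with PCA, then cluster
# the projected points with K-means. No labels are used at any point.
from sklearn.datasets import make_blobs
from sklearn.decomposition import PCA
from sklearn.cluster import KMeans

X, _ = make_blobs(n_samples=300, n_features=8, centers=4, random_state=0)

X_2d = PCA(n_components=2).fit_transform(X)            # project to 2 dimensions
labels = KMeans(n_clusters=4, n_init=10, random_state=0).fit_predict(X_2d)

print(labels[:10])   # cluster assignments discovered without any labels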
Reinforcement Learning (RL) is a feedback-based, environment-driven
approach in which an agent learns to behave in an environment through trial
and error. The ultimate objective is to improve performance by maximizing
a reward signal [13]. Common techniques used in RL include Q-learning and
policy gradient methods. The latter optimizes policy parameters directly
using gradient-based optimization, utilizing the policy gradient theorem to
compute gradients of the expected cumulative reward. While useful for tasks
involving continuous action spaces like robotics control, these methods may
have slow convergence rates and suffer from high variance. RL has evolved
towards Deep Reinforcement Learning (DRL), where deep neural networks
are utilized to model the value function (value-based), the agent’s policy
(policy-based), or both (actor-critic). DRL is most beneficial in problems
with high-dimensional state space.
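
The tabular Q-learning update at the core of the value-based approach can be written in a few lines. The sketch below uses a toy chain environment as a stand-in for a real control problem and is meant only to illustrate the update rule Q(s,a) <- Q(s,a) + alpha * (r + gamma * max_a' Q(s',a') - Q(s,a)); the environment and hyperparameters are illustrative assumptions.

import random
from collections import defaultdict

class ChainEnv:
    """Toy environment: walk along a 5-state chain; reward 1 for reaching the end."""
    def __init__(self, length=5):
        self.length = length

    def reset(self):
        self.pos = 0
        return self.pos

    def step(self, action):            # action: +1 (right) or -1 (left)
        self.pos = min(max(self.pos + action, 0), self.length - 1)
        done = self.pos == self.length - 1
        return self.pos, (1.0 if done else 0.0), done

def q_learning(env, actions, episodes=500, alpha=0.1, gamma=0.9, epsilon=0.1):
    Q = defaultdict(float)             # Q-values start at zero (hence slow early convergence)
    for _ in range(episodes):
        state, done = env.reset(), False
        while not done:
            # epsilon-greedy action selection
            if random.random() < epsilon:
                action = random.choice(actions)
            else:
                action = max(actions, key=lambda a: Q[(state, a)])
            next_state, reward, done = env.step(action)
            best_next = max(Q[(next_state, a)] for a in actions)
            # Q(s,a) <- Q(s,a) + alpha * (r + gamma * max_a' Q(s',a') - Q(s,a))
            Q[(state, action)] += alpha * (reward + gamma * best_next - Q[(state, action)])
            state = next_state
    return Q

Q = q_learning(ChainEnv(), actions=[+1, -1])
print(max(Q, key=Q.get))   # highest-valued (state, action) pair learned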
AI/ML technologies are seen as foundational pillars for network automa-
tion in terms of development, configuration, and management [12]. They
will play a crucial role in achieving a new level of automation and intel-
ligence toward network and service management. Additionally, they will
enhance network performance, reliability, and adaptability through a series
of real-time and robust decisions based on predictions of network and user
behavior, such as user traffic.

3.2. Software Defined Network


Softwarization refers to the concept of running a specific functionality in
software instead of hardware, thus breaking its relationship with the under-
lying hardware [20]. The benefits of softwarization lie in terms of decreased
deployment time in addition to reduced Capital Expenditure (CAPEX) and
Operational Expenditure (OPEX) when new functions are introduced. Soft-
warization also ensures high degrees of flexibility and reconfigurability.
One primary example of softwarization is SDN. SDN consists of SDN
applications, an SDN controller, and an SDN datapath [20]. The SDN appli-
cation is the software that runs network functionalities, such as routing. It
receives an abstract view of the network through an SDN controller, which
is a logically centralized unit. The controller also translates the inputs re-
ceived from the applications down to the switches (i.e., physical network).
Another function of the controller is instructing and configuring a set of net-
work devices to perform certain actions, such as packet forwarding. This set

of network devices is known as the SDN datapath.
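
The split between application, controller, and datapath can be illustrated with a short, purely hypothetical sketch: a routing application expresses an abstract path, and the controller translates it into per-switch forwarding rules. The class and method names below are illustrative and do not correspond to any particular SDN controller API.

# Hypothetical illustration of the SDN split described above.
class Switch:                          # SDN datapath element
    def __init__(self, name):
        self.name, self.flow_table = name, []

    def install_rule(self, match, action):
        self.flow_table.append((match, action))

class Controller:                      # logically centralized control plane
    def __init__(self, switches):
        self.switches = {s.name: s for s in switches}

    def apply_route(self, dst_prefix, path):
        # translate the application's abstract path into per-switch forwarding rules
        for hop, out_port in path:
            self.switches[hop].install_rule({"ip_dst": dst_prefix},
                                            {"output": out_port})

# "Routing application": decides that traffic to 10.0.2.0/24 follows s1 -> s2
ctrl = Controller([Switch("s1"), Switch("s2")])
ctrl.apply_route("10.0.2.0/24", [("s1", 2), ("s2", 1)])
print(ctrl.switches["s1"].flow_table)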

3.3. Network Function Virtualization


Virtualization improves the software/hardware decoupling of the soft-
warization paradigm by creating virtual instances of dedicated hardware
platforms, operating systems, storage devices, and computer networking re-
sources [20]. In other words, software can run on commercial off-the-shelf
equipment by utilizing a Virtual Machine (VM) or a Docker container. This
is similar to softwarization, as virtualization provides flexibility (VMs can be
migrated across different platforms) and reduces CAPEX and OPEX (intro-
ducing new services involves creating new VMs without any effort from the
hardware side) [21].
From a network perspective, virtualization is associated with the con-
cept of NFV, where network node functions (such as firewalls and switches)
are virtualized and decoupled from the underlying hardware [2]. These soft-
warized network functions are known as Virtual Network Functions (VNFs)
and can consist of one or more VMs/containers.

3.4. Advancing Mobile Connectivity: From 5G to Beyond


The fifth generation of cellular technology has dominated research in the
broad information and communication technology field. It was designed to
increase speed, reduce latency, and improve the flexibility and adaptability
of wireless services. Additionally, 5G supports and enhances a wide range of
applications, including autonomous vehicles, online gaming, and voice over
IP. 5G is expected to meet a diverse set of Key Performance Indicators (KPIs)
for eMBB, mMTC, and uRLLC use cases [22, 23]. The 5G eMBB service is
characterized by its peak data rate, which ranges from 10 to 20 Gbps and
is suitable for applications such as 4K media. Another 5G use case is the
mMTC service, which supports a high device density of up to 1 million per
square kilometer for applications such as a smart city. Regarding uRLLC
applications, such as mission-critical applications and self-driving cars, 5G
uRLLC service is expected to offer 1 ms air interface latency and achieve
six-nines (99.9999%) network availability.
Softwarization and virtualization are two complementary key enablers
for providing flexibility in 5G networks [20]. Cloud-based environments run
and move on-demand VNFs, while SDN dynamically changes the network
topology according to the load and service requirements.

As we move towards the future, we are witnessing the emergence of Be-
yond 5G (B5G), also known as 6G. This revolutionary technology is set to
take mobile communications to unprecedented heights by building upon the
capabilities of 5G [24]. B5G promises to offer even faster speeds, lower la-
tency, and greater capacity than 5G [25]. Specifically, B5G is envisioned
to support data transfer rates of up to 1 Tbps, a monumental leap forward
from 5G’s maximum data transfer rate of 20 Gbps. It also aims to drastically
reduce latency to a sub-millisecond range of 10 µs to 100 µs, which is a minus-
cule fraction of 5G’s latency of less than 1 ms. Additionally, B5G is expected
to support a staggering number of connected devices, with an expected ca-
pacity of up to 10 million devices/km², a tenfold increase from 5G's capacity
of up to 1 million devices/km².
B5G intends to support a plethora of new and emerging use cases, such
as terahertz communication providing ultra-high data rates and low latency
[26, 27]. In addition to these performance enhancements, B5G will bring new
capabilities such as advanced network slicing, edge computing, and network
intelligence. These capabilities will enable more efficient and flexible network
operations, as well as new business models and revenue streams. B5G will
also focus on energy efficiency, sustainability, and security, ensuring that it
is not only technologically advanced but also environmentally conscious and
secure [26]. This represents a leap forward, not just in terms of technology
but also in terms of societal impact.

4. Zero-Touch Network and Service Management Overview


4.1. Need for ZSM in 5G+ Networks
Society’s constant desire for seamless connectivity, high capacity, and new
services has paved the way for 5G+ networks. The 5G+ concept represents a
generational leap in NGNs and aims to revolutionize the telecommunications
industry with new spectrum frequencies, a new Core Network (CN), a new
Radio Access Network (RAN), and its adopted new radio. Softwarization
and virtualization, enabled by technologies such as SDN and NFV, which
decouple network functions from the underlying hardware, are considered
the foundations of 5G+ networks. While this approach guarantees a high
degree of flexibility and reconfigurability while reducing CAPEX and OPEX,
it also results in a complex 5G/6G architecture that network operators find
challenging to manage and operate due to the static nature of traditional
Management and Orchestration (MANO) techniques [3]. Therefore, there is

a need to realize the vision of zero-touch networks and service management to
enable automated orchestration and management of network resources and to
ensure End-to-End (E2E) Quality of Experience (QoE) guarantees for end-
users. The goal is an autonomous network whose services are governed by
high-level policies and rules (aka intents), capable of offering
Self-X life-cycle operations (self-serving, self-fulfilling, and self-assuring) with
minimal, if any, human intervention [28]. To achieve this, ETSI established
the ZSM ISG in 2017. The group’s objective is to create a framework to en-
able fully-autonomous network operation and service management for 5G+
networks capable of self-{configuration, monitoring, healing, optimization}
[7].

4.2. Current Standard - ETSI ZSM


The ultimate goal of ETSI ZSM is to create a framework that enables
full E2E automation of network and service management in a multi-domain
environment. The ZSM framework comprises operational processes that are
automatically executed without human intervention, such as [8]:
• Design and planning to create new services that meet users’ needs.
• Delivery to enable the on-demand delivery of services while satisfying
requirements.
• Deployment to enhance network and resource utilization.
• Provisioning to reduce manual configuration errors.
• Monitoring and optimizing to avoid service degradation and ensure a
fast recovery.
The ZSM ISG has already released a reference architecture that moves away
from rigid management systems and towards more flexible services [9]. The
key principles, requirements and components are elaborated next in Sections
4.2.1, 4.2.2 and 4.2.3, respectively.

4.2.1. Key Architecture Principles


The ZSM reference architecture defines a set of building blocks, as shown
in Figure 2, that can be integrated to build more complex management ser-
vices and functions following a set of composition and interoperation pat-
terns. This approach makes the architecture modular, scalable, and exten-
sible. It is also resilient to failure as management services are devised to

cope with the degradation of other services and/or the infrastructure. These
services can also be combined to create new management services, which
is referred to as service composability. In terms of management functions,
stateless functions that separate processing from data storage are also sup-
ported.
Separation of concerns in management is another key principle behind
ZSM. This principle differentiates two management concerns: Management
Domain (MD) and E2E cross-domain service management (i.e., across MDs).
Within the former, services are managed based on their respective resources.
In the latter, E2E services that span multiple MDs are managed, and coordi-
nation between MDs is orchestrated. This principle ensures non-monolithic
systems and reduces the complexity of the E2E service. To automate service
assurance, closed-loop management automation is used to achieve and main-
tain a set of objectives without any external disruption. The architecture is
also coupled with intent-based interfaces that express consumer requests in
an interpretable form and offer high-level abstraction. Overall, the architec-
ture is of minimal complexity and meets all the functional and non-functional
requirements that are discussed next [9, 10].

4.2.2. Architecture Requirements


The ZSM reference architecture follows a set of requirements that are di-
vided into non-functional, functional, and security requirements. Functional
requirements define what the system must do, such as automating the de-
ployment and management of network functions and services. Non-functional
requirements specify how well the system must perform, such as scalability
and reliability. Security requirements dictate how to protect the system and
its data from cyber threats by implementing measures such as encryption
and access control. These requirements are extracted from ETSI GS ZSM
002 [9] and summarized in Table 2.

(i) Non-functional requirements refer to the qualities or characteristics
that the ETSI ZSM framework must possess in order to operate ef-
fectively and efficiently.

• General Requirements:
(a) Realize a certain degree of availability.
(b) Become energy efficient.

Table 2: ETSI ZSM Framework Requirements

Non-Functional Requirements
  General: Availability, energy efficiency, independence from vendors, monitoring requirements
  Cross-Domain Data Services: High data availability, QoS support, task completion
  Cross-Domain Service Integration: On-demand service addition/removal, service versions co-existence, seamless integration of new/legacy functions

Functional Requirements
  General: Resource/service management, closed-loop management, E2E services support, automation of processes
  Data Collection: Real-time data collection, data processing and governance, metadata attachment, data sharing, and access
  Cross-Domain Data Services: Data storage/processing/sharing management, automation of data management processes (recovery, redundancy, overload, service failover, processing, and services with distinct data types)
  Cross-Domain Service Integration and Access: Service discovery and registration, synchronous/asynchronous communication, service invocation
  Lawful Intercept: Uninterrupted lawful interception

Security Requirements
  Security and Privacy: Prioritizing privacy of personal data, ensuring the security of data and resources, applying security policies based on compliance status
  Availability: Availability of data and services, authorized access to services by authenticated users
  Attack Prevention: Automation of attack detection and prevention, supervision of ML privacy decisions

(c) Achieve independence from vendors, operators, and service
providers.
(d) Follow specific monitoring requirements.
• Requirements for Cross-Domain Data Services:
(a) Realize high data availability.
(b) Support Quality of Service (QoS) specifications for data ser-
vices within and outside the framework.
(c) Complete tasks within a preset timeframe.
• Requirements for Cross-Domain Service Integration:
(a) Support the on-demand addition and removal of services.
(b) Support the co-existence of different service versions simulta-
neously.
(c) Avoid any changes to the management functions when inte-
grating services.
(d) Allow integration of new and legacy functions.

(ii) Functional requirements refer to the features and capabilities that the
ETSI ZSM framework must have in order to perform its intended func-
tions.

• General Requirements:
(a) Manage resources and services provided by the MDs.
(b) Support cross-domain management of E2E services.
(c) Support closed-loop management.
(d) Support technology domains needed for an E2E service.
(e) Support access control to services within the MD.
(f) Support open interfaces.
(g) Support hiding the management complexity of MDs and E2E
services.
(h) Automate constrained decision-making processes.
(i) Promote automation of operational life-cycle management func-
tions.
• Requirements for Data Collection:
(a) Allow the collection and storage of real-time data.

(b) Enable the preprocessing and filtering of collected data.
(c) Support attaching metadata to collected data.
(d) Allow common access to the collected data across the MDs.
(e) Support the aggregation of collected data cross-domain.
(f) Enforce data governance by supporting various degrees of data
sharing/collection velocity and volume.
(g) Manage the data distribution to maintain consistency.
(h) Provide data to the consumer based on their requirements.
• Requirements for Cross-Domain Data Services:
(a) Allow separation of data storage and processing.
(b) Logically centralize the storage/processing of data.
(c) Enable data sharing within the framework.
(d) Automate management of redundant data.
(e) Automate overload handling of data services.
(f) Automate data service failover.
(g) Automate data recovery.
(h) Automate policy-based data processing.
(i) Automate the processing of data services with distinct data
types.
• Requirements for Cross-Domain Service Integration and Access:
(a) Enable the discovery and registration of management services.
(b) Provide information on accessing the discovered service.
(c) Invoke services indirectly or directly (by the consumer).
(d) Support both synchronous and asynchronous communication
between the consumer and the service provider.
• Requirements for Lawful Intercept: ZSM architecture must en-
sure that lawful interception is not interrupted regardless of any
management service performed by the framework.

(iii) Security requirements refer to the measures that must be taken to en-
sure the security and privacy of the network and its data.

(a) Ensure the security of data, whether at rest, in transit, or in use.

(b) Ensure the security of resources in addition to management ser-
vices and functions.
(c) Provide special attention to the privacy of personal data by utiliz-
ing mechanisms such as privacy-by-design or privacy-by-default.
(d) Ensure the availability of data, resources, functions, and services.
(e) Apply relevant security policies based on the compliance status of
services regarding security requirements.
(f) Allow authorized access to services by authenticated users.
(g) Automatically detect, identify, prevent, and mitigate attacks.
(h) Supervise decisions of ML/AI regarding privacy and security to
prevent attacks from spreading.

4.2.3. Reference Architecture


The ZSM framework reference architecture, as shown in Figure 2, inte-
grates MDs, E2E service MD, cross-domain data services, and intra- and
inter- integration fabrics. Self-contained and loosely-coupled services are
found within the MD. Each MD handles the automation of orchestration,
control, and guarantee of resources and services within its domain. These
managed resources might be physical (e.g., physical network functions), vir-
tual (e.g., VNFs) and/or cloud resources (e.g., ”X-as-a-service” resources) [3].
E2E services that span multiple MDs coordinate across domains through or-
chestration. The E2E service MD oversees the management of such services.
Each MD, including E2E service MD, is composed of logically grouped man-
agement functions (data collection services, intelligence services, analytics
services, control services and orchestration services), intra-domain integra-
tion fabric in addition to data services that enable data sharing and autho-
rized access management across services within the MD. The management
functions produce and use management services; thus, they can be both pro-
ducers and consumers of the service. Each MD provides a set of management
services through service interfaces. Services that can only be consumed lo-
cally within the domain are provided via the intra-domain integration fabric,
while cross-domain services are enabled through the inter-domain integration
fabric. The inter-domain integration fabric also handles the communication
between management functions and ZSM framework consumers. Another
building block is the cross-domain data service that oversees data persistence
among MDs while also permitting processing tasks to run on the stored data
as a means to reach E2E global optimization. The data in the cross-domain
data services can be exploited by the intelligence services within E2E service
MDs and MDs to support cross-domain and domain-level AI-based closed-
loop automation.

Figure 2: ETSI ZSM Framework Reference Architecture [9]

4.3. Intents
Intents are a crucial component of network automation in zero-touch net-
works. They provide a high-level, abstract representation of the desired state
of the network, making it easier for network administrators to manage and
configure large and complex networks [29].
The main goal of intents is to make network configuration and manage-
ment more efficient, accurate, and scalable. They achieve this by allow-
ing network administrators to describe the network’s desired state using a
domain-specific language that is then translated into the underlying network
configurations [30]. This falls under the intent-driven management paradigm.
This paradigm eliminates the need for manual intervention, reducing the risk
of human error and freeing up time for more critical tasks.
The benefits behind intents in zero-touch networks are significant [31, 32].
Some of the key advantages include:

• Improved Accuracy: Intents allow network administrators to define the
desired state of the network more precisely and consistently, reducing
the risk of human error.

• Increased Efficiency: Intents automate network configuration and man-
agement tasks, reducing the time and effort required to maintain the
network.

• Enhanced Scalability: Intents can be used to manage large and complex
networks, enabling network administrators to quickly and easily make
changes to the network as it evolves.

• Improved Collaboration: Intents allow multiple stakeholders to work
together to define the desired state of the network, improving commu-
nication and collaboration among teams.

• Better Compliance: Intents provide a clear and consistent representa-
tion of the network's desired state, making it easier to ensure that the
network is compliant with regulatory requirements.

4.3.1. Example Use Case: Intent-Based Approach for Configuring a 5G Net-
work to Support a Virtual Reality Service
For instance, suppose a network operator is tasked with configuring a
5G network to support a new Virtual Reality (VR) service. The VR service
requires low latency and high bandwidth to provide an immersive experi-
ence for users. The network should also be able to dynamically allocate
network resources to meet the changing demands of the VR service. In a
traditional approach, the network operator would have to manually config-
ure the necessary network settings and policies to meet the VR service’s
requirements. However, in a zero-touch network, the network operator can
use an intent-based approach to simplify the process, as shown in Listing 1.

intent "VR Service" {
    allocate bandwidth "1 Gbps" for "VR Service"
    set latency "5 ms" for "VR Service"
    policy "Resource Management" {
        dynamically allocate bandwidth for "VR Service" based on demand
        prioritize "VR Service" over other network traffic
    }
}

Listing 1: Intent Example to Support VR in 5G Network

The intent in this example describes the desired state of the network as
follows:

1. Allocate 1 Gbps of bandwidth for the VR service.

2. Set a maximum latency of 5 ms for the VR service.

3. The “Resource Management” policy dynamically allocates bandwidth
for the VR service based on demand and prioritizes the VR service over
other network traffic.

Once the intent is specified, it can be translated into the necessary network
configurations and policies. These can then be automatically implemented
and enforced by the zero-touch network management system, resulting in
a network optimized to support the VR service with the aforementioned
requirements.
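
As a purely illustrative sketch of this translation step, the snippet below maps the intent of Listing 1 onto a set of hypothetical QoS and scaling parameters; a production ZSM system would rely on a formal intent model and standardized interfaces rather than the ad hoc dictionary and field names used here.

# Hypothetical intent-to-configuration translation for the intent in Listing 1.
intent = {
    "name": "VR Service",
    "bandwidth_gbps": 1,
    "max_latency_ms": 5,
    "policies": ["dynamic_bandwidth", "prioritize_over_other_traffic"],
}

def translate_intent(intent):
    """Map the abstract intent onto illustrative QoS and scaling parameters."""
    return {
        "qos_profile": {
            "guaranteed_bit_rate_mbps": intent["bandwidth_gbps"] * 1000,
            "packet_delay_budget_ms": intent["max_latency_ms"],
            "priority_level": 1 if "prioritize_over_other_traffic"
                                   in intent["policies"] else 5,
        },
        "autoscaling": "dynamic_bandwidth" in intent["policies"],
    }

print(translate_intent(intent))   # configuration to be enforced automatically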

4.3.2. Use of Intents in the ZSM Framework
Intent should serve as the sole method of communicating requirements
between the zero-touch system and human operators, as well as between
the different subsystems and layers of the management system. In the ZSM
framework, this means that the service specification provided by ZSM frame-
work consumers must be conveyed through an intent object. The E2E service
domain is responsible for translating it into sub-intents that specify specific
requirements for each MD. Communication based on intent objects is a uni-
versal mechanism that can be applied to any MD within the ZSM framework.
With intents, domain-specific semantics can be encapsulated in shared infor-
mation models, and endpoints based on intents can leverage a generic knowl-
edge management service for the life-cycle management of intent objects. In
line with this, Gomes et al. introduced a cutting-edge framework for the
management of autonomous networks within the ZSM framework [30]. This
framework leverages the concept of intent-based models, which are translated
into a set of rules and constraints that drive the configuration and operation
of the network, resulting in a closed-loop control mechanism. One of the key
features of the framework is its ability to continuously monitor the network’s
state and adjust its configuration and operation accordingly, ensuring that
it remains aligned with the specified intent. The framework employs feed-
back mechanisms to collect data from the network and update its configu-
ration and operation in real-time, allowing for rapid adaptation to changing
network conditions. In addition to its closed-loop control capabilities, the
framework also provides abstraction and simplification, presenting the net-
work operator with a simplified view of the underlying infrastructure. This
abstraction reduces the complexity of network management and enhances the
efficiency of decision-making. Results show that the framework can signifi-
cantly improve network performance and stability, compared to traditional
manual approaches to network management. Additionally, results show that
the framework can significantly reduce operational complexity, streamlining
network management and enabling the deployment of more advanced au-
tonomous networks. Future work includes investigating the interoperability
of the proposed framework based on the intent meta-models.
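
The monitor-compare-adjust cycle at the heart of such intent-driven closed loops can be sketched generically as follows; the measurement and reconfiguration hooks are hypothetical stand-ins for the monitoring and control services of a management domain, not part of the framework in [30].

# Generic closed-loop sketch: monitor the network, compare against the intent,
# and trigger a corrective reconfiguration when the observed state drifts.
import time

def closed_loop(intent_latency_ms, measure_latency, scale_bandwidth,
                interval_s=60, step_mbps=100):
    while True:
        observed = measure_latency()               # feedback from the network
        if observed > intent_latency_ms:           # drift away from the intent
            scale_bandwidth(+step_mbps)            # corrective action
        elif observed < 0.5 * intent_latency_ms:
            scale_bandwidth(-step_mbps)            # release unused resources
        time.sleep(interval_s)                     # next loop iteration

# e.g., closed_loop(5, measure_latency_fn, scale_bandwidth_fn) would run
# indefinitely inside a management domain's assurance loop.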
Another Closed Loop-based zero-touch network mAnagement fRAme-
work, CLARA, was developed by Sousa et al. [33]. CLARA’s two main
components are the closed-loop data plane and closed-loop control plane.
The closed-loop data plane is the component of the CLARA framework that

realizes and implements the intents defined in the intent definition language.
The data plane is designed to be programmable and flexible, allowing it to
adapt to changing network conditions and accommodate new network ser-
vices. The closed-loop control plane, on the other hand, is the component
of the CLARA framework that monitors and enforces the intents defined in
the intent definition language. It continuously receives feedback from the
network and adjusts its behavior to maintain the desired state of the net-
work. The control plane is responsible for detecting deviations from the
desired state and triggering corrective actions to restore the network to its
desired state. Similar to the results of the previous framework [30], Sousa et
al. demonstrate CLARA’s superiority over traditional network management
approaches in terms of automation, reliability, and scalability. Upcoming
initiatives include integrating ML algorithms into the framework to improve
its accuracy and efficiency. This could include algorithms for network opti-
mization, fault detection and diagnosis, and proactive network management.

4.4. Related Projects


Several organizations and projects are closely related to the ZSM frame-
work, such as the ETSI Experiential Network Intelligence (ENI) ISG and the
TeleManagement (TM) Forum. These organizations focus on exploring use
cases, architectural components, and interfaces related to network automa-
tion. Other projects, such as those funded by the EU’s Horizon 2020 (H2020)
program, are also working towards network automation.

4.4.1. ETSI ENI ISG


This group focuses on AI/ML techniques, context-aware policies, and
closed-loop mechanisms to design a cognitive network management archi-
tecture that provides an effective and adaptive service delivery experience.
ENI aims to enhance the entire management cycle of 5G networks, including
provisioning, operation, and assurance. Coronado et al. state that the out-
puts of ENI in terms of AI/ML algorithms, intent policies, and Service Level
Agreement (SLA) management are to promote service intelligence capabili-
ties in cross-domain cases [8]. Some ENI use cases, such as intelligent network
slice management, network fault identification/prediction, and assurance of
service requirements, are relevant to ZSM.

4.4.2. TM Forum Zero-touch Orchestration, Operations and Management
(ZOOM)
TM Forum’s ZOOM project aims to define a new management archi-
tecture of virtual networks and services through automated configuration,
provisioning, and assurance. The guiding principles of ZOOM include near
real-time request execution with no human intervention, open standard Ap-
plication Programming Interfaces (APIs), closed-loop control, and E2E man-
agement [3, 8]. These principles are also shared by ZSM networks.

4.4.3. Distributed Management of Network Slices in Beyond 5G (MonB5G)


This EU-funded H2020 project began on November 1st, 2019, and is
expected to run for three and a half years, ending on April 30th, 2023 [34].
MonB5G works towards providing zero-touch network slice orchestration and
management at massive scales for 5G+ networks. It proposes a hierarchical,
fault-tolerant, and automated data-driven network management system that
focuses on security and energy efficiency. The goal is to split the centralized
management system into distributed sub-systems where the intelligence and
decision-making processes will be split across various components.

4.4.4. Hexa-X
This is another EU-funded H2020 project representing a flagship for the
6G vision [35]. The objective is to interconnect three worlds, namely the hu-
man, physical, and digital worlds, via technology enablers. Over a duration
of 36 months, this project will focus on creating 6G use cases, developing
essential 6G technologies, and defining a new architecture for an intelligent
fabric that weaves together the key technology enablers. In the ZSM do-
main, the Hexa-X project defines AI/ML-driven orchestration as an essential
component for 5G+ networks, which will, in turn, support data-driven and
zero-touch approaches.

4.5. Zero-Touch Network Operations


ZNO is a key concept in the evolution of 5G+ networks, which aims to
automate network operations and reduce the need for manual intervention.
ZSM and AI/ML have a wide range of practical applications in ZNO.

1. Network resource management and optimization: Dynamic resource
allocation, for example, can be automated using ML algorithms to op-
timize resource utilization based on real-time network demands, while
network slicing enables the creation of virtualized network slices that
can be automatically customized for specific use cases. MEC can also
be leveraged to automate network functions at the edge, reducing la-
tency and improving the overall user experience.

2. Network traffic control: ML can be used to predict and classify network
traffic, enabling automated network management systems to optimize
network performance and improve the overall user experience. Intel-
ligent routing can also be used to automate the routing of network
traffic based on real-time network conditions, reducing congestion and
improving network efficiency.

3. Energy efficiency: By automating network operations, energy consump-
tion can be optimized based on real-time network demands, which is
critical in achieving the sustainability goals of 5G+ networks.

4. Network security and privacy: ML algorithms can be leveraged to au-
tomate network security functions, including threat detection and re-
sponse, improving the overall security posture of 5G+ networks.

These applications are further discussed in Sections 5-8. Table 3 presents a
comprehensive list of the corresponding schemes and their references.

Table 3: ZNO Applications & Corresponding Schemes

Resource Management
  Dynamic Resource Allocation: [36], [37], [38], [39], [40], [41]
  Network Slicing: [42], [43], [44], [45], [46], [47]
  Multi-access Edge Computing: [48], [49], [50]

Traffic Control
  Traffic Prediction & Classification: [51], [52], [53], [54]
  Intelligent Routing: [55], [56], [57], [58]

Energy Efficiency: [59], [60], [61]

Network Security & Privacy: [62], [63], [64], [65]

5. Network Resource Management
To achieve the full potential of the envisioned pervasive network, current
5G networks need improvements. Specifically, automation is limited as net-
work monitoring via analytics is not fully supported [66]. The performance
requirements specified by 3GPP rely on incorporating technologies such as
dynamic resource/spectrum sharing and cognitive zero-touch network orches-
tration for an optimized network.
Network optimization is the art and science of fine-tuning and configuring a network to achieve the best possible performance, efficiency, and scalability. This includes optimizing the configuration of network devices, such as routers and switches, and adjusting the parameters of the protocols and services that run on the network, with goals such as reducing congestion, increasing throughput, and improving availability. It is a continuous process whose ultimate aim is to provide a reliable, high-quality service to users. Traffic engineering, capacity planning, and network design are among the techniques and approaches used for network optimization, targeting routing, bandwidth allocation, and QoS, among other aspects of the network. Monitoring and troubleshooting tools can also be used to diagnose performance issues in real time and automate the network optimization process.
In the context of 5G networks, network optimization becomes even more
crucial due to their unique requirements. These networks are characterized
by high bandwidth, low-latency, and high-concurrency, which demand ad-
vanced techniques for network optimization, such as network slicing, edge
computing, and advanced resource allocation. 5G networks have to handle
a large number of connected devices and services, each with different re-
quirements and characteristics. Resource allocation involves managing the
available network resources, such as bandwidth, processing power, and stor-
age, in order to provide an optimal service to each device and service. This
can include allocating resources dynamically in response to changing network
conditions and user demands, as well as using advanced techniques such as
ML and optimization algorithms to improve the efficiency of resource alloca-
tion. Another important aspect of network optimization in 5G networks is
network slicing, paving the way for efficient and flexible allocation of network
resources, as well as the ability to support diverse and dynamic requirements

of 5G networks. Edge computing is also a key technology for network opti-
mization in 5G networks. Edge computing involves moving computing and
storage resources closer to the network edge, where they can be used to
reduce network congestion and improve the responsiveness of the network.
This is particularly important for 5G networks, which will support a wide
range of low-latency and high-bandwidth services, such as virtual reality and
augmented reality applications. Tables 4, 5, and 6 provide an overview of
different proposed schemes and frameworks addressing dynamic resource al-
location, network slicing, and edge computing, respectively.

5.1. Dynamic Resource Allocation


5G+ networks require efficient dynamic network optimization to fully
utilize resources and achieve higher capacity and better QoS with minimal
SLA violations [67]. Effective resource allocation has always been a critical
challenge in wireless communication.
In the context of ZSM, DRL is a promising solution for dynamic resource
allocation problems, which are generally formulated as hard optimization
problems [68]. Iacoboaiea et al. studied the deployment of DRL in a large-
scale zero-touch Wireless Local Area Network (WLAN) for dynamic radio
resource allocation under varying traffic conditions [36]. Their actor-critic
neural network algorithm aims to maximize the E2E performance, where
the WLAN management solution (agent) reconfigures every 10 minutes in
a closed-loop fashion (action) based on the telemetry data received (state).
Telemetry refers to the automatic measurement and wireless transmission of data from remote sources. Since training on the real WLAN setup would have led to the exploration of bad network configurations, the DRL agent was instead trained on a DT (i.e., a digital replica that behaves identically to its physical WLAN counterpart). This allowed both simulated and existing network data to be used for training, as the existing data alone could not supply the large number of samples needed.
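
To make this closed loop concrete, the following is a minimal, self-contained sketch of the interaction pattern: an agent reads telemetry from a digital-twin stand-in of the WLAN (state), applies a channel plan (action), and receives a synthetic E2E performance proxy (reward). The WlanDigitalTwin class, the reward model, and the simple action-value learner below are illustrative placeholders; the actual scheme in [36] uses an actor-critic neural network and a faithful digital replica of the managed WLAN.

import numpy as np

class WlanDigitalTwin:
    """Toy stand-in for a digital replica of the managed WLAN (illustrative only)."""
    def __init__(self, n_aps=4, n_channels=3, seed=0):
        self.rng = np.random.default_rng(seed)
        self.n_aps, self.n_channels = n_aps, n_channels
        self.load = self.rng.random(n_aps)              # per-AP offered load (telemetry)

    def step(self, channels):
        # Reward: penalize co-channel APs that share high load (a crude interference proxy).
        reward = 0.0
        for c in range(self.n_channels):
            reward -= self.load[channels == c].sum() ** 2
        # Traffic drifts between closed-loop iterations (reconfiguration every "10 minutes").
        self.load = np.clip(self.load + 0.1 * self.rng.standard_normal(self.n_aps), 0.0, 1.0)
        return self.load.copy(), reward

def decode(action, n_aps, n_channels):
    """Map a flat action index to one channel per AP (base-n_channels digits)."""
    return np.array([(action // n_channels ** i) % n_channels for i in range(n_aps)])

env = WlanDigitalTwin()
rng = np.random.default_rng(1)
n_actions = env.n_channels ** env.n_aps
q, counts, eps = np.zeros(n_actions), np.zeros(n_actions), 0.2

for _ in range(500):                                    # closed-loop iterations on the twin
    a = rng.integers(n_actions) if rng.random() < eps else int(np.argmax(q))
    _, r = env.step(decode(a, env.n_aps, env.n_channels))
    counts[a] += 1
    q[a] += (r - q[a]) / counts[a]                      # incremental average reward per channel plan

print("preferred channel plan:", decode(int(np.argmax(q)), env.n_aps, env.n_channels))
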
Another scenario is utilizing RL/DRL in 5G networks based on NFV and
SDN technologies. The placement of the VNFs depends on the availability of
physical resources, which in turn can affect network performance and service
latency. As such, inefficient VNF placement and resource utilization might
result in serious performance degradation [38]. Following the ENI and ZSM
standards, Bunyakitanon et al. designed an Adapted REinforcement Learn-
ing VNF Performance Prediction module for Autonomous VNF Placement
(AREL3P) based on the Q-learning algorithm [37]. The algorithm predicts

the total service time of an E2E application running VNF video transcod-
ing. Results show the resilience of AREL3P to network dynamics in addition
to the ability to generalize better than SL algorithms, thus tackling adapt-
ability concerns. However, RL approaches slowly converge to the optimal
policy in large-state action sets, rendering it challenging to use in large-scale
5G deployments. This led to DRL, where the intersection of RL and deep
learning helps overcome this limitation [69]. Subsequently, Dalgkitsis et al.
proposed an intelligent VNF placement solution using a deep deterministic
policy gradient RL algorithm [38]. The objective is to minimize the average
E2E latency between the users and the VNFs that compose the uRLLC ser-
vice provided by the network, while considering the distribution of the avail-
able computational resources (CPU, memory, storage) at the network edge.
Results highlight the advantages of the proposed solution over the baseline algorithm (which rejects any VNF placement request to an edge data center that has reached 90% utilization capacity), achieving the fewest SLA violations and the fewest VNF rejections at any traffic level. As future work, the authors suggest building the algorithm on an LSTM RNN to provide better insight into the usage trend of each unit in the network.
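
As a rough illustration of how a Q-learning-based placement module of this kind operates, the sketch below selects a hosting node for a VNF with an epsilon-greedy policy and updates a tabular Q-function from a measured service time. The state encoding, node names, and synthetic service times are assumptions for illustration only and do not reproduce the AREL3P design in [37].

import random
from collections import defaultdict

ALPHA, GAMMA, EPSILON = 0.1, 0.9, 0.1
q_table = defaultdict(float)            # (state, node) -> estimated value

def select_node(state, candidate_nodes):
    """Epsilon-greedy choice of the node that hosts the next VNF."""
    if random.random() < EPSILON:
        return random.choice(candidate_nodes)
    return max(candidate_nodes, key=lambda n: q_table[(state, n)])

def update(state, node, reward, next_state, candidate_nodes):
    """Standard Q-learning update; the reward here is the negative measured service time."""
    best_next = max(q_table[(next_state, n)] for n in candidate_nodes)
    td_target = reward + GAMMA * best_next
    q_table[(state, node)] += ALPHA * (td_target - q_table[(state, node)])

# Illustrative episode with synthetic service-time measurements (hypothetical node names)
random.seed(0)
nodes = ["edge-1", "edge-2", "core-1"]
state = ("low-load",)                   # e.g., a discretized utilization snapshot
for _ in range(1000):
    node = select_node(state, nodes)
    service_time = {"edge-1": 12.0, "edge-2": 9.0, "core-1": 20.0}[node] + random.gauss(0, 1)
    update(state, node, -service_time, state, nodes)
print(max(nodes, key=lambda n: q_table[(state, n)]))   # expected: "edge-2" (lowest synthetic service time)
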
For AI/ML to work well in managing network services, it is important
to have a good understanding of how resources are used by the network
and its components. This will allow AI/ML to make better decisions and
improve the user experience. Accordingly, Moazzeni et al. introduced a
Novel Autonomous Profiling method, known as NAP, that can be applied
within the ambit of ZSM for the next generation of NFV orchestration [39].
This NAP method encompasses three key steps:

1. NAP utilizes a weighted resource configuration selection algorithm, which automatically generates a profiling dataset for VNFs by selecting the configuration of resources that have the greatest impact on the performance goals and Key Performance Indicator targets within a confined profiling time frame.

2. NAP creates a model to precisely predict the performance metrics for previously untested resource configurations, where the specified performance goals can be attained.

3. NAP employs ML-based techniques to estimate the precise quantity of resources required to meet both the specified performance goals and the performance metrics in the target environments.

The results obtained from real datasets pertaining to various profiled VNFs demonstrate that this NAP method can predict the untested configuration of resources as well as the performance metrics with remarkable
accuracy. Therefore, the model generated by the predictor manager, in con-
junction with the proposed NAP method, can be employed for the next gen-
eration of NFV orchestration throughout the entire life-cycle management
of network slices. Future endeavors include expanding the current profiling
work to encompass more resource types, thereby increasing the state space
of profiling predictions exponentially. Additionally, it is intended to extend
this autonomous profiling method to profile VNFs hosted at the edge, en-
compassing scenarios such as network slicing and mobility management.
In 5G networks, the ability to detect and diagnose faults is closely tied
to the ability to allocate resources efficiently. For example, if a failure is de-
tected in a specific part of the network, resources can be reallocated to other
parts of the network to maintain service availability. Similarly, if the net-
work is experiencing congestion, resources can be reallocated to alleviate the
congestion and improve performance. Therefore, efficient and effective fault
detection is necessary to ensure that resources can be reallocated quickly
and efficiently in response to network failures or disruptions. This can be
achieved through the use of advanced monitoring and troubleshooting tools,
as well as automation and orchestration technologies, that can identify and
diagnose faults in real-time and take appropriate action to mitigate them,
signifying the self-healing aspect of ZSM. Sangaiah et al. designed an auto-
matic self-healing process that tackles both detection and diagnosis of faults
in 5G+ networks using two sets of data collected by the network [40]. The
performance support system data is automatically collected by the network,
while drive test data is manually collected in three call scenarios: short, long,
and idle. The short call scenario is used to identify faults during call setup,
the long call scenario is designed to identify handover failures and call in-
terruption, and the idle mode is used to understand the characteristics of
standard signals in the network. The complete and correct call rate quality
criterion was utilized for identifying faults in the network. Examples of de-
tected and diagnosed faults include congestion and failures in traffic channel
assignments. Recognized faults using this criterion were then processed fur-
ther to determine the root cause of the fault. The data was separated into
two categories, traffic and signaling data, and the issues associated with each

section were identified individually. As the data available was unlabeled raw
data, a clustering algorithm method (i.e., unsupervised learning approach)
was employed. By applying different algorithms with varying numbers of clusters, five clusters with a Silhouette coefficient of 0.4509 were obtained for the traffic data and six clusters with a Silhouette coefficient of 0.503 for the signaling data. Each cluster represented a specific root cause of a fault in the network.
Finally, various classification algorithms were applied to the labeled data
obtained from clustering to evaluate the results accurately. The best accu-
racy in test data was achieved by combining the results of different classifiers
through opinion voting for both traffic and signaling data. One root cause
of a fault is the lack of capacity in the traffic and signaling channel. In that
case, the proposed solution is to increase the capacity (scaling) and allocate
dynamic resources, specifically capacity, to the required channels according
to the network traffic situation. Future work suggests examining subscriber
complaint data in more detail, including the explanations that the subscribers
provide to the complaints center, to identify the fault type and analyze its
cause [40].
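
The two-stage recipe described above (cluster unlabeled fault data, then train classifiers on the resulting labels and combine them by voting) can be sketched with scikit-learn as follows. The synthetic KPI features, the candidate cluster counts, and the choice of base classifiers are placeholders rather than the actual configuration used in [40].

import numpy as np
from sklearn.cluster import KMeans
from sklearn.ensemble import RandomForestClassifier, VotingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import silhouette_score
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Unlabeled KPI vectors (e.g., congestion and channel-assignment failure counters)
X = np.random.default_rng(0).random((600, 8))

# Stage 1: pick a cluster count by silhouette score and use the clusters as fault labels
best_k, best_score, y = None, -2.0, None
for k in range(3, 8):
    labels = KMeans(n_clusters=k, n_init=10, random_state=0).fit_predict(X)
    score = silhouette_score(X, labels)
    if score > best_score:
        best_k, best_score, y = k, score, labels
print(f"chosen k={best_k}, silhouette={best_score:.3f}")

# Stage 2: train several classifiers on the cluster labels and combine them by voting
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=0)
voter = VotingClassifier(
    estimators=[("lr", LogisticRegression(max_iter=1000)),
                ("dt", DecisionTreeClassifier()),
                ("rf", RandomForestClassifier())],
    voting="hard")
voter.fit(X_tr, y_tr)
print("held-out accuracy:", voter.score(X_te, y_te))
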
Failure recovery in networks comprises two distinct approaches, known as Proactive Failure Recovery (PFR) and reactive failure recovery. The recovery process involves three key stages: the deployment of backup VNFs and image migration, flow reconfiguration, and
state synchronization. However, the execution of each stage incurs a signif-
icant delay, resulting in not only a decline in network performance but also
a violation of SLAs due to prolonged interruption of service [70]. By uti-
lizing failure prediction, the PFR approach can decrease recovery delay by
initiating certain stages of the recovery procedure prior to the manifestation
of the failure. For instance, PFR can save delays in flow rescheduling and
backup launch by initiating these stages beforehand. In this manner, if we
are able to recover failed VNFs using PFR, the performance of the network
can be significantly improved by reducing interruption time during recovery.
This motivates the proposal of a PFR framework for future 6G networks.
Given the constraints of resources and the maximum allowable interrup-
tion time caused by failures, Shaghaghi et al. established a network that
is both highly reliable and resource-efficient by introducing Zero-Touch PFR
(ZT-PFR) approach [41]. This approach utilizes DRL to enhance the fault-
tolerance of networks enabled by NFV. This is formulated as an optimization
problem that aims to minimize a weighted cost, which takes into account fac-
tors such as resource costs and penalties for incorrect decisions. Shaghaghi et

al. adopted state-of-the-art DRL-based methods such as soft-actor-critic and
proximal-policy-optimization as solutions to this problem. To train and test
the proposed DRL-based framework, the authors construct an environment
simulator using a simulated model of impending failure in NFV-based net-
works inspired by ETSI. To capture time-dependent features, the agents are equipped with LSTM layers. Moreover, the concept of age of information is applied to balance event-based and scheduled monitoring, ensuring that network status information is up-to-date for decision-making. Given the ever-changing nature of NFV environments, it is important to develop learning methods that are online, fast, and efficient.
Thus, further research in this direction could be of great interest [41].
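
A minimal sketch of the kind of weighted cost such a proactive-recovery agent could minimize is shown below; the cost terms (reserved backup resources, missed failures, false alarms) and their weights are illustrative assumptions, not the exact formulation of ZT-PFR in [41].

def zt_pfr_cost(active_backups, cpu_per_backup, missed_failures, false_alarms,
                w_resource=1.0, w_miss=50.0, w_false=5.0):
    """Weighted cost a proactive-recovery agent might minimize (illustrative).

    - resource term: CPU reserved by standby/backup VNF instances
    - miss penalty: failures that occurred without a backup already in place
    - false-alarm penalty: backups launched for failures that never happened
    """
    resource_cost = w_resource * active_backups * cpu_per_backup
    penalty = w_miss * missed_failures + w_false * false_alarms
    return resource_cost + penalty

# The DRL reward is then the negative cost, e.g.:
reward = -zt_pfr_cost(active_backups=3, cpu_per_backup=2.0,
                      missed_failures=0, false_alarms=1)
print(reward)   # -11.0
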
To fulfill the 5G vision in terms of E2E automation and resource shar-
ing/allocation, the 5GZORRO project [71] has been launched by the H2020
program. Its main objective is to utilize distributed AI and Distributed
Ledger Technologies (DLTs) to design a secure and trusted E2E zero-touch
service and network management and orchestration within the 5G network
with a shared spectrum market for real-time trading on spectrum alloca-
tion. While AI is a pillar behind a zero-touch cognitive network orchestrator
and manager, DLT (or blockchain technology) is a protocol that enables
the secure and trusted functioning of a distributed 5G E2E service chain.
Thus, the 5GZORRO framework creates a 5G service layer across different
parties where SLAs are monitored, spectrum is shared, and orchestration
is automated [71, 66]. Another project launched by H2020 is the 6G BRAINS project, which started on January 1st, 2021, and is expected to run for 36 months [72]. It focuses on developing an AI-driven multi-agent DRL algorithm for dynamic resource allocation beyond massive machine-type communications, over new spectrum links including THz, Sub-6 GHz, and optical wireless communications. The aim is to improve capacity, reliability, and latency for various vertical sectors, such as eHealth and intelligent transportation.

5.2. Network Slicing


Network slicing is a powerful and innovative technology that allows for
the creation of multiple virtual networks within a single physical infrastruc-
ture. This technology allows for a flexible and efficient allocation of resources,
enabling service providers to offer customized services to different customers
or types of traffic [73]. With network slicing, service providers create virtual networks, or “slices”, with unique characteristics and policies, such as different levels of security, reliability, and bandwidth. This allows for the creation of specialized networks for various use cases, such as IoT devices, industrial automation, and enhanced mobile broadband. In turn, this technology is deemed useful in the context of 5G networks, which are expected to support a wide range of use cases. SDN, NFV, and cloud computing are the key enablers needed to realize network slicing [74].

Table 4: Dynamic Resource Allocation Schemes

Scheme: DRL in WLAN [36]
Description: Implement a DRL-based dynamic radio resource allocation for WLAN management under varying traffic conditions.
Outcomes: Maximize E2E performance.
Future Work: Establish sufficient trust for true fully-automated zero-touch operation.

Scheme: AREL3P [37]
Description: Autonomously place VNFs in 5G networks using a Q-learning-based VNF performance prediction module.
Outcomes: Adapt resiliently to network dynamics; generalize better than SL algorithms.
Future Work: Address the slow convergence of this scheme in large-scale networks.

Scheme: DRL-based VNF Placement [38]
Description: Design an intelligent VNF placement solution in 5G networks using a deep deterministic policy gradient-based method.
Outcomes: Reduced E2E latency; low SLA violations; a low number of VNF rejections at any traffic level.
Future Work: Explore the use of LSTM RNN to provide better insights.

Scheme: NAP [39]
Description: Design an autonomous profiling method for the next generation of NFV orchestration in 5G networks.
Outcomes: Accurately predict untested resource configurations.
Future Work: Extend the work by covering additional types of resources.

Scheme: Self-healing Process [40]
Description: Autonomously carry out a self-healing process tackling both detection and diagnosis of faults in 5G+ networks.
Outcomes: This process improves network component understanding.
Future Work: Examine subscriber complaint data to identify the fault type and analyze its cause.

Scheme: ZT-PFR [41]
Description: Enhance the fault-tolerance of networks through a DRL-based zero-touch proactive failure recovery scheme.
Outcomes: Network status information is up-to-date for decision-making.
Future Work: Apply online learning ML models to address the ever-changing nature of NFV environments.
Nonetheless, despite its many benefits, network slicing also presents a
number of challenges [75]. One of the main challenges is the management
and orchestration of the slices. As the number of slices increases, it becomes
increasingly difficult to manage and monitor them all effectively. Addition-

ally, managing the allocation of network resources across multiple slices can
be complex and time-consuming, especially as the demand for each slice fluc-
tuates over time. Accordingly, it is necessary to have a reliable management
system to automate the process of creating, configuring, and deploying slices,
monitor their performance, and troubleshoot any issues that may arise. An-
other challenge lies in the need for advanced security mechanisms to protect
virtual networks from unauthorized access and malicious attacks. There is also a need to ensure the interoperability of virtual networks with existing networks and systems. This requires the development of new standards
and protocols to ensure seamless communication between different virtual
networks and existing systems. Business-wise, the deployment and mainte-
nance of the network slicing technology can be costly, and service providers
must find ways to effectively monetize the services they offer to recoup their
investment.
Network slicing can unlock new opportunities for service providers, but
it also poses a number of technical and operational challenges that are be-
ing addressed academically. One of the main areas of research has focused
on developing automated methods for creating, configuring, and deploying
network slices without any manual intervention. This includes the use of
AI/ML algorithms to predict network resource demand and dynamically al-
locate resources to different slices. For instance, Casale et al. proposed a
ML-based approach to predict network resource demand and dynamically
allocate resources to different slices [42]. The proposed approach is able to
adapt to changes in network conditions and user demands, and make real-
time decisions about the allocation of resources to different slices. In fact,
the proposed algorithm is based on RL, which learns from past decisions to
improve the performance of future decisions. The algorithm uses a combina-
tion of decision-making policies, including a greedy policy, a random policy,
and a Q-learning policy. The performance of the proposed approach is evaluated through simulations and compared against traditional static allocation methods. The approach shows better performance in terms of resource
utilization, by allocating resources to the slices that need them the most.
However, this approach has some limitations in terms of scalability, as it
requires a large amount of data to train the ML models. Additionally, it
assumes that the network conditions are static and do not change rapidly.
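
The sketch below illustrates the general idea of such an RL-driven allocator: an epsilon-greedy Q-learning agent decides which slice receives the next spare resource block based on a discretized demand state. The slice names, state encoding, and reward model are assumptions for illustration and do not reproduce the setup used in [42].

import random
from collections import defaultdict

SLICES = ["eMBB", "uRLLC", "mMTC"]           # action = which slice gets the next spare resource block
ALPHA, GAMMA, EPSILON = 0.2, 0.9, 0.1
q = defaultdict(float)                        # (state, slice) -> estimated value

def demand_state(demand):
    """Discretize per-slice demand into a hashable state (low/high per slice)."""
    return tuple("high" if demand[s] > 0.5 else "low" for s in SLICES)

def pick(state):
    """Epsilon-greedy policy: explore at random, otherwise exploit the learned Q-values.
    (A purely random or purely greedy policy, as also considered in the study, would
    simply skip the exploitation or the exploration branch, respectively.)"""
    if random.random() < EPSILON:
        return random.choice(SLICES)
    return max(SLICES, key=lambda s: q[(state, s)])

random.seed(0)
for _ in range(2000):
    demand = {s: random.random() for s in SLICES}
    state = demand_state(demand)
    action = pick(state)
    reward = demand[action]                   # toy reward: utilization gained by serving that slice
    next_state = demand_state({s: random.random() for s in SLICES})
    best_next = max(q[(next_state, s)] for s in SLICES)
    q[(state, action)] += ALPHA * (reward + GAMMA * best_next - q[(state, action)])

print({k: round(v, 2) for k, v in q.items() if k[0] == ("high", "low", "low")})
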
Another area of research has focused on developing zero-touch manage-
ment and orchestration systems for network slicing in 5G+ networks. This
includes the use of SDN and NFV technologies to enable the dynamic cre-

ation, configuration, and management of network slices. As such, Vittal et
al. presented HARNESS, a novel High Availability supportive self-Reliant
NEtwork Slicing System for the 5G core, powered by the SON paradigm
[43]. HARNESS intelligently handles control plane User Service Requests
(USRs), ensuring uninterrupted high-availability service delivery for delay-
tolerant and delay-sensitive slices. It addresses scaling, overload manage-
ment, congestion control, and failure recovery of primary slice types, namely
eMBB, uRLLC, and mMTC. The proposed HARNESS mechanism outper-
forms traditional scheduling methods, minimizing dropped USRs and im-
proving response times. Experimentally, HARNESS achieved 3.2% better
slice service high-availability in a minimal active/active cluster configura-
tion. Future work involves scaling the HARNESS framework and exploring
the selective offloading of control plane USRs on smart network components
for different slice types in a 5G system.
In the context of scaled systems, Chergui et al. proposed a distributed
and AI-driven management and orchestration system for large-scale deploy-
ment of network slices in 6G [44]. The proposed framework is compliant
with both ETSI standards, ZSM and ENI, focusing on autonomous and in-
telligent network management and orchestration to enable autonomous and
scalable management and orchestration of network slices and their dedicated
resources. Future work suggests mapping the framework to different architec-
tures to test its effectiveness. Another compliant framework was introduced
by Baba et al., representing a resource orchestration and management archi-
tecture for 5G network slices. This framework comprises a per-MD resource
allocation mechanism and an MD interworking function, aimed at facilitating
the provision of E2E network services over network slices in the context of
5G evolution [45]. This proposed architecture is underpinned by a plethora
of standard APIs and data models, and its efficacy is demonstrated through
the successful orchestration across multiple domains, and the automation of
closed-loop scenarios. The architecture has been verified and certified as a
proof of concept by the ETSI ZSM.
Similarly, Afolabi et al. proposed a novel and comprehensive global E2E
mobile network slicing orchestration system (NSOS) that enables network
slicing for next-generation mobile networks by considering all aspects of the
mobile network spanning across access, core, and transport parts [46]. The
high-level architecture of the system comprises a hierarchical structure, in-
cluding a global orchestrator and multiple domain-specific orchestrators and
their respective system components. The focus of the system is on allowing

customers to request and monitor network slices only, while the proposed
Dynamic Auto-Scaling Algorithm (DASA) ensures that the system can react
instantly to changes in workload. The DASA includes both proactive and
reactive provisioning mechanisms, where the proactive mechanism relies on a
workload predictor implemented using ML techniques, and the reactive provi-
sioning module triggers asynchronous requests to scale in or out the different
entities of the NSOS. The core of the solution is a resource dimensioning
heuristic algorithm which determines the required amount of computational
and virtual resources to be allocated to the NSOS for a given workload so
that a maximum response time of the NSOS is guaranteed. Namely, the
resource dimensioning algorithm is based on a queuing model and will be
invoked when a provisioning decision is taken to decide how many resources
have to be requested or released. The system’s performance is evaluated
through system-level simulations, showing that the algorithm is able to find
the minimal required resources to keep the mean response time of the NSOS
under a given threshold. The response time is defined as the sum of all pro-
cessing and waiting times experienced by a slice orchestration request (e.g.,
slice creation or release) when passing through different NSOS’s entities dur-
ing its lifetime in the orchestrator. The simulation results also suggest that
the request rejection rate during a given period is determined by the reaction
time of the reactive provisioning mechanism, which is in turn affected by the
slice’s instantiation time [46]. As CPU resources are the only resources taken
into account, the inclusion of other resources, such as memory, is encouraged
in the future.
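
To illustrate the dimensioning step, the sketch below uses a textbook M/M/c queue to estimate the mean response time of an orchestrator entity and searches for the smallest number of instances that keeps it under a target threshold. The M/M/c model and the example arrival/service rates are assumptions for illustration; the actual queuing model and heuristic in [46] may differ.

from math import factorial

def mmc_mean_response_time(arrival_rate, service_rate, servers):
    """Mean response time W of an M/M/c queue (waiting time plus service time)."""
    rho = arrival_rate / (servers * service_rate)
    if rho >= 1:
        return float("inf")                 # unstable: the queue grows without bound
    a = arrival_rate / service_rate
    # Erlang C: probability that an arriving request has to wait
    summation = sum(a ** k / factorial(k) for k in range(servers))
    last = a ** servers / (factorial(servers) * (1 - rho))
    p_wait = last / (summation + last)
    wq = p_wait / (servers * service_rate - arrival_rate)   # mean waiting time
    return wq + 1.0 / service_rate                          # add mean service time

def dimension(arrival_rate, service_rate, max_response_time):
    """Smallest number of instances keeping the mean response time under the threshold."""
    servers = 1
    while mmc_mean_response_time(arrival_rate, service_rate, servers) > max_response_time:
        servers += 1
    return servers

# e.g., 40 slice-orchestration requests/s, each instance serves 5 req/s, 0.5 s target
print(dimension(arrival_rate=40.0, service_rate=5.0, max_response_time=0.5))   # 9 with these example rates
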
So far, the NSOS has been purely focused on the technical aspect. Bre-
itgand et al. delved into the issue of coordinating and orchestrating busi-
ness processes across domains in order to facilitate efficient resource sharing
among multiple Communication Service Providers (CSPs) [47]. The lack
of a standard for this aspect of inter-CSP collaboration is identified as a
major hurdle for achieving optimal resource utilization. To address this,
Breitgand et al. proposed a set of design principles that include autonomy
for CSPs in their business and technical processes, non-intrusive extensions
to existing NFV MANO frameworks, preservation of slice isolation, sepa-
ration of concerns between technical and business aspects of orchestration,
and a cloud-native declarative orchestration approach using Kubernetes as
the cross-domain control plane. The proposed dynamic NS scaling occurs
through collaboration between CSPs facilitated by DLT transactions. This
approach utilizes ML techniques to automate the process of extending slices

and ensuring QoS requirements, which is inspired by ETSI’s ZSM closed-
loop architecture. This orchestrator is demonstrated on the 5GZORRO vir-
tual content delivery network use case scenario for highly populated areas.
Content delivery networks are geographically distributed networks of compu-
tation and storage resources that offer high-availability and high-performance
services such as web content, application data, and live/on-demand streaming
media. The proposed approach has been validated in a development envi-
ronment, and future work will involve evaluating it in a larger testbed and
with additional use cases to quantify the benefits of inter-CSP slice scaling
[47].

5.3. Multi-access Edge Computing


MEC represents a paradigm shift in the delivery of computing resources
and services within networks. MEC leverages the inherent properties of 5G+
networks to provide low-latency, high-bandwidth data processing and storage
resources that are situated in close proximity to mobile devices and users [76].
This distributed deployment model enables the real-time processing of data,
thereby eliminating the need for extensive data transfer over long distances,
reducing the burden on the network, and leading to a significant improvement
in the user experience and the overall network performance.
From a technical standpoint, MEC is based on the principles of cloud
computing and NFV, and involves the placement of virtualized computa-
tional and storage resources at the network edge. This creates a highly dis-
tributed computing infrastructure, which can be dynamically reconfigured
and optimized to meet the changing demands of mobile users [76]. The use
of NFV ensures that the MEC infrastructure is flexible, scalable, and easily
manageable, and can support a wide range of services and applications. With
the integration of MEC into 5G to support time-sensitive applications, it be-
comes essential to incorporate this decentralized computing infrastructure
into the 5G network slicing framework [48]. To ensure low E2E latency, the
management of slice resources must be comprehensive and extend through-
out the entire application service. With this goal, Bolettieri et al. investi-
gated the integration of the 3GPP network slicing framework with the MEC
infrastructure, in order to ensure efficient management and orchestration
of latency-sensitive resources and time-critical applications [48]. To achieve
this, the authors present a novel slicing architecture that manages and coordi-
nates slice segments across all domains. This design approach also aligns with
multi-tenant MEC infrastructure using nested virtualization, where each slice

segment is assigned specific management and orchestration responsibilities. However, to improve the feasibility of MEC application relocation between tenants, a solution must be devised that enhances the interaction between the proposed architecture and the 5G CN functions, facilitating the synchronization of traffic forwarding rules across various administrative domains.

Table 5: Network Slicing Schemes

Scheme: RL-based Resource Allocation [42]
Description: Predict network resource demand using RL and dynamically allocate resources to different slices.
Outcomes: Adapt to changes in network conditions and user demands.
Future Work: Test the scalability of the approach for large-scale networks.

Scheme: HARNESS [43]
Description: Introduce a high-availability and self-reliant network slicing system for the 5G core, built on the principles of SON; intelligently schedule and serve control plane USRs for seamless high-availability service delivery in both delay-tolerant and delay-sensitive slices.
Outcomes: HARNESS outperforms least-loaded scheduling, minimizing dropped USRs that could impact slice high-availability; HARNESS optimizes the utilization of slice resources, ensuring uninterrupted slice services.
Future Work: Scale the proposed HARNESS framework; evaluate the proposed algorithms by selectively offloading the scheduling of frequent and rare control plane USRs on smart network components for different slice types.

Scheme: ML-based Distributed MANO System [44]
Description: Design a distributed and AI-driven MANO system for large-scale deployment of network slices in 6G.
Outcomes: Scalable; compliant with both ETSI standards, ZSM and ENI.
Future Work: Map the framework to different architectures to test its effectiveness.

Scheme: Resource MANO Architecture for 5G Network Slices [45]
Description: Propose a per-MD resource allocation mechanism and an MD interworking function, aimed at facilitating the provision of E2E network services over 5G network slices.
Outcomes: The architecture has been verified and certified as a proof of concept by the ETSI ZSM; it supports multiple use cases and services; it is a flexible and scalable architecture.
Future Work: Address the challenges of security and privacy in network slicing.

Scheme: NSOS [46]
Description: Design a global E2E mobile network slicing orchestration system that enables network slicing for NGNs, taking into account the access, core, and transport parts of the network.
Outcomes: Instantly react to workload changes with DASA; find minimal resources to maintain the NSOS response time threshold.
Future Work: Account for additional resources other than CPU.

Scheme: Cross-Domain Orchestration and Dynamic Scaling among CSPs [47]
Description: Orchestrate cross-domain business processes for efficient resource sharing among CSPs using Kubernetes-based declarative orchestration and DLT-based dynamic slicing scaling.
Outcomes: Enhance security and trust with smart multi-party contracts; enable efficient utilization of resources by scaling slices up or down based on traffic demands; support multi-domain slicing, optimizing resource allocation across different network domains.
Future Work: Evaluate the model with a larger testbed and additional use cases to quantify the benefits of inter-CSP slice scaling.
Another framework leveraging MEC network technology was introduced by
Wu et al. to allow Autonomous Vehicles (AVs) to adapt to changing driv-
ing conditions by sharing their driving intelligence [49]. In this framework,
named Intelligence Networking (Intelligence-Net), driving intelligence refers
to a trained neural network model for autonomous driving. Key features of Intelligence-Net include:
• Sharing of driving intelligence: A unique MEC network-assisted Intelligence-
Net is proposed to facilitate real-time sharing of driving intelligence
between AVs, allowing for adaptation to changing environmental con-
ditions.
• Segmentation of roads: The road is divided into segments, each with
its own dedicated driving model tailored to its specific environmental
features, reducing the dimensionality of each road segment.
• Continuous model updates: Whenever a specific road segment experi-
ences environmental changes, new data is collected and used to retrain
the generic base driving model, improving its ability to adapt to the
new conditions.
• Secure and efficient learning: To ensure security and efficiency, the framework implements blockchain-enabled federated learning, which combines the privacy benefits of federated learning with the reduced communication and computation costs of transfer learning. The blockchain technology authenticates learning participants and secures the entire learning process (a minimal federated-averaging sketch follows this list).
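
A minimal federated-averaging (FedAvg-style) aggregation step, of the kind that underlies federated learning at a MEC node, is sketched below: per-vehicle model weights are averaged, weighted by the number of local training samples. The blockchain-based authentication and the actual driving models of [49] are outside the scope of this illustrative snippet.

import numpy as np

def federated_average(client_weights, client_samples):
    """Aggregate per-vehicle model weights into a new global model (FedAvg).

    client_weights: list of per-vehicle models, each a list of numpy arrays (one per layer)
    client_samples: number of local training samples each vehicle used
    """
    total = float(sum(client_samples))
    n_layers = len(client_weights[0])
    new_global = []
    for layer in range(n_layers):
        layer_sum = sum(w[layer] * (n / total)
                        for w, n in zip(client_weights, client_samples))
        new_global.append(layer_sum)
    return new_global

# Toy example: three vehicles sharing updates of a two-layer model
rng = np.random.default_rng(0)
clients = [[rng.random((4, 4)), rng.random(4)] for _ in range(3)]
samples = [120, 80, 200]
global_model = federated_average(clients, samples)
print([w.shape for w in global_model])   # [(4, 4), (4,)]
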
Simulation results indicate that this solution can produce updated driving
models that better adapt to environmental changes compared to traditional
methods. AVs can then adopt these changes by downloading the updated
driving models. The proposed Intelligence-Net framework has yet to fully
leverage the available resources. While it poses a challenge, utilizing hetero-
geneous edge computing resources to optimize the system remains a desirable
objective in the future.

The integration of MEC into 5G+ networks is a key enabler of ZSM,
which represents a new approach to network management [50]. With MEC,
network functions can be deployed and managed dynamically, providing the
necessary processing and storage resources to support the rapidly chang-
ing demands of mobile users. This enables ZSM to provide dynamic and
efficient network management, improving the user experience and reducing
operational costs. Following this, Sousa et al. introduced a self-healing archi-
tecture based on the ETSI ZSM framework for multi-domain B5G networks
[50]. This architecture utilizes ML-assisted closed control loops across vari-
ous ZSM reference points to monitor network data for estimating end-service
QoE KPIs and to identify faulty network links in the underlying transport
network. To demonstrate this architecture in action, the authors have instan-
tiated it in the context of automated healing of Dynamic Adaptive Streaming
over HTTP video services. Two ML techniques, online and offline, are pre-
sented for estimating an SLA violation through a QoE probe at the edge and
identifying the root cause in the transport network. Experimental evalua-
tion indicates the potential benefits of using ML for QoS-to-QoE estimation
and fault identification in MEC environments. Further work will consider
improvements to the ML pipelines, such as model generalization.
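
The QoS-to-QoE estimation idea can be sketched as a supervised regressor that maps network-level QoS features collected at the edge to a QoE score, with low predicted scores flagging likely SLA violations for the closed loop to act on. The synthetic features, QoE model, threshold, and regressor below are illustrative assumptions and not the online/offline techniques used in [50].

import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
n = 2000
throughput = rng.uniform(1, 50, n)        # Mbps
delay = rng.uniform(5, 200, n)            # ms
loss = rng.uniform(0, 3, n)               # %
X = np.column_stack([throughput, delay, loss])
# Synthetic MOS-like QoE label in [1, 5], for illustration only
qoe = np.clip(1 + 4 * (throughput / 50) - 0.01 * delay - 0.4 * loss
              + rng.normal(0, 0.2, n), 1, 5)

X_tr, X_te, y_tr, y_te = train_test_split(X, qoe, test_size=0.25, random_state=0)
model = RandomForestRegressor(n_estimators=100, random_state=0).fit(X_tr, y_tr)

SLA_QOE_THRESHOLD = 3.5                   # hypothetical SLA target on the QoE score
predicted = model.predict(X_te)
violations = predicted < SLA_QOE_THRESHOLD   # samples the closed loop would act on
print("R^2:", round(model.score(X_te, y_te), 3), "| flagged:", int(violations.sum()))
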

Table 6: MEC-based Schemes

Scheme: MEC-enabled Network Slicing Framework [48]
Description: Manage and coordinate slice segments across all domains.
Outcomes: This framework aligns with the multi-tenant MEC infrastructure using nested virtualization.
Future Work: Facilitate synchronization of traffic forwarding rules across domains to improve MEC application relocation feasibility between tenants.

Scheme: Intelligence-Net [49]
Description: Share driving intelligence in AVs to adapt to changing driving conditions.
Outcomes: Continuous model updates; secure and efficient learning.
Future Work: Utilize heterogeneous edge computing resources to optimize the system.

Scheme: Self-healing Architecture based on the ETSI ZSM Framework [50]
Description: Use ML-assisted control loops across various ZSM reference points to monitor data and identify faulty links for B5G networks.
Outcomes: This approach is a non-invasive and encrypted network-level traffic monitoring approach.
Future Work: Refine the ML pipeline in terms of model generalization, for example.

6. Network Traffic Control
Network traffic control is an essential aspect of modern networking that
aims to effectively manage and regulate the flow of data packets in a net-
work [77]. Its primary goal is to ensure the efficient utilization of network
resources, prevent congestion, and guarantee a high level of service quality
for all network users.
One key component of network traffic control is traffic prediction, which
uses advanced techniques such as ML to forecast future network traffic pat-
terns. This information is used to proactively manage network congestion
and optimize network resources, thereby ensuring that data is delivered in
a timely and reliable manner [78]. Another crucial aspect of network traf-
fic control is intelligent routing. This process uses advanced algorithms and
data analysis to determine the most efficient path for data to travel from its
source to its destination. The routing algorithm takes into account various
inputs, such as network conditions, available resources, and traffic patterns
to make informed decisions on how to route data [55].
When combined, traffic prediction and intelligent routing form a powerful
system for network traffic control. By providing a comprehensive understand-
ing of network traffic patterns and conditions, this system enables network
administrators to make informed decisions on how to best manage and regu-
late the flow of data. This ultimately leads to a more efficient use of network
resources, improved performance, and a higher level of service quality for all
network users.

6.1. Traffic Prediction & Classification


Network traffic prediction and classification involves forecasting the vol-
ume and nature of network traffic that will transpire in the future, as well
as identifying and categorizing the various types of traffic that are currently
traversing the network [78]. This process is of paramount importance in the
realm of network planning, optimization, and security, as it allows network
administrators to anticipate and prepare for changes in network traffic, and
to ensure that the network is operating at peak efficiency and security.
In the context of 5G networks, the importance of network traffic pre-
diction and classification is accentuated even further. The advent of 5G networks heralds a new era of network connectivity, characterized by
unprecedented speed and capacity, as well as the integration of a wide ar-
ray of new technologies, such as virtual and augmented reality, autonomous

vehicles, and IoT devices. With this increased complexity and diversity of
network traffic, it becomes all the more crucial to be able to predict and
classify network traffic with a high degree of accuracy. This is essential for
ensuring that the network can handle the increased demand, and that the
various types of traffic are properly managed and optimized. Furthermore,
5G networks are designed to be highly dynamic and adaptable, capable of
adjusting to changes in traffic patterns in real-time. This makes it even more
imperative to have accurate and up-to-date traffic predictions and classifica-
tions.
The integration of ZSM technology into 5G networks serves to elevate the
already intelligent process of network traffic prediction and classification to
new heights of precision and automation. ZSM, as a SON technology, enables
5G networks to automatically configure and optimize themselves in real-time,
based on the predictions and classifications of network traffic [7]. Through
the utilization of advanced ML algorithms and analytics, ZSM conducts a
thorough analysis of historical traffic data, which is then employed to predict
future traffic patterns with a high degree of accuracy. These predictions, in
turn, are utilized to optimize network resources and configure the network
to handle the expected traffic with optimal efficiency. Furthermore, ZSM
employs advanced analytics to classify the various types of traffic traversing
the network, such as voice, video, and data, and to identify specific applica-
tions and services being utilized by the network’s users. This information is
then used to optimize network performance, ensuring that the different types
of traffic are properly managed and delivered to their intended destinations
with the utmost precision.
Several approaches have been proposed for predicting and classifying net-
work traffic in 5G networks. Table 7 presents an overview of the traffic
prediction schemes discussed in this survey. One of the most widely used
methods is ML, which can be used to identify patterns and trends in net-
work traffic data. Various ML algorithms, such as ANNs, decision trees, and
SVMs, have been applied to network traffic prediction and classification in
5G networks. For example, in a study by Fan and Liu, the authors apply
supervised SVM and unsupervised K-means clustering algorithms for net-
work traffic classification [51]. The dataset includes flow parameters directly
obtained from packet headers, such as segment size and packet inter-arrival
time. The dataset is manually labeled with ten traffic types, including multi-
media, mail, database, and attacks. The comparison of these two algorithms
highlights the enhanced performance of the SVM model compared to the K-

means clustering algorithm. However, K-means is able to characterize new
or unknown application types as training samples do not require manual la-
beling in advance. Future work aims to use the proposed model to predict
the number of user plane 5G network functions needed to manage the data
plane traffic in a virtualized environment. The next steps comprise applying
the described classification models to real SDN traffic data.
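
The contrast between the two approaches in [51] can be sketched as follows: a supervised SVM trained on labeled flow features versus an unsupervised K-means model that groups flows without labels (and can therefore surface unknown application types). The synthetic flow features and placeholder traffic classes below are assumptions for illustration, not the original dataset.

import numpy as np
from sklearn.cluster import KMeans
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

rng = np.random.default_rng(0)
n = 1500
classes = rng.integers(0, 3, n)                 # placeholder labels, e.g., multimedia/mail/database
segment_size = rng.normal(np.array([1400, 500, 200])[classes], 80)    # mean segment size (bytes)
inter_arrival = rng.exponential(np.array([0.5, 5.0, 2.0])[classes])   # packet inter-arrival time (ms)
X = np.column_stack([segment_size, inter_arrival])

# Supervised route: SVM trained on manually labeled flows
X_tr, X_te, y_tr, y_te = train_test_split(X, classes, test_size=0.3, random_state=0)
svm = make_pipeline(StandardScaler(), SVC(kernel="rbf")).fit(X_tr, y_tr)
print("SVM accuracy:", round(svm.score(X_te, y_te), 3))

# Unsupervised route: K-means groups flows without labels (can surface unknown app types)
kmeans = KMeans(n_clusters=3, n_init=10, random_state=0).fit(X)
print("cluster sizes:", np.bincount(kmeans.labels_))
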
DL is another approach that has been proposed for network traffic pre-
diction and classification in 5G networks. DL algorithms, such as CNNs
and RNNs, have been shown to be particularly effective for this task. The
advantage of DL is that it can automatically learn features from raw data,
which reduces the need for feature engineering. In a study by Jaffry et al., the authors proposed an RNN-based LSTM approach for cellular traffic prediction using real-world call data records [52]. The call data record utilized was published by Telecom Italia for the Big Data Challenge competition and collected over 62 days starting November 1st, 2013 [79]. Sample data collected includes country code, inbound/outbound SMS activities, and inbound/outbound call activities. The proposed LSTM model has a hidden layer with 50 LSTM cells followed by a dense layer with one unit. Results highlight the enhanced performance of the proposed model over vanilla neural networks and the statistical autoregressive integrated moving average model. This
work can be extended by using this model to design an autonomous resource
allocation scheme for 5G+ networks. Similarly, Alawe et al. presented a
combination of LSTM and deep neural networks to proactively predict the
number of resources along with the network traffic to manage and scale the
CN in terms of the resources used for the Access and Mobility Management
Function (AMF) in 5G systems [53]. Their experimental results reveal that
the use of ML approaches improves the scalability and reacts to the change
in traffic with lower latency. Gupta et al. delved into the examination of
various DL models, namely the Multi-Layer Perceptron (MLP), Attention-
based Encoder Decoder, Gated Recurrent Unit (GRU), and LSTM, on the
Dataset-Unicauca-V2 mobile-traffic dataset [54]. This dataset comprises a
compendium of six days of mobile traffic data, boasting a total of 87 features
and 3,577,296 instances. The data presented was procured from the web
division of Universidad Del Cauca, Colombia, Popayan, where it was meticu-
lously recorded over a span of six days, specifically April 26, 27, 28 and May 9,
11, 15, 2017, at various hours, including both morning and evening. The data
has been meticulously classified into four distinct categories, namely stream-
ing, messaging, searching, and cloud. Each sample contains comprehensive

information about IP traffic generated by network equipment, including the
IP address of origin and destination, port, arrival time, and layer 7 protocol
(application). As for performance metrics, recall, precision, and f1-score were
utilized to measure the performance of the models. Findings indicate that the
MLP and Encoder-Decoder models yielded average results for mobile-traffic
forecasting, while the GRU and LSTM models performed exceptionally well,
with the latter yielding the optimal outcome. In the future, Gupta et al. aim
to investigate other time-prediction approaches for resources and to work in
MEC in industrial IoT applications to support industry 4.0.
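
A minimal sketch of this type of forecaster, i.e., a single 50-cell LSTM layer followed by a one-unit dense layer as described for [52], is shown below using TensorFlow/Keras. The synthetic traffic series, window length, and training settings are illustrative assumptions.

import numpy as np
import tensorflow as tf

# Synthetic hourly traffic series standing in for call-data-record activity counts
rng = np.random.default_rng(0)
t = np.arange(2000)
series = 10 + 5 * np.sin(2 * np.pi * t / 24) + rng.normal(0, 0.5, t.size)

def make_windows(series, window=24):
    """Build sliding windows of past traffic and the next-interval target."""
    X = np.array([series[i:i + window] for i in range(len(series) - window)])
    y = series[window:]
    return X[..., None], y            # LSTM expects (samples, timesteps, features)

X, y = make_windows(series)

model = tf.keras.Sequential([
    tf.keras.layers.LSTM(50, input_shape=(X.shape[1], 1)),   # 50 LSTM cells, as in [52]
    tf.keras.layers.Dense(1),                                # single-unit output layer
])
model.compile(optimizer="adam", loss="mse")
model.fit(X, y, epochs=5, batch_size=64, validation_split=0.1, verbose=0)

next_interval = model.predict(X[-1:], verbose=0)
print("forecast for next interval:", float(next_interval[0, 0]))
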
Table 7: Traffic Prediction Schemes

Scheme: SVM and K-means Algorithms for Network Traffic Classification [51]
Description: Predict and classify the network traffic in 5G networks.
Outcomes: SVM shows enhanced performance compared to K-means; K-means is suitable for new/unknown application types.
Future Work: Predict the number of user plane 5G network functions to manage the data plane traffic; apply the proposed models to real SDN traffic data.

Scheme: DL-based Cellular Traffic Prediction [52]
Description: Predict cellular traffic with an RNN LSTM model using a real-world call data record.
Outcomes: The proposed model outperforms vanilla neural networks and the autoregressive integrated moving average model.
Future Work: Extend the work to design an autonomous resource allocation scheme for 5G+ networks.

Scheme: Scaling AMF in 5G Systems [53]
Description: Proactively predict the number of resources in addition to the network traffic to manage resources and scale the AMF in 5G systems.
Outcomes: The proposed model reacts to the change in traffic with lower latency.
Future Work: Utilize the proposed model to estimate the number of user plane 5G CN functions needed to handle the traffic.

Scheme: DL-based Mobile Traffic Prediction [54]
Description: Examine various DL models on the Dataset-Unicauca-V2 mobile-traffic dataset.
Outcomes: LSTM yields optimal outcomes, followed by GRU, encoder-decoder, and MLP models, respectively.
Future Work: Investigate other time-prediction approaches; work in MEC in industrial IoT applications to support industry 4.0.

6.2. Intelligent Routing


With the advent of intelligent routing, networks are now equipped to
adapt and optimize routes in real-time, taking the first step towards self-
driving zero-touch networks. Accordingly, Hu et al. unveiled EARS, an

intelligence-driven experiential network architecture that seamlessly inte-
grates the cutting-edge technologies of SDN and DRL to usher in the era
of intelligent and autonomous routing [55]. Hu et al. addressed the limita-
tions of conventional routing strategies, which are heavily reliant on manual
configuration, and introduced a DRL algorithm to optimize data flow rout-
ing. This DRL agent is highly adaptable, learning routing strategies through
its interactions with the network environment. Furthermore, EARS incorpo-
rates advanced network monitoring technologies, such as network state col-
lection and traffic identification, to provide closed-loop control mechanisms,
enabling the DRL agent to optimize routing policies and enhance network
performance. As such, EARS, an intelligence-driven experiential network ar-
chitecture, harnesses the Deep Deterministic Policy Gradient (DDPG) algo-
rithm to dynamically generate routing policies. DDPG utilizes a deep neural
network to simulate the Q-table, and another neural network to produce the
strategy function, allowing it to effectively tackle large-scale continuous con-
trol problems. Through relentless training, EARS can learn to make better
control decisions by interacting with the network environment, and adjust-
ing services and resources based on network requirements and environmental
conditions. Simulations, which compare EARS with typical baseline schemes
such as Open Shortest-Path First [80] and Equal-Cost Multi-Path Routing
[81], demonstrate that EARS surpasses these schemes by achieving superior
network performance in terms of throughput, delay, and link utilization. As
a future undertaking, the algorithm will be evaluated in a real-world SDN
setup. Similarly, Tan et al. introduced a load balancing algorithm, the Re-
liable Intelligent Routing Mechanism (RIRM), designed to optimize traffic
data routing based on the traffic load in 5G CNs [56]. This algorithm takes
into account multiple factors, such as the shortest path, link latency, and node
loading to prevent packet loss. It has been implemented on a 5G testbed,
free5GC, by adding two elements, the RIRM traffic tracker and the RIRM
traffic controller. The tracker continuously monitors user traffic data flow
and reports to the controller, which then calculates the best route to avoid
congestion. Experimental results demonstrate that the proposed algorithm
surpasses traditional round-robin load balancing methods in terms of packet
loss, latency, and average data throughput. This algorithm is specifically
tailored for the uRLLC use case of 5G, and further research will explore its
performance in other use cases, such as mMTC.
One application of 5G access networks is Flying Ad Hoc Networks (FANETs)
that utilize unmanned aerial vehicles as nodes to provide wireless access.

These networks are characterized by their limited resources and high mobil-
ity, which pose significant challenges for efficient routing [57]. In this context,
intelligent routing is a crucial component of FANETs, as it enables the net-
work to adapt to the dynamic conditions of the environment and optimize the
use of resources. The use of advanced techniques, such as DRL, is a promis-
ing approach to addressing these challenges. Deep Q-Network (DQN)-based
vertical routing, proposed by Khan et al., is an example of such an intelligent
routing approach, which aims to select routes with better energy efficiency
and lower mobility across different network planes [57]. DQN, the underly-
ing mechanism of this proposed routing method, leverages the principles of
RL and DL to empower an agent to make informed decisions based on the
current state of the system. The integration of deep neural networks within
DQN enables the agent to learn about various states, such as the residual
energy and mobility rate, to predict optimal actions. The training process
of DQN involves the use of mini-batches of experiences, and there are three
vital features that aid in achieving an accelerated convergence toward the
most optimal route:
1. Delayed rewards from the replay memory allow for better prediction of
state-action values in a dynamic environment.
2. The decaying variable shifts the focus from exploration to exploitation
as the number of episodes increases.
3. Mini-batches of run-time states from the replay memory are used to train the network and minimize a loss function (a minimal training-step sketch illustrating these mechanisms follows this list).
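
The three mechanisms listed above can be sketched in a generic DQN training step: experiences are stored in a replay memory, the exploration rate decays over time, and mini-batches sampled from the memory drive gradient updates of the Q-network. The state dimensionality (e.g., residual energy and mobility features), action count, network architecture, and hyper-parameters below are placeholders rather than the exact design in [57].

import random
from collections import deque

import torch
import torch.nn as nn

STATE_DIM, N_ACTIONS = 4, 5            # e.g., [residual energy, mobility rate, ...]; 5 candidate next hops
GAMMA, BATCH, EPS_DECAY = 0.95, 32, 0.995

q_net = nn.Sequential(nn.Linear(STATE_DIM, 64), nn.ReLU(), nn.Linear(64, N_ACTIONS))
optimizer = torch.optim.Adam(q_net.parameters(), lr=1e-3)
replay = deque(maxlen=10_000)          # experience replay memory
epsilon = 1.0                          # decays to shift from exploration to exploitation

def act(state):
    """Epsilon-greedy action selection with a decaying exploration rate."""
    global epsilon
    epsilon *= EPS_DECAY
    if random.random() < epsilon:
        return random.randrange(N_ACTIONS)
    with torch.no_grad():
        return int(q_net(torch.as_tensor(state, dtype=torch.float32)).argmax())

def train_step():
    """Sample a mini-batch of stored experiences and minimize the TD loss."""
    if len(replay) < BATCH:
        return
    batch = random.sample(replay, BATCH)
    s, a, r, s2, done = map(lambda x: torch.as_tensor(x, dtype=torch.float32), zip(*batch))
    q_sa = q_net(s).gather(1, a.long().unsqueeze(1)).squeeze(1)
    with torch.no_grad():
        target = r + GAMMA * q_net(s2).max(dim=1).values * (1 - done)
    loss = nn.functional.mse_loss(q_sa, target)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()

# Example experience tuple: (state, action, reward, next_state, done); a full loop would
# alternate act(), an environment step, replay.append(), and train_step().
replay.append(([0.8, 0.1, 0.5, 0.3], 2, 1.0, [0.7, 0.2, 0.5, 0.3], 0.0))
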
The main objective is to improve network performance by reducing frequent
disconnections and partitions. The proposed method is a hybrid approach
that utilizes both a central controller and distributed controllers to share in-
formation and handle global and local information, respectively. It is suitable
for highly dynamic FANETs and can be applied in various scenarios, such as
border monitoring and targeted operations. By clustering the network across
different planes, this method offloads data traffic and improves network life-
time. Simulation results show that it can increase network lifetime by up to
60%, reduce energy consumption by up to 20%, and reduce the rate of link
breakages by up to 50% compared to traditional RL methods. Khan et al.
suggested that there are several open issues that can be explored in future
research to improve the proposed routing method for FANETs in 5G access
networks. These include:

• Enhancing the vertical routing method to reduce the routing overhead incurred to establish and maintain inter- and intra-cluster, and inter- and intra-plane routes, by allowing clusters to adjust their sizes for optimal performance.

• Exploring other variants of DQN, such as DDPG, which is well suited for continuous action spaces, and double DQN, which addresses the issue of overestimation of Q-values.

• Examining the use of mini-batches from the replay memory to prevent overfitting and selecting distinctive experiences with equal chances.

• Investigating the method in different scenarios with varying types of terrain and obstacles.

As the demand for fast communication with minimal delay continues to rise in the field of intelligent transportation systems, Vehicle-to-Everything
(V2X) communications have become increasingly crucial in enabling the
seamless sharing of information among vehicles. The integration of 5G
technology, particularly millimeter wave (mmWave), holds great promise in
achieving these goals, but also presents unique challenges in terms of beam
alignment and routing stability due to the dynamic nature of vehicle traf-
fic [58]. Addressing these challenges for V2X communications in 5G-based
mmWave systems, Rasheed et al. employed a 3D-based position detection
scheme for beam alignment and a group-based routing algorithm for secure
and stable data transmissions [58]. Vehicles are grouped based on their dis-
tance, direction, and speed, and each group is headed by a leader who is
responsible for authenticating the group members. Routing paths for packet
transmission from the source vehicle are chosen based on the link weight,
which takes into account the distance between neighbors and the direct/indi-
rect trust degree. Data compression and encryption techniques are used to
enhance the security of data transmissions. The effectiveness of the pro-
posed approach is demonstrated through simulations, which show significant
improvements over existing V2X communication schemes such as the Moving
Zone-Based Routing Protocol. A summary of the schemes discussed in this
section is presented in Table 8.

Table 8: Intelligent Routing Schemes

Scheme: EARS [55]
Description: Intelligence-driven experiential network architecture designed to optimize data flow routing using DDPG.
Outcomes: EARS adjusts services and resources based on network requirements and environmental conditions.
Future Work: Test the algorithm in a real-world SDN setup.

Scheme: RIRM [56]
Description: Design a load balancing algorithm to optimize traffic data routing based on the traffic load in 5G CNs for the uRLLC use case.
Outcomes: The proposed algorithm surpasses traditional round-robin load balancing methods in terms of packet loss, latency, and average data throughput.
Future Work: Test the algorithm in other use cases, such as mMTC.

Scheme: DQN-based Vertical Routing [57]
Description: Select routes with better energy efficiency and lower mobility across different network planes for FANETs.
Outcomes: Improve network performance by reducing frequent disconnections and partitions; offload data traffic and improve network lifetime by clustering the network across different planes; work with highly dynamic FANETs.
Future Work: Reduce routing overhead via adjustable-size clusters; explore other variants of DQN, such as DDPG and double DQN; examine the use of mini-batches from the replay memory to prevent overfitting; evaluate the scheme with different types of terrains and obstacles.

Scheme: 3D Position Detection & Group-Based Routing for Secure Data Transmission [58]
Description: Secure and stabilize data transmissions using a 3D-based position detection scheme for beam alignment and a group-based routing algorithm for V2X communications in 5G-based mmWave systems.
Outcomes: Optimize packet transmission paths in view of the distance between neighbors and trust levels; strengthen data transmissions via data compression and encryption techniques.
Future Work: Utilize ECC-based session keys for secure communication between vehicles.

7. Towards Energy Efficiency


Energy efficiency in networking is a crucial aspect that has gained sig-
nificant attention in recent years. The rapid growth of data traffic and the
increasing number of connected devices have led to a significant increase in
energy consumption in networking systems. This has motivated researchers
to focus on developing energy-efficient networking solutions to reduce the
environmental impact and operating costs of networks.

One of the key approaches for achieving energy efficiency in networking is
through the use of SDN and NFV technologies [82]. These technologies enable
the dynamic provisioning and scaling of network resources, which can lead
to significant energy savings. Additionally, the use of VNFs can reduce the
energy consumption of network devices by consolidating multiple functions
on a single physical platform. These enablers comprise 5G+ networks.
Energy efficiency in 5G+ networks is a critical aspect of the next gener-
ation of mobile networks as it directly impacts the environmental and eco-
nomic sustainability of these networks. The deployment of 5G+ networks
is expected to result in a significant increase in energy consumption due to
the expansion of network infrastructure, the deployment of new technologies
such as massive multiple-input/multiple-output and mmWave communica-
tions, and the increased demand for high-bandwidth applications [82].
There are several key strategies that have been proposed to improve the
energy efficiency of 5G+ networks. One of the key strategies is the use of
energy-efficient network architecture and protocols. This includes the use
of energy-efficient radio access technologies, such as energy-aware scheduling
and power control, as well as the use of energy-efficient network functions and
infrastructure. Another key strategy is the use of energy-efficient devices,
such as energy-efficient smartphones and IoT devices, which can reduce the
overall energy consumption of the network.
Zero-touch automation is another promising approach to improve the en-
ergy efficiency of 5G+ networks to automate the configuration, management,
and optimization of network functions and infrastructure. This can help to
reduce the energy consumption of the network by reducing the need for man-
ual intervention and by enabling the network to adapt to changes in traffic
demand and network conditions in a more efficient manner.
As such, Omar et al. formulated an optimization problem to compute a green solution that maximizes energy efficiency subject to area spectral efficiency and outage probability constraints in a 5G heterogeneous network [59]. The problem was solved using a convex optimization method. Results show that network densification does not always yield the most energy-efficient solution, as increasing the number of mmWave base stations raises area spectral efficiency while decreasing energy efficiency. Introducing mmWave small cells, however, has been shown to improve coverage and, consequently, spectral efficiency.
Future work suggests designing deployment strategies that have the environ-
ment in mind. Tackling a specific 5G use case, Dalgkitsis et al. examined
the impact of network automation on energy consumption and overall operating costs in the context of 5G networks, specifically for uRLLC services
[60]. A framework, known as Service CHain Energy-Efficient Management
or SCHE2MA, is proposed, which utilizes distributed RL to intelligently
deploy service function chains with shared VNFs across multiple domains.
The SCHE2MA framework is designed to be decentralized, eliminating the poten-
tial for central points of failure, allowing for scalability, and avoiding costly
network-wide configurations. Parallelism is also achieved by introducing the
auction mechanism, a system that enables inter-domain VNF migration in
a distributed multi-domain network. Results show a reduction in average service latency and a 17.1% improvement in energy efficiency compared to a centralized RL solution. This approach addresses the important
challenge of balancing the performance constraints of uRLLC services with
energy efficiency in the context of 5G+ networks, where reducing carbon
emissions and energy consumption is of paramount importance. Future work
will focus on implementing the auction mechanism in a fully decentralized
manner by employing DLTs.
In the context of zero-touch, Rezazadeh et al. proposed a framework
for fully automated MANO of 5G+ communication systems which utilizes
a knowledge plane and incorporates recent network slicing technologies [61].
The knowledge plane plays the role of an all-encompassing system within
the network by creating and retaining high-level models of what the network
is supposed to do, in order to provide services to other blocks in the net-
work. In other words, the knowledge plane joins the architectural aspects
of network slicing to achieve synchronization in a continuous control setting
by revisiting ZSM operational closed-loop building blocks. This framework,
known as Knowledge-based Beyond 5G or KB5G, is based on the use of
algorithmic innovation and AI, utilizing a DRL method to minimize energy
consumption and the cost of VNF instantiation. A unique Actor-Critic based
approach, known as the twin-delayed double-Q soft actor-critic method, al-
lows for continuous learning and accumulation of past knowledge to minimize
future costs. This stochastic method supports continuous state and action
spaces while stabilizing the learning procedure and improving time efficiency
in 5G+. It also promotes a model-free approach reinforcing the dynamism
and heterogeneous nature of network slices while reducing the need for hyper-
parameter tuning. This framework was tested on a 5G RAN NS environment
called smartech-v2, which incorporates both CPU and energy consumption
simulators with an OpenAI Gym-based standardized interface to guarantee
the consistent comparison of different DRL algorithms. Numerical results

demonstrate the advantages of this approach and its effectiveness in terms
of energy consumption, CPU utilization, and time efficiency. Future direc-
tions suggest the inclusion of different resources, such as memory. Table 9
summarizes the mentioned schemes, including future work to enhance their
performance and utility.

Table 9: Energy Efficiency Schemes

Green Efficiency Optimization [59]
  Description: A green efficient solution is designed to maximize energy efficiency under minimal area spectral efficiency and outage probability in a 5G heterogeneous network.
  Outcomes: Network densification does not always result in the best solution; incorporating mmWave small cells improves coverage and spectral efficiency.
  Future Work: Design deployment strategies that have the environment in mind.

SCHE2MA [60]
  Description: Utilize distributed RL to intelligently deploy service function chains with shared VNFs across multiple domains for uRLLC services.
  Outcomes: Eliminates the potential for central points of failure; achieves parallelism through the auction mechanism.
  Future Work: Employ DLTs to implement the auction mechanism in a fully decentralized manner.

KB5G [61]
  Description: A framework for fully automated MANO of 5G+ communication systems, with a twin-delayed double-Q soft actor-critic algorithm designed to minimize energy consumption and the cost of VNF instantiation.
  Outcomes: Supports continuous state and action spaces; stabilizes the learning procedure; improves time efficiency in 5G+.
  Future Work: Take into account resources other than CPU, such as memory.

8. Network Security & Privacy


Network security and privacy are fundamental components of modern
communication networks, ensuring the confidentiality, integrity, and avail-
ability of data and information transmitted over these networks. They are
essential for protecting against unauthorized access, malicious attacks, and
data breaches, and for preserving the privacy of network users.

8.1. Safeguarding 5G+ Networks: Security Measures & Weaknesses
5G+ networks are characterized by a highly distributed architecture,
which consists of multiple network elements such as the RAN, the CN, and
the transport network [8]. This architecture poses new security challenges,
as it increases the attack surface and creates new attack vectors for mali-
cious actors. 5G+ networks also have a greater number of connected devices
compared to previous generations of communication networks, which further
exacerbates the security and privacy risks. Additionally, the transmission of
data in 5G+ networks is characterized by elevated levels of speed, bandwidth,
and connectivity. This has dramatically increased the volume of data trans-
mitted over these networks. As such, it is imperative to implement robust
security and privacy measures to protect against the threat of cyberattacks
and promote trust in 5G+ networks.
5G+ networks integrate a plethora of security mechanisms to secure the
transmission of data, protect network infrastructure, and prevent unautho-
rized access to sensitive information [83]. These mechanisms include, but are
not limited to, cryptographic algorithms, firewalls, Virtual Private Networks
(VPNs), Intrusion Detection Systems (IDSs), NFV, and SDN. Cryptographic
algorithms, such as Advanced Encryption Standard, ensure the confidential-
ity and integrity of the data transmitted over the network. Firewalls act
as a barrier between the internal and external networks, providing a line of
defense against unauthorized access and malicious attacks. VPNs allow se-
cure communication over an insecure network, and IDSs detect and respond
to security threats in real-time. NFV and SDN technologies abstract the
underlying hardware from the network services, enabling the automation of
network management and reducing the attack surface.
Despite these robust security measures, 5G+ networks remain vulnerable
to a wide range of threats. Threats to 5G networks can be distinguished
based on the technological domains that they impact [64].

1. User Equipment (UE) Threats: These attacks are aimed at targeting


mobile devices of end-users such as smartphones and laptops. This
can encompass the utilization of mobile botnets to launch Distributed
Denial of Service (DDoS) attacks on various network layers, with the
objective of disrupting and shutting down services.

2. RAN Threats: These attacks focus on the RAN, which is responsible


for the wireless connection between the UE and the network. The

presence of rogue base stations that launch Man in the Middle (MitM)
attacks can compromise user information, break privacy, track users,
and cause Denial of Service (DoS).

3. CN Threats: The CN is responsible for the management and direction


of data traffic within the network, and is therefore a target for security
threats. These can encompass attacks on elements such as SDN and
VNF components, as well as Network Slicing, leading to DoS, eaves-
dropping, interception, or hijacking.

4. Network Slicing Threats: Network slicing involves creating virtual net-


works within the network to meet specific needs. These attacks target
this concept and can compromise the isolation between slices, thereby
compromising security and privacy.

5. SDN Threats: SDN separates control and user (data) planes, making
it a potential target for malicious actors. These attacks target the link
between the control and user planes and can take the form of distributed
DoS attacks or gaining control over network devices through Topology
Poisoning attacks.

As such, it is imperative for network operators to remain vigilant and proac-


tive in their approach to network security, and to continuously monitor and
update their security solutions to stay ahead of evolving security threats.

8.2. ZSM Security Threats


ZSM is a key aspect of network security and privacy for 5G+ networks. As
previously iterated, it enables the automated deployment and management
of network services, reducing the risk of human error and enhancing network
security. The implementation of ZSM allows network operators to manage
network configurations and deploy network services with unparalleled speed,
scalability, and accuracy, minimizing the potential for security breaches and
ensuring the confidentiality and integrity of the data transmitted over the
network [83]. Nevertheless, the very factors that enable a ZSM system may
also breed various security threats that could obstruct its functionality. These
threats are summarized in Table 10 [83, 10].

Table 10: Security Threats within the ZSM Context [83, 10]

Open API
  Script Insertion: The attack exploits vulnerabilities in systems that treat the inputted parameter as a script.
  SQL Injection: Malicious code is inserted into a database via a vulnerable input, compromising the database's security.
  Buffer Overflow Attack: The attack is activated by data that falls outside the expected types or ranges, causing the system to malfunction and granting access to its memory areas.
  Identity Attack: The attack attempts to access a targeted API by using a list of previously compromised passwords, stolen credentials, or tokens.
  DoS Attack: An attacker floods the API with a high volume of requests, rendering it unavailable.
  Application and Data Attacks: They involve unauthorized access to data, alteration/deletion of data, insertion of malicious code, and disruption of scripts.
  MitM Attack: An attacker intercepts the communication between the API client and server to steal confidential information.

Intents
  Data Exposure: An unauthorized individual intercepts information related to the application's purpose (e.g., advertising content), exposing the system's goals to risks and triggering additional attacks.
  Tampering: The attacker makes physical modifications to a connection point or interface.
  MitM Attack: An attacker intercepts messages between two entities in order to remotely eavesdrop on or alter the traffic.

Automated Closed-Loop
  Deception Attack: The deceiver convinces the target to believe a false version of the truth and manipulates the target's actions to benefit the deceiver.
  DoS Attack: A DoS attack overloads the network with a high volume of traffic, making the network unavailable to its users.

SDN/NFV
  Privilege Escalation: An intruder gains access to a target account, bypassing authorization and gaining unauthorized access to data.
  Spoofing: An attacker sends false address resolution protocol messages over a local area network.

AI/ML
  Adversarial Attack: A malicious actor tampers with the training data and/or inserts small perturbations into the test instances.
  Model Extraction Attack: The attack attempts to steal the model's parameters to recreate a similar ML model.
  Model Inversion Attack: The attack aims to recover the training data or the underlying information from the model's outputs.

8.3. Advances in 5G+ Network Trust Management


To enhance trust in 5G+ networks, Benzaïd et al. introduced a blockchain-
based data integrity framework [62] that ensures the security and privacy of

the data processed by ML pipelines. This framework records and stores in-
puts and outputs of ML pipelines on a tamper-evident log, and uses smart
contracts to enforce data quality requirements and validate the data pro-
cessed. By providing transparency and verifiability in the data used by ML
pipelines, the blockchain technology helps detect and correct any data tam-
pering or manipulation, thus improving trust and reliability in ML results and
contributing to the overall security and privacy of 5G+ networks. Addition-
ally, Palma et al. enhanced the security and trustworthiness of 5G+ networks
by integrating Manufacturer Usage Description (MUD) and Trust and Repu-
tation Manager (TRM) into the INSPIRE-5GPlus framework [63]. MUD is
a standard that provides access control by specifying the type of access and
network functionalities available for different devices in the infrastructure. It
helps to configure monitoring tools and learn about the normal behavior of
devices, which enables the identification of abnormal events on the 5G infras-
tructure. TRM assesses trust in the infrastructure using multiple values and
enables MUD security requirements to be enforced in a trustworthy manner.
The integration of MUD and TRM into INSPIRE-5GPlus enhances security
and trust by enforcing security properties and continuously auditing the in-
frastructure and security metrics to compute trust and reputation values.
These values are used to enhance the trustworthiness of zero-touch decision-
making, such as the ones orchestrating E2E security in a closed-loop. Future
research aims to develop trust tools at each domain level to create a com-
prehensive trust framework. Another study by Niboucha et al. incorporated
a zero-touch security management solution tackling the problem of in-slice
DDoS attacks in mMTC network slices of 5G [64]. The proposed solution em-
ploys a closed-control loop that monitors and detects any abnormal behavior
of MTC devices, and in the event of an attack, it automatically disconnects
and blocks the compromised devices. This was achieved by following 3GPP
traffic models and training a ML model using gradient boosting to identify
normal and abnormal traffic patterns. The detection algorithm then calcu-
lates the detection rate for each device, and the decision engine takes the
necessary steps to mitigate the attack by severing the connection between
the devices and the network, thereby safeguarding against any potential re-
occurrence of similar attacks. Results show its effectiveness in detecting and
mitigating DDoS attacks efficiently. Future lines of research will focus on
utilizing online learning techniques in the event of encountering new forms
of attacks.
With regard to the ZSM paradigm, the use of ML models in the architecture raises privacy and resource limitation concerns. In turn, Jayasinghe et
al. presented a multi-stage federated learning-based model that incorporates
ZSM architecture for network automation [65]. The proposed model is a hi-
erarchical anomaly detection mechanism consisting of two stages of network
traffic analysis, each with a federated learning-based detector to remove
identified anomalies. The complexity and size of the detector’s database
vary depending on the stage. The authors simulate the proposed system
using the UNSW-NB 15 network dataset and demonstrate its accuracy by
varying the anomaly percentage in both stages. The results show that the
model reaches a minimum accuracy of 93.6%. Future work aims to increase
the accuracy of the model and apply it in a security analytics framework in
the ZSM security architecture. To summarize, the key findings and future
directions from the schemes analyzed in this section are presented in Table
11.
Table 11: Network Trust Management Schemes

Blockchain-based Data Integrity Framework [62]
  Description: A framework designed to ensure the security and privacy of data processed by ML pipelines in 5G+ networks.
  Outcomes: Enhances trust and reliability in ML results.
  Future Work: Design liability-aware trust schemes to enable liable E2E service delivery in NGNs.

MUD and TRM Integration [63]
  Description: Integrate MUD and TRM into INSPIRE-5GPlus.
  Outcomes: The proposed scheme enforces security properties.
  Future Work: Incorporate trust tools at each domain level to create a comprehensive trust framework.

Zero-touch Security Management Solution for In-slice DDoS Attacks [64]
  Description: Monitor and detect abnormal mMTC device behavior; automatically disconnect and block compromised devices during an attack.
  Outcomes: Detects and mitigates DDoS attacks efficiently.
  Future Work: Apply online learning techniques to tackle new forms of attacks.

Multi-stage Federated Learning-based Model [65]
  Description: A two-stage anomaly detection mechanism is proposed to analyze network traffic, where each stage is equipped with a federated learning-based detector to remove identified anomalies.
  Outcomes: Decentralized processing; higher privacy; communication efficiency.
  Future Work: Apply the model in a security analytics framework within the ZSM security architecture.

9. Network Automation Solutions
The evolution of networking technology has triggered a fundamental shift
in network management, owing to the growing scale and complexity of net-
works, making traditional manual configuration and management techniques
inefficient and error-prone. In this regard, network automation solutions,
such as the ZSM framework, offer a revolutionary approach to network man-
agement, leveraging cutting-edge technologies such as ML, AI, and SDN to
streamline operations and enhance network performance [7]. These method-
ologies play a pivotal role in enabling automated self-management function-
alities of ZSM, thus improving service delivery and reducing operating expenses.
At the heart of network automation lies the use of software to automate
tedious and time-consuming tasks like device configuration, policy enforce-
ment, and network monitoring. By exploiting programmable interfaces and
open standards, network automation solutions facilitate seamless interoper-
ability and integration across multi-vendor environments, providing a unified
network view, and enabling fast issue resolution and troubleshooting.
Furthermore, network automation solutions leverage the power of AI and
ML to facilitate self-healing and self-optimizing networks [11]. By monitor-
ing network performance in real-time and detecting anomalies, automation
solutions can trigger automated workflows that mitigate potential issues and
restore normal operation. This capability substantially reduces downtime
and boosts network resiliency, while the identification of under/over-utilized
areas provides efficient resource optimization. Additionally, ML algorithms
enhance security by flagging and mitigating potential threats through traffic
patterns and user behavior analysis, offering a proactive response to security
threats such as malware or cyberattacks.
Nevertheless, the application of AI techniques in automation solutions
presents challenges and risks. Although AI and ML techniques enable cog-
nitive processing in the ZSM system, resulting in complete automation, the
performance of such an implementation is not entirely satisfactory [10]. Net-
work operators expect superior service availability and reliability to minimize
network outages and SLA violations, which could result in significant finan-
cial losses.
To address these challenges, one can leverage the power of DTs and Au-
toML. DTs facilitate the creation of virtual replicas of the physical network
infrastructure, enabling operators to test and validate network configuration

changes in a simulated environment before deploying them to the live network
[84]. This reduces the risk of misconfiguration, human errors, and downtime.
Moreover, AutoML algorithms enable the automatic processing of data and
the discovery of optimal configurations and models, reducing the need for
manual intervention [14]. This section highlights significant ML challenges
and proposes solutions to address them.

9.1. AI/ML Challenges


The path toward integrating ML into network automation solutions is
not without its obstacles. To unlock the full potential of ML for network
management, it is crucial to identify and tackle the unique challenges that
arise with this innovative technology. By analyzing and understanding the
potential complexities and roadblocks associated with ML integration, orga-
nizations can overcome them and leverage the power of ML using DTs and
AutoML to achieve optimal network performance.

9.1.1. Need for Skilled Personnel


The application of ML in network automation requires a substantial level
of data science expertise, coupled with specialized knowledge of network
infrastructure. However, finding and retaining such skilled personnel is a
formidable task, with the demand for data scientists being high [85]. This
challenge calls for innovative solutions that reduce the level of expertise re-
quired while maintaining optimal performance.
DTs and AutoML represent two such solutions. DTs simulate network
behavior and generate large amounts of training data, automating the data
preparation process. AutoML can automatically select the best-performing
algorithm and parameters without the need for human intervention, elimi-
nating the requirement for a highly skilled workforce. In this way, DTs and
AutoML address the need for data science experts, making network automa-
tion solutions more accessible and cost-effective.

9.1.2. Data Processing Challenges


Data processing is a critical step in ML, as high-quality data is necessary
for training effective models. However, in this era, network data is complex,
heterogeneous, and voluminous, making it challenging to process and analyze
manually [11]. Data processing involves data collection, cleaning, and feature
extraction. Data collection involves acquiring the data from the network,
and cleaning involves removing unwanted or corrupted data points. Feature extraction involves selecting and transforming the relevant features of the
data. These tasks require a significant amount of time and expertise, making
them a bottleneck for efficient ML.
DTs can generate realistic, simulated network data, enabling the auto-
mated processing and cleaning of data through AutoML. By automating
these tasks, organizations can reduce the time and resources required for
data preparation, freeing them up for more complex tasks.

9.1.3. Model Selection Challenges


The selection of an appropriate ML model is also critical for the success
of network automation. With numerous ML models available, selecting the
best one for a specific task is not a trivial task. Model selection entails
evaluating, comparing, and choosing the best model based on performance,
complexity, and other factors, which can be time-consuming and require
specialized expertise [14].
DTs can help address model selection challenges by providing an accu-
rate representation of the network, enabling the comparison and evaluation
of different models based on their performance. Furthermore, AutoML can
automatically identify the best model for a specific network automation task,
based on the characteristics of the dataset and the desired performance met-
rics.

9.1.4. Model Training Challenges


Training an ML model on a large amount of data is a time-consuming task
that requires significant resources, including processing power and memory
[11, 85]. Model training involves data augmentation, hyperparameter tuning,
and regularization. Data augmentation generates additional training data to
supplement the existing data, while hyperparameter tuning optimizes the
parameters of the model to achieve the best performance. Regularization
adds constraints to the model to prevent overfitting.
DTs can provide a simulated network environment that can generate
additional data to supplement real-world data, reducing the burden on real-
world resources. AutoML can also help organizations optimize the training
process by automatically selecting the most efficient and effective models and
hyperparameters for a given task.

9.1.5. Dynamic Nature of Wireless Networks
The uncertainty and dynamic nature of wireless networks represent signif-
icant challenges in the context of network automation. Wireless networks are
subject to a range of factors that can impact network performance, includ-
ing interference, noise, and mobility of devices. The ever-changing nature of
these networks makes it difficult to build accurate and reliable models that
can effectively control and predict network behavior [10, 7]. Specifically,
unpredictable changes in data streams can impair the performance of ML
models. This challenge can be particularly problematic in today’s fast-paced
environment, where rapid response to network issues is critical.
To overcome these challenges, DTs can be employed to simulate changes
in the wireless network and generate new training data to retrain ML models.
With AutoML, a new model can be trained and selected automatically to
replace the outdated one, without the need for manual intervention. This ap-
proach ensures that the ML model remains effective and up-to-date, despite
the non-stationary environment.

9.2. Digital Twins


As industries continue their digital transformation, the deployment of
cutting-edge technologies is paramount. A high-performance network con-
nection that harnesses state-of-the-art networking technologies is critical
to facilitating the transfer of data from physical systems to cloud-hosted
databases for data analytics and the deployment of AI algorithms. This
connection also links physical systems to web or mobile interfaces, allow-
ing users to monitor and control physical systems in real-time remotely [86].
This ground-breaking deployment is known as the Digital Twin, a technology
trend that has been identified as a top strategic initiative by Gartner in 2017
[87, 88].
A DT entails creating a real-time digital replica of a physical system that
synchronizes with its physical counterpart through bidirectional data and
control information flows [84]. It is more sophisticated and capable than a
surveillance system or a simple model, and unlike simulation, it represents
an actual asset with as few assumptions or simplifications as possible [89].
DTs focus on maintaining the full history and up-to-date information of the
assets or systems to facilitate intelligent and data-driven decision-making.
The roots of the DT concept can be traced back to the 1960s when NASA
first pioneered "twinning" for their Apollo program [90, 86]. By creating
physical duplicates on earth that mirrored their systems in space, they were

able to simulate a variety of scenarios and test different conditions to analyze
performance and behavior. The idea gained even more traction when it
proved vital in resolving technical problems during the infamous Apollo 13
mission. Fast-forward to the early 2000s, Michael Grieves introduced the
concept of DTs for the manufacturing industry, creating virtual replicas of
factories [91, 92]. These DTs serve as an impeccable tool for monitoring
processes, predicting failures, and increasing productivity, forever changing
the landscape of industrial innovation.
The adoption of DTs represents a remarkable leap forward for industries
seeking to thrive in the digital era. Real-time monitoring, control, and data
acquisition are just some of the advantages that DTs bring to the table
[86, 93]. These sophisticated tools enable remote access, ensuring business
continuity and increasing overall efficiency. With DTs, decision-making is
based on highly-informed predictions that consider both the present and the
future, and the risks associated with each course of action can be assessed
and mitigated in real-time. By testing and optimizing solutions in a virtual
environment, DTs increase overall system efficiency and minimize potential
disruptions. A DT allows virtual testing of various solutions to perform what-
if analysis to evaluate these solutions without affecting the physical system
[94]. In addition, all data is easily accessible in one platform, allowing for
faster and more efficient business decisions by data analytics tools.
A DT network is composed of three pillars, namely physical, digital/vir-
tual, and connection pillars, as illustrated in Figure 3 [89]. The physical pillar
represents the physical asset, the virtual/digital pillar represents the DT, and
the connection pillar allows for the exchange of data and control commands
among them. This modularity enables the system to evolve as the technology behind each component evolves. A DT can be highly modular, which
allows for the rapid reproduction of processes and knowledge transfer [89, 94].
In addition, the modularity of a DT allows for creating hybrid simulation and
prototyping systems, which can accelerate the design process.

Figure 3: Sample Architecture of a DT network for a MEC Network
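To make the three-pillar structure concrete, the following minimal Python sketch models a physical network node (physical pillar), its digital twin (digital pillar), and the bidirectional data/control loop between them (connection pillar). It is illustrative only: the asset, its telemetry fields, and the control rule are hypothetical and are not tied to the MEC example in Figure 3.

```python
import random

class PhysicalAsset:
    """Physical pillar: a simulated network node exposing telemetry and controls."""
    def __init__(self):
        self.cpu_load = 0.3        # fraction of CPU in use
        self.tx_power_dbm = 20.0   # configurable transmit power

    def read_telemetry(self):
        # In practice this would come from sensors/counters on the real device.
        self.cpu_load = min(1.0, max(0.0, self.cpu_load + random.uniform(-0.05, 0.1)))
        return {"cpu_load": self.cpu_load, "tx_power_dbm": self.tx_power_dbm}

    def apply_command(self, command):
        # Control information flowing back from the twin to the asset.
        self.tx_power_dbm = command.get("tx_power_dbm", self.tx_power_dbm)

class DigitalTwin:
    """Digital pillar: keeps a synchronized state and the full history of the asset."""
    def __init__(self):
        self.state, self.history = {}, []

    def synchronize(self, telemetry):
        self.state = dict(telemetry)
        self.history.append(dict(telemetry))

    def decide(self):
        # Toy what-if rule: if the replica shows high load, lower the transmit power.
        if self.state.get("cpu_load", 0.0) > 0.8:
            return {"tx_power_dbm": self.state["tx_power_dbm"] - 1.0}
        return {}

# Connection pillar: bidirectional data/control loop between asset and twin.
asset, twin = PhysicalAsset(), DigitalTwin()
for _ in range(20):
    twin.synchronize(asset.read_telemetry())   # data: physical -> digital
    command = twin.decide()
    if command:
        asset.apply_command(command)            # control: digital -> physical
print(f"samples kept by twin: {len(twin.history)}, final state: {twin.state}")
```

In a real deployment, the synchronization step would be driven by streaming telemetry over a reliable network connection, and the decision step by the analytics and ML components discussed below.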

Key enablers for DT networks include reliable and high-performance net-


work connections, cloud computing, big data analytics, AI/ML, and IoT [95].
The integration of these technologies allows for the seamless exchange of data
and control information between physical and virtual systems, providing a
platform for experimentation, monitoring, and optimization of physical sys-
tems.
One of the main enablers of DT networks is the availability of high-
performance, reliable network connections [84, 86]. DT networks require large volumes of data to be transmitted in real-time between physical and digital systems.
This data needs to be accurate and synchronized in both directions so that
the DT can provide an accurate representation of the physical system at all
times. Thus, it is critical to have a network infrastructure that is capable of
handling large amounts of data and providing fast and reliable connections.
Another important enabler is the use of cloud-based technologies for data
storage, processing, and analysis [84]. With the vast amounts of data gener-
ated by DTs, it is often not feasible to store or process it all locally. Cloud-
based technologies allow for scalable and flexible storage and processing,
enabling the analysis of large datasets using advanced analytics techniques

such as ML.
Additionally, the development and deployment of DTs require the use
of a range of other technologies, including sensors, edge computing, and
AI [84]. Sensors are used to capture real-time data from physical systems.
These sensors collect a wide range of data, such as temperature, pressure,
vibration, and more, enabling accurate and comprehensive modeling of the
physical system [86]. Edge computing allows for data processing and analysis
to take place closer to the source of the data. Additionally, ML algorithms
enable the DT network to learn and adapt to new operating conditions and
provide automated and proactive responses to system events.
DTs can be used in various applications, including manufacturing, trans-
portation, and healthcare [86]. For instance, in manufacturing, DT networks
can be used to simulate the behavior of a production line, predict machine
failure, and optimize production efficiency. In transportation, DT networks
can be used to simulate traffic patterns, optimize traffic flow, and predict road
accidents. In healthcare, DT networks can be used to simulate the behav-
ior of the human body, predict disease progression, and optimize treatment
plans.
Nevertheless, the throughput, reliability, resilience, and low latency required by DT technology go beyond what is currently offered by 5G [89]. Although DT technology already exists in some industrial
applications supported by 5G or even 4G, it has not been widely adopted
in other sectors, and has not reached its full potential. Therefore, 6G can
be considered an enabler for the massive adoption of DTs, particularly in
high-connectivity-demanding and rapidly emerging applications of aerospace,
Industry 4.0, and healthcare.

9.3. Automated Machine Learning


AutoML automates the laborious and intricate process of building and deploying ML models. It has become increasingly popular in recent years due to
the growing demand for ML applications and the scarcity of data science and
ML expertise.
One of the key benefits of AutoML is that it can greatly reduce the time
and resources required to develop and deploy ML models, as it automates
time-consuming and resource-intensive tasks [96]. These tasks include ev-
erything from data preparation and feature engineering to hyperparameter

tuning and model selection [97]. With AutoML, these processes can be per-
formed in a fraction of the time it would take to do them manually, freeing
up valuable time and resources for more complex and strategic work.
AutoML is not just a productivity tool, but a solution that democratizes
access to the power of ML. It makes the process of building and deploying
ML models accessible to a broader audience, without requiring extensive ex-
pertise in the field. This capability has the potential to revolutionize the way
businesses operate, giving them the ability to leverage data-driven insights
for decision-making and product development.
At its core, AutoML is a comprehensive and integrated solution that
covers the entire ML pipeline, from data preprocessing to model updating, as
illustrated in Figure 4 [14]. This pipeline operates on search and optimization
algorithms to identify the optimal model and corresponding hyperparameters
for a given problem.

Figure 4: An Overview of the AutoML Pipeline

9.3.1. Automated Data Preprocessing


Data preprocessing is a critical step in AutoML as it directly affects the
performance of the model. The main goal of the data preprocessing phase is
to transform raw data into a format that is suitable for training a ML model.
This includes data transformation, imputation, balancing, and normalization
[14].

(i) Data Transformation: Data transformation involves converting between
numerical and categorical features. In real-world applications, data is
often represented as strings, requiring encoding to make it machine-
readable. Techniques such as label encoding and one-hot encoding
are used to assign values or columns to categorical features, but lack
meaningful information. Target encoding, on the other hand, replaces each categorical value with a statistic of the target variable for that category, such as its mean or median, creating more informative features for ML models [98].

(ii) Data Imputation: Missing data is a ubiquitous problem in ML, and


is often encountered in real-world datasets due to factors such as data
collection issues, measurement errors, or intentional missingness. Miss-
ing data can pose significant challenges to ML tasks, as many learning
algorithms require complete and consistent data to achieve optimal
performance [99].
While it may be tempting to simply drop any observations or features
with missing data, this approach can result in significant data loss [14].
Instead, imputation techniques, whether model-free or model-based,
can be used to replace missing values with reasonable estimates.
Model-free imputation techniques are relatively straightforward and
do not require extensive computation, making them a popular choice.
This category includes both basic (e.g., zero, mean, and median im-
putation) and advanced (e.g., backward/forward filling) methods [14].
On the other hand, model-based imputation methods utilize ML mod-
els to estimate missing values by learning from existing feature values.
Examples of such techniques include K-Nearest Neighbors, XGBoost,
linear regression, and Datawig (i.e., a DL-based approach) [100, 101].
These methods typically yield more accurate results than model-free
techniques, but they require more computational resources and time to
train the ML models.

(iii) Data Balancing: Maintaining balanced data is a crucial aspect of ML,


especially with the increase in size and complexity of datasets. The
occurrence of class imbalance in datasets is marked by highly skewed
class distributions, leading to the degradation of ML models. Learning
on imbalanced datasets can lead to unwarranted bias in the major-
ity classes, negatively impacting the prediction accuracy of minority
classes. Hence, resampling techniques that involve either reducing the

number of samples in majority classes (under-sampling) or increasing
the number of samples in minority classes (over-sampling) are essential
to address this class imbalance issue [98, 102].

(iv) Data Normalization: Data normalization is a process that scales fea-


tures to make them comparable and ensure that no feature is given
more weight than another. When features have vastly different scales,
some may overshadow others, and the model may be skewed towards
the more significant ones. Two of the most commonly used normaliza-
tion techniques in ML are z-score and min-max normalization [103, 98].
Z-score normalization transforms the data to have a mean of zero and
a standard deviation of one. Min-Max normalization scales the data
by mapping the range of values in the feature to the range between 0
and 1 [104].
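As a concrete illustration of the four preprocessing steps above, the following sketch applies encoding, imputation, naive over-sampling, and scaling to a small synthetic table with pandas and scikit-learn. The feature names, values, and sampling strategy are illustrative assumptions and are not taken from any of the surveyed works.

```python
import numpy as np
import pandas as pd
from sklearn.impute import KNNImputer, SimpleImputer
from sklearn.preprocessing import MinMaxScaler, StandardScaler

# Toy dataset with a categorical feature, missing values, and class imbalance.
df = pd.DataFrame({
    "network_mode": ["4G", "5G", "5G", "4G", "5G", "5G"],
    "rsrp_dbm":     [-95.0, -80.0, np.nan, -100.0, -78.0, np.nan],
    "congested":    [0, 0, 0, 0, 1, 0],   # minority class: congested periods
})

# (i) Data transformation: one-hot encoding and target encoding of the category.
onehot = pd.get_dummies(df["network_mode"], prefix="mode")
target_enc = df.groupby("network_mode")["congested"].transform("mean")

# (ii) Data imputation: model-free (mean) vs. model-based (k-nearest neighbours).
rsrp_mean = SimpleImputer(strategy="mean").fit_transform(df[["rsrp_dbm"]])
rsrp_knn = KNNImputer(n_neighbors=2).fit_transform(
    np.column_stack([target_enc, df["rsrp_dbm"]]))[:, 1]

# (iii) Data balancing: naive random over-sampling of the minority class.
minority, majority = df[df["congested"] == 1], df[df["congested"] == 0]
balanced = pd.concat([df, minority.sample(len(majority) - len(minority),
                                          replace=True, random_state=0)])

# (iv) Data normalization: z-score and min-max scaling of the imputed feature.
z_scored = StandardScaler().fit_transform(rsrp_mean)
min_maxed = MinMaxScaler().fit_transform(rsrp_mean)

print(balanced["congested"].value_counts().to_dict())
print(rsrp_knn.round(1), z_scored.ravel().round(2), min_maxed.ravel().round(2))
```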

9.3.2. Automated Feature Engineering


It is widely recognized that the quality of data and the selection of rel-
evant features play a fundamental role in achieving optimal performance in
ML applications. Hence, feature engineering has emerged as a critical compo-
nent of the ML pipeline, aimed at maximizing the extraction of informative
features from raw data for use by ML algorithms and models. This crucial
process can be decomposed into three core sub-processes: feature generation, feature selection, and feature extraction [14].
Feature generation is the art of creating new features from existing ones to
enhance the accuracy and robustness of a ML model. Some feature generation
methods include unary (e.g., an exponential transformation of a feature),
binary (e.g., multiplying two features to create a new feature), and high-
order operations (e.g., an average of a group of records) [14].
Feature generation can spawn numerous features, yet some may be ir-
relevant or redundant, bearing minimal or negative effects on prediction.
Feature selection helps identify the best-suited features to improve model
performance and training speed. Selection methods can be classified into
filter, wrapper, and embedded categories [97]. Filter methods assign a score
to each feature and select a subset based on a threshold, wrapper methods
make predictions based on a selected subset of features and evaluate feature
sets, and embedded methods include the selection process in the learning
process of ML models.

Feature extraction (e.g., PCA) employs mapping functions to reduce di-
mensionality [97]. It alters the original features to extract more informative
features that can replace the original features. Although not mandatory in
the feature engineering process, feature extraction can be useful when the
produced feature set is high dimensional or underperforming.
As such, automated feature engineering can be seen as a dynamic and
synergistic combination of these three processes to optimize the feature set
for ML models.
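A minimal sketch of these three feature engineering steps is given below, using synthetic radio measurements and scikit-learn utilities; the features, target, and chosen operators are illustrative assumptions rather than a recommended pipeline.

```python
import numpy as np
import pandas as pd
from sklearn.decomposition import PCA
from sklearn.feature_selection import RFE, SelectKBest, mutual_info_regression
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(0)
X = pd.DataFrame({
    "rsrp": rng.normal(-90, 8, 200),
    "rsrq": rng.normal(-10, 3, 200),
    "snr": rng.normal(15, 5, 200),
    "speed": rng.uniform(0, 100, 200),
})
y = 2.0 * X["snr"] + 0.5 * X["rsrp"] + rng.normal(0, 1, 200)  # synthetic target

# Feature generation: unary and binary operations create candidate features.
X_gen = X.assign(snr_sq=X["snr"] ** 2,                # unary transform
                 rsrp_x_rsrq=X["rsrp"] * X["rsrq"])   # binary interaction

# Feature selection: a filter method scores features; a wrapper method searches subsets.
filter_sel = SelectKBest(mutual_info_regression, k=3).fit(X_gen, y)
wrapper_sel = RFE(LinearRegression(), n_features_to_select=3).fit(X_gen, y)

# Feature extraction: PCA maps the generated set to a lower-dimensional representation.
X_pca = PCA(n_components=2).fit_transform(X_gen)

print("filter keeps:", list(X_gen.columns[filter_sel.get_support()]))
print("wrapper keeps:", list(X_gen.columns[wrapper_sel.support_]))
print("PCA output shape:", X_pca.shape)
```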

9.3.3. Automated Model Learning


In this AutoML process, various search and optimization algorithms are
used to identify the best ML model for a given dataset. These algorithms implement
a range of search and optimization techniques to scour through different
model architectures and parameters in search of the best fit for the data. A
summary of some of the most commonly used optimization techniques, along
with their respective advantages and disadvantages, are provided in Table 12
[105, 98, 106, 107, 108, 109].
Grid search, for example, involves exhaustively searching through a pre-
defined set of hyperparameters, while random search randomly selects a set
of hyperparameters for evaluation [105, 98, 106]. Bayesian optimization uses
probabilistic models to predict the performance of different hyperparameter
configurations and selects the next configuration to evaluate based on this
prediction [107]. Gradient-based optimization methods, such as gradient de-
scent and its variants, use the gradient of the loss function with respect to the
hyperparameters to iteratively update the values of the hyperparameters un-
til convergence [108]. Evolutionary algorithms, such as genetic algorithms,
use evolutionary principles, such as mutation and crossover, to search for
optimal hyperparameters [105]. Hyperband is a type of early stopping algo-
rithm that uses a successive halving approach to quickly identify the most
promising hyperparameter configurations [109].
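To illustrate the first two techniques in practice, the sketch below contrasts grid search and random search over a random forest regressor using scikit-learn and SciPy distributions; the model, search spaces, and evaluation budgets are arbitrary choices for demonstration. Bayesian, evolutionary, and Hyperband-style searches are typically provided by dedicated libraries and are omitted here.

```python
from scipy.stats import loguniform, randint
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import GridSearchCV, RandomizedSearchCV

X, y = make_regression(n_samples=300, n_features=8, noise=5.0, random_state=0)

# Grid search: exhaustively evaluates every combination in a predefined grid.
grid = GridSearchCV(
    RandomForestRegressor(random_state=0),
    param_grid={"n_estimators": [50, 100], "max_depth": [5, 10, None]},
    cv=3, scoring="neg_mean_absolute_error",
).fit(X, y)

# Random search: samples a fixed budget of configurations from distributions,
# which scales better to high-dimensional hyperparameter spaces.
rand = RandomizedSearchCV(
    RandomForestRegressor(random_state=0),
    param_distributions={"n_estimators": randint(20, 200),
                         "max_depth": randint(3, 20),
                         "max_features": loguniform(0.2, 1.0)},
    n_iter=10, cv=3, scoring="neg_mean_absolute_error", random_state=0,
).fit(X, y)

print("grid best:", grid.best_params_, round(-grid.best_score_, 2))
print("random best:", rand.best_params_, round(-rand.best_score_, 2))
```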
AutoML tools also evaluate multiple models across a range of algorithm
families, including linear models, decision trees, SVMs, and neural networks.
These models differ in their underlying assumptions, which can make some
models better suited to certain types of data or problems.
Once the ideal model has been identified, AutoML tools employ optimization algorithms to tune its hyperparameters. Hyperparameters, such as the learning rate, regularization strength, number of hidden layers, and dropout rate, are essential configuration settings that influence the performance of the model but are not learned from the data [14, 105].

Table 12: An Overview of Common Optimization Techniques [105, 98, 107, 108, 109, 106]

Grid Search
  Description: Exhaustive search over a defined hyperparameter space.
  Advantages: Simple and easy to implement.
  Limitations: Computationally expensive; not ideal for high-dimensional search spaces.

Random Search
  Description: Random sampling of hyperparameters from a defined search space.
  Advantages: More efficient than grid search for high-dimensional search spaces.
  Limitations: Can still be computationally expensive for large search spaces; cannot make use of previous observations.

Bayesian Optimization
  Description: Sequential model-based optimization using Bayesian inference.
  Advantages: Fast convergence to the optimal solution; makes use of prior observations.
  Limitations: Limited capacity for parallelization.

Gradient-Based Optimization
  Description: Gradient descent-based optimization using first-order derivative information.
  Advantages: Efficient for optimizing differentiable objective functions.
  Limitations: Can get stuck in local optima; not ideal for non-differentiable objective functions.

Evolutionary Algorithms
  Description: Stochastic optimization inspired by biological evolution.
  Advantages: Scale well to higher-dimensional problems; can handle both continuous and discrete variables.
  Limitations: Depend heavily on the selection of algorithmic parameters; susceptible to premature convergence.

Hyperband
  Description: Sequential halving algorithm that adapts to the performance of trials and allocates resources accordingly.
  Advantages: Efficient use of resources.
  Limitations: Needs subsets with small budgets to be representative.

AutoML tools use a range of search and optimization techniques to find the most effective combination of hyperparameters for the model [110].
With the model architecture and hyperparameters optimized, AutoML
tools initiate the training process, whereby the model is trained on the pro-
vided dataset. This involves utilizing the optimization algorithm to minimize
the objective function, which typically measures the difference between the
predicted output and the true output. Advanced optimization techniques,
such as stochastic gradient descent and Adam, are utilized to speed up the
training process and improve the accuracy of the model [14, 97]. Addition-
ally, regularization techniques such as L1 and L2 regularization are used to
prevent overfitting, and ensemble methods such as bagging and boosting can
be employed to improve model robustness and performance [97].
The resulting trained model is then evaluated on a validation dataset
to estimate its generalization performance. Evaluation metrics, such as ac-
curacy, precision, recall, and F1 score, are computed to compare the per-
formance of different models and hyperparameters and identify the best-
performing model [97]. The best model is then selected based on its ability
to generalize well to unseen data and meet the desired performance criteria.
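The following sketch mimics, in a simplified way, how an AutoML tool might compare candidate model families with cross-validation and retain the best performer; the candidate set, metric, and default hyperparameters are illustrative assumptions, and in a full pipeline the retained family would then undergo the hyperparameter tuning and training steps described above.

```python
from sklearn.datasets import make_regression
from sklearn.linear_model import Ridge
from sklearn.model_selection import cross_val_score
from sklearn.neural_network import MLPRegressor
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVR
from sklearn.tree import DecisionTreeRegressor

X, y = make_regression(n_samples=400, n_features=10, noise=10.0, random_state=1)

# Candidate model families an AutoML tool might compare for a regression task.
candidates = {
    "linear": Ridge(alpha=1.0),
    "tree": DecisionTreeRegressor(max_depth=8, random_state=1),
    "svm": make_pipeline(StandardScaler(), SVR(C=10.0)),
    "neural_net": make_pipeline(StandardScaler(),
                                MLPRegressor(hidden_layer_sizes=(64,),
                                             max_iter=1000, random_state=1)),
}

# Each candidate is scored with cross-validation on the same metric, and the
# best-performing family is retained for further tuning and deployment.
scores = {name: cross_val_score(model, X, y, cv=3,
                                scoring="neg_mean_absolute_error").mean()
          for name, model in candidates.items()}
best_name = max(scores, key=scores.get)
best_model = candidates[best_name].fit(X, y)

print({name: round(-score, 2) for name, score in scores.items()}, "->", best_name)
```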

9.3.4. Automated Model Updating


After the AutoML pipeline completes the training process, the final step
is to deploy the trained model into production. This entails packaging the
model as a software component, like a web service or an API, and integrating
it into a larger system [97]. AutoML tools facilitate this deployment process
by providing automated model versioning, monitoring, and maintenance, en-
suring that the model remains accurate and current over time [14]. However,
one of the challenges that may arise is model drift, which can be further
categorized into two types: data drift and concept drift. Data drift occurs
when the underlying input data distribution shifts over time, rendering the existing model obsolete. Concept drift occurs when the relationship between the inputs and the target that the model was designed to predict changes over time [111]. To overcome this challenge, two
approaches are commonly utilized: model drift detection and adaptation.
Model drift detection involves analyzing the statistical properties of the
data and identifying any changes over time. This is done through two primary
methods: distribution-based and performance-based methods [112].
Distribution-based methods assume that changes in the data distribution
indicate model drift [112]. For instance, the Kolmogorov-Smirnov test compares

the empirical distribution of a feature or target variable over time with a
reference distribution, such as the distribution at the start of the data stream
[113]. A significant change in the distribution suggests the presence of model
drift.
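A minimal sketch of such a distribution-based check, using the two-sample Kolmogorov-Smirnov test from SciPy, is shown below; the throughput samples and the 0.01 significance threshold are illustrative assumptions.

```python
import numpy as np
from scipy.stats import ks_2samp

rng = np.random.default_rng(42)

# Reference window: throughput (Mbps) observed when the model was trained.
reference = rng.normal(loc=60.0, scale=10.0, size=1000)

# Current window: congestion has shifted the distribution downwards.
current = rng.normal(loc=45.0, scale=12.0, size=1000)

# The two-sample Kolmogorov-Smirnov test compares the two empirical distributions.
statistic, p_value = ks_2samp(reference, current)
drift_detected = p_value < 0.01  # small p-value: distributions differ significantly

print(f"KS statistic={statistic:.3f}, p-value={p_value:.3g}, drift={drift_detected}")
```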
Performance-based methods, on the other hand, assume that model drift
leads to a decline in model performance. Statistical tests, window-based
approaches, and ensemble-based approaches are common techniques for de-
tecting changes in model performance [114, 14]. Statistical tests (e.g., the Friedman test) compare the current model's performance against its performance on a reference dataset.
Window-based approaches track model performance over a rolling window
of data and compare it to the previous window. Ensemble-based approaches
compare the current model’s performance with an ensemble of previous mod-
els trained on different time intervals.
Once model drift is detected, the AutoML pipeline can adapt the model
to the new data distribution. This is done through a process called model
adaptation, which involves updating the model to reflect the changes in the
data [115, 116]. There are several techniques for model adaptation, including:

• Updating the model incrementally using small data batches, instead of


retraining on the entire dataset at once [115, 14, 117]

• Retraining the model on the new data to ensure that it remains accurate
and effective [14]

• Transferring knowledge from the old model to a new model trained on


the new data, thereby minimizing the amount of training required [118]

• Creating an ensemble of models trained on different time intervals and


combining their predictions to achieve better performance on the new
data, such as the weighted probability averaging ensemble framework
proposed by Yang et al. [119]
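As a simple illustration of the first adaptation technique, incremental updating, the sketch below feeds mini-batches to a stochastic gradient descent regressor and flags batches where the prequential error spikes; the drift point, batch size, learning rate, and error threshold are illustrative assumptions.

```python
import numpy as np
from sklearn.linear_model import SGDRegressor
from sklearn.metrics import mean_absolute_error

rng = np.random.default_rng(7)
model = SGDRegressor(learning_rate="constant", eta0=0.05, random_state=7)

# Stream of mini-batches; halfway through, the input-throughput relation drifts.
for batch in range(20):
    X = rng.uniform(0, 1, size=(64, 3))
    slope = 40.0 if batch < 10 else 25.0            # drift: throughput drops
    y = slope * X[:, 0] + 10.0 * X[:, 1] + rng.normal(0, 1, 64)

    if batch > 0:
        # Prequential evaluation: test on the new batch before learning from it.
        err = mean_absolute_error(y, model.predict(X))
        print(f"batch {batch:2d}: MAE = {err:5.2f}"
              + ("  <- drift suspected, model is being adapted" if err > 5.0 else ""))

    model.partial_fit(X, y)  # incremental update keeps the model current
```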

10. Case Study: Online AutoML for Application Throughput Pre-


diction
In this case study, we explore the application of online AutoML for pre-
dicting application throughput in dynamic network environments. Through
a comprehensive analysis, we highlight the advantages of using AutoML in
this context. The subsections that follow provide an overview of the use case,

dataset, AutoML framework, 5G system architecture, and present insight-
ful results and analysis. These include a comparison with traditional ML
approaches, a complexity-accuracy trade-off analysis, and periodic AutoML
model drift monitoring. This study demonstrates the potential of AutoML
in optimizing application throughput prediction for dynamic networks.

10.1. Use Case Overview


The emergence of 5G+ networks has ushered in a new era of opportunities
and challenges for the telecommunications industry. The explosive growth of
mobile data traffic and the increasing demand for high-speed and low-latency
services have made optimizing network resources and handling traffic loads
more critical than ever. Furthermore, the emergence of new use cases, such
as IoT applications, has resulted in a massive increase in connected devices,
making reliable and efficient network performance essential to support the
data traffic generated by these devices [120].
5G+ networks are designed to cater to a wide range of applications, in-
cluding high-definition video streaming, virtual and augmented reality, and
real-time gaming, each with varying requirements. Network operators must
meet a diverse set of KPIs for different use cases, including eMBB, mMTC,
and uRLLC, necessitating a thorough understanding of network performance
and user behavior [121]. Therefore, predicting application throughput is cru-
cial to ensure that these applications deliver high-quality service to end-users.
Application throughput is a crucial metric used to evaluate the perfor-
mance of applications running over a network. It reflects the amount of
data that can be downloaded per unit of time at the application layer, which
directly impacts the user experience.
Throughput is influenced not only by network infrastructure but also by
user traffic. Network congestion can occur, leading to reduced application
throughput, as more users connect to the network and start using data-
intensive applications. Predicting application throughput can help network
operators optimize network resources, identify potential network bottlenecks,
and troubleshoot network issues before they affect end-users. Accurately pre-
dicting application throughput can also improve user experience by ensuring
that applications operate efficiently, delivering fast and reliable service to
end-users. Therefore, predicting application throughput is crucial for ensur-
ing network performance, enhancing user experience, and meeting business
objectives.

With the deployment of new technologies such as network slicing and
edge computing in 5G+ networks, predicting application throughput be-
comes even more critical. Network slicing allows operators to create cus-
tomized virtual networks to support specific services and applications, and
predicting application throughput is essential to ensure that the resources
allocated to each slice are sufficient to meet the performance requirements
of the applications running on that slice. Edge computing reduces latency
and improves application performance, and predicting application through-
put in such an environment can help operators decide where to deploy their
computing resources to achieve optimal performance.
Real-time monitoring and adjustments to network resources are necessary
to meet a diverse set of KPIs. ZSM is one promising solution for automat-
ing the process of predicting application throughput and scaling network
resources accordingly to ensure a seamless user experience. ZSM proactively
detects and resolves potential network issues before they impact service qual-
ity by predicting and optimizing network performance metrics such as down-
load rate at the application layer. By leveraging AutoML algorithms to gen-
erate up-to-date predictive models, ZSM can autonomously adapt to changes
in traffic patterns and automate the optimization and management of net-
work services, resulting in improved service quality and increased operational
efficiency. This is crucial in a dynamic and fast-changing environment like
5G+ networks.
In this study, we utilize an open-source production dataset to predict
application throughput autonomously using AutoML. To demonstrate the
effectiveness of AutoML in generating up-to-date predictive models, we will
simulate a real-world scenario where the application throughput experiences
a sudden and significant change due to traffic congestion. This scenario is inspired by the increasing use of cloud storage services for uploading and accessing files.
Model drift is not restricted to changes in user behavior. It can also occur
due to changes in the application infrastructure. For instance, deploying
additional servers to handle increased traffic will result in an increase in
application throughput, which in turn can lead to a decrease in the model’s
predictive accuracy. In order to mitigate the effects of model drift, the model
is monitored and adjusted based on the new data that reflects the current
state of the application infrastructure.
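A sketch of how such a congestion-induced shift can be emulated on a throughput trace is shown below; the synthetic trace stands in for the production data, and the column name 'DL_bitrate' (following Table 13) and the 30-50% congestion factor are assumptions made for illustration.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(3)

# Synthetic stand-in for a production throughput trace ('DL_bitrate' in kbps).
trace = pd.DataFrame({
    "Timestamp": pd.date_range("2019-12-16 09:00", periods=600, freq="min"),
    "DL_bitrate": rng.normal(55_000, 8_000, size=600).clip(min=0),
})

# Emulate sudden congestion: after a chosen point in time, the achievable
# application throughput collapses to roughly 30-50% of its previous level.
congestion_start = trace["Timestamp"].iloc[400]
trace["congested"] = trace["Timestamp"] >= congestion_start
mask = trace["congested"]
trace.loc[mask, "DL_bitrate"] *= rng.uniform(0.3, 0.5, size=mask.sum())

print(trace.groupby("congested")["DL_bitrate"].mean().round(0))
```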
Overall, our use of AutoML and the open-source production dataset will
allow us to accurately predict application throughput and respond to changes
in network conditions in real-time. By continuously monitoring the network

and training the model on the latest data, we can ensure that our predictive
model stays up-to-date to accurately reflect the current state of the network,
even in the face of rapidly changing conditions.

10.2. Dataset Overview


In this study, we employ a publicly available production dataset collected
from a major mobile operator in Ireland [122]. The dataset comprises client-
side cellular KPIs obtained from G-NetTrack Pro, a widely used Android
network monitoring tool. The data were generated from two mobility pat-
terns, stationary and in-motion, across two distinct application patterns,
on-demand video streaming and file downloading (of file size > 200 MBs),
with a total duration of 3142 minutes. The video streaming dataset provides
a direct measurement of popular over-the-top services, namely Netflix and
Amazon Prime, which are representative of the typical user behavior when
watching on a mobile device.
The dataset captures traces pertaining to both 4G and 5G networks, en-
compassing a range of channel-, context-, and cell-related metrics, in addition
to throughput information. Table 13 describes the metrics contained within the dataset [122, 123].
We leverage this comprehensive dataset to investigate download rate pre-
diction at the application layer, commonly referred to as application through-
put, in the context of 4G and 5G networks. To this end, the dataset is par-
titioned into discrete 4G and 5G sets for both streaming and downloading
applications, by merging both traffic conditions, stationary and in-motion,
for each permutation of network mode and application. Such an approach en-
ables us to explore the complex interplay between network mode, application
type, and traffic conditions in determining the application throughput.
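A possible way to build these four subsets with pandas is sketched below; the folder layout, file naming, and the 'NetworkMode' column name are assumptions based on the dataset description rather than a prescribed procedure.

```python
import glob
import pandas as pd

# Assumed folder layout: one directory per application and mobility pattern,
# e.g. Download/Static/*.csv, Download/Driving/*.csv, Streaming/Static/*.csv, ...
def load_traces(application: str) -> pd.DataFrame:
    files = sorted(glob.glob(f"{application}/*/*.csv"))
    frames = [pd.read_csv(path) for path in files]   # merge static + in-motion traces
    if not frames:                                    # keep the sketch runnable without data
        return pd.DataFrame(columns=["NetworkMode", "DL_bitrate"])
    return pd.concat(frames, ignore_index=True)

subsets = {}
for app in ["Download", "Streaming"]:
    traces = load_traces(app)
    for mode in ["4G", "5G"]:
        # Keep only the samples captured under the given radio access technology.
        subsets[(app, mode)] = (traces[traces["NetworkMode"] == mode]
                                .reset_index(drop=True))

for (app, mode), frame in subsets.items():
    print(f"{app}-{mode}: {len(frame)} samples")
```

Each resulting subset can then be fed independently to the AutoML pipeline, so that the learned throughput models reflect the behavior of a specific network mode and application type.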
The goal is to provide network operators with valuable insights into ap-
plication throughput prediction, leading to improved network management,
resource optimization, and enhanced user experience. The decision to use
this dataset was driven by the following attributes.

• Real-world Relevance: The dataset is derived from a major Irish mobile
provider, containing real-world 5G network data. This real-world data
ensures the relevance and practicality of the predictions made by the
AutoML pipeline.

Table 13: Production Dataset Metrics [122, 123]

Feature                                     Description
Timestamp                                   Timestamp of sample
Longitude, Latitude                         GPS coordinates of the device
Speed                                       Speed of mobile device (km/h)
Operator Name                               Anonymized cellular operator name
Network Mode                                Mobile communication standard
Nodehex                                     Radio network controller ID in hexadecimal
LAChex                                      Location area code in hexadecimal
State                                       State of the download process ('I' for idle & 'D' for downloading)
DL bitrate, UL bitrate                      Download/Upload rate measured at the device (application layer) (kbps)
CellID, CellIDhex, CellIDraw                ID of serving cell for mobile along with its hexadecimal and raw formats
Pingavg, Pingmin, Pingmax,                  Ping statistics (average, minimum, maximum, standard deviation,
Pingstd, Pingloss                           and loss, respectively)
Channel Quality Indicator                   Feedback provided by the UE to the base station
Signal-to-Noise Ratio                       Difference between the received wireless signal and the noise floor (dB)
Received Signal Strength Indicator (RSSI)   Measurement of the power present in a received radio signal (dBm)
Reference Signal Received Power (RSRP)      Linear average of power for resource elements carrying cell-specific reference signals (dBm)
Reference Signal Received Quality (RSRQ)    Ratio between RSRP and RSSI (dB)
NRxRSRP, NRxRSRQ                            RSRP and RSRQ values for the neighboring cell

• Comprehensive Metrics: The dataset includes diverse cellular KPIs
obtained from G-NetTrack Pro. These metrics cover various network
aspects, enabling a comprehensive analysis of network performance.

• Application-specific Insights: The dataset focuses on two distinct application
types, video streaming and file downloading. Accordingly, the AutoML
pipeline can provide tailored insights and predictions relevant to these
applications.

• Mobility Patterns: The dataset further includes two mobility patterns,
static and driving, providing insights into bandwidth changes in different
scenarios. This provides a more accurate model of real-world network
conditions.

• Network Mode and Application Segmentation: The dataset is divided
into different segments based on network mode (4G and 5G) and application
(streaming and downloading). This enables the AutoML pipeline to capture
and analyze variations in performance across different network modes and
applications.

10.3. Dataset Distribution


To start with, we utilize these datasets to explore the dynamic changes
in the application throughput. To showcase this variability, we focus on
a specific example from one of the four merged datasets, namely the 5G
data pertaining to file downloads. Through this example, we examine the
distribution of the data by grouping it based on the day and hour of col-
lection. Figures 5 and 6 provide a visual representation of the distribution
of the application throughput across different days and hours in 2019 and
2020, respectively. Specifically, the 5G traces were captured on three days
in December 2019, including Saturday the 14th (Figure 5(a)), Monday the
16th (Figure 5(b)), and Tuesday the 17th (Figure 5(c)), as well as on two
Thursdays in January 2020 (Figure 6(a)) and February 2020 (Figure 6(b)).
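The grouping underlying Figures 5 and 6 can be reproduced along the following lines; the file name and the 'DL_bitrate' column name are illustrative assumptions rather than the dataset's exact headers.

    import pandas as pd

    # Load the merged 5G file-download traces (file name assumed for illustration).
    df = pd.read_csv("5G_download_merged.csv", parse_dates=["Timestamp"])
    df["day"] = df["Timestamp"].dt.date
    df["hour"] = df["Timestamp"].dt.hour

    # Per-day and per-hour summary of the application throughput distribution.
    dist = df.groupby(["day", "hour"])["DL_bitrate"].describe()
    print(dist.head())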
Upon inspecting the distribution of application throughput across differ-
ent days, we observe that despite differences in sample sizes, each day exhibits
a similar shape, suggesting the existence of a shared underlying structure.
Taking this analysis a step further, we proceed to examine the distribution
of application throughput across different hours of the day, focusing our at-
tention on data collected on weekdays during 2019 (as illustrated in Figures
5(b) and 5(c)). Our examination reveals that the majority of throughput
values for downloading a file in the early morning hours (specifically, at 7:00
a.m. and 8:00 a.m.) are concentrated within the first bin of the histogram.
Such an observation suggests that the distribution of throughput is heavily
skewed towards low values, characterized by a download rate of 50 Mbps
or less. This could be the result of lower network traffic or reduced user
activity during this particular time window. A similar pattern is observed
in Figure 6(a) for the early morning hours, 7:00 a.m. and 9:00 a.m., of
January 16, 2020.

Figure 5: Distribution of Application Throughput (kbps) for 5G Data from Downloading a File by Day and Hour in 2019: (a) Saturday, 2019-12-14; (b) Monday, 2019-12-16; (c) Tuesday, 2019-12-17

Figure 6: Distribution of Application Throughput (kbps) for 5G Data from Downloading a File by Day and Hour in 2020: (a) Thursday, 2020-01-16; (b) Thursday, 2020-02-13
At present, we have yet to detect any data drift, which indicates a consis-
tent pattern of application usage given the same network operator. Nonethe-
less, we establish a baseline by building a model and assessing its performance
on a testing set. Even in the absence of detected model drift, continuous mon-
itoring of the model is vital to detect any potential drops in performance due
to changes in user behavior and adapt accordingly.
Consider, for instance, a renowned music streaming platform that allows
users to download songs and albums for offline listening. As the platform ex-
pands to incorporate fresh artists and genres, users may alter their download
behavior, transitioning to larger file downloads, such as complete albums or
extended playlists. This trend becomes more pronounced during the release
of a highly anticipated album. Such a shift in user behavior may cause a
corresponding impact on the application’s throughput, resulting in longer
wait times and slower download speeds for users.
This change may impair the model’s ability to accurately predict the ap-
plication throughput and adjust the download speed to user behavior (e.g.,
by allocating network resources accordingly). If the model is not updated
to reflect the new user behavior, it may overestimate or underestimate the
required throughput, leading to sub-optimal download speeds and user ex-
perience.
To evaluate the model’s ability to adapt to changes in the underlying
data distribution, we will simulate the aforementioned scenario by introduc-
ing model drift, specifically data drift. Through periodic evaluation on the
incoming data, we will track the model’s performance and determine whether
it can maintain high performance levels in the presence of data drift.

10.4. AutoML Framework


In this work, we tailored the generic AutoML framework demonstrated in
Figure 4 and elaborated in Section 9.3 to fit the nature of the case study.

Figure 7: Respective AutoML Framework for Case Study

Figure 7 provides a glimpse into the tailored AutoML pipeline, which
commences with the data preprocessing phase. This involves encoding non-numerical
values and timestamps, substituting null values with either zeros or forward-
filled values, and applying standard or min-max normalization to enhance
data quality. Moving onto the next step, feature engineering, insignificant
features are dropped based on the cumulative importance scores obtained
through the light gradient boosting machine algorithm. Subsequently, we
remove redundant features based on Pearson’s rank-order. These two steps
comprehensively manipulate the data to extract valuable insights and pre-
pare it for the model learning phase.
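The sketch below outlines these two steps under illustrative assumptions; the column names, the 95% cumulative-importance cutoff, and the 0.9 correlation cutoff are not taken from the paper.

    import numpy as np
    import pandas as pd
    from sklearn.preprocessing import MinMaxScaler
    import lightgbm as lgb

    def preprocess_and_select(df, target="DL_bitrate", imp_cutoff=0.95, corr_cutoff=0.9):
        """Data preprocessing and feature engineering sketch (thresholds are assumptions)."""
        df = df.copy()
        # Encode timestamps as seconds and non-numerical columns as integer codes.
        df["Timestamp"] = pd.to_datetime(df["Timestamp"]).astype("int64") // 10**9
        for col in df.select_dtypes(include="object").columns:
            df[col] = df[col].astype("category").cat.codes
        # Substitute null values with forward-filled values, then zeros.
        df = df.ffill().fillna(0)
        y, X = df[target], df.drop(columns=[target])
        # Min-max normalization (standard scaling is an alternative).
        X = pd.DataFrame(MinMaxScaler().fit_transform(X), columns=X.columns, index=X.index)

        # Drop insignificant features using cumulative LightGBM importance scores.
        imp = pd.Series(
            lgb.LGBMRegressor(n_estimators=200).fit(X, y).feature_importances_,
            index=X.columns,
        ).sort_values(ascending=False)
        keep = imp[(imp.cumsum() / imp.sum()) <= imp_cutoff].index.tolist() or [imp.index[0]]
        X = X[keep]

        # Remove redundant features whose pairwise correlation exceeds the cutoff.
        corr = X.corr().abs()
        upper = corr.where(np.triu(np.ones(corr.shape, dtype=bool), k=1))
        return X.drop(columns=[c for c in upper.columns if (upper[c] > corr_cutoff).any()]), y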
The third step, model selection and hyperparameter tuning, involves opti-
mizing the neural network architecture of a Sequence-to-Sequence (Seq2Seq)
model. Seq2Seq models have gained widespread attention in recent years due
to their ability to learn complex patterns in sequential data [124]. In partic-
ular, the encoder-decoder architecture of Seq2Seq models has been shown to
be effective for multi-step prediction in various applications, such as machine
translation, speech recognition, and network traffic prediction. This archi-
tecture is especially suited for application throughput prediction, as it can
identify the temporal dependencies and patterns in the input sequence. The
encoder component of the model maps the input sequence, usually histori-
cal throughput data, into a fixed-length vector representation that captures
the relevant information, also known as context vector. This vector is then
passed to the decoder, which generates the output sequence, one element at
a time. The decoder leverages the encoder’s context vector and the previ-
ously generated output elements as context to predict the subsequent output
element in the sequence. The Seq2Seq models are equipped with LSTM
units in both the encoder and decoder components to further enhance the
model’s performance. A sample encoder-decoder architecture is illustrated in
Figure 8. The encoder component incorporates two LSTM layers, with the
second output serving as the context vector. This context vector, along with
the current time-step, is then passed to the decoder. The decoder, likewise,
comprises two LSTM layers. The final output from the decoder represents
the subsequent time-step, which is then fed back into the decoder to predict
the succeeding time-step. This iterative process continues until the entire
sequence is predicted. Both the architecture selection and hyperparameter
tuning phases are conducted concurrently via Bayesian optimization. The
model architecture is optimized in terms of the number of hidden dense and
recurrent layers in the encoder and decoder, as well as the number of dense
and LSTM units. The hyperparameters that are tuned are the learning rate
of the Adam optimizer and the dropout rate.
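For illustration, a minimal Keras sketch of one candidate architecture from this search space is given below; the window lengths, unit counts, and the simplified (non-autoregressive) decoder are assumptions, not the exact model selected by the Bayesian optimizer.

    import tensorflow as tf
    from tensorflow.keras import layers, Model

    LOOK_BACK, HORIZON, N_FEATURES, UNITS = 300, 60, 1, 128  # assumed window sizes

    # Encoder: two stacked LSTM layers; the final hidden/cell states form the context vector.
    enc_in = layers.Input(shape=(LOOK_BACK, N_FEATURES))
    enc_seq = layers.LSTM(UNITS, return_sequences=True)(enc_in)
    _, state_h, state_c = layers.LSTM(UNITS, return_state=True)(enc_seq)

    # Decoder: the context is repeated for each future step and decoded by two LSTM layers.
    dec = layers.RepeatVector(HORIZON)(state_h)
    dec = layers.LSTM(UNITS, return_sequences=True)(dec, initial_state=[state_h, state_c])
    dec = layers.LSTM(UNITS, return_sequences=True)(dec)
    out = layers.TimeDistributed(layers.Dense(1))(dec)  # one throughput value per future step

    seq2seq = Model(enc_in, out)
    seq2seq.compile(optimizer=tf.keras.optimizers.Adam(learning_rate=1e-3), loss="mae")
    seq2seq.summary()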
The final step in the pipeline involves monitoring the model’s performance
over time to detect any potential data drift. We utilize a performance-based
approach, specifically a window-based method, where we check the model’s
performance at regular intervals (e.g., every ten minutes with a window size
of 600 seconds). This allows us to detect any significant changes in the
data distribution, which can cause the model’s performance to deteriorate.
Data drift is detected if the performance degrades beyond a dynamic threshold,
which is set as a certain percentage of the current score. The choice of window size
and threshold percentage involves a trade-off between accuracy and computational
complexity: stricter values keep the model more accurate, but they make
monitoring more computationally demanding. In case of data
drift, we update the model’s weights incrementally to adapt to the new data
distribution.
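A simplified version of this monitoring loop is sketched below; `windows` is assumed to yield one (X, y) batch per monitoring interval, `model` is assumed to be a Keras-style regressor exposing predict() and fit() (such as the tuned Seq2Seq network), and the 20% margin and fine-tuning epoch count are illustrative defaults.

    import numpy as np

    def monitor_and_adapt(model, windows, rel_margin=0.20, ft_epochs=2):
        """Window-based drift handling: compare each window's MAE against a dynamic
        threshold and incrementally update the model's weights when it is exceeded."""
        threshold = None
        for X_win, y_win in windows:
            pred = np.asarray(model.predict(X_win)).ravel()
            mae = float(np.mean(np.abs(pred - np.asarray(y_win).ravel())))
            if threshold is None:                      # first window defines the baseline
                threshold = mae * (1 + rel_margin)
                continue
            if mae > threshold:                        # performance degraded: drift detected
                model.fit(X_win, y_win, epochs=ft_epochs, verbose=0)   # incremental update
                pred = np.asarray(model.predict(X_win)).ravel()
                new_mae = float(np.mean(np.abs(pred - np.asarray(y_win).ravel())))
                threshold = new_mae * (1 + rel_margin)  # re-anchor the dynamic threshold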

10.5. 5G System Architecture


The E2E 5G system architecture, as depicted in Figure 9, is composed
of various components that work together to provide a seamless user expe-
rience. At the heart of this architecture are the UEs, which are connected
to the gNodeB (gNB) in the RAN. The gNB is also connected to the User
Plane Function (UPF) in the CN. However, what sets this architecture apart
is the presence of the Application Edge Slice (AES), located in the RAN,
which plays an integral role in predicting application throughput for specific
applications, such as Netflix.

Figure 8: Sample Encoder-Decoder Architecture

Figure 9: 5G System Architecture
The AES is designed to be in close proximity to the UE and has enough
resources to collect and analyze data, making it ideal for analyzing traffic
patterns. Within the AES, three primary entities work together to collect
data and generate predictions. These entities are the traffic collection entity,
the AutoML pipeline, and the model prediction entity.
The traffic collection entity is responsible for collecting data on UE traf-
fic, including information about the type of traffic, expected traffic volume,
and QoS requirements. To achieve this, the traffic collection entity must
be configured to identify relevant metrics, such as DL rates, UL rates, and
coordinates.
Once the traffic collection entity has collected the necessary data, it is
sent to the AutoML pipeline. This pipeline preprocesses the data and trains
a suitable ML model with periodic monitoring to ensure its accuracy.
The model prediction entity is the final piece of the puzzle in the AES,
responsible for generating predictions based on the model created through
the AutoML pipeline. These predictions are then sent to the Network Data
Analytics Function (NWDAF), which houses the ZSM framework [125].
The ZSM in the NWDAF helps optimize resource allocation for the net-
work slices based on the predictions made by the model in AES. Once the
decision is made, it is sent to the Network Slice Selection Function (NSSF)
and Network Slice Subnet Management Function (NSSMF), which select the
appropriate network slice for the UE and manage the resources within the
network slice, including resource allocation and orchestration.
Finally, the AMF and UPF coordinate resource allocation to guarantee
that each UE receives the requested QoS for the selected network slice. In
this way, the AES entities and the ZSM framework work together to collect
data, generate predictions, and optimize resource allocation to ensure an
optimal user experience for this specific application on the network.

10.6. Results and Analysis


We implemented the proposed AutoML pipeline in Python 3.10.11 to
predict the application throughput based on 4 sub-datasets categorized by
network mode (4G and 5G) and application (streaming and file downloading).
Specifically, the experiments were conducted on a machine equipped with
a 14-core Intel Core i9-12900HK processor and 32 GB of memory. The experiments
harnessed the computational power of the NVIDIA GeForce RTX 3060 GPU,
featuring 6 GB of GDDR6 memory.
To evaluate the performance of our pipeline, we used Mean Absolute Er-
ror (MAE) and Mean Absolute Percentage Error (MAPE), which are com-
monly used metrics in network traffic prediction, to measure the accuracy
of the predicted values. Accurate prediction of network traffic is crucial for
optimizing network resources and improving overall network performance.
MAE measures the average magnitude of the errors in the predictions, while
MAPE measures the size of the errors relative to the actual values. These
metrics are preferred over other metrics like root mean square error as they
are less sensitive to outliers and do not penalize large errors as heavily, which
can significantly impact the network’s performance. Moreover, they are easy
to interpret by network operators and non-experts, making them useful for
validating the models.
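For reference, denoting the measured throughput by y_i, its prediction by ŷ_i, and the number of test samples by n (notation introduced here for clarity), the two metrics are computed as

\[
\mathrm{MAE} = \frac{1}{n}\sum_{i=1}^{n}\lvert y_i - \hat{y}_i\rvert,
\qquad
\mathrm{MAPE} = \frac{100\%}{n}\sum_{i=1}^{n}\left\lvert \frac{y_i - \hat{y}_i}{y_i} \right\rvert .
\]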

10.6.1. AutoML vs. Traditional ML


To validate our proposed AutoML pipeline’s efficiency, we compare its
performance with a basic LSTM model and a basic Seq2Seq model without
hyperparameter tuning. The LSTM model has one layer with 128 LSTM cells
followed by a dense output layer whose number of neurons depends on the
prediction horizon. The model is compiled with the MAE loss function and
the Adam optimizer. Similarly, the Seq2Seq model has one LSTM encoder
and decoder, each with 128 cells, and one dense layer. The decoder would
then generate the output sequence one step at a time, taking the previous
prediction as input for the next step.
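Assuming a Keras implementation (the framework used for the baselines is not stated explicitly), the basic LSTM described above corresponds roughly to the following sketch, where the input window length is an assumption.

    import tensorflow as tf
    from tensorflow.keras import layers, models

    HORIZON = 60                 # forecast horizon in seconds, as used in this comparison
    LOOK_BACK, N_FEAT = 300, 1   # assumed input window length and feature count

    baseline_lstm = models.Sequential([
        layers.Input(shape=(LOOK_BACK, N_FEAT)),
        layers.LSTM(128),             # single LSTM layer with 128 cells
        layers.Dense(HORIZON),        # one output neuron per predicted time-step
    ])
    baseline_lstm.compile(optimizer="adam", loss="mae")   # MAE loss and Adam optimizer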
Default parameters are used for both traditional ML models, and we set
the forecast horizon to 60 seconds, a parameter that can be adjusted based
on application requirements. Our results, presented in Table 14, demonstrate
that AutoML with Seq2Seq outperforms both basic LSTM and encoder-
decoder models in terms of MAE and MAPE for all four network traffic
datasets. To illustrate, in the case of file downloading in a 5G network,
upgrading from the LSTM model to the encoder-decoder model reduces the MAPE
from 6.52% to 4.76% and the MAE from around 0.17 to 0.11, while the AutoML
pipeline lowers them further to 3.58% and 0.0166, respectively. This trend is
observed consistently across the other datasets. This is
because AutoML models, such as those based on NAS, can lead to better
accuracy and lower error rates compared to manually designed models, due
to their ability to automatically search and optimize the model architecture
and hyperparameters for a given task.
Furthermore, our analysis revealed that the basic encoder-decoder model
performed better than the LSTM model. This is due to the fact that encoder-
decoder models are specifically designed for sequence-to-sequence learning
tasks, making them more suitable for the application throughput prediction
task in our study. The encoder-decoder model builds on the LSTM by explicitly
handling both input and output sequences, allowing it to capture more
information and produce better predictions.
Overall, our findings emphasize the importance of selecting the appro-
priate model architecture for application throughput prediction and demon-
strate the superiority of AutoML models in selecting the optimal model.
Using interpretable performance metrics such as MAE and MAPE enables
network operators to easily validate the models and make informed decisions,
enhancing overall network performance.

10.6.2. Complexity-Accuracy Trade-off Analysis


In this section, we employ the AutoML pipeline to predict the application
throughput and present a detailed analysis of its performance. We investigate
the impact of utilizing different past and future sequences of varying lengths
on the prediction accuracy. Our analysis primarily focuses on the 5G dataset
for file downloading.
To begin with, we employ a fixed look-ahead of five minutes to predict the
application throughput using the previous n minutes. We vary the length of
the past sequence from 2 to 5 minutes and observe the corresponding changes
in the prediction accuracy. Our results, as illustrated in Figure 10, show
that utilizing fewer past timesteps leads to a higher MAE, indicating worse
predictive performance. Conversely, employing more past timesteps results in
a lower MAE but is computationally complex in terms of time and resources.
As a longer sequence is utilized, the computational time required to train
the model also increases, resulting in higher resource utilization. Thus, there
exists a trade-off between the prediction accuracy and complexity, and a
balance must be struck to optimize the model's performance.

Table 14: MAPE & MAE: Comparison of Three Models on 4G and 5G Datasets

Model                                 Metric   4G Video   4G File     5G Video   5G File
                                               Stream     Download    Stream     Download
Traditional LSTM                      MAE      0.2377     0.1827      0.1949     0.1663
                                      MAPE     9.76 %     7.28 %      10.87 %    6.52 %
Traditional Seq2Seq Encoder-Decoder   MAE      0.1892     0.1302      0.1478     0.1066
                                      MAPE     7.71 %     6.26 %      8.91 %     4.76 %
AutoML                                MAE      0.0473     0.0209      0.0309     0.0166
                                      MAPE     5.33 %     4.59 %      5.56 %     3.58 %
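For concreteness, the past/future training pairs evaluated in this section can be generated with a simple sliding window along the following lines; the 1 Hz sampling rate and the placeholder series are assumptions used only for illustration.

    import numpy as np

    def make_windows(series, look_back, look_ahead):
        """Build (past, future) pairs from a 1-D throughput series."""
        X, y = [], []
        for i in range(len(series) - look_back - look_ahead + 1):
            X.append(series[i : i + look_back])
            y.append(series[i + look_back : i + look_back + look_ahead])
        return np.asarray(X)[..., np.newaxis], np.asarray(y)

    # Example: 5 minutes of history to predict the next 5 minutes at 1 sample per second.
    throughput = np.random.rand(3600)          # placeholder series for illustration
    X, y = make_windows(throughput, look_back=5 * 60, look_ahead=5 * 60)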
Furthermore, we fix the look-back to 5 minutes and explore the impact of
utilizing different future sequences of varying lengths, namely 5, 7, 10, 15, and
20 minutes, on the prediction accuracy. Our results, as illustrated in Figure
11, show that predicting longer sequences leads to a higher MAE, indicating
that more past information is needed to provide better future knowledge.
However, increasing the past sequences results in a higher computational
complexity. This finding confirms the trade-off between the prediction accu-
racy and complexity that we previously observed.
Overall, the AutoML pipeline can be effectively utilized to predict the
application throughput in the 5G network. However, the prediction accuracy
is highly dependent on the number of past and future timesteps utilized
and the associated computational complexity. Careful consideration must be
given while selecting the appropriate sequence lengths to achieve the desired
prediction accuracy while optimizing the computational resources utilized.

Figure 10: MAE among Varying Past Sequences

Figure 11: MAE vs. Look-ahead Sequence Length

10.6.3. Periodic AutoML Model Drift Monitoring
This section covers the final step of the AutoML pipeline, which involves
model updating. We will focus again on the 5G dataset for file downloading.
In this step, we set the forecast horizon to 5 minutes, which resulted in an MAE
of 0.0213. To detect model drift, we monitor the ML model periodically every
10 minutes. If the MAE exceeds 0.02556, which is 20% above the baseline MAE,
model drift is detected, and the model weights are adjusted.
It is important to note that the threshold percentage is simply a tunable
parameter that can be adjusted as needed. Figure 12 illustrates the timeline of
the model monitoring process. For the first 10 minutes (0 ≤ t < 10 mins),
the MAE remains below the threshold. At 10 minutes, the model is checked, and no
model drift is detected. Between 10 and 14 minutes, the MAE still remains below
the threshold. At this point in the process, a segment of the data is randomly
sampled and intentionally manipulated to mimic the occurrence of data drift.
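One simple way to perform such a manipulation, assuming a multiplicative shift applied to a random contiguous segment of the throughput series, is sketched below; the segment fraction and scaling factor are arbitrary choices.

    import numpy as np

    def inject_drift(series, frac=0.2, scale=0.5, seed=0):
        """Scale a randomly chosen contiguous segment of the series to mimic data drift."""
        rng = np.random.default_rng(seed)
        drifted = np.array(series, dtype=float, copy=True)
        seg_len = int(len(drifted) * frac)
        start = rng.integers(0, len(drifted) - seg_len)
        drifted[start : start + seg_len] *= scale      # e.g., congestion halving the rate
        return drifted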

Figure 12: Periodic AutoML Monitoring for Drift Detection and Adaptation

At 14 minutes, the MAE surpasses the threshold, but the model is not
checked yet. At t = 20 mins, the model is checked, and model drift is detected.
The weights are updated accordingly, and the MAE falls back to
0.0236. The threshold is also updated to 0.02832, which is 20% above the new
MAE, to account for the new data distribution. After minute 20, the MAE
does not exceed the threshold again.
It is essential to note that model drift did occur before it was detected, at
t = 14 mins, due to the periodic nature of monitoring. Decreasing the monitoring
period would have led to earlier detection. However, decreasing the period
means checking more often, which may be computationally expensive. Therefore,
there is a trade-off between the monitoring interval and model accuracy.
Ultimately, the model updating step of the AutoML pipeline plays a
crucial role in ensuring the model’s accuracy over time. By monitoring the
model for drift and updating its weights accordingly, we can ensure that the
model remains relevant in the face of changing data distributions. However,
determining the appropriate monitoring interval is essential to balance the
trade-off between model accuracy and computational resources.

11. Open Challenges & Future Directions


The emergence of cutting-edge innovations such as ZSM and AutoML
has brought exciting new opportunities to the networking world. However,
despite the headway made in these areas, several challenges remain to be
addressed to fully unleash their potential. A summary of these challenges
can be found in Table 15.

11.1. ZSM Challenges


ZSM systems have emerged as a promising solution to automate network
operations and improve service delivery and management. However, their
adoption comes with significant hurdles, including explainability, trustwor-
thiness, and computational complexity. Addressing these challenges is critical
for the future success of ZSM systems, and it requires further research and
development efforts to ensure the effective integration of ZSM in NGNs.

11.1.1. Explainable Zero-Touch Management


To successfully employ ZSM, its data-driven decisions must be human-
understandable. In this light, the European Commission approach to AI
centers on excellence, trust, and transparency, which will play a crucial role
in NGNs and the establishment of quality of trust [126, 127]. Explainable
Artificial Intelligence (XAI) is essential for ZSM as it provides a rationale
for the actions taken by AI systems, making them more trustworthy. This is
especially important in services like remote surgery, where network management
decisions can have a significant impact on human lives [8]. XAI allows
networking experts to understand the input that drove the decisions made by
ML models and approve their actions following a human-in-the-loop model.

Table 15: Challenges & Future Directions

Category: ZSM
  Challenge: Explainable Zero-Touch Management
    Description: Data-driven decisions must be human-understandable; XAI enhances trustworthiness; XAI enables human oversight, allowing network operators to review and approve the AI-driven actions.
    Future Work: Generate an intelligible ML model; utilize metrics to quantify the degree of AI explainability.
  Challenge: Trustworthy ZSM
    Description: Data shared among different stakeholders may contain sensitive information such as network topology, user data, and operational data; confidentiality, integrity, and availability of the data are crucial.
    Future Work: Utilize blockchain technology; encrypt data in transit and at rest; incorporate TEEs.
  Challenge: Computational Complexity in ZSM
    Description: ZSM networks struggle with computational complexity in handling large data volumes; ML algorithms require significant computational resources, which can conflict with the efficiency needs of ZSM networks.
    Future Work: Develop optimization techniques to minimize complexity; implement hardware-based strategies (e.g., FPGA-based acceleration and GPU processing).

Category: AutoML
  Challenge: Interpretability
    Description: Current AutoML approaches prioritize accuracy over interpretability, resulting in complex models; interpretability is essential for ethics and regulatory compliance in networking.
    Future Work: Incorporate XAI paradigms; add constraints (e.g., sparsity, monotonicity, and causality).
  Challenge: Scalability
    Description: Large datasets and the need for extensive model training pose scalability challenges for AutoML; data processing and model generation times impact network performance, causing latency issues for users.
    Future Work: Employ parallel and distributed algorithms; efficiently sample and partition data; dynamically adapt hyperparameters.
  Challenge: Robustness
    Description: AutoML struggles with adversarial attacks; non-robust models in ZSM systems lead to unreliable performance and potential security breaches.
    Future Work: Apply adversarial training; incorporate defensive distillation.
  Challenge: Cold-Start
    Description: The search process may begin with sub-optimal models or bad configurations, resulting in inefficient resource usage and prolonged search times; the resulting delays in ZSM services can have an impact on the user experience.
    Future Work: Utilize meta-learning; exploit domain-specific knowledge; apply transfer learning.
However, making XAI a reality in the ZSM paradigm requires an intel-
ligible ML model in addition to specific metrics to measure the level of AI
explainability [8]. Techniques to generate an intelligible ML model include
using inherently interpretable ML models (e.g., random forests) or statis-
tical procedures to describe the features on which a prediction was based.
As for the metrics, there are human-grounded evaluations and functionality-
grounded evaluations [128]. The former assesses the qualitative aspects of
the resulting explanations, such as their ability to assist humans in complet-
ing tasks and the impact of such decisions on the system. The latter relies on
formal definitions and quantitative methods to verify data-driven decisions,
such as service migration.

11.1.2. Trustworthy ZSM


One of the major challenges in ZSM is the secure data sharing among
different stakeholders, including network operators, service providers, and
third-party vendors. The data shared between these stakeholders can include
sensitive information, such as network topology, user data, and operational
data, making it critical to ensure the confidentiality, integrity, and availability
of the data.
To establish trust, different approaches can be utilized, such as blockchain
technologies, encryption techniques, and Trusted Execution Environments
(TEEs). Blockchain technology, for instance, not only ensures data gov-
ernance but also promotes multi-party trust and data usage accountability
[129]. Additionally, encrypting data in transit and at rest provides an extra
layer of protection against unauthorized access [130]. Incorporating TEEs,
which are secure areas of a processor that enable the execution of trusted
code and data, is another promising solution [131]. By integrating hardware-
based TEEs into network infrastructures, network operators can ensure that
critical operations, such as data sharing and network management, are per-
formed with a high degree of trust.

11.1.3. Computational Complexity in ZSM


ZSM networks can be severely challenged by the computational com-
plexity of managing the massive amount of generated data. This includes
real-time data analysis, network resource optimization, and coordination of
various network functions. ML algorithms, in particular, demand a high
level of computational resources to operate efficiently, which can conflict
with the needs of ZSM networks, where computation efficiency is as essential
as communication performance. In NGNs, the high latency associated with
complex operations is incompatible with time-sensitive services, making ML
algorithm optimization a key factor. Accordingly, it is essential to develop
optimization techniques to minimize the complexity of these models without
jeopardizing accuracy. To this end, implementing hardware-based strate-
gies, such as FPGA-based acceleration and GPU processing, can reduce the
computational complexity of ZSM [3].

11.2. AutoML Challenges


Despite the many advantages of AutoML, it is not a silver bullet, and
there are several challenges that need to be addressed. These include devising
an efficient search process, building a scalable system to handle big data,
addressing security concerns posed by adversarial attacks, and ensuring that
the models are interpretable and transparent. Nevertheless, the potential
benefits of AutoML are significant, and it is an area that is likely to see
continued growth and development in the years to come.

11.2.1. Interpretability
AutoML solutions are often seen as black boxes, which makes it difficult
for users and experts to fully understand how they work and the rationale
behind their solutions. However, interpretability is essential for building trust
and ensuring ethical considerations, especially in highly regulated domains,
such as the networking domain.
The lack of interpretability in AutoML models can lead to difficulties
in deploying and using these models in ZSM services. For example, it may
be difficult to diagnose and correct biases in the model, or to identify the
root cause of unexpected behaviors. Additionally, the lack of interpretability
can make it difficult to validate the model’s accuracy, which is critical for
ensuring the reliability and performance of the network. Therefore, the de-
velopment of transparent AutoML systems with mechanisms for explaining
and understanding their decisions is necessary. Unfortunately, many cur-
rent AutoML approaches prioritize accuracy over interpretability, resulting
in complex models that are difficult to comprehend.

To address this challenge, interpretable models leveraging XAI paradigms,
such as Shapley additive explanations, local interpretable model-agnostic ex-
planations, RuleFit, and partial dependence plots, can be used to increase
transparency and credibility [132]. Additionally, constraints such as sparsity
(i.e., low number of features), monotonicity, and causality can improve in-
terpretability [133]. Monotonicity guarantees that the relationship between
an input feature and the target outcome always goes in the same direc-
tion, aiding in the understanding of feature-target relationships. Causality
constraints ensure that only causal relationships are identified, promoting
effective interactions between humans and ML systems. Incorporating these
approaches can greatly improve AutoML’s accessibility and the ability of net-
work operators and other stakeholders to understand and trust the decisions
made by AutoML.
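As a minimal illustration of how such explanations could be attached to a throughput model, the sketch below applies SHAP to a tree-based regressor trained on synthetic KPI-like features; the feature names and data are fabricated purely for illustration.

    import numpy as np
    import pandas as pd
    import lightgbm as lgb
    import shap

    # Synthetic stand-ins for cellular KPIs feeding a throughput regressor.
    rng = np.random.default_rng(0)
    X = pd.DataFrame(rng.normal(size=(500, 4)), columns=["RSRP", "RSRQ", "SNR", "CQI"])
    y = 2.0 * X["SNR"] + 0.5 * X["RSRP"] + rng.normal(scale=0.1, size=500)

    model = lgb.LGBMRegressor(n_estimators=200).fit(X, y)
    explainer = shap.TreeExplainer(model)          # exact explainer for tree ensembles
    shap_values = explainer.shap_values(X)         # per-sample, per-feature contributions
    shap.summary_plot(shap_values, X, show=False)  # global ranking of feature influence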

11.2.2. Scalability
The growing size of datasets, coupled with the need for an overwhelming
number of model trainings to determine the optimal final learner, present
significant scalability challenges for AutoML. This can be particularly chal-
lenging in ZSM services, where the large volume of network-generated data
requires fast processing to meet network demands. As model complexity in-
creases, so do computational requirements, making it challenging to deploy
models in resource-constrained environments. Additionally, network perfor-
mance can be impacted due to the time required to process data and generate
models, leading to latency and responsiveness issues that affect the user ex-
perience.
To address this issue, future research can focus on developing parallel
and distributed AutoML algorithms that can harness modern hardware such
as graphics processing units. Additionally, techniques that can more effi-
ciently sample the data, leverage data partitioning, or dynamically adapt
the algorithm’s hyperparameters can significantly reduce the computational
overhead. Such strategies will ensure that AutoML remains a powerful tool
for ML, irrespective of the dataset’s size.

11.2.3. Robustness
AutoML, particularly NAS, has shown remarkable performance on well-
labeled datasets such as ImageNet [134]. However, real-world datasets in-
evitably contain noise and adversarial examples, which can significantly un-
dermine the performance of AutoML models [97]. Adversarial attacks can
be specifically designed to fool the model, compromising its performance.
The deployment of non-robust models in ZSM systems can lead to un-
reliable network performance, which can have a significant impact on the
overall user experience. Additionally, the lack of robustness can result in
potential security breaches. Adversarial attacks can be used to exploit vul-
nerabilities in non-robust models, allowing attackers to gain unauthorized
access to the network or manipulate network behavior. This can have seri-
ous consequences, such as compromising the confidentiality and integrity of
user data, disrupting network operations, and causing financial losses. There-
fore, ensuring the robustness of AutoML models is critical to their successful
application and safe deployment in ZSM systems.
AutoML systems can improve their robustness to adversarial attacks by
incorporating techniques such as adversarial training and defensive distilla-
tion in their pipelines. Adversarial training can enhance robustness by train-
ing the model with a combination of clean and adversarial data [135]. This
exposes the model to a range of adversarial attacks during training, making it
more robust to such attacks at inference time. On the other hand, defensive
distillation is a technique that distills knowledge from a large robust model
(teacher model) into a smaller target model (student model) [136]. This en-
ables the robustness of the teacher model to be transferred to the student
model through knowledge distillation, yielding a more robust student model.
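A toy sketch of FGSM-style adversarial training for a regression model is shown below; the architecture, data, and perturbation budget are arbitrary assumptions used only to illustrate mixing clean and adversarial samples during training.

    import tensorflow as tf

    model = tf.keras.Sequential([
        tf.keras.layers.Input(shape=(8,)),
        tf.keras.layers.Dense(32, activation="relu"),
        tf.keras.layers.Dense(1),
    ])
    opt = tf.keras.optimizers.Adam(1e-3)
    loss_fn = tf.keras.losses.MeanAbsoluteError()
    X = tf.random.normal((256, 8))       # placeholder features
    y = tf.random.normal((256, 1))       # placeholder targets

    def fgsm(x, y_true, eps=0.05):
        """Fast Gradient Sign Method: perturb inputs along the sign of the loss gradient."""
        with tf.GradientTape() as tape:
            tape.watch(x)
            loss = loss_fn(y_true, model(x))
        return x + eps * tf.sign(tape.gradient(loss, x))

    for step in range(100):
        x_adv = fgsm(X, y)                              # craft adversarial examples
        x_mix = tf.concat([X, x_adv], axis=0)           # combine clean and adversarial data
        y_mix = tf.concat([y, y], axis=0)
        with tf.GradientTape() as tape:
            loss = loss_fn(y_mix, model(x_mix))
        grads = tape.gradient(loss, model.trainable_variables)
        opt.apply_gradients(zip(grads, model.trainable_variables))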

11.2.4. Cold-Start
AutoML systems may undergo a cold-start, where the search process
starts with a sub-optimal model or a bad configuration, resulting in inefficient
resource usage and prolonged search times [137]. This can be particularly
problematic in ZSM services, where processing time is crucial, and delays can
have a significant impact on the user experience. One possible technique to
warm-start the search process is meta-learning. By leveraging prior knowl-
edge from similar datasets, meta-learning can initialize the search process
with a promising configuration obtained from that previous knowledge. Ac-
cordingly, the search space is reduced, and the search process is accelerated
[138].
Furthermore, incorporating domain-specific knowledge can design a more
efficient search space, improving the initialization of the search process. For
example, in image classification tasks, domain-specific knowledge can be uti-
lized to define a search space that contains CNNs with specific architectures.
Transfer learning can also aid in warm-starting the search process by
providing a good initialization of the model’s weights, reducing the time
required for model training. Through transfer learning, AutoML systems
can leverage knowledge learned from related domains (pre-trained model) to
improve the performance of the target model.

12. Conclusion
Next-Generation Networks (NGNs) have unleashed a remarkable shift in
the telecommunications industry, opening up an array of possibilities for ap-
plications and service areas with diverse needs. While NGNs hold tremendous
promise to fulfill the demanding requirements of future use cases, they must
be devised as highly-adaptable infrastructures using cutting-edge technolo-
gies, such as software defined networking, network function virtualization,
and network slicing.
Nevertheless, as networks grow more complex, traditional manual ap-
proaches for network management become less efficient. Consequently, Zero-
touch network and Service Management (ZSM) has emerged as a fully au-
tomated management solution designed to introduce intelligence into mobile
networks for the purpose of automation and optimization. As explored in
this survey, ZSM has the potential to optimize network resources, boost en-
ergy efficiency, enhance security, and manage traffic in NGNs. However, it
also confronts significant ML challenges, such as the need for effective model
selection and hyperparameter tuning. The paper explores viable network
automation solutions, specifically Automated Machine Learning (AutoML)
and digital twins.
AutoML offers one solution to these issues by automating the ML pipeline
within ZSM itself, thereby increasing its efficiency. This paper thoroughly
analyzes the AutoML pipeline, providing insights into the techniques utilized
at each step. The practical application of AutoML is demonstrated through
a case study that predicts application throughput for 4G and 5G networks
using an online AutoML pipeline. Simulation results demonstrate the superiority of
AutoML over traditional ML approaches. By leveraging AutoML algorithms to generate
up-to-date predictive models, ZSM can adapt to changing traffic patterns.
This facilitates the automation of network service management, leading to
improved service quality and enhanced operational efficiency.
While ZSM has shown promise across diverse domains, much work re-
mains to be done to refine and incorporate this framework. Nonetheless, the
potential for NGNs to revolutionize the way we live, work, and communi-
cate remains as high as ever. ZSM and AutoML will play a pivotal role in
realizing this potential.

References
[1] F. Rancy, Imt for 2020 and beyond, 5G Outlook-Innovations and Ap-
plications (2016) 69.
[2] A. A. Barakabitze, A. Ahmad, R. Mijumbi, A. Hines, 5g net-
work slicing using sdn and nfv: A survey of taxonomy, architec-
tures and future challenges, Computer Networks 167 (2020) 106984.
doi:10.1016/j.comnet.2019.106984.
[3] C. Benzaid, T. Taleb, Ai-driven zero touch network and service man-
agement in 5g and beyond: Challenges and research directions, IEEE
Network 34 (2) (2020) 186–194. doi:10.1109/MNET.001.1900252.
[4] D. Tennenhouse, J. Smith, W. Sincoskie, D. Wetherall, G. Minden,
A survey of active network research, IEEE Communications Magazine
35 (1) (1997) 80–86. doi:10.1109/35.568214.
[5] L. Jorguseski, A. Pais, F. Gunnarsson, A. Centonza, C. Will-
cock, Self-organizing networks in 3gpp: standardization and fu-
ture trends, IEEE Communications Magazine 52 (12) (2014) 28–34.
doi:10.1109/MCOM.2014.6979983.
[6] M. A. Khan, S. Peters, D. Sahinel, F. D. Pozo-Pardo, X.-T. Dang,
Understanding autonomic network management: A look into the past,
a solution for the future, Computer Communications 122 (2018) 93–
117. doi:10.1016/j.comcom.2018.01.014.
[7] J. Gallego-Madrid, R. Sanchez-Iborra, P. M. Ruiz, A. F. Skarmeta,
Machine learning-based zero-touch network and service management:
A survey, Digital Communications and Networks 8 (2) (2022) 105–123.
[8] E. Coronado, R. Behravesh, T. Subramanya, A. Fernàndez-Fernàndez,
M. S. Siddiqui, X. Costa-Pérez, R. Riggio, Zero touch management:
A survey of network automation solutions for 5g and 6g networks,
IEEE Communications Surveys & Tutorials 24 (4) (2022) 2535–2578.
doi:10.1109/COMST.2022.3212586.

[9] G. Z. ETSI, Zero-touch network and service management (zsm); refer-
ence architecture, Group Specification (GS) ETSI GS ZSM 2.

[10] M. Liyanage, Q.-V. Pham, K. Dev, S. Bhattacharya, P. K. R. Mad-


dikunta, T. R. Gadekallu, G. Yenduri, A survey on zero touch network
and service management (zsm) for 5g and beyond networks, Journal of
Network and Computer Applications 203 (2022) 103362.

[11] S. T. Arzo, C. Naiga, F. Granelli, R. Bassoli, M. Devetsikiotis, F. H. P.


Fitzek, A theoretical discussion and survey of network automation
for iot: Challenges and opportunity, IEEE Internet of Things Jour-
nal 8 (15) (2021) 12021–12045. doi:10.1109/JIOT.2021.3075901.

[12] C.-X. Wang, M. D. Renzo, S. Stanczak, S. Wang, E. G. Larsson, Artifi-


cial intelligence enabled wireless networking for 5g and beyond: Recent
advances and future challenges, IEEE Wireless Communications 27 (1)
(2020) 16–23. doi:10.1109/MWC.001.1900292.

[13] R. Pugliese, S. Regondi, R. Marini, Machine learning-based


approach: global trends, research directions, and regulatory
standpoints, Data Science and Management 4 (2021) 19–29.
doi:10.1016/j.dsm.2021.12.002.

[14] L. Yang, A. Shami, Iot data analytics in dynamic environ-


ments: From an automated machine learning perspective, Engi-
neering Applications of Artificial Intelligence 116 (2022) 105366.
doi:10.1016/j.engappai.2022.105366.

[15] H. Bhavsar, A. Ganatra, A comparative study of training algorithms for


supervised machine learning, International Journal of Soft Computing
and Engineering (IJSCE) 2.

[16] R. Saravanan, P. Sujatha, A state of art techniques on machine learn-


ing algorithms: A perspective of supervised learning approaches in
data classification, in: 2018 Second International Conference on Intel-
ligent Computing and Control Systems (ICICCS), 2018, pp. 945–949.
doi:10.1109/ICCONS.2018.8663155.

[17] S. Ray, A quick review of machine learning algorithms, in:


2019 International Conference on Machine Learning, Big Data,
Cloud and Parallel Computing (COMITCon), 2019, pp. 35–39.
doi:10.1109/COMITCon.2019.8862451.

[18] L. Alzubaidi, J. Zhang, A. Humaidi, A. Al-Dujaili, Y. Duan, O. Al-


Shamma, J. Santamarı́a, M. Fadhel, M. Al-Amidie, L. Farhan,
Review of deep learning: concepts, cnn architectures, chal-
lenges, applications, future directions, Journal of Big Data 8.
doi:10.1186/s40537-021-00444-8.

[19] E. S. Low, P. Ong, K. C. Cheah, Solving the optimal


path planning of a mobile robot using improved q-learning,
Robotics and Autonomous Systems 115 (2019) 143–161.
doi:10.1016/j.robot.2019.02.013.

[20] M. Condoluci, T. Mahmoodi, Softwarization and vir-


tualization in 5g mobile networks: Benefits, trends
and challenges, Computer Networks 146 (2018) 65–84.
doi:10.1016/j.comnet.2018.09.005.

[21] Z. Zaidi, V. Friderikos, Z. Yousaf, S. Fletcher, M. Dohler, H. Aghvami,


Will sdn be part of 5g?, IEEE Communications Surveys & Tutorials
20 (4) (2018) 3220–3258. doi:10.1109/COMST.2018.2836315.

[22] S. Parkvall, E. Dahlman, A. Furuskar, M. Frenne, Nr: The new 5g


radio access technology, IEEE Communications Standards Magazine
1 (4) (2017) 24–30. doi:10.1109/MCOMSTD.2017.1700042.

[23] P. Popovski, K. F. Trillingsgaard, O. Simeone, G. Durisi,


5g wireless network slicing for embb, urllc, and mmtc: A
communication-theoretic view, IEEE Access 6 (2018) 55765–55779.
doi:10.1109/ACCESS.2018.2872781.

[24] A. Moubayed, A. Shami, A. Al-Dulaimi,


On end-to-end intelligent automation of 6g networks, Future In-
ternet 14 (6).
URL https://www.mdpi.com/1999-5903/14/6/165

[25] W. Jiang, B. Han, M. A. Habibi, H. D. Schotten, The road towards 6g:


A comprehensive survey, IEEE Open Journal of the Communications
Society 2 (2021) 334–366. doi:10.1109/OJCOMS.2021.3057679.

[26] A. I. Salameh, M. El Tarhuni, From 5g to 6g - challenges, technologies,
and applications, Future Internet 14 (4).

[27] S. A. Abdel Hakeem, H. H. Hussein, H. Kim, Vision and research


directions of 6g technologies and applications, Journal of King Saud
University - Computer and Information Sciences 34 (6, Part A) (2022)
2419–2442. doi:10.1016/j.jksuci.2022.03.019.

[28] ETSI zero touch network and service management (ZSM),


etsi.org/technologies/zero-touch-network-service-management.

[29] E. Zeydan, Y. Turk, Recent advances in intent-based net-


working: A survey, in: 2020 IEEE 91st Vehicular Tech-
nology Conference (VTC2020-Spring), 2020, pp. 1–5.
doi:10.1109/VTC2020-Spring48590.2020.9128422.

[30] P. H. Gomes, M. Buhrgard, J. Harmatos, S. K. Mohalik, D. Roeland,


J. Niemöller, Intent-driven closed loops for autonomous networks, Jour-
nal of ICT Standardization (2021) 257–290.

[31] M. Falkner, J. Apostolopoulos, Intent-based networking for the enter-


prise: A modern network architecture, Commun. ACM 65 (11) (2022)
108–117. doi:10.1145/3538513.

[32] B. Laliberte, The journey to intent-based networking, White paper,


Enterprise Strategy Group (2018).

[33] N. F. S. d. Sousa, C. E. Rothenberg, Clara: Closed loop-


based zero-touch network management framework, in: 2021
IEEE Conference on Network Function Virtualization and
Software Defined Networks (NFV-SDN), 2021, pp. 110–115.
doi:10.1109/NFV-SDN53031.2021.9665048.

[34] Monb5G, https://www.monb5g.eu/.

[35] Hexa-X, https://hexa-x.eu/.

[36] O. Iacoboaiea, J. Krolikowski, Z. B. Houidi, D. Rossi, From design


to deployment of zero-touch deep reinforcement learning wlans (2022).
doi:10.48550/ARXIV.2207.06172.

[37] M. Bunyakitanon, X. Vasilakos, R. Nejabati, D. Simeonidou,
End-to-end performance-based autonomous vnf placement with
adopted reinforcement learning, IEEE Transactions on Cogni-
tive Communications and Networking 6 (2) (2020) 534–547.
doi:10.1109/TCCN.2020.2988486.

[38] A. Dalgkitsis, P.-V. Mekikis, A. Antonopoulos, G. Kormentzas,


C. Verikoukis, Dynamic resource aware vnf placement with deep
reinforcement learning for 5g networks, in: GLOBECOM 2020
- 2020 IEEE Global Communications Conference, 2020, pp. 1–6.
doi:10.1109/GLOBECOM42002.2020.9322512.

[39] S. Moazzeni, P. Jaisudthi, A. Bravalheri, N. Uniyal, X. Vasi-


lakos, R. Nejabati, D. Simeonidou, A novel autonomous profiling
method for the next-generation nfv orchestrators, IEEE Transac-
tions on Network and Service Management 18 (1) (2021) 642–655.
doi:10.1109/TNSM.2020.3044707.

[40] A. K. Sangaiah, S. Rezaei, A. Javadpour, F. Miri, W. Zhang, D. Wang,


Automatic fault detection and diagnosis in cellular networks and
beyond 5g: Intelligent network management, Algorithms 15 (11).
doi:10.3390/a15110432.

[41] A. Shaghaghi, A. Zakeri, N. Mokari, M. R. Javan, M. Behdadfar, E. A.


Jorswieck, Proactive and aoi-aware failure recovery for stateful nfv-
enabled zero-touch 6g networks: Model-free drl approach, IEEE Trans-
actions on Network and Service Management 19 (1) (2022) 437–451.
doi:10.1109/TNSM.2021.3113054.

[42] G. Casale, A. Gluhak, S. Raza, S. Dhanasekaran, M. Imran, A. Imran,


Autonomic network slicing: A machine learning-based approach, IEEE
Communications Magazine 57 (8) (2019) 22–28.

[43] S. Vittal, A. A. Franklin, Harness: High availability support-


ive self reliant network slicing in 5g networks, IEEE Transactions
on Network and Service Management 19 (3) (2022) 1951–1964.
doi:10.1109/TNSM.2022.3157888.

[44] H. Chergui, A. Ksentini, L. Blanco, C. Verikoukis, Toward zero-


touch management and orchestration of massive deployment of net-
work slices in 6g, IEEE Wireless Communications 29 (1) (2022) 86–93.
doi:10.1109/MWC.009.00366.

[45] H. Baba, S. Hirai, T. Nakamura, S. Kanemaru, K. Takahashi,


T. Omoto, S. Akiyama, S. Hirabaru, End-to-end 5g network slice re-
source management and orchestration architecture, in: 2022 IEEE 8th
International Conference on Network Softwarization (NetSoft), 2022,
pp. 269–271. doi:10.1109/NetSoft54395.2022.9844088.

[46] I. Afolabi, J. Prados-Garzon, M. Bagaa, T. Taleb, P. Ameigeiras, Dy-


namic resource provisioning of a scalable e2e network slicing orchestra-
tion system, IEEE Transactions on Mobile Computing 19 (11) (2020)
2594–2608. doi:10.1109/TMC.2019.2930059.

[47] D. Breitgand, A. Lekidis, R. Behravesh, A. Weit, P. Giardina,


V. Theodorou, C. E. Costa, K. Barabash, Dynamic slice scaling mech-
anisms for 5g multi-domain environments, in: 2021 IEEE 7th Inter-
national Conference on Network Softwarization (NetSoft), 2021, pp.
56–62. doi:10.1109/NetSoft51509.2021.9492716.

[48] S. Bolettieri, D. T. Bui, R. Bruno, Towards end-to-end application


slicing in multi-access edge computing systems: Architecture discussion
and proof-of-concept, Future Generation Computer Systems 136 (2022)
110–127. doi:10.1016/j.future.2022.05.027.

[49] M. Wu, F. R. Yu, P. X. Liu, Intelligence networking for autonomous


driving in beyond 5g networks with multi-access edge computing,
IEEE Transactions on Vehicular Technology 71 (6) (2022) 5853–5866.
doi:10.1109/TVT.2022.3165172.

[50] N. Sousa, M. T. Islam, R. Mustafa, D. Perez, C. Esteve Rothenberg,


P. Gomes, Machine learning-assisted closed-control loops for beyond
5g multi-domain zero-touch networks, Journal of Network and Systems
Management 30. doi:10.1007/s10922-022-09651-x.

[51] Z. Fan, R. Liu, Investigation of machine learning based net-


work traffic classification, in: 2017 International Symposium
on Wireless Communication Systems (ISWCS), 2017, pp. 1–6.
doi:10.1109/ISWCS.2017.8108090.

[52] S. Jaffry, S. F. Hasan, Cellular traffic prediction using recur-
rent neural networks, in: 2020 IEEE 5th International Sympo-
sium on Telecommunication Technologies (ISTT), 2020, pp. 94–98.
doi:10.1109/ISTT50966.2020.9279373.

[53] I. Alawe, A. Ksentini, Y. Hadjadj-Aoul, P. Bertin, Improv-


ing traffic forecasting for 5g core network scalability: A ma-
chine learning approach, IEEE Network 32 (6) (2018) 42–49.
doi:10.1109/MNET.2018.1800104.

[54] R. K. Gupta, A. Ranjan, M. A. Moid, R. Misra, Deep-


learning based mobile-traffic forecasting for resource utilization in
5g network slicing, in: Internet of Things and Connected Tech-
nologies, Springer International Publishing, 2021, pp. 410–424.
doi:10.1007/978-3-030-76736-5_38.

[55] Y. Hu, Z. Li, J. Lan, J. Wu, L. Yao, Ears: Intelligence-driven


experiential network architecture for automatic routing in software-
defined networking, China Communications 17 (2) (2020) 149–162.
doi:10.23919/JCC.2020.02.013.

[56] T.-J. Tan, F.-L. Weng, W.-T. Hu, J.-C. Chen, C.-Y. Hsieh,
A reliable intelligent routing mechanism in 5g core networks, in: Pro-
ceedings of the 26th Annual International Conference on Mobile Com-
puting and Networking, MobiCom ’20, Association for Computing Ma-
chinery, New York, NY, USA, 2020. doi:10.1145/3372224.3418167.
URL https://doi.org/10.1145/3372224.3418167

[57] F. Khan, K.-L. Yau, M. Ling, M. Imran, Y.-W. Chong, An intelligent


cluster-based routing scheme in 5g flying ad hoc networks, Applied
Sciences 12 (2022) 3665. doi:10.3390/app12073665.

[58] I. Rasheed, F. Hu, Y.-K. Hong, B. Balasubramanian, Intelli-


gent vehicle network routing with adaptive 3d beam alignment
for mmwave 5g-based v2x communications, IEEE Transactions
on Intelligent Transportation Systems 22 (5) (2021) 2706–2718.
doi:10.1109/TITS.2020.2973859.

[59] M. S. Omar, S. A. Hassan, H. Pervaiz, Q. Ni, L. Musavian, S. Mum-


taz, O. A. Dobre, Multiobjective optimization in 5g hybrid net-
works, IEEE Internet of Things Journal 5 (3) (2018) 1588–1597.
doi:10.1109/JIOT.2017.2788362.

[60] A. Dalgkitsis, L. A. Garrido, F. Rezazadeh, H. Chergui, K. Ra-


mantas, J. S. Vardakas, C. Verikoukis, Sche2ma: Scalable, energy-
aware, multidomain orchestration for beyond-5g urllc services,
IEEE Transactions on Intelligent Transportation Systems (2022) 1–
11doi:10.1109/TITS.2022.3202312.

[61] F. Rezazadeh, H. Chergui, L. Christofi, C. Verikoukis, Actor-critic-


based learning for zero-touch joint resource and energy control in net-
work slicing (2022). doi:10.48550/ARXIV.2201.08985.

[62] C. Benzaı̈d, T. Taleb, M. Z. Farooqi, Trust in 5g and


beyond networks, IEEE Network 35 (3) (2021) 212–222.
doi:10.1109/MNET.011.2000508.

[63] N. P. Palma, S. N. Matheu-Garcı́a, A. M. Zarca, J. Ortiz, A. Skarmeta,


Enhancing trust and liability assisted mechanisms for zsm 5g architec-
tures, in: 2021 IEEE 4th 5G World Forum (5GWF), 2021, pp. 362–367.
doi:10.1109/5GWF52925.2021.00070.

[64] R. Niboucha, S. B. Saad, A. Ksentini, Y. Challal, Zero-touch se-


curity management for mmtc network slices: Ddos attack detec-
tion and mitigation, IEEE Internet of Things Journal (2022) 1–
1doi:10.1109/JIOT.2022.3230875.

[65] S. Jayasinghe, Y. Siriwardhana, P. Porambage, M. Liyanage, M. Yliant-


tila, Federated learning based anomaly detection as an enabler for
securing network and service management automation in beyond 5g
networks, in: 2022 Joint European Conference on Networks and Com-
munications & 6G Summit (EuCNC/6G Summit), 2022, pp. 345–350.
doi:10.1109/EuCNC/6GSummit54941.2022.9815754.

[66] G. Carrozzo, M. S. Siddiqui, A. Betzler, J. Bonnet, G. M. Perez,


A. Ramos, T. Subramanya, Ai-driven zero-touch operations, security
and trust in multi-operator 5g networks: a conceptual architecture,
in: 2020 European Conference on Networks and Communications (Eu-
CNC), 2020, pp. 254–258. doi:10.1109/EuCNC48522.2020.9200928.

[67] F. Debbabi, R. Jmal, L. Chaari, R. L. Aguiar, An overview of inter-slice
& intra-slice resource allocation in b5g telecommunication networks,
IEEE Transactions on Network and Service Management (2022) 1–
13doi:10.1109/TNSM.2022.3189925.

[68] W. Zhang, D. Yang, H. Peng, W. Wu, W. Quan, H. Zhang, X. Shen,


Deep reinforcement learning based resource management for dnn in-
ference in industrial iot, IEEE Transactions on Vehicular Technology
70 (8) (2021) 7605–7618. doi:10.1109/TVT.2021.3068255.

[69] C. Qi, Y. Hua, R. Li, Z. Zhao, H. Zhang, Deep reinforcement learning


with discrete normalized advantage functions for resource management
in network slicing, IEEE Communications Letters 23 (8) (2019) 1337–
1341. doi:10.1109/LCOMM.2019.2922961.

[70] H. Huang, S. Guo, Proactive failure recovery for nfv in distributed edge
computing, IEEE Communications Magazine 57 (5) (2019) 131–137.
doi:10.1109/MCOM.2019.1701366.

[71] 5GZORRO, https://www.5gzorro.eu/5gzorro/.

[72] 6G BRAINS: Bring reinforcement-learning into radio light network for


massive connections, https://6g-brains.eu/.

[73] S. Wijethilaka, M. Liyanage, Survey on network slicing for internet of


things realization in 5g networks, IEEE Communications Surveys &
Tutorials 23 (2) (2021) 957–994. doi:10.1109/COMST.2021.3067807.

[74] S. Wijethilaka, M. Liyanage, Realizing internet of things with network


slicing: Opportunities and challenges, in: 2021 IEEE 18th Annual Con-
sumer Communications & Networking Conference (CCNC), 2021, pp.
1–6. doi:10.1109/CCNC49032.2021.9369637.

[75] F. Tonini, C. Natalino, M. Furdek, C. Raffaelli, P. Monti, Network


slicing automation: Challenges and benefits, in: 2020 International
Conference on Optical Network Design and Modeling (ONDM), 2020,
pp. 1–6. doi:10.23919/ONDM48393.2020.9133004.

[76] M. McClellan, C. Cervelló-Pastor, S. Sallent,


Deep learning at the mobile edge: Opportunities for 5g networks,
Applied Sciences 10 (14). doi:10.3390/app10144735.
URL https://www.mdpi.com/2076-3417/10/14/4735

[77] Y. Fu, S. Wang, C.-X. Wang, X. Hong, S. McLaughlin, Artificial intelli-


gence to manage network traffic of 5g wireless networks, IEEE Network
32 (6) (2018) 58–64. doi:10.1109/MNET.2018.1800115.

[78] R. Dangi, P. Lalwani, M. K. Mishra, 5g network traffic control: a


temporal analysis and forecasting of cumulative network activity us-
ing machine learning and deep learning technologies, International
Journal of Ad Hoc and Ubiquitous Computing 42 (1) (2023) 59–71.
doi:10.1504/IJAHUC.2023.127766.

[79] Telecom italia open big data milano grid,


https://theodi.fbk.eu/openbigdata/ (2014).

[80] A. A. Khan, M. Zafrullah, M. Hussain, A. Ahmad, Performance analysis of ospf and hybrid networks, in: 2017 International Symposium on Wireless Systems and Networks (ISWSN), 2017, pp. 1–4. doi:10.1109/ISWSN.2017.8250022.

[81] M. Chiesa, G. Kindler, M. Schapira, Traffic engineering with equal-cost-multipath: An algorithmic perspective, IEEE/ACM Transactions on Networking 25 (2) (2017) 779–792. doi:10.1109/TNET.2016.2614247.

[82] I. P. Chochliouros, M.-A. Kourtis, A. S. Spiliopoulou, P. Lazaridis, Z. Zaharis, C. Zarakovitis, A. Kourtis, Energy efficiency concerns and trends in future 5g network infrastructures, Energies 14 (17). doi:10.3390/en14175392.

[83] C. Benzaid, T. Taleb, Zsm security: Threat surface and best practices, IEEE Network 34 (3) (2020) 124–133. doi:10.1109/MNET.001.1900273.

[84] Y. Wu, K. Zhang, Y. Zhang, Digital twin networks: A survey, IEEE Internet of Things Journal 8 (18) (2021) 13789–13804. doi:10.1109/JIOT.2021.3079510.

[85] F. Hutter, L. Kotthoff, J. Vanschoren, Automated machine learning: methods, systems, challenges, Springer Nature, 2019.

[86] M. Mashaly, Connecting the twins: A review on digital twin technology
& its networking requirements, Procedia Computer Science 184 (2021)
299–305, the 12th International Conference on Ambient Systems, Net-
works and Technologies (ANT) / The 4th International Conference
on Emerging Data and Industry 4.0 (EDI40) / Affiliated Workshops.
doi:10.1016/j.procs.2021.03.039.

[87] K. Panetta, Top 10 strategic technology trends for 2017: Digital twins, https://www.gartner.com/smarterwithgartner/gartners-top-10-technology-trends-2017 (October 2016).

[88] C. Ruzsa, Digital twin technology - external data resources in creating the model and classification of different digital twin types in manufacturing, Procedia Manufacturing 54 (2021) 209–215, 10th CIRP Sponsored Conference on Digital Enterprise Technologies (DET 2020) – Digital Technologies as Enablers of Industrial Competitiveness and Sustainability. doi:10.1016/j.promfg.2021.07.032.

[89] H. Ahmadi, A. Nag, Z. Khan, K. Sayrafian, S. Rahardja, Networked twins and twins of networks: An overview on the relationship between digital twins and 6g, IEEE Communications Standards Magazine 5 (4) (2021) 154–160. doi:10.1109/MCOMSTD.0001.2000041.

[90] M. Vohra, Overview of digital twin, Digital Twin Technology: Fundamentals and Applications (2022) 1–18.

[91] C. Cimino, E. Negri, L. Fumagalli, Review of digital twin applications in manufacturing, Comput. Ind. 113 (C). doi:10.1016/j.compind.2019.103130.
URL https://doi.org/10.1016/j.compind.2019.103130

[92] W. Kritzinger, M. Karner, G. Traar, J. Henjes, W. Sihn, Digital twin in manufacturing: A categorical literature review and classification, IFAC-PapersOnLine 51 (2018) 1016–1022.

[93] A. Rasheed, O. San, T. Kvamsdal, Digital twin: Values, challenges and enablers (2019). doi:10.48550/ARXIV.1910.01719.
URL https://arxiv.org/abs/1910.01719

[94] L. Hui, M. Wang, L. Zhang, L. Lu, Y. Cui, Digital twin for network-
ing: A data-driven performance modeling perspective, IEEE Network
(2022) 1–8. doi:10.1109/MNET.119.2200080.
[95] M. Perno, L. Hvam, A. Haug, Implementation of digital twins
in the process industry: A systematic literature review of en-
ablers and barriers, Computers in Industry 134 (2022) 103558.
doi:10.1016/j.compind.2021.103558.
[96] S. K. Karmaker (“Santu”), M. M. Hassan, M. J. Smith, L. Xu, C. Zhai, K. Veeramachaneni, Automl to date and beyond: Challenges and opportunities, ACM Comput. Surv. 54 (8). doi:10.1145/3470918.
URL https://doi.org/10.1145/3470918

[97] X. He, K. Zhao, X. Chu, Automl: A survey of the state-of-the-art, Knowledge-Based Systems 212 (2021) 106622. doi:10.1016/j.knosys.2020.106622.
[98] K. Chauhan, S. Jani, D. Thakkar, R. Dave, J. Bhatia, S. Tanwar,
M. S. Obaidat, Automated machine learning: The new wave of ma-
chine learning, in: 2020 2nd International Conference on Innovative
Mechanisms for Industry Applications (ICIMIA), 2020, pp. 205–212.
doi:10.1109/ICIMIA48430.2020.9074859.
[99] A. Jadhav, D. Pramod, K. Ramanathan, Comparison of performance of data imputation methods for numeric dataset, Applied Artificial Intelligence 33 (10) (2019) 913–933. doi:10.1080/08839514.2019.1637138.
[100] S. Jäger, A. Allhorn, F. Bießmann, A benchmark for data imputation
methods, Frontiers in Big Data 4. doi:10.3389/fdata.2021.693674.
[101] F. Biessmann, T. Rukat, P. Schmidt, P. Naidu, S. Schelter, A. Tap-
tunov, D. Lange, D. Salinas, Datawig: Missing value imputation for
tables, Journal of Machine Learning Research 20 (175) (2019) 1–6.
[102] L. Yang, A. Moubayed, A. Shami, Mth-ids: A multitiered hybrid intru-
sion detection system for internet of vehicles, IEEE Internet of Things
Journal 9 (1) (2022) 616–632. doi:10.1109/JIOT.2021.3084796.

[103] S. G. K. Patro, K. K. Sahu, Normalization: A preprocessing stage
(2015). doi:10.48550/ARXIV.1503.06462.
[104] L. Yang, A. Moubayed, A. Shami, P. Heidari, A. Boukhtouta,
A. Larabi, R. Brunner, S. Preda, D. Migault, Multi-perspective content
delivery networks security framework using optimized unsupervised
anomaly detection, IEEE Transactions on Network and Service Man-
agement 19 (1) (2022) 686–705. doi:10.1109/TNSM.2021.3100308.
[105] A. Alsharef, K. Aggarwal, M. Kumar, A. Mishra, Review of ml and au-
toml solutions to forecast time-series data, Archives of Computational
Methods in Engineering 29 (7) (2022) 5297–5311.
[106] L. Yang, A. Shami, On hyperparameter optimization of machine learn-
ing algorithms: Theory and practice, Neurocomputing 415 (2020) 295–
316. doi:10.1016/j.neucom.2020.07.061.
[107] J. Snoek, H. Larochelle, R. P. Adams, Practical bayesian optimiza-
tion of machine learning algorithms, Advances in neural information
processing systems 25.
[108] Y. Bengio, Gradient-Based Optimization of Hyperpa-
rameters, Neural Computation 12 (8) (2000) 1889–1900.
doi:10.1162/089976600300015187.
[109] L. Li, K. Jamieson, G. DeSalvo, A. Rostamizadeh, A. Talwalkar, Hy-
perband: A novel bandit-based approach to hyperparameter optimiza-
tion, The Journal of Machine Learning Research 18 (1) (2017) 6765–
6816.
[110] Y. Li, Z. Wang, Y. Xie, B. Ding, K. Zeng, C. Zhang,
Automl: From methodology to application, in: Proceedings of the
30th ACM International Conference on Information & Knowledge Man-
agement, CIKM ’21, Association for Computing Machinery, New York,
NY, USA, 2021, p. 4853–4856. doi:10.1145/3459637.3483279.
URL https://doi.org/10.1145/3459637.3483279
[111] D. M. Manias, I. Shaer, L. Yang, A. Shami, Concept drift
detection in federated networked systems, in: 2021 IEEE
Global Communications Conference (GLOBECOM), 2021, pp. 1–6.
doi:10.1109/GLOBECOM46510.2021.9685083.

[112] J. Lu, A. Liu, F. Dong, F. Gu, J. Gama, G. Zhang, Learning under con-
cept drift: A review, IEEE Transactions on Knowledge and Data Engi-
neering 31 (12) (2019) 2346–2363. doi:10.1109/TKDE.2018.2876857.

[113] Z. Wang, W. Wang, Concept drift detection based on kolmogorov–smirnov test, in: Artificial Intelligence in China: Proceedings of the International Conference on Artificial Intelligence in China, Springer, 2020, pp. 273–280.

[114] F. Bayram, B. S. Ahmed, A. Kassler, From concept drift to model degradation: An overview on performance-aware drift detectors, Knowledge-Based Systems 245 (2022) 108632. doi:10.1016/j.knosys.2022.108632.

[115] J. Gama, I. Žliobaitė, A. Bifet, M. Pechenizkiy, A. Bouchachia, A survey on concept drift adaptation, ACM Comput. Surv. 46 (4). doi:10.1145/2523813.

[116] L. Yang, A. Shami, A lightweight concept drift detection and adaptation framework for iot data streams, IEEE Internet of Things Magazine 4 (2) (2021) 96–101. doi:10.1109/IOTM.0001.2100012.

[117] L. Yang, A. Shami, A multi-stage automated online network data stream analytics framework for iiot systems, IEEE Transactions on Industrial Informatics 19 (2) (2023) 2107–2116. doi:10.1109/TII.2022.3212003.

[118] D. Escudero García, N. DeCastro-García, A. L. Muñoz Castañeda, An effectiveness analysis of transfer learning for the concept drift problem in malware detection, Expert Systems with Applications 212 (2023) 118724. doi:10.1016/j.eswa.2022.118724.

[119] L. Yang, D. M. Manias, A. Shami, Pwpae: An ensemble framework for concept drift adaptation in iot data streams, in: 2021 IEEE Global Communications Conference (GLOBECOM), 2021, pp. 01–06. doi:10.1109/GLOBECOM46510.2021.9685338.

[120] A. Ahad, M. Tahir, M. Aman Sheikh, K. I. Ahmed, A. Mughees, A. Numani, Technologies trend towards 5g network for smart health-care using iot: A review, Sensors 20 (14). doi:10.3390/s20144047.

[121] S. Guo, B. Lu, M. Wen, S. Dang, N. Saeed, Customized 5g and beyond
private networks with integrated urllc, embb, mmtc, and positioning for
industrial verticals, IEEE Communications Standards Magazine 6 (1)
(2022) 52–57. doi:10.1109/MCOMSTD.0001.2100041.

[122] D. Raca, D. Leahy, C. J. Sreenan, J. Quinlan, Beyond Throughput, The Next Generation: A 5G Dataset with Channel and Context Metrics, in: Proceedings of the 11th ACM Multimedia Systems Conference, MMSys ’20, Association for Computing Machinery, New York, NY, USA, 2020, p. 303–308. doi:10.1145/3339825.3394938.

[123] Gyokov Solutions, G-NetTrack User Manual (2021).
URL https://gyokovsolutions.com/manual-g-nettrack/

[124] Z. Ma, H. Yu, J. Xia, C. Wang, L. Yan, X. Zhou, Network traffic prediction based on seq2seq model, in: 2021 16th International Conference on Computer Science & Education (ICCSE), 2021, pp. 710–713. doi:10.1109/ICCSE51940.2021.9569477.

[125] D. M. Manias, A. Chouman, A. Shami, An nwdaf approach to 5g core network signaling traffic: Analysis and characterization, in: GLOBECOM 2022 - 2022 IEEE Global Communications Conference, 2022, pp. 6001–6006. doi:10.1109/GLOBECOM48099.2022.10000989.

[126] On artificial intelligence - a european approach to excellence and trust, White Paper (February 2020).

[127] C. Li, W. Guo, S. C. Sun, S. Al-Rubaye, A. Tsourdos, Trustworthy deep learning in 6g-enabled mass autonomy: From concept to quality-of-trust key performance indicators, IEEE Vehicular Technology Magazine 15 (4) (2020) 112–121. doi:10.1109/MVT.2020.3017181.

[128] J. Zhou, A. H. Gandomi, F. Chen, A. Holzinger, Evaluating the quality of machine learning explanations: A survey on methods and metrics, Electronics 10 (5). doi:10.3390/electronics10050593.

[129] M. Xevgenis, D. G. Kogias, P. A. Karkazis, H. C. Leligou, Addressing zsm security issues with blockchain technology, Future Internet 15 (4). doi:10.3390/fi15040129.
URL https://www.mdpi.com/1999-5903/15/4/129

[130] Y. Siriwardhana, P. Porambage, M. Liyanage, M. Ylianttila,
Ai and 6g security: Opportunities and challenges, in: 2021
Joint European Conference on Networks and Communica-
tions & 6G Summit (EuCNC/6G Summit), 2021, pp. 616–621.
doi:10.1109/EuCNC/6GSummit51104.2021.9482503.

[131] N. P. Palma, S. N. Matheu-García, A. M. Zarca, J. Ortiz, A. Skarmeta, Enhancing trust and liability assisted mechanisms for zsm 5g architectures, in: 2021 IEEE 4th 5G World Forum (5GWF), 2021, pp. 362–367. doi:10.1109/5GWF52925.2021.00070.

[132] S. R. Islam, W. Eberle, S. K. Ghafoor, M. Ahmed, Explainable artificial intelligence approaches: A survey, CoRR abs/2101.09429. arXiv:2101.09429.
URL https://arxiv.org/abs/2101.09429

[133] D. V. Carvalho, E. M. Pereira, J. S. Cardoso, Machine learning interpretability: A survey on methods and metrics, Electronics 8 (8). doi:10.3390/electronics8080832.

[134] D. Stamoulis, R. Ding, D. Wang, D. Lymberopoulos, B. Priyantha, J. Liu, D. Marculescu, Single-path mobile automl: Efficient convnet design and nas hyperparameter optimization, IEEE Journal of Selected Topics in Signal Processing 14 (4) (2020) 609–622. doi:10.1109/JSTSP.2020.2971421.

[135] F. Tramèr, D. Boneh, Adversarial Training and Robustness for Multiple Perturbations, Curran Associates Inc., 2019.

[136] N. Papernot, P. McDaniel, X. Wu, S. Jha, A. Swami, Distillation as a defense to adversarial perturbations against deep neural networks, in: 2016 IEEE Symposium on Security and Privacy (SP), 2016, pp. 582–597. doi:10.1109/SP.2016.41.

[137] M. Bahri, F. Salutari, A. Putina, M. Sozio, Automl: state of the art with a focus on anomaly detection, challenges, and research directions, International Journal of Data Science and Analytics 14. doi:10.1007/s41060-022-00309-0.

[138] M. Feurer, K. Eggensperger, S. Falkner, M. Lindauer, F. Hut-
ter, Auto-sklearn 2.0: Hands-free automl via meta-learning (2022).
arXiv:2007.04074.
