Zero Touch Networks
Network Automation
Mirna El Rajab, Li Yang, Abdallah Shami
Abstract

The Zero-touch network and Service Management (ZSM) framework represents an emerging paradigm in the management of fifth-generation (5G) and Beyond (5G+) networks, offering automated self-management and self-healing capabilities to address the escalating complexity and growing data volume of modern networks. ZSM frameworks leverage advanced technologies such as Machine Learning (ML) to enable intelligent decision-making and reduce human intervention. This paper presents a comprehensive survey of Zero-Touch Networks (ZTNs) within the ZSM framework, covering the network optimization, traffic monitoring, energy efficiency, and security aspects of next-generation networks. The paper explores the challenges associated with ZSM, particularly those related to ML, which motivate the exploration of diverse network automation solutions. In this context, the study investigates the application of Automated ML (AutoML) in ZTNs to reduce network management costs and enhance performance. AutoML automates the selection and tuning of an ML model for a given task. Specifically, the focus is on AutoML's ability to predict application throughput and autonomously adapt to data drift. Experimental results demonstrate the superiority of the proposed AutoML pipeline over traditional ML in terms of prediction accuracy. Integrating AutoML and ZSM concepts significantly reduces network configuration and management efforts, allowing operators to allocate more time and resources to other important tasks. The paper also provides a high-level 5G system architecture incorporating AutoML and ZSM concepts.
1. Introduction
In today’s digital world, we rely on telecommunication networks for more
than just phone calls. From streaming movies to controlling smart home
devices, these networks have revolutionized the way we live, work, and com-
municate. As we move towards a world where the Internet of Things (IoT) is
becoming increasingly widespread, the need for Next-Generation Networks
(NGNs) has only grown more indispensable.
NGNs, such as the fifth-generation (5G) and the upcoming sixth-generation
(6G) networks, mark a landmark in telecommunications history; these net-
works represent not only an upgrade from their predecessors but also a
paradigm shift in terms of speed, latency, capacity, and reliability - un-
locking new possibilities for emerging applications and service areas. Ac-
cording to the International Telecommunication Union IMT-2020, three core
service areas for 5G networks include enhanced Mobile Broadband (eMBB),
ultra-Reliable Low-Latency Communication (uRLLC), and massive Machine-
Type Communication (mMTC) [1]. Each service area addresses specific use
cases such as multimedia content access (eMBB), mission-critical applica-
tions (uRLLC), or smart cities (mMTC).
NGNs have the potential to unlock the full potential and meet the chal-
lenging requirements of future use cases, but to fully realize this potential,
they must be designed as highly-flexible and programmable infrastructures
that are context-aware and service-aware. Advancements such as Software
Defined Networking (SDN), Network Function Virtualization (NFV), Multi-
access Edge Computing (MEC), and network slicing play a pivotal role in
the network architecture [2]. These technologies will open up new business
models, such as multi-domain, multi-service, and multi-tenancy models, to
support new markets.
The growth of NGNs has brought with it new challenges, particularly
in terms of network management. As networks become more intricate, tra-
ditional manual methods for configuring, deploying, and maintaining them
become cumbersome, time-consuming, and error prone [3]. To tackle this
issue, various efforts have been made to introduce intelligence and reason-
ing into mobile networks for automation and optimization purposes. These
efforts include active networks [4], self-organizing networks [5], autonomic
network management [6], and Zero-Touch Networks (ZTNs) [7].
The ZTN approach has emerged as a fully automated management so-
lution, enabling the network to analyze its current state, interpret it, and
provide suggestions for possible reconfigurations - while leaving validation
and acceptance up to a human operator [8]. Implementing ZTN concepts
and technologies will be essential for operators to achieve greater levels of
automation, improve network performance, and reduce time-to-market for
new features. ZTN-based solutions are available for a diverse set of prob-
lems, from managing resources to ensuring network security and privacy.
The European Telecommunications Standards Institute (ETSI) has shown growing interest in shifting towards ZTN-based solutions. In 2017, ETSI created a
Zero-touch network and Service Management (ZSM) Industry Specification
Group (ISG), to define the requirements and architecture for a network au-
tomation framework based on ZTN concepts [9]. ZSM will ensure that NGNs
remain responsive to evolving user needs and demands via proactive network
management techniques. These techniques leverage the power of Artificial
Intelligence (AI) and Machine Learning (ML) to automate and optimize net-
work operations, enabling efficient resource allocation, dynamic service pro-
visioning, and predictive maintenance. AI and ML are technologies that
enable systems to simulate human intelligence, learn from data, and make
intelligent decisions or predictions.
ZSM still faces significant ML challenges, such as the need for effec-
tive feature engineering, algorithm selection, and hyperparameter tuning.
Thus, there is a need to explore other network automation solutions that can
complement ZSM in order to achieve higher levels of automation and effi-
ciency. One such solution is Automated ML (AutoML), which helps address
these challenges by automating the ML pipeline and improving the efficiency
and effectiveness of the ZSM solution. AutoML handles crucial tasks such
as data preprocessing, feature engineering, model selection, hyperparameter
tuning, model evaluation, and even model updating. By automating these
processes, AutoML significantly reduces the manual effort needed to develop
high-performing ML models.
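To make this concrete, the following minimal sketch (purely illustrative, not the AutoML pipeline evaluated later in this paper) shows how a joint search over preprocessing, candidate models, and hyperparameters can be expressed with scikit-learn's GridSearchCV; the dataset and parameter grid are placeholders standing in for network measurements.

from sklearn.datasets import make_regression
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.model_selection import GridSearchCV
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import Ridge

# Placeholder data standing in for network measurements (e.g., throughput samples).
X, y = make_regression(n_samples=500, n_features=10, noise=0.1, random_state=0)

pipeline = Pipeline([
    ("scaler", StandardScaler()),   # data preprocessing
    ("model", Ridge()),             # placeholder; swapped out by the grid below
])

# Joint search over model selection and hyperparameter tuning.
param_grid = [
    {"model": [Ridge()], "model__alpha": [0.1, 1.0, 10.0]},
    {"model": [RandomForestRegressor(random_state=0)],
     "model__n_estimators": [50, 100],
     "model__max_depth": [None, 10]},
]

search = GridSearchCV(pipeline, param_grid, cv=5, scoring="neg_mean_absolute_error")
search.fit(X, y)
print(search.best_params_, search.best_score_)

A full AutoML system would additionally automate feature engineering and model updating, but the same idea of searching over configurations underlies most tools.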
Accordingly, this survey aims to provide a comprehensive overview of
ZSM in 5G and Beyond (5G+) networks, with a focus on network optimiza-
tion, energy efficiency, network security, and traffic control. By highlighting
the ML challenges in ZSM and exploring the potential of AutoML, this survey
aims to contribute to the development of more effective network automation
solutions. In particular, this survey offers the following notable contributions:
1. Comprehensive survey of zero-touch applications in NGNs: It is the
first paper to comprehensively explore Zero-Touch Network Operation
(ZNO) applications, spanning network optimization, traffic control, en-
ergy efficiency, and security in 5G+ networks.
3. Practical case study of online AutoML: This paper presents the first
case study applying an online AutoML pipeline to a real network traffic
task within the ZSM context. Additionally, it outlines a high-level
architecture integrating AutoML and ZSM concepts into a 5G system.
Figure 1: Survey Outline
2. List of Acronyms
Artificial Intelligence Acronyms
AI Artificial Intelligence
AutoML Automated Machine Learning
ANN Artificial Neural Network
CNN Convolutional Neural Network
DDPG Deep Deterministic Policy Gradient
DL Deep Learning
DRL Deep Reinforcement Learning
DQN Deep Q-Network
GRU Gated Recurrent Unit
LSTM Long Short-Term Memory
MAE Mean Absolute Error
MAPE Mean Absolute Percentage Error
ML Machine Learning
MLP Multi-Layer Perceptron
NAS Neural Architecture Search
PCA Principal Component Analysis
Seq2Seq Sequence-to-Sequence
SL Supervised Learning
SVM Support Vector Machine
RL Reinforcement Learning
RNN Recurrent Neural Network
XAI Explainable Artificial Intelligence
Network Management Acronyms
AREL3P Adapted REinforcement Learning VNF Performance
Prediction module for Autonomous VNF Placement
DT Digital Twin
MD Management Domain
TM TeleManagement
Performance Metrics Acronyms
CAPEX Capital Expenditure
E2E End-to-End
B5G Beyond 5G
CN Core Network
gNB gNodeB
NSSMF Network Slice Subnet Management Function
NWDAF Network Data Analytics Function
RAN Radio Access Network
SDN Software Defined Network
UE User Equipment
UPF User Plane Function
uRLLC ultra-Reliable Low-Latency Communication
VNF Virtual Network Function
mmWave millimeter wave
General Telecommunication Acronyms
API Application Programming Interface
CSP Communication Service Provider
IoT Internet of Things
RSRP Reference Signal Received Power
RSRQ Reference Signal Received Quality
RSSI Received Signal Strength Indicator
FANET Flying Ad Hoc Network
V2X Vehicle-to-Everything
VPN Virtual Private Network
VM Virtual Machine
VR Virtual Reality
USR User Service Request
WLAN Wireless Local Area Network
3. Background
As the telecommunications industry moves towards the deployment of
5G+ networks and the implementation of ZSM, advanced technologies such
as AI/ML, SDN, and NFV are becoming essential components of network
infrastructure. Such advancements support intelligent, flexible, and automated network management, which in turn allows network operators and service providers to run efficient and scalable operations. AI/ML optimizes network performance, predicts and prevents network failures, and automates network management tasks. SDN simplifies network management by separating the control and data planes and enabling the dynamic allocation of network resources. NFV enables network functions to be deployed as software on general-purpose hardware, eliminating the need for proprietary hardware by decoupling network functions from the underlying equipment.
3.1. Artificial Intelligence and Machine Learning Paradigms
AI refers to "the simulation of human intelligence in machines that are programmed to think like humans and mimic their actions" [12]. In other
words, AI refers to the ability of systems to imitate human cognitive func-
tions such as learning. ML is an application of AI that enables machines to
learn from large volumes of data and make predictions without directly being
instructed. ML is considered a subset of AI. ML can be further divided into
three main categories: supervised, unsupervised, and reinforcement learning
[13]. Table 1 provides an overview of common techniques used in each ML
category, outlining their strengths and weaknesses to provide a comprehen-
sive understanding of each approach [14, 15, 16, 17, 18, 19].
Table 1: An Overview of Traditional ML Algorithms [14, 15, 16, 17, 18, 19]

Naïve Bayes: A probabilistic algorithm that uses Bayes' theorem to predict the class of new data based on the conditional probability of features given a class. Advantages: addresses multi-class classification problems; insensitive to irrelevant features. Limitations: assumes independent features; handles discrete datasets better than continuous ones.

Support Vector Machine: A linear model for classification and regression that finds the best hyperplane to separate data points into different classes in a high-dimensional feature space. Advantages: works with high-dimensional data; handles non-linear relationships through kernel functions. Limitations: slow training with large datasets; poor performance with noisy data.

Recurrent Neural Network: An ANN that handles sequential data by using loops to maintain a hidden state incorporating past information. Advantages: well-suited for sequence-related tasks and time-series data; can handle variable-length inputs and outputs. Limitations: incapable of capturing long-term dependencies; sensitive to the exploding and vanishing gradient problems.
DL also has applications in unsupervised learning, such as leveraging autoen-
coders to learn the underlying representation of data.
Reinforcement Learning (RL) is a feedback-based, environment-driven
approach in which an agent learns to behave in an environment through trial
and error. The ultimate objective is to improve performance by maximizing
a reward signal [13]. Common techniques used in RL include Q-learning and
policy gradient methods. The latter optimizes policy parameters directly
using gradient-based optimization, utilizing the policy gradient theorem to
compute gradients of the expected cumulative reward. While useful for tasks
involving continuous action spaces like robotics control, these methods may
have slow convergence rates and suffer from high variance. RL has evolved
towards Deep Reinforcement Learning (DRL), where deep neural networks
are utilized to model the value function (value-based), the agent’s policy
(policy-based), or both (actor-critic). DRL is most beneficial in problems
with high-dimensional state space.
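As a brief illustration of the value-based idea mentioned above, the following minimal sketch (generic, not tied to any scheme surveyed here) implements the tabular Q-learning update Q(s,a) <- Q(s,a) + alpha * (r + gamma * max_a' Q(s',a') - Q(s,a)) with an epsilon-greedy policy; the environment, state space, and reward are placeholders.

import random
import numpy as np

n_states, n_actions = 10, 4          # placeholder sizes
Q = np.zeros((n_states, n_actions))  # tabular value estimates
alpha, gamma, epsilon = 0.1, 0.95, 0.1

def step(state, action):
    """Placeholder environment: returns (next_state, reward)."""
    next_state = (state + action) % n_states
    reward = 1.0 if next_state == 0 else 0.0
    return next_state, reward

state = 0
for _ in range(1000):
    # epsilon-greedy action selection
    if random.random() < epsilon:
        action = random.randrange(n_actions)
    else:
        action = int(np.argmax(Q[state]))
    next_state, reward = step(state, action)
    # Q-learning update: move Q(s,a) toward r + gamma * max_a' Q(s',a')
    Q[state, action] += alpha * (reward + gamma * np.max(Q[next_state]) - Q[state, action])
    state = next_state

DRL replaces the table Q with a neural network, which is what makes high-dimensional state spaces tractable.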
AI/ML technologies are seen as foundational pillars for network automa-
tion in terms of development, configuration, and management [12]. They
will play a crucial role in achieving a new level of automation and intel-
ligence toward network and service management. Additionally, they will
enhance network performance, reliability, and adaptability through a series
of real-time and robust decisions based on predictions of network and user
behavior, such as user traffic.
of network devices is known as the SDN datapath.
As we move towards the future, we are witnessing the emergence of Be-
yond 5G (B5G), also known as 6G. This revolutionary technology is set to
take mobile communications to unprecedented heights by building upon the
capabilities of 5G [24]. B5G promises to offer even faster speeds, lower la-
tency, and greater capacity than 5G [25]. Specifically, B5G is envisioned
to support data transfer rates of up to 1 Tbps, a monumental leap forward
from 5G’s maximum data transfer rate of 20 Gbps. It also aims to drastically
reduce latency to a sub-millisecond range of 10 µs to 100 µs, which is a minus-
cule fraction of 5G’s latency of less than 1 ms. Additionally, B5G is expected
to support a staggering number of connected devices, with an expected capacity of up to 10 million/km², a tenfold increase from 5G's capacity of up to 1 million/km².
B5G intends to support a plethora of new and emerging use cases, such
as terahertz communication providing ultra-high data rates and low latency
[26, 27]. In addition to these performance enhancements, B5G will bring new
capabilities such as advanced network slicing, edge computing, and network
intelligence. These capabilities will enable more efficient and flexible network
operations, as well as new business models and revenue streams. B5G will
also focus on energy efficiency, sustainability, and security, ensuring that it
is not only technologically advanced but also environmentally conscious and
secure [26]. This represents a leap forward, not just in terms of technology
but also in terms of societal impact.
a need to realize the vision of zero-touch networks and service management to
enable automated orchestration and management of network resources and to
ensure End-to-End (E2E) Quality of Experience (QoE) guarantees for end-
users. The goal is to have services governed by an autonomous network driven by high-level policies and rules (also known as intents), capable of offering Self-X life-cycle operations (self-serving, self-fulfilling, and self-assuring) with minimal, if any, human intervention [28]. To achieve this, ETSI established
the ZSM ISG in 2017. The group’s objective is to create a framework to en-
able fully-autonomous network operation and service management for 5G+
networks capable of self-{configuration, monitoring, healing, optimization}
[7].
cope with the degradation of other services and/or the infrastructure. These
services can also be combined to create new management services, which
is referred to as service composability. In terms of management functions,
stateless functions that separate processing from data storage are also sup-
ported.
Separation of concerns in management is another key principle behind
ZSM. This principle differentiates two management concerns: Management
Domain (MD) and E2E cross-domain service management (i.e., across MDs).
Within the former, services are managed based on their respective resources.
In the latter, E2E services that span multiple MDs are managed, and coordi-
nation between MDs is orchestrated. This principle ensures non-monolithic
systems and reduces the complexity of the E2E service. To automate service
assurance, closed-loop management automation is used to achieve and main-
tain a set of objectives without any external disruption. The architecture is
also coupled with intent-based interfaces that express consumer requests in
an interpretable form and offer high-level abstraction. Overall, the architec-
ture is of minimal complexity and meets all the functional and non-functional
requirements that are discussed next [9, 10].
(i) Non-functional requirements:
• General Requirements:
(a) Realize a certain degree of availability.
(b) Become energy efficient.
Table 2: ETSI ZSM Framework Requirements
Non-Functional / Cross-Domain Data Services: high data availability, QoS support, task completion.
(c) Achieve independence from vendors, operators, and service
providers.
(d) Follow specific monitoring requirements.
• Requirements for Cross-Domain Data Services:
(a) Realize high data availability.
(b) Support Quality of Service (QoS) specifications for data ser-
vices within and outside the framework.
(c) Complete tasks within a preset timeframe.
• Requirements for Cross-Domain Service Integration:
(a) Support the on-demand addition and removal of services.
(b) Support the co-existence of different service versions simulta-
neously.
(c) Avoid any changes to the management functions when inte-
grating services.
(d) Allow integration of new and legacy functions.
(ii) Functional requirements refer to the features and capabilities that the
ETSI ZSM framework must have in order to perform its intended func-
tions.
• General Requirements:
(a) Manage resources and services provided by the MDs.
(b) Support cross-domain management of E2E services.
(c) Support closed-loop management.
(d) Support technology domains needed for an E2E service.
(e) Support access control to services within the MD.
(f) Support open interfaces.
(g) Support hiding the management complexity of MDs and E2E
services.
(h) Automate constrained decision-making processes.
(i) Promote automation of operational life-cycle management func-
tions.
• Requirements for Data Collection:
(a) Allow the collection and storage of real-time data.
(b) Enable the preprocessing and filtering of collected data.
(c) Support attaching metadata to collected data.
(d) Allow common access to the collected data across the MDs.
(e) Support the aggregation of collected data cross-domain.
(f) Enforce data governance by supporting various degrees of data
sharing/collection velocity and volume.
(g) Manage the data distribution to maintain consistency.
(h) Provide data to the consumer based on their requirements.
• Requirements for Cross-Domain Data Services:
(a) Allow separation of data storage and processing.
(b) Logically centralize the storage/processing of data.
(c) Enable data sharing within the framework.
(d) Automate management of redundant data.
(e) Automate overload handling of data services.
(f) Automate data service failover.
(g) Automate data recovery.
(h) Automate policy-based data processing.
(i) Automate the processing of data services with distinct data
types.
• Requirements for Cross-Domain Service Integration and Access:
(a) Enable the discovery and registration of management services.
(b) Provide information on accessing the discovered service.
(c) Invoke services indirectly or directly (by the consumer).
(d) Support both synchronous and asynchronous communication
between the consumer and the service provider.
• Requirements for Lawful Intercept: ZSM architecture must en-
sure that lawful interception is not interrupted regardless of any
management service performed by the framework.
(iii) Security requirements refer to the measures that must be taken to en-
sure the security and privacy of the network and its data.
(b) Ensure the security of resources in addition to management ser-
vices and functions.
(c) Provide special attention to the privacy of personal data by utiliz-
ing mechanisms such as privacy-by-design or privacy-by-default.
(d) Ensure the availability of data, resources, functions, and services.
(e) Apply relevant security policies based on the compliance status of
services regarding security requirements.
(f) Allow authorized access to services by authenticated users.
(g) Automatically detect, identify, prevent, and mitigate attacks.
(h) Supervise decisions of ML/AI regarding privacy and security to
prevent attacks from spreading.
Figure 2: ETSI ZSM Framework Reference Architecture [9]
loop automation.
4.3. Intents
Intents are a crucial component of network automation in zero-touch net-
works. They provide a high-level, abstract representation of the desired state
of the network, making it easier for network administrators to manage and
configure large and complex networks [29].
The main goal of intents is to make network configuration and manage-
ment more efficient, accurate, and scalable. They achieve this by allow-
ing network administrators to describe the network’s desired state using a
domain-specific language that is then translated into the underlying network
configurations [30]. This falls under the intent-driven management paradigm.
This paradigm eliminates the need for manual intervention, reducing the risk
of human error and freeing up time for more critical tasks.
The benefits behind intents in zero-touch networks are significant [31, 32].
Some of the key advantages include:
4.3.1. Example Use Case: Intent-Based Approach for Configuring a 5G Network to Support a Virtual Reality Service
For instance, suppose a network operator is tasked with configuring a
5G network to support a new Virtual Reality (VR) service. The VR service
requires low latency and high bandwidth to provide an immersive experi-
ence for users. The network should also be able to dynamically allocate
network resources to meet the changing demands of the VR service. In a
traditional approach, the network operator would have to manually config-
ure the necessary network settings and policies to meet the VR service’s
requirements. However, in a zero-touch network, the network operator can
use an intent-based approach to simplify the process, as shown in Listing 1.
The intent in this example describes the desired state of the network as
follows:
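The original listing is not reproduced here; as a purely illustrative stand-in, the intent could be expressed as a declarative specification along the following lines, written here as a Python dictionary. All field names and values are assumptions, not the paper's actual intent model.

# Hypothetical intent object for the VR service (illustrative field names only).
vr_service_intent = {
    "service": "virtual-reality",
    "objectives": {
        "max_latency_ms": 10,          # low latency for an immersive experience
        "min_bandwidth_mbps": 100,     # high bandwidth per user session
    },
    "scaling": {
        "mode": "dynamic",             # resources follow changing VR demand
        "metric": "active_sessions",
    },
    "scope": {"slice_type": "eMBB", "coverage": "urban-area-1"},
}

The intent expresses only the desired outcome; it deliberately says nothing about which routers, slices, or policies realize it.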
Once the intent is specified, it can be translated into the necessary network
configurations and policies. These can then be automatically implemented
and enforced by the zero-touch network management system, resulting in
a network optimized to support the VR service with the aforementioned
requirements.
4.3.2. Use of Intents in the ZSM Framework
Intent should serve as the sole method of communicating requirements
between the zero-touch system and human operators, as well as between
the different subsystems and layers of the management system. In the ZSM
framework, this means that the service specification provided by ZSM frame-
work consumers must be conveyed through an intent object. The E2E service
domain is responsible for translating it into sub-intents that specify specific
requirements for each MD. Communication based on intent objects is a uni-
versal mechanism that can be applied to any MD within the ZSM framework.
With intents, domain-specific semantics can be encapsulated in shared infor-
mation models, and endpoints based on intents can leverage a generic knowl-
edge management service for the life-cycle management of intent objects. In
line with this, Gomes et al. introduced a cutting-edge framework for the
management of autonomous networks within the ZSM framework [30]. This
framework leverages the concept of intent-based models, which are translated
into a set of rules and constraints that drive the configuration and operation
of the network, resulting in a closed-loop control mechanism. One of the key
features of the framework is its ability to continuously monitor the network’s
state and adjust its configuration and operation accordingly, ensuring that
it remains aligned with the specified intent. The framework employs feed-
back mechanisms to collect data from the network and update its configu-
ration and operation in real-time, allowing for rapid adaptation to changing
network conditions. In addition to its closed-loop control capabilities, the
framework also provides abstraction and simplification, presenting the net-
work operator with a simplified view of the underlying infrastructure. This
abstraction reduces the complexity of network management and enhances the
efficiency of decision-making. Results show that the framework can signifi-
cantly improve network performance and stability, compared to traditional
manual approaches to network management. Additionally, results show that
the framework can significantly reduce operational complexity, streamlining
network management and enabling the deployment of more advanced au-
tonomous networks. Future work includes investigating the interoperability
of the proposed framework based on the intent meta-models.
Another Closed Loop-based zero-touch network mAnagement fRAme-
work, CLARA, was developed by Sousa et al. [33]. CLARA’s two main
components are the closed-loop data plane and closed-loop control plane.
The closed-loop data plane is the component of the CLARA framework that
realizes and implements the intents defined in the intent definition language.
The data plane is designed to be programmable and flexible, allowing it to
adapt to changing network conditions and accommodate new network ser-
vices. The closed-loop control plane, on the other hand, is the component
of the CLARA framework that monitors and enforces the intents defined in
the intent definition language. It continuously receives feedback from the
network and adjusts its behavior to maintain the desired state of the net-
work. The control plane is responsible for detecting deviations from the
desired state and triggering corrective actions to restore the network to its
desired state. Similar to the results of the previous framework [30], Sousa et
al. demonstrate CLARA’s superiority over traditional network management
approaches in terms of automation, reliability, and scalability. Upcoming
initiatives include integrating ML algorithms into the framework to improve
its accuracy and efficiency. This could include algorithms for network opti-
mization, fault detection and diagnosis, and proactive network management.
4.4.2. TM Forum Zero-touch Orchestration, Operations and Management
(ZOOM)
TM Forum’s ZOOM project aims to define a new management archi-
tecture of virtual networks and services through automated configuration,
provisioning, and assurance. The guiding principles of ZOOM include near
real-time request execution with no human intervention, open standard Ap-
plication Programming Interfaces (APIs), closed-loop control, and E2E man-
agement [3, 8]. These principles are also shared by ZSM networks.
4.4.4. Hexa-X
This is another EU-funded H2020 project representing a flagship for the
6G vision [35]. The objective is to interconnect three worlds, namely the hu-
man, physical, and digital worlds, via technology enablers. Over a duration
of 36 months, this project will focus on creating 6G use cases, developing
essential 6G technologies, and defining a new architecture for an intelligent
fabric that weaves together the key technology enablers. In the ZSM do-
main, the Hexa-X project defines AI/ML-driven orchestration as an essential
component for 5G+ networks, which will, in turn, support data-driven and
zero-touch approaches.
network slicing enables the creation of virtualized network slices that
can be automatically customized for specific use cases. MEC can also
be leveraged to automate network functions at the edge, reducing la-
tency and improving the overall user experience.
5. Network Resource Management
To achieve the full potential of the envisioned pervasive network, current
5G networks need improvements. Specifically, automation is limited as net-
work monitoring via analytics is not fully supported [66]. The performance
requirements specified by 3GPP rely on incorporating technologies such as
dynamic resource/spectrum sharing and cognitive zero-touch network orches-
tration for an optimized network.
Network optimization is the art and science of fine-tuning and configuring
a network to achieve the best possible performance, efficiency, and scalability.
This includes optimizing the configuration of network devices, such as routers
and switches, and adjusting the parameters of different protocols and services
that run on the network. The aim is to ensure that the network is running
at its best possible performance, which can include factors such as reducing
network congestion, increasing network throughput, and improving network
availability. It is a continuous process that aims to keep the network run-
ning smoothly and efficiently, with the ultimate goal of providing a reliable
and high-quality service to the users. Traffic engineering, capacity planning,
and network design are among the various techniques and approaches used
for network optimization, which can optimize routing, bandwidth allocation,
and QoS, among other aspects of the network. Monitoring and troubleshoot-
ing tools can also be used to diagnose performance issues in real-time and
automate the network optimization process.
In the context of 5G networks, network optimization becomes even more
crucial due to their unique requirements. These networks are characterized
by high bandwidth, low-latency, and high-concurrency, which demand ad-
vanced techniques for network optimization, such as network slicing, edge
computing, and advanced resource allocation. 5G networks have to handle
a large number of connected devices and services, each with different re-
quirements and characteristics. Resource allocation involves managing the
available network resources, such as bandwidth, processing power, and stor-
age, in order to provide an optimal service to each device and service. This
can include allocating resources dynamically in response to changing network
conditions and user demands, as well as using advanced techniques such as
ML and optimization algorithms to improve the efficiency of resource alloca-
tion. Another important aspect of network optimization in 5G networks is
network slicing, paving the way for efficient and flexible allocation of network
resources, as well as the ability to support diverse and dynamic requirements
of 5G networks. Edge computing is also a key technology for network opti-
mization in 5G networks. Edge computing involves moving computing and
storage resources closer to the network edge, where they can be used to
reduce network congestion and improve the responsiveness of the network.
This is particularly important for 5G networks, which will support a wide
range of low-latency and high-bandwidth services, such as virtual reality and
augmented reality applications. Tables 4, 5, and 6 provide an overview of
different proposed schemes and frameworks addressing dynamic resource al-
location, network slicing, and edge computing, respectively.
the total service time of an E2E application running VNF video transcod-
ing. Results show the resilience of AREL3P to network dynamics in addition
to the ability to generalize better than SL algorithms, thus tackling adapt-
ability concerns. However, RL approaches slowly converge to the optimal
policy in large-state action sets, rendering it challenging to use in large-scale
5G deployments. This led to DRL, where the intersection of RL and deep
learning helps overcome this limitation [69]. Subsequently, Dalgkitsis et al.
proposed an intelligent VNF placement solution using a deep deterministic
policy gradient RL algorithm [38]. The objective is to minimize the average
E2E latency between the users and the VNFs that compose the uRLLC ser-
vice provided by the network, while considering the distribution of the avail-
able computational resources (CPU, memory, storage) at the network edge.
Results highlight the advantages of the proposed solution over the baseline algorithm (which rejects any VNF placement request to an edge data center that has reached 90% utilization capacity), achieving the fewest SLA violations and the fewest VNF rejections at any traffic level.
To enhance the proposed algorithm in future work, it is suggested to build
the algorithm based on LSTM RNN to provide better insights into the usage
trend of each unit in the network.
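For reference, the following minimal sketch shows the kind of threshold-based baseline described above (reject a request if the edge data center has reached 90% utilization, otherwise place the VNF on the feasible node with the lowest latency). The data structures, capacities, and threshold are illustrative assumptions, not the exact baseline of [38].

# Hypothetical edge data centers: CPU usage, capacity, and latency to the user.
edge_dcs = [
    {"name": "edge-1", "used": 70, "capacity": 100, "latency_ms": 2.0},
    {"name": "edge-2", "used": 95, "capacity": 100, "latency_ms": 1.5},
    {"name": "edge-3", "used": 40, "capacity": 100, "latency_ms": 3.5},
]

def place_vnf(request_cpu, dcs, max_util=0.9):
    """Baseline heuristic: skip DCs at or above max_util, pick the lowest-latency one."""
    feasible = [dc for dc in dcs
                if (dc["used"] + request_cpu) / dc["capacity"] <= max_util]
    if not feasible:
        return None  # request rejected
    best = min(feasible, key=lambda dc: dc["latency_ms"])
    best["used"] += request_cpu
    return best["name"]

print(place_vnf(request_cpu=10, dcs=edge_dcs))  # "edge-1": edge-2 would exceed 90% utilization

A learning-based placement agent, by contrast, can trade latency against future resource availability instead of applying a fixed cut-off.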
For AI/ML to work well in managing network services, it is important
to have a good understanding of how resources are used by the network
and its components. This will allow AI/ML to make better decisions and
improve the user experience. Accordingly, Moazzeni et al. introduced a
Novel Autonomous Profiling method, known as NAP, that can be applied
within the ambit of ZSM for the next generation of NFV orchestration [39].
This NAP method encompasses three key steps:
1. NAP utilizes a weighted resource configuration selection algorithm,
which automatically generates a profiling dataset for VNFs by select-
ing the configuration of resources that have the greatest impact on
the performance goals and Key Performance Indicator targets within a
confined profiling time frame.
the performance metrics in the target environments.
section were identified individually. As the data available was unlabeled raw
data, a clustering algorithm method (i.e., unsupervised learning approach)
was employed. By applying different algorithms with varying numbers of clusters, five clusters with a silhouette coefficient of 0.4509 were obtained for the traffic data and six clusters with a silhouette coefficient of 0.503 for the signaling data. Each cluster represented a specific cause of a fault in the network.
Finally, various classification algorithms were applied to the labeled data
obtained from clustering to evaluate the results accurately. The best accu-
racy in test data was achieved by combining the results of different classifiers
through opinion voting for both traffic and signaling data. One root cause
of a fault is the lack of capacity in the traffic and signaling channel. In that
case, the proposed solution is to increase the capacity (scaling) and allocate
dynamic resources, specifically capacity, to the required channels according
to the network traffic situation. Future work suggests examining subscriber
complaint data in more detail, including the explanations that the subscribers
provide to the complaints center, to identify the fault type and analyze its
cause [40].
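As an illustration of the clustering step described above, the following sketch selects the number of K-Means clusters by the silhouette coefficient. It assumes scikit-learn and a generic, unlabeled feature matrix rather than the study's actual traffic and signaling data.

import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score

# Placeholder unlabeled data standing in for traffic/signaling features.
rng = np.random.default_rng(0)
X = rng.normal(size=(300, 8))

best_k, best_score = None, -1.0
for k in range(2, 10):
    labels = KMeans(n_clusters=k, n_init=10, random_state=0).fit_predict(X)
    score = silhouette_score(X, labels)       # higher is better (maximum 1.0)
    if score > best_score:
        best_k, best_score = k, score

print(f"chosen k={best_k}, silhouette={best_score:.4f}")
# Each resulting cluster can then be inspected and labeled with a fault cause,
# after which supervised classifiers are trained on the labeled data.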
The realm of failure recovery in networks comprises two distinct approaches, known as Proactive Failure Recovery (PFR) and reactive failure recovery. The recovery process involves three key stages: the deployment of backup VNFs and image migration, flow reconfiguration, and state synchronization. However, the execution of each stage incurs a significant delay, resulting not only in a decline in network performance but also in a violation of SLAs due to prolonged interruption of service [70]. By utilizing failure prediction, the PFR approach can decrease recovery delay by
initiating certain stages of the recovery procedure prior to the manifestation
of the failure. For instance, PFR can save delays in flow rescheduling and
backup launch by initiating these stages beforehand. In this manner, if we
are able to recover failed VNFs using PFR, the performance of the network
can be significantly improved by reducing interruption time during recovery.
This motivates the proposal of a PFR framework for future 6G networks.
Given the constraints of resources and the maximum allowable interrup-
tion time caused by failures, Shaghaghi et al. established a network that
is both highly reliable and resource-efficient by introducing the Zero-Touch PFR (ZT-PFR) approach [41]. This approach utilizes DRL to enhance the fault-
tolerance of networks enabled by NFV. This is formulated as an optimization
problem that aims to minimize a weighted cost, which takes into account fac-
tors such as resource costs and penalties for incorrect decisions. Shaghaghi et
al. adopted state-of-the-art DRL-based methods such as soft-actor-critic and
proximal-policy-optimization as solutions to this problem. To train and test
the proposed DRL-based framework, the authors construct an environment
simulator using a simulated model of impending failure in NFV-based net-
works inspired by ETSI. Additionally, to capture time-dependent features, the agents are equipped with LSTM layers. Moreover, the concept of age
of information is applied to balance between event-based and scheduled mon-
itoring in order to ensure that network status information is up-to-date for
decision-making. Given the ever-changing nature of NFV environments, it
is important to develop learning methods that are online, fast, and efficient.
Thus, further research in this direction could be of great interest [41].
To fulfill the 5G vision in terms of E2E automation and resource shar-
ing/allocation, the 5GZORRO project [71] has been launched by the H2020
program. Its main objective is to utilize distributed AI and Distributed
Ledger Technologies (DLTs) to design a secure and trusted E2E zero-touch
service and network management and orchestration within the 5G network
with a shared spectrum market for real-time trading on spectrum alloca-
tion. While AI is a pillar behind a zero-touch cognitive network orchestrator
and manager, DLT (or blockchain technology) is a protocol that enables
the secure and trusted functioning of a distributed 5G E2E service chain.
Thus, the 5GZORRO framework creates a 5G service layer across different
parties where SLAs are monitored, spectrum is shared, and orchestration
is automated [71, 66]. Another project launched by the H2020 program is the 6G BRAINS project, which started on January 1st, 2021, and is expected to run for 36 months [72]. It focuses on developing an AI-driven multi-agent DRL
algorithm for dynamic resource allocation exceeding massive machine-type
communications with new spectrum links, including THz, Sub-6 GHz, and
optical wireless communications. The aim is to improve the capacity, relia-
bility, and latency for various vertical sectors, such as eHealth and intelligent
transportation.
Table 4: Dynamic Resource Allocation Schemes

NAP [39]: designs an autonomous profiling method for the next generation of NFV orchestration in 5G networks; accurately predicts untested resource configurations; future work: extend the work by covering additional types of resources.

ZT-PFR [41]: enhances the fault tolerance of networks through a DRL-based zero-touch proactive failure recovery scheme; keeps network status information up-to-date for decision-making; future work: apply online-learning ML models to address the ever-changing nature of NFV environments.
ally, managing the allocation of network resources across multiple slices can
be complex and time-consuming, especially as the demand for each slice fluc-
tuates over time. Accordingly, it is necessary to have a reliable management
system to automate the process of creating, configuring, and deploying slices,
monitor their performance, and troubleshoot any issues that may arise. An-
other challenge lies in the need for advanced security mechanisms to protect
virtual networks from unauthorized access and malicious attacks. There ex-
ists another need to ensure the interoperability of virtual networks with exist-
ing networks and systems. This requires the development of new standards
and protocols to ensure seamless communication between different virtual
networks and existing systems. Business-wise, the deployment and mainte-
nance of the network slicing technology can be costly, and service providers
must find ways to effectively monetize the services they offer to recoup their
investment.
Network slicing can unlock new opportunities for service providers, but
it also poses a number of technical and operational challenges that are be-
ing addressed academically. One of the main areas of research has focused
on developing automated methods for creating, configuring, and deploying
network slices without any manual intervention. This includes the use of
AI/ML algorithms to predict network resource demand and dynamically al-
locate resources to different slices. For instance, Casale et al. proposed an
ML-based approach to predict network resource demand and dynamically
allocate resources to different slices [42]. The proposed approach is able to
adapt to changes in network conditions and user demands, and make real-
time decisions about the allocation of resources to different slices. In fact,
the proposed algorithm is based on RL, which learns from past decisions to
improve the performance of future decisions. The algorithm uses a combina-
tion of decision-making policies, including a greedy policy, a random policy,
and a Q-learning policy. Results compare the performance of the proposed
approach using simulations and compare it to traditional static allocation
methods. The approach shows a better performance in terms of resource
utilization, by allocating resources to the slices that need them the most.
However, this approach has some limitations in terms of scalability, as it
requires a large amount of data to train the ML models. Additionally, it
assumes that the network conditions are static and do not change rapidly.
Another area of research has focused on developing zero-touch manage-
ment and orchestration systems for network slicing in 5G+ networks. This
includes the use of SDN and NFV technologies to enable the dynamic cre-
ation, configuration, and management of network slices. As such, Vittal et
al. presented HARNESS, a novel High Availability supportive self-Reliant
NEtwork Slicing System for the 5G core, powered by the SON paradigm
[43]. HARNESS intelligently handles control plane User Service Requests
(USRs), ensuring uninterrupted high-availability service delivery for delay-
tolerant and delay-sensitive slices. It addresses scaling, overload manage-
ment, congestion control, and failure recovery of primary slice types, namely
eMBB, uRLLC, and mMTC. The proposed HARNESS mechanism outper-
forms traditional scheduling methods, minimizing dropped USRs and im-
proving response times. Experimentally, HARNESS achieved 3.2% better
slice service high-availability in a minimal active/active cluster configura-
tion. Future work involves scaling the HARNESS framework and exploring
the selective offloading of control plane USRs on smart network components
for different slice types in a 5G system.
In the context of scaled systems, Chergui et al. proposed a distributed
and AI-driven management and orchestration system for large-scale deploy-
ment of network slices in 6G [44]. The proposed framework is compliant
with both ETSI standards, ZSM and ENI, focusing on autonomous and in-
telligent network management and orchestration to enable autonomous and
scalable management and orchestration of network slices and their dedicated
resources. Future work suggests mapping the framework to different architec-
tures to test its effectiveness. Another compliant framework was introduced
by Baba et al., representing a resource orchestration and management archi-
tecture for 5G network slices. This framework comprises a per-MD resource
allocation mechanism and an MD interworking function, aimed at facilitating
the provision of E2E network services over network slices in the context of
5G evolution [45]. This proposed architecture is underpinned by a plethora
of standard APIs and data models, and its efficacy is demonstrated through
the successful orchestration across multiple domains, and the automation of
closed-loop scenarios. The architecture has been verified and certified as a
proof of concept by the ETSI ZSM.
Similarly, Afolabi et al. proposed a novel and comprehensive global E2E
mobile network slicing orchestration system (NSOS) that enables network
slicing for next-generation mobile networks by considering all aspects of the
mobile network spanning across access, core, and transport parts [46]. The
high-level architecture of the system comprises a hierarchical structure, in-
cluding a global orchestrator and multiple domain-specific orchestrators and
their respective system components. The focus of the system is on allowing
customers to request and monitor network slices only, while the proposed
Dynamic Auto-Scaling Algorithm (DASA) ensures that the system can react
instantly to changes in workload. The DASA includes both proactive and
reactive provisioning mechanisms, where the proactive mechanism relies on a
workload predictor implemented using ML techniques, and the reactive provi-
sioning module triggers asynchronous requests to scale in or out the different
entities of the NSOS. The core of the solution is a resource dimensioning
heuristic algorithm which determines the required amount of computational
and virtual resources to be allocated to the NSOS for a given workload so
that a maximum response time of the NSOS is guaranteed. Namely, the
resource dimensioning algorithm is based on a queuing model and will be
invoked when a provisioning decision is taken to decide how many resources
have to be requested or released. The system’s performance is evaluated
through system-level simulations, showing that the algorithm is able to find
the minimal required resources to keep the mean response time of the NSOS
under a given threshold. The response time is defined as the sum of all pro-
cessing and waiting times experienced by a slice orchestration request (e.g.,
slice creation or release) when passing through different NSOS’s entities dur-
ing its lifetime in the orchestrator. The simulation results also suggest that
the request rejection rate during a given period is determined by the reaction
time of the reactive provisioning mechanism, which is in turn affected by the
slice’s instantiation time [46]. As CPU resources are the only resources taken
into account, the inclusion of other resources, such as memory, is encouraged
in the future.
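To illustrate the flavor of such a queuing-model-based dimensioning step, the following sketch increases the number of allocated processing units until the mean response time of an M/M/c queue falls below a threshold. The arrival rate, service rate, and threshold are assumed values, and this is a generic sketch rather than the actual heuristic of [46].

import math

def mmc_response_time(lam, mu, c):
    """Mean response time W of an M/M/c queue (Erlang C), or None if unstable."""
    rho = lam / (c * mu)
    if rho >= 1.0:
        return None
    a = lam / mu  # offered load
    erlang_c = (a**c / (math.factorial(c) * (1 - rho))) / (
        sum(a**k / math.factorial(k) for k in range(c))
        + a**c / (math.factorial(c) * (1 - rho))
    )
    wq = erlang_c / (c * mu - lam)   # mean waiting time in the queue
    return wq + 1.0 / mu             # waiting time plus service time

def dimension(lam, mu, max_response_time):
    """Smallest number of units c keeping the mean response time under the threshold."""
    c = 1
    while True:
        w = mmc_response_time(lam, mu, c)
        if w is not None and w <= max_response_time:
            return c, w
        c += 1

# Assumed workload: 50 orchestration requests/s, each unit serving 12 requests/s.
print(dimension(lam=50.0, mu=12.0, max_response_time=0.2))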
As yet, the NSOS has been purely focused on the technical aspect. Bre-
itgand et al. delved into the issue of coordinating and orchestrating busi-
ness processes across domains in order to facilitate efficient resource sharing
among multiple Communication Service Providers (CSPs) [47]. The lack
of a standard for this aspect of inter-CSP collaboration is identified as a
major hurdle for achieving optimal resource utilization. To address this,
Breitgand et al. proposed a set of design principles that include autonomy
for CSPs in their business and technical processes, non-intrusive extensions
to existing NFV MANO frameworks, preservation of slice isolation, sepa-
ration of concerns between technical and business aspects of orchestration,
and a cloud-native declarative orchestration approach using Kubernetes as
the cross-domain control plane. The proposed dynamic NS scaling occurs
through collaboration between CSPs facilitated by DLT transactions. This
approach utilizes ML techniques to automate the process of extending slices
and ensuring QoS requirements, which is inspired by ETSI’s ZSM closed-
loop architecture. This orchestrator is demonstrated on the 5GZORRO vir-
tual content delivery network use case scenario for highly populated areas.
Content delivery networks are geographically distributed networks of compu-
tation and storage resources that offer high-availability and high-performance
services such as web content, application data, and live/on-demand streaming
media. The proposed approach has been validated in a development envi-
ronment, and future work will involve evaluating it in a larger testbed and
with additional use cases to quantify the benefits of inter-CSP slice scaling
[47].
Table 5: Network Slicing Schemes

Resource MANO Architecture for 5G Network Slices [45]: proposes a per-MD resource allocation mechanism and an MD interworking function aimed at facilitating the provision of E2E network services over 5G network slices; the architecture has been verified and certified as a proof of concept by the ETSI ZSM, supports multiple use cases and services, and is flexible and scalable; future work: address the challenges of security and privacy in network slicing.

NSOS [46]: designs a global E2E mobile network slicing orchestration system that enables network slicing for NGNs, taking into account the access, core, and transport parts of the network; reacts instantly to workload changes with DASA and finds the minimal resources needed to maintain the NSOS response-time threshold; future work: account for additional resources other than CPU.
segment is assigned specific management and orchestration responsibilities.
However, to improve the feasibility of MEC application relocation between
tenants, a solution must be devised that enhances the interaction between
the proposed architecture and the 5G CN functions, facilitating the synchro-
nization of traffic forwarding rules across various administrative domains.
Another framework leveraging MEC network technology was introduced by
Wu et al. to allow Autonomous Vehicles (AVs) to adapt to changing driv-
ing conditions by sharing their driving intelligence [49]. In this framework,
named Intelligence Networking (Intelligence-Net), driving intelligence refers
to a trained neural network model for autonomous driving. Key features
of Intelligence-Net include:
• Sharing of driving intelligence: A unique MEC network-assisted Intelligence-
Net is proposed to facilitate real-time sharing of driving intelligence
between AVs, allowing for adaptation to changing environmental con-
ditions.
• Segmentation of roads: The road is divided into segments, each with
its own dedicated driving model tailored to its specific environmental
features, reducing the dimensionality of each road segment.
• Continuous model updates: Whenever a specific road segment experi-
ences environmental changes, new data is collected and used to retrain
the generic base driving model, improving its ability to adapt to the
new conditions.
• Secure and efficient learning: To ensure security and efficiency, the framework implements blockchain-enabled federated learning, which combines the privacy benefits of federated learning with the reduced communication and computation costs of transfer learning. The blockchain technology authenticates learning participants and secures the entire learning process (a minimal federated-averaging sketch follows this list).
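Building on the federated learning component described in the last bullet, the following minimal federated-averaging (FedAvg) sketch shows how per-vehicle updates can be aggregated into a shared driving model. The per-vehicle data, model, and aggregation weights are placeholders, and the blockchain-based authentication layer is omitted.

import numpy as np

def local_update(weights, data, labels, lr=0.01, epochs=5):
    """Placeholder local training step: a linear model trained by gradient descent."""
    w = weights.copy()
    for _ in range(epochs):
        grad = data.T @ (data @ w - labels) / len(labels)
        w -= lr * grad
    return w

rng = np.random.default_rng(0)
global_w = np.zeros(4)
# Each "vehicle" holds its own locally collected driving data (placeholders).
vehicles = [(rng.normal(size=(50, 4)), rng.normal(size=50)) for _ in range(3)]

for round_ in range(10):
    local_weights = [local_update(global_w, X, y) for X, y in vehicles]
    # FedAvg: weight each client's update by its number of samples.
    sizes = np.array([len(y) for _, y in vehicles], dtype=float)
    global_w = np.average(local_weights, axis=0, weights=sizes)

print(global_w)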
Simulation results indicate that this solution can produce updated driving
models that better adapt to environmental changes compared to traditional
methods. AVs can then adopt these changes by downloading the updated
driving models. The proposed Intelligence-Net framework has yet to fully
leverage the available resources. While it poses a challenge, utilizing hetero-
geneous edge computing resources to optimize the system remains a desirable
objective in the future.
The integration of MEC into 5G+ networks is a key enabler of ZSM,
which represents a new approach to network management [50]. With MEC,
network functions can be deployed and managed dynamically, providing the
necessary processing and storage resources to support the rapidly chang-
ing demands of mobile users. This enables ZSM to provide dynamic and
efficient network management, improving the user experience and reducing
operational costs. Following this, Sousa et al. introduced a self-healing archi-
tecture based on the ETSI ZSM framework for multi-domain B5G networks
[50]. This architecture utilizes ML-assisted closed control loops across vari-
ous ZSM reference points to monitor network data for estimating end-service
QoE KPIs and to identify faulty network links in the underlying transport
network. To demonstrate this architecture in action, the authors have instan-
tiated it in the context of automated healing of Dynamic Adaptive Streaming
over HTTP video services. Two ML techniques, online and offline, are pre-
sented for estimating an SLA violation through a QoE probe at the edge and
identifying the root cause in the transport network. Experimental evalua-
tion indicates the potential benefits of using ML for QoS-to-QoE estimation
and fault identification in MEC environments. Further work will consider
improvements to the ML pipelines, such as model generalization.
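As a rough illustration of the QoS-to-QoE estimation step mentioned above (not the actual pipeline of [50]), a regressor can be trained to map transport-level QoS measurements to an estimated QoE score; the features, target scale, and model choice below are assumptions, and the training data is synthetic.

import numpy as np
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_absolute_error

# Placeholder samples: [throughput_mbps, delay_ms, jitter_ms, loss_pct] -> QoE (MOS 1-5)
rng = np.random.default_rng(0)
X = np.column_stack([
    rng.uniform(1, 50, 1000),    # throughput
    rng.uniform(5, 200, 1000),   # delay
    rng.uniform(0, 30, 1000),    # jitter
    rng.uniform(0, 5, 1000),     # loss
])
# Synthetic QoE target: improves with throughput, degrades with delay, jitter, and loss.
y = np.clip(1 + 4 * (X[:, 0] / 50) - 0.01 * X[:, 1] - 0.02 * X[:, 2] - 0.3 * X[:, 3], 1, 5)

X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)
model = GradientBoostingRegressor(random_state=0).fit(X_tr, y_tr)
print("MAE on held-out QoS samples:", mean_absolute_error(y_te, model.predict(X_te)))

A predicted QoE below an SLA threshold (for example, a MOS under 3) would then trigger the fault-identification part of the closed loop.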
6. Network Traffic Control
Network traffic control is an essential aspect of modern networking that
aims to effectively manage and regulate the flow of data packets in a net-
work [77]. Its primary goal is to ensure the efficient utilization of network
resources, prevent congestion, and guarantee a high level of service quality
for all network users.
One key component of network traffic control is traffic prediction, which
uses advanced techniques such as ML to forecast future network traffic pat-
terns. This information is used to proactively manage network congestion
and optimize network resources, thereby ensuring that data is delivered in
a timely and reliable manner [78]. Another crucial aspect of network traf-
fic control is intelligent routing. This process uses advanced algorithms and
data analysis to determine the most efficient path for data to travel from its
source to its destination. The routing algorithm takes into account various
inputs, such as network conditions, available resources, and traffic patterns
to make informed decisions on how to route data [55].
When combined, traffic prediction and intelligent routing form a powerful
system for network traffic control. By providing a comprehensive understand-
ing of network traffic patterns and conditions, this system enables network
administrators to make informed decisions on how to best manage and regu-
late the flow of data. This ultimately leads to a more efficient use of network
resources, improved performance, and a higher level of service quality for all
network users.
vehicles, and IoT devices. With this increased complexity and diversity of
network traffic, it becomes all the more crucial to be able to predict and
classify network traffic with a high degree of accuracy. This is essential for
ensuring that the network can handle the increased demand, and that the
various types of traffic are properly managed and optimized. Furthermore,
5G networks are designed to be highly dynamic and adaptable, capable of
adjusting to changes in traffic patterns in real-time. This makes it even more
imperative to have accurate and up-to-date traffic predictions and classifica-
tions.
The integration of ZSM technology into 5G networks serves to elevate the
already intelligent process of network traffic prediction and classification to
new heights of precision and automation. ZSM, as a SON technology, enables
5G networks to automatically configure and optimize themselves in real-time,
based on the predictions and classifications of network traffic [7]. Through
the utilization of advanced ML algorithms and analytics, ZSM conducts a
thorough analysis of historical traffic data, which is then employed to predict
future traffic patterns with a high degree of accuracy. These predictions, in
turn, are utilized to optimize network resources and configure the network
to handle the expected traffic with optimal efficiency. Furthermore, ZSM
employs advanced analytics to classify the various types of traffic traversing
the network, such as voice, video, and data, and to identify specific applica-
tions and services being utilized by the network’s users. This information is
then used to optimize network performance, ensuring that the different types
of traffic are properly managed and delivered to their intended destinations
with the utmost precision.
Several approaches have been proposed for predicting and classifying net-
work traffic in 5G networks. Table 7 presents an overview of the traffic
prediction schemes discussed in this survey. One of the most widely used
methods is ML, which can be used to identify patterns and trends in net-
work traffic data. Various ML algorithms, such as ANNs, decision trees, and
SVMs, have been applied to network traffic prediction and classification in
5G networks. For example, in a study by Fan and Liu, the authors apply
supervised SVM and unsupervised K-means clustering algorithms for net-
work traffic classification [51]. The dataset includes flow parameters directly
obtained from packet headers, such as segment size and packet inter-arrival
time. The dataset is manually labeled with ten traffic types, including multi-
media, mail, database, and attacks. The comparison of these two algorithms
highlights the enhanced performance of the SVM model compared to the K-
means clustering algorithm. However, K-means is able to characterize new
or unknown application types, since its training samples do not require manual
labeling in advance. Future work aims to use the proposed model to predict
the number of user plane 5G network functions needed to manage the data
plane traffic in a virtualized environment. The next steps comprise applying
the described classification models to real SDN traffic data.
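For intuition, the sketch below contrasts the two families of models used in [51] with scikit-learn: a supervised SVM trained on labeled flows and an unsupervised K-means clustering that requires no labels. The flow features, traffic classes, and synthetic data are illustrative assumptions, not the authors' dataset.

```python
# Illustrative sketch (not the setup of [51]): supervised SVM vs. unsupervised
# K-means on flow-level features such as mean segment size and inter-arrival time.
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.pipeline import make_pipeline
from sklearn.svm import SVC
from sklearn.cluster import KMeans
from sklearn.metrics import accuracy_score

rng = np.random.default_rng(0)
# Synthetic flow records: [mean_segment_size, inter_arrival_time] (hypothetical values)
X = rng.normal(loc=[[500, 0.02]] * 300 + [[1200, 0.001]] * 300, scale=[50, 0.005])
y = np.array([0] * 300 + [1] * 300)   # 0 = mail, 1 = multimedia (labels are illustrative)

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=0)

# Supervised SVM: needs labeled flows, typically higher accuracy
svm = make_pipeline(StandardScaler(), SVC(kernel="rbf"))
svm.fit(X_tr, y_tr)
print("SVM accuracy:", accuracy_score(y_te, svm.predict(X_te)))

# Unsupervised K-means: no labels needed, so it can expose new/unknown traffic types
kmeans = make_pipeline(StandardScaler(), KMeans(n_clusters=2, n_init=10, random_state=0))
clusters = kmeans.fit_predict(X_tr)
```

The contrast mirrors the trade-off noted above: the SVM exploits the manual labels for accuracy, while the clustering pipeline can group previously unseen traffic types without them.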
DL is another approach that has been proposed for network traffic pre-
diction and classification in 5G networks. DL algorithms, such as CNNs
and RNNs, have been shown to be particularly effective for this task. The
advantage of DL is that it can automatically learn features from raw data,
which reduces the need for feature engineering. In a study by Jaffry et al.,
the authors proposed an RNN with an LSTM approach for cellular traffic pre-
diction using real-world call data records [52]. The call data record utilized
was published by Telecom Italia for the Big Data Challenge competition and
collected over 62 days starting November 1st, 2013 [79]. Sample data collected
includes country code, inbound/outbound SMS activities, and inbound/out-
bound call activities. The proposed LSTM model has a hidden layer with
50 LSTM cells followed by a dense layer with one unit. Results highlight
the enhanced performance of the proposed model over vanilla neural networks
and the statistical autoregressive integrated moving average (ARIMA) model. This
work can be extended by using this model to design an autonomous resource
allocation scheme for 5G+ networks. Similarly, Alawe et al. presented a
combination of LSTM and deep neural networks to proactively predict the
number of resources along with the network traffic to manage and scale the
CN in terms of the resources used for the Access and Mobility Management
Function (AMF) in 5G systems [53]. Their experimental results reveal that
the use of ML approaches improves scalability and reacts to changes in traffic
with lower latency. Gupta et al. examined various DL models, namely the
Multi-Layer Perceptron (MLP), Attention-based Encoder-Decoder, Gated
Recurrent Unit (GRU), and LSTM, on the Dataset-Unicauca-V2 mobile-traffic
dataset [54]. This dataset comprises six days of mobile traffic data with a total
of 87 features and 3,577,296 instances. The data was collected by the web
division of Universidad del Cauca, Popayán, Colombia, over six days, specifically
April 26, 27, 28 and May 9, 11, 15, 2017, at various hours in both the morning
and evening. The traffic is classified into four categories, namely streaming,
messaging, searching, and cloud. Each sample contains comprehensive
information about IP traffic generated by network equipment, including the
IP address of origin and destination, port, arrival time, and layer 7 protocol
(application). As for performance metrics, recall, precision, and F1-score were
utilized to measure the performance of the models. Findings indicate that the
MLP and Encoder-Decoder models yielded average results for mobile-traffic
forecasting, while the GRU and LSTM models performed exceptionally well,
with the latter yielding the best outcome. In the future, Gupta et al. aim
to investigate other time-based prediction approaches for resources and to extend
the work to MEC in industrial IoT applications to support Industry 4.0.
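To make the shape of such a predictor concrete, the following is a minimal Keras sketch of the LSTM architecture described by Jaffry et al., one hidden layer of 50 LSTM cells followed by a single-unit dense layer. The look-back length, windowing, and synthetic traffic series are illustrative assumptions, not the exact setup used in [52].

```python
# Minimal sketch of the LSTM architecture described for cellular traffic
# prediction (one hidden layer of 50 LSTM cells + a single-unit dense output).
# The windowing parameters and synthetic data are illustrative assumptions.
import numpy as np
import tensorflow as tf

LOOKBACK = 10   # number of past time steps fed to the model (assumed)

def make_windows(series: np.ndarray, lookback: int):
    """Slice a univariate series into (samples, lookback, 1) windows and next-step targets."""
    X, y = [], []
    for i in range(len(series) - lookback):
        X.append(series[i:i + lookback])
        y.append(series[i + lookback])
    return np.array(X)[..., np.newaxis], np.array(y)

# Hypothetical per-interval traffic volumes (e.g., call/SMS activity counts)
traffic = np.sin(np.linspace(0, 50, 2000)) + np.random.normal(0, 0.1, 2000)
X, y = make_windows(traffic, LOOKBACK)

model = tf.keras.Sequential([
    tf.keras.layers.Input(shape=(LOOKBACK, 1)),
    tf.keras.layers.LSTM(50),   # hidden layer with 50 LSTM cells
    tf.keras.layers.Dense(1),   # single-unit output for the next time step
])
model.compile(optimizer="adam", loss="mae")
model.fit(X, y, epochs=5, batch_size=32, verbose=0)
```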
Table 7: Traffic Prediction Schemes

Scheme | Objective | Key Findings | Future Work
SVM and K-means Algorithms for Network Traffic Classification [51] | Predict and classify the network traffic in 5G networks. | SVM shows enhanced performance compared to K-means; K-means is suitable for new/unknown application types. | Predict the number of user plane 5G network functions to manage the data plane traffic; apply the proposed models to real SDN traffic data.
DL-based Cellular Traffic Prediction [52] | Predict cellular traffic with an RNN LSTM model using a real-world call data record. | The proposed model outperforms vanilla neural networks and the autoregressive integrated moving average model. | Extend the work to design an autonomous resource allocation scheme for 5G+ networks.
Scaling AMF in 5G Systems [53] | Proactively predict the number of resources in addition to the network traffic to manage resources and scale the AMF in 5G systems. | The proposed model reacts to the change in traffic with lower latency. | Utilize the proposed model to estimate the number of user plane 5G CN functions needed to handle the traffic.
Hu et al. proposed EARS, an intelligence-driven experiential network architecture
that integrates SDN and DRL to enable intelligent and autonomous
routing [55]. The authors addressed the limita-
tions of conventional routing strategies, which are heavily reliant on manual
configuration, and introduced a DRL algorithm to optimize data flow rout-
ing. This DRL agent is highly adaptable, learning routing strategies through
its interactions with the network environment. Furthermore, EARS incorpo-
rates advanced network monitoring technologies, such as network state col-
lection and traffic identification, to provide closed-loop control mechanisms,
enabling the DRL agent to optimize routing policies and enhance network
performance. As such, EARS, an intelligence-driven experiential network ar-
chitecture, harnesses the Deep Deterministic Policy Gradient (DDPG) algo-
rithm to dynamically generate routing policies. DDPG utilizes a deep neural
network to simulate the Q-table, and another neural network to produce the
strategy function, allowing it to effectively tackle large-scale continuous con-
trol problems. Through continual training, EARS can learn to make better
control decisions by interacting with the network environment, and adjust-
ing services and resources based on network requirements and environmental
conditions. Simulations, which compare EARS with typical baseline schemes
such as Open Shortest Path First (OSPF) [80] and Equal-Cost Multi-Path routing
(ECMP) [81], demonstrate that EARS surpasses these schemes by achieving superior
network performance in terms of throughput, delay, and link utilization. As
a future undertaking, the algorithm will be evaluated in a real-world SDN
setup. Similarly, Tan et al. introduced a load balancing algorithm, the Re-
liable Intelligent Routing Mechanism (RIRM), designed to optimize traffic
data routing based on the traffic load in 5G CNs [56]. This algorithm takes
into account multiple factors, such as the shortest path, link latency, and node
loading to prevent packet loss. It has been implemented on a 5G testbed,
free5GC, by adding two elements, the RIRM traffic tracker and the RIRM
traffic controller. The tracker continuously monitors user traffic data flow
and reports to the controller, which then calculates the best route to avoid
congestion. Experimental results demonstrate that the proposed algorithm
surpasses traditional round-robin load balancing methods in terms of packet
loss, latency, and average data throughput. This algorithm is specifically
tailored for the uRLLC use case of 5G, and further research will explore its
performance in other use cases, such as mMTC.
One application of 5G access networks is Flying Ad Hoc Networks (FANETs)
that utilize unmanned aerial vehicles as nodes to provide wireless access.
These networks are characterized by their limited resources and high mobil-
ity, which pose significant challenges for efficient routing [57]. In this context,
intelligent routing is a crucial component of FANETs, as it enables the net-
work to adapt to the dynamic conditions of the environment and optimize the
use of resources. The use of advanced techniques, such as DRL, is a promis-
ing approach to addressing these challenges. Deep Q-Network (DQN)-based
vertical routing, proposed by Khan et al., is an example of such an intelligent
routing approach, which aims to select routes with better energy efficiency
and lower mobility across different network planes [57]. DQN, the underly-
ing mechanism of this proposed routing method, leverages the principles of
RL and DL to empower an agent to make informed decisions based on the
current state of the system. The integration of deep neural networks within
DQN enables the agent to learn about various states, such as the residual
energy and mobility rate, to predict optimal actions. The training process
of DQN involves the use of mini-batches of experiences, and three key
features aid in achieving accelerated convergence toward the optimal route
(a minimal sketch follows this list):
1. Delayed rewards from the replay memory allow for better prediction of
state-action values in a dynamic environment.
2. The decaying variable shifts the focus from exploration to exploitation
as the number of episodes increases.
3. Mini-batches of run-time states from the replay memory are used to
train and minimize a loss function.
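The sketch below illustrates these three mechanisms in a generic DQN training step: a replay memory, an exploration variable that decays over episodes, and mini-batch updates of the state-action values. The state and action dimensions, rewards, and network sizes are assumptions for illustration, not the configuration of [57].

```python
# Minimal, generic DQN sketch (not the authors' configuration): replay memory,
# decaying epsilon, and mini-batch training on state-action values.
import random
from collections import deque
import torch
import torch.nn as nn

STATE_DIM, N_ACTIONS, GAMMA = 4, 3, 0.95   # e.g., residual energy, mobility rate, ... (assumed)

q_net = nn.Sequential(nn.Linear(STATE_DIM, 64), nn.ReLU(), nn.Linear(64, N_ACTIONS))
optimizer = torch.optim.Adam(q_net.parameters(), lr=1e-3)
replay = deque(maxlen=10_000)              # replay memory of past experiences
epsilon, eps_decay = 1.0, 0.995            # decaying exploration variable

def select_action(state: torch.Tensor) -> int:
    if random.random() < epsilon:          # explore
        return random.randrange(N_ACTIONS)
    with torch.no_grad():                  # exploit current Q estimates
        return int(q_net(state).argmax())

def train_step(batch_size: int = 32) -> None:
    if len(replay) < batch_size:
        return
    batch = random.sample(replay, batch_size)           # mini-batch from replay memory
    s, a, r, s_next, done = map(torch.stack, zip(*batch))
    q_sa = q_net(s).gather(1, a.long().unsqueeze(1)).squeeze(1)
    with torch.no_grad():                                # target built from delayed rewards
        target = r + GAMMA * q_net(s_next).max(dim=1).values * (1 - done)
    loss = nn.functional.mse_loss(q_sa, target)          # loss minimized on the mini-batch
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()

# The environment loop (omitted) pushes (state, action, reward, next_state, done)
# float tensors into `replay` and decays epsilon after each episode, e.g.:
# epsilon = max(0.05, epsilon * eps_decay)
```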
The main objective is to improve network performance by reducing frequent
disconnections and partitions. The proposed method is a hybrid approach
that utilizes both a central controller and distributed controllers to share in-
formation and handle global and local information, respectively. It is suitable
for highly dynamic FANETs and can be applied in various scenarios, such as
border monitoring and targeted operations. By clustering the network across
different planes, this method offloads data traffic and improves network life-
time. Simulation results show that it can increase network lifetime by up to
60%, reduce energy consumption by up to 20%, and reduce the rate of link
breakages by up to 50% compared to traditional RL methods. Khan et al.
suggested that there are several open issues that can be explored in future
research to improve the proposed routing method for FANETs in 5G access
networks. These include:
• Enhancing the vertical routing method to reduce the routing overhead
incurred to establish and maintain inter- and intra-cluster, and inter-
and intra-plane routes, by allowing clusters to adjust their sizes for
optimal performance.
Table 8: Intelligent Routing Schemes

Scheme | Objective | Key Findings | Future Work
RIRM [56] | Design a load balancing algorithm to optimize traffic data routing based on the traffic load in 5G CNs for the uRLLC use case. | The proposed algorithm surpasses traditional round-robin load balancing methods in terms of packet loss, latency, and average data throughput. | Test the algorithm in other use cases, such as mMTC.
DQN-based Vertical Routing [57] | Select routes with better energy efficiency and lower mobility across different network planes for FANETs. | Improves network performance by reducing frequent disconnections and partitions; offloads data traffic and improves network lifetime by clustering the network across different planes; works with highly dynamic FANETs. | Reduce routing overhead via adjustable-size clusters; explore other variants of DQN, such as DDPG and double DQN; examine the use of mini-batches from the replay memory to prevent overfitting; evaluate the scheme with different types of terrains and obstacles.
One of the key approaches for achieving energy efficiency in networking is
through the use of SDN and NFV technologies [82]. These technologies enable
the dynamic provisioning and scaling of network resources, which can lead
to significant energy savings. Additionally, the use of VNFs can reduce the
energy consumption of network devices by consolidating multiple functions
on a single physical platform. These enablers form the foundation of 5G+ networks.
Energy efficiency in 5G+ networks is a critical aspect of the next gener-
ation of mobile networks as it directly impacts the environmental and eco-
nomic sustainability of these networks. The deployment of 5G+ networks
is expected to result in a significant increase in energy consumption due to
the expansion of network infrastructure, the deployment of new technologies
such as massive multiple-input/multiple-output and mmWave communica-
tions, and the increased demand for high-bandwidth applications [82].
There are several key strategies that have been proposed to improve the
energy efficiency of 5G+ networks. One of the key strategies is the use of
energy-efficient network architecture and protocols. This includes the use
of energy-efficient radio access technologies, such as energy-aware scheduling
and power control, as well as the use of energy-efficient network functions and
infrastructure. Another key strategy is the use of energy-efficient devices,
such as energy-efficient smartphones and IoT devices, which can reduce the
overall energy consumption of the network.
Zero-touch automation is another promising approach to improve the en-
ergy efficiency of 5G+ networks to automate the configuration, management,
and optimization of network functions and infrastructure. This can help to
reduce the energy consumption of the network by reducing the need for man-
ual intervention and by enabling the network to adapt to changes in traffic
demand and network conditions in a more efficient manner.
As such, Omar et al. formulated an optimization problem to compute
a green, efficient solution that maximizes energy efficiency subject to minimum area
spectral efficiency and outage probability constraints in a 5G heterogeneous network [59].
The problem was solved using a convex optimization method. Results show that network
densification does not always yield the most energy-efficient solution, as the
increase in mmWave base stations increases area spectral efficiency and de-
creases energy efficiency. As for introducing mmWave small cells, it has been
established that they improve coverage, and consequently spectral efficiency.
Future work suggests designing deployment strategies that have the environ-
ment in mind. Tackling a specific 5G use case, Dalgkitsis et al. examined
the impact of network automation on energy consumption and overall op-
erating costs in the context of 5G networks, specifically for uRLLC services
[60]. A framework, known as Service CHain Energy-Efficient Management
or SCHE2MA, is proposed, which utilizes distributed RL to intelligently
deploy service function chains with shared VNFs across multiple domains.
SCHE2MA framework is designed to be decentralized, eliminating the poten-
tial for central points of failure, allowing for scalability, and avoiding costly
network-wide configurations. Parallelism is also achieved by introducing the
auction mechanism, a system that enables inter-domain VNF migration in
a distributed multi-domain network. Results show a reduction in average
service latency and a 17.1% improvement in energy efficiency com-
pared to a centralized RL solution. This approach addresses the important
challenge of balancing the performance constraints of uRLLC services with
energy efficiency in the context of 5G+ networks, where reducing carbon
emissions and energy consumption is of paramount importance. Future work
will focus on implementing the auction mechanism in a fully decentralized
manner by employing Distributed Ledger Technologies (DLTs).
In the context of zero-touch, Rezazadeh et al. proposed a framework
for fully automated MANO of 5G+ communication systems which utilizes
a knowledge plane and incorporates recent network slicing technologies [61].
The knowledge plane plays the role of an all-encompassing system within
the network by creating and retaining high-level models of what the network
is supposed to do, in order to provide services to other blocks in the net-
work. In other words, the knowledge plane joins the architectural aspects
of network slicing to achieve synchronization in a continuous control setting
by revisiting ZSM operational closed-loop building blocks. This framework,
known as Knowledge-based Beyond 5G or KB5G, is based on the use of
algorithmic innovation and AI, utilizing a DRL method to minimize energy
consumption and the cost of VNF instantiation. A unique Actor-Critic based
approach, known as the twin-delayed double-Q soft actor-critic method, al-
lows for continuous learning and accumulation of past knowledge to minimize
future costs. This stochastic method supports continuous state and action
spaces while stabilizing the learning procedure and improving time efficiency
in 5G+. It also promotes a model-free approach reinforcing the dynamism
and heterogeneous nature of network slices while reducing the need for hyper-
parameter tuning. This framework was tested on a 5G RAN NS environment
called smartech-v2, which incorporates both CPU and energy consumption
simulators with an OpenAI Gym-based standardized interface to guarantee
the consistent comparison of different DRL algorithms. Numerical results
demonstrate the advantages of this approach and its effectiveness in terms
of energy consumption, CPU utilization, and time efficiency. Future direc-
tions suggest the inclusion of different resources, such as memory. Table 9
summarizes the mentioned schemes, including future work to enhance their
performance and utility.
Table 9: Energy Efficiency Schemes

Scheme | Objective | Key Findings | Future Work
KB5G [61] | Propose a framework for fully automated MANO of 5G+ communication systems; design a twin-delayed double-Q soft actor-critic algorithm to minimize energy consumption and the cost of VNF instantiation. | Supports continuous state and action spaces; stabilizes the learning procedure; improves time efficiency in 5G+. | Take into account resources other than CPU, such as memory.
8.1. Safeguarding 5G+ Networks: Security Measures & Weaknesses
5G+ networks are characterized by a highly distributed architecture,
which consists of multiple network elements such as the RAN, the CN, and
the transport network [8]. This architecture poses new security challenges,
as it increases the attack surface and creates new attack vectors for mali-
cious actors. 5G+ networks also have a greater number of connected devices
compared to previous generations of communication networks, which further
exacerbates the security and privacy risks. Additionally, the transmission of
data in 5G+ networks is characterized by elevated levels of speed, bandwidth,
and connectivity. This has dramatically increased the volume of data trans-
mitted over these networks. As such, it is imperative to implement robust
security and privacy measures to protect against the threat of cyberattacks
and promote trust in 5G+ networks.
5G+ networks integrate a plethora of security mechanisms to secure the
transmission of data, protect network infrastructure, and prevent unautho-
rized access to sensitive information [83]. These mechanisms include, but are
not limited to, cryptographic algorithms, firewalls, Virtual Private Networks
(VPNs), Intrusion Detection Systems (IDSs), NFV, and SDN. Cryptographic
algorithms, such as Advanced Encryption Standard, ensure the confidential-
ity and integrity of the data transmitted over the network. Firewalls act
as a barrier between the internal and external networks, providing a line of
defense against unauthorized access and malicious attacks. VPNs allow se-
cure communication over an insecure network, and IDSs detect and respond
to security threats in real-time. NFV and SDN technologies abstract the
underlying hardware from the network services, enabling the automation of
network management and reducing the attack surface.
Despite these robust security measures, 5G+ networks remain vulnerable
to a wide range of threats. Threats to 5G networks can be distinguished
based on the technological domains that they impact [64].
presence of rogue base stations that launch Man in the Middle (MitM)
attacks can compromise user information, break privacy, track users,
and cause Denial of Service (DoS).
5. SDN Threats: SDN separates control and user (data) planes, making
it a potential target for malicious actors. These attacks target the link
between the control and user planes and can take the form of distributed
DoS attacks or gaining control over network devices through Topology
Poisoning attacks.
Table 10: Security Threats within the ZSM Context [83, 10]

Domain | Threat | Description
Open API | Identity Attack | The attacker attempts to access a targeted API by using a list of previously compromised passwords, stolen credentials, or tokens.
Open API | DoS Attack | An attacker floods the API with a high volume of requests, rendering it unavailable.
Open API | Application and Data Attacks | They involve unauthorized access to data, alteration/deletion of data, insertion of malicious code, and disruption of scripts.
Open API | MitM Attack | An attacker intercepts the communication between the API client and server to steal confidential information.
Intents | Data Exposure | An unauthorized individual intercepts information related to the application's purpose (e.g., advertising content), exposing the system's goals to risks and triggering additional attacks.
Intents | Tampering | The attacker makes physical modifications to a connection point or interface.
Intents | MitM Attack | An attacker intercepts messages between two entities in order to remotely eavesdrop on or alter the traffic.
Automated Closed-Loop | Deception Attack | The deceiver convinces the target to believe a false version of the truth and manipulates the target's actions to benefit the deceiver.
Automated Closed-Loop | DoS Attack | A DoS attack overloads the network with a high volume of traffic, making the network unavailable to its users.
Automated Closed-Loop | Privilege Escalation | An intruder gains access to a target account, bypassing authorization and gaining unauthorized access to data.
SDN/NFV | Spoofing | An attacker sends false address resolution protocol messages over a local area network.
AI/ML | Adversarial Attack | A malicious actor tampers with the training data and/or inserts small perturbations into the test instances.
AI/ML | Model Extraction Attack | The attack attempts to steal the model's parameters to recreate a similar ML model.
AI/ML | Model Inversion Attack | The attack aims to recover the training data or the underlying information from the model's outputs.
the data processed by ML pipelines. This framework records and stores in-
puts and outputs of ML pipelines on a tamper-evident log, and uses smart
contracts to enforce data quality requirements and validate the data pro-
cessed. By providing transparency and verifiability in the data used by ML
pipelines, the blockchain technology helps detect and correct any data tam-
pering or manipulation, thus improving trust and reliability in ML results and
contributing to the overall security and privacy of 5G+ networks. Addition-
ally, Palma et al. enhanced the security and trustworthiness of 5G+ networks
by integrating Manufacturer Usage Description (MUD) and Trust and Repu-
tation Manager (TRM) into the INSPIRE-5GPlus framework [63]. MUD is
a standard that provides access control by specifying the type of access and
network functionalities available for different devices in the infrastructure. It
helps to configure monitoring tools and learn about the normal behavior of
devices, which enables the identification of abnormal events on the 5G infras-
tructure. TRM assesses trust in the infrastructure using multiple values and
enables MUD security requirements to be enforced in a trustworthy manner.
The integration of MUD and TRM into INSPIRE-5GPlus enhances security
and trust by enforcing security properties and continuously auditing the in-
frastructure and security metrics to compute trust and reputation values.
These values are used to enhance the trustworthiness of zero-touch decision-
making, such as the ones orchestrating E2E security in a closed-loop. Future
research aims to develop trust tools at each domain level to create a com-
prehensive trust framework. Another study by Niboucha et al. incorporated
a zero-touch security management solution tackling the problem of in-slice
DDoS attacks in mMTC network slices of 5G [64]. The proposed solution em-
ploys a closed-control loop that monitors and detects any abnormal behavior
of MTC devices, and in the event of an attack, it automatically disconnects
and blocks the compromised devices. This was achieved by following 3GPP
traffic models and training an ML model using gradient boosting to identify
normal and abnormal traffic patterns. The detection algorithm then calcu-
lates the detection rate for each device, and the decision engine takes the
necessary steps to mitigate the attack by severing the connection between
the devices and the network, thereby safeguarding against any potential re-
occurrence of similar attacks. Results show its effectiveness in detecting and
mitigating DDoS attacks efficiently. Future lines of research will focus on
utilizing online learning techniques in the event of encountering new forms
of attacks.
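As a rough illustration of this closed-loop detection logic, and not the exact 3GPP traffic models or features of [64], the sketch below trains a gradient-boosting classifier on labeled traffic samples, computes a per-device detection rate, and flags devices above an assumed blocking threshold.

```python
# Illustrative sketch only: gradient boosting labels traffic as normal/abnormal,
# a per-device detection rate is computed, and devices above an assumed
# threshold are blocked. Features, threshold, and data are hypothetical.
import numpy as np
import pandas as pd
from sklearn.ensemble import GradientBoostingClassifier

rng = np.random.default_rng(1)
n = 2000
df = pd.DataFrame({
    "device_id": rng.integers(0, 50, n),
    "pkt_rate": rng.exponential(10, n),        # hypothetical MTC traffic features
    "payload_size": rng.normal(200, 40, n),
    "label": rng.integers(0, 2, n),            # 1 = abnormal (synthetic stand-in labels)
})

clf = GradientBoostingClassifier().fit(df[["pkt_rate", "payload_size"]], df["label"])
df["pred"] = clf.predict(df[["pkt_rate", "payload_size"]])

# Detection rate per device = share of its recent traffic flagged as abnormal
detection_rate = df.groupby("device_id")["pred"].mean()
BLOCK_THRESHOLD = 0.8                           # assumed policy threshold
compromised = detection_rate[detection_rate > BLOCK_THRESHOLD].index
print("Devices to disconnect and block:", list(compromised))
```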
With regard to the ZSM paradigm, the use of ML models in the architec-
ture raises privacy and resource limitations concerns. In turn, Jayasinghe et
al. presented a multi-stage federated learning-based model that incorporates
ZSM architecture for network automation [65]. The proposed model is a hi-
erarchical anomaly detection mechanism consisting of two stages of network
traffic analysis, each with a federated learning-based detector to remove
identified anomalies. The complexity and size of the detector’s database
vary depending on the stage. The authors simulate the proposed system
using the UNSW-NB 15 network dataset and demonstrate its accuracy by
varying the anomaly percentage in both stages. The results show that the
model reaches a minimum accuracy of 93.6%. Future work aims to increase
the accuracy of the model and apply it in a security analytics framework in
the ZSM security architecture. To summarize, the key findings and future
directions from the schemes analyzed in this section are presented in Table
11.
Table 11: Network Trust Management Schemes

Scheme | Objective | Key Findings | Future Work
MUD and TRM Integration [63] | Integrate MUD and TRM into INSPIRE-5GPlus. | The proposed scheme enforces security properties. | Incorporate trust tools at each domain level to create a comprehensive trust framework.
Zero-touch Security Management Solution for In-slice DDoS Attacks [64] | Monitor and detect abnormal mMTC device behavior; automatically disconnect and block compromised devices during an attack. | Detects and mitigates DDoS attacks efficiently. | Apply online learning techniques to tackle new forms of attacks.
9. Network Automation Solutions
The evolution of networking technology has triggered a fundamental shift
in network management, owing to the growing scale and complexity of net-
works, making traditional manual configuration and management techniques
inefficient and error-prone. In this regard, network automation solutions,
such as the ZSM framework, offer a revolutionary approach to network man-
agement, leveraging cutting-edge technologies such as ML, AI, and SDN to
streamline operations and enhance network performance [7]. These method-
ologies play a pivotal role in enabling automated self-management function-
alities of ZSM, thus leading to an enhancement in service delivery and a
reduction in operating expenses.
At the heart of network automation lies the use of software to automate
tedious and time-consuming tasks like device configuration, policy enforce-
ment, and network monitoring. By exploiting programmable interfaces and
open standards, network automation solutions facilitate seamless interoper-
ability and integration across multi-vendor environments, providing a unified
network view, and enabling fast issue resolution and troubleshooting.
Furthermore, network automation solutions leverage the power of AI and
ML to facilitate self-healing and self-optimizing networks [11]. By monitor-
ing network performance in real-time and detecting anomalies, automation
solutions can trigger automated workflows that mitigate potential issues and
restore normal operation. This capability substantially reduces downtime
and boosts network resiliency, while the identification of under/over-utilized
areas provides efficient resource optimization. Additionally, ML algorithms
enhance security by flagging and mitigating potential threats through traffic
patterns and user behavior analysis, offering a proactive response to security
threats such as malware or cyberattacks.
Nevertheless, the application of AI techniques in automation solutions
presents challenges and risks. Although AI and ML techniques enable cog-
nitive processing in the ZSM system, resulting in complete automation, the
performance of such an implementation is not entirely satisfactory [10]. Net-
work operators expect superior service availability and reliability to minimize
network outages and SLA violations, which could result in significant finan-
cial losses.
To address these challenges, one can leverage the power of DTs and Au-
toML. DTs facilitate the creation of virtual replicas of the physical network
infrastructure, enabling operators to test and validate network configuration
changes in a simulated environment before deploying them to the live network
[84]. This reduces the risk of misconfiguration, human errors, and downtime.
Moreover, AutoML algorithms enable the automatic processing of data and
the discovery of optimal configurations and models, reducing the need for
manual intervention [14]. This section highlights significant ML challenges
and proposes solutions to address them.
extraction involves selecting and transforming the relevant features of the
data. These tasks require a significant amount of time and expertise, making
them a bottleneck for efficient ML.
DTs can generate realistic, simulated network data, enabling the auto-
mated processing and cleaning of data through AutoML. By automating
these tasks, organizations can reduce the time and resources required for
data preparation, freeing them up for more complex tasks.
9.1.5. Dynamic Nature of Wireless Networks
The uncertainty and dynamic nature of wireless networks represent signif-
icant challenges in the context of network automation. Wireless networks are
subject to a range of factors that can impact network performance, includ-
ing interference, noise, and mobility of devices. The ever-changing nature of
these networks makes it difficult to build accurate and reliable models that
can effectively control and predict network behavior [10, 7]. Specifically,
unpredictable changes in data streams can impair the performance of ML
models. This challenge can be particularly problematic in today’s fast-paced
environment, where rapid response to network issues is critical.
To overcome these challenges, DTs can be employed to simulate changes
in the wireless network and generate new training data to retrain ML models.
With AutoML, a new model can be trained and selected automatically to
replace the outdated one, without the need for manual intervention. This ap-
proach ensures that the ML model remains effective and up-to-date, despite
the non-stationary environment.
able to simulate a variety of scenarios and test different conditions to analyze
performance and behavior. The idea gained even more traction when it
proved vital in resolving technical problems during the infamous Apollo 13
mission. Fast-forward to the early 2000s, Michael Grieves introduced the
concept of DTs for the manufacturing industry, creating virtual replicas of
factories [91, 92]. These DTs serve as an effective tool for monitoring
processes, predicting failures, and increasing productivity, forever changing
the landscape of industrial innovation.
The adoption of DTs represents a remarkable leap forward for industries
seeking to thrive in the digital era. Real-time monitoring, control, and data
acquisition are just some of the advantages that DTs bring to the table
[86, 93]. These sophisticated tools enable remote access, ensuring business
continuity and increasing overall efficiency. With DTs, decision-making is
based on highly-informed predictions that consider both the present and the
future, and the risks associated with each course of action can be assessed
and mitigated in real-time. By testing and optimizing solutions in a virtual
environment, DTs increase overall system efficiency and minimize potential
disruptions. A DT allows virtual testing of various solutions to perform what-
if analysis to evaluate these solutions without affecting the physical system
[94]. In addition, all data is easily accessible in one platform, allowing for
faster and more efficient business decisions by data analytics tools.
A DT network is composed of three pillars, namely physical, digital/vir-
tual, and connection pillars, as illustrated in Figure 3 [89]. The physical pillar
represents the physical asset, the virtual/digital pillar represents the DT, and
the connection pillar allows for the exchange of data and control commands
among them. This modularity enables the system to evolve as the
technology of each component evolves. A DT can be highly modular, which
allows for the rapid reproduction of processes and knowledge transfer [89, 94].
In addition, the modularity of a DT allows for creating hybrid simulation and
prototyping systems, which can accelerate the design process.
Figure 3: Sample Architecture of a DT network for a MEC Network
such as ML.
Additionally, the development and deployment of DTs require the use
of a range of other technologies, including sensors, edge computing, and
AI [84]. Sensors are used to capture real-time data from physical systems.
These sensors collect a wide range of data, such as temperature, pressure,
vibration, and more, enabling accurate and comprehensive modeling of the
physical system [86]. Edge computing allows for data processing and analysis
to take place closer to the source of the data. Additionally, ML algorithms
enable the DT network to learn and adapt to new operating conditions and
provide automated and proactive responses to system events.
DTs can be used in various applications, including manufacturing, trans-
portation, and healthcare [86]. For instance, in manufacturing, DT networks
can be used to simulate the behavior of a production line, predict machine
failure, and optimize production efficiency. In transportation, DT networks
can be used to simulate traffic patterns, optimize traffic flow, and predict road
accidents. In healthcare, DT networks can be used to simulate the behav-
ior of the human body, predict disease progression, and optimize treatment
plans.
Nevertheless, the high demand for throughput, reliability, resilience, and
low latency required by DT technology goes beyond what is currently of-
fered by 5G [89]. Although DT technology already exists in some industrial
applications supported by 5G or even 4G, it has not been widely adopted
in other sectors, and has not reached its full potential. Therefore, 6G can
be considered an enabler for the massive adoption of DTs, particularly in
high-connectivity-demanding and rapidly emerging applications of aerospace,
Industry 4.0, and healthcare.
tuning and model selection [97]. With AutoML, these processes can be per-
formed in a fraction of the time it would take to do them manually, freeing
up valuable time and resources for more complex and strategic work.
AutoML is not just a productivity tool, but a solution that democratizes
access to the power of ML. It makes the process of building and deploying
ML models accessible to a broader audience, without requiring extensive ex-
pertise in the field. This capability has the potential to revolutionize the way
businesses operate, giving them the ability to leverage data-driven insights
for decision-making and product development.
At its core, AutoML is a comprehensive and integrated solution that
covers the entire ML pipeline, from data preprocessing to model updating, as
illustrated in Figure 4 [14]. This pipeline operates on search and optimization
algorithms to identify the optimal model and corresponding hyperparameters
for a given problem.
(i) Data Transformation: Data transformation involves converting between
numerical and categorical features. In real-world applications, data is
often represented as strings, requiring encoding to make it machine-
readable. Techniques such as label encoding and one-hot encoding
are used to assign values or columns to categorical features, but these
encodings carry no meaningful information about the target. Target encoding,
on the other hand, replaces categorical values with informative statistics,
such as the median or mean of the target variable for that category,
creating better features for ML models [98] (a brief sketch follows this list).
number of samples in majority classes (under-sampling) or increasing
the number of samples in minority classes (over-sampling) are essential
to address this class imbalance issue [98, 102].
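The sketch below illustrates these two preprocessing steps, target encoding of a categorical feature and simple random over-sampling of the minority class, with pandas; the column names and data are hypothetical.

```python
# Minimal sketch of two automated preprocessing steps: target encoding and
# random over-sampling of the minority class. Columns/data are illustrative.
import pandas as pd

df = pd.DataFrame({
    "app_type": ["video", "mail", "video", "voice", "mail", "mail"],
    "throughput": [45.0, 2.1, 50.3, 1.2, 2.4, 1.9],    # target variable (hypothetical)
    "label": [1, 0, 1, 0, 0, 0],                        # imbalanced class used below
})

# Target encoding: replace each category with the mean target value for that category
encoding = df.groupby("app_type")["throughput"].mean()
df["app_type_encoded"] = df["app_type"].map(encoding)

# Random over-sampling: replicate minority-class rows until classes are balanced
counts = df["label"].value_counts()
minority = counts.idxmin()
extra = df[df["label"] == minority].sample(counts.max() - counts.min(),
                                           replace=True, random_state=0)
balanced = pd.concat([df, extra], ignore_index=True)
print(balanced["label"].value_counts())
```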
Feature extraction (e.g., PCA) employs mapping functions to reduce di-
mensionality [97]. It alters the original features to extract more informative
features that can replace the original features. Although not mandatory in
the feature engineering process, feature extraction can be useful when the
produced feature set is high dimensional or underperforming.
As such, automated feature engineering can be seen as a dynamic and
synergistic combination of these three processes to optimize the feature set
for ML models.
Table 12: An Overview of Common Optimization Techniques [105, 98, 107, 108, 109, 106]
the model but are not learned from the data [14, 105]. AutoML tools use a
range of search and optimization techniques to find the most effective
combination of hyperparameters for the model [110].
With the model architecture and hyperparameters optimized, AutoML
tools initiate the training process, whereby the model is trained on the pro-
vided dataset. This involves utilizing the optimization algorithm to minimize
the objective function, which typically measures the difference between the
predicted output and the true output. Advanced optimization techniques,
such as stochastic gradient descent and Adam, are utilized to speed up the
training process and improve the accuracy of the model [14, 97]. Addition-
ally, regularization techniques such as L1 and L2 regularization are used to
prevent overfitting, and ensemble methods such as bagging and boosting can
be employed to improve model robustness and performance [97].
The resulting trained model is then evaluated on a validation dataset
to estimate its generalization performance. Evaluation metrics, such as ac-
curacy, precision, recall, and F1 score, are computed to compare the per-
formance of different models and hyperparameters and identify the best-
performing model [97]. The best model is then selected based on its ability
to generalize well to unseen data and meet the desired performance criteria.
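The following sketch illustrates this search-train-evaluate loop with a randomized hyperparameter search in scikit-learn; the model family, parameter ranges, and data are assumptions chosen only for illustration.

```python
# Minimal sketch of a search-and-evaluate loop: randomized hyperparameter
# search over a candidate model, then validation on held-out data.
import numpy as np
from sklearn.datasets import make_regression
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.model_selection import RandomizedSearchCV, train_test_split
from sklearn.metrics import mean_absolute_error

X, y = make_regression(n_samples=500, n_features=8, noise=0.2, random_state=0)
X_tr, X_val, y_tr, y_val = train_test_split(X, y, test_size=0.2, random_state=0)

param_space = {                      # illustrative search space
    "n_estimators": [50, 100, 200],
    "learning_rate": [0.01, 0.05, 0.1],
    "max_depth": [2, 3, 4],
}
search = RandomizedSearchCV(
    GradientBoostingRegressor(random_state=0),
    param_space,
    n_iter=10,                       # number of hyperparameter combinations to try
    scoring="neg_mean_absolute_error",
    cv=3,
    random_state=0,
)
search.fit(X_tr, y_tr)

best_model = search.best_estimator_
print("Best hyperparameters:", search.best_params_)
print("Validation MAE:", mean_absolute_error(y_val, best_model.predict(X_val)))
```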
Distribution-based methods compare the empirical distribution of a feature or target variable over time with a
reference distribution, such as the distribution at the start of the data stream
[113]. A significant change in the distribution suggests the presence of model
drift.
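As an example of a distribution-based check, the sketch below compares a reference window of a feature against the current window with a two-sample Kolmogorov-Smirnov test; the window sizes and significance level are assumptions.

```python
# Minimal sketch of a distribution-based drift check using a two-sample
# Kolmogorov-Smirnov test. Window sizes and alpha are illustrative assumptions.
import numpy as np
from scipy.stats import ks_2samp

rng = np.random.default_rng(0)
reference = rng.normal(50, 10, 1000)   # feature values at the start of the stream (e.g., Mbps)
current = rng.normal(35, 10, 1000)     # recent window after a shift in user behavior

stat, p_value = ks_2samp(reference, current)
ALPHA = 0.01
if p_value < ALPHA:
    print(f"Drift suspected (KS statistic={stat:.3f}, p={p_value:.2e})")
else:
    print("No significant distribution change detected")
```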
Performance-based methods, on the other hand, assume that model drift
leads to a decline in model performance. Statistical tests, window-based
approaches, and ensemble-based approaches are common techniques for de-
tecting changes in model performance [114, 14]. Statistical tests (e.g., Fried-
man test) compare the current model’s performance with a reference dataset.
Window-based approaches track model performance over a rolling window
of data and compare it to the previous window. Ensemble-based approaches
compare the current model’s performance with an ensemble of previous mod-
els trained on different time intervals.
Once model drift is detected, the AutoML pipeline can adapt the model
to the new data distribution. This is done through a process called model
adaptation, which involves updating the model to reflect the changes in the
data [115, 116]. There are several techniques for model adaptation, including:
• Retraining the model on the new data to ensure that it remains accurate
and effective [14]
dataset, AutoML framework, 5G system architecture, and present insight-
ful results and analysis. These include a comparison with traditional ML
approaches, a complexity-accuracy trade-off analysis, and periodic AutoML
model drift monitoring. This study demonstrates the potential of AutoML
in optimizing application throughput prediction for dynamic networks.
With the deployment of new technologies such as network slicing and
edge computing in 5G+ networks, predicting application throughput be-
comes even more critical. Network slicing allows operators to create cus-
tomized virtual networks to support specific services and applications, and
predicting application throughput is essential to ensure that the resources
allocated to each slice are sufficient to meet the performance requirements
of the applications running on that slice. Edge computing reduces latency
and improves application performance, and predicting application through-
put in such an environment can help operators decide where to deploy their
computing resources to achieve optimal performance.
Real-time monitoring and adjustments to network resources are necessary
to meet a diverse set of KPIs. ZSM is one promising solution for automat-
ing the process of predicting application throughput and scaling network
resources accordingly to ensure a seamless user experience. ZSM proactively
detects and resolves potential network issues before they impact service qual-
ity by predicting and optimizing network performance metrics such as down-
load rate at the application layer. By leveraging AutoML algorithms to gen-
erate up-to-date predictive models, ZSM can autonomously adapt to changes
in traffic patterns and automate the optimization and management of net-
work services, resulting in improved service quality and increased operational
efficiency. This is crucial in a dynamic and fast-changing environment like
5G+ networks.
In this study, we utilize an open-source production dataset to predict
application throughput autonomously using AutoML. To demonstrate the
effectiveness of AutoML in generating up-to-date predictive models, we will
simulate a real-world scenario where the application throughput experiences
a sudden and significant change due to traffic congestion. This scenario is
inspired by the increasing usage of clouds to upload and access one’s files.
Model drift is not restricted to changes in user behavior. It can also occur
due to changes in the application infrastructure. For instance, deploying
additional servers to handle increased traffic will result in an increase in
application throughput, which in turn can lead to a decrease in the model’s
predictive accuracy. In order to mitigate the effects of model drift, the model
is monitored and adjusted based on the new data that reflects the current
state of the application infrastructure.
Overall, our use of AutoML and the open-source production dataset will
allow us to accurately predict application throughput and respond to changes
in network conditions in real-time. By continuously monitoring the network
and training the model on the latest data, we can ensure that our predictive
model stays up-to-date to accurately reflect the current state of the network,
even in the face of rapidly changing conditions.
Table 13: Production Dataset Metrics [122, 123]

Feature | Description
Longitude, Latitude | GPS coordinates of the device
State | State of the download process ('I' for idle & 'D' for downloading)
CellID, CellIDhex, CellIDraw | ID of the serving cell for the mobile, along with its hexadecimal and raw formats
Signal-to-Noise Ratio | Difference between the received wireless signal and the noise floor (dB)
Received Signal Strength Indicator (RSSI) | Measurement of the power present in a received radio signal (dBm)
Reference Signal Received Power (RSRP) | Linear average of power for resource elements carrying cell-specific reference signals (dBm)
NRxRSRP, NRxRSRQ | RSRP and RSRQ values for the neighboring cell
• Comprehensive Metrics: The dataset includes diverse cellular KPIs
obtained from G-NetTrack Pro. These metrics cover various network
aspects, enabling a comprehensive analysis of network performance.
Figure 5(a): Distribution of Application Throughput on Saturday, 2019-12-14
Figure 6(a): Distribution of Application Throughput on Thursday, 2020-01-16
values for downloading a file in the early morning hours (specifically, at 7:00
a.m. and 8:00 a.m.) are concentrated within the first bin of the histogram.
This observation suggests that the distribution of throughput is heavily
skewed towards low values, characterized by a download rate of 50 Mbps
or less, which could be a result of lower network traffic or
reduced user activity during this particular time window. A similar pattern
is observed in Figure 6(a) for the early morning hours, 7:00 a.m. and 9:00
a.m., of January 16, 2020.
At present, we have yet to detect any data drift, which indicates a consis-
tent pattern of application usage given the same network operator. Nonethe-
less, we establish a baseline by building a model and assessing its performance
on a testing set. Even in the absence of detected model drift, continuous mon-
itoring of the model is vital to detect any potential drops in performance due
to changes in user behavior and adapt accordingly.
Consider, for instance, a renowned music streaming platform that allows
users to download songs and albums for offline listening. As the platform ex-
pands to incorporate fresh artists and genres, users may alter their download
behavior, transitioning to larger file downloads, such as complete albums or
extended playlists. This trend becomes more pronounced during the release
of a highly anticipated album. Such a shift in user behavior may cause a
corresponding impact on the application’s throughput, resulting in longer
wait times and slower download speeds for users.
This change may impair the model’s ability to accurately predict the ap-
plication throughput and adjust the download speed to user behavior (e.g.,
by allocating network resources accordingly). If the model is not updated
to reflect the new user behavior, it may overestimate or underestimate the
required throughput, leading to sub-optimal download speeds and user ex-
perience.
To evaluate the model’s ability to adapt to changes in the underlying
data distribution, we will simulate the aforementioned scenario by introduc-
ing model drift, specifically data drift. Through periodic evaluation on the
incoming data, we will track the model’s performance and determine whether
it can maintain high performance levels in the presence of data drift.
Figure 7: Respective AutoML Framework for Case Study
a time. The decoder leverages the encoder’s context vector and the previ-
ously generated output elements as context to predict the subsequent output
element in the sequence. The Seq2Seq models are equipped with LSTM
units in both the encoder and decoder components to further enhance the
model’s performance. A sample encoder-decoder architecture is illustrated in
Figure 8. The encoder component incorporates two LSTM layers, with the
second output serving as the context vector. This context vector, along with
the current time-step, is then passed to the decoder. The decoder, likewise,
comprises two LSTM layers. The final output from the decoder represents
the subsequent time-step, which is then fed back into the decoder to predict
the succeeding time-step. This iterative process continues until the entire
sequence is predicted. Both the architecture selection and hyperparameter
tuning phases are conducted concurrently via Bayesian optimization. The
model architecture is optimized in terms of the number of hidden dense and
recurrent layers in the encoder and decoder, as well as the number of dense
and LSTM units. The hyperparameters that are tuned are the learning rate
of the Adam optimizer and the dropout rate.
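The following is a simplified Keras sketch of such an encoder-decoder, with two LSTM layers in the encoder and two in the decoder as in Figure 8. For brevity it repeats the context vector across the forecast horizon instead of feeding each prediction back step by step, and the layer widths, look-back, horizon, and dropout value are assumptions rather than the tuned values found by the Bayesian optimization.

```python
# Simplified encoder-decoder sketch for throughput forecasting, loosely
# following Figure 8: two encoder LSTMs produce a context vector, two decoder
# LSTMs expand it over the forecast horizon. All sizes are assumed values.
import tensorflow as tf

LOOKBACK, HORIZON, N_FEATURES = 120, 60, 1   # e.g., 2 min of 1 s samples -> 60 s forecast (assumed)

inputs = tf.keras.layers.Input(shape=(LOOKBACK, N_FEATURES))
x = tf.keras.layers.LSTM(64, return_sequences=True)(inputs)       # encoder LSTM 1
context = tf.keras.layers.LSTM(64)(x)                             # encoder LSTM 2 -> context vector
x = tf.keras.layers.RepeatVector(HORIZON)(context)                # hand context to the decoder
x = tf.keras.layers.LSTM(64, return_sequences=True, dropout=0.1)(x)   # decoder LSTM 1
x = tf.keras.layers.LSTM(64, return_sequences=True, dropout=0.1)(x)   # decoder LSTM 2
outputs = tf.keras.layers.TimeDistributed(tf.keras.layers.Dense(1))(x)

model = tf.keras.Model(inputs, outputs)
model.compile(optimizer=tf.keras.optimizers.Adam(learning_rate=1e-3), loss="mae")
model.summary()
```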
The final step in the pipeline involves monitoring the model’s performance
over time to detect any potential data drift. We utilize a performance-based
approach, specifically a window-based method, where we check the model’s
performance at regular intervals (e.g., every ten minutes with a window size
of 600 seconds). This allows us to detect any significant changes in the
data distribution, which can cause the model’s performance to deteriorate.
Data drift is detected if the performance falls below a dynamic threshold,
which is a certain percentage of the current score. The choice of window size
and threshold percentage involves a trade-off between accuracy and com-
putational complexity. The more strict the values, the more accurate the
model is, but the more computationally complex it becomes. In case of data
drift, we update the model’s weights incrementally to adapt to the new data
distribution.
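A minimal sketch of this window-based monitoring loop is shown below; the window size, threshold margin, and the model's predict/partial-fit interface are assumptions used only to illustrate the mechanism.

```python
# Minimal sketch of window-based drift monitoring: every window, compare the
# model's MAE to a dynamic threshold (a margin above the current baseline);
# on drift, update the model incrementally and re-baseline the threshold.
import numpy as np

WINDOW_SECONDS = 600      # check every ten minutes (assumed)
MARGIN = 0.20             # drift if MAE rises 20% above the baseline (assumed)

def mae(y_true: np.ndarray, y_pred: np.ndarray) -> float:
    return float(np.mean(np.abs(y_true - y_pred)))

def monitor(model, stream, baseline_mae: float):
    """`stream` yields (X_window, y_window) batches covering WINDOW_SECONDS each;
    `model` is assumed to expose predict() and a partial_fit()-style update."""
    threshold = baseline_mae * (1 + MARGIN)
    for X_win, y_win in stream:
        score = mae(y_win, model.predict(X_win))
        if score > threshold:                     # data drift detected
            model.partial_fit(X_win, y_win)       # incremental weight update (assumed API)
            score = mae(y_win, model.predict(X_win))
            threshold = score * (1 + MARGIN)      # re-baseline the dynamic threshold
```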
Figure 8: Sample Encoder-Decoder Architecture
Figure 9: 5G System Architecture
appropriate network slice for the UE and manage the resources within the
network slice, including resource allocation and orchestration.
Finally, the AMF and UPF coordinate resource allocation to guarantee
that each UE receives the requested QoS for the selected network slice. In
this way, the AES entities and the ZSM framework work together to collect
data, generate predictions, and optimize resource allocation to ensure an
optimal user experience for this specific application on the network.
then generate the output sequence one step at a time, taking the previous
prediction as input for the next step.
Default parameters are used for both traditional ML models, and we set
the forecast horizon to 60 seconds, a parameter that can be adjusted based
on application requirements. Our results, presented in Table 14, demonstrate
that AutoML with Seq2Seq outperforms both basic LSTM and encoder-
decoder models in terms of MAE and MAPE for all four network traffic
datasets. To illustrate, in the case of file downloading in a 5G network, upgrading
from the LSTM model to the encoder-decoder model reduces the loss
from 6.52% to 4.76% for MAPE and from around 0.17 to 0.11 for MAE.
This trend was observed consistently across all the other datasets. This is
because AutoML models, such as those based on NAS, can lead to better
accuracy and lower error rates compared to manually designed models, due
to their ability to automatically search and optimize the model architecture
and hyperparameters for a given task.
Furthermore, our analysis revealed that the basic encoder-decoder model
performed better than the LSTM model. This is due to the fact that encoder-
decoder models are specifically designed for sequence-to-sequence learning
tasks, making them more suitable for the application throughput prediction
task in our study. The encoder-decoder model's superiority stems from its
ability to handle both input and output sequences, allowing it to capture
more information and produce better predictions.
Overall, our findings emphasize the importance of selecting the appro-
priate model architecture for application throughput prediction and demon-
strate the superiority of AutoML models in selecting the optimal model.
Using interpretable performance metrics such as MAE and MAPE enables
network operators to easily validate the models and make informed decisions,
enhancing overall network performance.
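For reference, the two metrics are straightforward to compute; the sketch below uses hypothetical throughput values, not the results of Table 14.

```python
# Small sketch of the two interpretable metrics used above (illustrative values).
import numpy as np

def mae(y_true, y_pred):
    return float(np.mean(np.abs(np.asarray(y_true) - np.asarray(y_pred))))

def mape(y_true, y_pred):
    y_true, y_pred = np.asarray(y_true, dtype=float), np.asarray(y_pred, dtype=float)
    return float(np.mean(np.abs((y_true - y_pred) / y_true)) * 100)

actual = [48.2, 51.0, 46.7, 50.4]       # observed throughput (Mbps), hypothetical
forecast = [47.1, 52.3, 45.9, 49.8]     # model predictions, hypothetical
print(f"MAE = {mae(actual, forecast):.2f} Mbps, MAPE = {mape(actual, forecast):.2f}%")
```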
Table 14: MAPE & MAE: Comparison of Three Models on 4G and 5G Datasets

Model | Metric | 4G Video Stream | 4G File Download | 5G Video Stream | 5G File Download
We vary the length of the past sequence from 2 to 5 minutes and observe the corresponding changes
in the prediction accuracy. Our results, as illustrated in Figure 10, show
that utilizing fewer past timesteps leads to a higher MAE, indicating worse
predictive performance. Conversely, employing more past timesteps results in
a lower MAE but is computationally complex in terms of time and resources.
As a longer sequence is utilized, the computational time required to train
the model also increases, resulting in higher resource utilization. Thus, there
exists a trade-off between the prediction accuracy and complexity, and a
balance must be struck to optimize the model’s performance.
Furthermore, we fix the look-back to 5 minutes and explore the impact of
utilizing different future sequences of varying lengths, namely 5, 7, 10, 15, and
20 minutes, on the prediction accuracy. Our results, as illustrated in Figure
11, show that predicting longer sequences leads to a higher MAE, indicating
that more past information is needed to provide better future knowledge.
However, increasing the past sequences results in a higher computational
complexity. This finding confirms the trade-off between the prediction accu-
racy and complexity that we previously observed.
Overall, the AutoML pipeline can be effectively utilized to predict the
application throughput in the 5G network. However, the prediction accuracy
is highly dependent on the number of past and future timesteps utilized
and the associated computational complexity. Careful consideration must be
given while selecting the appropriate sequence lengths to achieve the desired
prediction accuracy while optimizing the computational resources utilized.
Figure 10: MAE among Varying Past Sequences
10.6.3. Periodic AutoML Model Drift Monitoring
This section covers the final step of the AutoML pipeline, which involves
model updating. We will focus again on the 5G dataset for file downloading.
In this step, we set the forecast horizon to 5 minutes, which resulted in an MAE
of 0.0213. To detect model drift, we monitor the ML model periodically every
10 minutes. If the MAE exceeds 0.02556, which corresponds to 20% above the
baseline MAE, model drift is detected, and the model weights are adjusted.
It is important to note that the threshold percentage can be adjusted
accordingly and is just a parameter. Figure 12 illustrates the timeline of
the model monitoring process. For the first 10 minutes (0 ≤ t < 10mins),
MAE falls below the threshold. At 10 minutes, the model is checked, and no
model drift is detected. Between 10 and 14 minutes, MAE still falls below
the threshold. At this point in the process, a segment of the data is randomly
sampled and intentionally manipulated to mimic the occurrence of data drift.
Figure 12: Periodic AutoML Monitoring for Drift Detection and Adaptation
At 14 minutes, the MAE surpasses the threshold, but the model isn’t
checked yet. At t = 20mins, the model is checked, and model drift is de-
tected. The weights are updated accordingly, and the MAE falls back to
0.0236. The threshold is also updated to 0.02832, which is 20% above the new
MAE, to account for the new data distribution. After minute 20, the MAE
does not exceed the threshold.
It is essential to note that model drift occurred before it was detected, at
t = 14mins, due to the periodic nature of monitoring. Decreasing the mon-
itoring period would have led to earlier detection. However, decreasing the
period means checking more often, which may be computationally exhaus-
tive. Therefore, there is a trade-off between the periodic interval and model
accuracy.
Ultimately, the model updating step of the AutoML pipeline plays a
crucial role in ensuring the model’s accuracy over time. By monitoring the
model for drift and updating its weights accordingly, we can ensure that the
model remains relevant in the face of changing data distributions. However,
determining the appropriate monitoring interval is essential to balance the
trade-off between model accuracy and computational resources.
Table 15: Challenges & Future Directions

Challenge | Description | Future Directions
Scalability | Large datasets and the need for extensive model training pose scalability challenges for AutoML; data processing and model generation times impact network performance, causing latency issues for users. | Employ parallel and distributed algorithms; efficiently sample and partition data; dynamically adapt hyperparameters.
especially important in services like remote surgery, where network manage-
ment decisions can have a significant impact on human lives [8]. XAI allows
networking experts to understand the input that drove the decisions made by
ML models and approve their actions following a human-in-the-loop model.
However, making XAI a reality in the ZSM paradigm requires an intel-
ligible ML model in addition to specific metrics to measure the level of AI
explainability [8]. Techniques to generate an intelligible ML model include
using inherently interpretable ML models (e.g., random forests) or statis-
tical procedures to describe the features on which a prediction was based.
As for the metrics, there are human-grounded evaluations and functionality-
grounded evaluations [128]. The former assesses the qualitative aspects of
the resulting explanations, such as their ability to assist humans in complet-
ing tasks and the impact of such decisions on the system. The latter relies on
formal definitions and quantitative methods to verify data-driven decisions,
such as service migration.
ZSM frameworks must continuously perform real-time data analysis, network resource optimization, and coordination of
various network functions. ML algorithms, in particular, demand a high
level of computational resources to operate efficiently, which can conflict
with the needs of ZSM networks, where computation efficiency is as essential
as communication performance. In NGNs, the high latency associated with
complex operations is incompatible with time-sensitive services, making ML
algorithm optimization a key factor. Accordingly, it is essential to develop
optimization techniques to minimize the complexity of these models without
jeopardizing accuracy. To this end, implementing hardware-based strate-
gies, such as FPGA-based acceleration and GPU processing, can reduce the
computational complexity of ZSM [3].
11.2.1. Interpretability
AutoML solutions are often seen as black boxes, which makes it difficult
for users and experts to fully understand how they work and the rationale
behind their solutions. However, interpretability is essential for building trust
and ensuring ethical considerations, especially in highly regulated domains,
such as the networking domain.
The lack of interpretability in AutoML models can lead to difficulties
in deploying and using these models in ZSM services. For example, it may
be difficult to diagnose and correct biases in the model, or to identify the
root cause of unexpected behaviors. Additionally, the lack of interpretability
can make it difficult to validate the model’s accuracy, which is critical for
ensuring the reliability and performance of the network. Therefore, the de-
velopment of transparent AutoML systems with mechanisms for explaining
and understanding their decisions is necessary. Unfortunately, many cur-
rent AutoML approaches prioritize accuracy over interpretability, resulting
in complex models that are difficult to comprehend.
To address this challenge, interpretable models leveraging XAI paradigms,
such as Shapley additive explanations, local interpretable model-agnostic ex-
planations, RuleFit, and partial dependence plots, can be used to increase
transparency and credibility [132]. Additionally, constraints such as sparsity
(i.e., low number of features), monotonicity, and causality can improve in-
terpretability [133]. Monotonicity guarantees that the relationship between
an input feature and the target outcome always goes in the same direc-
tion, aiding in the understanding of feature-target relationships. Causality
constraints ensure that only causal relationships are identified, promoting
effective interactions between humans and ML systems. Incorporating these
approaches can greatly improve AutoML’s accessibility and the ability of net-
work operators and other stakeholders to understand and trust the decisions
made by AutoML.
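As a brief illustration of these XAI techniques, the snippet below computes Shapley additive explanations with the shap library and a partial dependence plot with scikit-learn for a synthetic regression task standing in for a throughput model; the data, features, and regressor are placeholders, not the models discussed above.

import numpy as np
import shap                                            # Shapley additive explanations
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.inspection import PartialDependenceDisplay

# Synthetic feature matrix standing in for per-flow network KPIs.
rng = np.random.default_rng(0)
X = rng.normal(size=(500, 4))
y = 2.0 * X[:, 0] - 1.0 * X[:, 2] + 0.1 * rng.normal(size=500)

model = GradientBoostingRegressor().fit(X, y)

# Per-prediction feature attributions (Shapley additive explanations).
explainer = shap.TreeExplainer(model)
shap_values = explainer.shap_values(X[:100])
print("mean |SHAP| per feature:", np.abs(shap_values).mean(axis=0))

# Partial dependence: the average effect of a feature on the prediction.
PartialDependenceDisplay.from_estimator(model, X, features=[0, 2])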
11.2.2. Scalability
The growing size of datasets, coupled with the need for a large number of model training runs to determine the optimal final learner, presents significant scalability challenges for AutoML. This can be particularly challenging in ZSM services, where the large volume of network-generated data
requires fast processing to meet network demands. As model complexity in-
creases, so do computational requirements, making it challenging to deploy
models in resource-constrained environments. Additionally, network perfor-
mance can be impacted due to the time required to process data and generate
models, leading to latency and responsiveness issues that affect the user ex-
perience.
To address this issue, future research can focus on developing parallel
and distributed AutoML algorithms that can harness modern hardware such
as graphics processing units (GPUs). Additionally, techniques that can more efficiently sample the data, leverage data partitioning, or dynamically adapt
the algorithm’s hyperparameters can significantly reduce the computational
overhead. Such strategies will ensure that AutoML remains a powerful tool
for ML, irrespective of the dataset’s size.
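A rough sketch of how these ideas combine in a scikit-learn workflow is given below: hyperparameters are tuned in parallel on a random subsample of the data, and only the best configuration is retrained on the full dataset. The dataset, model, and search space are illustrative assumptions.

import numpy as np
from scipy.stats import randint
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import RandomizedSearchCV

# Synthetic stand-in for a large network dataset.
rng = np.random.default_rng(0)
X = rng.normal(size=(100_000, 20))
y = X[:, 0] - 0.5 * X[:, 3] + rng.normal(scale=0.1, size=len(X))

# 1) Efficient sampling: tune on a small random subset instead of all the data.
idx = rng.choice(len(X), size=5_000, replace=False)

# 2) Parallel search: evaluate candidate configurations across all CPU cores.
search = RandomizedSearchCV(
    RandomForestRegressor(),
    param_distributions={"n_estimators": randint(50, 300),
                         "max_depth": randint(3, 20)},
    n_iter=20, cv=3, n_jobs=-1, random_state=0)
search.fit(X[idx], y[idx])

# 3) Retrain only the best configuration on the full dataset.
best = RandomForestRegressor(**search.best_params_).fit(X, y)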
11.2.3. Robustness
AutoML, particularly NAS, has shown remarkable performance on well-
labeled datasets such as ImageNet [134]. However, real-world datasets in-
evitably contain noise and adversarial examples, which can significantly un-
dermine the performance of AutoML models [97]. Adversarial attacks can
be specifically designed to fool the model, compromising its performance.
The deployment of non-robust models in ZSM systems can lead to un-
reliable network performance, which can have a significant impact on the
overall user experience. Additionally, the lack of robustness can result in
potential security breaches. Adversarial attacks can be used to exploit vul-
nerabilities in non-robust models, allowing attackers to gain unauthorized
access to the network or manipulate network behavior. This can have seri-
ous consequences, such as compromising the confidentiality and integrity of
user data, disrupting network operations, and causing financial losses. There-
fore, ensuring the robustness of AutoML models is critical to their successful
application and safe deployment in ZSM systems.
AutoML systems can improve their robustness to adversarial attacks by
incorporating techniques such as adversarial training and defensive distilla-
tion in their pipelines. Adversarial training can enhance robustness by train-
ing the model with a combination of clean and adversarial data [135]. This
exposes the model to a range of adversarial attacks during training, making it
more robust to such attacks at inference time. On the other hand, defensive
distillation is a technique that distills knowledge from a large robust model
(teacher model) into a smaller target model (student model) [136]. This en-
ables the robustness of the teacher model to be transferred to the student
model through knowledge distillation, yielding a more robust student model.
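The sketch below illustrates adversarial training with an FGSM-style perturbation in PyTorch. The small feed-forward throughput regressor and the loader yielding (features, targets) batches are assumed for illustration; this is a generic instance of the technique rather than the exact procedure of [135].

import torch
import torch.nn as nn

def fgsm(model, x, y, eps, loss_fn):
    # Fast Gradient Sign Method: perturb the inputs in the direction that
    # increases the loss, with the perturbation bounded by eps.
    x_adv = x.clone().detach().requires_grad_(True)
    loss_fn(model(x_adv), y).backward()
    return (x_adv + eps * x_adv.grad.sign()).detach()

# Small feed-forward regressor mapping 8 network features to throughput.
model = nn.Sequential(nn.Linear(8, 32), nn.ReLU(), nn.Linear(32, 1))
opt = torch.optim.Adam(model.parameters(), lr=1e-3)
loss_fn = nn.MSELoss()

for x, y in loader:   # assumed loader of (features, targets) batches, targets of shape (batch, 1)
    x_adv = fgsm(model, x, y, eps=0.05, loss_fn=loss_fn)
    opt.zero_grad()
    # Adversarial training: learn from a mix of clean and perturbed samples.
    loss = loss_fn(model(x), y) + loss_fn(model(x_adv), y)
    loss.backward()
    opt.step()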
11.2.4. Cold-Start
AutoML systems may undergo a cold-start, where the search process
starts with a sub-optimal model or a bad configuration, resulting in inefficient
resource usage and prolonged search times [137]. This can be particularly
problematic in ZSM services, where processing time is crucial, and delays can
have a significant impact on the user experience. One possible technique to
warm-start the search process is meta-learning. By leveraging prior knowl-
edge from similar datasets, meta-learning can initialize the search process
with a promising configuration obtained from that previous knowledge. Ac-
cordingly, the search space is reduced, and the search process is accelerated
[138].
Furthermore, incorporating domain-specific knowledge can help design a more efficient search space, improving the initialization of the search process. For
example, in image classification tasks, domain-specific knowledge can be uti-
lized to define a search space that contains CNNs with specific architectures.
Transfer learning can also aid in warm-starting the search process by
providing a good initialization of the model’s weights, reducing the time
required for model training. Through transfer learning, AutoML systems
can leverage knowledge learned from related domains (pre-trained model) to
improve the performance of the target model.
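As a minimal sketch of such a warm start, the function below first evaluates a hypothetical portfolio of configurations that performed well on similar past datasets before trying random candidates, so the search begins from a promising region of the space rather than from scratch; the portfolio values and the random-forest search space are illustrative.

import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import cross_val_score

# Hypothetical portfolio: configurations that performed well on similar,
# previously-seen datasets (the meta-knowledge used for warm-starting).
portfolio = [
    {"n_estimators": 200, "max_depth": 12},
    {"n_estimators": 100, "max_depth": 6},
    {"n_estimators": 400, "max_depth": None},
]

def warm_started_search(X, y, portfolio, n_random=10, seed=0):
    # Evaluate the portfolio first, then a few random configurations; the
    # portfolio gives the search a promising starting region.
    rng = np.random.default_rng(seed)
    candidates = list(portfolio) + [
        {"n_estimators": int(rng.integers(50, 500)),
         "max_depth": int(rng.integers(3, 20))}
        for _ in range(n_random)
    ]
    scores = [cross_val_score(RandomForestRegressor(**cfg, random_state=0),
                              X, y, cv=3,
                              scoring="neg_mean_absolute_error").mean()
              for cfg in candidates]
    return candidates[int(np.argmax(scores))]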
12. Conclusion
Next-Generation Networks (NGNs) have unleashed a remarkable shift in
the telecommunications industry, opening up an array of possibilities for ap-
plications and service areas with diverse needs. While NGNs hold tremendous
promise to fulfill the demanding requirements of future use cases, they must
be devised as highly-adaptable infrastructures using cutting-edge technolo-
gies, such as software defined networking, network function virtualization,
and network slicing.
Nevertheless, as networks grow more complex, traditional manual ap-
proaches for network management become less efficient. Consequently, Zero-
touch network and Service Management (ZSM) has emerged as a fully au-
tomated management solution designed to introduce intelligence into mobile
networks for the purpose of automation and optimization. As explored in
this survey, ZSM has the potential to optimize network resources, boost en-
ergy efficiency, enhance security, and manage traffic in NGNs. However, it
also confronts significant ML challenges, such as the need for effective model
selection and hyperparameter tuning. The paper explores viable network
automation solutions, specifically Automated Machine Learning (AutoML)
and digital twins.
AutoML offers one solution to these issues by automating the ML pipeline within ZSM itself, thereby increasing its efficiency. This paper thoroughly
analyzes the AutoML pipeline, providing insights into the techniques utilized
at each step. The practical application of AutoML is demonstrated through
a case study that predicts application throughput for 4G and 5G networks
using an online AutoML pipeline. Simulation results demonstrate the superiority of AutoML over traditional ML approaches. By leveraging AutoML algorithms to generate
up-to-date predictive models, ZSM can adapt to changing traffic patterns.
This facilitates the automation of network service management, leading to
improved service quality and enhanced operational efficiency.
While ZSM has shown promise across diverse domains, much work re-
mains to be done to refine and incorporate this framework. Nonetheless, the
potential for NGNs to revolutionize the way we live, work, and communi-
cate remains as high as ever. ZSM and AutoML will play a pivotal role in
realizing this potential.
References
[1] F. Rancy, IMT for 2020 and beyond, 5G Outlook-Innovations and Applications (2016) 69.
[2] A. A. Barakabitze, A. Ahmad, R. Mijumbi, A. Hines, 5G network slicing using SDN and NFV: A survey of taxonomy, architectures and future challenges, Computer Networks 167 (2020) 106984. doi:https://doi.org/10.1016/j.comnet.2019.106984.
[3] C. Benzaid, T. Taleb, AI-driven zero touch network and service management in 5G and beyond: Challenges and research directions, IEEE Network 34 (2) (2020) 186–194. doi:10.1109/MNET.001.1900252.
[4] D. Tennenhouse, J. Smith, W. Sincoskie, D. Wetherall, G. Minden,
A survey of active network research, IEEE Communications Magazine
35 (1) (1997) 80–86. doi:10.1109/35.568214.
[5] L. Jorguseski, A. Pais, F. Gunnarsson, A. Centonza, C. Willcock, Self-organizing networks in 3GPP: standardization and future trends, IEEE Communications Magazine 52 (12) (2014) 28–34. doi:10.1109/MCOM.2014.6979983.
[6] M. A. Khan, S. Peters, D. Sahinel, F. D. Pozo-Pardo, X.-T. Dang, Understanding autonomic network management: A look into the past, a solution for the future, Computer Communications 122 (2018) 93–117. doi:https://doi.org/10.1016/j.comcom.2018.01.014.
[7] J. Gallego-Madrid, R. Sanchez-Iborra, P. M. Ruiz, A. F. Skarmeta,
Machine learning-based zero-touch network and service management:
A survey, Digital Communications and Networks 8 (2) (2022) 105–123.
[8] E. Coronado, R. Behravesh, T. Subramanya, A. Fernàndez-Fernàndez, M. S. Siddiqui, X. Costa-Pérez, R. Riggio, Zero touch management: A survey of network automation solutions for 5G and 6G networks, IEEE Communications Surveys & Tutorials 24 (4) (2022) 2535–2578. doi:10.1109/COMST.2022.3212586.
[9] ETSI, Zero-touch network and service management (ZSM); Reference architecture, Group Specification (GS) ETSI GS ZSM 2.
Cloud and Parallel Computing (COMITCon), 2019, pp. 35–39.
doi:10.1109/COMITCon.2019.8862451.
[26] A. I. Salameh, M. El Tarhuni, From 5G to 6G - challenges, technologies, and applications, Future Internet 14 (4).
[37] M. Bunyakitanon, X. Vasilakos, R. Nejabati, D. Simeonidou, End-to-end performance-based autonomous VNF placement with adopted reinforcement learning, IEEE Transactions on Cognitive Communications and Networking 6 (2) (2020) 534–547. doi:10.1109/TCCN.2020.2988486.
work slices in 6g, IEEE Wireless Communications 29 (1) (2022) 86–93.
doi:10.1109/MWC.009.00366.
[52] S. Jaffry, S. F. Hasan, Cellular traffic prediction using recur-
rent neural networks, in: 2020 IEEE 5th International Sympo-
sium on Telecommunication Technologies (ISTT), 2020, pp. 94–98.
doi:10.1109/ISTT50966.2020.9279373.
[56] T.-J. Tan, F.-L. Weng, W.-T. Hu, J.-C. Chen, C.-Y. Hsieh, A reliable intelligent routing mechanism in 5G core networks, in: Proceedings of the 26th Annual International Conference on Mobile Computing and Networking, MobiCom ’20, Association for Computing Machinery, New York, NY, USA, 2020. doi:10.1145/3372224.3418167. URL https://doi.org/10.1145/3372224.3418167
works, IEEE Internet of Things Journal 5 (3) (2018) 1588–1597.
doi:10.1109/JIOT.2017.2788362.
[67] F. Debbabi, R. Jmal, L. Chaari, R. L. Aguiar, An overview of inter-slice & intra-slice resource allocation in B5G telecommunication networks, IEEE Transactions on Network and Service Management (2022) 1–13. doi:10.1109/TNSM.2022.3189925.
[70] H. Huang, S. Guo, Proactive failure recovery for NFV in distributed edge computing, IEEE Communications Magazine 57 (5) (2019) 131–137. doi:10.1109/MCOM.2019.1701366.
Applied Sciences 10 (14). doi:10.3390/app10144735. URL https://www.mdpi.com/2076-3417/10/14/4735
[86] M. Mashaly, Connecting the twins: A review on digital twin technology & its networking requirements, Procedia Computer Science 184 (2021) 299–305, the 12th International Conference on Ambient Systems, Networks and Technologies (ANT) / The 4th International Conference on Emerging Data and Industry 4.0 (EDI40) / Affiliated Workshops. doi:https://doi.org/10.1016/j.procs.2021.03.039.
[94] L. Hui, M. Wang, L. Zhang, L. Lu, Y. Cui, Digital twin for networking: A data-driven performance modeling perspective, IEEE Network (2022) 1–8. doi:10.1109/MNET.119.2200080.
[95] M. Perno, L. Hvam, A. Haug, Implementation of digital twins in the process industry: A systematic literature review of enablers and barriers, Computers in Industry 134 (2022) 103558. doi:https://doi.org/10.1016/j.compind.2021.103558.
[96] S. K. Karmaker (“Santu”), M. M. Hassan, M. J. Smith, L. Xu, C. Zhai, K. Veeramachaneni, AutoML to date and beyond: Challenges and opportunities, ACM Comput. Surv. 54 (8). doi:10.1145/3470918. URL https://doi.org/10.1145/3470918
[97] X. He, K. Zhao, X. Chu, AutoML: A survey of the state-of-the-art, Knowledge-Based Systems 212 (2021) 106622. doi:https://doi.org/10.1016/j.knosys.2020.106622.
[98] K. Chauhan, S. Jani, D. Thakkar, R. Dave, J. Bhatia, S. Tanwar,
M. S. Obaidat, Automated machine learning: The new wave of ma-
chine learning, in: 2020 2nd International Conference on Innovative
Mechanisms for Industry Applications (ICIMIA), 2020, pp. 205–212.
doi:10.1109/ICIMIA48430.2020.9074859.
[99] A. Jadhav, D. Pramod, K. Ramanathan, Comparison of performance of data imputation methods for numeric dataset, Applied Artificial Intelligence 33 (10) (2019) 913–933. arXiv:https://doi.org/10.1080/08839514.2019.1637138, doi:10.1080/08839514.2019.1637138.
[100] S. Jäger, A. Allhorn, F. Bießmann, A benchmark for data imputation
methods, Frontiers in Big Data 4. doi:10.3389/fdata.2021.693674.
[101] F. Biessmann, T. Rukat, P. Schmidt, P. Naidu, S. Schelter, A. Tap-
tunov, D. Lange, D. Salinas, Datawig: Missing value imputation for
tables, Journal of Machine Learning Research 20 (175) (2019) 1–6.
[102] L. Yang, A. Moubayed, A. Shami, MTH-IDS: A multitiered hybrid intrusion detection system for Internet of Vehicles, IEEE Internet of Things Journal 9 (1) (2022) 616–632. doi:10.1109/JIOT.2021.3084796.
[103] S. G. K. Patro, K. K. Sahu, Normalization: A preprocessing stage
(2015). doi:10.48550/ARXIV.1503.06462.
[104] L. Yang, A. Moubayed, A. Shami, P. Heidari, A. Boukhtouta,
A. Larabi, R. Brunner, S. Preda, D. Migault, Multi-perspective content
delivery networks security framework using optimized unsupervised
anomaly detection, IEEE Transactions on Network and Service Man-
agement 19 (1) (2022) 686–705. doi:10.1109/TNSM.2021.3100308.
[105] A. Alsharef, K. Aggarwal, M. Kumar, A. Mishra, Review of ML and AutoML solutions to forecast time-series data, Archives of Computational Methods in Engineering 29 (7) (2022) 5297–5311.
[106] L. Yang, A. Shami, On hyperparameter optimization of machine learning algorithms: Theory and practice, Neurocomputing 415 (2020) 295–316. doi:https://doi.org/10.1016/j.neucom.2020.07.061.
[107] J. Snoek, H. Larochelle, R. P. Adams, Practical bayesian optimiza-
tion of machine learning algorithms, Advances in neural information
processing systems 25.
[108] Y. Bengio, Gradient-Based Optimization of Hyperpa-
rameters, Neural Computation 12 (8) (2000) 1889–1900.
doi:10.1162/089976600300015187.
[109] L. Li, K. Jamieson, G. DeSalvo, A. Rostamizadeh, A. Talwalkar, Hy-
perband: A novel bandit-based approach to hyperparameter optimiza-
tion, The Journal of Machine Learning Research 18 (1) (2017) 6765–
6816.
[110] Y. Li, Z. Wang, Y. Xie, B. Ding, K. Zeng, C. Zhang, AutoML: From methodology to application, in: Proceedings of the 30th ACM International Conference on Information & Knowledge Management, CIKM ’21, Association for Computing Machinery, New York, NY, USA, 2021, pp. 4853–4856. doi:10.1145/3459637.3483279. URL https://doi.org/10.1145/3459637.3483279
[111] D. M. Manias, I. Shaer, L. Yang, A. Shami, Concept drift
detection in federated networked systems, in: 2021 IEEE
Global Communications Conference (GLOBECOM), 2021, pp. 1–6.
doi:10.1109/GLOBECOM46510.2021.9685083.
[112] J. Lu, A. Liu, F. Dong, F. Gu, J. Gama, G. Zhang, Learning under con-
cept drift: A review, IEEE Transactions on Knowledge and Data Engi-
neering 31 (12) (2019) 2346–2363. doi:10.1109/TKDE.2018.2876857.
[121] S. Guo, B. Lu, M. Wen, S. Dang, N. Saeed, Customized 5G and beyond private networks with integrated uRLLC, eMBB, mMTC, and positioning for industrial verticals, IEEE Communications Standards Magazine 6 (1) (2022) 52–57. doi:10.1109/MCOMSTD.0001.2100041.
[130] Y. Siriwardhana, P. Porambage, M. Liyanage, M. Ylianttila, AI and 6G security: Opportunities and challenges, in: 2021 Joint European Conference on Networks and Communications & 6G Summit (EuCNC/6G Summit), 2021, pp. 616–621. doi:10.1109/EuCNC/6GSummit51104.2021.9482503.
[138] M. Feurer, K. Eggensperger, S. Falkner, M. Lindauer, F. Hutter, Auto-sklearn 2.0: Hands-free AutoML via meta-learning (2022). arXiv:2007.04074.