
This article has been accepted for publication in IEEE Transactions on Mobile Computing. This is the author's version, which has not been fully edited; content may change prior to final publication. Citation information: DOI 10.1109/TMC.2024.3404125

Federated Deep Reinforcement Learning for Prediction-Based Network Slice Mobility in 6G Mobile Networks

Zhao Ming, Student Member, IEEE, Hao Yu, Member, IEEE, and Tarik Taleb, Senior Member, IEEE

Abstract—Network slices are generally coupled with services and face service continuity/unavailability concerns due to the high mobility and dynamic requests of users. Network slice mobility (NSM), which considers user mobility, service migration, and resource allocation from a holistic view, is regarded as a key technology in enabling network slices to respond quickly to service degradation. Existing studies on NSM either ignored trigger detection before NSM decision-making or did not consider the prediction of future system information to improve NSM performance, and the training of deep reinforcement learning (DRL) agents also faces challenges with incomplete observations. To cope with these challenges, we consider that network slices migrate periodically and utilize the prediction of system information to assist NSM decision-making. The periodical NSM problem is further transformed into a Markov decision process, and we propose a prediction-based federated DRL framework to solve it. In particular, the learning processes of the prediction model and the DRL agents are performed in a federated learning paradigm. Based on extensive experiments, simulation results demonstrate that the proposed scheme outperforms the considered baseline schemes in improving long-term profit, reducing communication overhead, and saving transmission time.

Index Terms—Prediction-based Network Slice Mobility, Incomplete Observation, Deep Reinforcement Learning

Part of this work was presented at the IEEE International Conference on Communications (ICC), Rome, Italy, May 2023, which is cited as [1]. This work makes a significant extension in system modeling, scheme design, and evaluation results. (Corresponding author: Hao Yu.)
Zhao Ming is with the Centre for Wireless Communications (CWC), University of Oulu, Oulu, 90570 Finland (email: [email protected]).
Hao Yu is with ICTFicial Oy, 02130 Espoo, Finland (email: [email protected]).
Tarik Taleb is with Ruhr University Bochum, Bochum, Germany (email: [email protected]).

I. INTRODUCTION

During the past few years, network technology has developed rapidly and brought humans into the 6G era, with much stricter and more heterogeneous network requirements, e.g., increased data rates, enhanced network capacity, ultra-low latency, and massively connected devices [1]–[3]. To cope with these challenges, network slicing, empowered by the emerging software-defined networking (SDN) and network function virtualization (NFV) technologies, aims at building multiple virtual, logically independent networks by constructing flexible and highly adaptable communication networks. Network slicing has achieved great progress in supporting the specific demands of customers [4], [5], industries [6], and emerging applications [7]–[9], and is regarded as a key technology for supporting specific demands in 6G networks.

On the other hand, being built on top of physical facilities to provide customized services to user equipment (UE), network slices not only depend on physical facilities to adjust the virtual network topology and to decide on resource allocation strategies, but are also deeply coupled with services and UEs to ensure service continuity [10], [11]. Under this circumstance, when a slice's serving UEs move across areas, the ongoing sessions between the moving UEs and their slices may suffer a degradation in quality of service (QoS) and thus raise service continuity concerns [12], [13]. Additionally, the dynamically changing requests from associated UEs also require the slices to quickly adjust their provisioned resources, which can lead to resource unavailability issues. These problems will even aggravate in 6G networks due to the increasing number of connected devices and emerging scenarios with high mobility or dynamic resource requests. To tackle these issues, the authors in [14] considered user mobility, request dynamics, and service continuity simultaneously and proposed the notion of network slice mobility (NSM) to decide slice migration and resource allocation from a holistic perspective. Specifically, in the considered NSM paradigm, network slices are assembled from multiple virtual network functions (VNFs), which can monitor the requests/positions of users in a timely manner and collect system information, including the available physical facility resources. Additionally, the slices can manipulate their VNFs to migrate between physical hosts, scale the allocated resources, and even change the connection relationships among VNFs. The main NSM triggers were defined and summarized in [15], where trigger selection methods were also investigated.

Lots of pioneering studies have investigated NSM-related issues, including slice trigger classification [14], slice anomaly/trigger detection [15], [16], and VNF/service placement and migration [17], [18], but several challenges remain unresolved. Firstly, these studies either focused on the detection of slice anomalies/triggers or aimed at slice deployment and migration independently. However, to ensure service continuity, we maintain that slice migration should be performed only after a slice anomaly occurs. As the anomalies in slices occur intermittently, NSM should also be performed intermittently rather than frequently, which is rarely considered. Secondly, though DRL-based schemes are widely adopted for solving NSM decision-making problems and have demonstrated their efficiency, when we consider NSM intermittently, the DRL agents can only observe the system information (including user requests and resource

This work is licensed under a Creative Commons Attribution 4.0 License. For more information, see https://creativecommons.org/licenses/by/4.0/

utilization of servers) for one time slot to make decisions; after the NSM, the requests and positions of UEs will continuously change until the next anomaly triggers. Under this circumstance, the DRL agents face the challenge of having only incomplete observations of the environment, as they cannot observe the future requests that affect the feedback. Thus, they are unable to receive stable rewards. Many studies have also discussed the effect of incomplete observation on unstable rewards in DRL training [19]–[21]. Finally, the dynamically changing user requests and the continuously growing number of BSs in 6G networks also pose great challenges to the flexibility and scalability of the core network and radio access network (RAN), as well as to the cost for MNOs of owning dedicated RAN facilities. Moreover, centralized algorithms for detecting NSM anomalies or deciding slice migration strategies have a high computational complexity that increases with the number of slices, which introduces high communication overhead and execution time.

To cope with these challenges, in this paper we design a general network architecture for NSM in 6G networks, which considers a decentralized core network realized by VNFs deployed at edge servers (ESs) and a RAN built from VNFs under the Open-RAN paradigm. Afterward, to mimic the intermittent NSM process, we model it periodically without loss of generality, assuming that an anomaly occurs at the first time slot of each period. To cope with the incomplete observation challenge, we integrate the prediction of user behaviors to improve the NSM decision-making performance, inspired by [22], [23]. The problem is formulated to maximize the long-term system profit of MNOs, and we propose a prediction-based federated DRL (FDRL) framework that learns users' request and position information to solve this problem. Specifically, the framework consists of a long short-term memory (LSTM) module for prediction and a double deep Q-learning (DDQN)-based module for determining the NSM strategies. In particular, during each period, the DRL agents can only observe the first time slot's system information and make decisions with the current observation and the predictions, after which the overall system profit of the period is set as the feedback. The training of the LSTM model and the DRL agents is performed in a federated learning (FL) manner to reduce the communication overhead and transmission time of original user data and to preserve users' privacy. Based on extensive experiments, simulation results demonstrate that our proposed scheme outperforms the considered baseline schemes in improving the long-term system profit and reducing communication overhead/time. The main contributions of this paper are summarized as follows:

• We design a general NSM network architecture that considers a decentralized core network realized by the VNFs deployed at ESs and a RAN built under the Open-RAN paradigm to improve system flexibility and scalability.
• To cope with dynamic user requests and the unstable feedback caused by the incomplete observation of agents, we consider NSM from a periodical perspective and utilize future information prediction in DRL agent training to improve the NSM decision-making performance.
• We formulate the problem as maximizing the long-term system profit and propose a prediction-based FDRL framework to solve it. Specifically, the framework utilizes LSTM for future information prediction and DDQN for decision-making, and the learning processes of the LSTM model and the DRL agents are integrated with the FL paradigm.
• Simulation results demonstrate that the proposed prediction-based NSM scheme outperforms the considered baseline schemes in improving the long-term system profit, reducing the communication overhead, and saving transmission time.

The remainder of this paper is organized as follows. Section II discusses the related work. Section III introduces the system model and then formulates the problem. In Section IV, we propose a prediction-based NSM framework. Simulation results are provided in Section V. Finally, Section VI concludes this paper.

II. RELATED WORK

Many researchers have investigated NSM-related techniques, including slice anomaly/trigger detection and decision-making for migration and resource allocation.

A. Anomaly and Trigger Detection

The authors in [24] addressed distributed online anomaly detection of network slices based on a decentralized one-class support vector machine, analyzing real-time measurements of virtual nodes mapped to physical nodes and the correlation of measurements between neighboring virtual nodes. However, this study focused on only one type of anomaly in network slices, i.e., anomalies of physical nodes. It did not consider service degradation introduced by other causes, such as unreasonable resource allocation. In this regard, if a physical node could support most of its slices normally but allocated limited bandwidth resources to a VNF, the violated service level agreement (SLA) of the corresponding slice would not be detected. Moreover, anomalies in the transmission of the RAN, transport network, and core network, the mobility of UEs, and the dynamically changing user demand were not considered in this study either. The authors in [25] proposed a cognitive network and slice management system, which adopted anomaly detection/prediction to detect arising anomalies in the routes taken by ambulances or in their demanded QoS level. The authors considered AI-based techniques to detect and predict anomalies, as well as the cooperation among AI models. Besides, Wang et al. proposed a transfer learning-based hidden Markov model to detect abnormal network slices affected by anomalies in shared physical nodes, which utilized the similarity between physical nodes to speed up convergence and achieved high detection accuracy [16]. The authors in [15] investigated slice mobility trigger selection with a DRL-based method, in which the DRL agent decides where to migrate at each time slot and raises the trigger when the chosen target host is not the current host. However, these studies neglected the dynamic changes in user resource demands, and frequent migration of slices would result in the


degradation of service continuity. Therefore, applying these studies in real-world networks would be difficult.

B. Slice Migration and Resource Allocation

On the other hand, in [18], the authors proposed a prediction-assisted VNF placement and link allocation framework to minimize energy consumption and total cost under the premise of meeting the network QoS. In [26], the authors proposed an algorithm based on the belief-state partially observable Markov decision process (MDP) to provide partitions/slices with energy and computational resources. These methods considered dynamic user requests and can avoid wasting resources [27]. Besides, to improve resource utilization efficiency, decision-making strategies have been widely studied, and some researchers have also proposed game theory-based solutions to cope with the conflicts of resource allocation among different servers [26], [28]. Moreover, in [18], the authors proposed an online approach to dynamically determine the slice placement policies and decide on the VNF numbers and resource allocation strategies; the prediction of network demand was considered to estimate the traffic rate demand in advance. Additionally, the authors in [29] proposed to minimize the total latency of computing tasks under energy constraints by leveraging the combination of the non-orthogonal multiple access technique and edge computing. However, the resource allocation policies were determined by centralized heuristic or near-optimal solutions, which may face scalability and robustness issues [30]. Moreover, distributed learning paradigms and their cooperation with networks were not considered in these studies.

Meanwhile, Chergui et al. proposed a statistical FL-based framework that performed slice-level resource prediction to minimize energy consumption, where the non-convex FL problem was solved by a proxy-Lagrangian strategy [31]. Furthermore, the authors in [17] introduced a knowledge plane-based management and orchestration framework that invoked a continuous model-free DRL method to minimize energy consumption and VNF instantiation cost. However, resource allocation and the optimization of cost and energy are multi-objective optimization problems, and interactive decision-making across multiple network slices will conflict. To address these issues, the authors in [26], [28] proposed game theory solutions based on distributed SDN to decide the resource allocation, where the resource allocation strategies can be derived closer to the gateway level. In [1], we proposed a prediction-based intelligent slice mobility framework, which considered future information such as users' positions and requests to assist the decision-making in advance. This method introduces higher system costs but also achieves much higher system revenue. However, this work did not consider the slice trigger either and tended to make decisions at each time slot. Moreover, the resource allocation strategies can also be optimized with future information prediction.

In recent years, DRL-based methods have played a vital role in the decision-making field, but utilizing conventional DRL-based schemes for NSM still faces the problem of imperfect observation, and several studies have also considered this problem. For instance, the authors in [19] discussed the effect of noise on the observation of agents and tried to extract real and complete observations from the original ones. In [20] and [21], the authors studied the problem of unstable feedback of agents when dealing with highly dynamic environments. In addition, the combination of future information prediction and DRL has also been extensively studied [32]. Specifically, the authors investigated state prediction for agents' pre-training and for allocating resources in advance [33], [34]. Besides, the authors in [35] and [36] studied reward prediction and proposed using prediction to control agents' future actions. Recently, the authors in [22] and [23] proposed to cope with the partially observable MDP problem by embedding the predictions into the state space, which achieved good performance for agents in real-time interactions with the environment. Inspired by this concept, in our periodical NSM process, the unobservable future system information can be predicted and embedded into the state space to achieve more stable feedback for agents.

Overall, slice anomaly detection and resource allocation strategies are still worth investigating, and the NSM problem is not yet well explored, especially regarding periodical NSM process modeling, prediction-based decision-making, and the integration of distributed learning paradigms.

III. SYSTEM MODEL AND PROBLEM FORMULATION

In this section, we introduce the system model and formulate the optimization problem. The main notations used in this paper are summarized in Table I.

A. System Model

We consider a general network architecture for prediction-based NSM in 6G networks, which consists of the ground network, the space and air network, and the remote cloud, as shown in Fig. 1. In the ground network, several small base stations (BSs) and macro BSs are geographically evenly distributed, each of which includes an ES with limited calculation/storage resources and runs a hypervisor¹. Moreover, each BS covers a cell area and serves multiple UEs, including smartphones, laptops, and IoT/IoV equipment in the ground network, and satellites/UAVs in the space and air network. These UEs can dynamically move across different cells and connect to different BSs by cellular links according to their positions. To support heterogeneous resource requests from users, multiple network slices are built, empowered by the NFV/SDN techniques. We assume that each ES can instantiate several VNFs based on the deployed hypervisor. These VNFs run a diverse range of services for users, and each VNF incorporates an agent to adjust the allocated resources, physical host, and connections so as to form different types of slices that provide users with differentiated services. Here, we consider the RAN part to be built under the Open-RAN paradigm, realized by the VNFs at ESs with open standards and interfaces, which is regarded as advantageous for maintaining the network and reducing cost by enhancing the flexibility and scalability of the future 6G system [37]. Besides, the BSs are also connected

¹In this paper, we use BS and ES interchangeably.



TABLE I
KEY MODELING PARAMETERS AND NOTATIONS.

Notation | Definition
u, U | Index and set of users
n, N | Index and set of ESs
R_n = ∪_i {R_{n,i}} | Total resources of ES n
t | Time slot index
i | Resource type index
R_n^ava(t) = ∪_i {R_{n,i}^ava(t)} | Available resources of ES n at time slot t
x_n, y_n | Coordinates of ES n
x_u(t), y_u(t) | Coordinates of user u at time slot t
N_u(t) | ES serving u at time slot t
s, S | Index and set of slices
v, V_s | Index and set of VNFs of slice s
π_v(t) | Place hosting v at time slot t
R_u^req(t) = ∪_i {R_{u,i}^req(t)} | Requested resources of u at time slot t
p_s, p_v, p_u | Priority setting of s, v, and u
R_v(t) = ∪_i {R_{v,i}(t)} | Allocated resources of VNF v at time slot t
P | Basic running price of a resource unit
CO_v^run(t) | Running cost of VNF v at time slot t
λ_i | Weight of the price of resource type i
R_v^req(t) = ∪_i {R_{v,i}^req(t)} | Requested resources of VNF v at time slot t
R_v^ust(t) = ∪_i {R_{v,i}^ust(t)} | Unsatisfied resources of VNF v at time slot t
η | Penalty of an unsatisfied resource unit
CO_v^ust(t) | Unsatisfied cost of VNF v at time slot t
δ | Migration penalty
Δ_v(t) ∈ {0, 1} | Indicator of whether v migrates
CO_v^mig(t) | Migration cost of VNF v at time slot t
CO_v(t) | Cost of VNF v at time slot t
R_v^sat(t) = ∪_i {R_{v,i}^sat(t)} | Satisfied resources of v at time slot t
l_{u,v}(t) | Latency from N_u(t) to π_v(t)
l_v(t) | Total latency of VNF v at time slot t
L_s | Latency constraint of slice s
μ_s | Latency from a user to its connected ES in s
φ_s | Latency between two connected ESs in s
ρ_s | Latency from the ESs to the cloud in s
Ω(N_u(t), π_v(t)) | Nodes on the shortest path from N_u(t) to π_v(t)
ω | Coefficient to measure revenue based on P
RN_v(t) | Revenue of VNF v at time slot t
PR_v(t) | Profit of VNF v at time slot t
Θ_{v,X}(t) ∈ {0, 1} | Indicator of whether π_v(t) equals X

Fig. 1. The considered network architecture in space-air-ground 6G systems.

to their neighbors by BS-to-BS links and to the remote cloud by backhaul links; these links are generally built with high-speed optical fibers and have sufficient transmission speed. We assume the core network is built in a cloud-native way and realized by the VNFs deployed at the ESs to achieve a decentralized core network. In the remote cloud swarm, multiple servers with heterogeneous calculation and storage resources are geographically distributed and connected to their neighbors by switches. We regard the cloud swarm as a logical cloud center that can dynamically adjust the number of servers and thus has elastic resources.

When UEs' requests cannot be met, VNFs from the related slices will partly move to new hosts and adjust resources to ensure service continuity, which is known as NSM [14]. Additionally, as NSM triggers occur intermittently, without loss of generality, we model the NSM process from a periodical view to mimic this process, assuming that the NSM triggers occur in the first time slot of each period, as shown in Fig. 2. We aim to explore prediction-based NSM that utilizes system information predictions, such as the requested resources and moving trajectories of UEs, to aid NSM decision-making. Specifically, the overall NSM process is separated into the initialization, collection, and migration phases. In the initialization phase, ESs initialize VNFs based on the serving users' request priorities and build network slices by grouping users of the same priority into the same slice. After that, in the collection phase, each ES collects and caches the serving UEs' requested resources and geographical coordinates (defined as the UE cache). We assume that user behavior data follows an identical and independent distribution; thus, the UE cache from multiple slices can be used to train a global prediction model that learns the behavior of all users. In the following migration phase, during each period, the agent of each VNF observes the serving users' requests and positions at the first time slot and predicts their information for the following several time slots based on historical user behavior and the well-trained model. Meanwhile, the agent also observes the ESs' available resources at the first time slot. Based on the current system observations and future predictions, each VNF's agent decides on migration destinations and resource allocations to meet the users' demands over the entire period. After the migration, the users dynamically change their positions and required resources until the next period's NSM decision-making. In this paper, we neglect the time for decision-making and VNF migration, as well as the disk resources for caching user information.

In the considered system, we denote the set of users and ESs as U and N, and the total resources of ES n ∈ N as R_n, where R_n includes I types of resources such as CPU, RAM, and disk. Thus, R_n can be expressed as R_n = ∪_{i∈{1,...,I}} R_{n,i}, where R_{n,i} denotes the total amount of resource type i of ES n. As the available resources of ESs dynamically change when allocating or scaling resources for VNFs, we similarly denote the available resources of ES n at time slot t as R_n^ava(t) = ∪_{i∈{1,...,I}} R_{n,i}^ava(t). Considering that ESs have fixed positions while users move dynamically over time slots, we denote the coordinates of ES n as (x_n, y_n) and the coordinates of user u ∈ U at time slot t as (x_u(t), y_u(t)). Assuming a user can only connect to one ES at each time slot, we denote the ES that user u connects to at time slot t as N_u(t). Generally, N_u(t)



Fig. 2. The illustration of periodical modeling of network slice mobility.
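To make the periodical modeling in Fig. 2 concrete, the following minimal sketch mimics one period: the agent observes only the first time slot, a predictor fills in the remaining slots, a single NSM decision is applied, and the whole period's profit is returned as the agent's feedback. The ToyEnv, ToyPredictor, and ToyAgent classes, the persistence forecast, and the peak-provisioning heuristic are illustrative assumptions, not the paper's LSTM/DDQN implementation.

```python
import random

class ToyEnv:
    """Toy stand-in for the slice environment (illustrative only)."""
    def __init__(self):
        self.t = 0
        self.allocated = 0
    def observe(self):
        # First-slot system information: user requests and available resources.
        return {"requests": [random.randint(1, 4)], "avail": [10]}
    def apply(self, action):
        # Perform the NSM decision (here reduced to a resource allocation).
        self.allocated = action["resources"]
    def step(self):
        # One time slot: revenue for satisfied demand, penalty for unsatisfied demand.
        self.t += 1
        demand = random.randint(1, 4)
        return min(demand, self.allocated) - max(demand - self.allocated, 0)

class ToyPredictor:
    def predict(self, obs, horizon):
        # Naive persistence forecast standing in for the LSTM module.
        return [obs] * horizon

class ToyAgent:
    def decide(self, state):
        # State = current observation embedded with the predictions.
        obs, preds = state
        peak = max(p["requests"][0] for p in [obs] + preds)
        return {"host": 0, "resources": peak}
    def feedback(self, profit):
        # The period-level profit plays the role of the DRL reward.
        self.last_profit = profit

def run_period(env, predictor, agent, T):
    obs = env.observe()                        # observe the first time slot only
    preds = predictor.predict(obs, horizon=T - 1)
    action = agent.decide(state=(obs, preds))  # one decision per period
    env.apply(action)
    profit = sum(env.step() for _ in range(T))
    agent.feedback(profit)
    return profit
```

Note that the reward arrives only once per period, which is exactly why the unobserved slots need to be predicted rather than ignored if the feedback is to be stable.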

should be the ES that is closest to the user's position [38], which can be obtained by

  N_u(t) = arg min_{n∈N} {(x_n − x_u(t))² + (y_n − y_u(t))²}.   (1)

We denote the set of slices in the system as S. In the initialization phase, each user is assigned to a slice and then served by the VNFs in that slice. We assume each VNF can only belong to one slice and denote the VNFs of slice s as V_s; thus, we have V_s ∩ V_{s′} = ∅, ∀s, s′ ∈ S. For VNF v ∈ V_s, we denote π_v(t) ∈ {0, 1, . . . , |N|} to indicate the place that hosts v at time slot t, where "0" means v is migrated to the cloud and 1, . . . , |N| means v is hosted by the corresponding ES¹. Furthermore, to ensure service continuity, we assume that a UE cannot switch the VNF serving it in the prediction-based NSM scenario; an illustrative case is shown in Fig. 3. Initially, UE A and UE B are served by VNFs hosted on ESs dedicated to them. As time progresses, the UEs move and are served by different ESs, causing the VNFs to migrate accordingly. Under this circumstance, the UEs should still connect to their corresponding VNFs through the serving ESs for uninterrupted service. Thus, the users served by a specific VNF remain constant over time slots and can be denoted as U_v.

The requested resources of user u at time slot t can be denoted as R_u^req(t) = ∪_{i∈{1,...,I}} R_{u,i}^req(t), where R_{u,i}^req(t) represents the different types of resources. Moreover, we further consider the priority of users and denote the priority of the request from user u as p_u. The priority of a slice is based on its users' priorities, as is that of the VNFs of the slice; thus, we denote p_s and p_v as the priorities of slice s and of VNF v ∈ V_s, and have p_s = p_v = p_u, ∀u ∈ U_v, ∀v ∈ V_s, ∀s. Running the network slices to support user requests introduces cost, and the MNOs get revenue from users, which is relevant to the users' requests and the continuity of service [39]. For instance, when some of the users' requests are not satisfied, they may complain about it, which affects the MNO's maintenance. We denote the allocated resources of VNF v at time slot t as R_v(t) = ∪_{i∈{1,...,I}} R_{v,i}(t), which is determined by the requested resources of the VNF and the available resources of its physical host.

When a UE tends to access the VNF that serves it, it first connects to its nearest ES through cellular links and then routes to the ES/cloud that hosts the corresponding VNF by optical links. Generally, different slices can have customized network configurations and latency settings for wireless/wired connections by configuring separate network channels/bandwidth [40], [41]. Thus, in slice s, we denote the wireless communication latency for its users to connect to the serving ES as μ_s and the wired communication latency between two directly connected ESs as φ_s. Besides, as the latency to access the remote cloud is generally much higher than that of local connections, we consider the latency from the VNFs to the remote cloud as a fixed value denoted as ρ_s. Thus, the latency l_{u,v}(t) for user u ∈ U_v to connect to VNF v at time slot t can be calculated by

  l_{u,v}(t) = { μ_s + ρ_s, if π_v(t) = 0;  μ_s + φ_s · |Ω(N_u(t), π_v(t))|, otherwise },   ∀u ∈ U_v, ∀v ∈ V_s, ∀s ∈ S,   (2)

where Ω(N_u(t), π_v(t)) denotes the set of nodes on the shortest path from ES N_u(t) to facility π_v(t). Moreover, considering the customized latency requirements of slices, we denote the latency constraint of slice s as L_s, ∀s ∈ S, and have l_{u,v}(t) ≤ L_s, which indicates that the access latency of each user should meet the latency requirement of slice s. As a result, we can derive the total latency of VNF v at time slot t for serving all its users as l_v(t) = Σ_{u∈U_v} l_{u,v}(t).

B. Problem Formulation

We denote the running price of a resource unit as P, and the running cost of VNF v at time slot t can be calculated by

  CO_v^run(t) = p_v · P · (1 / l_v(t)) · Σ_{i=1}^{I} λ_i R_{v,i}(t),   ∀v ∈ V_s, ∀s, ∀t,   (3)

where λ_i denotes the weight for measuring the price of resource type i. The requested resources of VNF v at time slot t are the sum of the requests from its users and can be denoted as R_v^req(t) = ∪_{i∈{1,...,I}} R_{v,i}^req(t), where R_{v,i}^req(t) can be calculated by R_{v,i}^req(t) = Σ_{u∈U_v} R_{u,i}^req(t). As the allocated resources of VNFs are determined when performing the NSM process while the requested resources of users change over time, the requested resources of a VNF can exceed its allocated resources when users have spikes in their resource requests, introducing unsatisfied resources. We denote the unsatisfied resources of VNF v at time slot t as R_v^ust(t) = ∪_{i∈{1,...,I}} R_{v,i}^ust(t). Intuitively, the unsatisfied resources of type i should be 0 when the requested resources are less than the allocated resources, and otherwise the difference between them; thus, we can derive R_{v,i}^ust(t) = max{R_{v,i}^req(t) − R_{v,i}(t), 0}. We denote η as

¹Here, |·| denotes the number of nodes in the set.

This work is licensed under a Creative Commons Attribution 4.0 License. For more information, see https://creativecommons.org/licenses/by/4.0/
This article has been accepted for publication in IEEE Transactions on Mobile Computing. This is the author's version which has not been fully edited and
content may change prior to final publication. Citation information: DOI 10.1109/TMC.2024.3404125

[Figure omitted: two panels, "Initialization Phase" and "Migration Phase", each showing VNFs A and B serving UEs A and B.]
Fig. 3. The illustration of user mobility and slice migration.

the penalty of an unsatisfied resource unit; thus, the unsatisfied cost of VNF v at time slot t can be calculated by

    CO^ust_v(t) = η Σ_{i=1}^{I} λ_i R^ust_{v,i}(t),  ∀v ∈ V_s, ∀s, ∀t.    (4)

Besides, as a VNF's migration from one host to another may cause user service switch and resource reconfiguration, as well as service replacement, we denote δ as the migration penalty to measure the cost for MNOs to perform VNF migration. To indicate whether VNF v migrates to another physical host at time slot t, we further denote ∆_v(t) ∈ {0, 1}, where 1 means VNF v migrates to another host and 0 means it does not. Thus, ∆_v(t) can be calculated by

    ∆_v(t) = { 0, if π_v(t) = π_v(t−1);  1, otherwise },  ∀v ∈ V_s, ∀s, ∀t,    (5)

and the migration cost of VNF v at time slot t can be calculated by

    CO^mig_v(t) = δ ∆_v(t),  ∀v ∈ V_s, ∀s, ∀t,    (6)

so the total cost of VNF v at time slot t can be calculated as

    CO_v(t) = CO^run_v(t) + CO^ust_v(t) + CO^mig_v(t),  ∀v ∈ V_s, ∀s, ∀t.    (7)

At the same time, the revenue from the VNFs should be measured not only by the satisfied resources of users but also by the requests' priorities and the latency for satisfying the requests. Generally, a higher priority commands higher prices from the MNOs, introducing higher revenue [1]. We denote the satisfied resources of v at time slot t as R^sat_v(t) = ∪_{i∈{1,...,I}} R^sat_{v,i}(t). When the requested resources are less than the allocated resources, the satisfied resources equal the requested resources; otherwise, the users can only be fulfilled by the allocated resources, and the satisfied resources then equal the allocated resources. Thus, for resource type i, the satisfied resources of v at time slot t can be calculated by R^sat_{v,i}(t) = min{R^req_{v,i}(t), R_{v,i}(t)}. We denote the coefficient to measure the revenue based on P as ω; thus, the revenue of VNF v at time slot t can be calculated as

    RN_v(t) = ω p_v P (1/l_v(t)) Σ_{i=1}^{I} λ_i R^sat_{v,i}(t),  ∀v ∈ V_s, ∀s, ∀t.    (8)

We aim to maximize the long-term profit of the MNOs, which is affected by satisfied user requests, serving latencies, and request priorities. Denoting the profit of VNF v at time slot t as PR_v(t), calculated by PR_v(t) = RN_v(t) − CO_v(t), and Θ_{v,X}(t) ∈ {0, 1} as an indicator of whether π_v(t) equals X or not, the problem in this paper can be formulated as

    max_{π_v(t), R_v(t)}  Σ_t Σ_{s∈S} Σ_{v∈V_s} PR_v(t),    (9a)
    s.t.  Σ_{s∈S} Σ_{v∈V_s} Θ_{v,0}(t) R_{v,i}(t) ≤ R^C_i,  ∀i, ∀t,    (9b)
          Σ_{s∈S} Σ_{v∈V_s} Θ_{v,n}(t) R_{v,i}(t) ≤ R_{n,i},  ∀i, ∀n, ∀t,    (9c)
          l_{u,v}(t) ≤ L_s,  ∀u ∈ U_v, ∀v ∈ V_s, ∀s, ∀t,    (9d)
          π_v(t) ∈ {0, . . . , |N|},  ∆_v(t) ∈ {0, 1},  ∀v ∈ V_s, ∀s, ∀t,    (9e)
          V_s ∩ V_{s′} = ∅,  ∀s, s′ ∈ S,    (9f)
          Σ_{i=1}^{I} λ_i = 1.    (9g)

Here, (9b) and (9c) ensure that the allocated resources of the VNFs placed on a physical host do not exceed the total resources of that host; (9d) ensures that the serving latency of users in slices meets the SLA requirements; (9e) ensures that each VNF selects only one host for migration and has only two distinct migration statuses at each time slot; (9f) ensures that each VNF can only be assigned to one slice; and (9g) ensures a thorough weight setting for measuring the prices of resources.

The above-formulated problem is a Mixed Integer Linear Programming (MILP) problem due to the integer variables Θ_{v,X}(t), ∆_v(t), π_v(t) and the continuous variable R_{v,i}(t), which is proved to be NP-hard [42]. We consider using a DRL-based scheme integrated with the FL paradigm to solve this problem because: 1) users have dynamically changing requests and positions, and conventional heuristic schemes can hardly cope with new user requests; in contrast, DRL has strong generalization capabilities, and well-trained DRL agents can efficiently deal with the dynamic environment; 2) in a 6G system whose massive number of BSs continues to grow, the dimension of the problem grows with the number of slices, and heuristic schemes will then spend more time searching for solutions, which means they suffer significant performance degradation on a finite time scale.

IV. PREDICTION-BASED NETWORK SLICE MOBILITY FRAMEWORK

In this section, we introduce the framework design, elaborate on the model for learning users' behaviors for future information prediction, solve the problem using a DDQN-based method with the predictions, and finally integrate the training processes with the FL paradigm.
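As a worked illustration of Eqs. (3)-(8), the sketch below evaluates the per-slot cost, revenue, and profit PR_v(t) = RN_v(t) − CO_v(t) of one VNF. All function and argument names are ours, and the parameter values are examples only:

```python
# Per-slot cost/revenue/profit of one VNF following Eqs. (3)-(8); names are ours.
def vnf_profit(p_v, latency, lam, allocated, requested, migrated,
               P=100.0, eta=50.0, delta=10.0, omega=150.0):
    """Return (cost, revenue, profit) for one VNF in one time slot."""
    satisfied = [min(rq, al) for rq, al in zip(requested, allocated)]
    unsatisfied = [max(rq - al, 0.0) for rq, al in zip(requested, allocated)]
    co_run = p_v * P / latency * sum(l * a for l, a in zip(lam, allocated))  # Eq. (3)
    co_ust = eta * sum(l * u for l, u in zip(lam, unsatisfied))              # Eq. (4)
    co_mig = delta * (1.0 if migrated else 0.0)                             # Eq. (6)
    cost = co_run + co_ust + co_mig                                         # Eq. (7)
    revenue = (omega * p_v * P / latency *
               sum(l * s for l, s in zip(lam, satisfied)))                  # Eq. (8)
    return cost, revenue, revenue - cost                                    # PR_v(t)

cost, rev, profit = vnf_profit(p_v=2, latency=10.0, lam=[0.4, 0.4, 0.2],
                               allocated=[2.0, 4.0, 12.0],
                               requested=[1.0, 5.0, 10.0], migrated=True)
# cost = 96 + 20 + 10 = 126, revenue = 12000, profit = 11874
```

The migration flag corresponds to ∆_v(t) in Eq. (5); setting it charges the one-off penalty δ on top of the running and unsatisfied costs.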


[Figure omitted: the collection phase (sensing and caching processes gathering per-slot user requests and positions over time periods, feeding an LSTM model) and the migration phase (a DDQN agent with main/target networks and a replay buffer storing (state, action, reward, next state) tuples, which observes the first time slot, predicts the following ones, migrates, and receives the overall profit).]
Fig. 4. The illustration of the overall framework in prediction-based slice mobility.

A. Overall Framework Design

In our proposed framework, during the collection phase, each VNF runs two independent processes, i.e., the sensing and caching processes, as shown in Fig. 4. Specifically, the sensing process continuously collects the requested resources and position information of the serving users and periodically sends this information to the caching process. After that, the caching process caches the user information, and all the VNFs collaboratively train a global user behavior prediction model.

During each period in the following migration phase, the VNFs observe the serving users' information and the available resources of the system facilities at the first time slot and call the prediction model to predict the user information of the following several time slots. Based on the observation and prediction information, each VNF's agent determines which physical host to migrate to and how many resources to scale up/down. This decision-making process aims to maximize the MNOs' long-term profit. We transform this problem into an MDP and solve it using the DRL-based scheme, where the state information includes the observation of the first time slot and the predictions of the following several time slots, and the agent of each VNF performs an action by selecting the physical host for NSM. Besides, the allocated resources are determined based on the chosen host and the user requests. After the migration, each VNF stays with the chosen host for the overall period, and the total profit of this period is set as the reward.

B. User Behavior Prediction

We adopt the LSTM model for user request/position prediction. Specifically, as a special variant of conventional recurrent neural networks (RNN), LSTM introduces the gate mechanism to cope with the long-distance dependency problem by adding internal LSTM cell loops. As a result, LSTM can significantly reduce learning difficulty by providing long-term memory for valuable information [43], [44].

The training process of the LSTM model is elaborated in Algorithm 1. Specifically, we consider that each period T consists of ζ time slots and that the collection phase consists of Ψ periods for collecting user information as training data before the agent starts the NSM decision-making. To obtain the training data, we first combine the user request and position information from the Ψ periods into a sequence, denoted as D^cob = ∪_{∀u∈U} D^cob_u, where D^cob_u can be expressed as

    {R^req_u(1), x_u(1), y_u(1), . . . , R^req_u(ζ), x_u(ζ), y_u(ζ)  [time period 1], . . . ,
     R^req_u(Ψζ − ζ + 1), x_u(Ψζ − ζ + 1), y_u(Ψζ − ζ + 1), . . . ,
     R^req_u(Ψζ), x_u(Ψζ), y_u(Ψζ)}.    (10)

After that, we split the combined sequence D^cob into a dataset for training the LSTM model. Considering that each training sample consists of θ time slots' user request and position information, for each user u we can split out Ψζ − θ training samples, expressed as X_{u,1} = {R^req_u(1), x_u(1), y_u(1), . . . , R^req_u(θ), x_u(θ), y_u(θ)}, X_{u,2} = {R^req_u(2), x_u(2), y_u(2), . . . , R^req_u(θ + 1), x_u(θ + 1), y_u(θ + 1)}, . . ., X_{u,Ψζ−θ} = {R^req_u(Ψζ − θ), x_u(Ψζ − θ), y_u(Ψζ − θ), . . . , R^req_u(Ψζ − 1), x_u(Ψζ − 1), y_u(Ψζ − 1)}. Moreover, the corresponding testing samples can be expressed as Y_{u,1} = {R^req_u(θ + 1), x_u(θ + 1), y_u(θ + 1)}, Y_{u,2} = {R^req_u(θ + 2), x_u(θ + 2), y_u(θ + 2)}, . . ., Y_{u,Ψζ−θ} = {R^req_u(Ψζ), x_u(Ψζ), y_u(Ψζ)} (Line 1). The training dataset and testing dataset are then used to update the model parameters by the backward propagation algorithm and the adaptive moment estimation (Adam) optimizer [45], [46], based on the mean squared error (MSE) loss between the testing data and the model prediction (Lines 3-6).

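The inference stage of Algorithm 1 (Lines 10-14) rolls the trained one-step predictor forward by feeding each prediction back into the history window. A minimal sketch, with a placeholder one-step function standing in for the trained LSTM (all names are ours):

```python
# Roll a one-step predictor chi steps ahead by feeding predictions back
# into the history window, as in Algorithm 1, Lines 10-14.
def rolling_predict(history, one_step_model, chi):
    """history: list of theta records; returns the chi predicted records P_u(T)."""
    window = list(history)
    predictions = []
    for _ in range(chi):
        nxt = one_step_model(window)   # predict the next time slot (Line 11)
        window.pop(0)                  # drop the oldest record (Line 12)
        window.append(nxt)             # append the prediction (Line 13)
        predictions.append(nxt)        # accumulate P_u(T) (Line 14)
    return predictions

# Placeholder "model": predicts the last record plus one (not a real LSTM).
toy_model = lambda window: window[-1] + 1
preds = rolling_predict([1, 2, 3], toy_model, chi=4)   # [4, 5, 6, 7]
```

Because each step consumes its own previous output, prediction errors can compound with χ; the paper bounds χ by ζ − 1, i.e., at most one period ahead.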

Algorithm 1: LSTM Training and Inference Algorithm
Input: ζ, Ψ, χ, θ, D^cob_u, ∀u ∈ U.
1: Initialize: Split D^cob_u, ∀u ∈ U, into training samples and testing samples of each user u ∈ U; obtain X_{u,1}, X_{u,2}, . . . , X_{u,Ψζ−θ} and Y_{u,1}, Y_{u,2}, . . . , Y_{u,Ψζ−θ}.
2: for u ∈ U do
3:   for each train epoch do
4:     With X_{u,1}, X_{u,2}, . . . , X_{u,Ψζ−θ}, fit the LSTM model and obtain Ŷ_{u,1}, Ŷ_{u,2}, . . . , Ŷ_{u,Ψζ−θ}.
5:     Calculate the MSE loss between Y_{u,1}, . . . , Y_{u,Ψζ−θ} and Ŷ_{u,1}, . . . , Ŷ_{u,Ψζ−θ}.
6:     Update the parameters of the LSTM by the Adam optimizer and the backward propagation method.
7: for each period T > Ψ do
8:   Get the historical stack sequence with length θ as H_u(T) = {R^req_u(T_1 − θ + 1), x_u(T_1 − θ + 1), y_u(T_1 − θ + 1), . . . , R^req_u(T_1), x_u(T_1), y_u(T_1)}.
9:   Set P_u(T) = ∅ to store the prediction of user information in period T.
10:  for time slot t ∈ {T_2, T_3, . . . , T_{χ+1}} do
11:    Input the historical stack H_u(T) to the LSTM to obtain the predicted information {R^pre_u(t), x^pre_u(t), y^pre_u(t)} of time slot t.
12:    Pop out the first record of the historical stack H_u(T).
13:    Update H_u(T) ← H_u(T) + {R^pre_u(t), x^pre_u(t), y^pre_u(t)}.
14:    Update P_u(T) ← P_u(T) + {R^pre_u(t), x^pre_u(t), y^pre_u(t)}.
Output: P_u(T), ∀T, ∀u ∈ U.

After the training process, we utilize the trained LSTM model for future information inference during the migration phase. Specifically, for each period T, at the first time slot T_1 we set the historical stack as the user request and position information of the current time slot and the previous θ − 1 time slots, i.e., from time slot T_1 − θ + 1 to T_1 (Line 8). As the trained model can only predict one future time slot, to support multi-slot prediction we update the historical stack with the predicted information iteratively (Lines 10-14). We set the prediction order as χ ∈ [0, ζ − 1], with 0 indicating no future prediction and ζ − 1 representing prediction for the entire period. Thus, the predicted user information for χ time slots can be obtained as P_u(T) = {R^pre_u(T_2), x^pre_u(T_2), y^pre_u(T_2), . . . , R^pre_u(T_{χ+1}), x^pre_u(T_{χ+1}), y^pre_u(T_{χ+1})}, which is then combined with the observations at the first time slot T_1 to determine the NSM strategies.

C. DDQN-based NSM Decision-making

In this part, we model the decision-making process of each VNF as an MDP, introduce the state, action, and reward settings, and aim to maximize the value function for improving the long-term profit of the MNOs.

1) VNF State: For each period T in the migration phase, each VNF v observes the serving users' request and position information at the first time slot T_1 and receives the prediction information P_u(T). To determine where to migrate and how many resources should be allocated, VNF v also observes the available resources of all ESs. Thus, the state of VNF v in period T should include the available resources of the ESs, the serving users' request and position information at the first time slot T_1, and the predicted user information of the following χ time slots. Moreover, considering that each VNF may serve different numbers of users, to unify the state space we extend the state to cover at most |U| users' information; if the serving users are fewer than |U|, we set the corresponding values in the state space to "−1". As each serving user's information includes I types of resources and 2 position coordinates for χ + 1 time slots, the state information of v can be expressed as

    S_v(T) = ( ∪_{∀n∈N} R^ava_n(T_1), ∪_{∀u∈U_v} {R_u(T_1), x_u(T_1), y_u(T_1), P_u(T)}, −1, . . . , −1 ),    (11)

where the trailing −1 padding has dimension (I + 2) × (χ + 1) × (|U| − |U_v|).

2) VNF Action: VNF v determines where to migrate based on the state information, and the allocated resources are decided based on the current and future requested resources and the available resources of the chosen host. We denote the action of VNF v at period T as A_v(T), representing where to migrate, i.e., to ES 1, . . . , |N|, or to the cloud (0). Thus, we have π_v(T_1) = π_v(T_2) = . . . = π_v(T_ζ) = A_v(T). Moreover, the allocated resources should jointly consider the requested resources of the first time slot and the predictions of the further time slots, and should not exceed the available resources of the chosen host; thus, we have

    R_v(T_1) = R_v(T_2) = . . . = R_v(T_ζ) = ∪_{i∈{1,2,...,I}} min{ R^ava_{A_v(T),i}(T_1), (R^req_{v,i}(T_1) + Σ_{t=T_2}^{T_{χ+1}} R^pre_{v,i}(t)) / (1 + χ) }.    (12)

3) VNF Reward: To maximize the long-term system profit, in each period T we set the reward as the overall profit of the period; denoting the reward of period T as R_v(T), it can be derived by R_v(T) = Σ_{t=T_1}^{T_ζ} PR_v(t). We define the NSM decision-making policy as the mapping from states to possible actions, denoted as κ, where κ(A|S_v(T)) indicates the probability of taking action A in state S_v(T) under policy κ. The VNFs aim to find an optimal policy to maximize the long-term system profit, and the value function is given as

    V(S) = E[ Σ_{T=Ψ}^{∞} γ^{T−1} R_v(T) | S_v(Ψ) = S ],    (13)

where γ is a discount factor. Based on the Bellman function [47], the value function can be further transformed to

    V(S) = Σ_A κ(A|S) [ R + γ Σ_{S′} Pr{S′ | (S, A)} · V(S′) ],    (14)


where A is the action that VNF v takes in state S, R denotes the received reward, and S′ represents the possible next state. Thus, the above-formulated problem can be transformed to

    max V(S),  s.t. (9b), (9c), (9d), (9e), (9f), (9g).    (15)

In this paper, we use the DDQN model to train the agents in order to avoid the possible overestimation of the conventional DQN. Specifically, the DDQN model leverages two neural networks (a target network and a main network) for action selection and evaluation, and the Q-function can be given as

    Q(S_v(T), A_v(T)) = R_v(T) + γ Σ_{S_v(T+1)} Pr{S_v(T+1) | S_v(T), A_v(T)} · max_{A_v(T+1)} Q(S_v(T+1), A_v(T+1)).    (16)

We adopt a deep neural network (DNN) to approximate Q(S_v(T), A_v(T)) and update the parameters of the DNN with the experience stored in the replay buffer. Letting Q_ι(S_v(T), A_v(T); β(T)) denote the DDQN model with its parameters in episode ι, we have

    Q_{ι+1}(S_v(T), A_v(T); β(T)) = Q_ι(S_v(T), A_v(T); β(T)) + α_ι · {R_v(T) + γ · Q_ι(S_v(T+1), arg max_{A_v(T+1)} Q_ι(S_v(T+1), A_v(T+1); β(T)); β̂(T)) − Q_ι(S_v(T), A_v(T); β(T))},    (17)

where α_ι denotes the learning rate, and β(T) and β̂(T) denote the parameters of the main network and the target network, respectively. Thus, the main network's loss function, used for updating the parameters by gradient descent, can be expressed as

    L(β(T)) = Σ_{(S_v(T),A_v(T))∈B_T} (y_ι − Q_ι(S_v(T), A_v(T); β(T)))²,    (18)

where y_ι = R_v(T) + γ · Q_ι(S_v(T+1), arg max_{A_v(T+1)} Q_ι(S_v(T+1), A_v(T+1); β(T)); β̂(T)). Here, R_v(T) denotes the reward in episode ι, and B_T denotes a mini-batch. As a result, Algorithm 2 illustrates the proposed DDQN-based NSM process.

Algorithm 2: DDQN-Based NSM Algorithm
Input: N, ζ, P_u(T), ∀u ∈ U.
1: Initialize: The Q-function Q(S_v(T), A_v(T); β(T)) of the target network with random β(T), the decay rate of ϵ as ξ, and the episode index ι = 1.
2: for ι ≤ Episode Number do
3:   Get the system state S_v(T) from the environment.
4:   With probability ϵ, select an action A_v(T) randomly; otherwise select A_v(T) = arg max_{A_v(T)} Q_ι(S_v(T), A_v(T); β(T)).
5:   VNF v migrates to the facility that A_v(T) represents, with the allocated resources calculated by (12).
6:   Set ∆_v(T_1) = 1 if A_v(T) represents a facility different from the current one, else 0.
7:   Set ∆_v(T_2) = ∆_v(T_3) = . . . = ∆_v(T_ζ) = 0.
8:   Calculate R_v(T) during period T and get the next state S_v(T + 1).
9:   Store the tuple (S_v(T), A_v(T), R_v(T), S_v(T + 1)) in the replay buffer.
10:  Randomly sample from the replay buffer.
11:  Calculate the loss function L(β(T)) by (18).
12:  Update β(T) to minimize L(β(T)) by gradient descent.
13:  Update ϵ ← e^{−ι/ξ} and ι ← ι + 1.
14:  Update the parameters of the target network periodically, i.e., β̂(T) ← β(T) after several episodes.

D. Federated Learning Framework Integration

Each VNF has to decide the NSM strategies based on the prediction model and the DRL agent, which are trained using the user data collected by the ESs. However, it is not reasonable to train the prediction model and the DRL agents individually because: 1) each VNF can only collect very little user data in each period, so it would take a long time for an ES to collect enough user data for model training; 2) few training data samples will unavoidably lead to model overfitting. Moreover, uploading all the original user data to the centralized cloud for model training would induce high communication costs and possible user data leakage.

To cope with these problems, we consider using the FL paradigm for model training and describe the FL process of training the LSTM model and the DRL agents as follows. In the collection phase, all the VNFs collect the user information, including user requests and positions, for Ψ periods. After that, the VNFs train the LSTM model locally and upload the model parameters to the cloud for parameter aggregation, which is generally achieved by the FedAvg method [48]; the aggregated new parameters are then broadcast to all VNFs. In the migration phase, the VNFs send the parameters of the local DRL agents instead. We use τ to denote the iteration index for model training and update. Denote by D_v the dataset of VNF v, which can be the user information for prediction model training or the experience for NSM decision-making policy learning. For each sample d in D_v, we denote the loss function as l_d(w). Here, l_d(w) can be the MSE between the predicted user information of user u, i.e., Ŷ_{u,1}, Ŷ_{u,2}, . . . , Ŷ_{u,Ψζ−θ}, and the real user information Y_{u,1}, Y_{u,2}, . . . , Y_{u,Ψζ−θ} in the prediction model learning process, or the loss function (18) in the DRL learning process. Denoting the parameter of v as w_v, we can calculate the local loss function at VNF v by

    L_v(w_v) = (1/|D_v|) Σ_{d∈D_v} l_d(w).    (19)

Denoting the dataset of all VNFs in the system as D = ∪_v D_v, the global loss function can be given by

    L(w) = (1/|D|) Σ_{d∈D} l_d(w) = (1/|D|) Σ_{s∈S} Σ_{v∈V_s} |D_v| L_v(w_v).    (20)
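Returning to the DDQN update of Eqs. (17)-(18): the key point is that the next action is selected with the main network but evaluated with the target network. A minimal tabular sketch (our own toy Q-values, not the paper's DNN):

```python
# Double-DQN target y for one transition, per Eq. (18):
# select the next action with the main network, evaluate it with the target network.
def ddqn_target(reward, q_main_next, q_target_next, gamma=0.9):
    """q_*_next: Q-values over actions in the next state S_v(T+1)."""
    a_star = max(range(len(q_main_next)), key=lambda a: q_main_next[a])
    return reward + gamma * q_target_next[a_star]

# Toy example: the main net prefers action 1; the target net evaluates it.
y = ddqn_target(reward=10.0, q_main_next=[1.0, 5.0, 3.0],
                q_target_next=[2.0, 4.0, 9.0])   # 10 + 0.9 * 4.0 = 13.6
loss = (y - 12.0) ** 2  # squared TD error of Eq. (18) for Q = 12.0
```

A plain DQN would instead use max(q_target_next) = 9.0 here; the decoupling is what curbs the overestimation mentioned above.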


Algorithm 3: Federated Network Slice Mobility Framework Implementation Algorithm
Input: N, ζ, P_u(T), ∀u ∈ U.
1: Initialize: Set the available resources of the ESs as their total resources, i.e., R^ava_n(0) = R_n, ∀n ∈ N.
2: for n ∈ N do
3:   Get the serving users from U if N_u(0) = n, denoted as U_n(0).
4:   Classify the serving users according to p_u and generate the corresponding VNFs.
5: VNFs initialize the local prediction model parameters.
6: for T ≤ Ψ do
7:   Each VNF v collects the serving users' information R^req_u(t) and x_u(t), y_u(t), ∀u ∈ U_v, ∀t.
8:   v updates the local model's parameters by (21).
9:   v sends the local model's parameters to the cloud.
10:  The cloud aggregates the global parameters of the LSTM model by (22) and dispatches them to all VNFs.
11: VNFs initialize the local DRL agent parameters.
12: for T ≥ Ψ + 1 do
13:  Each VNF v gets S_v(T), A_v(T), R_v(T), and S_v(T + 1) based on Algorithm 2.
14:  v updates the local agent's parameters by (21).
15:  v sends the local agent's parameters to the cloud.
16:  The cloud aggregates the global parameters of the DRL agents by (22) and dispatches them to all VNFs.
17: Perform Lines 7-10 after several periods.

To find the optimal parameter w* = arg min_w L(w), each VNF v computes its parameters w^τ_v according to the update rule

    w^τ_v = w^{τ−1}_v − ϱ∇L_v(w^{τ−1}_v),    (21)

where ϱ > 0 denotes the gradient step size, and the global parameter w^τ can be updated by

    w^τ = Σ_{s∈S} Σ_{v∈V_s} |D_v| w^τ_v / |D|.    (22)

Thus, based on the analysis above, we can integrate the overall framework with FL as shown in Algorithm 3. In the initialization phase, all the BSs get their serving users by the user positions, receive the users' request information, including the requested resources and priorities, and classify the users into different groups according to their request priorities. Based on the groups (as well as the priorities), the BSs generate network slices and allocate resources for each VNF of the slices (Lines 2-4). Note that, to avoid the resource-unavailability concern in the initialization phase, we consider the users' requested resources to be slight at the beginning. Afterward, in the collection phase, the VNFs initialize the parameters of the prediction model and collect their users' information, including user requests and user positions, to train the local model by updating the model parameters based on (21); the local parameters are then sent to the cloud for parameter aggregation by (22). The aggregated model is dispatched back to all VNFs, and this process iterates until it converges (Lines 6-10). In the migration phase, the VNFs initialize the parameters of their local agents and get the state with the current observations and future predictions. Based on the state information, the local agents give the actions under the ϵ-greedy policy, and the VNFs then migrate to the chosen hosts, where they stay for an overall period. The total profit of this period is set as the reward for the agent, and the next state can be determined at the next period's first time slot. The stored tuples, sampled in mini-batches, are utilized for the local parameter updates of the DRL agents based on (21), and the parameters are then aggregated at the cloud based on (22) (Lines 12-16). After several periods, the local prediction model is also updated based on new user information data and is aggregated and dispatched back to all VNFs (Line 17).

E. Complexity Analysis

In Algorithm 1, Line 1 takes O((Ψζ − θ) × |U|) to split the dataset for |U| users into Ψζ − θ sample pairs. Afterward, Lines 3-6 take O(EF) to train the global LSTM model, where E is the number of training epochs and O(F) is the computation complexity of LSTM training, which is related to the number of LSTM parameters [49]. In our case, the input and output dimensions are I + 2, covering the considered resource types and coordinates. Denoting the number of LSTM layers as L, with each layer consisting of U neurons, we have O(F) = O(LU(4LU + 5I + 13)). As calculating the MSE loss takes O(Ψζ − θ) and updating the parameters in each episode takes O(LU(4LU + 5I + 13)), similar to LSTM training, the complexity of Algorithm 1 can be calculated as O(|U| · ((Ψζ − θ) + 2ELU(4LU + 5I + 13))).

The computations of Algorithm 2 mainly come from Lines 4, 11, and 12. In Line 4, when an agent adopts the greedy policy to choose the action, it needs to calculate the Q-function to obtain the action, which takes O(|A_v(T)|) = O(|N| + 1). The complexity of calculating the loss function and updating the parameters is related to the number of agent parameters and the random samples from the replay buffer. Denoting the training process as having E′ episodes, N_ι random samples, and L′ DNN layers with U′ neurons each, the complexity of Algorithm 2 is O((|N| + 1)E′ + E′L′U′N_ι) [50].

In Algorithm 3, Lines 2-4 take O(|N||U|) for the ESs to obtain their users and assign them to the corresponding slices according to their priorities. During Lines 6-10, the VNFs collect user information, update the local parameters, and send them to the cloud in parallel, which takes O(max_v |U_v|((Ψζ − θ) + 2ELU(4LU + 5I + 13))). Similarly, during Lines 12-16, the VNFs update the parameters of the local DRL agents and send them to the cloud, which takes O((|N| + 1)E′ + E′L′U′N_ι). Thus, the complexity of Algorithm 3, as well as of the overall framework, can be expressed as O(|N||U| + max_v |U_v|((Ψζ − θ) + 2ELU(4LU + 5I + 13)) + (|N| + 1)E′ + E′L′U′N_ι). As the training parameters E, E′, et al. are generally constant in simulation, when |N| and |U| grow large enough, the complexity can be closely approximated by O(|N||U|). Compared to other baselines without distributed paradigm integration [1], [15], the complexity of the proposed FL-based scheme does not directly increase with the number of slices.


[Figure omitted: a 100 × 100 m² area with 10 BSs (BS 0-BS 9) connected by black solid lines, and the trajectory of a demo UE across positions 1-5 over 5 time slots.]
Fig. 5. The demo network topology for 10 BSs and the mobility case of a user for 5 time slots.

TABLE II
BASIC SIMULATION PARAMETERS

Parameter                              Value
The size of the area                   100 × 100 m²
The ES number |N|                      10
The user number of each ES             5
Unsatisfied penalty η                  50
Migration penalty δ                    10
Weights λ_i (i = 1, 2, 3)              0.4, 0.4, 0.2
Priorities                             1, 2, 3
Total resources of ESs                 4, 8, 128
Latency µ_s                            1 - 3 ms
Latency φ_s                            3 - 5 ms
Latency ρ_s                            100 - 120 ms
Latency constraint L_s                 30 - 150 ms
Revenue coefficient ω                  150
Run price per resource unit P          100
Time slot number of each period ζ      5
Learning rate                          0.001
Batch size                             256
Discount factor                        0.9
Memory capacity                        1024
Epsilon start                          0.9
Epsilon end                            0.01
Epsilon decay                          1000
Hidden layer number                    2
Hidden layer neuron number             64

Considering the scalability issue in the 6G system [37], the proposed scheme will be feasible for deploying slices in scenarios with a relatively stable number of BSs and UEs.

V. SIMULATION RESULTS

This part presents the simulation settings, baselines, performance metrics, and evaluation results from different perspectives.

A. Settings

We consider the NSM scenario as a 100 × 100 m² area consisting of several BSs spread geographically evenly, with each BS covering 5 UEs. The network topology of the BSs is generated by the Networkx package in Python [51]. As a case, we present the network topology of 10 BSs according to their positions, as shown in Fig. 5, in which the black solid lines represent the connection relationships of the BSs. Moreover, at each time slot t, we consider that the users in the system randomly move from their current positions to new ones, and we illustrate the coordinate trajectories x_u(t), y_u(t) of a demo user u for 5 time slots by blue dots and red dashed lines. From Fig. 5, we can see that the user first initializes the request at position 1 (next to BS 6), then continuously moves to position 2 (midway between BSs 0 and 2), then to position 3 located at the edge of the scenario, then to position 4, which is served by dense BSs, and finally to position 5 (next to BS 9). We consider three types of resources for user requests, i.e., CPU, RAM, and disk, and set I = 3. Since the resources of the ESs are limited, we set the total resources of each ES as 4 CPU cores, 8 GBits of RAM, and 128 GBits of disk, respectively. The requested resources of user u at time slot t ≥ 1 are randomly set from 0 to 2t CPU cores, 4t GBits of RAM, and 12t disk resources, respectively. To distinguish the users according to their request priorities for slice generation, we set the priorities of users as p_u ∈ {1, 2, 3}, ∀u. To evaluate the communication overhead and transmission time for the FL training comparison, we set the upload speed from the VNFs to the cloud uniformly as 5 Mbits/s, the download speed as 10 Mbits/s, and the number of aggregation rounds for FL training as 10. At last, we run the simulation on a server with two Intel Xeon Gold 6226 2.7 GHz CPUs, 376 GB of RAM, and a Tesla V100 GPU.

Moreover, to measure the customized slice settings, in slice s we set the wireless communication latency µ_s randomly as 1 - 3 ms, the optical communication latency φ_s as 3 - 5 ms, and the latency to the remote cloud ρ_s as 100 - 120 ms. Additionally, we set the latency constraint L_s randomly from 30 - 150 ms. To measure the system cost and profit from the MNO perspective, we set the running price of a resource unit P as 100 and the revenue coefficient ω as 150. To mimic the considered periodical NSM scenario, we set each period to consist of ζ = 5 time slots, predict the 4 time slots after the first time slot, and set χ = 4. The migration of VNFs between two time periods introduces migration costs, and provisioning user requests during an overall time period introduces unsatisfied costs; to measure these costs, we set the migration penalty δ = 10 and the unsatisfied penalty η = 100. Finally, we set the weights λ_i for measuring the prices of the CPU, RAM, and disk resources as 0.4, 0.4, and 0.2, respectively. As a result, we summarize the basic


12

[Figure 6: average reward (×10⁴) vs. training episodes; panels (a)–(d) sweep the learning rate {0.01, 0.001, 0.0001}, batch size {64, 128, 256}, discount factor {0.9, 0.95, 0.99}, and memory capacity {256, 512, 1024}.]
Fig. 6. The average reward of the proposed scheme versus training episodes with different training parameters.
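Batch size and memory capacity, two of the knobs swept in Fig. 6, control the agents' experience replay. A minimal replay-memory sketch under that assumption (the paper does not list its agent implementation, so the class structure below is illustrative):

```python
import random
from collections import deque

class ReplayMemory:
    """Fixed-capacity experience buffer; old transitions are evicted FIFO."""
    def __init__(self, capacity=512):          # MEMORY capacity, cf. Fig. 6(d)
        self.buffer = deque(maxlen=capacity)

    def push(self, state, action, reward, next_state):
        self.buffer.append((state, action, reward, next_state))

    def sample(self, batch_size=128):          # BATCH_SIZE, cf. Fig. 6(b)
        # Uniform sampling breaks the temporal correlation between transitions.
        return random.sample(self.buffer, min(batch_size, len(self.buffer)))

memory = ReplayMemory(capacity=512)
for t in range(1000):                          # capacity 512 keeps the newest 512
    memory.push(t, 0, 1.0, t + 1)
batch = memory.sample(128)
```

The remaining two knobs enter elsewhere: the discount factor only in the TD target and the learning rate only in the optimizer step.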

simulation parameters, including the system configuration and the parameters of LSTM model training and DRL agent training, in Table II.

B. Baseline and Performance Metrics

To evaluate our proposed NSM framework, we consider the corresponding baselines as 1) the DRL-based NSM scheme with no future information prediction (NoPre-DRL) [15], in which the DRL method is adopted in NSM decision-making and no further user information is provided, which means the DRL agents can only make decisions based on the first time slot's observation in each period; 2) the simulated annealing (SA)-based NSM scheme with no future information prediction (NoPre-SA), in which the SA method with 500 iterations for each state in each episode is adopted in NSM decision-making and no further user information is provided; 3) the reset NSM scheme with no future information prediction (NoPre-Reset) [1], in which the VNFs stay in the original physical host and re-allocate the resources based on the first time slot's user requests during periods; and 4) the random NSM scheme with no future information prediction (NoPre-Random), in which the VNFs randomly migrate to a physical host when performing NSM without future prediction information. Additionally, for the proposed prediction-based scheme, to measure how the prediction orders affect the simulation results, we present simulation results with different prediction orders with χ set from 1 to 3 as baselines (Proposed, χ = 1, 2, 3); to measure the effect of prediction deviations, we set the prediction model as an RNN to get different prediction performances as a baseline (Proposed, RNN). Particularly, in extreme cases, to observe the best possible performance of the prediction-based scheme regarding predictions, we collect real users' behavior data after a few rounds of simulations and input the real data as prediction information to the agents to execute the NSM again to achieve an unbiased prediction as a baseline (Proposed, Real); to observe the better performance of the prediction-based scheme regarding decision-making, we run the MDP with prediction information as a baseline, which performs 100 iterations for each state that appears in each episode to explore the better actions of agents through exhaustive search (Proposed, MDP). Note that these two extreme cases serve only as benchmarks for ideal experimental comparisons. In real-world NSM scenarios, they are hard to apply due to unavoidable prediction bias and extensive iterations. Finally, to demonstrate the performance of the FL training paradigm, we compare the proposed scheme without FL to see how FL can reduce communication overhead and transmission time (Proposed, NoFL).

The considered performance metrics should be related to the optimization goal, as well as the intermediate results. Thus, we choose the average long-term system cost $\sum_t \sum_{s\in S} \sum_{v\in V_s} CO_v(t)/\sum_{s\in S}|V_s|$, revenue $\sum_t \sum_{s\in S} \sum_{v\in V_s} RN_v(t)/\sum_{s\in S}|V_s|$, and profit $\sum_t \sum_{s\in S} \sum_{v\in V_s} PR_v(t)/\sum_{s\in S}|V_s|$ as the performance metrics to evaluate the proposed scheme and baselines. Moreover, the average long-term unsatisfied cost $\sum_t \sum_{s\in S} \sum_{v\in V_s} CO_v^{ust}(t)/\sum_{s\in S}|V_s|$ is also set as a performance metric to see the different resource allocation strategies of the considered schemes. Note that the system profit is the final goal of optimization and the most important indicator that the MNOs care about among all the performance metrics.

C. Evaluation Results With Different Learning Parameters

We first illustrate the average reward of the DRL agents of the VNFs to show the convergence performance of the proposed prediction-based NSM scheme with different learning rates, batch sizes, discount factors, and memory capacities, as shown in Fig. 6. From Fig. 6(a), we can observe that the average reward of the agents with the learning rate set to 0.01 fluctuates greatly over the learning episodes, while the rewards with the learning rate set to 0.001 and 0.0001 behave similarly, and finally the rewards for the different learning rates converge to be stable near 0.8 × 10⁵. Particularly, when the learning rate is set to 0.01, the training of agents will occasionally fall into a local optimum, leading to sharp fluctuations in the later stages and rewards hovering around the global optimum. Also, we adopt different batch sizes, discount factors, and memory capacities for agent training, as shown in Fig. 6(b), Fig. 6(c), and Fig. 6(d), respectively. From the figures, we can observe that the average training reward of the DRL agents converges to a stable stage near 0.8 × 10⁵ after 100 episodes, which suggests that the VNFs can adapt to the environment efficiently and the agents can learn stable strategies to make NSM decisions with prediction information.

D. Evaluation Results Versus Different Penalties

1) The Considered Schemes Adopt Different Migration Penalties: We then compare the average unsatisfied cost, cost,


[Figure 7: panels (a) average unsatisfied cost, (b) average cost, (c) average revenue, and (d) average profit vs. the migration penalty δ ∈ {10, 30, …, 110}; schemes: Proposed, Proposed MDP, NoPre-DRL, NoPre-SA, NoPre-Reset, NoPre-Random.]
Fig. 7. The unsatisfied cost, cost, revenue, and profit of prediction-based FDRL versus different migration penalties.
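Fig. 7 plots the long-term averages defined in Section V-B, i.e., per-VNF sums over time divided by the total VNF count $\sum_{s\in S}|V_s|$. A hedged computational sketch (the nested-dict trace layout `traces[s][v]` is an assumption, not the paper's data structure):

```python
def long_term_average(traces):
    """traces[s][v] is a list of per-time-slot values, e.g., CO_v(t) or PR_v(t).
    Returns sum_t sum_s sum_v value / sum_s |V_s|, the long-term per-VNF
    average used as a performance metric."""
    total = sum(sum(series)
                for slice_vnfs in traces.values()
                for series in slice_vnfs.values())
    n_vnfs = sum(len(slice_vnfs) for slice_vnfs in traces.values())
    return total / n_vnfs

# Two slices with two and one VNFs; hypothetical profit PR_v(t) over 3 slots.
profit = {"s1": {"v1": [10, 10, 10], "v2": [20, 20, 20]},
          "s2": {"v3": [30, 30, 30]}}
avg_profit = long_term_average(profit)  # (30 + 60 + 90) / 3 = 60.0
```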

[Figure 8: panels (a) average unsatisfied cost, (b) average cost, (c) average revenue, and (d) average profit vs. the unsatisfied penalty η ∈ {50, 100, …, 300}; schemes: Proposed, Proposed MDP, NoPre-DRL, NoPre-SA, NoPre-Reset, NoPre-Random.]
Fig. 8. The unsatisfied cost, cost, revenue, and profit of prediction-based FDRL versus different unsatisfied penalties.
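Both sweeps vary a penalty coefficient of the per-period cost. A sketch of that cost structure consistent with the default settings of Section V-A (δ = 10, η = 100); the additive form and argument names are assumptions, as the paper's exact cost expression is defined in its system model:

```python
def period_cost(resource_cost, n_migrations, unsatisfied_units,
                delta=10.0, eta=100.0):
    """Running cost plus a migration penalty delta per migrated VNF and an
    unsatisfied penalty eta per unit of unserved resource demand."""
    return resource_cost + delta * n_migrations + eta * unsatisfied_units

# Raising delta leaves the resource and unsatisfied terms untouched, which is
# why the unsatisfied cost stays flat across the sweep in Fig. 7(a).
base = period_cost(1000.0, 3, 2.5)                     # 1000 + 30 + 250
high_delta = period_cost(1000.0, 3, 2.5, delta=110.0)  # 1000 + 330 + 250
```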

revenue, and profit of the considered schemes with different migration penalties δ, as shown in Fig. 7. From Fig. 7(a) and Fig. 7(b), we can see that as the migration penalty δ increases, the unsatisfied cost and cost remain relatively stable, as the migration penalty does not affect the resource allocation strategies that the unsatisfied cost depends on. The proposed scheme and the Proposed MDP scheme have much lower unsatisfied costs and costs than the others, since in these prediction-based schemes, the agents allocate resources for the VNFs more reasonably with future user requests considered. From Fig. 7(c) and Fig. 7(d), we can see that the proposed schemes achieve the highest system revenue and profit, followed by the NoPre-SA and NoPre-DRL schemes, which are very close to each other, the NoPre-Reset scheme, and the NoPre-Random scheme. With the prediction of users' information provided, the proposed schemes can apply more intelligent resource allocation strategies to decide on better physical hosts for the VNFs with users' mobility trajectories considered. Moreover, the NoPre-DRL scheme achieves almost the same performance as the NoPre-SA scheme with 500 iterations and achieves better NSM strategies than the other non-prediction schemes. Overall, compared to the NoPre-DRL, NoPre-SA, NoPre-Reset, and NoPre-Random schemes, the proposed scheme with different migration penalties δ ∈ [10, 30, . . . , 110] can reduce the average unsatisfied cost by up to 49.17%, 44.53%, 51.33%, and 50.32%, reduce the average cost by up to 37.24%, 33.05%, 37.76%, and 36.76%, improve the average revenue by up to 40.21%, 38.27%, 49.30%, and 56.23%, and improve the average profit by up to 46.13%, 43.72%, 55.90%, and 63.38%, respectively. Moreover, in this case (|N| = 10) with different δ, the gap in profit with the Proposed MDP scheme is up to 18.08%.

2) The Considered Schemes Adopt Different Unsatisfied Penalties: Besides, we illustrate the performances of the considered schemes with different unsatisfied penalties η, as shown in Fig. 8. From Fig. 8(a), we can observe that the unsatisfied cost of the considered schemes increases with η, and the proposed schemes have lower unsatisfied costs than the other schemes. As η increases, the advantages of the proposed schemes become more obvious, since the proposed schemes allocate more resources for the VNFs with future requested-resources information in the NSM process, contributing to lower average unsatisfied cost and lower cost. From Fig. 8(c), we can see that the average revenues of the considered schemes fluctuate on small scales since the available resources of the physical ESs run out. In this scenario, even with higher unsatisfied penalties, the VNFs cannot be allocated more resources; even though the cloud has elastic resources, it also has higher latency, introducing lower profit. Thus, the allocated resources of the VNFs remain stable, as do the satisfied resources, leading to relatively stable revenue. The proposed schemes have the highest average revenue, followed by the NoPre-SA, NoPre-DRL, NoPre-Reset, and NoPre-Random schemes. Consequently, though the system cost increases heavily with the unsatisfied penalty η, the system profit only slightly decreases, as shown in Fig. 8(d). To summarize, compared to the NoPre-DRL, NoPre-SA, NoPre-Reset, and NoPre-Random schemes, the proposed scheme with different unsatisfied penalties η ∈ [50, 100, . . . , 300] can


[Figure 9: panels (a) average unsatisfied cost, (b) average cost, (c) average revenue, and (d) average profit vs. the ES number |N|; schemes: Proposed (χ = 4, 3, 2, 1), Proposed MDP, NoPre-DRL, NoPre-SA, NoPre-Reset, NoPre-Random.]
Fig. 9. The performance of considered schemes versus |N | while the proposed scheme adopts different prediction orders.

[Figure 10: panels (a) average unsatisfied cost, (b) average cost, (c) average revenue, and (d) average profit vs. the ES number |N|; schemes: Proposed (LSTM), Proposed (RNN), Proposed (Real), Proposed MDP, NoPre-DRL, NoPre-SA, NoPre-Reset, NoPre-Random.]
Fig. 10. The performance of considered schemes versus |N | while the proposed scheme adopts different prediction models.

reduce the average unsatisfied cost by up to 48.96%, 44.14%, 50.97%, and 49.97%, reduce the average cost by up to 44.39%, 40.26%, 46.29%, and 45.21%, improve the average revenue by up to 41.08%, 39.49%, 50.23%, and 57.04%, and improve the average profit by up to 56.95%, 53.75%, 69.81%, and 79.01%, respectively. Moreover, in this case (|N| = 10) with different η, the gap in profit with the Proposed MDP scheme is up to 31.11%.

E. Evaluation Results Versus Different Numbers of ESs

1) The Proposed Scheme Adopts Different Prediction Orders: To figure out how the prediction order χ affects the evaluation performance, we set the proposed scheme with different prediction orders χ and further compare the performances of the considered schemes versus different ES numbers, as shown in Fig. 9. From Fig. 9(a) and Fig. 9(b), we can observe that the average unsatisfied cost and cost of the considered schemes fluctuate evenly with the increase of the ES number |N|, which suggests that the unsatisfied cost and cost are not affected by the number of ESs, since the number of VNFs also increases with |N|. Particularly, when |N| = 25, we find that the UEs have relatively fewer requested resources compared to the other scenarios, leading to less unsatisfied cost. The Proposed MDP scheme (χ = 4) has the lowest unsatisfied cost and system cost, followed by the proposed scheme with χ = 4, 3, 2, the non-prediction schemes, and the proposed scheme with χ = 1. The non-prediction schemes have close performance to each other, as their resource allocation strategies are based on the first time slot's requests, and their slight differences lie in the chosen facilities for VNF migration, e.g., choosing an ES with more available resources will contribute to lower unsatisfied cost and cost. Particularly, the proposed scheme with χ = 1 has the highest unsatisfied cost and cost, which indicates that the prediction of χ = 1 provides the VNFs with a negative effect that lets the VNFs allocate much fewer resources, leading to higher unsatisfied costs.

From Fig. 9(c) and Fig. 9(d), when |N| ∈ [5, 20], we can see that the Proposed MDP scheme has the highest average revenue and profit, followed by the proposed scheme with χ = 4, 3, 2, the NoPre-SA, NoPre-DRL, and NoPre-Reset schemes, the proposed scheme with χ = 1, and finally the NoPre-Random scheme. Note that the performances of the Proposed MDP and NoPre-SA schemes significantly decrease with |N|, which demonstrates that the exhaustive search-based schemes can hardly deal with scenarios with large numbers of BSs. The average revenue and profit decrease with |N| since the complexity of the network topology increases with the ES number |N|, leading to higher latency between the physical hosts of the VNFs and the users' connected ESs and thus causing lower average revenue/profit. When |N| = 25, since the requested resources decrease, the agents have more candidate physical hosts to choose from for migration and thus can migrate to physical hosts with lower access latencies, contributing to a slight increase of revenue/profit. When |N| ∈ [25, 30], the proposed scheme with χ = 4 has higher revenue/profit compared to the Proposed MDP scheme, and the NoPre-SA scheme also falls behind the NoPre-DRL scheme.


[Figure 11: panels (a) average unsatisfied cost, (b) average cost, (c) average revenue, and (d) average profit vs. the ES number |N|; schemes: Proposed (FL), Proposed (NoFL), Proposed MDP, NoPre-DRL, NoPre-SA, NoPre-Reset, NoPre-Random.]
Fig. 11. The performance of considered schemes versus |N | while the proposed scheme adopts different training paradigms.

[Figure 12: panels (a) average communication overhead (MB) and (b) average transmission time (s) vs. the ES number |N|, for the Proposed (FL) and NoFL training paradigms.]

Fig. 12. The communication overhead and transmission time versus |N | when the proposed scheme adopts different training paradigms.
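Figs. 11 and 12 contrast federated and centralized training: under FL, only model parameters leave the VNFs, not the collected user data. A minimal FedAvg-style aggregation sketch (plain Python lists stand in for parameter vectors; the sample-count weighting is the standard FedAvg rule and an assumption about the paper's implementation):

```python
def fedavg(local_weights, sample_counts):
    """Sample-count-weighted average of the VNFs' local model parameters.
    Only these small vectors cross the network, not the raw user traces."""
    total = sum(sample_counts)
    dim = len(local_weights[0])
    return [sum(w[i] * n for w, n in zip(local_weights, sample_counts)) / total
            for i in range(dim)]

# Three VNF agents with uneven local data volumes.
global_w = fedavg([[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]], [10, 10, 20])
# ((1*10 + 3*10 + 5*20)/40, (2*10 + 4*10 + 6*20)/40) = (3.5, 4.5)
```

Since the upload size is the parameter dimension rather than the trace length, the per-round overhead stays fixed as the number of ESs and users grows, matching the gap in Fig. 12(a).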

Overall, compared to the NoPre-DRL, NoPre-SA, NoPre-Reset, and NoPre-Random schemes, the proposed scheme with different prediction orders χ ∈ [4, 3, . . . , 1] can reduce the average unsatisfied cost by up to 71.15%, 64.86%, 69.81%, and 70.96%, reduce the average cost by up to 62.97%, 56.36%, 60.26%, and 61.67%, improve the average revenue by up to 40.90%, 55.88%, 68.53%, and 66.34%, and improve the average profit by up to 56.29%, 70.44%, 82.37%, and 82.99%, respectively. Moreover, as |N| increases from 5 to 30, the gaps in profit with the Proposed MDP scheme are 30.05%, 18.07%, 15.76%, 16.61%, −10.97%, and −6.54%, respectively, which demonstrates that the proposed scheme outperforms the Proposed MDP scheme when |N| ≥ 25.

2) The Proposed Scheme Adopts Different Prediction Models: To see how different prediction models affect the NSM performance, we also compare the performances of the considered schemes versus different |N| where the proposed scheme adopts different prediction models, as shown in Fig. 10. Specifically, we utilize the LSTM model and the RNN model to predict future user information to get different prediction performances, and collect real user request data to achieve an unbiased prediction, presented as the Proposed Real scheme. From Fig. 10(a) and Fig. 10(b), we can observe that the average unsatisfied cost and cost of the considered schemes fluctuate evenly with the increase of |N|; the Proposed Real scheme gets the lowest average unsatisfied cost and system cost (very close to the Proposed MDP), followed by the proposed scheme with the LSTM model and the RNN model. The baseline schemes without prediction have very close performance and all incur high average unsatisfied cost and system cost compared to the proposed scheme with the different models. From Fig. 10(c) and Fig. 10(d), we can see that the average revenue and profit decrease with |N|; the Proposed MDP scheme has the highest system revenue/profit when |N| ∈ [5, 20], and the Proposed Real scheme is much better when |N| ∈ [25, 30], followed by the proposed scheme with the LSTM model and the RNN model and the non-prediction schemes. To summarize, compared to the NoPre-DRL, NoPre-SA, NoPre-Reset, and NoPre-Random schemes, the proposed scheme with the different prediction models adopted can reduce the average unsatisfied cost by up to 122.11%, 113.94%, 120.37%, and 121.86%, reduce the average cost by up to 104.34%, 96.05%, 100.95%, and 102.71%, improve the average revenue by up to 47.90%, 61.81%, 72.76%, and 70.86%, and improve the average profit by up to 63.20%, 75.11%, 85.16%, and 85.67%, respectively. Moreover, as |N| increases from 5 to 30, the gaps in profit with the Proposed MDP scheme are 15.47%, 3.39%, −3.82%, −0.96%, −25.09%, and −21.31%, respectively, which demonstrates that the proposed scheme with the best predictions outperforms the Proposed MDP scheme when |N| ≥ 15.


3) The Proposed Scheme Adopts Different Training Paradigms: Finally, to see how the FL paradigm affects the NSM performance, we compare the performance of the considered schemes versus different |N| where the proposed scheme adopts different training paradigms, as shown in Fig. 11. Here, we utilize the centralized training paradigm to train the LSTM and DRL models as a baseline, in which all the VNFs upload the collected user data in the collection phase and the interactive data in the migration phase to the remote cloud for model training. From Fig. 11(a) to Fig. 11(d), we can observe that the NSM performances of the proposed FL-based scheme and the non-FL-based scheme are very close to each other, but according to Fig. 12(a) and Fig. 12(b), the non-FL-based scheme has much higher communication overhead and transmission time for model training, especially for the cases with more ESs, which demonstrates that the proposed FL-based scheme can significantly reduce the communication overhead and transmission time for model training while ensuring the NSM performance. Particularly, in Fig. 12(b) we notice that the transmission time of the proposed FL-based scheme remains relatively stable when we have fewer than 25 ESs, since in these scenarios the numbers of users each VNF is serving, i.e., U_v, are even, contributing to lower overall local model training time (which depends on the largest local model training time among all VNFs). Overall, the proposed FL-based scheme can reduce the communication overhead by up to 71.39% and save the transmission time by over 3.67 times compared to the non-FL-based scheme.

VI. CONCLUSION

In this paper, we have investigated the NSM problem from a periodical view to mimic that a VNF should be migrated only after the NSM trigger is detected. We designed a general system model for periodical NSM and formulated the problem of maximizing the long-term profit of MNOs. To solve this problem, we first modeled the NSM decision-making process as an MDP and creatively proposed a prediction-based FDRL framework to solve it. The prediction of future system information is adopted as a supplement to the system state, and we set the overall feedback for each period as the system reward. By performing extensive experiments, simulation results have demonstrated that our proposed scheme outperforms baseline schemes in improving long-term profit and can significantly reduce communication overhead and save transmission time.

REFERENCES

[1] H. Yu, Z. Ming, C. Wang, and T. Taleb, “Network slice mobility for 6G networks by exploiting user and network prediction,” in Proc. IEEE International Conference on Communications (ICC), May 2023, pp. 4905–4911.
[2] B. Ji, Y. Han, S. Liu, F. Tao, G. Zhang, Z. Fu, and C. Li, “Several key technologies for 6G: challenges and opportunities,” IEEE Commun. Stand. Mag., vol. 5, no. 2, pp. 44–51, June 2021.
[3] Y. Liu, Y. Deng, A. Nallanathan, and J. Yuan, “Machine learning for 6G enhanced ultra-reliable and low-latency services,” IEEE Wirel. Commun., vol. 30, no. 2, pp. 48–54, Apr. 2023.
[4] T. K. Rodrigues and N. Kato, “Network slicing with centralized and distributed reinforcement learning for combined satellite/ground networks in a 6G environment,” IEEE Wirel. Commun., vol. 29, no. 1, pp. 104–110, Feb. 2022.
[5] K. Smida, H. Tounsi, M. Frikha, and Y.-Q. Song, “Fens: Fog-enabled network slicing in SDN/NFV-based IoV,” Wirel. Pers. Commun., vol. 128, no. 3, pp. 2175–2202, Sept. 2023.
[6] Y. Wu, H.-N. Dai, H. Wang, Z. Xiong, and S. Guo, “A survey of intelligent network slicing management for industrial IoT: Integrated approaches for smart transportation, smart energy, and smart factory,” IEEE Commun. Surv. Tutorials, vol. 24, no. 2, pp. 1175–1211, Apr. 2022.
[7] Z. Shu, T. Taleb, and J. Song, “Resource allocation modeling for fine-granular network slicing in beyond 5G systems,” IEICE Trans. Commun., vol. 105, no. 4, pp. 349–363, Apr. 2022.
[8] X. Tang, L. Zhao, J. Chong, Z. You, L. Zhu, H. Ren, Y. Shang, Y. Han, and G. Li, “5G-based smart healthcare system designing and field trial in hospitals,” IET Commun., vol. 16, no. 1, pp. 1–13, Jan. 2022.
[9] S. Karunarathna, S. Wijethilaka, P. Ranaweera, K. T. Hemachandra, T. Samarasinghe, and M. Liyanage, “The role of network slicing and edge computing in the metaverse realization,” IEEE Access, vol. 11, pp. 25502–25530, Mar. 2023.
[10] Y. Sun, S. Qin, G. Feng, L. Zhang, and M. A. Imran, “Service provisioning framework for RAN slicing: user admissibility, slice association and bandwidth allocation,” IEEE Trans. Mob. Comput., vol. 20, no. 12, pp. 3409–3422, Dec. 2020.
[11] A. Papa, A. Jano, S. Ayvaşık, O. Ayan, H. M. Gürsu, and W. Kellerer, “User-based quality of service aware multi-cell radio access network slicing,” IEEE Trans. Netw. Serv. Manag., vol. 19, no. 1, pp. 756–768, Mar. 2021.
[12] Y. B. Slimen, J. Balcerzak, A. Pagès, F. Agraz, S. Spadaro, K. Koutsopoulos, M. Al-Bado, T. Truong, P. G. Giardina, and G. Bernini, “Quality of perception prediction in 5G slices for e-health services using user-perceived QoS,” Comput. Commun., vol. 178, pp. 1–13, May 2021.
[13] S. Choudhury, S. Das, S. Paul, I. Seskar, and D. Raychaudhuri, “Intelligent agent support for achieving low latency in cloud-native nextg mobile core networks,” in Proc. ACM International Conference on Distributed Computing and Networking (ICDCN), Jan. 2023, pp. 12–19.
[14] R. A. Addad, T. Taleb, H. Flinck, M. Bagaa, and D. Dutra, “Network slice mobility in next generation mobile systems: challenges and potential solutions,” IEEE Netw., vol. 34, no. 1, pp. 84–93, Jan. 2020.
[15] R. A. Addad, D. L. C. Dutra, T. Taleb, and H. Flinck, “Toward using reinforcement learning for trigger selection in network slice mobility,” IEEE J. Sel. Areas Commun., vol. 39, no. 7, pp. 2241–2253, July 2021.
[16] W. Wang, Q. Chen, X. He, and L. Tang, “Cooperative anomaly detection with transfer learning-based hidden Markov model in virtualized network slicing,” IEEE Commun. Lett., vol. 23, no. 9, pp. 1534–1537, Sept. 2019.
[17] F. Rezazadeh, H. Chergui, L. Christofi, and C. Verikoukis, “Actor-critic-based learning for zero-touch joint resource and energy control in network slicing,” in Proc. IEEE International Conference on Communications (ICC), June 2021, pp. 1–6.
[18] J. Zhou, W. Zhao, and S. Chen, “Dynamic network slice scaling assisted by prediction in 5G network,” IEEE Access, vol. 8, pp. 133700–133712, July 2020.
[19] Y. Xu, J. Yu, and R. M. Buehrer, “The application of deep reinforcement learning to distributed spectrum access in dynamic heterogeneous environments with partial observations,” IEEE Trans. Wireless Commun., vol. 19, no. 7, pp. 4494–4506, May 2020.
[20] A. M. Ibrahim, K.-L. A. Yau, Y.-W. Chong, and C. Wu, “Applications of multi-agent deep reinforcement learning: Models and algorithms,” Appl. Sci., vol. 11, no. 22, p. 10870, Nov. 2021.
[21] J. Hao, T. Yang, H. Tang, C. Bai, J. Liu, Z. Meng, P. Liu, and Z. Wang, “Exploration in deep reinforcement learning: From single-agent to multiagent domain,” IEEE Trans. Neural Netw. Learn. Syst., pp. 1–21, Jan. 2023.
[22] X. Chen, G. Han, Y. Bi, Z. Yuan, M. K. Marina, Y. Liu, and H. Zhao, “Traffic prediction-assisted federated deep reinforcement learning for service migration in digital twins-enabled MEC networks,” IEEE J. Sel. Areas Commun., vol. 41, no. 10, pp. 3212–3229, Aug. 2023.
[23] J. Cai, A. Du, X. Liang, and S. Li, “Prediction-based path planning for safe and efficient human–robot collaboration in construction via deep reinforcement learning,” J. Comput. Civil Eng., vol. 37, no. 1, p. 04022046, Oct. 2022.
[24] W. Wang, C. Liang, Q. Chen, L. Tang, H. Yanikomeroglu, and T. Liu, “Distributed online anomaly detection for virtualized network slicing environment,” IEEE Trans. Veh. Technol., vol. 71, no. 11, pp. 12235–12249, July 2022.


[25] X. Vasilakos, N. Nikaein, D. H. Lorenz, B. Koksal, and N. Ferdosian, “Integrated methodology to cognitive network slice management in virtualized 5G networks,” arXiv preprint arXiv:2005.04830, May 2020.
[26] Y. Xiao and M. Krunz, “Dynamic network slicing for scalable fog computing systems with energy harvesting,” IEEE J. Sel. Areas Commun., vol. 36, no. 12, pp. 2640–2654, Sept. 2018.
[27] H. Halabian, “Distributed resource allocation optimization in 5G virtualized networks,” IEEE J. Sel. Areas Commun., vol. 37, no. 3, pp. 627–642, Mar. 2019.
[28] S. Dawaliby, A. Bradai, and Y. Pousset, “Distributed network slicing in large scale IoT based on coalitional multi-game theory,” IEEE Trans. Netw. Serv. Manag., vol. 16, no. 4, pp. 1567–1580, Dec. 2019.
[29] M. A. Hossain and N. Ansari, “Energy aware latency minimization for network slicing enabled edge computing,” IEEE Trans. Green Commun. Netw., vol. 5, no. 4, pp. 2150–2159, Dec. 2021.
[30] Y. E. Oktian, S. Lee, H. Lee, and J. Lam, “Distributed SDN controller system: A survey on design choice,” Comput. Netw., vol. 121, pp. 100–111, Apr. 2017.
[31] H. Chergui, L. Blanco, L. A. Garrido, K. Ramantas, S. Kukliński, A. Ksentini, and C. Verikoukis, “Zero-touch AI-driven distributed management for energy-efficient 6G massive network slicing,” IEEE Netw., vol. 35, no. 6, pp. 43–49, Nov. 2021.
[32] Z. Zhu and H. Zhao, “A survey of deep RL and IL for autonomous driving policy learning,” IEEE Trans. Intell. Transp. Syst., vol. 23, no. 9, pp. 14043–14065, Dec. 2021.
[33] J. Zeng, D. Ding, K. Kang, H. Xie, and Q. Yin, “Adaptive DRL-based virtual machine consolidation in energy-efficient cloud data center,” IEEE Trans. Parallel Distrib. Syst., vol. 33, no. 11, pp. 2991–3002, Feb. 2022.
[34] Y. Tu, H. Chen, L. Yan, and X. Zhou, “Task offloading based on LSTM prediction and deep reinforcement learning for efficient edge computing in IoT,” Future Internet, vol. 14, no. 2, p. 30, Jan. 2022.
[35] A. Havens, Y. Ouyang, P. Nagarajan, and Y. Fujita, “Learning latent state spaces for planning through reward prediction,” arXiv preprint arXiv:1912.04201, 2019.
[36] S. Q. Jalil, S. Chalup, and M. H. Rehmani, “Cognitive radio spectrum sensing and prediction using deep reinforcement learning,” in Proc. International Joint Conference on Neural Networks (IJCNN), Sept. 2021, pp. 1–8.
[37] F. Rezazadeh, L. Zanzi, F. Devoti, H. Chergui, X. Costa-Perez, and C. Verikoukis, “On the specialization of FDRL agents for scalable and distributed 6G RAN slicing orchestration,” IEEE Trans. Veh. Technol., vol. 72, no. 3, pp. 3473–3487, Oct. 2022.
[38] J. Liu, S. Zhang, H. Nishiyama, N. Kato, and J. Guo, “A stochastic geometry analysis of D2D overlaying multi-channel downlink cellular networks,” in Proc. IEEE Conference on Computer Communications (INFOCOM), Aug. 2015, pp. 46–54.
[39] H. Abdah, J. P. Barraca, and R. L. Aguiar, “QoS-aware service continuity in the virtualized edge,” IEEE Access, vol. 7, pp. 51570–51588, Apr. 2019.
[40] L. Zanzi, V. Sciancalepore, A. Garcia-Saavedra, H. D. Schotten, and X. Costa-Pérez, “LACO: A latency-driven network slicing orchestration in beyond-5G networks,” IEEE Trans. Wireless Commun., vol. 20, no. 1, pp. 667–682, Oct. 2020.
[41] S. O. Oladejo and O. E. Falowo, “Latency-aware dynamic resource allocation scheme for multi-tier 5G network: A network slicing-multitenancy scenario,” IEEE Access, vol. 8, pp. 74834–74852, Apr. 2020.
[42] L. Meng, Y. Ren, B. Zhang, J.-Q. Li, H. Sang, and C. Zhang, “MILP modeling and optimization of energy-efficient distributed flexible job shop scheduling problem,” IEEE Access, vol. 8, pp. 191191–191203, Oct. 2020.
[43] Y. Yu, X. Si, C. Hu, and J. Zhang, “A review of recurrent neural
[47] […] Advances in Neural Information Processing Systems (NIPS), vol. 34, pp. 20436–20446, Dec. 2021.
[48] Y. Zhou, Q. Ye, and J. Lv, “Communication-efficient federated learning with compensated overlap-fedavg,” IEEE Trans. Parallel Distrib. Syst., vol. 33, no. 1, pp. 192–205, Jan. 2021.
[49] H. Sak, A. W. Senior, and F. Beaufays, “Long short-term memory recurrent neural network architectures for large scale acoustic modeling,” arXiv preprint arXiv:1402.1128, 2014.
[50] X. Wang, C. Wang, X. Li, V. C. M. Leung, and T. Taleb, “Federated deep reinforcement learning for internet of things with decentralized cooperative edge caching,” IEEE Internet Things J., vol. 7, no. 10, pp. 9441–9455, Oct. 2020.
[51] A. A. Hagberg, D. A. Schult, and P. J. Swart, “Exploring network structure, dynamics, and function using networkx,” in Proc. Python in Science Conference (SciPy), Jan. 2008, pp. 11–15.

Zhao Ming is currently a Ph.D. student at the Centre for Wireless Communications (CWC), University of Oulu, Oulu, Finland. He received the M.E. degree from Chongqing University (CQU), Chongqing, China, in 2022, and the B.S. degree from Wuhan University of Technology (WHUT), Wuhan, China, in 2018. His current research interests include network slicing, multi-access edge computing, anomaly detection, and AI/ML technologies and their combinations.

Hao Yu received the B.E. and Ph.D. degrees in communication engineering from the Beijing University of Posts and Telecommunications (BUPT), Beijing, China, in 2015 and 2020. He was also a joint-supervised Ph.D. student with the Politecnico di Milano, Milano, Italy, and was a Postdoctoral Researcher with the Center of Wireless Communications (CWC), University of Oulu, Finland. He is currently a senior researcher at ICTFICIAL Oy, Espoo, Finland. His research interests include network automation, SDN/NFV, time-sensitive networks, and deterministic networking.

Tarik Taleb (Senior Member, IEEE) received the B.E. degree (with distinction) in information engineering and the M.Sc. and Ph.D. degrees in information sciences from Tohoku University, Sendai, Japan, in 2001, 2003, and 2005, respectively. He is currently a Full Professor at Ruhr University Bochum, Germany. He was a Professor with the Center of Wireless Communications (CWC), University of Oulu, Oulu, Finland. He is the founder of ICTFICIAL Oy, and the founder and the Director of
the MOSA!C Lab, Espoo, Finland. From October
networks: LSTM cells and network architectures,” Neural Comput.,
2014 to December 2021, he was an Associate Professor with the School
vol. 31, no. 7, pp. 1235–1270, July 2019.
of Electrical Engineering, Aalto University, Espoo, Finland. Prior to that,
[44] R. Shashidhar, S. Patilkulkarni, and S. Puneeth, “Combining audio and he was working as a Senior Researcher and a 3GPP Standards Expert with
visual speech recognition using LSTM and deep convolutional neural NEC Europe Ltd., Heidelberg, Germany. Before joining NEC and till March
network,” INT. J. Inf. Technol., vol. 14, no. 7, pp. 3425–3436, Dec. 2009, he worked as an Assistant Professor with the Graduate School of
2022. Information Sciences, Tohoku University, in a lab fully funded by KDDI.
[45] D. E. Rumelhart, G. E. Hinton, and R. J. Williams, “Learning repre- From 2005 to 2006, he was a Research Fellow with the Intelligent Cosmos
sentations by back-propagating errors,” Nature, vol. 323, no. 6088, pp. Research Institute, Sendai. He has been directly engaged in the development
533–536, Oct. 1986. and standardization of the Evolved Packet System as a member of the
[46] Z. Zhang, “Improved Adam optimizer for deep neural networks,” 3GPP System Architecture Working Group. His current research interests
in Proc. IEEE/ACM International Symposium on Quality of Service include AI-based network management, architectural enhancements to mobile
(IWQoS), Jan. 2018, pp. 1–2. core networks, network softwarization and slicing, mobile cloud networking,
[47] Y. Fei, Z. Yang, Y. Chen, and Z. Wang, “Exponential bellman equation SDN/NFV, software-defined security, and mobile multimedia streaming.
and improved regret bounds for risk-sensitive reinforcement learning,”

This work is licensed under a Creative Commons Attribution 4.0 License. For more information, see https://creativecommons.org/licenses/by/4.0/
