Online Optimal Service Caching for Multi-Access Edge Computing
Computer Networks
journal homepage: www.elsevier.com/locate/comnet
Keywords: Multi-access edge computing; Service selection; Service caching; Constrained multi-armed bandit; Online algorithm

Abstract: In order to fully exploit the power of Multi-Access Edge Computing (MEC), services need to be cached at the network edge in an adaptive and responsive way to accommodate the high system dynamics and uncertainty. In this paper, we study the online service caching problem in MEC, with the goal of minimizing users' perceived latency while, at the same time, ensuring that the rate of tasks processed by the edge server is no less than a preset threshold. We model the problem with a Constrained stochastic Multi-Armed Bandit formulation, and propose a simple yet effective online caching algorithm called Constrained Confidence Bound (CCB). CCB achieves O(√(T ln T)) bounds on both regret and violation of the constraint, and is able to achieve a good balance between them. We further consider the scenario when there is cost (i.e., delay) due to service switches, and propose two service switch-aware caching algorithms, Explore-First (EF) and Successive Elimination-based (SE) caching, together with a novel sampling scheme. We prove that EF achieves an O(T^(2/3) (ln T)^(1/3)) bound on regret and violation, whereas SE achieves O(√(T ln T)) and converges significantly faster. Lastly, we conduct extensive simulations to evaluate our algorithms, and the results demonstrate their superior performance over baselines.
1. Introduction

It has been shown repeatedly that innovative network architectures and key technologies enable and empower crucial applications. It is our consensus today that the reverse is also true, as applications are shaping and changing network architectures and technologies. Take for example emerging applications such as autonomous driving [1], AR/VR [2], and networked gaming. Being resource-hungry and delay-sensitive, these applications impose stringent requirements on both computing and networking capacity, which cannot be met solely by existing cloud systems due to the long propagation delay and unstable network connections. In this context, a new network computing paradigm called multi-access edge computing (MEC [3,4]) has been put forward. The key feature of MEC is that services are hosted at various types of edge nodes endowed with computing/storage/communication capacities, so that low-latency access to services is possible.

With MEC, users can offload their tasks to the edge nodes (a.k.a. MEC servers) for high energy efficiency, fast responses, and enhanced security/privacy protection [5]. However, compared with cloud infrastructure (e.g., data centers), which can virtually host all the services with abundant resources, MEC servers are often resource-constrained and can only accommodate a limited number of services. For example, network operators usually implement cloudlet-based mobile computing using a computing server with small resources or a cluster with medium resources. This raises the service placement problem of when and where to host the services at the edge nodes. Apparently, the performance of MEC varies significantly depending on service placement.

The service placement problem (SPP), sometimes also referred to as service caching [6], has attracted a lot of research in the past few years, and various algorithms [7-10], i.e., exact/approximate, static/dynamic, centralized/decentralized, have been proposed. Yet designing an optimal policy for service caching remains a challenge due to the high heterogeneity and dynamics of both the system and the workload. The problem becomes even more challenging when we consider it in an online setting, where the caching decisions have to be made as the system operates, but without a priori knowledge of the user-generated workload and network conditions. In fact, critical information such as task offloading delays is stochastic and unobservable unless the services are cached. Moreover, from a practical point of view, it is expected that online caching algorithms can provide provable performance guarantees.
∗ Corresponding author.
E-mail addresses: [email protected] (W. Chu), [email protected] (X. Zhang), [email protected] (X. Jia), [email protected]
(J.C.S. Lui), [email protected] (Z. Wang).
https://doi.org/10.1016/j.comnet.2024.110395
Received 19 September 2023; Received in revised form 29 February 2024; Accepted 2 April 2024
Available online 16 April 2024
1389-1286/© 2024 Elsevier B.V. All rights reserved.
W. Chu et al. Computer Networks 246 (2024) 110395
Table 1
Main notations.

  S                       Set of services in the system, |S| = K
  L                       Server capacity
  h                       Threshold on the rate of tasks processed by the MEC server
  δ                       Failure probability
  β_i^t                   Arrival rate of tasks for service i at time t
  m_i^t                   MEC offloading delay of tasks for service i at time t
  c_i^t                   Cloud computing delay of tasks for service i at time t
  (β_i, m_i, c_i)         Expectations of (β_i^t, m_i^t, c_i^t)
  (β̄_i^t, m̄_i^t, c̄_i^t)    Empirical means of ({β_i^t}, {m_i^t}, {c_i^t}) up to time t
  (β̌_i^t, m̌_i^t, č_i^t)    Lower confidence bounds for (β_i, m_i, c_i) at time t
  (β̂_i^t, m̂_i^t, ĉ_i^t)    Upper confidence bounds for (β_i, m_i, c_i) at time t
  m_{i,11}^t              MEC offloading delay of tasks for service i of type 11 at time t
  UCB_t(m_{i,11})         Upper confidence bound for m_{i,11} at time t
  LCB_t(m_{i,11})         Lower confidence bound for m_{i,11} at time t
  x*                      Optimal caching policy
  x^t                     Caching decision at time t
  x^t(π)                  Caching decision made by algorithm π at time t
  S_t(π)                  Set of services selected by algorithm π at time t

Let x^t = (x_1^t, x_2^t, ..., x_K^t)^T be the caching decision at time slot t, where x_i^t = 1 if service i is cached, and x_i^t = 0 otherwise. (Unless otherwise specified, all vectors defined in this paper are column vectors.) Also denote m^t = (m_1^t, m_2^t, ..., m_K^t)^T, c^t = (c_1^t, c_2^t, ..., c_K^t)^T, and β^t = (β_1^t, β_2^t, ..., β_K^t)^T.

Here, in addition to optimizing users' perceived latency, we also enforce another constraint on the caching policy: the aggregate rate of tasks processed by the MEC server must be no less than a preset threshold h > 0. Note that this constraint can be regarded as the QoS requirement from the service providers. Indeed, one of the most significant benefits that a service provider can expect from multi-access edge computing is that the vast majority of their tasks are processed at the network edge, so the workload on the cloud can be dramatically decreased. This at the same time alleviates network congestion for the infrastructure provider.

Let 1 be the all-one column vector, i.e., 1 = (1, 1, ..., 1)^T. With the above notations, we can formulate the online service caching problem as the following optimization problem:

  Min_{x^t}  Σ_{t=1}^T (x^t)^T m^t + (1 − x^t)^T c^t                 (1a)
  s.t.  1^T x^t ≤ L, ∀t ≤ T,                                         (1b)
        (β^t)^T x^t ≥ h,                                             (1c)
        x_i^t ∈ {0, 1}, ∀i ∈ S, t ≤ T,                               (1d)

where Σ_{t=1}^T (x^t)^T m^t + (1 − x^t)^T c^t is the aggregate users' perceived delay over the whole time horizon T. Constraint (1b) captures the resource limitation at the MEC server, and (1c) is the QoS requirement. Problem (1) is an integer linear programming (ILP) problem that can be well solved by existing algorithms. The key obstacle here is that in an online setting, {m^t, c^t, β^t} are not known in advance at time slot t; rather, they are revealed only after the caching decision is made.

Remark. The assumption that the network delay between UEs and the MEC server can be ignored rests on the following two considerations: (1) MEC servers are generally deployed at the network edge in close proximity to end-users, whereas clouds are located much further away. This implies that the delay between UEs and the MEC server is much smaller than that between the MEC server and the cloud. Furthermore, it is typically also true that the transmission delay between UEs and the MEC server is much smaller than the task processing delay. (2) To implement multi-access edge computing, network infrastructure providers usually deploy MEC servers at locations such as a base station in a cellular network, or a gateway in an enterprise local area network. In such a setup, it is common to have identical delays between UEs and the MEC server. The overall effect is that this delay can be ignored when we optimize users' perceived latency, as accounting for it would not make much difference.

2.2. Problem formulation

To address the online caching problem (1), we model it with a Constrained Multi-Armed Bandit (CMAB) formulation. More specifically, each service is regarded as an arm, and caching a service is equivalent to pulling an arm. The set of arms can thus be written as S = {1, 2, ..., K}. With each arm i ∈ S we associate three feedback sequences: {m_i^t}_{t=1}^T, {c_i^t}_{t=1}^T, and {β_i^t}_{t=1}^T. We assume that all these sequences consist of i.i.d. random variables.

Let m_i = E[m_i^t], c_i = E[c_i^t], and β_i = E[β_i^t]. Also denote m = (m_1, m_2, ..., m_K)^T, c = (c_1, c_2, ..., c_K)^T, and β = (β_1, β_2, ..., β_K)^T. Let X be the set of all valid caching decisions, i.e., X = {x ∈ {0, 1}^K | 1^T x ≤ L}. When all this information is available, the best (and static) caching policy x* can be written as:

  x* = argmin_{x ∈ X, β^T x ≥ h}  x^T m + (1 − x)^T c                (2)

As we have mentioned above, the expectations and distributions of the three feedbacks are unknown beforehand, and therefore we have to learn and make decisions based on their estimates. At each time slot t, an algorithm π makes a caching decision x^t(π) ∈ X, selecting the services S_t(π) ⊂ S. It then observes {m_i^t, β_i^t} for each i ∈ S_t(π), and {c_j^t, β_j^t} for each j ∈ S ∖ S_t(π). Our objective is to design an algorithm π to decide the caching set S_t(π) for t = 1, 2, ..., T such that it achieves the minimal regret, i.e., the accumulated difference between the latency under π and that under the optimal policy x*, which is defined as:

  Reg_π(T) = Σ_{t=1}^T ( Σ_{i ∈ S_t(π)} m_i^t + Σ_{j ∈ S∖S_t(π)} c_j^t ) − T ( (x*)^T m + (1 − x*)^T c )      (3)

It is worth noting that an algorithm π which achieves a low regret may still violate the QoS constraint, especially when it has little information about the services/arms. We define the violation of the algorithm as the gap between the accumulated rate of tasks processed by the edge server and the target rate, as follows:

  Vio_π(T) = [ hT − Σ_{t=1}^T Σ_{i ∈ S_t(π)} β_i^t ]^+                (4)

where [x]^+ = max{x, 0}.

Both the regret and the violation are important performance indexes, and both should be taken into account when we design the algorithm. A small regret means the caching policy produced by the algorithm is close to the optimal one, and a small violation implies that the QoS requirement is well satisfied during the service caching process. In practice, an algorithm with sub-linear bounds on both regret and violation is considered acceptable and applicable.

3. CCB and its performance evaluation

In this section, we present our Constrained Confidence Bound (CCB) algorithm and its performance under simulation. The idea behind CCB is straightforward: (1) at each time slot t we can derive a valid caching policy through solving the optimization problem (1); and (2) although the system parameters are unknown at the time of decision, we can replace them by estimates. This works well as long as the algorithm can provide us with good estimates, i.e., lower/upper confidence bounds.
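As a concrete reference point, the offline benchmark (2) is a small ILP; for modest K it can be solved by exhaustive search. The sketch below is ours, not from the paper, and a practical implementation would call an ILP solver instead:

```python
from itertools import combinations

def offline_optimal(m, c, beta, L, h):
    """Brute-force solver for benchmark (2): choose at most L services
    minimizing x^T m + (1 - x)^T c subject to beta^T x >= h.
    Exponential in K; intended only as a reference for small instances."""
    K = len(m)
    best_set, best_cost = None, float("inf")
    for size in range(L + 1):
        for S in combinations(range(K), size):
            if sum(beta[i] for i in S) < h:
                continue  # QoS constraint (1c) violated
            # cached services incur MEC delay, the rest incur cloud delay
            cost = sum(m[i] for i in S) + sum(c[j] for j in range(K) if j not in S)
            if cost < best_cost:
                best_cost, best_set = cost, set(S)
    return best_set, best_cost
```

For example, with m = [1, 2, 3], c = [5, 4, 3], beta = [0.5, 0.3, 0.2], L = 2, and h = 0.6, the only feasible size-2 sets are {0, 1} and {0, 2}, and {0, 1} wins with total delay 6.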
Denote by H_t = {S_τ, m_i^τ, c_i^τ, β_i^τ : i ∈ S_τ, 1 ≤ τ < t} the history of caching decisions and observed feedback up to time slot t. CCB maintains the following empirical means for each arm i ∈ S at each time t:

  m̄_i^t = ( Σ_{τ<t, i∈S_τ} m_i^τ ) / ( N_{i,M}^t + 1 )               (5)
  c̄_i^t = ( Σ_{τ<t, i∉S_τ} c_i^τ ) / ( N_{i,C}^t + 1 )               (6)
  β̄_i^t = ( Σ_{τ<t} β_i^τ ) / t                                      (7)

where N_{i,M}^t and N_{i,C}^t are the numbers of times arm i is selected and not selected before time t, respectively. Obviously, N_{i,M}^t + N_{i,C}^t + 1 = t.

Let R(μ, n) = √(γμ/n) + γ/n as in [13], where γ is a positive constant. Define the following lower confidence bounds for m_i and c_i at each time t:

  m̌_i^t = max{0, m̄_i^t − 2R(m̄_i^t, N_{i,M}^t + 1)}                  (8)
  č_i^t = max{0, c̄_i^t − 2R(c̄_i^t, N_{i,C}^t + 1)}                  (9)

and the upper confidence bound for β_i:

  β̂_i^t = min{1, β̄_i^t + 2R(β̄_i^t, t)}                              (10)

Denote m̌^t = (m̌_1^t, m̌_2^t, ..., m̌_K^t), č^t = (č_1^t, č_2^t, ..., č_K^t), and β̂^t = (β̂_1^t, β̂_2^t, ..., β̂_K^t). As depicted in Alg. 1, the input of CCB includes the arm set S, the server capacity L, the QoS requirement h, the time horizon T, and a failure probability δ ∈ (0, 1). CCB starts with γ set to 72 ln(2KT/δ). It then solves problem (1) at each time t, with the system parameters (m, c, β) replaced by (m̌^t, č^t, β̂^t), to get the selected arms S_t. After that, it updates the upper/lower confidence bounds for each arm. The process repeats until time T.

Procedure 1 Constrained Confidence Bound algorithm for caching services at the MEC server.
Input: S, L, h, T, δ;
Output: Selected arms/services at each time slot;
1: γ = 72 ln(2KT/δ), m̌^1 = č^1 = β̂^1 = 0, N_{i,M}^1 = N_{i,C}^1 = 0, ∀i ∈ S.
2: for t = 1, 2, ..., T do
3:   Solve the following optimization problem:

       x^t = argmin_{x ∈ X, (β̂^t)^T x ≥ h}  x^T m̌^t + (1 − x)^T č^t        (11)

4:   Select arms S_t according to x^t, and do the following updates for each arm i ∈ S:

       N_{i,M}^{t+1} = N_{i,M}^t + 1 if i ∈ S_t, and N_{i,M}^{t+1} = N_{i,M}^t if i ∈ S ∖ S_t.     (12)
       N_{i,C}^{t+1} = N_{i,C}^t + 1 if i ∈ S ∖ S_t, and N_{i,C}^{t+1} = N_{i,C}^t if i ∈ S_t.     (13)

5:   Based on the received feedback, calculate (m̌^{t+1}, č^{t+1}, β̂^{t+1}) accordingly.

The following theorem holds for CCB:

Theorem 3.1. By running CCB, with probability at least 1 − δ we have:

  Reg(T) = O( (K − L) √(KT ln(2KT/δ)) ),
  Vio(T) = O( K √(KT ln(2KT/δ)) ).

Remark. CCB is based on Con-UCB [14], but with the following differences: (1) Con-UCB is proposed to tackle the online decision problem where the objective function depends on one parameter only (possibly in the form of multi-level feedback), whereas CCB can be applied when there are two or more such parameters in the objective function; (2) In Con-UCB, all feedback is bandit feedback, i.e., observations can only be made if the arm is selected. CCB, on the other hand, allows both bandit feedback and full feedback, i.e., {β_i^t}. Moreover, CCB allows feedback that is complementary, i.e., one parameter that can be observed if the arm is selected and another that can be observed if the arm is not selected, but which cannot both be observed simultaneously, i.e., {m_i^t} and {c_i^t}. These feedbacks need to be properly handled in the performance analysis. As a result, CCB is a generalization of Con-UCB.

3.2. Simulation results

We use simulations to investigate the performance of CCB, with both a synthetic workload and a real dataset. For the synthetic workload, we assume the popularity of services follows a Zipf distribution with skewness parameter s ∈ {0.6, 0.8, 1.0}. Task parameters are configured as follows: we divide task sizes into a set of intervals [0.1 MB, 0.3 MB], [0.3 MB, 0.5 MB], [0.5 MB, 0.8 MB], [0.8 MB, 1 MB], [1 MB, 3 MB], [3 MB, 5 MB], [5 MB, 8 MB], and [8 MB, 10 MB] [15,16]. Note that these intervals are not of the same length. The task size of each service falls in one of these intervals, picked at random; once the interval is fixed, the size of a task for that service is uniformly chosen from it. The computing intensities of tasks (in CPU cycles per bit) are drawn randomly from [100, 200, 300, 400, 500] [17], which represents a certain amount of task heterogeneity and a skewed workload distribution.

Meanwhile, we assume tasks arrive independently and the aggregate request rate for services is 100 req/sec. The network bandwidth between the cloud and the MEC server is 5 Mbps [18], and each service is allocated 5.6 GHz and 2.8 GHz of CPU resources at the cloud and the MEC server, respectively. The time horizon is set as T = 100 000, and the length of each slot is 100 s. Unless otherwise specified, we set the number of services in the system as K = 100 and the number that can be hosted at the MEC server as L = 10.

For performance evaluation, we adopt the following two algorithms as baselines: (1) Random-Caching: the algorithm that randomly picks L services to cache at the MEC server at each time slot; and (2) Top-Rate-Caching: the algorithm that always caches the top L most popular services, assuming that the knowledge of service popularity is given a priori.

Fig. 2 shows how each algorithm performs with different parameter settings in simulation. From these figures, we can see that: (1) as expected, in all cases Random-Caching performs worst, as both its regret and violation grow linearly in time; (2) Top-Rate-Caching, which is able to satisfy the QoS constraint consistently, also does not provide a satisfactory delay performance, since its regret grows linearly (although much more slowly than that of Random-Caching). This implies that, in general, the top L most popular services do not coincide with the set of services that provides the most caching gain. The reason is that the workload distribution can be inconsistent with the service popularity distribution, as we configured in simulation; and (3) CCB gives the best performance among the three algorithms, in that both its regret and violation grow sub-linearly as time elapses. Moreover, it can be observed that CCB behaves exactly the same way as Random-Caching at the very beginning (i.e., t < 20 000), because during this period not enough samples are collected for the services; as a result, CCB is not able to differentiate them and has to randomly pick services. After that period, CCB gradually identifies/learns the optimal services, and its regret grows sub-linearly thereafter.

Fig. 3 shows the performance of each algorithm in trace-driven simulation. We adopt a dataset from [19], which contains packet inter-arrival times from 5 applications generated by 36 wireless devices.
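The per-slot estimates (8)-(10) and the optimistic selection (11) in Procedure 1 can be sketched as follows. This is our illustrative sketch: the argmin in (11) is done by brute force here, whereas a real implementation would use an ILP solver.

```python
import math
from itertools import combinations

def radius(mu, n, gamma):
    # R(mu, n) = sqrt(gamma * mu / n) + gamma / n, as used in (8)-(10)
    return math.sqrt(gamma * mu / n) + gamma / n

def ccb_select(m_bar, c_bar, beta_bar, n_mec, n_cloud, t, gamma, L, h):
    """One CCB decision step: build optimistic estimates, then solve (11)
    by exhaustive search (placeholder for an ILP solver)."""
    K = len(m_bar)
    m_lcb = [max(0.0, m_bar[i] - 2 * radius(m_bar[i], n_mec[i] + 1, gamma))
             for i in range(K)]
    c_lcb = [max(0.0, c_bar[i] - 2 * radius(c_bar[i], n_cloud[i] + 1, gamma))
             for i in range(K)]
    b_ucb = [min(1.0, beta_bar[i] + 2 * radius(beta_bar[i], t, gamma))
             for i in range(K)]
    best, best_cost = None, float("inf")
    for size in range(L + 1):
        for S in combinations(range(K), size):
            if sum(b_ucb[i] for i in S) < h:
                continue  # optimistic check of the QoS constraint
            cost = sum(m_lcb[i] for i in S) + \
                   sum(c_lcb[j] for j in range(K) if j not in S)
            if cost < best_cost:
                best_cost, best = cost, set(S)
    return best  # None only if even the optimistic problem is infeasible
```

With γ = 0 the confidence intervals collapse and the step reduces to the offline selection on the empirical means, which is a convenient sanity check.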
Fig. 2. Performance of CCB and the two baseline algorithms under different system settings.
The wireless traces are used to generate the workload, where each packet is regarded as a request and each <device, application> pair as a service. In this way, we get workload for 94 services in total. Meanwhile, to have enough data for simulation, we stretch the time axis in each trace by 1000 (so a millisecond becomes a second). Here, since the optimal set of services is not known, we report the cumulative latency instead of the regret. Again, we can see that CCB outperforms the two baselines, whose latency and violation keep growing linearly all the time, whereas CCB slows down the growth, i.e., when t > 40 000.

All in all, we believe that CCB serves as a good solution to the online caching problem if our goal is primarily the long-term system performance. In the next section, we will present an algorithm that converges much faster and, at the same time, grows much more slowly in both regret and violation than CCB.

... the MEC server, but instead they have to be directed to the cloud for remote execution. The cost of service switches raises two new problems for online caching algorithms if they were designed without properly considering it: (1) biased/inaccurate estimates of parameters, in particular the MEC-offloading delay; and (2) system performance degradation due to frequent service switches. Take CCB for example: Fig. 4 shows its performance when there is no switching cost vs. when there is, where in the latter scenario we set the time to load a new service to 20 s. It is evident that in all cases the performance of CCB decreases due to service switches, i.e., the regret grows faster and it takes more time to converge. These results suggest that, in practice, we need online caching algorithms that can properly handle the service switching cost.

4.1. Problem formulation
Fig. 3. Performance of CCB and the two baseline algorithms in trace-driven simulation: 𝐿 = 10, ℎ = 0.2, 𝛿 = 0.01.
• type 10: the set of services cached at t but absent at t + 1;
• type 11: the set of services cached at t and also present at t + 1.

Note that among the four categories, service switching cost is incurred only for services of type 01, i.e., when a new service is loaded. Likewise, for each service i, let m_{i,11}^t and m_{i,01}^t denote the delays of types 11 and 01, respectively, and denote by c_i^t the delay of types 00 and 10 (both for cloud computing). The online caching problem then becomes:

  Min_{x^t}  Σ_{t=1}^T Σ_{i∈S}  x_i^{t−1} x_i^t m_{i,11}^t + (1 − x_i^{t−1}) x_i^t m_{i,01}^t + (1 − x_i^t) c_i^t      (14a)
  s.t.  1^T x^t ≤ L, ∀t ≤ T,                                          (14b)
        (β^t)^T x^t ≥ h,                                              (14c)
        x_i^t ∈ {0, 1}, ∀i ∈ S, t ≤ T.                                (14d)

The above problem is a 0-1 quadratic programming problem that is much more complicated than problem (1), in that the caching decision at each time slot depends not only on the unknown delays but also on the current caching state. A simple heuristic is to select services at each time slot t by solving the following 0-1 program given the current system state x^{t−1}:

  x^t = argmin_{x^t ∈ X, (β̂^t)^T x^t ≥ h}  Σ_{i∈S}  x_i^{t−1} x_i^t m̌_{i,11}^t + (1 − x_i^{t−1}) x_i^t m̌_{i,01}^t + (1 − x_i^t) č_i^t      (15)

where m̌_{i,11}^t and m̌_{i,01}^t are LCBs of m_{i,11} and m_{i,01}, respectively. This approach, although straightforward, is greedy in nature and far from optimal, as shown in Fig. 5, where we can see that its performance is identical to or even worse than that of CCB (see Fig. 5(b)).

4.2. Explore-first algorithm

The first algorithm we propose to deal with service switches is Explore-First (EF). To start with, let us re-examine the online caching problem. We make the following observations: (1) Our goal is to minimize the user-perceived latency, that is, to cache the L services with the largest delay savings (under the given constraint), i.e., the gap between the MEC-offloading delay and the cloud computing delay; and (2) The optimal policy is:

  x* = argmin_{x ∈ X, β^T x ≥ h}  x^T m_11 + (1 − x)^T c              (16)

Note that neither involves the switching cost. It follows that if we can estimate m_11, c, and β accurately based on feedback, then we can always find a good solution. Meanwhile, since service switching cost is incurred only when we explore new services, and multiple services need to be selected at each time slot, we need an efficient sampling scheme with the following properties: (1) low cost (much smaller than C_K^L, the number of candidate size-L sets); (2) arms/services are sampled uniformly, so as to simplify the algorithm design and its performance analysis; and (3) the frequency of service switches is well controlled, so that we can avoid too much switching cost.

Our Explore-First algorithm is based on a novel sampling scheme with the above properties. We use the segment as the basic unit to sample a given set of services, where each segment contains multiple rounds, and each round consists of two successive time slots. The structure of a segment is depicted in Fig. 6. We distinguish two scenarios according to whether the number of services to sample, S, is divisible by L:

• S mod L = 0. In this case, whenever a new round begins, we select L new services (at the first time slot), and keep hosting these services at the second time slot, as shown in Fig. 6(a). To sample all the services, each segment contains S/L rounds, and 2S/L time slots in total.
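The S mod L = 0 case can be sketched as a schedule generator (our sketch; slot semantics follow Fig. 6(a), and the two identical slots per round mirror how type-11 delays are observed without extra switching cost):

```python
def segment_schedule(services, L):
    """One sampling segment when S mod L == 0 (Fig. 6(a)):
    each round hosts L fresh services for two consecutive time slots, so
    every service is observed once per segment with only S/L loads."""
    S = len(services)
    assert S % L == 0, "this sketch covers only the S mod L == 0 case"
    schedule = []
    for r in range(S // L):
        batch = services[r * L:(r + 1) * L]
        schedule.append(batch)  # first slot of the round: load the batch
        schedule.append(batch)  # second slot: keep hosting (no switch cost)
    return schedule
```

For instance, 6 services with L = 2 produce a segment of 2S/L = 6 slots, with each pair of services hosted for two slots in a row.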
Fig. 5. Performance comparison between CCB and the greedy algorithm when there is service switching cost.
play the empirically best arm set, derived by solving the optimization problem (20).

Procedure 2 Explore-First algorithm for caching services at the MEC server.
Input: S, L, h, T;
Output: Selected arms/services at each time slot;
1: Exploration phase: Sample the set of arms S with N segments.
2: Select the arm set S(x*) by solving the following optimization problem:

     x* = argmin_{x ∈ X, β̄^T x ≥ h}  x^T m̄_11 + (1 − x)^T c̄           (20)

3: Exploitation phase: Play S(x*) in all remaining time slots.

As shown in Alg. 2, here N is a parameter chosen to minimize the regret. It is a function of the time horizon T, the number of arms K, and L. In Appendix B, we show how to properly set it.

We define the following regret for performance evaluation:

  Reg_π(T) := Σ_{t=1}^T ( Σ_{i ∈ S_t(π)} m_{i,11} + Σ_{j ∈ S∖S_t(π)} c_j ) − T ( (x*)^T m_11 + (1 − x*)^T c )      (21)

Note that this definition differs from (3) in that it uses the expected latencies, whereas realized latencies are adopted in (3). The definition of violation remains unchanged, as in (4).

Theorem 4.1. By running Explore-First, we have the following bounds on regret and violation:

  Reg(T) = O( (K^4 / L)^(1/3) T^(2/3) (ln T)^(1/3) ),
  Vio(T) = O( K^(1/3) L^(2/3) T^(2/3) (ln T)^(1/3) ).

Proof. See Appendix B. □

4.3. Successive elimination-based algorithm

Explore-First is able to identify the optimal set of services precisely given sufficient samples; however, its performance in the exploration phase may be poor, especially when most of the arms have a large gap compared with the optimal ones. Here we present another caching algorithm, called Successive Elimination-based (SE) caching, that can significantly decrease the sampling cost while, at the same time, improving the bounds on both regret and violation.

The main idea behind SE is as follows: (1) we divide the time horizon T into two phases, an exploration/elimination phase and an exploitation phase, as shown in Fig. 7(b). The exploration/elimination phase consists of successive segments, where each segment is used to sample a given set of (active) arms; (2) At the end of each segment, one or more arms may be deactivated according to the elimination rule; and (3) The elimination phase completes when there are no more arms to be deleted, and the exploitation phase follows, which keeps playing the remaining arms.

More specifically, SE maintains the following quantities (LCB/UCB) for each service i at time t:

  UCB_t(m_{i,11}) = m̄_{i,11} + √( 2 ln T / n_t(i, M) )                (22)

... computed at the MEC server (without switching cost) and the cloud, respectively.

Let A_t be the set of arms remaining active at time t. SE deactivates arms according to the following rules:

Rule 1: At the end of each segment (say at time t), identify the set D of arms such that arm i ∈ D if we can find some other arm j ∈ A_t with UCB_t(c_i) − LCB_t(m_{i,11}) < LCB_t(c_j) − UCB_t(m_{j,11});

Rule 2: Solve the following optimization problem over A_t:

  x^t = argmin_{x ∈ X_t, β̄^T x ≥ h}  x^T UCB_t(m_11) + (1 − x)^T UCB_t(c)      (25)

Denote by S(x^t) the set of selected arms, and obtain the set E = A_t ∖ S(x^t);

Rule 3: Deactivate from A_t the arms in both D and E.

Note that Rule 1 is used to identify arms with small contribution, i.e., arms that are likely not among the optimal ones. Rule 2 is used to identify arms that are not optimal with high confidence, where the QoS constraint has been properly taken into account. Therefore, the intersection of the two sets gives us arms that can be deactivated with high confidence. One can imagine that during the very first few segments D = ∅ and E is a random set, since not enough samples have been collected and the algorithm is not able to differentiate the arms. As time elapses, D becomes larger and E becomes more accurate. Once an arm is in D ∩ E, we are highly confident that it does not belong to the optimal set, and thus it can be eliminated. Moreover, the active set becomes smaller as time elapses, since more and more arms are deactivated, which significantly decreases the sampling cost. See Alg. 3 for more details.

Procedure 3 Successive Elimination-based algorithm for caching services at the MEC server.
Input: S, L, h, T;
Output: Selected arms/services at each time slot;
1: t = 1; A = S. # A is the set of active arms;
2: for t ≤ T do
3:   Sample A with a segment.
4:   if |A| > L then
5:     D = ∅; # D denotes the set of arms to be potentially deactivated;
6:     for i ∈ A do
7:       if ∃j ∈ A such that UCB_t(c_i) − LCB_t(m_{i,11}) < LCB_t(c_j) − UCB_t(m_{j,11}) then
8:         D = D ∪ {i};
9:     Solve the following optimization problem for A, and denote the selected arms as S(x^t):

         x^t = argmin_{x ∈ X_t, β̄^T x ≥ h}  x^T UCB_t(m_11) + (1 − x)^T UCB_t(c)      (26)

10:    Deactivate the arms in both D and A ∖ S(x^t):

         A = A ∖ ( D ∩ { A ∖ S(x^t) } )                               (27)

Theorem 4.2. Let t be the time slot at which the elimination phase completes. By running SE we have the following bounds on regret and violation:

  Reg(T) ≤ O( √(K T t ln T) ),
  Vio(T) ≤ O( t √(K T ln T) ).
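The elimination step of Procedure 3 (Rules 1-3) can be sketched as follows; the function name and the list-based confidence bounds are ours, and the optimistic selection S(x^t) of Rule 2 is passed in precomputed:

```python
def se_eliminate(active, ucb_m11, lcb_m11, ucb_c, lcb_c, selected):
    """One elimination step at a segment boundary.
    `selected` is the arm set S(x^t) from the optimistic problem (25);
    an arm is deactivated only if it is dominated (Rule 1) AND was left
    out by the optimistic selection (Rule 2), per Rule 3."""
    # Rule 1: arm i is dominated if some active arm j has a provably
    # larger saving: UCB(c_i) - LCB(m_i,11) < LCB(c_j) - UCB(m_j,11)
    dominated = set()
    for i in active:
        optimistic_saving_i = ucb_c[i] - lcb_m11[i]
        for j in active:
            if j != i and lcb_c[j] - ucb_m11[j] > optimistic_saving_i:
                dominated.add(i)
                break
    # Rule 2 output: arms not chosen by the optimistic selection
    unselected = active - selected
    # Rule 3: deactivate the intersection, as in (27)
    return active - (dominated & unselected)
```

In the example below, arm 2's optimistic saving (5 − 2 = 3) is below arm 0's pessimistic saving (10 − 2 = 8) and arm 2 is unselected, so only arm 2 is eliminated.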
Lemma 4.1. SE achieves regret: the same empirically optimal set of services in the exploitation phase.
√ Obviously, this algorithm does not adapt its exploration scheduler to
𝑅𝑒𝑔(𝑇 ) ≤ 𝑂( 𝐾𝐿𝑇 ln 𝑇 ) the history of the observed rewards. Moreover, in order to have a good
performance, usually a large number of time slots is dedicated to the
Proof. See Appendix D. □ exploration phase, which incurs a high sampling cost. These together
leads to the fact that this algorithm is particularly useful when the time
4.4. Simulation results horizon 𝑇 is large and the system is stable. On the other hand, SE-
based caching works by successively eliminating non-optimal services
4.4.1. Performance over baselines during the sampling process, and therefore it satisfies the so called
Fig. 8 gives performance of EF, SE and other algorithms when there adaptive exploration and incurs much lower sampling cost, i.e., fewer
is switching cost. Again, we set the time to load a new service as time slots are needed in the exploration/elimination phase. In other
20 s. From the figure, we can see that in all cases: (1) both EF and words, this algorithm converges significantly faster and achieves much
SE can accurately identify the optimal set of services as their regret better regret bounds. Based on the above reasoning, we conclude that
and violation keep non-increasing after convergence; on the other hand, SE-based caching is particularly suitable when the underlying system
CCB exhibits a sub-linear growth in regret; (2) among the two proposed algorithms, SE converges much faster, i.e., fewer than 8000 time slots are needed for it to become stable, whereas it takes more than 30,000 slots for EF. Moreover, we find that SE is far more computationally cost-effective than CCB and EF, as it requires less than 10 min to run each simulation on our machine (2 × 2.2 GHz CPU, 8 GB memory), whereas it takes approximately 30 min for EF and even 2 h for CCB (note that CCB requires solving the optimization problem at each time slot).

Fig. 9 gives the performance of the corresponding algorithms in trace-driven simulation. As expected, we observe that both SE and EF outperform the baselines, and the conclusions are consistent. Moreover, it is interesting to find that EF and SE have approximately the same convergence time under the real workload, which suggests that in practice one can adopt either of them for online service caching. Furthermore, given the performance of CCB as shown in Fig. 4, we believe that SE and EF also outperform the baselines when there is no service switching cost.

Remark. Here we further compare and analyze the advantages and application scenarios of the two algorithms. As we mentioned above, EF works by first sampling services uniformly at the same rate (with our novel sampling scheme), and then makes caching decisions based on estimates of the relevant parameters of services. It then keeps caching these services thereafter. Such a design may not fit well when the environment is non-stable, such as a real edge computing system, in which case both the convergence speed and the accuracy are important; i.e., we can divide the time horizon T into multiple phases, and restart the caching algorithm when a new phase starts, so as to quickly adapt to the dynamics of the system.

4.4.2. Performance comparison with SoA

It is interesting to see the performance of our proposed schemes against existing online algorithms. To this end, we compare CCB with the recently proposed potential-based algorithm [20], a lightweight but very efficient algorithm for online content caching. Here, we model the edge-cloud system as a cache network with two nodes, where the edge server is considered a cache node with limited storage capacity, and the cloud a server that permanently holds all the content in the system. Moreover, each service is regarded as a content, and each request for a service as a request for a content. We characterize each request for a service as a tuple (i, t_i^m, t_i^c), where i is the service requested, and t_i^m and t_i^c (both stochastic) denote the MEC-offloading delay and the cloud-computing delay, respectively. The MEC server maintains a quantity Q_i called the potential of each service i, together with the empirical means T_i^m and T_i^c of the MEC-offloading delay and the cloud-computing delay for each service i. Note that these quantities are initialized to zero and updated whenever a new request arrives.
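The phase-based restart strategy suggested in the remark of Section 4.4.1 can be sketched as follows. This is a minimal illustration under our own assumptions: the make_algo factory and the step interface are hypothetical, not part of the algorithms in this paper.

```python
import math

def phased_caching(make_algo, T, num_phases):
    """Divide the horizon T into phases and restart the caching algorithm
    at each phase boundary, so that stale statistics are discarded and the
    algorithm re-adapts to a drifting environment."""
    decisions = []
    phase_len = math.ceil(T / num_phases)
    for start in range(0, T, phase_len):
        algo = make_algo()  # fresh estimates at the start of each phase
        for t in range(start, min(start + phase_len, T)):
            decisions.append(algo.step(t))
    return decisions
```

The wrapper trades some regret within each phase for responsiveness across phases, which is exactly the balance between convergence speed and accuracy discussed above.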
W. Chu et al. Computer Networks 246 (2024) 110395
Fig. 8. Performance of EF, SE and baseline algorithms when there is service switching cost.
Fig. 9. Performance of SE, EF and other algorithms in trace-driven simulation with switching cost: 𝐿 = 10, ℎ = 0.2, 𝛿 = 0.01.
Fig. 10. Performance of CCB and the potential-based caching algorithm in simulation with no service switch cost: 𝐾 = 100, 𝐿 = 10, ℎ = 0.2, 𝛿 = 0.02.
At the very beginning, there are no services cached at the MEC server. Service caching is then performed according to the following rules:

Rule 1: if a request (i, t_i^m, t_i^c) arrives at the MEC server, Q_i is updated as follows:

Q_i = Q_i + T_i^c - T_i^m   (28)

If i is cached, then:

T_i^m = (T_i^m × N_i^m + t_i^m) / (N_i^m + 1),   N_i^m = N_i^m + 1   (29)

else:

T_i^c = (T_i^c × N_i^c + t_i^c) / (N_i^c + 1),   N_i^c = N_i^c + 1   (30)

where N_i^m and N_i^c denote the number of times service i is requested when it is cached at the MEC server and when it is absent, respectively.

Rule 2: if a response to request (i, t_i^m, t_i^c) from the remote cloud arrives at the MEC server and there is no room for hosting service i (which is absent), then the MEC server calculates the caching probability y_i for i based on the Q_i's:

y_i = Q_i / (Q_i + \sum_{j \in c} Q_j)   (31)

where c is the set of services cached at the MEC server. If the decision is to cache i, then the service j with the least potential, i.e., j = argmin_{i \in c} Q_i, is evicted.

It can be seen from the above two rules that the potential-based caching algorithm aims at minimizing the aggregate latency for accessing the services in the system. The following figures show its performance against that of our proposed mechanism CCB, both when there is no service switch cost and when there is. From Fig. 10 we can see that the potential-based algorithm performs exceptionally well when there is no service switch cost, i.e., the cumulative regret grows very slowly (although still linearly) and the offloading rate constraint can always be satisfied. However, as depicted in Fig. 11, its performance becomes poor when there is switch cost, as both the regret and the violation grow linearly as time elapses. After a deeper investigation, we find that this phenomenon is due to the frequent service switches incurred during the caching process by the potential-based algorithm. Based on this observation, we conclude that any caching algorithm could perform poorly when there is service switch cost, if this cost is not taken into account when the algorithm is designed.

5. Related work

The service placement problem (SPP) in Edge/Fog computing is essentially to find the available resources (nodes, links) in the network so as to optimize certain objectives (delay, energy consumption, etc.) while at the same time satisfying application requirements, resource constraints, locality constraints, etc. It has been a hot topic [21–23] in the past few years, and many approaches have emerged.

Existing solutions can be categorized into centralized [24,25] and distributed [26–28], based on the control plane design. A centralized algorithm assumes that global information such as application demands and infrastructure resources is available, and computes a globally optimal solution. For example, Hong et al. [25] propose that a coordinator makes deployment decisions for IoT services over the fog infrastructure. The drawback of the centralized solution is that global information is generally hard to collect and the computational cost may be excessively high. On the other hand, a distributed approach relies on the local computation of each node and their collaboration to address the scalability and locality awareness issues. This approach is able to provide services that fit the local context, but generally speaking, it cannot guarantee global optimality of the solution.

The service placement problem can be addressed in an offline [29,30] or online [31,32] fashion. The offline approach requires that all the information about the system and workload be given a priori before the placement decision is computed. That is, the placement decision is made at compile time before deployment. Examples include [29,30], which assume full knowledge of the Edge/Fog network. On the other hand, recently proposed approaches [33–35] are mainly online, in that the placement decisions are made during the run-time of the systems. To provide satisfactory performance, the online algorithms have to take into account the dynamic behaviors of the system. The advantage of this approach is that it is more adaptive and responsive to changes. However, it remains a challenge how to make the best use of the system resources.

Based on whether the dynamicity of the system is handled or not, existing placement solutions can also be classified as static and dynamic [36,37]. The static approach usually assumes that the Edge/Fog infrastructure and application characteristics remain unchanged as time elapses, which is not realistic. In fact, both aspects are highly time-evolving: new nodes can join and leave the system due to instability of the network, the available resources can change over time based on real-life conditions, and the workload varies as users' interests change. The dynamic approaches [38,39] employ reactive strategies to deal with the dynamic nature of the infrastructure and application, in a way that new services may be deployed and existing services may be replaced/released whenever a significant change is observed.

Alternatively, one can also characterize existing SPP solutions based on various aspects, such as: (1) whether mobility prediction is exploited or not for mobility and popularity caching [40,41], (2) user-centric cooperative edge caching [42,43] versus network-centric non-cooperative caching, (3) intelligent handover predictions for the edge [44] and various other recent AI-based approaches such as reinforcement learning [45,46], (4) price congestion schemes for caching [47,48], and (5) DDPG for orchestration from an SDN perspective [49,50], and so forth.

Obviously, our algorithms belong to the category of dynamic and online solutions.
Fig. 11. Performance of CCB and the potential-based caching algorithm in simulation with service switch cost: 𝐾 = 100, 𝐿 = 10, ℎ = 0.2, 𝛿 = 0.02.
The work closest to ours is [51], where the authors address a user-managed service placement problem, while in this work we address the problem from the network side (that is, the network operators make the caching decisions instead of the users). Moreover, they adopt a contextual MAB framework with a Thompson-sampling scheme for online learning, whereas we employ a Constrained-MAB framework with a novel sampling scheme that we propose to efficiently explore system dynamics.

6. Conclusion and future work

In this paper, we study the online service caching problem for a multi-access edge computing system, with the goal of minimizing users' perceived latency. We formulate it as a Constrained Multi-Armed Bandit (CMAB) optimization problem, and propose three efficient algorithms — CCB, EF and SE. We show that CCB can well balance the objective and the QoS constraint, whereas EF and SE can effectively learn the optimal solution when there is service switching cost. We theoretically analyze their performance by giving bounds on regret and violation, and conduct extensive simulations to validate their efficacy. Our experimental results show that these algorithms outperform baselines.

There are several interesting issues for exploration. Below we give some possible directions that we believe are important and worthy of further investigation.

…, i.e., for adequate computing power or load balancing. This raises another question of how to determine the necessary number of VMs for each service, and then design efficient online learning algorithms for service caching. One possible solution is to extend the current model, i.e., by regarding each service with a particular number of instances as an arm. The key challenges here are: (1) the set of arms/actions would be huge; and (2) instead of selecting L services each time, the server capacity is now expressed as a new constraint, which further complicates the online service caching problem.

(4) Online service caching for MEC-based networks. There is a trend that multiple edge servers work collaboratively to form a shared resource pool [52–54], so as to provide reliable and elastic edge computing services. This, on one hand, provides us the opportunity to leverage the power of the network for better exploration and exploitation. On the other hand, it also raises significant challenges in designing online learning algorithms for the network, since MEC servers may be heterogeneous in computing power, user bases, network conditions, etc. Given that there is a flurry of studies on Multi-agent MAB (MA-MAB) [55–57], to the best of our knowledge, it is still unclear how to formulate and solve the problem of concern here, i.e., Multi-agent MAB with multiple constraints. We believe that the problem itself is interesting in the area of MAB and deserves further study.
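To make the arm-space blow-up in the direction above concrete, consider a toy enumeration in which each arm is a (service, instance-count) pair and a feasible action is any set of such arms, at most one per service, whose total instance count fits the server capacity. The encoding below is our own illustration and is not fixed by the model in this paper.

```python
from itertools import combinations

def feasible_actions(num_services, max_instances, capacity):
    """Enumerate feasible cache configurations when each arm is a
    (service, instance_count) pair and total instances are capped."""
    arms = [(s, k) for s in range(num_services) for k in range(1, max_instances + 1)]
    actions = []
    for r in range(1, len(arms) + 1):
        for combo in combinations(arms, r):
            services = [s for s, _ in combo]
            if len(set(services)) != len(services):   # at most one count per service
                continue
            if sum(k for _, k in combo) <= capacity:  # capacity constraint
                actions.append(combo)
    return actions
```

Even for tiny parameters the action set grows combinatorially, which is precisely why a naive arm-per-configuration formulation becomes intractable and the capacity must instead be treated as a constraint.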
Appendix A. Proof of Theorem 3.1

We rely on the following lemmas to prove the theorem.

Lemma A.1 (Azuma-Hoeffding Inequality [58]). Suppose \{Y_n : n = 0, 1, 2, 3, \ldots\} is a martingale and |Y_n - Y_{n-1}| \le c_n almost surely. Then with probability at least 1 - 2e^{-d^2 / (2 \sum_{j=1}^{n} c_j^2)}, we have |Y_n - Y_0| < d.

The following lemma is a corollary of Lemma A.2.

Lemma A.3 ([14]). Let the empirical means \bar{m}_i^t, \bar{c}_i^t and \bar{\beta}_i^t be defined as in (5), (6), (7). Then for all i and t, with probability at least 1 - 2e^{-\frac{1}{72}\gamma} we have:

|\bar{m}_i^t - m_i| \le 2R(\bar{m}_i^t, N_{i,M}^t + 1)   (32)

\Big| \sum_{t=1}^{T} \Big( \sum_{i \in \mathcal{M}_t} (m_i^t - \check{m}_i^t) + \sum_{i \notin \mathcal{M}_t} (c_i^t - \check{c}_i^t) \Big) \Big| = O\Big( (K - L) \sqrt{KT \ln \frac{2KT}{\delta}} \Big)   (38)

\Big| \sum_{t=1}^{T} \sum_{i \in \mathcal{M}_t} (\hat{\beta}_i^t - \beta_i) \Big| = O\Big( K \sqrt{KT \ln \frac{2KT}{\delta}} \Big)   (39)

Proof of Lemma A.4. Denote by Q_i^t the event that |\bar{m}_i^t - m_i| > 2R(\bar{m}_i^t, N_{i,M}^t + 1), and by \bar{Q}_i^t its complement. Let \gamma = 72 \ln \frac{2KT}{\delta}; obviously \gamma \ge 1.

From Lemma A.3 we have:

\Pr\{Q_i^t\} < \frac{\delta}{KT}

Taking a union bound, \sum_{t=1}^{T} \sum_{i=1}^{K} \Pr\{Q_i^t\} < \delta. It follows that with probability at least 1 - \delta,

m_i > \bar{m}_i^t - 2R(\bar{m}_i^t, N_{i,M}^t + 1) = \check{m}_i^t

(36) and (37) can be proved in the same way.

To prove (38), define two series of random variables:

Z_t = \sum_{i \in \mathcal{M}_t} m_i^t - \sum_{i \in \mathcal{M}_t} m_i, \qquad Y_t = \sum_{l=1}^{t} Z_l

\cdots \le \sum_{i=1}^{K} \sum_{n=1}^{N_{i,M}^{T+1}} 4 \Big( \sqrt{\frac{\gamma}{n}} + \frac{\gamma}{n} \Big)

Note that when n is large enough,

1 + \frac{1}{2} + \frac{1}{3} + \cdots + \frac{1}{n} \to \ln n + C
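The harmonic-type sums above are controlled by standard integral-comparison bounds, which we restate for completeness (our own summary of a textbook step):

```latex
\sum_{n=1}^{N} \sqrt{\frac{\gamma}{n}} \;\le\; 2\sqrt{\gamma N},
\qquad
\sum_{n=1}^{N} \frac{\gamma}{n} \;\le\; \gamma\,(\ln N + 1).
```

Applied with N = N_{i,M}^{T+1} and summed over the K arms (using the Cauchy-Schwarz inequality on the square-root terms), these bounds produce the \sqrt{KT \ln \frac{2KT}{\delta}} scaling that appears in (38) and (39).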
on the clean event, since the contribution of the bad event can be neglected.²

Denote by \mathcal{M}_t the arm set selected when the exploration phase completes (assuming at time t). If \mathcal{M}_t = \mathcal{M}^*, then the regret will no longer increase in the exploitation phase. On the other hand, if \mathcal{M}_t \ne \mathcal{M}^*, then we must have,

\sum_{i \notin \mathcal{M}_t} \bar{c}_i + \sum_{i \in \mathcal{M}_t} \bar{m}_{i,11} < \sum_{i \notin \mathcal{M}^*} \bar{c}_i + \sum_{i \in \mathcal{M}^*} \bar{m}_{i,11}   (49)

Setting

\frac{2NK^2\alpha}{L} = 2KT \, r_t(m_{i,11}),

we have,

N = \Big( \frac{2L^2 T^2 \ln T}{K^2 \alpha^3} \Big)^{\frac{1}{3}}   (54)

Substituting it into (53), we get:

Reg(T) = O\Big( \big( \tfrac{K^4}{L} \big)^{\frac{1}{3}} T^{\frac{2}{3}} (\ln T)^{\frac{1}{3}} \Big).

Similarly, for violation we have:

Vio(T) = \sum_{t=1}^{T} \sum_{i \in \mathcal{M}_t} [h - \beta_i]^+
\le \sum_{t=1}^{T} \Big[ \sum_{i \in \mathcal{M}^*} \beta_i - \sum_{i \in \mathcal{M}_t} \beta_i \Big]^+
\le \frac{2NK\alpha}{L} + 2LT \, r_t(\beta_i)
< 2NK\alpha + 2LT \sqrt{\frac{\ln T}{N\alpha}}   (55)

Substituting (54) into the above inequality, we get:

Vio(T) = O\Big( K^{\frac{1}{3}} L^{\frac{2}{3}} T^{\frac{2}{3}} (\ln T)^{\frac{1}{3}} \Big)

Therefore,

R(t) = \sum_{j \in \mathcal{K}} R(t, j) \le O(\sqrt{\ln T}) \sum_{j \in \mathcal{K}} \sqrt{n_j^t(j, M)}

Since f(x) = \sqrt{x} is a concave function and \sum_{j \in \mathcal{K}} n_j^t(j, M) = \frac{Lt}{2}, by Jensen's Inequality we have,

\frac{1}{K} \sum_{j \in \mathcal{K}} \sqrt{n_j^t(j, M)} \le \sqrt{\frac{1}{K} \sum_{j \in \mathcal{K}} n_j^t(j, M)} \le \sqrt{\frac{Lt}{2K}}

It follows that

Reg(t) \le R(t) \le O(\sqrt{\ln T}) \cdot K \sqrt{\frac{Lt}{2K}} = O(\sqrt{KLt \ln T})

Next, we prove the bound for violation. Let i^\star be the arm with the largest arrival rate, i.e., i^\star = \arg\max_{i \in \mathcal{K}} \beta_i. Then we have,

Vio(T) \le \sum_{\tau=1}^{T} \sum_{i \in \mathcal{M}_\tau} (L \beta_{i^\star} - \beta_i)

² The probability that the bad event occurs is O(T^{-4}).
[29] T. Nishio, R. Shinkuma, T. Takahashi, N.B. Mandayam, Service-oriented heterogeneous resource sharing for optimizing service latency in mobile cloud, in: Proceedings of the First International Workshop on Mobile Cloud Computing & Networking, 2013, pp. 19–26.
[30] V.B.C. Souza, W. Ramírez, X. Masip-Bruin, E. Marín-Tordera, G. Ren, G. Tashakor, Handling service allocation in combined fog-cloud scenarios, in: 2016 IEEE International Conference on Communications, ICC, IEEE, 2016, pp. 1–5.
[31] J. Zhao, X. Sun, Q. Li, X. Ma, Edge caching and computation management for real-time internet of vehicles: an online and distributed approach, IEEE Trans. Intell. Transp. Syst. 22 (4) (2020) 2183–2197.
[32] S. Wang, R. Urgaonkar, T. He, K. Chan, M. Zafer, K.K. Leung, Dynamic service placement for mobile micro-clouds with predicted future costs, IEEE Trans. Parallel Distrib. Syst. 28 (4) (2016) 1002–1016.
[33] X. Li, X. Zhang, T. Huang, Asynchronous online service placement and task offloading for mobile edge computing, in: 2021 18th Annual IEEE International Conference on Sensing, Communication, and Networking, SECON, IEEE, 2021, pp. 1–9.
[34] B. Gao, Z. Zhou, F. Liu, F. Xu, B. Li, An online framework for joint network selection and service placement in mobile edge computing, IEEE Trans. Mob. Comput. (2021).
[35] T. Liu, S. Ni, X. Li, Y. Zhu, L. Kong, Y. Yang, Deep reinforcement learning based approach for online service placement and computation resource allocation in edge computing, IEEE Trans. Mob. Comput. (2022).
[36] T. Ouyang, Z. Zhou, X. Chen, Follow me at the edge: Mobility-aware dynamic service placement for mobile edge computing, IEEE J. Sel. Areas Commun. 36 (10) (2018) 2333–2345.
[37] Y. Zhang, L. Jiao, J. Yan, X. Lin, Dynamic service placement for virtual reality group gaming on mobile edge cloudlets, IEEE J. Sel. Areas Commun. 37 (8) (2019) 1881–1897.
[38] Q. Zhang, Q. Zhu, M.F. Zhani, R. Boutaba, J.L. Hellerstein, Dynamic service placement in geographically distributed clouds, IEEE J. Sel. Areas Commun. 31 (12) (2013) 762–772.
[39] L. Wang, L. Jiao, T. He, J. Li, H. Bal, Service placement for collaborative edge applications, IEEE/ACM Trans. Netw. 29 (1) (2020) 34–47.
[40] P. Yang, N. Zhang, S. Zhang, L. Yu, J. Zhang, X. Shen, Content popularity prediction towards location-aware mobile edge caching, IEEE Trans. Multimed. 21 (4) (2018) 915–929.
[41] X. Vasilakos, V.A. Siris, G.C. Polyzos, Addressing niche demand based on joint mobility prediction and content popularity caching, Comput. Netw. 110 (2016) 306–323.
[42] S. Zhang, P. He, K. Suto, P. Yang, L. Zhao, X. Shen, Cooperative edge caching in user-centric clustered mobile networks, IEEE Trans. Mob. Comput. 17 (8) (2017) 1791–1805.
[43] W. Han, A. Liu, V.K. Lau, Dual-mode user-centric open-loop cooperative caching for backhaul-limited small-cell wireless networks, IEEE Trans. Wireless Commun. 18 (1) (2018) 532–545.
[44] N. Uniyal, A. Bravalheri, X. Vasilakos, R. Nejabati, D. Simeonidou, W. Featherstone, S. Wu, D. Warren, Intelligent mobile handover prediction for zero downtime edge application mobility, in: 2021 IEEE Global Communications Conference, GLOBECOM, IEEE, 2021, pp. 1–6.
[45] C. Zhong, M.C. Gursoy, S. Velipasalar, Deep reinforcement learning-based edge caching in wireless networks, IEEE Trans. Cogn. Commun. Netw. 6 (1) (2020) 48–61.
[46] S. Chen, Z. Yao, X. Jiang, J. Yang, L. Hanzo, Multi-agent deep reinforcement learning-based cooperative edge caching for ultra-dense next-generation networks, IEEE Trans. Commun. 69 (4) (2020) 2441–2456.
[47] Z. Zheng, L. Song, Z. Han, G.Y. Li, H.V. Poor, A Stackelberg game approach to proactive caching in large-scale mobile edge networks, IEEE Trans. Wireless Commun. 17 (8) (2018) 5198–5211.
[48] W. Huang, W. Chen, H.V. Poor, Request delay-based pricing for proactive caching: A Stackelberg game approach, IEEE Trans. Wireless Commun. 18 (6) (2019) 2903–2918.
[49] G. Qiao, S. Leng, S. Maharjan, Y. Zhang, N. Ansari, Deep reinforcement learning for cooperative content caching in vehicular edge computing and networks, IEEE Internet Things J. 7 (1) (2019) 247–257.
[50] Z. Wang, Y. Wei, F.R. Yu, Z. Han, Utility optimization for resource allocation in multi-access edge network slicing: a twin-actor deep deterministic policy gradient approach, IEEE Trans. Wireless Commun. 21 (8) (2022) 5842–5856.
[51] T. Ouyang, R. Li, X. Chen, Z. Zhou, X. Tang, Adaptive user-managed service placement for mobile edge computing: An online learning approach, in: IEEE INFOCOM 2019 - IEEE Conference on Computer Communications, IEEE, 2019, pp. 1468–1476.
[52] Open Edge Computing, http://openedgecomputing.org.
[53] Openfog, https://opcfoundation.org/markets-collaboration/openfog/.
[54] V. Farhadi, F. Mehmeti, T. He, T.F. La Porta, H. Khamfroush, S. Wang, K.S. Chan, K. Poularakis, Service placement and request scheduling for data-intensive applications in edge clouds, IEEE/ACM Trans. Netw. 29 (2) (2021) 779–792.
[55] D. Vial, S. Shakkottai, R. Srikant, Robust multi-agent multi-armed bandits, in: Proceedings of the Twenty-Second International Symposium on Theory, Algorithmic Foundations, and Protocol Design for Mobile Networks and Mobile Computing, 2021, pp. 161–170.
[56] P. Landgren, V. Srivastava, N.E. Leonard, Distributed cooperative decision making in multi-agent multi-armed bandits, Automatica 125 (2021) 109445.
[57] S. Hossain, E. Micha, N. Shah, Fair algorithms for multi-agent multi-armed bandits, Adv. Neural Inf. Process. Syst. 34 (2021) 24005–24017.
[58] K. Azuma, Weighted sums of certain dependent random variables, Tohoku Math. J. Sec. Ser. 19 (3) (1967) 357–367, http://dx.doi.org/10.2748/tmj/1178243286.
[59] R. Kleinberg, A. Slivkins, E. Upfal, Multi-armed bandits in metric spaces, in: Proceedings of the Fortieth Annual ACM Symposium on Theory of Computing, 2008, pp. 681–690.
[60] A. Badanidiyuru, R. Kleinberg, A. Slivkins, Bandits with knapsacks, J. ACM 65 (3) (2018) 1–55.

Weibo Chu received the B.S. degree in software engineering in 2005 and the Ph.D. degree in control science and engineering in 2013, both from Xi'an Jiaotong University, Xi'an, China. He has participated in various research and development projects on network testing, performance evaluation and troubleshooting, and gained extensive experience in the development of networked systems for research and engineering purposes. From 2011 to 2012 he worked as a visiting researcher at Microsoft Research Asia, Beijing. Since 2013 he has been with the School of Computer Science and Technology, Northwestern Polytechnical University. His research interests include internet measurement and modeling, traffic analysis and performance evaluation.

Xiaoyan Zhang received the B.E. degree in 2020 and the M.S. degree in 2023, both in computer science from Northwestern Polytechnical University, Xi'an, China. Her research interests include task scheduling and service management for edge/cloud computing systems.

Xinming Jia received the B.E. degree from Northwestern Polytechnical University, Xi'an, China, in 2021. He is currently working toward his M.S. degree at the School of Computer Science and Technology, Northwestern Polytechnical University, Xi'an, China. His research interests include resource management and incentive mechanism design for edge computing systems.

John C.S. Lui received the Ph.D. degree in computer science from UCLA. He is currently a professor in the Department of Computer Science and Engineering at The Chinese University of Hong Kong. His current research interests include communication networks, network/system security (e.g., cloud security, mobile security, etc.), network economics, network sciences (e.g., online social networks, information spreading, etc.), cloud computing, large-scale distributed systems and performance evaluation theory. He serves on the editorial boards of IEEE/ACM Transactions on Networking, IEEE Transactions on Computers, IEEE Transactions on Parallel and Distributed Systems, Journal of Performance Evaluation and International Journal of Network Security. He was the chairman of the CSE Department from 2005 to 2011. He received various departmental teaching awards and the CUHK Vice-Chancellor's Exemplary Teaching Award. He is also a co-recipient of the IFIP WG 7.3 Performance 2005 and IEEE/IFIP NOMS 2006 Best Student Paper Awards. He is an elected member of the IFIP WG 7.3, a fellow of the ACM, a fellow of the IEEE, and a Croucher Senior Research Fellow.

Zhiyong Wang received his B.E. degree in 2021 from Huazhong University of Science and Technology, Wuhan, China. Since August 2021, he has pursued his Ph.D. degree in the Department of Computer Science & Engineering at The Chinese University of Hong Kong, Hong Kong. His research interests include bandits, reinforcement learning and their applications in computer networks and recommender systems.