0% found this document useful (0 votes)
5 views22 pages

History-Aware Online Cache Placement in Fog-Assisted IoT Systems An Integration of Learning and Control

The document discusses a novel cache placement scheme called CPHBL for fog-assisted IoT systems, which integrates online learning and historical data to optimize cache placement amid uncertainties in file popularity. It formulates the problem as a constrained combinatorial multiarmed bandit problem and employs virtual queue techniques to manage storage costs while maximizing cache hit rewards. The results indicate that CPHBL outperforms existing methods, providing significant improvements in content delivery latency and overall system performance.

Uploaded by

loveff0728
Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PDF, TXT or read online on Scribd
0% found this document useful (0 votes)
5 views22 pages

History-Aware Online Cache Placement in Fog-Assisted IoT Systems An Integration of Learning and Control

The document discusses a novel cache placement scheme called CPHBL for fog-assisted IoT systems, which integrates online learning and historical data to optimize cache placement amid uncertainties in file popularity. It formulates the problem as a constrained combinatorial multiarmed bandit problem and employs virtual queue techniques to manage storage costs while maximizing cache hit rewards. The results indicate that CPHBL outperforms existing methods, providing significant improvements in content delivery latency and overall system performance.

Uploaded by

loveff0728
Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PDF, TXT or read online on Scribd
You are on page 1/ 22

IEEE INTERNET OF THINGS JOURNAL, VOL. 8, NO.

19, OCTOBER 1, 2021 14683

History-Aware Online Cache Placement in


Fog-Assisted IoT Systems: An Integration
of Learning and Control
Xin Gao , Graduate Student Member, IEEE, Xi Huang , Member, IEEE, Yinxu Tang,
Ziyu Shao , Senior Member, IEEE, and Yang Yang , Fellow, IEEE
.

Abstract—In fog-assisted Internet-of-Things systems, it is a Index Terms—Fog computing, history-aware bandit learning,
common practice to cache popular content at the network edge to Internet of Things (IoT), learning-aided online control, proactive
achieve high quality of service. Due to uncertainties, in practice, caching.
such as unknown file popularities, the cache placement scheme
design is still an open problem with unresolved challenges: 1)
I. I NTRODUCTION
how to maintain time-averaged storage costs under budgets; 2)
URING recent years, the proliferation of Internet-of-
how to incorporate online learning to aid cache placement to
minimize performance loss [also known as (a.k.a.) regret]; and
3) how to exploit offline historical information to further reduce
D Things (IoT) devices such as smart phones and the
emerging of IoT applications such as video streaming have
regret. In this article, we formulate the cache placement problem
with unknown file popularities as a constrained combinatorial led to an unprecedented growth of data traffic [2], [3]. To
multiarmed bandit problem. To solve the problem, we employ address the concerns for increasing data traffic, many content
virtual queue techniques to manage time-averaged storage cost providers turn to cloud services for reliable content caching
constraints, and adopt history-aware bandit learning methods and delivery. However, the latency of accessing cloud services
to integrate offline historical information into the online learn- can violate the high Quality-of-Service (QoS) requirement of
ing procedure to handle the exploration–exploitation tradeoff.
With an effective combination of online control and history-aware IoT users due to congestions on backhaul links, especially at
online learning, we devise a cache placement scheme with history- peak traffic moments [4]. To mitigate the concerns, a promis-
aware bandit learning called CPHBL. Our theoretical analysis ing solution is to cache popular contents on fog servers (e.g.,
and simulations show that CPHBL achieves a sublinear time- base stations and routers with enhanced storage and comput-
averaged regret bound. Moreover, the simulation results verify ing capabilities) at the network edge in proximity to IoT users.
CPHBL’s advantage over the deep reinforcement learning-based
approach. In this way, the burden on backhaul links can be alleviated,
and the QoS in terms of the content delivery latency can be
improved [5]–[8]. Fig. 1 shows an example of wireless caching
in a multitier fog-assisted IoT system. As shown in the figure,
Manuscript received December 4, 2020; revised March 6, 2021; accepted by utilizing the storage resources on fog servers that are close
April 2, 2021. Date of publication April 9, 2021; date of current ver-
sion September 23, 2021. This work was supported in part by the to IoT devices, popular contents (e.g., files) can be cached
National Key Research and Development Program of China under Grant to achieve timely content delivery. Due to resource limit, each
2020YFB2104300; in part by the Nature Science Foundation of Shanghai edge fog server (EFS) can cache only a subset of files to serve
under Grant 19ZR1433900; and in part by the National Development and
Reform Commission of China (NDRC) under Grant “5G Network Enabled its associated IoT users. If a user’s requested file is found on
Intelligent Medicine and Emergency Rescue System for Giant Cities.” This the corresponding EFS [also known as (a.k.a.) a hit], then it
article was presented in part at the IEEE International Conference on can be downloaded directly; otherwise, the file needs to be
Communications (ICC), Jun. 2020, Dublin, Ireland. (Corresponding author:
Ziyu Shao.) fetched from the central fog server (CFS) in the upper fog tier
Xin Gao is with the School of Information Science and Technology, with extra bandwidth consumption and latency. Therefore, the
ShanghaiTech University, Shanghai 201210, China, also with the Shanghai key to maximize the benefits of caching in fog-assisted IoT
Institute of Microsystem and Information Technology, Chinese Academy
of Sciences, Shanghai 200050, China, and also with the University systems lies in the selection of a proper set of cached files
of Chinese Academy of Sciences, Beijing 100049, China (e-mail: (a.k.a. cache placement) on each EFS.
[email protected]). However, the effective design for cache placement remains
Xi Huang, Yinxu Tang, and Ziyu Shao are with the School of Information
Science and Technology, ShanghaiTech University, Shanghai 201210, as a challenging problem due to the uncertainty of file popular-
China (e-mail: [email protected]; [email protected]; ities in such systems. Specifically, as an important ingredient
[email protected]). for cache placement optimization, file popularities are usually
Yang Yang is with the Shanghai Institute of Fog Computing Technology,
ShanghaiTech University, Shanghai 201210, China, also with the Research unknown in practice [9]. Such information can only be inferred
Center for Network Communication, Peng Cheng Laboratory, Shenzhen implicitly from feedback information, such as cache hit sig-
518000, China, and also with the Shenzhen SmartCity Technology nals for user requests. Meanwhile, in practice, it is common
Development Group Company Ltd., Shenzhen 518046, China (e-mail:
[email protected]). for fog-assisted IoT systems to retain offline historical obser-
Digital Object Identifier 10.1109/JIOT.2021.3072115 vations (in terms of file request logs) on each EFS. Such offline
2327-4662 
c 2021 IEEE. Personal use is permitted, but republication/redistribution requires IEEE permission.
See https://www.ieee.org/publications/rights/index.html for more information.
Authorized licensed use limited to: SUN YAT-SEN UNIVERSITY. Downloaded on October 05,2024 at 03:01:49 UTC from IEEE Xplore. Restrictions apply.
14684 IEEE INTERNET OF THINGS JOURNAL, VOL. 8, NO. 19, OCTOBER 1, 2021

In general, CPHBL consists of two interacting proce-


dures: a) the online learning procedure and b) the online
control procedure. Particularly, in the online learning
procedure, we adopt the HUCB1 (UCB1 with historic
data) method [12] to leverage both offline historical
information and online feedback to learn the unknown
file popularities with a decent exploration–exploitation
tradeoff. In the online control procedure, we leverage
Fig. 1. Illustration of caching-enabled fog-assisted IoT systems. the Lyapunov optimization method [13] to update cached
files on EFSs in an adaptive manner, so that cache hit
information can also be exploited to estimate the file popular- rewards can be maximized subject to the storage cost
ities in the decision making of cache placement. Nonetheless, constraints.
it remains nontrivial about how to integrate both online feed- 3) Theoretical Analysis: To the best of our knowledge, our
back and offline historical information to reduce uncertainties work conducts the first systematic study on the integration
in decision making and minimize the resulting performance of online control, online learning, and offline histori-
loss (a.k.a. regret). If such an integration can be achieved, then cal information. In particular, our theoretical analysis
each EFS can proactively update cache placement based on its shows that our devised scheme achieves a near-optimal
learned popularity statistics to improve system performances. total cache hit reward under time-averaged storage cost
Toward such a joint design, three challenges must be constraints with atime-averaged regret bound of order
addressed. The first is concerning the tradeoff between con- O(1/V + 1/T + (log T)/(T + Hmin )). Note that V is
flicting performance metrics. On the one hand, caching more a positive tunable parameter, T is the length of time
popular files on each EFS conduces to higher cache hit rewards horizon, and Hmin is the minimum number of offline
(e.g., the total size of files served by wireless caching). On the historical observations among different EFSs.
other hand, the number of cached files should be limited to 4) Numerical Evaluation: We conduct extensive simulations
avoid excessive storage costs (e.g., cost of the memory foot- to investigate the performances of CPHBL and its vari-
print) [10]. Such a tradeoff between cache hit rewards and ants. Moreover, we devise a novel deep reinforcement
storage costs should be carefully considered for cache place- learning (DRL)-based scheme as one of the baselines to
ment. The second is regarding the exploration–exploitation be compared with CPHBL. Our simulation results not
dilemma encountered in the online learning procedure; i.e., for only verify our theoretical analysis but also show the
each EFS, should it cache the files with empirically high esti- advantage of CPHBL over the baseline schemes.
mated popularities (exploitation) or those files with inadequate 5) New Degree of Freedom in the Design Space of Fog-
feedback but potentially high popularities (exploration)? The Assisted IoT Systems: We systematically investigate the
third is about how to leverage offline historical information fundamental benefits of offline historical information
to further improve learning efficiency, which serves as a new in fog-assisted IoT systems. We provide both theoret-
degree of freedom in the design space of cache placement. ical analysis and numerical simulations to evaluate such
Faced with such challenges, the interplays among online con- benefits. Our results reveal novel insights to system
trol, online learning, and offline historical information deserve designers to improve their systems.
a systematic investigation. This article was presented in part in a conference ver-
In this article, we focus on the problem of proactive cache sion [1]. In this article, we have made significant improve-
placement in caching-enabled fog-assisted IoT systems with ments compared with the conference version. The improve-
offline historical information and unknown file popularities ments include but not limited to the design of an advanced
under constraints on time-averaged storage costs of EFSs. We algorithm, a detailed theoretical analysis of the algorithm
summarize our contributions and key results as follows. performance, a more comprehensive literature review, and
1) Problem Formulation: We formulate the problem as more simulation results. The remainder of this article is orga-
a stochastic optimization problem under uncertainties, nized as follows. Section II discusses the related works.
with the aim to maximize the total cache hit reward Section III illustrates our system model and problem formu-
in terms of the total size of files directly fetched from lation. Section IV shows our algorithm design, followed by
EFSs to IoT users over a finite time horizon. Meanwhile, the performance analysis in Section V. Section VI proposes a
we also consider the time-averaged storage cost con- novel DRL-based scheme as a baseline for evaluation, and
straint on each EFS. By exploiting the problem structure, then, Section VII discusses our simulation results. Finally,
we extend the settings of the recently developed bandit Section VIII concludes this article.
model [11] and reformulate the problem as a constrained
combinatorial multiarmed bandit (CMAB) problem.
2) Algorithm Design: To solve the formulated problem, II. R ELATED W ORK
we propose cache placement with history-aware ban- In the past decades, cache placement has been widely stud-
dit learning (CPHBL), a learning-aided cache placement ied to improve the performance of wireless networks, such as
scheme that conducts proactive and effective cache IoT networks [23] and cellular networks [3]. Among exist-
placement under time-averaged storage cost constraints. ing works, those that are most relevant to our work are

Authorized licensed use limited to: SUN YAT-SEN UNIVERSITY. Downloaded on October 05,2024 at 03:01:49 UTC from IEEE Xplore. Restrictions apply.
GAO et al.: HISTORY-AWARE ONLINE CACHE PLACEMENT IN FOG-ASSISTED IoT SYSTEMS 14685

TABLE I
C OMPARISON B ETWEEN O UR W ORK AND R ELATED W ORKS

generally carried out from two perspectives: 1) the online users. By considering the problem as a CMAB problem, in [17],
control perspective and 2) the online learning perspective. they aimed to maximize the amount of served traffic through
Online Control-Based Cache Placement: Most works that wireless caching, while in [18], they further took file download-
take the online control perspective formulated cache place- ing costs into the account for optimization. Müller et al. [19]
ment problems as stochastic network optimization problems proposed a cache placement scheme based on contextual ban-
with respect to different metrics. For example, Pang et al. [10] dits, which learns the context-dependent content popularity to
jointly studied the cache placement and data sponsoring maximize the number of cache hits. Zhang et al. [20] studied
problems in mobile video content delivery networks. Their the network utility maximization problem in the context of
solution aimed to maximize the overall content delivery pay- cache placement with a nonfixed content library over time.
off with budget constraints on caching and delivery costs. Song et al. [21] proposed a joint cache placement and content
Kwak et al. [14] devised a dynamic cache placement scheme sharing scheme among cooperative caching units to maximize
to optimize service rates for user requests in a hierarchical the content caching revenue and minimize the content sharing
wireless caching network. Wang et al. [15] developed a joint expense. Xu et al. [22] modeled the problem of cache placement
traffic forwarding and cache placement scheme to optimize the with multiple caching units from the perspective of multiagent
queueing delay and energy consumption of caching-enabled multiarmed bandit (MAMAB) and devised an online scheme
networks. Xu et al. [16] proposed an online algorithm to to minimize the accumulated transmission delay over time.
jointly optimize wireless caching and task offloading with Such works generally do not consider the storage costs on
the goal of ultralow task computation delays under a long- EFSs in terms of memory footprint. In practice, without such a
term energy constraint. In general, such works adopted the consideration, caching files with excessively high storage costs
Lyapunov optimization method [13] to solve their formulated may offset the benefits of wireless caching. Moreover, none
problems through a series of per-time-slot adaptive control. of such works exploits offline historical information in their
Although the effectiveness of their solutions has been well learning procedures.
justified, they generally assumed that file popularities or file Novelty of Our Work: Overall, existing Lyapunov-based
requests are readily given prior to the cache placement process. online control works assume that all instantaneous system
Such assumptions are usually not the case in practice [9]. states are fully observable during the online decision-making
Online Learning-Based Cache Placement: Faced with con- process. However, this is often not the case in practice. In
stantly arriving file requests and unknown file popularities, a our scenario, instantaneous file demands are unknown before
number of works adopted various learning techniques, such the cache placement process, to which standard Lyapunov
as deep learning [24]–[27], transfer learning [9], [28], and optimization techniques do not apply. On the other hand,
reinforcement learning [17]–[22], [29], [30], to improve the existing bandit-based online learning works generally do not
performance of wireless caching networks. However, exist- consider stochastic time-averaged constraints on storage costs
ing solutions in such works cannot handle time-averaged in their models. The involvement of such constraints makes
constraints. Besides, they mainly resorted to time-consuming the cache placement problem more challenging and cannot be
offline pretraining and heuristic hyperparameter tuning to pro- coped with effectively by existing bandit learning techniques.
duce their solutions. Moreover, they generally provided no Different from existing works, we conduct an effective inte-
theoretical guarantee but limited insights for the resulting gration of online control, online learning, and offline historical
performance. information with a sophisticated scheme design and theoret-
Bandit learning is another method that is widely adopted to ical analysis. Accordingly, our formulation, scheme design,
promote the performance of such systems. So far, it has been and theoretical analysis are more sophisticated than standard
applied to solve scheduling problems, such as task offload- Lyapunov optimization and bandit learning techniques. To our
ing [31], task allocation [32], and path selection [33]. The best knowledge, we are the first to present a systematic study
most relevant to our work are those that consider optimizing on such a synergy technique. Our results also provide novel
proactive cache placement in terms of different performance insights to the designers of fog-assisted IoT systems. The com-
metrics. For example, Blasco and Gündüz [17], [18] studied the parison between our work and existing works is presented in
cache placement problem for a single caching unit with multiple Table I.
Authorized licensed use limited to: SUN YAT-SEN UNIVERSITY. Downloaded on October 05,2024 at 03:01:49 UTC from IEEE Xplore. Restrictions apply.
14686 IEEE INTERNET OF THINGS JOURNAL, VOL. 8, NO. 19, OCTOBER 1, 2021

TABLE II
K EY N OTATIONS EFS n, we define Kn (Kn ⊆ K, |Kn | = Kn ) as the set of
IoT users within its service range. Note that each IoT user is
served by one and only one EFS, and thus, the sets {Kn }n are
disjoint.
Particularly, we focus on the scenario in which IoT users
request to download files from EFSs. We assume that the CFS
has stored all of F files (denoted by set F  {1, 2, . . . , F})
that could be requested within the time horizon. Each file f has
a fixed size of Lf storage units. Due to caching capacity limit,
each EFS n only has  Mn units of storage to cache a portion
of the files and Mn < f ∈F Lf . Accordingly, if a user cannot
find its requested file on its associated EFS, it will request to
download the file directly from the CFS. We assume that the
CFS can provide simultaneous and independent file deliveries
to all EFSs and IoT users. An example that illustrates our
system model is shown in Fig. 1.

B. File Popularity
For each EFS n, we consider the popularity of each file f
as the expected number of IoT users in set Kn to request file
f per time slot, whose ground-truth value is denoted by dn,f .
In general, for each EFS, the popularities of different files
requested by its associated IoT users are different. Moreover,
for the same file, its popularity varies across different EFSs.
We assume that each file’s popularity remains constant within
the time horizon. In practice, such file popularities are usually
unknown a priori and can only be inferred based on feedback
information collected after user requests have been served.
Next, we introduce some variables to characterize user
III. S YSTEM M ODEL AND P ROBLEM F ORMULATION dynamics with respect to file popularity. We define binary
In this section, we describe our system model in detail. variable θk,f (t) ∈ {0, 1} such that θk,f (t) = 1 if the IoT user
Then, we present our problem formulations. Key notations in k requests for file f in time slot t and θk,f (t) = 0 otherwise.
this article are summarized in Table II. Then, we denote the file requests of IoT user k during time slot
t by vector θ k (t)  (θk,1 (t), θk,2 (t), . . . , θk,F (t)). Meanwhile,
A. Basic Model we use Dn,f (t)  k∈Kn θk,f (t) to denote the total number of
IoT users in set Kn who request for file f on EFS n in time
We consider a caching-enabled fog-assisted IoT system that
slot t. Note that Dn,f (t) is a discrete random variable over
operates over a finite time horizon of T time slots. In the
a support set {0, 1, . . . , Kn } and assumed to be independent
system, one CFS and N EFSs cooperate to serve K IoT
identically distributed (i.i.d.) across time slots with the mean
users. The fog servers and IoT users communicate with each
of dn,f .
other through wireless connections. We assume that orthog-
Besides, we assume that initially (i.e., t = 0), each EFS is
onal frequency-division multiple access (OFDMA) [34] is
provided with a fixed set of offline historical observations with
employed as the underlying wireless transmission mechanism,
respect to the number of requests for each file. Specifically,
integrated with interference management techniques, such as
the offline historical observations for file f on EFS n are
zero-forcing beamforming [35] and signal processing [36].
denoted by {Dhn,f (0), Dhn,f (1), . . . , Dhn,f (Hn,f − 1)}, where we
Based on such an assumption, we ignore the co-channel
define Hn,f ≥ 0 as the number of offline historical observa-
interference among fog servers and IoT users, abstract the
tions about file f on EFS n. When Hn,f = 0, there is no offline
physical-layer wireless links as bit pipes, and focus on the
historical information. Let Dhn,f (s) denote the sth offline his-
network-layer data communications between servers and IoT
torical observation. Here, we use superscript h to indicate that
users.1 We denote the sets of EFSs and users by N 
Dhn,f (s) belongs to offline historical information. Note that such
{1, 2, . . . , N} and K  {1, 2, . . . , K}, respectively. For each
observations are given as prior information when t = 0. Their
1 In practice, the co-channel interference cannot be completely eliminated. values are assumed to follow the same distribution as the file
In our model, we ignore such interference for the tractability of theoretical popularities over the time horizon.
analysis. Specifically, the cache placement problem considered in this work
is challenging due to its stochastic settings and uncertainties in environment
states. To jointly consider the caching problem on both the network layer and C. System Workflow
physical layer, the algorithm design and theoretical analysis would become
even more challenging, which remains an open problem. Such a cross-layer During each time slot t, the system operates across two
scheme design would be an interesting direction for future work. phases: 1) the caching phase and 2) the service phase.

Authorized licensed use limited to: SUN YAT-SEN UNIVERSITY. Downloaded on October 05,2024 at 03:01:49 UTC from IEEE Xplore. Restrictions apply.
GAO et al.: HISTORY-AWARE ONLINE CACHE PLACEMENT IN FOG-ASSISTED IoT SYSTEMS 14687

1) Caching Phase: The caching phase starts from the begin- G. Problem Formulation
ning of the time slot. In this phase, each EFS n updates To achieve effective cache placement with a high QoS, two
its cached files and consumes a storage cost for each goals are considered in our work. One is to maximize the total
cached file. Then, each EFS n broadcasts its cache size of transmitted files from all EFSs so that requests from
placement to all IoT users in set Kn . IoT users can obtain timely services. In our model, this is
2) Service Phase: The service phase begins from the end equivalent to maximizing the time-averaged cache hit reward
of the caching phase and lasts until the end of the time of all EFSs over a time horizon of T time slots. The other
slot. In this phase, each IoT user generates file requests. is to guarantee a budgeted usage of storage costs over time.
For each request, if it is not cached on the EFS, then To this end, for each EFS n, we first define bn as the storage
the user will fetch the file from the CFS. Otherwise, the cost budget for caching files. Then, we impose the following
user directly downloads the file from the EFS and the constraint to ensure the time-averaged storage costs under the
EFS will receive a corresponding cache hit reward. budget in the long run:
In the next few sections, we present the definitions of cache
1
T−1
placement decisions, storage costs, and cache hit rewards,
respectively. lim sup E[Cn (t)] ≤ bn ∀n ∈ N . (5)
T→∞ T
t=0

Based on the above system model and constraints, our problem


D. Cache Placement Decision
formulation is given by
For each EFS n, we denote its cache placement decision
1
T−1
made during each time slot t by a binary vector Xn (t) 
(Xn,1 (t), Xn,2 (t), . . . , Xn,F (t)). Each entry Xn,f (t) = 1 if EFS maximize E[Rn (t)] (6a)
{X(t)}t T
t=0 n∈N
n decides to cache file f during time slot t and Xn,f (t) = 0
subject to Xn,f (t) ∈ {0, 1} ∀n ∈ N , f ∈ F, t (6b)
otherwise. Note that the total size of cached files on EFS n
does not exceed its storage capacity, i.e., (1), (5).
 In the above formulation, the objective (6a) is to maximize
Lf Xn,f (t) ≤ Mn ∀n ∈ N , t. (1) the time-averaged expectation of total cache hit reward of all
f ∈F
EFSs. Constraint (6b) states that each cache placement deci-
sion Xn,f (t) should be a binary variable. The constraint in (1)
E. Storage Cost guarantees that the total size of cached files on each EFS
For each EFS n, caching file f during a time slot t will should not exceed the storage capacity. The constraint in (5)
incur a storage cost of αLf , where α > 0 is the unit storage ensures the budget constraint on the storage cost of each EFS.
cost. The storage cost can be viewed as cost of the memory
footprint of maintaining the file, which is proportional to the IV. A LGORITHM D ESIGN
size of the file f . Accordingly, given decision Xn (t), we define For problem (6), given the full knowledge of user demands
the total storage cost on EFS n during time slot t as {Dn,f (t)}n,f at the beginning of each time slot, it can be solved
 asymptotically optimally by Lyapunov optimization meth-
Cn (t)  αLf Xn,f (t). (2) ods [13]. However, the instantaneous user demands and file
f ∈F popularities are usually not given as prior information in prac-
tice. Faced with such uncertainties, online learning needs to be
F. Cache Hit Reward incorporated to guide the decision-making process by estimat-
ing the statistics of file popularities from both online feedback
Recall that during each time slot t, for each requested file and offline historical information. To this end, we need to deal
f , if Xn,f (t) = 1, then EFS n will receive a reward Lf for the with the well-known exploration–exploitation dilemma, i.e.,
corresponding cache hit [17] (in terms of amounts of traffic how to balance the decisions made to acquire new knowledge
to fetch file f from EFS n). Then, given the cache placement about file popularity to improve learning accuracy (explo-
Xn,f (t) and user demand Dn,f (t) during time slot t, we define ration) and the decisions made to leverage current knowledge
the cache hit reward of EFS n with respect to file f as to select the empirically most popular files (exploitation). For
such a decision-making problem under uncertainty, we con-
Rn,f (t)  Lf Dn,f (t)Xn,f (t). (3) sider it through the lens of CMAB with extended settings.
With an effective integration of online bandit learning, online
Note that the cache hit reward Rn,f (t) = 0 if file f is not cached control, and offline historical information, we devise a cache
on EFS n during time slot t (i.e., Xn,f (t) = 0). Accordingly, placement scheme called cache placement with history-aware
we define the total cache hit reward of EFS n during time bandit learning (CPHBL) to solve problem (6). Fig. 2 depicts
slot t as the design of CPHBL. During each time slot, under CPHBL,
 each EFS first estimates the popularity of different files based
Rn (t) = R̂n (Xn (t))  Lf Dn,f (t)Xn,f (t). (4) on both offline historical information and collected online
f ∈F feedback. Based on such estimates, the EFS determines and

Authorized licensed use limited to: SUN YAT-SEN UNIVERSITY. Downloaded on October 05,2024 at 03:01:49 UTC from IEEE Xplore. Restrictions apply.
14688 IEEE INTERNET OF THINGS JOURNAL, VOL. 8, NO. 19, OCTOBER 1, 2021

we consider the storage capacity constraint for each player dur-


ing each time slot, which is ignored in [11]. Last but not least,
we consider a more general reward function with respect to
file uncertainties. The above extensions make our reformulated
problem more challenging than the problem in [11].
To characterize the performance loss (a.k.a. regret) due to
decision making under such uncertainties, we define the regret
Fig. 2. Illustration of our algorithm design.
with respect to a given scheme (denoted by decision sequence
{X(t)}t ) as

1  
T−1
updates its cache placement in the current time slot. After the
Reg(T)  R∗ − E R̂n (Xn (t)) (7)
update, each EFS delivers requested cached files to IoT users. T
t=0 n∈N
For each cache hit, a reward will be credited to the EFS.
In the following sections, we extend the settings of the where constant R∗ is defined as the optimal time-averaged
existing CMAB model and demonstrate the reformulation of total expected reward for all players. In fact, maximizing the
problem (6) under such settings. Then, we articulate our algo- time-averaged expected reward is equivalent to minimizing the
rithm design with respect to online learning and online control regret. Therefore, we can rewrite problem (6) as follows:
procedures, respectively. Finally, we discuss the computational
complexity of our devised algorithm. minimize Reg(T) (8a)
{X(t)}t
subject to (1), (5), (6b). (8b)

A. Problem Reformulation To solve problem (8), we integrate history-aware bandit


The basic settings of CMAB [37] consider a sequential learning methods in the online learning procedure and virtual
interaction between a player and its environment with multiple queue techniques in the online control procedure to handle the
actions (a.k.a. arms) over a finite number of rounds. During exploration–exploitation tradeoff and the time-averaged stor-
each round, the player selects a subset of available arms to age cost constraints, respectively. In the following sections, we
play. For each selected arm, the agent will receive a reward demonstrate our algorithm design in detail.
that is sampled from an unknown distribution. The overall goal
of the player is to find an effective arm-selection scheme to B. History-Aware Online Learning Procedure
maximize its expected cumulative reward. By (4), the regret defined in (7) can be rewritten as
Based on the CMAB model, Li et al. [11] extended the
1
T−1
settings of classical CMAB by allowing the temporary unavail-  
Reg(T) = R∗ − Lf E Dn,f (t)Xn,f (t)
ability of arms while considering the fairness of arm selection. T
t=0 n∈N f ∈F
Inspired by their work, we reformulate problem (6) as a con-
strained CMAB problem in the following way. We view each 1 
T−1   
EFS as a distinct player and each file as an arm. During each = R∗ − Lf dn,f E Xn,f (t) (9)
T
time slot t, each player n ∈ N selects a subset of arms to play. t=0 n∈N f ∈F

If player n chooses to play arm f ∈ F in time slot t, then file f where the last equality holds due to the independence between
will be cached on EFS n and a reward Rn,f (t) = Lf Dn,f (t) will user demand Dn,f (t) and cache placement Xn,f (t), and the fact
be received by the player. Recall that the file demand Dn,f (t) that E[Dn,f (t)] = dn,f . By (9) and our previous analysis, to
during each time slot t is a random variable with an unknown solve problem (8), each EFS n should learn the unknown file
mean dn,f and is i.i.d. across time slots. Accordingly, reward popularity dn,f with respect to each file f .
Rn,f (t) is also an i.i.d. random variable with an unknown During each time slot t, after updating cached files accord-
mean rn,f = E[Rn,f (t)] = Lf dn,f . Meanwhile, the cache place- ing to decision Xn (t), each EFS n observes the current demand
ment decision Xn (t) = (Xn,1 (t), Xn,2 (t), . . . , Xn,F (t)) of EFS Dn,f (t) for each cached file f . Then, EFS n transmits requested
n corresponds to the arm selection of player n in time slot t. files to IoT users and acquires cache hit rewards. Based on the
Specifically, Xn,f (t) = 1 if arm f is chosen and Xn,f (t) = 0 oth- pregiven offline historical information and cache hit feedback
erwise. Our goal is to devise an arm selection scheme for the from IoT users, we have the following estimate for each file
players to maximize their expected cumulative rewards subject popularity dn,f :
to the constraints in (1) and (5).
Remark: Our model extends the settings of the bandit model 3 log t
d̃n,f (t) = min d̄n,f (t) + Kn , Kn . (10)
proposed by [11] in the following four aspects. First, we 2 hn,f (t) + Hn,f
consider multiple players instead of one player. Second, the
storage cost constraints in our problem are more challenging In (10), d̄n,f (t) is the empirical mean of the number of requests
to handle than the arm fairness constraints in [11]. Specifically, for file f that involves both offline historical observations and
under our settings, the selection of each arm for a player collected online feedbacks; hn,f (t) counts the number of time
is coupled together under storage cost constraints, whereas slots (within the first t time slots) during which file f is cho-
in [11], there is no such coupling among arm selections. Third, sen to be cached on EFS n; and Kn denotes the number of

Authorized licensed use limited to: SUN YAT-SEN UNIVERSITY. Downloaded on October 05,2024 at 03:01:49 UTC from IEEE Xplore. Restrictions apply.
GAO et al.: HISTORY-AWARE ONLINE CACHE PLACEMENT IN FOG-ASSISTED IoT SYSTEMS 14689

in which we define [ · ]+  max{·, 0}. Note that the


constraints in (5) are satisfied only when the queueing pro-
cess {Qn (t)}t foreach EFS n is strongly stable [13], i.e.,
T−1
lim supT→∞ 1/T t=0 E[Qn (t)] < ∞. Intuitively, the mean
queue inputs (i.e., storage costs) should not be greater than
the mean queue outputs (i.e., cost budgets). Otherwise, virtual
Fig. 3. Illustration of virtual queues for storage cost on each EFS. Each EFS
n ∈ N maintains a virtual queue Qn (t) with an input of Cn (t) and an output queues will be overloaded, thereby violating the constraints
of bn during each time slot t. If the queueing process {Qn (t)}t is strongly in (5). To maintain the stability of virtual queues and min-
stable, then the time-averaged storage cost constraint (5) on EFS n can be imize the regret, we transform problem (8) into a series of
satisfied.
per-time-slot subproblems. We show the detailed derivation in
Appendix A. Specifically, during each time slot t, we aim to
users served by EFS n. Specifically, the number of observa- solve the following problem for each EFS n ∈ N :
tions hn,f (t) and the empirical mean of file popularity d̄n,f (t) 
by time slot t are defined as follows, respectively: maximize w̃n,f (t)Xn,f (t) (14a)
Xn (t)
f ∈F

t−1 
hn,f (t)  Xn,f (τ ) (11) subject to Lf Xn,f (t) ≤ Mn (14b)
τ =0 f ∈F
t−1 Hn,f −1 h Xn,f (t) ∈ {0, 1} ∀f ∈ F (14c)
τ =0 Dn,f (τ )Xn,f (τ ) + s=0 Dn,f (s)
d̄n,f (t)  . (12)
hn,f (t) + Hn,f where w̃n,f (t) is defined as

Remark: In (10), the term Kn ([3 log t]/[2(hn,f (t) + Hn,f )])  
denotes the confidence radius [38] that represents the degree w̃n,f (t)  Lf V d̃n,f (t) − αQn (t) . (15)
of uncertainty with respect to the empirical estimate d̄n,f (t).
The larger the confidence radius, the greater the value of In (15), parameter V is a tunable positive constant; weight
the estimate (10) and, thus, the greater the chance for file w̃n,f (t) can be viewed as the gain of caching file f on EFS
f to be cached on EFS n. In the confidence radius, the term n during time slot t; and the objective of problem (14) is to
hn,f (t) + Hn,f is the total number of observations (including maximize the total gain of caching files on EFS n under the
both online observations and offline historical observations) storage capacity constraint in (14b).
for the popularity of file f on EFS n. Given a small number During each time slot t, we solve problem (14) for each
of observations (i.e., hn,f (t) + Hn,f t), the confidence radius EFS n to determine its cache placement Xn (t). We split set
for the empirical estimate d̄n,f (t) will be large, which implies F into two disjoint sets Fn,1 (t) = {f ∈ F : w̃n,f (t) ≥ 0} and
that the file is rarely cached and hence, a great uncertainty Fn,2 (t) = {f ∈ F : w̃n,f (t) < 0} for each EFS n. Specifically,
about the estimate. In this case, the confidence radius plays a for each file f ∈ F:
dominant role in the estimate d̃n,f (t). As a result, file f will 1) if d̃n,f (t) ≥ αQn (t)/V, then w̃n,f (t) ≥ 0 and f ∈ Fn,1 (t);
be more likely to be cached on EFS n. In contrast, if a file 2) if d̃n,f (t) < αQn (t)/V, then w̃n,f (t) < 0 and f ∈ Fn,2 (t).
has been cached for an adequate number of times, its popular- For each file f ∈ Fn,2 (t), the corresponding optimal placement
ity estimate (10) will be close to its empirical mean and the decision is Xn,f (t) = 0 since the caching file f on EFS n will
role of confidence radius will be marginalized. Besides, the incur a negative gain, i.e., w̃n,f (t) < 0. By setting Xn,f (t) = 0
estimate (10) also characterizes the effects of offline historical for each file f ∈ Fn,2 (t), we can regard problem (14) as a
information and online feedback information. Particularly, in classical Knapsack problem [39]
the early stage (when t is small), suppose that the number of
online observations is much smaller than the number of offline 
maximize w̃n,f (t)Xn,f (t)
historical observations, i.e., hn,f (t) Hn,f . In this case, the {Xn,f (t)}f ∈Fn,1 (t) f ∈Fn,1 (t)
estimate (10) mainly depends on offline historical information. 
However, as more online feedbacks are collected, the impact subject to Lf Xn,f (t) ≤ Mn
of online information becomes more dominant. f ∈Fn,1 (t)
Xn,f (t) ∈ {0, 1} ∀f ∈ Fn,1 (t). (16)
C. Online Control Procedure
In the online control procedure, we leverage Lyapunov Intuitively, from the lens of the Knapsack problem, we have
optimization techniques [13] to transform the time-averaged a number of items (files) in set Fn,1 (t) and a knapsack (EFS
storage cost constraints into queue stability constraints. n’s cache) with a capacity of Mn . The weight of each item f ∈
Specifically, we introduce a virtual queue Qn (t) for each EFS Fn,1 (t) is Lf , while the value of putting item f in the knapsack
n ∈ N with Qn (0) = 0 to handle the time-averaged con- is w̃n,f (t). Given the weights and values of all items, our goal
straints (5) on storage costs. As illustrated in Fig. 3, each vir- is to select and put a subset of the items from Fn,1 (t) into the
tual queue Qn (t) is updated during each time slot t as follows: knapsack with the maximum total value. Such a problem can
be solved optimally by applying the dynamic programming
Qn (t + 1) = [Qn (t) − bn ]+ + Cn (t) (13) (DP) algorithm [40].

Authorized licensed use limited to: SUN YAT-SEN UNIVERSITY. Downloaded on October 05,2024 at 03:01:49 UTC from IEEE Xplore. Restrictions apply.
14690 IEEE INTERNET OF THINGS JOURNAL, VOL. 8, NO. 19, OCTOBER 1, 2021

Algorithm 1 CPHBL 1: function S ET C ACHE P LACEMENT(t, n, {d̃n,f (t)}f )


Hn,f −1 h 2: Inputs: At the beginning of time slot t, for EFS n, given file
1: Initialize hn,f (0) = 0, d̄n,f (0) = H1 Dn,f (s) and
n,f s=0 demand estimate {d̃n,f (t)}f .
d̃n,f (0) = Kn for each EFS n ∈ N and each file f ∈ F . In 3: Set Fn,1 (t) ← ∅.
each time slot t ∈ {0, 1, · · · }: 4: for each file f ∈ F do
%History-aware Online Learning 5: Set w̃n,f (t) ← Lf V d̃n,f (t) − Qn (t) .
2: for each EFS n ∈ N and each file f ∈ F do 6: if w̃n,f (t) < 0 then
3: if hn,f (t) + Hn,f > 0 and t > 0then 7: Set Xn,f (t) ← 0.
 3 log t  8: else
4: d̃n,f (t) ← min d̄n,f (t) + Kn 2(h (t)+H ) , Kn .
n,f n,f 9: Set Fn,1 (t) ← Fn,1 (t) ∪ {f }.
5: end if 10: end if
6: end for 11: end for
%Cache Placement 12: Initialize vn (i, m) = 0 for i ∈ {0, 1, 2, · · · , |Fn,1 (t)|} and
7: for each EFS n ∈ N do m ∈ {0, 1, · · · , Mn }.
8: S ET C ACHE P LACEMENT(t, n, {d̃n,f (t)}f ). 13: for each i ∈ {1, 2, · · · , |Fn,1 (t)|} do
9: end for 14: for each m ∈ {1, · · · , Mn } do
%Update of Statistics and Virtual Queues 15: if Lφn,i (t) > m then
10: Update cached files according to X(t) and virtual queues Q(t) 16: Set vn (i, m) ← vn (i − 1, m).
according to (13). else
11: for each EFS n ∈ N and each file f ∈ F do
17: 
18: Set vn (i, m) ← max vn (i − 1, m), vn (i − 1, m −
12: hn,f (t + 1) ← hn,f (t) + Xn,f (t). Lφn,i (t) ) + w̃n,φn,i (t) (t) .
h (t)+Hn,f D (t)Xn,f (t)
13: d̄n,f (t + 1) ← h n,f(t+1)+H d̄n,f (t) + h n,f(t+1)+H . 19: end if
n,f n,f n,f n,f
14: end for 20: end for
21: end for
22: S ET O PT P LACEMENT(n, |Fn,1 (t)|, Mn ).
23: end function
D. Integrated Algorithm Design
Based on the design presented in the previous two sections, 1: function S ET O PT P LACEMENT(n, i, m)
we propose a novel learning-aided proactive cache place- 2: Inputs: For EFS n, given the number of files i and the
ment scheme called CPHBL. The pseudocode of CPHBL remaining storage size m.
3: if i ≥ 1 then
is presented in Algorithm 1. In particular, we denote the 4: if vn (i, m) = vn (i − 1, m − Lφn,i (t) ) + w̃n,φn,i (t) (t) and
file indices in set Fn,1 (t) by φn,1 (t), φn,2 (t), . . . , φn,|Fn,1 (t)| (t), m − Lφn,i (t) ≥ 0 then
respectively. We use v(i, m) to denote the optimal value of 5: Set Xn,φn,i (t) (t) ← 1.
problem (16) when only the first i files [i.e., files indexed 6: S ET O PT P LACEMENT(n, i − 1, m − Lφn,i (t) ).
7: else if vn (i, m) = vn (i − 1, m) then
by φn,1 (t), . . . , φn,i (t)] in Fn,1 (t) can be selected to store in
8: Set Xn,φn,i (t) (t) ← 0.
the remaining memory capacity of m storage units. Regarding 9: S ET O PT P LACEMENT(n, i − 1, m).
CPHBL, we have the following remarks. 10: end if
Remark 1: As shown in (15), the gain w̃n,f (t) of caching 11: end if
file f on EFS n is jointly determined by the file popularity 12: end function
estimate d̃n,f (t) obtained in the online learning procedure and
the virtual queue backlog size Qn (t) maintained by the online
control procedure. Specifically, such a gain increases with the
Qn (t) will be large. By the definition of weight w̃n,f (t) in (15),
rise of the estimated file popularity d̃n,f (t), but decreases with
the value of w̃n,f (t) is negatively proportional to the virtual
the increase of the virtual queue backlog size Qn (t). In this
queue backlog size Qn (t). Therefore, when the value of Qn (t)
way, the online control and online learning procedures are
increases, the weight w̃n,f (t) of caching file f on EFS n tends
integrated to determine the cache placement on each EFS.
to be negative. Under CPHBL, files with negative weights will
Remark 2: In (15), the value of parameter V in weight
not be cached, which conduces to a low time-averaged storage
w̃n,f (t) measures the relative importance of achieving high
cost.
cache hit rewards over ensuring storage cost constraints. Note
that the value of w̃n,f (t) is positively proportional to the value
of parameter V. Therefore, for each file f ∈ F, the gain w̃n,f (t)
of caching file f on EFS n during time slot t will increase as E. Computational Complexity of CPHBL
the value of V increases. Under CPHBL, EFS n will cache The computational complexity of CPHBL mainly lies in the
more files to achieve not only a higher gain but also a larger decision making for cache placement on each EFS n ∈ N
storage cost. Moreover, files with high estimated mean cache (line 8 in Algorithm 1). In this process, DP is adopted
hit rewards would be the first to be cached. to solve problem (14) with a computational complexity of
Remark 3: To ensure the storage cost constraints in (5), O(FMn ) [40]. Note that F denotes the total number of files
CPHBL would restrict each EFS to cache limited files as its on the CFS and Mn denotes the storage capacity of EFS n.
virtual queue backlog size becomes large. Intuitively, for each In practice, the cache placement process can be implemented
EFS n, if its time-averaged storage cost tends to exceed the in a distributed fashion over EFNs; accordingly, the total
cost budget bn , its corresponding virtual queue backlog size computational complexity of CPHBL is O(F maxn∈N Mn ).

Authorized licensed use limited to: SUN YAT-SEN UNIVERSITY. Downloaded on October 05,2024 at 03:01:49 UTC from IEEE Xplore. Restrictions apply.
GAO et al.: HISTORY-AWARE ONLINE CACHE PLACEMENT IN FOG-ASSISTED IoT SYSTEMS 14691

V. P ERFORMANCE A NALYSIS the value of V, the more sensitive CPHBL would be to the
For each EFS n, given the number Kn of its served users increase in the storage costs. As a result, each EFS would con-
and its storage capacity Mn , as well as the size Lf of each file stantly update its cached file set with files of different storage
f ∈ F, we establish the following two theorems to characterize costs, leading to inferior cache hit rewards. In practice, the
the performance of CPHBL. selection of the value of V depends on the design tradeoff of
real systems.
Remark 6: The  last two terms of the regret bound are in the
A. Storage Cost Constraints
order of O(1/T + (log T)/(T + Hmin )). These two terms are
A budget vector b = (b1 , b2 , . . . , bN ) of storage costs is mainly incurred by the online learning procedure with offline
said to be feasible if there exists a feasible cache placement historical information and collected online feedback. In the
scheme under which all storage cost constraints in (5) can be following, we first consider the impact of Hmin on the regret
satisfied. We define the set of all feasible budget vectors as the bound under a fixed value of time horizon length T. Note
maximal feasibility region of the system, which is denoted by that when Hmin = 0, our problem degenerates to the special
the set B. The following theorem shows that all virtual queues case without offline historical information, as considered in our
are strongly stable under CPHBL when b is an interior point previous work [1]. Inthis case, the whole regret bound is in
of B. the order of O(1/V + (log T)/T). When the offline historical
Theorem 1: Suppose that the budget vector b lies in the information is available (i.e., Hmin > 0), the regret bound
interior of B, then the time-averaged storage cost constraints would be even lower. Specifically, we consider the following
in (5) are satisfied under CPHBL. Moreover, the virtual queues four cases under a fixed value of T.2
defined in (13) are strongly stable and satisfy 1) The first case is when Hmin = O(1), i.e., a constant value
 unrelated to T. Compared to the scenario without offline
1
T−1
B + V n∈N 2Kn Mn
lim sup E[Qn (t)] ≤ (17) historical information, though the value of regret bound
T→∞ T  reduces in this case, its order remains to be O(1/V +
t=0 n∈N 
where  is a positive constant, which satisfies that γ − 1 [1 (log T)/T).
is the (M + 1)-dimensional all-ones vector] is still an interior 2) The second case is when Hmin = (T), i.e., the number
point of the maximal of offline historical observations is comparable to the
 feasibility region, and the parameter B length of time horizon. In thiscase, the regret bound is
is defined as B  n∈N (b2n + α 2 Mn2 )/2.
The proof of Theorem 1 is given in Appendix B. still in the order of O(1/V + (log T)/T).
Remark 4: Theorem 1 shows that CPHBL ensures the 3) The third case is when Hmin = (T log T). In this case,
stability of virtual queue backlogs {Qn (t)}n . Moreover, the under a sufficiently great length of√time horizon T, the
time-averaged total backlog size of such virtual queues is lin- regret bound approaches O(1/V + 1/T).
early proportional to the value of parameter V. In other words, 4) The fourth case is when Hmin = (T 2 log T), i.e.,
given that vector b is interior to the maximal feasibility region, there is adequate offline historical information. In this
under CPHBL, the time-averaged total storage cost is tunable case, each EFS proactively leverages offline histori-
and guaranteed to be under the given budget. cal information to acquire highly accurate estimations
on file popularities. As a result, the last term in the
regret bound becomes even smaller, and the second term
B. Regret Bound
becomes dominant. Therefore, the order of the regret
Our second theorem provides an upper bound for the regret decreases to O(1/V + 1/T).
incurred by CPHBL over time. When it comes to the impact of time horizon length T, the
Theorem 2: Under CPHBL, the regret (7) over time horizon regret bound decreases and approaches B/V as the value of T
T is upper bounded as follows: increases. In summary, given a longer time horizon length and
 more historical information (i.e., larger values of T and Hmin ),
B 4 n∈N Kn Mn log T
Reg(T) ≤ + +Γ (18) CPHBL achieves a better regret performance. Such results are
V T T + Hmin also verified by numerical simulation in Section VII-C.

in which we define the
 constants B  n∈N (b2n + α 2 Mn2 )/2
  VI. DRL-BASED B ENCHMARK D ESIGN
and Γ  2 n∈N Kn 6Mn f ∈F Lf . Here, T is the time hori-
In recent years, DRL has been widely adopted in various
zon length and Hmin  minn,f Hn,f is the minimal number of
fields to conduct goal-directed learning and sequential decision
offline historical observation among all EFSs and files.
making [42], [43]. It deals with agents that learn to make better
The proof of Theorem 2 is given in Appendix C.
sequential decisions by interacting with the environment with-
Remark 5: In (18), the term B/V is mainly incurred by bal-
out complicated modeling and too much domain knowledge
ancing the cache hit reward and the storage cost constraints.
requirement. In this section, to compare our scheme CPHBL
Intuitively, the larger the value of V, the more focus CPHBL
with DRL-based approaches, we propose a novel cache place-
puts on maximizing cache hit rewards, and hence, a smaller
ment scheme with DRL called CPDRL as a baseline for
regret. Nonetheless, this also comes with an increase in the
evaluation.
total size of virtual queue backlogs, which is unfavorable for
keeping storage costs under the budget. In contrast, the smaller 2 The notations O, , and are all asymptotic notations introduced in [41].

Authorized licensed use limited to: SUN YAT-SEN UNIVERSITY. Downloaded on October 05,2024 at 03:01:49 UTC from IEEE Xplore. Restrictions apply.
14692 IEEE INTERNET OF THINGS JOURNAL, VOL. 8, NO. 19, OCTOBER 1, 2021

Fig. 5. Design of the policy network for agent n. The cache placement
scheme of agent n is designed as an FNN with one hidden layer of dimension
512, followed by a ReLU activation function.

B. Detailed Design of CPDRL


Considering the limitation of existing DRL techniques,
when solving for the cache placement problem (6), we ignore
the storage cost constraints (5) in the design of CPDRL. In
Fig. 4. Overview of CPDRL design. The environment is partitioned into this section, we show our detailed design of CPDRL in terms
N independent subenvironments, each for an agent (EFS). Note that we do of the state representation, agent action, and reward signal for
not show the CFS in the subenvironment block. However, in each time slot, a particular agent n.3
each EFS may interact with the CFS for file downloading. In our model, the
CFS is assumed to provide simultaneous and independent file deliveries to all 1) State Representation: We define the observed environ-
EFSs. ment state by agent n in time slot t as Sn (t)  Xn (t − 1), i.e.,
the cache placement on the EFS n in the previous time slot
(t − 1).
A. Overall Design of CPDRL
2) Agent Action: We define the action of agent n in time
The overview design of CPDRL is shown in Fig. 4. Under slot t as a tuple An (t) ∈ A  {(f , x)|f ∈ F, x ∈ {0, 1}}.
CPDRL, we view each EFS n ∈ N as a DRL agent n, which Action An (t) = (f , x) means that the agent n updates the cache
interacts with the environment over time slots. As a result, placement decision for file f on EFS n in time slot t to x. When
the original problem turns into a multiagent DRL problem x = 1, file f will be cached on EFS n; otherwise, file f will
with N agents. Note that under our settings, such an N-agent not be cached on EFS n.
DRL problem can be decomposed into N single-agent DRL 3) Reward Design: The reward received by agent n in time
subproblems since there is no coupling among the agents’ slot t is set as the cache hit reward Rn (t) defined in (4) of
decision makings. The reasons are shown as follows. First, Section III-F.
the CFS provides simultaneous and independent file deliver- 4) Policy Network: We design each agent n’s cache place-
ies to all EFSs. Second, recall that each IoT user is served ment scheme as a feedforward neural network (FNN) [44]
by one and only one EFS, and thus, the subsets of IoT with one hidden layer of dimension 512, followed by a ReLU
users that are associated with EFSs are disjoint. Based on activation function. We show such a network design in Fig. 5.
the above two properties, the decision making on each EFS As shown in the figure, the policy network takes the observed
has no impact on the decisions on other EFSs. Therefore, environment state as input. When given input Sn (t), a prob-
the environment can be partitioned into N independent suben- ability distribution πθn (·|Sn (t)) over the action space A will
vironments and each agent n only interacts with its related be output from the network. Note that such a policy network
subenvironment n. As a result, under CPDRL, each agent design requires the number of files F to be fixed. The change
(EFS) solves for a single-agent DRL subproblem indepen- in the value of F would require the reconstruction and retrain-
dently. Next, we introduce the basic settings of the single-agent ing of the policy network. In each time slot, a candidate
DRL system. action will be sampled from set A according to the distribution
In a classical single-agent DRL system, there is an agent πθn (·|Sn (t)). The cache placement will be updated accordingly
that interacts with its environment over iterations. At the begin- if the sampled action satisfies the storage capacity constraint
ning of each iteration t, the agent observes some representation in (1); otherwise, the cache placement on EFS n will remain
of the environment’s state S(t). In response, the agent takes unchanged.
an action A(t) based on its maintained policy πθ . The pol-
icy πθ is parameterized by a deep neural network (DNN)
C. CPDRL Workflow
with parameter θ . After the agent performs the action A(t),
it observes a new state S(t + 1) and receives a reward R(t). We show the pseudocode of CPDRL in Algorithm 2. The
Based on the gained information, the agent improves its pol- operation of CPDRL is composed of two processes: 1) the
icy πθ to maximizethe time-averaged expected reward it
T−1 t 3 In this work, for simplicity, we assume that all of the N agents share
receives, i.e., E[1/T t=0 γ R(t)]. Here, γ ∈ [0, 1] is called
the same DRL design, including the same policy network structure and train-
the discount rate and it determines the present value of future ing parameters. In practice, one can employ heterogeneous DRL designs for
rewards. different agents to adapt to more general scenarios.

Authorized licensed use limited to: SUN YAT-SEN UNIVERSITY. Downloaded on October 05,2024 at 03:01:49 UTC from IEEE Xplore. Restrictions apply.
GAO et al.: HISTORY-AWARE ONLINE CACHE PLACEMENT IN FOG-ASSISTED IoT SYSTEMS 14693

Algorithm 2 CPDRL
1: Initialize X n (−1) = 0 and the policy network πθn for each EFS
n ∈ N.
2: for each time slot t ∈ {0, 1, · · · , T − 1} do
3: for each EFS n ∈ N do
%Cache Placement
4: Observe state Sn (t) ← Xn (t − 1).
5: Sample a candidate action (f , Xn,f  (t)) from A according

to πθn (·|Sn (t)).


 (t), . . . , X
Set xn ← (Xn,1 (t − 1), . . . , Xn,f
6: n,F (t − 1)).
 (a)
7: if xn satisfies the constraint (1) then
8: Set An (t) ← (f , Xn,f (t − 1)).
9: else
10:  (t)).
Set An (t) ← (f , Xn,f
11: end if
12: Perform action An (t) and then receive a reward of Rn (t).
%Policy Update
13: if 1 ≤ t ≤ T0 and t % l = 0 then
14: Train policy network πθn using the collected
information from time slots (t − l + 1) to t.
15: end if
16: end for
17: end for (b)

cache placement process and 2) the policy update process. In


the cache placement process, under CPDRL, each EFS makes
cache placement decisions based on its current policy network.
In the policy update process, each EFS adopts the policy gra-
dient [45] method to train its policy network with the collected
online feedback. Note that each of the networks is trained dur-
ing the first T0 time slots on a batch basis, and the length of
(c)
each batch is set uniformly as l.
Fig. 6. Performance of CPHBL with different values of V. (a) Time-averaged
D. Comparison With CPHBL storage cost on EFS 1. (b) Storage costs on each EFS. (c) Regret and total
storage costs.
In comparison with CPHBL, CPDRL has the follow-
ing limitations. First, it requires the heuristic techniques of
network-training or hyperparameter tuning. Second, its effec- each user k’s requests are generated from a Zipf distribution
tiveness can only be justified by experimental simulations with a skewness parameter γ ∈ [0.56, 1.2]. Note that such
without theoretical performance guarantee. Third, it cannot skewness parameters are fixed but unknown to the EFSs. We
deal with the stochastic time-averaged storage cost con- set the storage cost budget bn to be 8 units for each EFS
straints. Finally, it provides few insightful explanations for n ∈ N.
the resulting decision makings and system performances. In
comparison, by employing MAB methods, CPHBL enjoys the
B. Performance of CPHBL With Fixed-Time Horizon Length
advantages of a more lightweight implementation, theoretical
and Fixed Number of Offline Historical Observations
tractability, and the applicability to time-averaged constraints.
Besides, the design of CPHBL also leads to insightful expla- In this section, we investigate the performance of CPHBL
nations for the online decision making in previous sections by fixing the time horizon length T as 5 × 106 time slots and
(see remarks in Sections IV and V). We further compare the number of offline historical observations Hn,f as 1000 for
the performance of CPHBL and CPDRL with numerical all n ∈ N , f ∈ F (i.e., Hmin = 1000).
simulations in Section VII-B. Performance of CPHBL Under Different Values of V: In
Fig. 6(a), we take the first EFS (EFS 1) as an example to
illustrate how the time-averaged storage cost on each EFS
VII. N UMERICAL R ESULTS
changes over time under different values of V. Particularly,
A. Simulation Settings on EFS 1, the time-averaged storage cost approaches the cost
We consider a fog-assisted IoT system with 1 CFS, 4 EFSs budget b1 = 8 units as time goes by. Moreover, the greater
(N = 4), and 20 IoT users (K = 20). Each user is uniformly the value of V, the longer the convergence time. For example,
randomly assigned to one of the EFSs. The file set F on the the convergence time extends from 4000 time slots to about
CFS consists of 20 files (F = 20) with different file sizes 10 000 time slots as the value of V increases from 30 to 50.
Lf ∈ {1, 2, 4, 8}. The storage capacity of each EFS is Mn = 16 This shows that the larger values of V lead to a longer time
units. We set the unit storage cost as α = 1. We assume that for convergence. Fig. 6(b) evaluates the time-averaged storage

Authorized licensed use limited to: SUN YAT-SEN UNIVERSITY. Downloaded on October 05,2024 at 03:01:49 UTC from IEEE Xplore. Restrictions apply.
14694 IEEE INTERNET OF THINGS JOURNAL, VOL. 8, NO. 19, OCTOBER 1, 2021

(a)

Fig. 8. Regret of CPHBL and its variants.

increases from 6 to 10. Under the same settings, Fig. 7(b)


shows that the time-averaged total cache hit reward increases
by 30.71%. The reason is that with more budget, each EFS
would store more files to further maximize the cache hit
rewards. Fig. 7(c) illustrates the regret under different stor-
(b)
age cost budgets. The results verify our theoretical analysis
in (17) about the proportional growth of regret with respect
to the storage cost budget. The reason is that under CPHBL,
each EFS would explore more files when given more budget,
thereby resulting in a higher regret.
CPHBL Versus Its Variants: In Section IV-B, the con-
fidence radius in (10) measures the uncertainty about the
empirical reward estimate. The larger the confidence radius,
the greater the necessity of exploration for the corresponding
(c) file. Accordingly, each EFS is more prone to caching under-
explored files. To investigate how the regret changes under
Fig. 7. Performance of CPHBL given different storage budgets (bn ). (a) Time- different exploration strategies, we propose two types of vari-
averaged total storage costs. (b) Time-averaged total cache hit reward. (c)
Regret. ants for CPHBL: one leveraging ε-greedy method and the
other employing UCB-like methods. More detail is specified
as follows.
1) CPHBL-greedy: CPHBL-greedy differs from CPHBL in
cost on each EFS incurred by CPHBL under different val- the cache placement phase (lines 7–9 in Algorithm 1).
ues of V. As the value of parameter V increases, the storage Specifically, it replaces the HUCB1 estimates
cost on each EFS keeps increasing until it reaches the budget {d̃n,f (t)}n,f with empirical means {d̄n,f (t)}n,f in
bn = 8 units. Such results show that the time-averaged storage function S ET C ACHE P LACEMENT. Recall that d̄n,f (t)
cost constraints in (5) are strictly satisfied under CPHBL. denotes the empirical mean that involves both offline
Next, we switch to the evaluations of regrets and total stor- historical observations and online feedbacks. Then, it
age costs incurred by CPHBL with different values of V. adopts the ε-greedy method within the cache placement
As shown in Fig. 6(c), there is a notable reduction in the phase. With probability ε, each EFS n selects files
regret as the value of V increases. Such results imply that uniformly randomly from subset Fn,1 (t) to cache. With
CPHBL can achieve a lower regret with a larger value of V. probability 1 − ε, files with the empirically highest
Moreover, when the value of V is sufficiently large (V ≥ 40), reward estimates are chosen to be cached. Intuitively,
the regret value stabilizes at around 38.01. This verifies our CPHBL-greedy spends about a proportion ε of time for
previous analysis in Theorem 2 about the term B/V in the uniform exploration and the rest (1 − ε) proportion of
regret bound (18). Besides, as the value of V increases, we time for exploitation.
also see an increase in the total storage costs which even- 2) CPHBL-UCBT: CPHBL-UCBT replaces the HUCB1
tually reach the budget when V ≥ 40. Overall, the results in estimate (line 4 in Algorithm 1) with UCB1-tuned
Fig. 6(b) and (c) verify the tunable tradeoff between the regret (UCBT) estimate [46] while the rest remains the same
value and total storage costs. as CPHBL.
Performance of CPHBL Under Different Settings of Storage We compare the regret value of CPHBL against CPHBL-
Cost Budget bn : Next, we select different values for the stor- greedy (ε ∈ {0, 0.01, 0.1}) and CPHBL-UCBT in Fig. 8
age cost budget bn of each EFS n to investigate their impacts under different values of V. Regarding the variants of CPHBL,
on system performances. Fig. 7 shows our simulation results. interestingly, although CPHBL-greedy with ε = 0 intuitively
From Fig. 7(a), we see that given V = 50, the time-averaged discards the chance of uniform exploration in the online learn-
total storage costs increase by 60.93% as the value of bn ing phase, it still achieves a regret performance that is close to

Authorized licensed use limited to: SUN YAT-SEN UNIVERSITY. Downloaded on October 05,2024 at 03:01:49 UTC from IEEE Xplore. Restrictions apply.
GAO et al.: HISTORY-AWARE ONLINE CACHE PLACEMENT IN FOG-ASSISTED IoT SYSTEMS 14695

be downloaded from the CFS and cached on the EFS by


replacing the least recently used files.
5) LFUDA: As a variant of LFU, LFUDA outperforms LFU
when the distributions of file demands are nonstationary
over time. Under LFUDA, each EFS maintains a cache
age, which is initialized to be zero and a key value for
each file. When a file is requested, its key value is set
to be the sum of the cache age and the number of times
that the file has been requested. If the requested file
is not in the EFS’s cache, the requested file would be
(a) downloaded from the CFS to the EFS and cached on the
EFS by replacing the files with the minimal key values.
When files are replaced, the cache age will be updated
to the minimal key value of the replaced files within the
time slot.
We show the simulation results in Fig. 9. The cache hit
rewards and total storage costs of the five baseline schemes
(MCUCB, CPDRL, LFU, LRU, and LFUDA) remain constant
given different values of V. This is because their decision
making does not involve parameter V. From Fig. 9, we
see that CPHBL achieves the lowest cache hit reward while
MCUCB achieves the highest cache hit reward. Particularly,
given V = 50, compared to MCUCB, CPHBL achieves a
(b)
38.85% lower total cache hit reward. In comparison with the
Fig. 9. Comparison between CPHBL and baseline schemes. (a) Time- other three baseline schemes, the DRL-based scheme CPDRL
averaged total cache hit reward. (b) Time-averaged total storage costs. achieves the worst performance in terms of the cache hit
reward. The reason is that it cannot learn efficiently from
limited online feedback information.
CPHBL, CPHBL-UCBT, and CPHBL-greedy with ε = 0.01. However, except CPHBL, the other five schemes fail to
The reason is that CPHBL-greedy with ε = 0 can resort to the ensure the storage cost constraints in (5).4 More specifi-
storage cost constraint guarantee in the online control phase cally, given V = 50, when compared to the five base-
to conduct enforced exploration. In comparison, the regret of line schemes (MCUCB, CPDRL, LFU, LRU, and LFUDA),
CPHBL-greedy with ε = 0.1 still performs inferior to other CPHBL achieves 50.00%, 45.67%, 50.00%, 49.93%, and
schemes due to its overexploration. 50.00% reductions in the total storage costs, respectively. Note
CPHBL Versus Other Baseline Schemes: We also compare that such results verify the advantage of our scheme over
the performances of CPHBL with five baseline schemes: 1) DRL-based approaches.
MCUCB [37]; 2) CPDRL; 3) least frequently used (LFU) [47];
4) least recently used (LRU) [47]; and 5) least frequently used
with dynamic aging (LFUDA) [48]. Below, we demonstrate C. Performance of CPHBL With Different Values of Time
how each of them proceeds in detail, respectively. Horizon Length and Numbers of Offline Historical
1) MCUCB: Under MCUCB [17], a modified combinatorial Observations
UCB scheme is used to estimate file popularities and In this section, we investigate the impacts of time horizon
decide cache placement during each time slot. length T and the number Hmin of offline historical observa-
2) CPDRL: The detailed design of CPDRL is presented tions5 on the regret of CPHBL. We take the case when V = 50
in Section VI. In the simulation, we set the network as an example for illustration. The results are shown in Fig. 10.
training parameter as T0 = 106 and l = 10. The policy In Fig. 10(a), we present the regret performances over a
network parameters are updated using the RMSprop [49] constant time horizon length T under fixed values of Hmin .
algorithm with a learning rate of 10−5 . Specifically, each curve corresponds to the result under a con-
3) LFU: Under LFU, each EFS maintains a counter for stant value of Hmin ∈ {0, 2000, 5000} (independent of T). Note
each of its cached files. Each counter records the number that when Hmin = 0, there is no offline historical information.
of times that its corresponding file has been requested
on the EFS. If a requested file is not in the cache, the 4 Recall that the storage cost budget on each EFS is set as b = 8 units in
requested file would be downloaded from the CFS and n
our simulations. Accordingly, the total time-averaged storage costs of the four
cached on the EFS by replacing the least frequently used EFSs should not exceed 32 units. However, the total time-averaged storage
files therein. costs all exceed 55 units under the five baseline schemes.
5 In our simulations, the number H
4) LRU: Under LRU, each EFS records the most recently n,f of offline historical observations on
EFS n for file f is set to be identical for all n ∈ N and f ∈ F . Therefore, by
requested time slot for each of its cached files. If a the definition that Hmin  minn,f Hn,f , we have Hn,f = Hmin for all n ∈ N
requested file is not in the cache, the requested file would and f ∈ F .

Authorized licensed use limited to: SUN YAT-SEN UNIVERSITY. Downloaded on October 05,2024 at 03:01:49 UTC from IEEE Xplore. Restrictions apply.
14696 IEEE INTERNET OF THINGS JOURNAL, VOL. 8, NO. 19, OCTOBER 1, 2021

CMAB problem, we devised a novel proactive cache place-


ment scheme called CPHBL with an effective integra-
tion of online control, online learning, and offline histor-
ical information. The results from our theoretical analysis
and numerical simulations showed that our devised scheme
achieves a near-optimal total cache hit reward under storage
cost constraints with a sublinear time-averaged regret. To the
best of our knowledge, our work provided the first system-
(a) atic study on the synergy of online control, online learning,
and offline historical information. Our results not only revealed
novel insights to the designers of caching-enabled fog-assisted
IoT systems but also verified the advantage of CPHBL over
the DRL-based approach.

A PPENDIX A
A LGORITHM D EVELOPMENT
We define a Lyapunov function as follows:
1 
L(Q(t))  (Qn (t))2 (19)
2
(b) n∈N

in which Q(t) = (Q1 (t), Q2 (t), . . . , QN (t)) is the vector of all


Fig. 10. Regret of CPHBL. (a) Regret with fixed values of Hmin when
V = 50. (b) Regret with different values of Hmin when V = 50. virtual queues. Then, we have

L(Q(t + 1)) − L(Q(t))


1  
On the one hand, given a fixed number Hmin of offline his- = (Qn (t + 1))2 − (Qn (t))2
2
torical observations, the results show n∈N
 that the regret value is
reduced by an order of O(1/V + (log T)/T).6 On the other 1  2 
≤ bn + (Cn (t))2 + 2Qn (t)(Cn (t) − bn ) . (20)
hand, given a fixed value of T, CPHBL achieves a lower 2
n∈N
regret with more offline historical observations. However, as 
the value of T becomes sufficiently large, the regret reduction Since Cn (t) = f ∈F αLf Xn,f (t) ≤ αMn , it follows that:
turns negligible. For example, as the value of Hmin increases 
from 0 to 5000, the regret reduces by 0.74% when T = 106 , L(Q(t + 1)) − L(Q(t)) ≤ B + Qn (t)(Cn (t) − bn ) (21)
but only by 0.15% when T = 5 × 106 . n∈N
In Fig. 10(b), we compare the regret performances 
under different values of Hmin over various time horizon where B  1/2 n∈N (b2n + α 2 Mn2 ). Next, we define the
lengths. Specifically, we consider the cases when Hmin ∈ Lyapunov drift L(Q(t)) as
{0, 0.1T, T, T log T}. As shown in the figure, when the  
value of Hmin is small (e.g., when Hmin ≤ 0.1T), even a slight L(Q(t))  E L(Q(t + 1)) − L(Q(t))|Q(t) . (22)
increase in the offline historical information brings a noticeable
Then, by (21), we have
improvement to the regret performance. However, as the value
 
of Hmin increases, the degree of regret reduction becomes less 
significant. For example, given T = 105 , the regret reduces by L(Q(t)) ≤ B + E Qn (t)(Cn (t) − bn )|Q(t) . (23)
5.71% as the value of Hmin increases from 0 to 0.1T, but only n∈N
by 1.09% from 0.1T to T. All of the above results verify our We consider an optimal cache placement scheme, which makes
theoretical analysis in Theorem 2 (see Section V). i.i.d. cache placement decisions X∗ (t) in each time slot t. Then,
the optimal time-averaged expected total reward of all EFSs is

1  
T−1
VIII. C ONCLUSION
R∗ = E R̂n X∗n (t) . (24)
In this article, we considered the cache placement problem T
t=0 n∈N
with unknown file popularities in caching-enabled fog-assisted
IoT systems. By formulating the problem as a constrained According to (7), the regret of the cache placement scheme
{X(t)}t over T time slots is

1  
6 In Fig. 10(a), we provide a curve of 38 + 300 (log T)/T as an envelope T−1

of O(1/V + (log T)/T) for illustration. Note that since V is fixed, 1/V can Reg(T) = E R̂n X∗n (t) − R̂n (Xn (t)) . (25)
T
be viewed as a constant term. t=0 n∈N

Authorized licensed use limited to: SUN YAT-SEN UNIVERSITY. Downloaded on October 05,2024 at 03:01:49 UTC from IEEE Xplore. Restrictions apply.
GAO et al.: HISTORY-AWARE ONLINE CACHE PLACEMENT IN FOG-ASSISTED IoT SYSTEMS 14697

By the definition of reward R̂n (·) in (4), it follows that:  that dn,f , d̃n,f (t) ∈ [0, Kn ]
where inequality (a) holds because
and inequality (b) is due to that f ∈F Lf Xn,f (t) ≤ Mn . Then,
1   
T−1
∗ it follows that:
Reg(T) = Lf E Dn,f (t)Xn,f (t) 
T
t=0 n∈N f ∈F V (Q(t)) ≤ B + VKn Mn
  n∈N
− E Dn,f (t)Xn,f (t) . (26) ⎡ ⎤

Since the cache placement decision Xn,f (t) is determined when + VE⎣ ∗
Lf dn,f Xn,f (t)|Q(t)⎦
Dn,f (t) is unknown, Xn,f (t) is independent of Dn,f (t). On the n∈N f ∈F
∗ (t) is i.i.d. over time slots and it is also
 
other hand, Xn,f 
independent of Dn,f (t). Then, by E[Dn,f (t)] = dn,f , we have +E Qn (t)(Cn (t) − bn )|Q(t)
n∈N
⎡ ⎤
1
T−1   

Reg(T) = Lf dn,f E Xn,f (t) − Xn,f (t) . (27) − VE⎣ Lf d̃n,f (t)Xn,f (t)|Q(t)⎦. (33)
T
t=0 n∈N f ∈F n∈N f ∈F

We define the one-time-slot regret in each time slot t as Substituting (2) and (4) into above inequality, we have
    
Reg (t) 

Lf dn,f Xn,f (t) − Xn,f (t) . (28) V (Q(t)) ≤ B + VKn Mn − Qn (t)bn
n∈N f ∈F n∈N n∈N
⎡ ⎤

The regret Reg(T) can be expressed as + VE⎣ ∗
Lf dn,f Xn,f (t)|Q(t)⎦
n∈N f ∈F
1 
T−1
 ⎡ ⎤
Reg(T) = E Reg (t) . (29) 
T − E⎣ w̃n,f (t)Xn,f (t)|Q(t)⎦ (34)
t=0
n∈N f ∈F
Then, we define the Lyapunov drift-plus-regret as
  where w̃n,f (t) is defined as
V (Q(t))  L(Q(t)) + VE Reg (t)|Q(t) . (30)  
w̃n,f (t)  Lf V d̃n,f (t) − αQn (t) . (35)
By (21) and (28), it follows that:
⎡ ⎤ To minimize the upper bound of drift-plus-regret V (Q(t))
 in (34), we switch to solving the following problem in each
V (Q(t)) ≤ B + VE⎣ (t)|Q(t)⎦

Lf dn,f Xn,f time slot t:
n∈N f ∈F 
  maximize w̃n,f (t)Xn,f (t)
 X(t)
+E Qn (t)(Cn (t) − bn )|Q(t) n∈N f ∈F

n∈N Lf Xn,f (t) ≤ Mn ∀n ∈ N
⎡ ⎤ subject to
 f ∈F
− VE⎣ Lf dn,f Xn,f (t)|Q(t)⎦. (31) Xn,f (t) ∈ {0, 1} ∀n ∈ N , f ∈ F. (36)
n∈N f ∈F
In fact, problem (36) can be further decoupled into N subprob-
Since d̃n,f (t) is the HUCB1 estimate of dn,f in time slot t such lems. For each EFS n ∈ N , we solve the following subproblem
that d̃n,f (t) ∈ [0, Kn ], we have for the cache placement vector Xn (t) in time slot t:
  
Lf dn,f Xn,f (t) = Lf d̃n,f (t)Xn,f (t) maximize w̃n,f (t)Xn,f (t)
Xn (t)
n∈N f ∈F n∈N f ∈F f ∈F
   
+ Lf dn,f − d̃n,f (t) Xn,f (t) subject to Lf Xn,f (t) ≤ Mn
n∈N f ∈F f ∈F

(a)  Xn,f (t) ∈ {0, 1} ∀f ∈ F. (37)


≥ Lf d̃n,f (t)Xn,f (t)
n∈N f ∈F
 
− Kn Lf Xn,f (t)
A PPENDIX B
n∈N f ∈F
 P ROOF OF T HEOREM 1
(b)
≥ Lf d̃n,f (t)Xn,f (t) By (23), we have
n∈N f ∈F  
 
− Kn Mn (32) L(Q(t)) ≤ B + E Qn (t)(Cn (t) − bn )|Q(t)
n∈N n∈N

Authorized licensed use limited to: SUN YAT-SEN UNIVERSITY. Downloaded on October 05,2024 at 03:01:49 UTC from IEEE Xplore. Restrictions apply.
14698 IEEE INTERNET OF THINGS JOURNAL, VOL. 8, NO. 19, OCTOBER 1, 2021


= B− bn Qn (t) the condition Q(t) in the conditional expectation shown in the
n∈N right-hand side of (41). Then, it follows that:
⎡ ⎤
 
+ E ⎣V Lf d̃n,f (t)Xn,f (t)|Q(t)⎦ L(Q(t)) ≤ B + V Kn Mn
n∈N

n∈N f ∈F
⎤   
 + Qn (t) E[Ĉn (Xn (t)] − bn . (42)
− E⎣ w̃n,f (t)Xn,f (t)|Q(t)⎦. (38) n∈N
n∈N f ∈F
By (40), we have
 
Since f ∈F Lf d̃n,f (t)Xn,f (t) ≤ Kn f ∈F Lf Xn,f (t) ≤ Kn Mn ,  
we have L(Q(t)) ≤ B + V Kn Mn −  Qn (t). (43)
  n∈N n∈N
L(Q(t)) ≤ B + V Kn Mn − bn Qn (t)
n∈N n∈N
Taking expectation at both sides of the above inequality and
⎡ ⎤ summing it over time slots {0, 1, . . . , T − 1}, we have

− E⎣ w̃n,f (t)Xn,f (t)|Q(t)⎦. (39)    
E L(Q(T)) − E L(Q(0))
n∈N f ∈F
 
T−1 
Next, we consider the following lemma. ≤ TB + TV Kn Mn −  E[Qn (t)].
Lemma 1: If the budget vector b is an interior point of n∈N t=0 n∈N
the maximal feasible region B, then there exists a feasible (44)
scheme, which makes i.i.d. decision over time independent of
the virtual queue backlog sizes. We divide at both sides by T and rearrange the terms. Then,
The proof is omitted since it is quite standard as shown in by the fact L(Q(0)) = 0 and L(Q(T)) ≥ 0, we have
the proof of [11, Lemma 1]. Based on Lemma 1, we begin 
1
T−1
to prove Theorem 1. By our assumption in Theorem 1 that b B+V n∈N Kn Mn
E[Qn (t)] ≤ . (45)
is an interior point of B, there must exist some  > 0 such T 
t=0 n∈N
that b − 1 is also an interior point of B. Here, 1 denotes
the N-dimensional all-ones vector. Then, by Lemma 1, since By taking the limsup of the left-hand side term in the above
b − 1 lies in the interior of B, there exists a feasible scheme inequality as T → ∞, we obtain
that makes i.i.d. decision over time independent of the virtual

1
T−1
queue backlog sizes such that B+V n∈N Kn Mn
  lim sup E[Qn (t)] ≤ . (46)
E Ĉn Xn (t) ≤ bn −  ∀n ∈ N , t (40) T→∞ T 
t=0 n∈N
T−1
where X (t)  (X1 (t), X2 (t), . . . , XN (t)) is the cache This implies that lim supT→∞ 1/T t=0 E[Qn (t)] < ∞ and
placement decision vector during time slot t under the the virtual queueing process {Qn (t)}t defined in (13) is strongly
scheme. We denote the cache placement decision vector dur- stable for each EFS n ∈ N . Hence, the time-averaged storage
ing time slot t under our scheme CPHBL by Xc (t)  cost constraints in (5) are satisfied.
(Xc1 (t), Xc2 (t), . . . , XcN (t)), which is the optimal solution of
problem (36). Then, based on (39), we have
  A PPENDIX C
L(Q(t)) ≤ B + V Kn Mn − bn Qn (t) P ROOF OF T HEOREM 2
⎡ n∈N n∈N
⎤ By Lemma 1, since b lies in the interior of B, there exists
 an optimal scheme, which makes i.i.d. decision over time
− E⎣ 
w̃n,f (t)Xn,f (t)|Q(t)⎦ independent of the virtual queue backlog sizes such that
n∈N f ∈F  
 
= B+V Kn Mn − V 
Lf d̃n,f (t)Xn,f (t) E Ĉn X∗n (t) ≤ bn ∀n ∈ N , t (47)
n∈N n∈N f ∈F
  where X∗ (t)  (X∗1 (t), X∗2 (t), . . . , X∗N (t)) is the cache place-
  
+E Qn (t) Ĉn Xn (t) − bn |Q(t) ment decision vector in time slot t under the optimal scheme.
n∈N By the inequality in (21) and definition (28), under CPDBL,
 we have
≤ B+V Kn Mn
n∈N L(Q(t + 1)) − L(Q(t)) + V Reg (t)
   
   
+E Qn (t) Ĉn Xn (t) − bn |Q(t) (41) ≤B+ Qn (t) Ĉn Xcn (t) − bn
n∈N n∈N
  

where the last inequality is using the fact that w̃n,f (t) ≥ 0 and −V Lf dn,f Xn,f (t) − Xn,f
c
(t) . (48)
 (t) ≥ 0. Since X  (t) is independent of Q(t), we can drop
Xn,f n∈N f ∈F

Authorized licensed use limited to: SUN YAT-SEN UNIVERSITY. Downloaded on October 05,2024 at 03:01:49 UTC from IEEE Xplore. Restrictions apply.
GAO et al.: HISTORY-AWARE ONLINE CACHE PLACEMENT IN FOG-ASSISTED IoT SYSTEMS 14699

The inequality above can be equivalently written as The last equality holds because that Ĉn (X∗n (t)) is independent
of Q(t). By the inequalities in (47), it follows that:
L(Q(t + 1)) − L(Q(t)) + V Reg (t)  
   E L(Q(t + 1)) − L(Q(t))|Q(t)
≤B+ Qn (t) Ĉn X∗n (t) − bn    
+ VE Reg (t)|Q(t) ≤ B + E 1 (t)|Q(t) . (57)
n∈N
⎛ ⎞
  Taking expectation at both sides of above inequality, we have
+ ⎝ ∗
VLf dn,f Xn,f (t) − Qn (t)Ĉn (X∗n (t))⎠    
n∈N f ∈F
E L(Q(t + 1)) − L(Q(t)) + VE Reg (t) ≤ B + E[1 (t)].
⎛ ⎞ (58)
 
− ⎝ c
VLf dn,f Xn,f (t) − Qn (t)Ĉn (Xcn (t))⎠. (49) Summing above inequality over time slots {0, 1, . . . , T − 1}
n∈N f ∈F and dividing it at both sides by TV, we have
Substituting (2) into the above inequality, we have    
1 
T−1
E L(Q(T)) E L(Q(0)) 
− + E Reg (t)
L(Q(t + 1)) − L(Q(t)) + V Reg (t) TV TV T
t=0
  
≤B+ Qn (t) Ĉn X∗n (t) − bn B 1 
T−1
≤ + E[1 (t)]. (59)
n∈N V TV
 t=0

+ Lf Vdn,f − αQn (t) Xn,f (t)
n∈N f ∈F
Since L(Q(0)) and L(Q(T)) are both nonnegative, it follows
 that:
− Lf Vdn,f − αQn (t) Xn,f
c
(t). (50)
1 
T−1
n∈N f ∈F 
Reg(T) = E Reg (t)
T
For each EFS n ∈ N and each file f ∈ F, we define t=0

1 
T−1
B
wn,f (t)  Lf Vdn,f − αQn (t) . (51) ≤ + E[1 (t)]. (60)
V TV
t=0
Then, inequality (50) can be simplified as Next, we bound 1 (t) to obtain the upper bound of the regret
Reg(T).
L(Q(t + 1)) − L(Q(t)) + V Reg (t)
  
≤B+ Qn (t) Ĉn X∗n (t) − bn A. Bounding 1 (t)
n∈N
   Consider a cache placement scheme that makes a placement

+ wn,f (t) Xn,f (t) − Xn,f
c
(t) . (52) decision during each time slot t, denoted by vector X (t) 
n∈N f ∈F (X1 (t), X2 (t), . . . , XN (t)) with each entry Xn (t) as the optimal
solution of the following problem:
For simplicity of expression, we define 
   maximize wn,f (t)Xn,f (t)
∗ Xn (t)
1 (t)  wn,f (t) Xn,f (t) − Xn,f
c
(t) . (53) f ∈F

n∈N f ∈F
subject to Lf Xn,f (t) ≤ Mn
It follows that: f ∈F
Xn,f (t) ∈ {0, 1} ∀f ∈ F. (61)
L(Q(t + 1)) − L(Q(t)) + V Reg (t)
   Since X∗n (t) is a feasible solution of problem (61), we have
≤ B + 1 (t) + Qn (t) Ĉn (X∗n (t)) − bn . (54)  
 ∗
n∈N wn,f (t)Xn,f (t) ≥ wn,f (t)Xn,f (t). (62)
f ∈F f ∈F
Taking conditional expectation at both sides of the above
inequality, we have It follows that:
  
    1 (t) = ∗
wn,f (t) Xn,f (t) − Xn,f
c
(t)
E L(Q(t + 1)) − L(Q(t))|Q(t) + VE Reg (t)|Q(t)
  n∈N f ∈F
≤ B + E 1 (t)|Q(t)   
  ≤ 
wn,f (t) Xn,f (t) − Xn,f
c
(t)
  
∗ n∈N f ∈F
+E Qn (t) Ĉn (Xn (t)) − bn |Q(t) (55)  

n∈N 
  ≤ wn,f (t) Xn,f (t) − Xn,f
c
(t)
= B + E 1 (t)|Q(t) n∈N f ∈F
       
+ Qn (t) E Ĉn X∗n (t) − bn . (56) + c
w̃n,f (t) Xn,f 
(t) − Xn,f (t) . (63)
n∈N n∈N f ∈F

Authorized licensed use limited to: SUN YAT-SEN UNIVERSITY. Downloaded on October 05,2024 at 03:01:49 UTC from IEEE Xplore. Restrictions apply.
14700 IEEE INTERNET OF THINGS JOURNAL, VOL. 8, NO. 19, OCTOBER 1, 2021

The last inequality holds since Xcn (t) is the optimal solution of Then, we rewrite the upper bound of 2 (t) in (70) as
problem (14) but Xn (t) is only a feasible solution. Rearranging 
the right-hand side of (63), we obtain 2 (t) ≤ Lf φ2,n,f (t). (72)
 n∈N f ∈F
1 (t) ≤ w̃n,f (t) − wn,f Xn,f
c
(t) (1)
n∈N f ∈F
Let tn,f be the index of the first time slot in which file f is

+ 
wn,f − w̃n,f (t) Xn,f (t). (64)  on EFS n. We define event Un,f (t)  {d̄n,f (t) − dn,f >
cached
Kn ([3 log t]/[2(hn,f (t) + Hn,f )])} for each n ∈ N and f ∈ F.
n∈N f ∈F
Summing φ2,n,f (t) over time slots {0, 1, . . . , T − 1}, it turns
By (35) and (51), we have out that
  
T−1 
T−1   
w̃n,f (t) − wn,f = Lf V d̃n,f (t) − αQn (t) − Lf Vdn,f − αQn (t) φ2,n,f (t) = d̃n,f (t) − dn,f Xn,f
c
(t)1 Gn,f (t)
  t=0 t=0
= VLf d̃n,f (t) − dn,f . (65)  

T−1
 
≤ Kn Xn,f
c
(t) + d̃n,f (t) − dn,f Xn,f
c
(t)1 Gn,f (t)
Substituting (65) into (64), we obtain (1)
t=tn,f +1
  
1 (t) ≤ VLf d̃n,f (t) − dn,f Xn,f
c
(t) 
T−1    
n∈N f ∈F = Kn Xn,f
c
(t) + d̃n,f (t) − dn,f Xn,f
c
(t)1 Gn,f (t)
   (1)
t=tn,f +1

+ VLf dn,f − d̃n,f (t) Xn,f (t). (66)     
n∈N f ∈F × 1 Un,f (t) + 1 Un,f
c
(t)

Next, we define 
T−1  
  = Kn Xn,f
c
(t) + d̃n,f (t) − dn,f Xn,f
c
(t)

2 (t)  Lf d̃n,f (t) − dn,f Xn,f
c
(t) (67) (1)
t=tn,f +1
 
n∈N f ∈F ×1 Gn,f (t) ∩ Un,f (t)
and  
T−1 
   + d̃n,f (t) − dn,f Xn,f
c
(t)

3 (t)  Lf dn,f − d̃n,f (t) Xn,f (t). (68) (1)
t=tn,f +1
n∈N f ∈F  
×1 Gn,f (t) ∩ Un,f
c
(t) . (73)
Then, the upper bound of 1 (t) in (66) can be rewritten as
Next, we define
1 (t) ≤ V(2 (t) + 3 (t)). (69)  
(1)
φ2,n,f (t)  d̃n,f (t) − dn,f Xn,f
c
(t)
In the following sections, we obtain the upper bounds of 2 (t)  
and 3 (t), respectively, to bound 1 (t). × 1 Gn,f (t) ∩ Un,f (t) (74)
and
B. Bounding 2 (t)  
(2)
φ2,n,f (t)  d̃n,f (t) − dn,f Xn,f
c
(t)
To derive the upper bound of 2 (t), we define the event  
Gn,f (t)  {d̃n,f (t) ≥ dn,f } for each n ∈ N and f ∈ F. Then, × 1 Gn,f (t) ∩ Un,fc
(t) . (75)
we have
   Then, we rewrite inequality (73) as the following equivalent
2 (t) = Lf d̃n,f (t) − dn,f Xn,f
c
(t) form:
n∈N f ∈F
  
T−1 
T−1
(1)

T−1
(2)
× 1{Gn,f (t)} + 1{Gcn,f (t)} φ2,n,f (t) ≤ Kn Xn,f
c
(t) + φ2,n,f (t) + φ2,n,f (t).
   t=0 (1)
t=tn,f +1
(1)
t=tn,f +1
= Lf d̃n,f (t) − dn,f Xn,f c
(t)1{Gn,f (t)}
(76)
n∈N f ∈F
   (1)
+ Lf d̃n,f (t) − dn,f Xn,f
c
(t)1{Gcn,f (t)} By (76), to bound φ2,n,f (t), we switch to bounding φ2,n,f (t)
(2)
n∈N f ∈F and φ2,n,f (t). In the following, we derive upper bounds for
     such two terms, respectively.
≤ Lf d̃n,f (t) − dn,f Xn,f
c
(t)1 Gn,f (t) . (70)  (1)
First, we bound T−1(1) φ2,n,f (t). According to (74), we
n∈N f ∈F t=tn,f +1
have
The last inequality holds since when event Gcn,f (h) occurs, we

T−1 
T−1  
have d̃n,f (t) < dn,f and (d̃n,f (t) − dn,f )1{Gcn,f (t)} < 0. Next, (1)
φ2,n,f (t) = d̃n,f (t) − dn,f Xn,f
c
(t)
we define (1) (1)
  t=tn,f +1 t=tn,f +1
   
φ2,n,f (t)  d̃n,f (t) − dn,f Xn,f
c
(t)1 Gn,f (t) . (71) × 1 Gn,f (t) ∩ Un,f (t) . (77)

Authorized licensed use limited to: SUN YAT-SEN UNIVERSITY. Downloaded on October 05,2024 at 03:01:49 UTC from IEEE Xplore. Restrictions apply.
GAO et al.: HISTORY-AWARE ONLINE CACHE PLACEMENT IN FOG-ASSISTED IoT SYSTEMS 14701


!
When event Un,f (t) = {d̄n,f (t) − dn,f >  
 −3
= Kn Mn 1 + t
Kn ([3 log t]/[2(hn,f (t) + Hn,f )])} occurs, we consider
n∈N t=2
the following two cases.  " # ∞ $
3 
1) If  d̃n,f (t) = min{d̄n,f (t) + ≤ Kn Mn 1 + t−3 dt = Kn Mn . (82)
Kn ([3 log t]/[2(hn,f (t) + Hn,f )]), Kn } = Kn , then 1 2
n∈N n∈N
d̃n,f (t) ≥ dn,f , i.e., event Gn,f (t) occurs. T−1 (2)
2) If  d̃n,f (t) = min{d̄n,f (t) + Next, we consider the upper bound of (1) φ2,n,f (t).
t=tn,f +1
Kn ([3 log t]/[2(hn,f (t) +  H n,f )]), K n } = According to (75), we have
d̄n,f (t) + Kn ([3 log t]/[2(hn,f (t) + Hn,f )]),

T−1 
T−1  
then event Gn,f (t) still occurs, i.e., d̃n,f (t) > (2)
φ2,n,f (t) = d̃n,f (t) − dn,f Xn,f
c
(t)
dn,f + 2Kn ([3 log t]/[2(hn,f (t) + Hn,f )]). (1) (1)
t=tn,f +1 t=tn,f +1
Therefore, we have Un,f (t) ⊂ Gn,f (t), or equivalently,  
1{Gn,f (t) ∩ Un,f (t)} = 1{Un,f (t)}. It follows that: × 1 Gn,f (t) ∩ Un,f
c
(t) . (83)

T−1 
T−1     c (t) occurs, we have
(1) When event Un,f
φ2,n,f (t) = d̃n,f (t) − dn,f Xn,f
c
(t)1 Un,f (t) .
(1) (1)
t=tn,f +1 t=tn,f +1
3 log t
(78) d̃n,f (t) = min d̄n,f (t) + Kn , Kn
2 hn,f (t) + Hn,f
Since d̃n,f (t), dn,f ∈ [0, Kn ], we have d̃n,f (t)−dn,f ≤ Kn . Then,
3 log t
we have ≤ d̄n,f (t) + Kn (84)
2 hn,f (t) + Hn,f

T−1
(1)

T−1
 
φ2,n,f (t) ≤ c
Kn Xn,f (t)1 Un,f (t) . (79) thus
(1)
t=tn,f +1
(1)
t=tn,f +1  
d̃n,f (t) − dn,f = d̃n,f (t) − d̄n,f (t)
Taking expectation of (79) at both sides, we have
  3 log t

T−1
(1)
+ d̄n,f (t) − dn,f ≤ 2Kn . (85)
E φ2,n,f (t) 2 hn,f (t) + Hn,f
t=tn,f +1
Then, by (85) and Xn,f (t) ≤ 1, we have

T−1
 
≤ Kn E[Xn,f
c
(t)] Pr Un,f (t) 
T−1
(2)
(1)
t=tn,f +1
φ2,n,f (t)
(1)
t=tn,f +1

T−1
= Kn E[Xn,f
c
(t)] 
T−1
3 log t
(1)
t=tn,f +1
= c
2Kn Xn,f (t)
(1)
2(hn,f (t) + Hn,f )
t=tn,f +1
3 log t  
× Pr d̄n,f (t) − dn,f > Kn . (80) × 1 Gn,f (t) ∩ Un,f
c
(t)
2(hn,f (t) + Hn,f )
Using the Chernoff-Hoeffding bound [50], we have 
T−1
3 log t
≤ c
2Kn Xn,f (t)
2(hn,f (t) + Hn,f )
3 log t (1)
t=tn,f +1
Pr d̄n,f (t) − dn,f > Kn
2(hn,f (t) + Hn,f )
! 
T−1  c (t)
Xn,f
2 hn,f (t) + Hn,f
2
3 log t ≤ Kn 6 log T  . (86)
≤ exp − · Kn2 (1)
hn,f (t) + Hn,f
t=tn,f +1
(hn,f (t) + Hn,f )Kn n,f (t) + Hn,f )
2 2(h
= exp(−3 log t) = t−3 . (81) Since hn,f (t) ≤ T, we have
Then, it follows that: 1 hn,f (t) 1
   = ·
  T−1
 (1) hn,f (t) + Hn,f hn,f (t) + Hn,f hn,f (t)
Lf E φ2,n,f (t)
n∈N f ∈F t=t(1) +1 T 1
n.f ≤ · . (87)
⎡ ⎤ T + Hn,f hn,f (t)
∞ 
 
≤ Kn E⎣ c
Lf Xn,f (t)⎦t−3 Then, it follows that:
t=1 n∈N f ∈F
∞  
T−1 
T−1
6T log T 1
 (2)
φ2,n,f (t) ≤ Kn  . (88)
−3
≤ Kn Mn t T + Hn,f hn,f (t)
(1) (1)
t=1 n∈N t=tn,f +1 t=tn,f +1

Authorized licensed use limited to: SUN YAT-SEN UNIVERSITY. Downloaded on October 05,2024 at 03:01:49 UTC from IEEE Xplore. Restrictions apply.
14702 IEEE INTERNET OF THINGS JOURNAL, VOL. 8, NO. 19, OCTOBER 1, 2021

(i)
Let tn,f be the ith time slot in which file f is cached on EFS n. C. Bounding 3 (t)
(h (T))
Then, tn,fn,f
is the time slot in which file f is lastly cached Recall by (68) and Gn,f (t)  {d̃n,f (t) ≥ dn,f } that
before time slot T. Accordingly, we have   

3 (t) = Lf dn,f − d̃n,f (t) Xn,f (t)
hn,f (T)

T−1
1  1 n∈N f ∈F
 =    

(1)
hn,f (t) (i)
hn,f (tn,f ) = Lf dn,f − d̃n,f (t) Xn,f (t)
t=tn,f +1 i=2
n∈N f ∈F
hn,f (T) hn,f (T)−1     
 1  1 × 1 Gn,f (t) + 1 Gcn,f (t)
√ = √     
i−1 i 
i=2
#
i=1 ≤ Lf dn,f − d̃n,f (t) Xn,f (t)1 Gcn,f (t) . (94)
hn,f (T)
1  
n∈N f ∈F
≤ √ di = 2 hn,f (T) − 1
1 i
 Then, we define
≤ 2 hn,f (T). (89)
   

φ3,n,f (t)  dn,f − d̃n,f (t) Xn,f (t)1 Gcn,f (t) (95)
It follows that:

 whereby the upper bound of 3 (t) in (94) can be written as


6T log T 
T−1
(2)
φ2,n,f (t) ≤ 2Kn hn,f (T). (90) 
(1)
T + Hn,f 3 (t) ≤ Lf φ3,n,f (t). (96)
t=tn,f +1
n∈N f ∈F

Combining (72), (76), (82), and (90), we have (1)


Next, we consider the case where t ≤ tn,f and the case where
(1) (1)

T−1  
T−1
  t ≥ tn,f + 1, respectively. When t ≤ tn,f , we have d̃n,f (t) = Kn .
E[2 (t)] ≤ Lf E φ2,n,f (t) Then, the event Gcn,f (t) = {d̃n,f (t) < dn,f } would not occur
t=0 n∈N f ∈F t=0 since dn,f ≤ Kn . Therefore, φ3,n,f (t) = 0 when t ≤ tn .
5  When t ≥ tn,f (1)
+ 1, suppose that event Gcn,f (t) occurs.
≤ Kn Mn
2 Then, we have d̃n,f (t)< dn,f ≤ Kn , which implies that
n∈N
 d̃n,f (t) = d̄n,f (t) + Kn ([3 logt]/[2(hn,f (t) + Hn,f )]). It fol-
6T log T 
+2 Lf Kn hn,f (T) lows that dn,f > d̄n,f (t) + Kn ([3 log t]/[2(hn,f (t) + Hn,f )]).
T + Hn,f Hence, we bound E[φ3,n,f (t)] as follows:
n∈N f ∈F
5       
≤ Kn Mn E φ3,n,f (t) = E dn,f − d̃n,f (t) Xn,f 
(t)1 Gcn,f (t)
2
n∈N   

6T log T    ≤ E Kn Xn,f (t)1 Gcn,f (t)
+2 Kn Lf hn,f (T) (91)   
T + Hmin = Kn Xn,f
(t)E 1 Gcn,f (t)
n∈N f ∈F
 

= Kn Xn,f (t) Pr Gcn,f (t)
where we define a nonnegative integerHmin  minn,f Hn,f .
c (t) ≤ M 
The last inequality holds because that f ∈F Lf Xn,f n ≤ Kn Xn,f (t)
for each n ∈ N . On the other hand, by Jensen’s inequality, we
3 log t
have × Pr dn,f > d̄n,f (t) + Kn .
2 hn,f (t) + Hn,f

 Lf  f ∈F Lf hn,f (T) (97)
 hn,f (T) ≤ 
f ∈F f ∈F Lf f ∈F Lf
By the Chernoff-Hoeffding bound, we have
Mn T
≤  . (92) 3 log t
f ∈F Lf Pr dn,f > d̄n,f (t) + Kn
2(hn,f (t) + Hn,f )
Then, it follows that:
3 log t
= Pr d̄n,f (t) < dn,f − Kn
 2(hn,f (t) + Hn,f )
5 
T−1
E[2 (t)] ≤ Kn Mn ≤ exp(−3 log t) = t−3 . (98)
2
t=0 n∈N
⎛ ⎞
  6T 2 log T Hence, we have
+ 2⎝ Kn Mn Lf ⎠ . (93)
T + Hmin   
n∈N f ∈F E φ3,n,f (t) ≤ Kn Xn,f (t)t−3 . (99)

Authorized licensed use limited to: SUN YAT-SEN UNIVERSITY. Downloaded on October 05,2024 at 03:01:49 UTC from IEEE Xplore. Restrictions apply.
GAO et al.: HISTORY-AWARE ONLINE CACHE PLACEMENT IN FOG-ASSISTED IoT SYSTEMS 14703

Based on the above inequality, we have [4] Y. Jiang, M. Ma, M. Bennis, F.-C. Zheng, and X. You, “User preference
learning-based edge caching for fog radio access network,” IEEE Trans.

T−1
  
T−1  Commun., vol. 67, no. 2, pp. 1268–1283, Feb. 2018.

Lf E φ3,n,f (t) ≤ Kn Lf Xn,f (t)t−3 [5] S. Zhao, Z. Shao, H. Qian, and Y. Yang, “Online user-ap association
t=0 f ∈F with predictive scheduling in wireless caching networks,” in Proc. IEEE
t=tn,f +1 f ∈F
(1)
GLOBECOM, 2017, pp. 1–7.
[6] Y. Jiang, M. Ma, M. Bennis, F. Zheng, and X. You, “A novel caching

T−1
Mn t−3
policy with content popularity prediction and user preference learning
≤ Kn (100) in Fog-RAN,” in Proc. IEEE GLOBECOM Workshops, 2017, pp. 1–6.
(1) [7] S. Zhao, Y. Yang, Z. Shao, X. Yang, H. Qian, and C.-X. Wang,
t=tn,f +1
“FEMOS: Fog-enabled multitier operations scheduling in dynamic wire-
  (t) ≤ M . less networks,” IEEE Internet Things J., vol. 5, no. 2, pp. 1169–1183,
where the last inequality holds because f ∈F Lf Xn,f n
Apr. 2018.
Then, by (100), it follows that: [8] X. Gao, X. Huang, S. Bian, Z. Shao, and Y. Yang, “PORA: Predictive
∞ offloading and resource allocation in dynamic fog computing systems,”

T−1
  
T−1 
t−3 ≤ Kn Mn t−3
IEEE Internet Things J., vol. 7, no. 1, pp. 72–87, Jan. 2020.
Lf E φ3,n,f (t) ≤ Kn Mn [9] B. Bharath, K. G. Nagananda, and H. V. Poor, “A learning-based
t=0 f ∈F (1)
t=tn,f +1 t=1 approach to caching in heterogenous small cell networks,” IEEE Trans.

! Commun., vol. 64, no. 4, pp. 1674–1686, Apr. 2016.
 [10] H. Pang, L. Gao, and L. Sun, “Joint optimization of data sponsoring and
≤ Kn Mn 1 + t−3 edge caching for mobile video delivery,” in Proc. IEEE GLOBECOM,
t=2
2016, pp. 1–7.
" # ∞ $ [11] F. Li, J. Liu, and B. Ji, “Combinatorial sleeping bandits with fairness
−3 3 constraints,” in Proc. IEEE INFOCOM, 2019, pp. 1702–1710.
≤ Kn Mn 1 + t dt = Kn Mn .
1 2 [12] P. Shivaswamy and T. Joachims, “Multi-armed bandit problems with
history,” in Proc. AISTATS, 2012, pp. 1046–1054.
(101)
[13] M. J. Neely, “Stochastic network optimization with application to com-
By (96) and (101), we have munication and queueing systems,” Synth. Lectures Commun. Netw.,
vol. 3, no. 1, pp. 1–211, 2010.

T−1 
 T−1   [14] J. Kwak, Y. Kim, L. B. Le, and S. Chong, “Hybrid content caching in 5G
E[3 (t)] ≤ Lf E φ3,n,f (t) wireless networks: Cloud versus edge caching,” IEEE Trans. Wireless
Commun., vol. 17, no. 5, pp. 3030–3045, May 2018.
t=0 n∈N t=0 f ∈F [15] Y. Wang, W. Wang, Y. Cui, K. G. Shin, and Z. Zhang, “Distributed
 3 packet forwarding and caching based on stochastic network utility
≤ Kn Mn . (102) maximization,” IEEE/ACM Trans. Netw., vol. 26, no. 3, pp. 1264–1277,
2 Jun. 2018.
n∈N
[16] J. Xu, L. Chen, and P. Zhou, “Joint service caching and task offload-
Combining (69), (93), and (102), we obtain ing for mobile edge computing in dense networks,” in Proc. IEEE
! INFOCOM, 2018, pp. 207–215.

T−1 
T−1 
T−1
[17] P. Blasco and D. Gündüz, “Learning-based optimization of cache content
E[1 (t)] ≤ V E[2 (t)] + E[3 (t)] in a small cell base station,” in Proc. IEEE ICC, 2014, pp. 1897–1903.
t=0 t=0 t=0 [18] P. Blasco and D. Gündüz, “Multi-armed bandit optimization of cache
 content in wireless infostation networks,” in Proc. IEEE ISIT, 2014,
≤ 4V Kn Mn pp. 51–55.
n∈N [19] S. Müller, O. Atan, M. van der Schaar, and A. Klein, “Smart caching
⎛ ⎞ in wireless small cell networks via contextual multi-armed bandits,” in
  6T 2 log T Proc. IEEE ICC, 2016, pp. 1–7.
+ 2V ⎝ Kn Mn Lf ⎠ . [20] X. Zhang, G. Zheng, S. Lambotharan, M. R. Nakhai, and K.-K. Wong,
T + Hmin
n∈N f ∈F “A learning approach to edge caching with dynamic content library in
wireless networks,” in Proc. IEEE GLOBECOM, 2019, pp. 1–6.
(103) [21] J. Song, M. Sheng, T. Q. S. Quek, C. Xu, and X. Wang, “Learning-
based content caching and sharing for wireless networks,” IEEE Trans.
Substituting (103) into (60), we obtain a regret bound as Commun., vol. 65, no. 10, pp. 4309–4324, Oct. 2017.
follows: [22] X. Xu, M. Tao, and C. Shen, “Collaborative multi-agent multi-armed
 bandit learning for small-cell caching,” IEEE Trans. Wireless Commun.,
B 4 n∈N Kn Mn
Reg(T) ≤ + vol. 19, no. 4, pp. 2570–2585, Apr. 2020.
V ⎛ T ⎞ [23] S. Ajmal, M. B. Muzammil, A. Jamil, S. M. Abbas, U. Iqbal, and
P. Touseef, “Survey on cache schemes in heterogeneous networks using
  6 log T
+ 2⎝ Lf ⎠
5G Internet of Things,” in Proc. ACM ICFNDS, 2019, pp. 1–8.
Kn Mn (104)
T + Hmin [24] H. Pang, J. Liu, X. Fan, and L. Sun, “Toward smart and cooperative
n∈N f ∈F edge caching for 5G networks: A deep learning based approach,” in
 Proc. IEEE/ACM IWQoS, 2018, pp. 1–6.
where B = 1/2 n∈N (bn
2 + α 2 Mn2 ) and Hmin = minn,f Hn,f . [25] M. Chen, W. Saad, and C. Yin, “Echo-liquid state deep learning for 360◦
content transmission and caching in wireless VR networks with cellular-
connected UAVs,” IEEE Trans. Commun., vol. 67, no. 9, pp. 6386–6400,
R EFERENCES Sep. 2019.
[1] X. Gao, X. Huang, Y. Tang, Z. Shao, and Y. Yang, “Proactive cache [26] A. Ndikumana, N. H. Tran, D. H. Kim, K. T. Kim, and C. S. Hong,
placement with bandit learning in fog-assisted iot system,” in Proc. IEEE “Deep learning based caching for self-driving cars in multi-access edge
ICC, 2020, pp. 1–6. computing,” IEEE Trans. Intell. Transp. Syst., early access, Mar. 4, 2020,
[2] E. Bastug, M. Bennis, and M. Debbah, “Living on the edge: The role doi: 10.1109/TITS.2020.2976572.
of proactive caching in 5G wireless networks,” IEEE Commun. Mag., [27] Z. Liu, H. Song, and D. Pan, “Distributed video content caching policy
vol. 52, no. 8, pp. 82–89, Aug. 2014. with deep learning approaches for D2D communication,” IEEE Trans.
[3] L. Li, G. Zhao, and R. S. Blum, “A survey of caching techniques in Veh. Technol., vol. 69, no. 12, pp. 15644–15655, Dec. 2020.
cellular networks: Research issues and challenges in content placement [28] E. Baştuğ, M. Bennis, and M. Debbah, “A transfer learning approach
and delivery strategies,” IEEE Commun. Surveys Tuts., vol. 20, no. 3, for cache-enabled wireless networks,” in Proc. IEEE WiOpt, 2015,
pp. 1710–1732, 3rd Quart., 2018. pp. 161–166.

Authorized licensed use limited to: SUN YAT-SEN UNIVERSITY. Downloaded on October 05,2024 at 03:01:49 UTC from IEEE Xplore. Restrictions apply.
14704 IEEE INTERNET OF THINGS JOURNAL, VOL. 8, NO. 19, OCTOBER 1, 2021

[29] A. Sengupta, S. Amuru, R. Tandon, R. M. Buehrer, and T. C. Clancy, Xi Huang (Member, IEEE) received the B.Eng.
“Learning distributed caching strategies in small cell networks,” in Proc. degree from Nanjing University, Nanjing, China, in
ISWCS, 2014, pp. 917–921. 2014. He is currently pursuing the Ph.D. degree with
[30] A. Sadeghi, F. Sheikholeslami, A. G. Marques, and G. B. Giannakis, ShanghaiTech University, Shanghai, China.
“Reinforcement learning for adaptive caching with dynamic storage pric- Since September 2014, he has been with the
ing,” IEEE J. Sel. Areas Commun., vol. 37, no. 10, pp. 2267–2281, School of Information Science and Technology,
Oct. 2019. ShanghaiTech University. He was a visiting stu-
[31] Z. Zhu, T. Liu, S. Jin, and X. Luo, “Learn and pick right nodes to dent with the Department of Electrical Engineering
offload,” in Proc. IEEE GLOBECOM, 2018, pp. 1–6. and Computer Sciences, University of California at
[32] J. Yao and N. Ansari, “Energy-aware task allocation for mobile IoT by Berkeley, Berkeley, CA, USA, from February 2017
online reinforcement learning,” in Proc. IEEE ICC, 2019, pp. 1–6. to July 2017. His current research interests include
[33] A. Mukherjee, S. Misra, V. S. P. Chandra, and M. S. Obaidat, “Resource- the optimization and the design for intelligent networks.
optimized multiarmed bandit-based offload path selection in edge UAV
swarms,” IEEE Internet Things J., vol. 6, no. 3, pp. 4889–4896,
Jun. 2018.
[34] D. López-Pérez, A. Valcarce, G. De La Roche, and J. Zhang, “OFDMA
femtocells: A roadmap on interference avoidance,” IEEE Commun.
Mag., vol. 47, no. 9, pp. 41–48, Sep. 2009.
[35] O. Somekh, O. Simeone, Y. Bar-Ness, A. M. Haimovich, and S. Shamai,
“Cooperative multicell zero-forcing beamforming in cellular downlink Yinxu Tang received the B.Eng. degree from
channels,” IEEE Trans. Inf. Theory, vol. 55, no. 7, pp. 3206–3219, ShanghaiTech University, Shanghai, China, where
Jul. 2009. she is currently pursuing the master’s degree with
[36] S. Verdu et al., Multiuser Detection. Cambridge, U.K.: Cambridge Univ. the School of Information Science and Technology.
Press, 1998. Her current research interests include bandit and
[37] W. Chen, Y. Wang, and Y. Yuan, “Combinatorial multi-armed ban- reinforcement learning, resource management, and
dit: General framework and applications,” in Proc. ICML, 2013, intelligent computing.
pp. 151–159.
[38] A. Slivkins, “Introduction to multi-armed bandits,” Found. Trends Mach.
Learn., vol. 12, nos. 1–2, pp. 1–286, 2019.
[39] S. Martello, D. Pisinger, and P. Toth, “Dynamic programming and strong
bounds for the 0–1 knapsack problem,” Manag. Sci., vol. 45, no. 3,
pp. 414–424, 1999.
[40] S. Martello, D. Pisinger, and P. Toth, “New trends in exact algorithms
for the 0–1 knapsack problem,” Eur. J. Oper. Res., vol. 123, no. 2, Ziyu Shao (Senior Member, IEEE) received the B.S.
pp. 325–332, 2000. and M.Eng. degrees from Peking University, Beijing,
[41] T. H. Cormen, C. E. Leiserson, R. L. Rivest, and C. Stein, Introduction China, in 2001 and 2004, respectively, and the Ph.D.
to Algorithms. Cambridge, MA, USA: MIT Press, 2009. degree from the Chinese University of Hong Kong,
[42] S. Bian, X. Huang, Z. Shao, and Y. Yang, “Neural task scheduling Hong Kong, in 2010.
with reinforcement learning for fog computing systems,” in Proc. IEEE He is an Associate Professor with the School of
GLOBECOM, 2019, pp. 1–6. Information Science and Technology, ShanghaiTech
[43] J. Pei, P. Hong, M. Pan, J. Liu, and J. Zhou, “Optimal VNF placement University, Shanghai, China. He then worked as a
via deep reinforcement learning in SDN/NFV-enabled networks,” IEEE Postdoctoral Researcher with the Chinese University
J. Sel. Areas Commun., vol. 38, no. 2, pp. 263–278, Feb. 2020. of Hong Kong from 2011 to 2013. He was a Visiting
[44] B. Irie and S. Miyake, “Capabilities of three-layered perceptrons,” in Postdoctoral Researcher with the EE Department,
Proc. ICNN, 1988, pp. 641–648. Princeton University, Princeton, NJ, USA, in 2012. He was also a Visiting
[45] R. S. Sutton, D. A. McAllester, S. P. Singh, and Y. Mansour, “Policy gra- Professor with the EECS Department, University of California at Berkeley,
dient methods for reinforcement learning with function approximation,” Berkeley, CA, USA, in 2017. His current research interests center on
in Proc. NeurIPS, 2000, pp. 1057–1063. intelligent networks, including AI for networks and networks for AI.
[46] P. Auer, N. Cesa-Bianchi, and P. Fischer, “Finite-time analysis of
the multiarmed bandit problem,” Mach. Learn., vol. 47, nos. 2–3,
pp. 235–256, 2002.
[47] D. Lee et al., “LRFU: A spectrum of policies that subsumes the least
recently used and least frequently used policies,” IEEE Trans. Comput.,
vol. 50, no. 12, pp. 1352–1361, Dec. 2001.
[48] J. Dilley and M. Arlitt, “Improving proxy cache performance: Analysis
of three replacement policies,” IEEE Internet Comput., vol. 3, no. 6, Yang Yang (Fellow, IEEE) received the B.S. and
pp. 44–50, Nov./Dec. 1999. M.S. degrees in radio engineering from Southeast
[49] G. Hinton, N. Srivastava, and K. Swersky, “Neural University, Nanjing, China, in 1996 and 1999,
networks for machine learning lecture 6a overview of mini- respectively, and the Ph.D. degree in information
batch gradient descent,” Lecture Notes CSC321, Univ. engineering from the Chinese University of Hong
Toronto, Toronto, ON, Canada, 2014. [Online]. Available: Kong, Hong Kong, in 2002.
http://www.cs.toronto.edu/∼tijmen/csc321/slides/lecture_slides_lec6.pdf He is currently a Full Professor with the School
[50] W. Hoeffding, “Probability inequalities for sums of bounded random of Information Science and Technology, Master
variables,” J. Amer. Stat. Assoc., vol. 58, no. 301, pp. 409–426, 1963. of Kedao College, as well as the Director of
Shanghai Institute of Fog Computing Technology,
ShanghaiTech University, Shanghai, China. He
Xin Gao (Graduate Student Member, IEEE) is also an Adjunct Professor with the Research Center for Network
received the B.Eng. degree from the School of Communication, Peng Cheng Laboratory, Shenzhen, China, as well as a
Electronic Information and Communications, Senior Consultant of Shenzhen Smart City Technology Development Group,
Huazhong University of Science Technology, Shenzhen. Before joining ShanghaiTech University, he has held faculty posi-
Wuhan, China, in 2015. She is currently pursuing tions with the Chinese University of Hong Kong; Brunel University, Uxbridge,
the Ph.D. degree with the School of Information U.K.; the University College London, London, U.K.; and SIMIT, Chinese
Science and Technology, ShanghaiTech University, Academy of Sciences, Beijing, China. He has published more than 300 papers
Shanghai, China, Shanghai Institute of Microsystem and filed more than 80 technical patents in these research areas. His research
and Information Technology, Chinese Academy interests include fog computing networks, service-oriented collaborative intel-
of Sciences, Shanghai, China, and University of ligence, wireless sensor networks, IoT applications, and advanced testbeds and
Chinese Academy of Sciences, Beijing, China. experiments.
Her current research interests include bandit and reinforcement learning, Dr. Yang has been the Chair of the Steering Committee of Asia–Pacific
edge computing, fog computing, and Internet of Things. Conference on Communications since January 2019.

Authorized licensed use limited to: SUN YAT-SEN UNIVERSITY. Downloaded on October 05,2024 at 03:01:49 UTC from IEEE Xplore. Restrictions apply.

You might also like