History-Aware Online Cache Placement in Fog-Assisted IoT Systems: An Integration of Learning and Control
Abstract—In fog-assisted Internet-of-Things systems, it is a common practice to cache popular content at the network edge to achieve high quality of service. Due to uncertainties in practice, such as unknown file popularities, the cache placement scheme design is still an open problem with unresolved challenges: 1) how to maintain time-averaged storage costs under budgets; 2) how to incorporate online learning to aid cache placement to minimize performance loss [also known as (a.k.a.) regret]; and 3) how to exploit offline historical information to further reduce regret. In this article, we formulate the cache placement problem with unknown file popularities as a constrained combinatorial multiarmed bandit problem. To solve the problem, we employ virtual queue techniques to manage time-averaged storage cost constraints, and we adopt history-aware bandit learning methods to integrate offline historical information into the online learning procedure to handle the exploration–exploitation tradeoff. With an effective combination of online control and history-aware online learning, we devise a cache placement scheme with history-aware bandit learning called CPHBL. Our theoretical analysis and simulations show that CPHBL achieves a sublinear time-averaged regret bound. Moreover, the simulation results verify CPHBL's advantage over the deep reinforcement learning-based approach.

Index Terms—Fog computing, history-aware bandit learning, Internet of Things (IoT), learning-aided online control, proactive caching.

I. INTRODUCTION

DURING recent years, the proliferation of Internet-of-Things (IoT) devices such as smartphones and the emergence of IoT applications such as video streaming have led to an unprecedented growth of data traffic [2], [3]. To address the concerns about increasing data traffic, many content providers turn to cloud services for reliable content caching and delivery. However, the latency of accessing cloud services can violate the high Quality-of-Service (QoS) requirements of IoT users due to congestion on backhaul links, especially at peak traffic moments [4]. To mitigate these concerns, a promising solution is to cache popular contents on fog servers (e.g., base stations and routers with enhanced storage and computing capabilities) at the network edge in proximity to IoT users. In this way, the burden on backhaul links can be alleviated, and the QoS in terms of content delivery latency can be improved [5]–[8]. Fig. 1 shows an example of wireless caching in a multitier fog-assisted IoT system. As shown in the figure, by utilizing the storage resources on fog servers that are close to IoT devices, popular contents (e.g., files) can be cached to achieve timely content delivery. Due to resource limits, each edge fog server (EFS) can cache only a subset of files to serve its associated IoT users. If a user's requested file is found on the corresponding EFS [also known as (a.k.a.) a hit], then it can be downloaded directly; otherwise, the file needs to be fetched from the central fog server (CFS) in the upper fog tier with extra bandwidth consumption and latency. Therefore, the key to maximizing the benefits of caching in fog-assisted IoT systems lies in the selection of a proper set of cached files (a.k.a. cache placement) on each EFS.

However, the effective design of cache placement remains a challenging problem due to the uncertainty of file popularities in such systems. Specifically, as an important ingredient for cache placement optimization, file popularities are usually unknown in practice [9]. Such information can only be inferred implicitly from feedback information, such as cache hit signals for user requests. Meanwhile, in practice, it is common for fog-assisted IoT systems to retain offline historical observations (in terms of file request logs) on each EFS. Such offline

Manuscript received December 4, 2020; revised March 6, 2021; accepted April 2, 2021. Date of publication April 9, 2021; date of current version September 23, 2021. This work was supported in part by the National Key Research and Development Program of China under Grant 2020YFB2104300; in part by the Nature Science Foundation of Shanghai under Grant 19ZR1433900; and in part by the National Development and Reform Commission of China (NDRC) under Grant "5G Network Enabled Intelligent Medicine and Emergency Rescue System for Giant Cities." This article was presented in part at the IEEE International Conference on Communications (ICC), Jun. 2020, Dublin, Ireland. (Corresponding author: Ziyu Shao.)

Xin Gao is with the School of Information Science and Technology, ShanghaiTech University, Shanghai 201210, China, also with the Shanghai Institute of Microsystem and Information Technology, Chinese Academy of Sciences, Shanghai 200050, China, and also with the University of Chinese Academy of Sciences, Beijing 100049, China (e-mail: [email protected]).

Xi Huang, Yinxu Tang, and Ziyu Shao are with the School of Information Science and Technology, ShanghaiTech University, Shanghai 201210, China (e-mail: [email protected]; [email protected]; [email protected]).

Yang Yang is with the Shanghai Institute of Fog Computing Technology, ShanghaiTech University, Shanghai 201210, China, also with the Research Center for Network Communication, Peng Cheng Laboratory, Shenzhen 518000, China, and also with the Shenzhen SmartCity Technology Development Group Company Ltd., Shenzhen 518046, China (e-mail: [email protected]).

Digital Object Identifier 10.1109/JIOT.2021.3072115
2327-4662 © 2021 IEEE. Personal use is permitted, but republication/redistribution requires IEEE permission.
See https://www.ieee.org/publications/rights/index.html for more information.
Authorized licensed use limited to: SUN YAT-SEN UNIVERSITY. Downloaded on October 05,2024 at 03:01:49 UTC from IEEE Xplore. Restrictions apply.
14684 IEEE INTERNET OF THINGS JOURNAL, VOL. 8, NO. 19, OCTOBER 1, 2021
GAO et al.: HISTORY-AWARE ONLINE CACHE PLACEMENT IN FOG-ASSISTED IoT SYSTEMS 14685
TABLE I
COMPARISON BETWEEN OUR WORK AND RELATED WORKS
generally carried out from two perspectives: 1) the online control perspective and 2) the online learning perspective.

Online Control-Based Cache Placement: Most works that take the online control perspective formulated cache placement problems as stochastic network optimization problems with respect to different metrics. For example, Pang et al. [10] jointly studied the cache placement and data sponsoring problems in mobile video content delivery networks. Their solution aimed to maximize the overall content delivery payoff with budget constraints on caching and delivery costs. Kwak et al. [14] devised a dynamic cache placement scheme to optimize service rates for user requests in a hierarchical wireless caching network. Wang et al. [15] developed a joint traffic forwarding and cache placement scheme to optimize the queueing delay and energy consumption of caching-enabled networks. Xu et al. [16] proposed an online algorithm to jointly optimize wireless caching and task offloading with the goal of ultralow task computation delays under a long-term energy constraint. In general, such works adopted the Lyapunov optimization method [13] to solve their formulated problems through a series of per-time-slot adaptive control. Although the effectiveness of their solutions has been well justified, they generally assumed that file popularities or file requests are readily given prior to the cache placement process. Such assumptions are usually not the case in practice [9].

Online Learning-Based Cache Placement: Faced with constantly arriving file requests and unknown file popularities, a number of works adopted various learning techniques, such as deep learning [24]–[27], transfer learning [9], [28], and reinforcement learning [17]–[22], [29], [30], to improve the performance of wireless caching networks. However, existing solutions in such works cannot handle time-averaged constraints. Besides, they mainly resorted to time-consuming offline pretraining and heuristic hyperparameter tuning to produce their solutions. Moreover, they generally provided no theoretical guarantee but only limited insights into the resulting performance.

Bandit learning is another method that is widely adopted to promote the performance of such systems. So far, it has been applied to solve scheduling problems, such as task offloading [31], task allocation [32], and path selection [33]. The most relevant to our work are those that consider optimizing proactive cache placement in terms of different performance metrics. For example, Blasco and Gündüz [17], [18] studied the cache placement problem for a single caching unit with multiple users. By considering the problem as a CMAB problem, in [17], they aimed to maximize the amount of served traffic through wireless caching, while in [18], they further took file downloading costs into account for optimization. Müller et al. [19] proposed a cache placement scheme based on contextual bandits, which learns the context-dependent content popularity to maximize the number of cache hits. Zhang et al. [20] studied the network utility maximization problem in the context of cache placement with a nonfixed content library over time. Song et al. [21] proposed a joint cache placement and content sharing scheme among cooperative caching units to maximize the content caching revenue and minimize the content sharing expense. Xu et al. [22] modeled the problem of cache placement with multiple caching units from the perspective of multiagent multiarmed bandit (MAMAB) and devised an online scheme to minimize the accumulated transmission delay over time. Such works generally do not consider the storage costs on EFSs in terms of memory footprint. In practice, without such a consideration, caching files with excessively high storage costs may offset the benefits of wireless caching. Moreover, none of such works exploits offline historical information in their learning procedures.

Novelty of Our Work: Overall, existing Lyapunov-based online control works assume that all instantaneous system states are fully observable during the online decision-making process. However, this is often not the case in practice. In our scenario, instantaneous file demands are unknown before the cache placement process, to which standard Lyapunov optimization techniques do not apply. On the other hand, existing bandit-based online learning works generally do not consider stochastic time-averaged constraints on storage costs in their models. The involvement of such constraints makes the cache placement problem more challenging and one that cannot be handled effectively by existing bandit learning techniques. Different from existing works, we conduct an effective integration of online control, online learning, and offline historical information with a sophisticated scheme design and theoretical analysis. Accordingly, our formulation, scheme design, and theoretical analysis are more sophisticated than standard Lyapunov optimization and bandit learning techniques. To the best of our knowledge, we are the first to present a systematic study on such a synergy technique. Our results also provide novel insights to the designers of fog-assisted IoT systems. The comparison between our work and existing works is presented in Table I.
TABLE II
KEY NOTATIONS

III. SYSTEM MODEL AND PROBLEM FORMULATION

In this section, we describe our system model in detail. Then, we present our problem formulations. Key notations in this article are summarized in Table II.

A. Basic Model

We consider a caching-enabled fog-assisted IoT system that operates over a finite time horizon of T time slots. In the system, one CFS and N EFSs cooperate to serve K IoT users. The fog servers and IoT users communicate with each other through wireless connections. We assume that orthogonal frequency-division multiple access (OFDMA) [34] is employed as the underlying wireless transmission mechanism, integrated with interference management techniques, such as zero-forcing beamforming [35] and signal processing [36]. Based on such an assumption, we ignore the co-channel interference among fog servers and IoT users, abstract the physical-layer wireless links as bit pipes, and focus on the network-layer data communications between servers and IoT users.¹ We denote the sets of EFSs and users by N ≜ {1, 2, . . . , N} and K ≜ {1, 2, . . . , K}, respectively. For each EFS n, we define Kn (Kn ⊆ K, |Kn| = Kn) as the set of IoT users within its service range. Note that each IoT user is served by one and only one EFS, and thus, the sets {Kn}n are disjoint.

¹In practice, the co-channel interference cannot be completely eliminated. In our model, we ignore such interference for the tractability of theoretical analysis. Specifically, the cache placement problem considered in this work is challenging due to its stochastic settings and uncertainties in environment states. To jointly consider the caching problem on both the network layer and physical layer, the algorithm design and theoretical analysis would become even more challenging, which remains an open problem. Such a cross-layer scheme design would be an interesting direction for future work.

Particularly, we focus on the scenario in which IoT users request to download files from EFSs. We assume that the CFS has stored all of the F files (denoted by set F ≜ {1, 2, . . . , F}) that could be requested within the time horizon. Each file f has a fixed size of Lf storage units. Due to its caching capacity limit, each EFS n only has Mn units of storage to cache a portion of the files, and Mn < Σ_{f∈F} Lf. Accordingly, if a user cannot find its requested file on its associated EFS, it will request to download the file directly from the CFS. We assume that the CFS can provide simultaneous and independent file deliveries to all EFSs and IoT users. An example that illustrates our system model is shown in Fig. 1.

B. File Popularity

For each EFS n, we consider the popularity of each file f as the expected number of IoT users in set Kn to request file f per time slot, whose ground-truth value is denoted by dn,f. In general, for each EFS, the popularities of different files requested by its associated IoT users are different. Moreover, for the same file, its popularity varies across different EFSs. We assume that each file's popularity remains constant within the time horizon. In practice, such file popularities are usually unknown a priori and can only be inferred based on feedback information collected after user requests have been served.

Next, we introduce some variables to characterize user dynamics with respect to file popularity. We define binary variable θk,f(t) ∈ {0, 1} such that θk,f(t) = 1 if IoT user k requests file f in time slot t and θk,f(t) = 0 otherwise. Then, we denote the file requests of IoT user k during time slot t by vector θk(t) ≜ (θk,1(t), θk,2(t), . . . , θk,F(t)). Meanwhile, we use Dn,f(t) ≜ Σ_{k∈Kn} θk,f(t) to denote the total number of IoT users in set Kn who request file f on EFS n in time slot t. Note that Dn,f(t) is a discrete random variable over the support set {0, 1, . . . , Kn} and is assumed to be independent and identically distributed (i.i.d.) across time slots with mean dn,f.

Besides, we assume that initially (i.e., at t = 0), each EFS is provided with a fixed set of offline historical observations with respect to the number of requests for each file. Specifically, the offline historical observations for file f on EFS n are denoted by {Dʰn,f(0), Dʰn,f(1), . . . , Dʰn,f(Hn,f − 1)}, where we define Hn,f ≥ 0 as the number of offline historical observations about file f on EFS n. When Hn,f = 0, there is no offline historical information. Let Dʰn,f(s) denote the sth offline historical observation. Here, we use superscript h to indicate that Dʰn,f(s) belongs to offline historical information. Note that such observations are given as prior information when t = 0. Their values are assumed to follow the same distribution as the file popularities over the time horizon.

C. System Workflow

During each time slot t, the system operates across two phases: 1) the caching phase and 2) the service phase.
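The demand model of Section III-B is easy to simulate. A minimal sketch, under the extra assumption that each of the Kn users requests file f independently with some probability pf (the paper only requires i.i.d. demands with mean dn,f; the Bernoulli per-user form here is an illustrative choice consistent with Dn,f(t) = Σ_{k∈Kn} θk,f(t)):

```python
import random

def draw_demands(num_users, request_probs, num_slots, seed=0):
    """Simulate D_{n,f}(t) for one EFS over num_slots slots.

    Each user requests file f in a slot with probability request_probs[f]
    (an assumption), so D_{n,f}(t) is the number of requesting users,
    supported on {0, ..., K_n}, with mean d_{n,f} = K_n * request_probs[f].
    """
    rng = random.Random(seed)
    return [[sum(rng.random() < p for _ in range(num_users))  # theta_{k,f}(t)
             for p in request_probs]
            for _ in range(num_slots)]
```

Here `demands[t][f]` plays the role of Dn,f(t); averaging over many slots recovers the ground-truth popularity Kn·pf, which is exactly the quantity the online learning procedure later has to estimate from feedback.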
1) Caching Phase: The caching phase starts from the beginning of the time slot. In this phase, each EFS n updates its cached files and consumes a storage cost for each cached file. Then, each EFS n broadcasts its cache placement to all IoT users in set Kn.
2) Service Phase: The service phase begins from the end of the caching phase and lasts until the end of the time slot. In this phase, each IoT user generates file requests. For each request, if the file is not cached on the EFS, then the user will fetch it from the CFS. Otherwise, the user directly downloads the file from the EFS and the EFS will receive a corresponding cache hit reward.

In the next few sections, we present the definitions of cache placement decisions, storage costs, and cache hit rewards, respectively.

G. Problem Formulation

To achieve effective cache placement with a high QoS, two goals are considered in our work. One is to maximize the total size of transmitted files from all EFSs so that requests from IoT users can obtain timely services. In our model, this is equivalent to maximizing the time-averaged cache hit reward of all EFSs over a time horizon of T time slots. The other is to guarantee a budgeted usage of storage costs over time. To this end, for each EFS n, we first define bn as the storage cost budget for caching files. Then, we impose the following constraint to keep the time-averaged storage costs under the budget in the long run:

    lim sup_{T→∞} (1/T) Σ_{t=0}^{T−1} E[Cn(t)] ≤ bn,  ∀n ∈ N.  (5)
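Time-averaged budgets of the form (5) are exactly what Lyapunov-style virtual queues track. The paper's own queue definition appears in (13), which is outside this excerpt, so the update below is the standard textbook form, shown as an assumed sketch:

```python
def update_virtual_queue(q, cost, budget):
    """Standard virtual-queue update for a time-averaged constraint like (5):

        Q(t+1) = max(Q(t) + C(t) - b, 0).

    If Q(t)/t stays bounded (queue stability), the time average of C(t)
    does not exceed the budget b, which is constraint (5).
    """
    return max(q + cost - budget, 0.0)

# toy trace: per-slot costs alternate above/below a budget of b = 1
q = 0.0
for cost in [2.0, 0.0, 2.0, 0.0]:
    q = update_virtual_queue(q, cost, budget=1.0)
```

The backlog q grows only while the running cost exceeds the budget, so it acts as a price that later discourages caching expensive files.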
updates its cache placement in the current time slot. After the update, each EFS delivers requested cached files to IoT users. For each cache hit, a reward will be credited to the EFS.

In the following sections, we extend the settings of the existing CMAB model and demonstrate the reformulation of problem (6) under such settings. Then, we articulate our algorithm design with respect to online learning and online control procedures, respectively. Finally, we discuss the computational complexity of our devised algorithm.

If player n chooses to play arm f ∈ F in time slot t, then file f will be cached on EFS n and a reward Rn,f(t) = Lf Dn,f(t) will be received by the player. Recall that the file demand Dn,f(t) during each time slot t is a random variable with an unknown mean dn,f and is i.i.d. across time slots. Accordingly, reward Rn,f(t) is also an i.i.d. random variable with an unknown mean rn,f = E[Rn,f(t)] = Lf dn,f. Meanwhile, the cache placement decision Xn(t) = (Xn,1(t), Xn,2(t), . . . , Xn,F(t)) of EFS n corresponds to the arm selection of player n in time slot t. Specifically, Xn,f(t) = 1 if arm f is chosen and Xn,f(t) = 0 otherwise. Our goal is to devise an arm selection scheme for the players to maximize their expected cumulative rewards subject to the constraints in (1) and (5).

Remark: Our model extends the settings of the bandit model proposed by [11] in the following four aspects. First, we consider multiple players instead of one player. Second, the storage cost constraints in our problem are more challenging to handle than the arm fairness constraints in [11]. Specifically, under our settings, the selection of each arm for a player is coupled together under storage cost constraints, whereas in [11], there is no such coupling among arm selections. Third,

    Reg(T) ≜ R* − (1/T) Σ_{t=0}^{T−1} Σ_{n∈N} E[R̂n(Xn(t))]  (7)

where the constant R* is defined as the optimal time-averaged total expected reward for all players. In fact, maximizing the time-averaged expected reward is equivalent to minimizing the regret. Therefore, we can rewrite problem (6) as follows:

    minimize_{{X(t)}_t}  Reg(T)  (8a)
    subject to  (1), (5), (6b)  (8b)

where the last equality holds due to the independence between user demand Dn,f(t) and cache placement Xn,f(t), and the fact that E[Dn,f(t)] = dn,f. By (9) and our previous analysis, to solve problem (8), each EFS n should learn the unknown file popularity dn,f with respect to each file f.

During each time slot t, after updating cached files according to decision Xn(t), each EFS n observes the current demand Dn,f(t) for each cached file f. Then, EFS n transmits requested files to IoT users and acquires cache hit rewards. Based on the pregiven offline historical information and the cache hit feedback from IoT users, we have the following estimate for each file popularity dn,f:

    d̃n,f(t) = min{ d̄n,f(t) + Kn √( 3 log t / (2 (hn,f(t) + Hn,f)) ), Kn }.  (10)

In (10), d̄n,f(t) is the empirical mean of the number of requests for file f that involves both offline historical observations and collected online feedback; hn,f(t) counts the number of time slots (within the first t time slots) during which file f is chosen to be cached on EFS n; and Kn denotes the number of
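The history-aware estimate in (10) translates directly into code. A small sketch, where `emp_mean` stands for d̄n,f(t) pooled over the Hn,f offline samples and the online observations, `plays` for hn,f(t), and `k_n` for the cap Kn:

```python
import math

def hucb1_estimate(emp_mean, plays, hist_count, t, k_n):
    """History-aware UCB estimate of file popularity, per (10):

        d~ = min( d_bar + K_n * sqrt(3 log t / (2 (h + H))), K_n ).

    Offline history (hist_count = H_{n,f}) enters the denominator of the
    exploration bonus, so more history means a tighter estimate from t = 0.
    Requires t >= 1 and plays + hist_count >= 1.
    """
    bonus = k_n * math.sqrt(3.0 * math.log(t) / (2.0 * (plays + hist_count)))
    return min(emp_mean + bonus, k_n)
```

Note how the two mechanisms of the paper show up: the bonus shrinks as either the online play count or the offline history count grows, and the estimate is clipped at Kn because no more than Kn users can request a file in one slot.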
V. PERFORMANCE ANALYSIS

For each EFS n, given the number Kn of its served users and its storage capacity Mn, as well as the size Lf of each file f ∈ F, we establish the following two theorems to characterize the performance of CPHBL.

A. Storage Cost Constraints

A budget vector b = (b1, b2, . . . , bN) of storage costs is said to be feasible if there exists a feasible cache placement scheme under which all storage cost constraints in (5) can be satisfied. We define the set of all feasible budget vectors as the maximal feasibility region of the system, which is denoted by the set B. The following theorem shows that all virtual queues are strongly stable under CPHBL when b is an interior point of B.

Theorem 1: Suppose that the budget vector b lies in the interior of B; then the time-averaged storage cost constraints in (5) are satisfied under CPHBL. Moreover, the virtual queues defined in (13) are strongly stable and satisfy

    lim sup_{T→∞} (1/T) Σ_{t=0}^{T−1} Σ_{n∈N} E[Qn(t)] ≤ (B + V Σ_{n∈N} 2 Kn Mn) / ε  (17)

where ε is a positive constant which satisfies that γ − ε1 (1 is the (M + 1)-dimensional all-ones vector) is still an interior point of the maximal feasibility region, and the parameter B is defined as B ≜ Σ_{n∈N} (bn² + α² Mn²)/2.

The proof of Theorem 1 is given in Appendix B.

Remark 4: Theorem 1 shows that CPHBL ensures the stability of the virtual queue backlogs {Qn(t)}n. Moreover, the time-averaged total backlog size of such virtual queues is linearly proportional to the value of parameter V. In other words, given that vector b is interior to the maximal feasibility region, under CPHBL, the time-averaged total storage cost is tunable and guaranteed to be under the given budget.

B. Regret Bound

Our second theorem provides an upper bound for the regret incurred by CPHBL over time.

Theorem 2: Under CPHBL, the regret (7) over time horizon T is upper bounded as follows:

    Reg(T) ≤ B/V + (4 Σ_{n∈N} Kn Mn)/T + Γ √( (log T) / (T + Hmin) )  (18)

in which we define the constants B ≜ Σ_{n∈N} (bn² + α² Mn²)/2 and Γ ≜ 2 Σ_{n∈N} Kn √( 6 Mn Σ_{f∈F} Lf ). Here, T is the time horizon length and Hmin ≜ min_{n,f} Hn,f is the minimal number of offline historical observations among all EFSs and files.

The proof of Theorem 2 is given in Appendix C.

Remark 5: In (18), the term B/V is mainly incurred by balancing the cache hit reward and the storage cost constraints. Intuitively, the larger the value of V, the more focus CPHBL puts on maximizing cache hit rewards, and hence, the smaller the regret. Nonetheless, this also comes with an increase in the total size of virtual queue backlogs, which is unfavorable for keeping storage costs under the budget. In contrast, the smaller the value of V, the more sensitive CPHBL would be to the increase in the storage costs. As a result, each EFS would constantly update its cached file set with files of different storage costs, leading to inferior cache hit rewards. In practice, the selection of the value of V depends on the design tradeoff of real systems.

Remark 6: The last two terms of the regret bound are in the order of O(1/T + √((log T)/(T + Hmin))). These two terms are mainly incurred by the online learning procedure with offline historical information and collected online feedback. In the following, we first consider the impact of Hmin on the regret bound under a fixed value of the time horizon length T. Note that when Hmin = 0, our problem degenerates to the special case without offline historical information, as considered in our previous work [1]. In this case, the whole regret bound is in the order of O(1/V + √((log T)/T)). When offline historical information is available (i.e., Hmin > 0), the regret bound would be even lower. Specifically, we consider the following four cases under a fixed value of T.²
1) The first case is when Hmin = O(1), i.e., a constant value unrelated to T. Compared to the scenario without offline historical information, though the value of the regret bound reduces in this case, its order remains O(1/V + √((log T)/T)).
2) The second case is when Hmin = Θ(T), i.e., the number of offline historical observations is comparable to the length of the time horizon. In this case, the regret bound is still in the order of O(1/V + √((log T)/T)).
3) The third case is when Hmin = Θ(T log T). In this case, under a sufficiently large time horizon length T, the regret bound approaches O(1/V + √(1/T)).
4) The fourth case is when Hmin = Ω(T² log T), i.e., there is adequate offline historical information. In this case, each EFS proactively leverages offline historical information to acquire highly accurate estimations of file popularities. As a result, the last term in the regret bound becomes even smaller, and the second term becomes dominant. Therefore, the order of the regret decreases to O(1/V + 1/T).

When it comes to the impact of the time horizon length T, the regret bound decreases and approaches B/V as the value of T increases. In summary, given a longer time horizon and more historical information (i.e., larger values of T and Hmin), CPHBL achieves a better regret performance. Such results are also verified by numerical simulation in Section VII-C.

²The notations O, Θ, and Ω are all asymptotic notations introduced in [41].

VI. DRL-BASED BENCHMARK DESIGN

In recent years, DRL has been widely adopted in various fields to conduct goal-directed learning and sequential decision making [42], [43]. It deals with agents that learn to make better sequential decisions by interacting with the environment, without complicated modeling or much domain knowledge. In this section, to compare our scheme CPHBL with DRL-based approaches, we propose a novel cache placement scheme with DRL, called CPDRL, as a baseline for evaluation.
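The behavior described in Remarks 5 and 6 can be seen by evaluating the right-hand side of (18) numerically. A sketch for a single-EFS instance (N = 1); all parameter values are illustrative and not taken from the paper:

```python
import math

def regret_bound(B, V, K, M, L_total, T, H_min):
    """Evaluate the upper bound (18) for one EFS (N = 1):

        B/V + 4*K*M/T + Gamma * sqrt(log T / (T + H_min)),

    with Gamma = 2*K*sqrt(6*M*L_total) as defined in Theorem 2.
    """
    gamma = 2.0 * K * math.sqrt(6.0 * M * L_total)
    return (B / V
            + 4.0 * K * M / T
            + gamma * math.sqrt(math.log(T) / (T + H_min)))

# the bound shrinks as the horizon T and the history size H_min grow
b_short = regret_bound(B=10.0, V=50.0, K=5, M=20, L_total=100, T=10**3, H_min=0)
b_long  = regret_bound(B=10.0, V=50.0, K=5, M=20, L_total=100, T=10**5, H_min=0)
b_hist  = regret_bound(B=10.0, V=50.0, K=5, M=20, L_total=100, T=10**5, H_min=10**7)
```

As T and Hmin grow, the last two terms vanish and the bound approaches the floor B/V, matching the discussion of the four cases above.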
Fig. 5. Design of the policy network for agent n. The cache placement scheme of agent n is designed as an FNN with one hidden layer of dimension 512, followed by a ReLU activation function.
Algorithm 2 CPDRL
1: Initialize Xn(−1) = 0 and the policy network πθn for each EFS n ∈ N.
2: for each time slot t ∈ {0, 1, . . . , T − 1} do
3:   for each EFS n ∈ N do
       % Cache Placement
4:     Observe state Sn(t) ← Xn(t − 1).
5:     Sample a candidate action (f, Xn,f(t)) from A according
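The policy network described in Fig. 5 (one 512-unit hidden layer with ReLU) can be sketched in plain Python. The softmax output head and the toy dimensions below are assumptions for illustration, since the full CPDRL action-space definition is truncated in this excerpt:

```python
import math, random

def relu(x):
    return x if x > 0.0 else 0.0

def policy_forward(state, w1, b1, w2, b2):
    """Forward pass of an FNN policy: hidden ReLU layer, then a softmax
    over candidate caching actions (the softmax head is an assumption)."""
    hidden = [relu(sum(s * w for s, w in zip(state, row)) + b)
              for row, b in zip(w1, b1)]
    logits = [sum(h * w for h, w in zip(hidden, row)) + b
              for row, b in zip(w2, b2)]
    m = max(logits)                       # numerically stable softmax
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

# toy dimensions: F files as state, 2F (file, cache-bit) actions assumed
rng = random.Random(0)
F, H, A = 4, 512, 8
w1 = [[rng.gauss(0.0, 0.1) for _ in range(F)] for _ in range(H)]
b1 = [0.0] * H
w2 = [[rng.gauss(0.0, 0.1) for _ in range(H)] for _ in range(A)]
b2 = [0.0] * A
probs = policy_forward([0.0] * F, w1, b1, w2, b2)   # uniform for zero state
```

With a zero state and zero biases every logit is zero, so the sampled action distribution starts out uniform; training (per Algorithm 2's policy-gradient loop) would then skew it toward high-reward placements.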
VIII. CONCLUSION

In this article, we considered the cache placement problem with unknown file popularities in caching-enabled fog-assisted IoT systems. By formulating the problem as a constrained

⁶In Fig. 10(a), we provide a curve of 38 + 300 √((log T)/T) as an envelope of O(1/V + √((log T)/T)) for illustration. Note that since V is fixed, 1/V can be viewed as a constant term.

APPENDIX A
ALGORITHM DEVELOPMENT

We define a Lyapunov function as follows:

    L(Q(t)) ≜ (1/2) Σ_{n∈N} (Qn(t))².  (19)

    R* = (1/T) Σ_{t=0}^{T−1} Σ_{n∈N} E[R̂n(X*n(t))].  (24)

According to (7), the regret of the cache placement scheme {X(t)}t over T time slots is

    Reg(T) = (1/T) Σ_{t=0}^{T−1} Σ_{n∈N} E[R̂n(X*n(t)) − R̂n(Xn(t))].  (25)
By the definition of the reward \hat{R}_n(\cdot) in (4), it follows that:

\mathrm{Reg}(T) = \frac{1}{T} \sum_{t=0}^{T-1} \mathbb{E}\Big[ \sum_{n \in \mathcal{N}} \sum_{f \in \mathcal{F}} L_f \big( D_{n,f}(t) X^*_{n,f}(t) - D_{n,f}(t) X_{n,f}(t) \big) \Big]. \qquad (26)

Since the cache placement decision X_{n,f}(t) is determined while D_{n,f}(t) is still unknown, X_{n,f}(t) is independent of D_{n,f}(t). On the other hand, X^*_{n,f}(t) is i.i.d. over time slots and is also independent of D_{n,f}(t). Then, by \mathbb{E}[D_{n,f}(t)] = d_{n,f}, we have

\mathrm{Reg}(T) = \frac{1}{T} \sum_{t=0}^{T-1} \sum_{n \in \mathcal{N}} \sum_{f \in \mathcal{F}} L_f d_{n,f} \mathbb{E}\big[ X^*_{n,f}(t) - X_{n,f}(t) \big]. \qquad (27)

We define the one-time-slot regret in each time slot t as

\mathrm{Reg}'(t) \triangleq \sum_{n \in \mathcal{N}} \sum_{f \in \mathcal{F}} L_f d_{n,f} \big( X^*_{n,f}(t) - X_{n,f}(t) \big). \qquad (28)

The regret \mathrm{Reg}(T) can then be expressed as

\mathrm{Reg}(T) = \frac{1}{T} \sum_{t=0}^{T-1} \mathbb{E}\big[ \mathrm{Reg}'(t) \big]. \qquad (29)

Then, we define the Lyapunov drift-plus-regret as

\Delta_V(Q(t)) \triangleq \Delta(Q(t)) + V \mathbb{E}\big[ \mathrm{Reg}'(t) \,\big|\, Q(t) \big]. \qquad (30)

By (21) and (28), it follows that:

\Delta_V(Q(t)) \le B + V \mathbb{E}\Big[ \sum_{n \in \mathcal{N}} \sum_{f \in \mathcal{F}} L_f d_{n,f} X^*_{n,f}(t) \,\Big|\, Q(t) \Big] + \mathbb{E}\Big[ \sum_{n \in \mathcal{N}} Q_n(t) \big( C_n(t) - b_n \big) \,\Big|\, Q(t) \Big] - V \mathbb{E}\Big[ \sum_{n \in \mathcal{N}} \sum_{f \in \mathcal{F}} L_f d_{n,f} X_{n,f}(t) \,\Big|\, Q(t) \Big]. \qquad (31)

Since \tilde{d}_{n,f}(t) is the HUCB1 estimate of d_{n,f} in time slot t such that \tilde{d}_{n,f}(t) \in [0, K_n], we have

\sum_{n \in \mathcal{N}} \sum_{f \in \mathcal{F}} L_f d_{n,f} X_{n,f}(t) = \sum_{n \in \mathcal{N}} \sum_{f \in \mathcal{F}} L_f \tilde{d}_{n,f}(t) X_{n,f}(t) + \sum_{n \in \mathcal{N}} \sum_{f \in \mathcal{F}} L_f \big( d_{n,f} - \tilde{d}_{n,f}(t) \big) X_{n,f}(t) \overset{(a)}{\ge} \sum_{n \in \mathcal{N}} \sum_{f \in \mathcal{F}} L_f \tilde{d}_{n,f}(t) X_{n,f}(t) - \sum_{n \in \mathcal{N}} K_n \sum_{f \in \mathcal{F}} L_f X_{n,f}(t) \overset{(b)}{\ge} \sum_{n \in \mathcal{N}} \sum_{f \in \mathcal{F}} L_f \tilde{d}_{n,f}(t) X_{n,f}(t) - \sum_{n \in \mathcal{N}} K_n M_n, \qquad (32)

where inequality (a) holds because d_{n,f}, \tilde{d}_{n,f}(t) \in [0, K_n], and inequality (b) is due to \sum_{f \in \mathcal{F}} L_f X_{n,f}(t) \le M_n. Then, it follows that:

\Delta_V(Q(t)) \le B + V \sum_{n \in \mathcal{N}} K_n M_n + V \mathbb{E}\Big[ \sum_{n \in \mathcal{N}} \sum_{f \in \mathcal{F}} L_f d_{n,f} X^*_{n,f}(t) \,\Big|\, Q(t) \Big] + \mathbb{E}\Big[ \sum_{n \in \mathcal{N}} Q_n(t) \big( C_n(t) - b_n \big) \,\Big|\, Q(t) \Big] - V \mathbb{E}\Big[ \sum_{n \in \mathcal{N}} \sum_{f \in \mathcal{F}} L_f \tilde{d}_{n,f}(t) X_{n,f}(t) \,\Big|\, Q(t) \Big]. \qquad (33)

Substituting (2) and (4) into the above inequality, we have

\Delta_V(Q(t)) \le B + V \sum_{n \in \mathcal{N}} K_n M_n - \sum_{n \in \mathcal{N}} Q_n(t) b_n + V \mathbb{E}\Big[ \sum_{n \in \mathcal{N}} \sum_{f \in \mathcal{F}} L_f d_{n,f} X^*_{n,f}(t) \,\Big|\, Q(t) \Big] - \mathbb{E}\Big[ \sum_{n \in \mathcal{N}} \sum_{f \in \mathcal{F}} \tilde{w}_{n,f}(t) X_{n,f}(t) \,\Big|\, Q(t) \Big], \qquad (34)

where \tilde{w}_{n,f}(t) is defined as

\tilde{w}_{n,f}(t) \triangleq L_f \big( V \tilde{d}_{n,f}(t) - \alpha Q_n(t) \big). \qquad (35)

To minimize the upper bound of the drift-plus-regret \Delta_V(Q(t)) in (34), we switch to solving the following problem in each time slot t:

maximize_{X(t)} \sum_{n \in \mathcal{N}} \sum_{f \in \mathcal{F}} \tilde{w}_{n,f}(t) X_{n,f}(t)
subject to \sum_{f \in \mathcal{F}} L_f X_{n,f}(t) \le M_n \quad \forall n \in \mathcal{N}
X_{n,f}(t) \in \{0, 1\} \quad \forall n \in \mathcal{N}, f \in \mathcal{F}. \qquad (36)

In fact, problem (36) can be further decoupled into N subproblems. For each EFS n \in \mathcal{N}, we solve the following subproblem for the cache placement vector X_n(t) in time slot t:

maximize_{X_n(t)} \sum_{f \in \mathcal{F}} \tilde{w}_{n,f}(t) X_{n,f}(t)
subject to \sum_{f \in \mathcal{F}} L_f X_{n,f}(t) \le M_n
X_{n,f}(t) \in \{0, 1\} \quad \forall f \in \mathcal{F}. \qquad (37)
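Each per-EFS subproblem above is a 0-1 knapsack over the files, which can be solved exactly by dynamic programming over integer sizes (cf. [39]-[41]). The sketch below is a minimal illustration, not the paper's implementation: the function names, the integer file sizes, the demo numbers, and the virtual-queue recursion Q_n(t+1) = max(Q_n(t) + C_n(t) - b_n, 0) (a standard form; this excerpt only references the queue definition in (13)) are assumptions, while the weights follow (35) and the per-EFS knapsack follows the subproblem above. The storage cost is taken as alpha times the cached bytes, consistent with how (2) enters (34).

```python
def knapsack_01(values, sizes, capacity):
    """Exact 0-1 knapsack by DP over integer capacity; returns chosen items."""
    # dp[c] = best total value achievable with capacity c; keep[c] = chosen set.
    dp = [0.0] * (capacity + 1)
    keep = [[False] * len(sizes) for _ in range(capacity + 1)]
    for i, (v, s) in enumerate(zip(values, sizes)):
        if v <= 0:  # a nonpositive weight never improves the objective
            continue
        for c in range(capacity, s - 1, -1):  # descending: each item used once
            if dp[c - s] + v > dp[c]:
                dp[c] = dp[c - s] + v
                keep[c] = keep[c - s][:]
                keep[c][i] = True
    best_c = max(range(capacity + 1), key=lambda c: dp[c])
    return keep[best_c]

def place_and_update(d_tilde, L, M, Q, b, V, alpha):
    """One slot of the drift-plus-regret scheme: weights (35), one knapsack
    per EFS, then the (assumed) virtual-queue update Q <- max(Q + C - b, 0)."""
    X, Q_next = [], []
    for n, (est, Qn, bn) in enumerate(zip(d_tilde, Q, b)):
        w = [Lf * (V * e - alpha * Qn) for Lf, e in zip(L, est)]  # eq. (35)
        chosen = knapsack_01(w, L, M[n])
        X.append(chosen)
        Cn = alpha * sum(Lf for Lf, c in zip(L, chosen) if c)  # storage cost
        Q_next.append(max(Qn + Cn - bn, 0.0))
    return X, Q_next

# Tiny demo: 1 EFS, 3 files with sizes L and HUCB1-style demand estimates.
X, Q1 = place_and_update(d_tilde=[[3.0, 2.0, 1.0]], L=[2, 1, 2],
                         M=[3], Q=[0.0], b=[1.0], V=1.0, alpha=0.5)
```

In the demo, the weights are [6, 2, 2] with sizes [2, 1, 2] and capacity 3, so the first two files are cached and the virtual queue rises by the cost overshoot.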
14698 IEEE INTERNET OF THINGS JOURNAL, VOL. 8, NO. 19, OCTOBER 1, 2021
\Delta(Q(t)) \le B - \sum_{n \in \mathcal{N}} b_n Q_n(t) + \mathbb{E}\Big[ V \sum_{n \in \mathcal{N}} \sum_{f \in \mathcal{F}} L_f \tilde{d}_{n,f}(t) X_{n,f}(t) \,\Big|\, Q(t) \Big] - \mathbb{E}\Big[ \sum_{n \in \mathcal{N}} \sum_{f \in \mathcal{F}} \tilde{w}_{n,f}(t) X_{n,f}(t) \,\Big|\, Q(t) \Big]. \qquad (38)

Since \sum_{f \in \mathcal{F}} L_f \tilde{d}_{n,f}(t) X_{n,f}(t) \le K_n \sum_{f \in \mathcal{F}} L_f X_{n,f}(t) \le K_n M_n, we have

\Delta(Q(t)) \le B + V \sum_{n \in \mathcal{N}} K_n M_n - \sum_{n \in \mathcal{N}} b_n Q_n(t) - \mathbb{E}\Big[ \sum_{n \in \mathcal{N}} \sum_{f \in \mathcal{F}} \tilde{w}_{n,f}(t) X_{n,f}(t) \,\Big|\, Q(t) \Big]. \qquad (39)

APPENDIX B
PROOF OF THEOREM 1

Next, we consider the following lemma.

Lemma 1: If the budget vector b is an interior point of the maximal feasible region \mathcal{B}, then there exists a feasible scheme which makes i.i.d. decisions over time, independent of the virtual queue backlog sizes.

The proof is omitted since it is quite standard, as shown in the proof of [11, Lemma 1]. Based on Lemma 1, we begin to prove Theorem 1. By our assumption in Theorem 1 that b is an interior point of \mathcal{B}, there must exist some \epsilon > 0 such that b - \epsilon \mathbf{1} is also an interior point of \mathcal{B}. Here, \mathbf{1} denotes the N-dimensional all-ones vector. Then, by Lemma 1, since b - \epsilon \mathbf{1} lies in the interior of \mathcal{B}, there exists a feasible scheme that makes i.i.d. decisions over time, independent of the virtual queue backlog sizes, such that

\mathbb{E}\big[ \hat{C}_n(X'_n(t)) \big] \le b_n - \epsilon \quad \forall n \in \mathcal{N}, t, \qquad (40)

where X'(t) \triangleq (X'_1(t), X'_2(t), \ldots, X'_N(t)) is the cache placement decision vector during time slot t under that scheme. We denote the cache placement decision vector during time slot t under our scheme CPHBL by X^c(t) \triangleq (X^c_1(t), X^c_2(t), \ldots, X^c_N(t)), which is the optimal solution of problem (36). Then, based on (39), we have

\Delta(Q(t)) \le B + V \sum_{n \in \mathcal{N}} K_n M_n - \sum_{n \in \mathcal{N}} b_n Q_n(t) - \mathbb{E}\Big[ \sum_{n \in \mathcal{N}} \sum_{f \in \mathcal{F}} \tilde{w}_{n,f}(t) X^c_{n,f}(t) \,\Big|\, Q(t) \Big]
\le B + V \sum_{n \in \mathcal{N}} K_n M_n - \sum_{n \in \mathcal{N}} b_n Q_n(t) - \mathbb{E}\Big[ \sum_{n \in \mathcal{N}} \sum_{f \in \mathcal{F}} \tilde{w}_{n,f}(t) X'_{n,f}(t) \,\Big|\, Q(t) \Big]
= B + V \sum_{n \in \mathcal{N}} K_n M_n - V \mathbb{E}\Big[ \sum_{n \in \mathcal{N}} \sum_{f \in \mathcal{F}} L_f \tilde{d}_{n,f}(t) X'_{n,f}(t) \,\Big|\, Q(t) \Big] + \mathbb{E}\Big[ \sum_{n \in \mathcal{N}} Q_n(t) \big( \hat{C}_n(X'_n(t)) - b_n \big) \,\Big|\, Q(t) \Big]
\le B + V \sum_{n \in \mathcal{N}} K_n M_n + \mathbb{E}\Big[ \sum_{n \in \mathcal{N}} Q_n(t) \big( \hat{C}_n(X'_n(t)) - b_n \big) \,\Big|\, Q(t) \Big], \qquad (41)

where the second inequality holds because X^c(t) maximizes the objective of problem (36), the equality follows from (2) and (35), and the last inequality uses \tilde{d}_{n,f}(t) \ge 0 and X'_{n,f}(t) \ge 0. Since X'(t) is independent of Q(t), we can drop the condition Q(t) in the conditional expectation shown on the right-hand side of (41). Then, it follows that:

\Delta(Q(t)) \le B + V \sum_{n \in \mathcal{N}} K_n M_n + \sum_{n \in \mathcal{N}} Q_n(t) \big( \mathbb{E}[\hat{C}_n(X'_n(t))] - b_n \big). \qquad (42)

By (40), we have

\Delta(Q(t)) \le B + V \sum_{n \in \mathcal{N}} K_n M_n - \epsilon \sum_{n \in \mathcal{N}} Q_n(t). \qquad (43)

Taking expectations on both sides of the above inequality and summing it over time slots \{0, 1, \ldots, T-1\}, we have

\mathbb{E}\big[ L(Q(T)) \big] - \mathbb{E}\big[ L(Q(0)) \big] \le TB + TV \sum_{n \in \mathcal{N}} K_n M_n - \epsilon \sum_{t=0}^{T-1} \sum_{n \in \mathcal{N}} \mathbb{E}[Q_n(t)]. \qquad (44)

We divide both sides by \epsilon T and rearrange the terms. Then, by the facts that L(Q(0)) = 0 and L(Q(T)) \ge 0, we have

\frac{1}{T} \sum_{t=0}^{T-1} \sum_{n \in \mathcal{N}} \mathbb{E}[Q_n(t)] \le \frac{B + V \sum_{n \in \mathcal{N}} K_n M_n}{\epsilon}. \qquad (45)

By taking the limsup of the left-hand side term in the above inequality as T \to \infty, we obtain

\limsup_{T \to \infty} \frac{1}{T} \sum_{t=0}^{T-1} \sum_{n \in \mathcal{N}} \mathbb{E}[Q_n(t)] \le \frac{B + V \sum_{n \in \mathcal{N}} K_n M_n}{\epsilon}. \qquad (46)

This implies that \limsup_{T \to \infty} (1/T) \sum_{t=0}^{T-1} \mathbb{E}[Q_n(t)] < \infty, so the virtual queueing process \{Q_n(t)\}_t defined in (13) is strongly stable for each EFS n \in \mathcal{N}. Hence, the time-averaged storage cost constraints in (5) are satisfied.

APPENDIX C
PROOF OF THEOREM 2

By Lemma 1, since b lies in the interior of \mathcal{B}, there exists an optimal scheme which makes i.i.d. decisions over time, independent of the virtual queue backlog sizes, such that

\mathbb{E}\big[ \hat{C}_n(X^*_n(t)) \big] \le b_n \quad \forall n \in \mathcal{N}, t, \qquad (47)

where X^*(t) \triangleq (X^*_1(t), X^*_2(t), \ldots, X^*_N(t)) is the cache placement decision vector in time slot t under the optimal scheme. By the inequality in (21) and definition (28), under CPHBL, we have

L(Q(t+1)) - L(Q(t)) + V \mathrm{Reg}'(t) \le B + \sum_{n \in \mathcal{N}} Q_n(t) \big( \hat{C}_n(X^c_n(t)) - b_n \big) - V \sum_{n \in \mathcal{N}} \sum_{f \in \mathcal{F}} L_f d_{n,f} \big( X^c_{n,f}(t) - X^*_{n,f}(t) \big). \qquad (48)
The inequality above can be equivalently written as

L(Q(t+1)) - L(Q(t)) + V \mathrm{Reg}'(t) \le B + \sum_{n \in \mathcal{N}} Q_n(t) \big( \hat{C}_n(X^*_n(t)) - b_n \big) + \Big( \sum_{n \in \mathcal{N}} \sum_{f \in \mathcal{F}} V L_f d_{n,f} X^*_{n,f}(t) - \sum_{n \in \mathcal{N}} Q_n(t) \hat{C}_n(X^*_n(t)) \Big) - \Big( \sum_{n \in \mathcal{N}} \sum_{f \in \mathcal{F}} V L_f d_{n,f} X^c_{n,f}(t) - \sum_{n \in \mathcal{N}} Q_n(t) \hat{C}_n(X^c_n(t)) \Big). \qquad (49)

Substituting (2) into the above inequality, we have

L(Q(t+1)) - L(Q(t)) + V \mathrm{Reg}'(t) \le B + \sum_{n \in \mathcal{N}} Q_n(t) \big( \hat{C}_n(X^*_n(t)) - b_n \big) + \sum_{n \in \mathcal{N}} \sum_{f \in \mathcal{F}} L_f \big( V d_{n,f} - \alpha Q_n(t) \big) X^*_{n,f}(t) - \sum_{n \in \mathcal{N}} \sum_{f \in \mathcal{F}} L_f \big( V d_{n,f} - \alpha Q_n(t) \big) X^c_{n,f}(t). \qquad (50)

For each EFS n \in \mathcal{N} and each file f \in \mathcal{F}, we define

w_{n,f}(t) \triangleq L_f \big( V d_{n,f} - \alpha Q_n(t) \big). \qquad (51)

Then, inequality (50) can be simplified as

L(Q(t+1)) - L(Q(t)) + V \mathrm{Reg}'(t) \le B + \sum_{n \in \mathcal{N}} Q_n(t) \big( \hat{C}_n(X^*_n(t)) - b_n \big) + \sum_{n \in \mathcal{N}} \sum_{f \in \mathcal{F}} w_{n,f}(t) \big( X^*_{n,f}(t) - X^c_{n,f}(t) \big). \qquad (52)

For simplicity of expression, we define

\Delta_1(t) \triangleq \sum_{n \in \mathcal{N}} \sum_{f \in \mathcal{F}} w_{n,f}(t) \big( X^*_{n,f}(t) - X^c_{n,f}(t) \big). \qquad (53)

It follows that:

L(Q(t+1)) - L(Q(t)) + V \mathrm{Reg}'(t) \le B + \Delta_1(t) + \sum_{n \in \mathcal{N}} Q_n(t) \big( \hat{C}_n(X^*_n(t)) - b_n \big). \qquad (54)

Taking conditional expectations on both sides of the above inequality, we have

\mathbb{E}\big[ L(Q(t+1)) - L(Q(t)) \,\big|\, Q(t) \big] + V \mathbb{E}\big[ \mathrm{Reg}'(t) \,\big|\, Q(t) \big] \le B + \mathbb{E}\big[ \Delta_1(t) \,\big|\, Q(t) \big] + \mathbb{E}\Big[ \sum_{n \in \mathcal{N}} Q_n(t) \big( \hat{C}_n(X^*_n(t)) - b_n \big) \,\Big|\, Q(t) \Big] \qquad (55)

= B + \mathbb{E}\big[ \Delta_1(t) \,\big|\, Q(t) \big] + \sum_{n \in \mathcal{N}} Q_n(t) \big( \mathbb{E}[\hat{C}_n(X^*_n(t))] - b_n \big). \qquad (56)

The last equality holds because \hat{C}_n(X^*_n(t)) is independent of Q(t). By the inequalities in (47), it follows that:

\mathbb{E}\big[ L(Q(t+1)) - L(Q(t)) \,\big|\, Q(t) \big] + V \mathbb{E}\big[ \mathrm{Reg}'(t) \,\big|\, Q(t) \big] \le B + \mathbb{E}\big[ \Delta_1(t) \,\big|\, Q(t) \big]. \qquad (57)

Taking expectations on both sides of the above inequality, we have

\mathbb{E}\big[ L(Q(t+1)) - L(Q(t)) \big] + V \mathbb{E}\big[ \mathrm{Reg}'(t) \big] \le B + \mathbb{E}[\Delta_1(t)]. \qquad (58)

Summing the above inequality over time slots \{0, 1, \ldots, T-1\} and dividing both sides by TV, we have

\frac{\mathbb{E}[L(Q(T))]}{TV} - \frac{\mathbb{E}[L(Q(0))]}{TV} + \frac{1}{T} \sum_{t=0}^{T-1} \mathbb{E}\big[ \mathrm{Reg}'(t) \big] \le \frac{B}{V} + \frac{1}{TV} \sum_{t=0}^{T-1} \mathbb{E}[\Delta_1(t)]. \qquad (59)

Since L(Q(0)) = 0 and L(Q(T)) \ge 0, it follows that:

\mathrm{Reg}(T) = \frac{1}{T} \sum_{t=0}^{T-1} \mathbb{E}\big[ \mathrm{Reg}'(t) \big] \le \frac{B}{V} + \frac{1}{TV} \sum_{t=0}^{T-1} \mathbb{E}[\Delta_1(t)]. \qquad (60)

Next, we bound \Delta_1(t) to obtain the upper bound of the regret \mathrm{Reg}(T).

A. Bounding \Delta_1(t)

Consider a cache placement scheme that makes a placement decision during each time slot t, denoted by the vector X'(t) \triangleq (X'_1(t), X'_2(t), \ldots, X'_N(t)), with each entry X'_n(t) as the optimal solution of the following problem:

maximize_{X_n(t)} \sum_{f \in \mathcal{F}} w_{n,f}(t) X_{n,f}(t)
subject to \sum_{f \in \mathcal{F}} L_f X_{n,f}(t) \le M_n
X_{n,f}(t) \in \{0, 1\} \quad \forall f \in \mathcal{F}. \qquad (61)

Since X^*_n(t) is a feasible solution of problem (61), we have

\sum_{f \in \mathcal{F}} w_{n,f}(t) X'_{n,f}(t) \ge \sum_{f \in \mathcal{F}} w_{n,f}(t) X^*_{n,f}(t). \qquad (62)

It follows that:

\Delta_1(t) = \sum_{n \in \mathcal{N}} \sum_{f \in \mathcal{F}} w_{n,f}(t) \big( X^*_{n,f}(t) - X^c_{n,f}(t) \big) \le \sum_{n \in \mathcal{N}} \sum_{f \in \mathcal{F}} w_{n,f}(t) \big( X'_{n,f}(t) - X^c_{n,f}(t) \big) \le \sum_{n \in \mathcal{N}} \sum_{f \in \mathcal{F}} w_{n,f}(t) \big( X'_{n,f}(t) - X^c_{n,f}(t) \big) + \sum_{n \in \mathcal{N}} \sum_{f \in \mathcal{F}} \tilde{w}_{n,f}(t) \big( X^c_{n,f}(t) - X'_{n,f}(t) \big). \qquad (63)
The last inequality holds since X^c_n(t) is the optimal solution of problem (14) while X'_n(t) is only a feasible solution. Rearranging the right-hand side of (63), we obtain

\Delta_1(t) \le \sum_{n \in \mathcal{N}} \sum_{f \in \mathcal{F}} \big( \tilde{w}_{n,f}(t) - w_{n,f}(t) \big) X^c_{n,f}(t) + \sum_{n \in \mathcal{N}} \sum_{f \in \mathcal{F}} \big( w_{n,f}(t) - \tilde{w}_{n,f}(t) \big) X'_{n,f}(t). \qquad (64)

By (35) and (51), we have

\tilde{w}_{n,f}(t) - w_{n,f}(t) = L_f \big( V \tilde{d}_{n,f}(t) - \alpha Q_n(t) \big) - L_f \big( V d_{n,f} - \alpha Q_n(t) \big) = V L_f \big( \tilde{d}_{n,f}(t) - d_{n,f} \big). \qquad (65)

Substituting (65) into (64), we obtain

\Delta_1(t) \le \sum_{n \in \mathcal{N}} \sum_{f \in \mathcal{F}} V L_f \big( \tilde{d}_{n,f}(t) - d_{n,f} \big) X^c_{n,f}(t) + \sum_{n \in \mathcal{N}} \sum_{f \in \mathcal{F}} V L_f \big( d_{n,f} - \tilde{d}_{n,f}(t) \big) X'_{n,f}(t). \qquad (66)

Next, we define

\Delta_2(t) \triangleq \sum_{n \in \mathcal{N}} \sum_{f \in \mathcal{F}} L_f \big( \tilde{d}_{n,f}(t) - d_{n,f} \big) X^c_{n,f}(t) \qquad (67)

and

\Delta_3(t) \triangleq \sum_{n \in \mathcal{N}} \sum_{f \in \mathcal{F}} L_f \big( d_{n,f} - \tilde{d}_{n,f}(t) \big) X'_{n,f}(t). \qquad (68)

Then, the upper bound of \Delta_1(t) in (66) can be rewritten as

\Delta_1(t) \le V \big( \Delta_2(t) + \Delta_3(t) \big). \qquad (69)

In the following sections, we obtain the upper bounds of \Delta_2(t) and \Delta_3(t), respectively, to bound \Delta_1(t).

B. Bounding \Delta_2(t)

To derive the upper bound of \Delta_2(t), we define the event G_{n,f}(t) \triangleq \{ \tilde{d}_{n,f}(t) \ge d_{n,f} \} for each n \in \mathcal{N} and f \in \mathcal{F}. Then, we have

\Delta_2(t) = \sum_{n \in \mathcal{N}} \sum_{f \in \mathcal{F}} L_f \big( \tilde{d}_{n,f}(t) - d_{n,f} \big) X^c_{n,f}(t) \big( \mathbf{1}\{G_{n,f}(t)\} + \mathbf{1}\{G^c_{n,f}(t)\} \big) = \sum_{n \in \mathcal{N}} \sum_{f \in \mathcal{F}} L_f \big( \tilde{d}_{n,f}(t) - d_{n,f} \big) X^c_{n,f}(t) \mathbf{1}\{G_{n,f}(t)\} + \sum_{n \in \mathcal{N}} \sum_{f \in \mathcal{F}} L_f \big( \tilde{d}_{n,f}(t) - d_{n,f} \big) X^c_{n,f}(t) \mathbf{1}\{G^c_{n,f}(t)\} \le \sum_{n \in \mathcal{N}} \sum_{f \in \mathcal{F}} L_f \big( \tilde{d}_{n,f}(t) - d_{n,f} \big) X^c_{n,f}(t) \mathbf{1}\{G_{n,f}(t)\}. \qquad (70)

The last inequality holds since, when event G^c_{n,f}(t) occurs, we have \tilde{d}_{n,f}(t) < d_{n,f} and thus \big( \tilde{d}_{n,f}(t) - d_{n,f} \big) \mathbf{1}\{G^c_{n,f}(t)\} < 0. Next, we define

\phi_{2,n,f}(t) \triangleq \big( \tilde{d}_{n,f}(t) - d_{n,f} \big) X^c_{n,f}(t) \mathbf{1}\{G_{n,f}(t)\}. \qquad (71)

Then, we rewrite the upper bound of \Delta_2(t) in (70) as

\Delta_2(t) \le \sum_{n \in \mathcal{N}} \sum_{f \in \mathcal{F}} L_f \phi_{2,n,f}(t). \qquad (72)

Let t^{(1)}_{n,f} be the index of the first time slot in which file f is cached on EFS n. We define the event U_{n,f}(t) \triangleq \big\{ \bar{d}_{n,f}(t) - d_{n,f} > K_n \sqrt{3 \log t / \big( 2 (h_{n,f}(t) + H_{n,f}) \big)} \big\} for each n \in \mathcal{N} and f \in \mathcal{F}. Summing \phi_{2,n,f}(t) over time slots \{0, 1, \ldots, T-1\}, it turns out that

\sum_{t=0}^{T-1} \phi_{2,n,f}(t) \le K_n + \sum_{t=t^{(1)}_{n,f}+1}^{T-1} \big( \tilde{d}_{n,f}(t) - d_{n,f} \big) X^c_{n,f}(t) \mathbf{1}\{G_{n,f}(t)\} \big( \mathbf{1}\{U_{n,f}(t)\} + \mathbf{1}\{U^c_{n,f}(t)\} \big) = K_n + \sum_{t=t^{(1)}_{n,f}+1}^{T-1} \big( \tilde{d}_{n,f}(t) - d_{n,f} \big) X^c_{n,f}(t) \mathbf{1}\{G_{n,f}(t) \cap U_{n,f}(t)\} + \sum_{t=t^{(1)}_{n,f}+1}^{T-1} \big( \tilde{d}_{n,f}(t) - d_{n,f} \big) X^c_{n,f}(t) \mathbf{1}\{G_{n,f}(t) \cap U^c_{n,f}(t)\}, \qquad (73)

where the slots t \le t^{(1)}_{n,f} contribute at most K_n, since X^c_{n,f}(t) = 0 before t^{(1)}_{n,f} and \tilde{d}_{n,f}(t) - d_{n,f} \le K_n. Next, we define

\phi^{(1)}_{2,n,f}(t) \triangleq \big( \tilde{d}_{n,f}(t) - d_{n,f} \big) X^c_{n,f}(t) \mathbf{1}\{G_{n,f}(t) \cap U_{n,f}(t)\} \qquad (74)

and

\phi^{(2)}_{2,n,f}(t) \triangleq \big( \tilde{d}_{n,f}(t) - d_{n,f} \big) X^c_{n,f}(t) \mathbf{1}\{G_{n,f}(t) \cap U^c_{n,f}(t)\}. \qquad (75)

Then, we rewrite inequality (73) in the following equivalent form:

\sum_{t=0}^{T-1} \phi_{2,n,f}(t) \le K_n + \sum_{t=t^{(1)}_{n,f}+1}^{T-1} \phi^{(1)}_{2,n,f}(t) + \sum_{t=t^{(1)}_{n,f}+1}^{T-1} \phi^{(2)}_{2,n,f}(t). \qquad (76)

By (76), to bound \sum_t \phi_{2,n,f}(t), we switch to bounding \sum_t \phi^{(1)}_{2,n,f}(t) and \sum_t \phi^{(2)}_{2,n,f}(t). In the following, we derive upper bounds for these two terms, respectively.

First, we bound \sum_{t=t^{(1)}_{n,f}+1}^{T-1} \phi^{(1)}_{2,n,f}(t). According to (74), we have

\sum_{t=t^{(1)}_{n,f}+1}^{T-1} \phi^{(1)}_{2,n,f}(t) = \sum_{t=t^{(1)}_{n,f}+1}^{T-1} \big( \tilde{d}_{n,f}(t) - d_{n,f} \big) X^c_{n,f}(t) \mathbf{1}\{G_{n,f}(t) \cap U_{n,f}(t)\}. \qquad (77)
When event U_{n,f}(t) = \big\{ \bar{d}_{n,f}(t) - d_{n,f} > K_n \sqrt{3 \log t / ( 2 (h_{n,f}(t) + H_{n,f}) )} \big\} occurs, we consider the following two cases.

1) If \tilde{d}_{n,f}(t) = \min\big\{ \bar{d}_{n,f}(t) + K_n \sqrt{3 \log t / ( 2 (h_{n,f}(t) + H_{n,f}) )}, K_n \big\} = K_n, then \tilde{d}_{n,f}(t) \ge d_{n,f}, i.e., event G_{n,f}(t) occurs.

2) If \tilde{d}_{n,f}(t) = \min\big\{ \bar{d}_{n,f}(t) + K_n \sqrt{3 \log t / ( 2 (h_{n,f}(t) + H_{n,f}) )}, K_n \big\} = \bar{d}_{n,f}(t) + K_n \sqrt{3 \log t / ( 2 (h_{n,f}(t) + H_{n,f}) )}, then event G_{n,f}(t) still occurs, since \tilde{d}_{n,f}(t) > d_{n,f} + 2 K_n \sqrt{3 \log t / ( 2 (h_{n,f}(t) + H_{n,f}) )}.

Therefore, we have U_{n,f}(t) \subset G_{n,f}(t), or equivalently, \mathbf{1}\{G_{n,f}(t) \cap U_{n,f}(t)\} = \mathbf{1}\{U_{n,f}(t)\}. It follows that:

\sum_{t=t^{(1)}_{n,f}+1}^{T-1} \phi^{(1)}_{2,n,f}(t) = \sum_{t=t^{(1)}_{n,f}+1}^{T-1} \big( \tilde{d}_{n,f}(t) - d_{n,f} \big) X^c_{n,f}(t) \mathbf{1}\{U_{n,f}(t)\}. \qquad (78)

Since \tilde{d}_{n,f}(t), d_{n,f} \in [0, K_n], we have \tilde{d}_{n,f}(t) - d_{n,f} \le K_n. Then, we have

\sum_{t=t^{(1)}_{n,f}+1}^{T-1} \phi^{(1)}_{2,n,f}(t) \le \sum_{t=t^{(1)}_{n,f}+1}^{T-1} K_n X^c_{n,f}(t) \mathbf{1}\{U_{n,f}(t)\}. \qquad (79)

Taking expectations on both sides of (79), we have

\mathbb{E}\Big[ \sum_{t=t^{(1)}_{n,f}+1}^{T-1} \phi^{(1)}_{2,n,f}(t) \Big] \le K_n \sum_{t=t^{(1)}_{n,f}+1}^{T-1} \mathbb{E}\big[ X^c_{n,f}(t) \big] \Pr\big\{ U_{n,f}(t) \big\} = K_n \sum_{t=t^{(1)}_{n,f}+1}^{T-1} \mathbb{E}\big[ X^c_{n,f}(t) \big] \Pr\Big\{ \bar{d}_{n,f}(t) - d_{n,f} > K_n \sqrt{\frac{3 \log t}{2 (h_{n,f}(t) + H_{n,f})}} \Big\}. \qquad (80)

Using the Chernoff-Hoeffding bound [50], we have

\Pr\Big\{ \bar{d}_{n,f}(t) - d_{n,f} > K_n \sqrt{\frac{3 \log t}{2 (h_{n,f}(t) + H_{n,f})}} \Big\} \le \exp\Big( - \frac{2 (h_{n,f}(t) + H_{n,f})^2}{(h_{n,f}(t) + H_{n,f}) K_n^2} \cdot K_n^2 \cdot \frac{3 \log t}{2 (h_{n,f}(t) + H_{n,f})} \Big) = \exp(-3 \log t) = t^{-3}. \qquad (81)

Then, it follows that:

\sum_{n \in \mathcal{N}} \sum_{f \in \mathcal{F}} L_f \mathbb{E}\Big[ \sum_{t=t^{(1)}_{n,f}+1}^{T-1} \phi^{(1)}_{2,n,f}(t) \Big] \le \sum_{t=1}^{\infty} \mathbb{E}\Big[ \sum_{n \in \mathcal{N}} K_n \sum_{f \in \mathcal{F}} L_f X^c_{n,f}(t) \Big] t^{-3} \le \sum_{t=1}^{\infty} \sum_{n \in \mathcal{N}} K_n M_n t^{-3} = \sum_{n \in \mathcal{N}} K_n M_n \Big( 1 + \sum_{t=2}^{\infty} t^{-3} \Big) \le \sum_{n \in \mathcal{N}} K_n M_n \Big( 1 + \int_1^{\infty} t^{-3} \,\mathrm{d}t \Big) = \frac{3}{2} \sum_{n \in \mathcal{N}} K_n M_n. \qquad (82)

Next, we consider the upper bound of \sum_{t=t^{(1)}_{n,f}+1}^{T-1} \phi^{(2)}_{2,n,f}(t). According to (75), we have

\sum_{t=t^{(1)}_{n,f}+1}^{T-1} \phi^{(2)}_{2,n,f}(t) = \sum_{t=t^{(1)}_{n,f}+1}^{T-1} \big( \tilde{d}_{n,f}(t) - d_{n,f} \big) X^c_{n,f}(t) \mathbf{1}\{G_{n,f}(t) \cap U^c_{n,f}(t)\}. \qquad (83)

When event U^c_{n,f}(t) occurs, we have

\tilde{d}_{n,f}(t) = \min\Big\{ \bar{d}_{n,f}(t) + K_n \sqrt{\frac{3 \log t}{2 (h_{n,f}(t) + H_{n,f})}}, K_n \Big\} \le \bar{d}_{n,f}(t) + K_n \sqrt{\frac{3 \log t}{2 (h_{n,f}(t) + H_{n,f})}}, \qquad (84)

and thus

\tilde{d}_{n,f}(t) - d_{n,f} = \big( \tilde{d}_{n,f}(t) - \bar{d}_{n,f}(t) \big) + \big( \bar{d}_{n,f}(t) - d_{n,f} \big) \le 2 K_n \sqrt{\frac{3 \log t}{2 (h_{n,f}(t) + H_{n,f})}}. \qquad (85)

Then, by (85) and X^c_{n,f}(t) \le 1, we have

\sum_{t=t^{(1)}_{n,f}+1}^{T-1} \phi^{(2)}_{2,n,f}(t) \le \sum_{t=t^{(1)}_{n,f}+1}^{T-1} 2 K_n \sqrt{\frac{3 \log t}{2 (h_{n,f}(t) + H_{n,f})}} \, X^c_{n,f}(t) \mathbf{1}\{G_{n,f}(t) \cap U^c_{n,f}(t)\} \le \sum_{t=t^{(1)}_{n,f}+1}^{T-1} 2 K_n \sqrt{\frac{3 \log t}{2 (h_{n,f}(t) + H_{n,f})}} \, X^c_{n,f}(t) \le K_n \sqrt{6 \log T} \sum_{t=t^{(1)}_{n,f}+1}^{T-1} \frac{X^c_{n,f}(t)}{\sqrt{h_{n,f}(t) + H_{n,f}}}. \qquad (86)

Since h_{n,f}(t) \le T, we have

\frac{1}{h_{n,f}(t) + H_{n,f}} = \frac{h_{n,f}(t)}{h_{n,f}(t) + H_{n,f}} \cdot \frac{1}{h_{n,f}(t)} \le \frac{T}{T + H_{n,f}} \cdot \frac{1}{h_{n,f}(t)}. \qquad (87)

Then, it follows that:

\sum_{t=t^{(1)}_{n,f}+1}^{T-1} \phi^{(2)}_{2,n,f}(t) \le K_n \sqrt{\frac{6 T \log T}{T + H_{n,f}}} \sum_{t=t^{(1)}_{n,f}+1}^{T-1} \frac{X^c_{n,f}(t)}{\sqrt{h_{n,f}(t)}}. \qquad (88)
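The truncated estimate used throughout this appendix, \tilde{d}_{n,f}(t) = \min\{ \bar{d}_{n,f}(t) + K_n \sqrt{3 \log t / (2 (h_{n,f}(t) + H_{n,f}))}, K_n \}, can be sketched as below. The function name is ours, and the use of the natural logarithm is an assumption (the text only writes "log"):

```python
import math

def hucb1_index(d_bar, h, H, t, K):
    """History-aware UCB estimate per the min{d_bar + bonus, K} rule above.

    d_bar : empirical mean demand over the h + H observations
    h, H  : numbers of online and offline (historical) observations
    t     : current time slot (t >= 2, so that log t > 0)
    K     : per-slot demand cap, so estimates stay in [0, K]
    """
    bonus = K * math.sqrt(3.0 * math.log(t) / (2.0 * (h + H)))
    return min(d_bar + bonus, K)
```

More offline samples H shrink the exploration bonus for the same slot index t, which is how the historical information propagates into the T/(T + H_min) factor appearing in (87)-(88).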
Let t^{(i)}_{n,f} be the i-th time slot in which file f is cached on EFS n. Then, t^{(h_{n,f}(T))}_{n,f} is the last time slot in which file f is cached before time slot T. Accordingly, we have

\sum_{t=t^{(1)}_{n,f}+1}^{T-1} \frac{X^c_{n,f}(t)}{\sqrt{h_{n,f}(t)}} = \sum_{i=2}^{h_{n,f}(T)} \frac{1}{\sqrt{h_{n,f}(t^{(i)}_{n,f})}} = \sum_{i=2}^{h_{n,f}(T)} \frac{1}{\sqrt{i-1}} = \sum_{i=1}^{h_{n,f}(T)-1} \frac{1}{\sqrt{i}} \le 1 + \int_1^{h_{n,f}(T)} \frac{\mathrm{d}i}{\sqrt{i}} = 2 \sqrt{h_{n,f}(T)} - 1 \le 2 \sqrt{h_{n,f}(T)}. \qquad (89)

C. Bounding \Delta_3(t)

Recall by (68) and G_{n,f}(t) \triangleq \{ \tilde{d}_{n,f}(t) \ge d_{n,f} \} that

\Delta_3(t) = \sum_{n \in \mathcal{N}} \sum_{f \in \mathcal{F}} L_f \big( d_{n,f} - \tilde{d}_{n,f}(t) \big) X'_{n,f}(t) = \sum_{n \in \mathcal{N}} \sum_{f \in \mathcal{F}} L_f \big( d_{n,f} - \tilde{d}_{n,f}(t) \big) X'_{n,f}(t) \big( \mathbf{1}\{G_{n,f}(t)\} + \mathbf{1}\{G^c_{n,f}(t)\} \big) \le \sum_{n \in \mathcal{N}} \sum_{f \in \mathcal{F}} L_f \big( d_{n,f} - \tilde{d}_{n,f}(t) \big) X'_{n,f}(t) \mathbf{1}\{G^c_{n,f}(t)\}. \qquad (94)

Then, we define

\phi_{3,n,f}(t) \triangleq \big( d_{n,f} - \tilde{d}_{n,f}(t) \big) X'_{n,f}(t) \mathbf{1}\{G^c_{n,f}(t)\}. \qquad (95)

It follows that:
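As a quick numerical sanity check (ours, not the paper's), the integral-comparison bound used in (89), \sum_{i=1}^{m} 1/\sqrt{i} \le 2\sqrt{m} - 1, can be verified directly:

```python
import math

# Verify sum_{i=1}^{m} 1/sqrt(i) <= 2*sqrt(m) - 1 for a range of m,
# using a running partial sum.
partial = 0.0
for m in range(1, 5001):
    partial += 1.0 / math.sqrt(m)
    assert partial <= 2.0 * math.sqrt(m) - 1.0 + 1e-9

print("bound holds for all m <= 5000")
```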
Based on the above inequality, we have

\sum_{t=0}^{T-1} \sum_{f \in \mathcal{F}} L_f \mathbb{E}\big[ \phi_{3,n,f}(t) \big] \le K_n \sum_{t=t^{(1)}_{n,f}+1}^{T-1} \mathbb{E}\Big[ \sum_{f \in \mathcal{F}} L_f X'_{n,f}(t) \Big] t^{-3} \le K_n \sum_{t=t^{(1)}_{n,f}+1}^{T-1} M_n t^{-3}, \qquad (100)

where the last inequality holds because \sum_{f \in \mathcal{F}} L_f X'_{n,f}(t) \le M_n. Then, by (100), it follows that:

\sum_{t=0}^{T-1} \sum_{f \in \mathcal{F}} L_f \mathbb{E}\big[ \phi_{3,n,f}(t) \big] \le K_n M_n \sum_{t=1}^{\infty} t^{-3} = K_n M_n \Big( 1 + \sum_{t=2}^{\infty} t^{-3} \Big) \le K_n M_n \Big( 1 + \int_1^{\infty} t^{-3} \,\mathrm{d}t \Big) = \frac{3}{2} K_n M_n. \qquad (101)

By (96) and (101), we have

\sum_{t=0}^{T-1} \mathbb{E}\big[ \Delta_3(t) \big] \le \sum_{n \in \mathcal{N}} \sum_{t=0}^{T-1} \sum_{f \in \mathcal{F}} L_f \mathbb{E}\big[ \phi_{3,n,f}(t) \big] \le \frac{3}{2} \sum_{n \in \mathcal{N}} K_n M_n. \qquad (102)

Combining (69), (93), and (102), we obtain

\sum_{t=0}^{T-1} \mathbb{E}\big[ \Delta_1(t) \big] \le V \Big( \sum_{t=0}^{T-1} \mathbb{E}\big[ \Delta_2(t) \big] + \sum_{t=0}^{T-1} \mathbb{E}\big[ \Delta_3(t) \big] \Big) \le 4V \sum_{n \in \mathcal{N}} K_n M_n + 2V \Big( \sum_{n \in \mathcal{N}} K_n M_n \sum_{f \in \mathcal{F}} L_f \Big) \sqrt{\frac{6 T^2 \log T}{T + H_{\min}}}. \qquad (103)

Substituting (103) into (60), we obtain a regret bound as follows:

\mathrm{Reg}(T) \le \frac{B}{V} + \frac{4 \sum_{n \in \mathcal{N}} K_n M_n}{T} + 2 \Big( \sum_{n \in \mathcal{N}} K_n M_n \sum_{f \in \mathcal{F}} L_f \Big) \sqrt{\frac{6 \log T}{T + H_{\min}}}, \qquad (104)

where B = \frac{1}{2} \sum_{n \in \mathcal{N}} (b_n^2 + \alpha^2 M_n^2) and H_{\min} = \min_{n,f} H_{n,f}.

REFERENCES

[1] X. Gao, X. Huang, Y. Tang, Z. Shao, and Y. Yang, "Proactive cache placement with bandit learning in fog-assisted IoT systems," in Proc. IEEE ICC, 2020, pp. 1-6.
[2] E. Bastug, M. Bennis, and M. Debbah, "Living on the edge: The role of proactive caching in 5G wireless networks," IEEE Commun. Mag., vol. 52, no. 8, pp. 82-89, Aug. 2014.
[3] L. Li, G. Zhao, and R. S. Blum, "A survey of caching techniques in cellular networks: Research issues and challenges in content placement and delivery strategies," IEEE Commun. Surveys Tuts., vol. 20, no. 3, pp. 1710-1732, 3rd Quart., 2018.
[4] Y. Jiang, M. Ma, M. Bennis, F.-C. Zheng, and X. You, "User preference learning-based edge caching for fog radio access network," IEEE Trans. Commun., vol. 67, no. 2, pp. 1268-1283, Feb. 2018.
[5] S. Zhao, Z. Shao, H. Qian, and Y. Yang, "Online user-AP association with predictive scheduling in wireless caching networks," in Proc. IEEE GLOBECOM, 2017, pp. 1-7.
[6] Y. Jiang, M. Ma, M. Bennis, F. Zheng, and X. You, "A novel caching policy with content popularity prediction and user preference learning in Fog-RAN," in Proc. IEEE GLOBECOM Workshops, 2017, pp. 1-6.
[7] S. Zhao, Y. Yang, Z. Shao, X. Yang, H. Qian, and C.-X. Wang, "FEMOS: Fog-enabled multitier operations scheduling in dynamic wireless networks," IEEE Internet Things J., vol. 5, no. 2, pp. 1169-1183, Apr. 2018.
[8] X. Gao, X. Huang, S. Bian, Z. Shao, and Y. Yang, "PORA: Predictive offloading and resource allocation in dynamic fog computing systems," IEEE Internet Things J., vol. 7, no. 1, pp. 72-87, Jan. 2020.
[9] B. Bharath, K. G. Nagananda, and H. V. Poor, "A learning-based approach to caching in heterogenous small cell networks," IEEE Trans. Commun., vol. 64, no. 4, pp. 1674-1686, Apr. 2016.
[10] H. Pang, L. Gao, and L. Sun, "Joint optimization of data sponsoring and edge caching for mobile video delivery," in Proc. IEEE GLOBECOM, 2016, pp. 1-7.
[11] F. Li, J. Liu, and B. Ji, "Combinatorial sleeping bandits with fairness constraints," in Proc. IEEE INFOCOM, 2019, pp. 1702-1710.
[12] P. Shivaswamy and T. Joachims, "Multi-armed bandit problems with history," in Proc. AISTATS, 2012, pp. 1046-1054.
[13] M. J. Neely, "Stochastic network optimization with application to communication and queueing systems," Synth. Lectures Commun. Netw., vol. 3, no. 1, pp. 1-211, 2010.
[14] J. Kwak, Y. Kim, L. B. Le, and S. Chong, "Hybrid content caching in 5G wireless networks: Cloud versus edge caching," IEEE Trans. Wireless Commun., vol. 17, no. 5, pp. 3030-3045, May 2018.
[15] Y. Wang, W. Wang, Y. Cui, K. G. Shin, and Z. Zhang, "Distributed packet forwarding and caching based on stochastic network utility maximization," IEEE/ACM Trans. Netw., vol. 26, no. 3, pp. 1264-1277, Jun. 2018.
[16] J. Xu, L. Chen, and P. Zhou, "Joint service caching and task offloading for mobile edge computing in dense networks," in Proc. IEEE INFOCOM, 2018, pp. 207-215.
[17] P. Blasco and D. Gündüz, "Learning-based optimization of cache content in a small cell base station," in Proc. IEEE ICC, 2014, pp. 1897-1903.
[18] P. Blasco and D. Gündüz, "Multi-armed bandit optimization of cache content in wireless infostation networks," in Proc. IEEE ISIT, 2014, pp. 51-55.
[19] S. Müller, O. Atan, M. van der Schaar, and A. Klein, "Smart caching in wireless small cell networks via contextual multi-armed bandits," in Proc. IEEE ICC, 2016, pp. 1-7.
[20] X. Zhang, G. Zheng, S. Lambotharan, M. R. Nakhai, and K.-K. Wong, "A learning approach to edge caching with dynamic content library in wireless networks," in Proc. IEEE GLOBECOM, 2019, pp. 1-6.
[21] J. Song, M. Sheng, T. Q. S. Quek, C. Xu, and X. Wang, "Learning-based content caching and sharing for wireless networks," IEEE Trans. Commun., vol. 65, no. 10, pp. 4309-4324, Oct. 2017.
[22] X. Xu, M. Tao, and C. Shen, "Collaborative multi-agent multi-armed bandit learning for small-cell caching," IEEE Trans. Wireless Commun., vol. 19, no. 4, pp. 2570-2585, Apr. 2020.
[23] S. Ajmal, M. B. Muzammil, A. Jamil, S. M. Abbas, U. Iqbal, and P. Touseef, "Survey on cache schemes in heterogeneous networks using 5G Internet of Things," in Proc. ACM ICFNDS, 2019, pp. 1-8.
[24] H. Pang, J. Liu, X. Fan, and L. Sun, "Toward smart and cooperative edge caching for 5G networks: A deep learning based approach," in Proc. IEEE/ACM IWQoS, 2018, pp. 1-6.
[25] M. Chen, W. Saad, and C. Yin, "Echo-liquid state deep learning for 360° content transmission and caching in wireless VR networks with cellular-connected UAVs," IEEE Trans. Commun., vol. 67, no. 9, pp. 6386-6400, Sep. 2019.
[26] A. Ndikumana, N. H. Tran, D. H. Kim, K. T. Kim, and C. S. Hong, "Deep learning based caching for self-driving cars in multi-access edge computing," IEEE Trans. Intell. Transp. Syst., early access, Mar. 4, 2020, doi: 10.1109/TITS.2020.2976572.
[27] Z. Liu, H. Song, and D. Pan, "Distributed video content caching policy with deep learning approaches for D2D communication," IEEE Trans. Veh. Technol., vol. 69, no. 12, pp. 15644-15655, Dec. 2020.
[28] E. Baştuğ, M. Bennis, and M. Debbah, "A transfer learning approach for cache-enabled wireless networks," in Proc. IEEE WiOpt, 2015, pp. 161-166.
[29] A. Sengupta, S. Amuru, R. Tandon, R. M. Buehrer, and T. C. Clancy, "Learning distributed caching strategies in small cell networks," in Proc. ISWCS, 2014, pp. 917-921.
[30] A. Sadeghi, F. Sheikholeslami, A. G. Marques, and G. B. Giannakis, "Reinforcement learning for adaptive caching with dynamic storage pricing," IEEE J. Sel. Areas Commun., vol. 37, no. 10, pp. 2267-2281, Oct. 2019.
[31] Z. Zhu, T. Liu, S. Jin, and X. Luo, "Learn and pick right nodes to offload," in Proc. IEEE GLOBECOM, 2018, pp. 1-6.
[32] J. Yao and N. Ansari, "Energy-aware task allocation for mobile IoT by online reinforcement learning," in Proc. IEEE ICC, 2019, pp. 1-6.
[33] A. Mukherjee, S. Misra, V. S. P. Chandra, and M. S. Obaidat, "Resource-optimized multiarmed bandit-based offload path selection in edge UAV swarms," IEEE Internet Things J., vol. 6, no. 3, pp. 4889-4896, Jun. 2018.
[34] D. López-Pérez, A. Valcarce, G. De La Roche, and J. Zhang, "OFDMA femtocells: A roadmap on interference avoidance," IEEE Commun. Mag., vol. 47, no. 9, pp. 41-48, Sep. 2009.
[35] O. Somekh, O. Simeone, Y. Bar-Ness, A. M. Haimovich, and S. Shamai, "Cooperative multicell zero-forcing beamforming in cellular downlink channels," IEEE Trans. Inf. Theory, vol. 55, no. 7, pp. 3206-3219, Jul. 2009.
[36] S. Verdú, Multiuser Detection. Cambridge, U.K.: Cambridge Univ. Press, 1998.
[37] W. Chen, Y. Wang, and Y. Yuan, "Combinatorial multi-armed bandit: General framework and applications," in Proc. ICML, 2013, pp. 151-159.
[38] A. Slivkins, "Introduction to multi-armed bandits," Found. Trends Mach. Learn., vol. 12, nos. 1-2, pp. 1-286, 2019.
[39] S. Martello, D. Pisinger, and P. Toth, "Dynamic programming and strong bounds for the 0-1 knapsack problem," Manag. Sci., vol. 45, no. 3, pp. 414-424, 1999.
[40] S. Martello, D. Pisinger, and P. Toth, "New trends in exact algorithms for the 0-1 knapsack problem," Eur. J. Oper. Res., vol. 123, no. 2, pp. 325-332, 2000.
[41] T. H. Cormen, C. E. Leiserson, R. L. Rivest, and C. Stein, Introduction to Algorithms. Cambridge, MA, USA: MIT Press, 2009.
[42] S. Bian, X. Huang, Z. Shao, and Y. Yang, "Neural task scheduling with reinforcement learning for fog computing systems," in Proc. IEEE GLOBECOM, 2019, pp. 1-6.
[43] J. Pei, P. Hong, M. Pan, J. Liu, and J. Zhou, "Optimal VNF placement via deep reinforcement learning in SDN/NFV-enabled networks," IEEE J. Sel. Areas Commun., vol. 38, no. 2, pp. 263-278, Feb. 2020.
[44] B. Irie and S. Miyake, "Capabilities of three-layered perceptrons," in Proc. ICNN, 1988, pp. 641-648.
[45] R. S. Sutton, D. A. McAllester, S. P. Singh, and Y. Mansour, "Policy gradient methods for reinforcement learning with function approximation," in Proc. NeurIPS, 2000, pp. 1057-1063.
[46] P. Auer, N. Cesa-Bianchi, and P. Fischer, "Finite-time analysis of the multiarmed bandit problem," Mach. Learn., vol. 47, nos. 2-3, pp. 235-256, 2002.
[47] D. Lee et al., "LRFU: A spectrum of policies that subsumes the least recently used and least frequently used policies," IEEE Trans. Comput., vol. 50, no. 12, pp. 1352-1361, Dec. 2001.
[48] J. Dilley and M. Arlitt, "Improving proxy cache performance: Analysis of three replacement policies," IEEE Internet Comput., vol. 3, no. 6, pp. 44-50, Nov./Dec. 1999.
[49] G. Hinton, N. Srivastava, and K. Swersky, "Neural networks for machine learning, Lecture 6a: Overview of mini-batch gradient descent," Lecture Notes CSC321, Univ. Toronto, Toronto, ON, Canada, 2014. [Online]. Available: http://www.cs.toronto.edu/~tijmen/csc321/slides/lecture_slides_lec6.pdf
[50] W. Hoeffding, "Probability inequalities for sums of bounded random variables," J. Amer. Stat. Assoc., vol. 58, no. 301, pp. 409-426, 1963.

Xin Gao (Graduate Student Member, IEEE) received the B.Eng. degree from the School of Electronic Information and Communications, Huazhong University of Science and Technology, Wuhan, China, in 2015. She is currently pursuing the Ph.D. degree with the School of Information Science and Technology, ShanghaiTech University, Shanghai, China; the Shanghai Institute of Microsystem and Information Technology, Chinese Academy of Sciences, Shanghai, China; and the University of Chinese Academy of Sciences, Beijing, China. Her current research interests include bandit and reinforcement learning, edge computing, fog computing, and Internet of Things.

Xi Huang (Member, IEEE) received the B.Eng. degree from Nanjing University, Nanjing, China, in 2014. He is currently pursuing the Ph.D. degree with ShanghaiTech University, Shanghai, China. Since September 2014, he has been with the School of Information Science and Technology, ShanghaiTech University. He was a visiting student with the Department of Electrical Engineering and Computer Sciences, University of California at Berkeley, Berkeley, CA, USA, from February 2017 to July 2017. His current research interests include the optimization and the design for intelligent networks.

Yinxu Tang received the B.Eng. degree from ShanghaiTech University, Shanghai, China, where she is currently pursuing the master's degree with the School of Information Science and Technology. Her current research interests include bandit and reinforcement learning, resource management, and intelligent computing.

Ziyu Shao (Senior Member, IEEE) received the B.S. and M.Eng. degrees from Peking University, Beijing, China, in 2001 and 2004, respectively, and the Ph.D. degree from the Chinese University of Hong Kong, Hong Kong, in 2010. He worked as a Postdoctoral Researcher with the Chinese University of Hong Kong from 2011 to 2013, was a Visiting Postdoctoral Researcher with the EE Department, Princeton University, Princeton, NJ, USA, in 2012, and was a Visiting Professor with the EECS Department, University of California at Berkeley, Berkeley, CA, USA, in 2017. He is currently an Associate Professor with the School of Information Science and Technology, ShanghaiTech University, Shanghai, China. His current research interests center on intelligent networks, including AI for networks and networks for AI.

Yang Yang (Fellow, IEEE) received the B.S. and M.S. degrees in radio engineering from Southeast University, Nanjing, China, in 1996 and 1999, respectively, and the Ph.D. degree in information engineering from the Chinese University of Hong Kong, Hong Kong, in 2002. He is currently a Full Professor with the School of Information Science and Technology and the Master of Kedao College, as well as the Director of the Shanghai Institute of Fog Computing Technology, ShanghaiTech University, Shanghai, China. He is also an Adjunct Professor with the Research Center for Network Communication, Peng Cheng Laboratory, Shenzhen, China, as well as a Senior Consultant of Shenzhen Smart City Technology Development Group, Shenzhen. Before joining ShanghaiTech University, he held faculty positions with the Chinese University of Hong Kong; Brunel University, Uxbridge, U.K.; University College London, London, U.K.; and SIMIT, Chinese Academy of Sciences, Beijing, China. He has published more than 300 papers and filed more than 80 technical patents in these research areas. His research interests include fog computing networks, service-oriented collaborative intelligence, wireless sensor networks, IoT applications, and advanced testbeds and experiments. Dr. Yang has been the Chair of the Steering Committee of the Asia-Pacific Conference on Communications since January 2019.