systems, such as chip-level L1, L2 caches leading up to the main memory or storage [3], and sequences of content caches beginning close to a user and leading up to a data center [4]. In the representative scenario where there is just a single caching element in the hierarchy, it would see all requests, as illustrated in Fig. 1 (a). A cache hit would mean that the item can be serviced from the cache, while a miss would mean that the request would be forwarded to the library. Since the cache sees all requests, whether they result in hits or misses, the caching algorithm simply needs to quickly and efficiently determine the top items to cache.

B. Caching to Learn With Partial Observations

In a cache routing approach, requests for content are routed towards caches that are believed to possess the content (or along a default path without any information). Such routing might be done via a name server such as a DNS redirect, and a cache only sees requests that are directed towards it. In turn, this depends on whether the cache has the content in question. Applications of this model include caching at cellular base stations [5], and content dissemination using a Global Name Service [6] (similar to DNS) such as MobilityFirst [7]. The representative single caching element scenario is shown in Fig. 1 (b), and we refer to it as the partial observation regime.

In a typical cache network, requests are seen at multiple caches, but request information is either not aggregated, or only partially aggregated. Specifically, only request summaries might be disseminated over time. The availability of such summaries implies that the content provider often has a good idea of the nature of the arrival distribution, for instance the Zipf parameter that it follows and the timescale at which changes are observed. However, each cache does not see the hits and misses of the other caches in real time, nor does it know the identity of the popular items in advance. Thus, explicit caching actions must be taken in order for a cache to learn what is popular. Hence, we have a regime in which structural information about the distribution could be known, but individual caches are not up to date on each other's hits and misses as they occur, and need to learn popularity as the arrival process changes, corresponding to the partial observation case.

The goal of this work is to conduct a systematic analysis of caching from the perspective of regret, with the idea that a low-regret algorithm implies fast and accurate learning in finite time, and hence should be usable in a setting where the popularity distribution changes with time. Can we design regret-optimal algorithms that apply to each of our learning paradigms?

1) Main Results: In our analytical model, we consider a system in which one request arrives at each discrete time unit, i.e., the total number of requests is the same as the elapsed time T. We begin with the insight that, under the full observation regime, the empirical frequency is a sufficient statistic of all information on the popularity distribution received thus far. Here, there is no exploration problem, and the goal is simply to exploit the observations received by estimating the empirical frequencies. Hence, the appropriate use of this estimate is to choose the top C most frequent items to cache. This approach is identical to LFU, since it evicts the item with the least empirical frequency at each time. Our first result is to show that LFU has an O(1) regret, not only with respect to time T, but also with respect to library size. It can also be shown that the regret of LRU is provably high. Intuitively, LRU does not learn the true popularity of the requests; rather, it keeps track of the recently arrived requests. Thus it suffers a constant regret at each time step, resulting in Ω(T) cumulative regret.

While LFU is known to attain high hit rates, it suffers from the fact that the number of counters is the same as the library size, since every request must be counted. This is clearly prohibitive, and has given rise to approximations such as W-LFU [8], which only keeps counts within a moving window of requests, and TinyLFU [2], which uses a sketch for approximate counting. Our next result is to show that these approximations never entirely eliminate the error in estimating the popularity distribution, leading to the worst possible regret of Ω(T).

We then propose a variant of LFU that we term LFU-Lite, under which we use a moving window of requests to decide whether or not a particular item appears to be popular enough to be counted accurately. Thus, we maintain a counter bank, and only count those content items that meet a threshold frequency in any window of requests thus far. The counter bank size grows in a concave manner with time, and we find the expected size needed to ensure O(1) regret for a target time T. Thus, given a time constant of change in popularity, we can decide on the ideal number of counters.

We next consider the partial observation regime, wherein the cache can only see requests for items currently cached in it. We relate this problem to the classical multi-armed bandit (MAB), under which actions must be taken to learn the value of pulling the different arms. Hence, explicit exploration actions are needed in this regime. We first consider an algorithm that builds up the correct posterior probabilities given the requests seen thus far, and caches the most frequent items in a sample of this posterior distribution. Although its empirical performance is excellent, maintaining the full posterior sampling (FPS) quickly becomes prohibitively difficult.

We then consider an algorithm that simply conducts marginal posterior sampling (MPS) by updating counts only for the items that are in the cache. Here, counts of hits and misses are awarded to the appropriate cached item, but a miss (which manifests itself as no request being made to the cache) is not used to update the posterior distribution of items not in the cache. Clearly, we are not using the information effectively, and this is reflected in the regret scaling as O(log T). This result is similar to earlier work [5].

We then ask whether we can exploit the structure of the problem to do better. In particular, suppose that we know that requests will follow a certain probability distribution (e.g., Zipf), although we do not know the ranking of items (i.e., we do not know which ones are the most popular). We develop a Structured Information (SI) algorithm that uses this information about the distribution to reduce the regret to O(1). We also describe a "Lite" version of the SI algorithm, similar to LFU-Lite, to reduce the number of counters.

We first verify our analytical results via numerical simulations conducted using an IRM model drawn from Zipf distributions with different parameters, library sizes, and cache sizes. We also find that Lite-type schemes appear to empirically perform even better than predicted by the analytical results. We then construct versions of the algorithms that are capable of following a changing popularity distribution by simply "forgetting" counts, which takes the form of periodically halving the counts in the counters. The expectation is that a
low-regret algorithm, augmented with such a forgetting rule with an appropriately chosen periodicity, should be able to track a moving popularity distribution accurately. We conduct trace-based simulations using (non-stationary) data sets obtained from IBM and YouTube, and compare hit performance against the ubiquitous LRU algorithm. We show that the LFU variants outperform LRU, and that incorporating forgetting enhances their hit rates.

Since the amount of change over time in the existing traces is low, we stress test our algorithms by creating a synthetic trace that has higher changes in popularity over time. Again, we show that the versions of our algorithms that incorporate forgetting are able to track such changing distributions, and are still able to outperform LRU, which builds a case for their eventual adoption. We further incorporate an online change detection mechanism into our algorithms to detect changes in popularity on the fly. We show via a synthetic non-stationary trace that the online change detection scheme, when combined with the simple forgetting rule, makes our algorithms robust to changes in popularity under non-stationary traffic.

2) Related Work: Existing analytical studies of caching algorithms largely follow the IRM model, with the focus being on closed-form results of the stationary hit probabilities of LRU, FIFO, RANDOM, and CLIMB [1], [9]–[11]. The expressions are often hard to compute for large caches, and approximations have been proposed for larger cache sizes [12]. Of particular interest is the Time-To-Live (TTL) approximation [13]–[16] that associates each cached item with a lifetime after which it is evicted. Appropriate choice of this lifetime enables the accurate approximation of different caching schemes [15].

Recent work on performance analysis of caching algorithms has focused on the online learning aspect. For instance, [17] proposes TTL-based schemes to show that a desired hit rate can be achieved under non-stationary arrivals. Other work such as [18] characterizes the mixing times of several simple caching schemes such as LRU, CLIMB, k-LRU, etc., with the goal of identifying their learning errors as a function of time. However, the algorithms studied all have stationary error (they never learn perfectly) and so regret in our context would be Ω(T).

An alternative approach is taken in [19], [20], where the request arrival process is taken to be adversarial. These works present asymptotic and non-asymptotic regret lower bounds, respectively, and show that a coded and an uncoded policy, respectively, achieve this bound. As in many algorithm design and analysis settings, the adversarial model and the stochastic (Bayesian) model produce significantly different results that are not directly comparable. For instance, [20] shows that the LFU algorithm incurs an Ω(T) regret in the adversarial setting, i.e., the bound suggests poor performance. In contrast, in the stochastic arrival setting, we show among other results that the LFU algorithm will achieve the best possible regret of O(1) in the full observation regime. Our results are supported through empirical trace-based (non-adversarial) simulations.

Information Centric caching has gained much recent interest, and is particularly relevant to edge wireless networks. Joint caching and routing is studied in [21], where the objective is to show asymptotic accuracy of the placements, rather than the finite time performance that we focus on. Closest to our ideas on the partial observation model is work such as [5], which draws a parallel between bandit algorithms and caching under this setting. However, the algorithms considered are in the manner of the traditional Multi-Armed Bandit (MAB) approach that does not account for problem structure, and hence can only attain O(log T) regret.

With regard to the MAB problem, Lai and Robbins [22] showed in seminal work the Ω(log T) regret lower bound pertaining to any online learning algorithm. An index-based algorithm using the upper confidence idea (the UCB1 algorithm) was proposed in [23], which enabled a simple implementation while achieving the optimal regret. The posterior sampling approach, first proposed by Thompson in [24], has recently been shown to attain optimal regret [25]. For a detailed survey, we point to a monograph [26] and a recent book [27]. Another line of work in bandits is related to best-arm identification [28], [29], which can be considered as a pure exploration problem. In our manuscript, the full observation setting does not need to perform exploration. The exploration vs. exploitation trade-off naturally arises in the partial observation regime; hence, in that theme, we follow approaches inspired by the multi-armed bandit literature. Although there are similarities, the basic approaches and theoretical guarantees provided by MAB and the best-arm identification problems are different.

Much work also exists on the empirical performance evaluation of caching algorithms using traces gathered from different applications. While several discover fundamental insights [30]–[32], our goal in this work is on analytical performance guarantees, and we do not provide a comprehensive review.

II. SYSTEM MODEL

We consider the optimal cache content placement problem in a communication network. The library, which is the set of all files, is denoted by L = {1, . . . , L}. We assume for expositional simplicity that all files are of the same size, and that the cache has a capacity of C, i.e., it can store C files at a given time. We denote the popularity of the files by the profile μ = (μ_1, μ_2, . . . , μ_L), with Σ_i μ_i = 1. Without loss of generality, we assume that μ_1 > μ_2 > · · · > μ_L. Let x(t) ∈ L be the file request received at time t. We assume that requests are generated independently according to the popularity profile μ, i.e., P(x(t) = i) = μ_i.

Let C(t) denote the set of files placed in the cache by the caching algorithm at time t. We say that the cache gets a hit if x(t) ∈ C(t) and a miss if x(t) ∉ C(t). The goal of the caching algorithm is to maximize the expected cumulative hits over time, E[ Σ_{t=1}^{T} 𝟙{x(t) ∈ C(t)} ], where the expectation is over the randomness in the requests and the ensuing choices of C(t) made by the caching algorithm. Clearly, if the popularity distribution μ is known, the optimal caching policy is to place the most popular items in the cache at all times, i.e., C*(t) = C, where C = {1, 2, . . . , C}. However, in most real world applications, the popularity distribution is unknown to the caching algorithm a priori. So the goal of a caching algorithm is to learn the popularity distribution (or part of it) from the sequential observations, and to place files in the cache by judiciously using the available information at each time in order to maximize the expected cumulative hits.

In the literature on multi-armed bandits, it is common to characterize the performance of an online learning algorithm using the metric of regret, which is defined as the performance loss of the algorithm as compared to the optimal strategy with complete information. Since C*(t) = C, the cumulative regret
of a caching algorithm after T time steps is defined as

R(T) = Σ_{t=1}^{T} [ 𝟙{x(t) ∈ C} − 𝟙{x(t) ∈ C(t)} ].   (1)

Let s(t) be the observation available to the caching algorithm at time t and let h(t) = (s(1), . . . , s(t − 1)) be the history of observations until time t. The optimal caching problem is defined as the problem of finding a policy π that maps h(t) to C(t), i.e., C(t) = π_t(h(t)), in order to minimize the expected cumulative regret, E[R(T)].

The choice of the caching policy will clearly depend on the nature of the sequential observations available to it. We consider two different observation structures that are most common in communication networks.

1) Full Observation: In the full observation structure, we assume that the caching algorithm is able to observe the file request at each time, i.e., s(t) = x(t). In the setup of a cache and library, this regime corresponds to all requests being sent to the cache, which can then forward the request to the library in case of a miss.

2) Partial Observation: In the partial observation structure, the caching algorithm can observe the request only in the case of a hit, i.e., only if the requested item is already in the cache. More precisely, we define s(t) = x(t) 𝟙{x(t) ∈ C(t)} under this observation structure. In the case of a miss, s(t) = 0. In the setup of a cache and library, this regime corresponds to the context of information centric caching, wherein requests are forwarded to the cache only if the corresponding content is cached.

Below, we propose different caching algorithms to address the optimal caching problem under these two observation structures.

III. CACHING WITH FULL OBSERVATION

We first consider the full observation structure, where the caching algorithm can observe every file request. Our focus is on a class of algorithms following Least Frequently Used (LFU) eviction, since it uses cumulative statistics of all received requests (unlike other popular algorithms such as Least Recently Used (LRU)), and so is likely to have low regret. As indicated earlier, this is purely an exploitation problem, since every request is seen at the cache, and so all hits and misses are known regardless of the cached items. This regime can be compared to a multi-armed bandit in which the reward of every arm is revealed irrespective of the arm that is pulled, i.e., no exploration is needed. LFU places in the cache the top C items according to the empirical frequency estimate μ̂(t), i.e., C_LFU(t) = arg max_C μ̂(t), where arg max_C indicates the indices of the top C elements of the vector μ̂(t). Having established these notions, we present below the finite time performance guarantee for the LFU algorithm.
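Before stating the guarantee, the following is a minimal Python sketch of this setting (our own illustration, not code from the paper; names such as simulate_lfu are ours): LFU keeps one counter per library item, caches the top C counts, and the regret of (1) is tallied against the static top-C placement. The popularity profile mu is visible only to the request generator.

    import numpy as np

    def simulate_lfu(mu, C, T, seed=0):
        # LFU under full observation: one counter per library item, cache the
        # C items with the largest empirical counts, and track the regret (1)
        # against the static top-C placement.
        rng = np.random.default_rng(seed)
        L = len(mu)
        counts = np.zeros(L)
        optimal = set(np.argsort(mu)[::-1][:C].tolist())
        cache = set(range(C))                      # arbitrary initial placement
        regret, cum = [], 0.0
        for _ in range(T):
            x = int(rng.choice(L, p=mu))           # P(x(t) = i) = mu_i
            cum += (x in optimal) - (x in cache)   # one term of R(T) in (1)
            counts[x] += 1                         # every request is observed
            cache = set(np.argsort(counts)[::-1][:C].tolist())  # evict least frequent
            regret.append(cum)
        return regret

    # Example: Zipf(1) popularities over a small library.
    w = 1.0 / np.arange(1, 201)
    print(simulate_lfu(w / w.sum(), C=10, T=5000)[-1])

In such a run the cumulative regret flattens after an initial learning phase, which is the behavior that Theorem 1 quantifies.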
Theorem 1: The LFU algorithm has an expected regret of O(1). More precisely,

E[R(T)] < min { 16/Δ_min², 4C(L − C)/Δ_min },

where Δ_min = μ_C − μ_{C+1}.

Remark 1: We note that both terms of the regret upper bound are distribution dependent, i.e., they depend on Δ_min. Roughly, if LC < 1/Δ_min, then the second term dominates.

We will use the following lemma for proving Theorem 1.

Lemma 2: For ε > 0, we have

P( max_i |μ̂_i(t) − μ_i| > ε ) ≤ 2 e^{−tε²/2}.

The lemma is obtained through an application of the Dvoretzky-Kiefer-Wolfowitz inequality [33]. We omit the proof due to the page limitation.

We proceed with the proof of Theorem 1.

Proof: We denote C_LFU(t) simply as C(t) for notational convenience. We first argue that if max_i |μ̂_i(t) − μ_i| < Δ_min/2, then C(t) = C. Indeed, if max_i |μ̂_i(t) − μ_i| < Δ_min/2, for any j ∈ C and for any k ∈ L \ C,

μ̂_j(t) ≥ μ_j − Δ_min/2 ≥ μ_C − Δ_min/2 ≥ μ_{C+1} + Δ_min/2 ≥ μ_k + Δ_min/2 ≥ μ̂_k(t),

and hence C(t) = C. The expected regret can then be bounded, with

E[R(T)] = E[ Σ_{t=1}^{T} ( 𝟙{x(t) ∈ C} − 𝟙{x(t) ∈ C(t)} ) ]
  ≤ E[ Σ_{t=1}^{T} 𝟙{C(t) ≠ C} ] = Σ_{t=1}^{T} P(C(t) ≠ C)
  ≤ Σ_{t=1}^{T} P( max_i |μ̂_i(t) − μ_i| ≥ Δ_min/2 )
  ≤ Σ_{t=1}^{T} 2 e^{−tΔ_min²/8} ≤ Σ_{t=0}^{∞} 2 e^{−tΔ_min²/8} ≤ 16/Δ_min².   (2)

We can also upper bound E[R(T)] using a different approach, to show the trade-off between Δ_min² and Δ_min, L, and C. This approach makes use of Hoeffding's inequality:

E[R(T)] ≤ E[ Σ_{t=1}^{T} Σ_{j=1}^{C} Σ_{k=C+1}^{L} Δ_{j,k} 𝟙( μ̂_k(t) > μ̂_j(t) ) ]
  ≤ E[ Σ_{t=1}^{T} Σ_{j=1}^{C} Σ_{k=C+1}^{L} Δ_{j,k} ( 𝟙{μ̂_j(t) − μ_j ≤ −Δ_{j,k}/2} + 𝟙{μ̂_k(t) − μ_k > Δ_{j,k}/2} ) ].   (5)

Using the Hoeffding inequality [34], we obtain

P( μ̂_j(t) − μ_j ≤ −Δ_{j,k}/2 ) ≤ e^{−tΔ_{j,k}²/2},
P( μ̂_k(t) − μ_k > Δ_{j,k}/2 ) ≤ e^{−tΔ_{j,k}²/2}.

Now, continuing from (5) and taking the expectation inside the summation, we obtain

E[R(T)] ≤ Σ_{j=1}^{C} Σ_{k=C+1}^{L} Δ_{j,k} Σ_{t=1}^{T} 2 e^{−tΔ_{j,k}²/2}
  ≤ Σ_{j=1}^{C} Σ_{k=C+1}^{L} 4/Δ_{j,k} ≤ 4C(L − C)/Δ_min.   (6)

Combining (2) and (6), we obtain the desired result.

B. WLFU Algorithm

LFU achieves a regret of O(1), but its implementation is expensive in terms of memory requirements. This cost arises because LFU maintains a popularity estimate for each item in the library (μ̂_i(t)), and the library size L is extremely large for most practical applications. Typically, allocating memory to maintain the popularity distribution estimate for the whole library is impractical.

There are many approaches proposed to address this issue [2], [8]. However, most approaches rely on heuristic approximations of the empirical estimate, often with a tight pre-determined constraint on the memory. This leads to non-optimal use of the available information, and could result in poor performance of the corresponding algorithms.

In this article, we consider the Window-LFU (WLFU) algorithm [8], which has been proposed as a way to overcome the expensive memory requirement of LFU. WLFU employs a sliding window approach. At each time t, the algorithm keeps track of only the past w file requests. This is equivalent to maintaining a time window from t − w to t, denoted by W[t − w, t). Caching decisions are made based on the file requests that appeared within this window. In particular, the items to be placed in the cache at time t, C_WLFU(t), are the top C files with the maximum appearances in the window W[t − w, t).

We now show that the expected cumulative regret incurred by WLFU increases linearly in time (Ω(T)), as opposed to the constant regret (O(1)) of the standard LFU. Since Ω(T) is the worst possible regret for any learning algorithm, it suggests that in practice there will occasionally be arbitrarily bad sample paths with many misses.

Theorem 3: Under the WLFU algorithm, E[R(T)] = Ω(T).

Proof: This result can be established by finding a lower bound on the probability that the cache does not match the most likely items. From the proof of Theorem 1 (c.f. (3)), we have

E[R(T)] = E[ Σ_{t=1}^{T} ( 𝟙{x(t) ∈ C} − 𝟙{x(t) ∈ C(t)} ) ]
  = Σ_{t=1}^{T} ( Σ_{j∈C} μ_j − E[𝟙{x(t) ∈ C(t)}] )
  = E[ Σ_{t=1}^{T} ( Σ_{j∈C\C(t)} μ_j − Σ_{k∈C(t)\C} μ_k ) ]
  ≥ E[ Σ_{t=1}^{T} Σ_{k∈C(t)\C} ( μ_C − μ_k ) ]
  = E[ Σ_{t=1}^{T} Σ_{k∈L\C} ( μ_C − μ_k ) 𝟙{k ∈ C(t)} ]
  = Σ_{t=1}^{T} Σ_{k∈L\C} ( μ_C − μ_k ) P(k ∈ C(t))
  ≥ Σ_{t=1}^{T} ( μ_C − μ_{C+1} ) P(C + 1 ∈ C(t)),   (7)

where the last inequality follows by focusing on a sub-event. Given that the probability of item C + 1 being in the cache is non-zero, we can establish the desired lower bound using the window W[t − w, t):

P(C + 1 ∈ C(t)) ≥ P( {x(τ) = C + 1 : τ ∈ [t − w, t)} ) = (μ_{C+1})^w.

Combining this result with (7), we get the expression

E[R(T)] ≥ (μ_C − μ_{C+1}) (μ_{C+1})^w T,

which has order T. Since the cost per stage is bounded, we obtain the statement of the theorem.

C. LFU-Lite Algorithm

We now propose a new scheme that we call the LFU-Lite algorithm. Unlike the LFU algorithm, the LFU-Lite algorithm does not maintain an estimate of the popularity of each item in the library. Instead, it maintains the popularity estimate only for a subset of the items that it has observed. This approach significantly reduces the memory required as compared to the standard LFU implementation. At the same time, we show that LFU-Lite achieves an O(1) regret similar to that of LFU, and thus has a superior performance compared to WLFU, which suffers an Ω(T) regret.

We achieve this 'best of both' performance by a clever combination of a window-based approach to decide the items for which to maintain an estimate, and a separate counter bank to keep track of these estimates. At each time t, LFU-Lite selects the top C items with the maximum appearances in the window of observation W[t − w, t]. We denote this set of files as A(t). Let B(t − 1) be the set of items in the counter bank at the beginning of t. Then, if any item j ∈ A(t) is not present in B(t − 1), it is added to the counter bank, and the counter bank is updated to B(t). Once an item is placed in the counter bank, it is never removed from the counter bank.
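Both WLFU and the admission step of LFU-Lite reduce to one primitive: the top C items of the sliding window W[t − w, t). A minimal sketch of that primitive (our own code and naming, not the authors' implementation):

    from collections import Counter, deque

    class SlidingWindowTopC:
        # Maintain the last w requests and report the C items with the most
        # appearances in W[t - w, t). WLFU caches this set directly; LFU-Lite
        # only uses it to decide which items earn a long-lived counter.
        def __init__(self, w, C):
            self.w, self.C = w, C
            self.window = deque()
            self.freq = Counter()

        def observe(self, x):
            self.window.append(x)
            self.freq[x] += 1
            if len(self.window) > self.w:          # slide the window forward
                old = self.window.popleft()
                self.freq[old] -= 1
                if self.freq[old] == 0:
                    del self.freq[old]

        def top_c(self):
            return [item for item, _ in self.freq.most_common(self.C)]

Because WLFU discards everything older than w requests, unlucky windows (for instance w consecutive requests for item C + 1, the event used in the proof of Theorem 3) recur at a constant rate, which is the source of its Ω(T) regret.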
LFU-Lite maintains an estimate of the popularity of each item in the counter bank. The popularity estimate of item i ∈ B(t), μ̂_i(t), is defined as

μ̂_i(t) = ( 1/(t − t_i) ) Σ_{τ=t_i+1}^{t} 𝟙{x(τ) = i},   (8)

where t_i is the time at which item i was added to the counter bank. The set of items to be placed in the cache at time t, C_LL(t), is then selected as

C_LL(t) = arg max_C ( μ̂_j(t), j ∈ B(t) ).

A description of LFU-Lite is also given in Algorithm 1.

Algorithm 1 LFU-Lite
for t = 1, . . . , T do
  Observe x(t)
  Select A(t), the top C files with the maximum appearances in the window W[(t − w)+, t)
  for each j ∈ A(t) do
    if j ∈ A(t) is not in B(t − 1) then
      t_j ← t
      Add file j into B(t)
    end if
  end for
  Select the files C_LL(t) = arg max_C (μ̂_j(t), j ∈ B(t)) and place them in the cache
end for
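A compact Python rendering of Algorithm 1 (a sketch under our own naming; the class LFULite and its fields are not from the paper). The estimate follows (8): counting for item i starts only at the time t_i at which i enters the counter bank, and the bank only grows.

    from collections import Counter, deque

    class LFULite:
        # Sketch of Algorithm 1: a sliding window admits items into a counter
        # bank B (which only grows); only items in B are counted, and the cache
        # holds the C bank items with the largest estimates (8).
        def __init__(self, C, w):
            self.C, self.w = C, w
            self.window, self.win_freq = deque(), Counter()
            self.entry_time = {}        # t_i: when item i entered the bank
            self.bank_hits = Counter()  # requests for i observed after t_i
            self.t = 0

        def _estimate(self, i):         # mu_hat_i(t) as in (8)
            elapsed = self.t - self.entry_time[i]
            return self.bank_hits[i] / elapsed if elapsed > 0 else 0.0

        def step(self, x):
            self.t += 1
            # full observation: every request updates the window ...
            self.window.append(x); self.win_freq[x] += 1
            if len(self.window) > self.w:
                old = self.window.popleft(); self.win_freq[old] -= 1
                if self.win_freq[old] == 0:
                    del self.win_freq[old]
            # ... and the bank counter of x, if x is already tracked
            if x in self.entry_time:
                self.bank_hits[x] += 1
            # admission A(t): the top-C window items join the bank, never to leave
            for i, _ in self.win_freq.most_common(self.C):
                self.entry_time.setdefault(i, self.t)
            # cache C_LL(t): the C tracked items with the largest estimates
            ranked = sorted(self.entry_time, key=self._estimate, reverse=True)
            return set(ranked[:self.C])

The memory cost is the size of entry_time, i.e., the counter bank B(t), whose expected growth is characterized later in (16).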
We now present the performance guarantee for the LFU-Lite algorithm.

Theorem 4: The expected regret under the LFU-Lite algorithm is O(1).

The first term in (10) can be bounded as

E[ Σ_{t=1}^{T} Σ_{j=1}^{C} Σ_{k=C+1}^{L} Δ_{j,k} 𝟙{μ̂_j(t) − μ_j ≤ −Δ_{j,k}/2} ]
  = Σ_{j=1}^{C} Σ_{k=C+1}^{L} E[ E[ Σ_{t=1}^{T} Δ_{j,k} 𝟙{μ̂_j(t) − μ_j ≤ −Δ_{j,k}/2} | t_j ] ]
  ≤ Σ_{j=1}^{C} Σ_{k=C+1}^{L} E[ Δ_{j,k} ( t_j + E[ Σ_{t=t_j}^{T} 𝟙{μ̂_j(t) − μ_j ≤ −Δ_{j,k}/2} | t_j ] ) ]
  ≤ Σ_{j=1}^{C} Σ_{k=C+1}^{L} E[ Δ_{j,k} ( t_j + Σ_{t=t_j}^{T} P( μ̂_j(t) − μ_j ≤ −Δ_{j,k}/2 | t_j ) ) ]
  ≤ Σ_{j=1}^{C} Σ_{k=C+1}^{L} E[ Δ_{j,k} ( t_j + Σ_{t=t_j}^{T} e^{−(t−t_j)Δ_{j,k}²/2} ) ]
  ≤ Σ_{j=1}^{C} Σ_{k=C+1}^{L} ( Δ_{j,k} E[t_j] + 2/Δ_{j,k} ).   (11)

Similarly, the second term in (10) can be bounded as

E[ Σ_{t=1}^{T} Σ_{j=1}^{C} Σ_{k=C+1}^{L} Δ_{j,k} 𝟙{μ̂_k(t) − μ_k > Δ_{j,k}/2} ]
  = Σ_{j=1}^{C} Σ_{k=C+1}^{L} E[ E[ Σ_{t=1}^{T} Δ_{j,k} 𝟙{μ̂_k(t) − μ_k > Δ_{j,k}/2} | t_k ] ]
  = Σ_{j=1}^{C} Σ_{k=C+1}^{L} E[ Σ_{t=t_k}^{T} Δ_{j,k} P( μ̂_k(t) − μ_k > Δ_{j,k}/2 | t_k ) ]
  ≤ Σ_{j=1}^{C} Σ_{k=C+1}^{L} E[ Σ_{t=t_k}^{T} Δ_{j,k} e^{−(t−t_k)Δ_{j,k}²/2} ].

Proof: Let p̄_i^t be the probability with which file i enters the counter bank by time t. Note that p̄_i^t = 1 − (1 − p_i)^{t/w}, where

p_i = Σ_{n=μ_{C+1}w+1}^{w} ( w choose n ) μ_i^n (1 − μ_i)^{w−n}

is the probability that item i enters the counter bank in any given window. Then

E[B(t)] = E[ Σ_{i=1}^{L} 𝟙{i ∈ B(t)} ] = Σ_{i=1}^{L} p̄_i^t = Σ_{i=1}^{L} ( 1 − (1 − p_i)^{t/w} ).   (16)

Observe that 1 − (1 − p_i)^{t/w} is concave in t and E[B(t)] is a sum of L concave functions, and is hence concave.

Remark 2: Intuitively, the counter bank will keep counts only for the more popular items. It is also straightforward to show that the expected size of the counter bank decreases with the window length w. To see this, consider two different window lengths w_1 and w_2 such that w_1 ≥ w_2. Let p_i(w) be the probability that item i enters the counter bank in any given window when the window length is w. Then, for i ∉ C, p_i(w_1) ≤ p_i(w_2). Intuitively, a larger window length leads to more observations and hence to a smaller probability of observing item i ∉ C more than the threshold μ_{C+1}w times. Now, (1 − p_i(w_2))^{t/w_2} ≤ (1 − p_i(w_1))^{t/w_2} ≤ (1 − p_i(w_1))^{t/w_1}. So, 1 − (1 − p_i(w_2))^{t/w_2} ≥ 1 − (1 − p_i(w_1))^{t/w_1}. Hence, the contribution of i ∉ C to the expected size of the counter bank according to (16) is smaller for a larger window length. The exact dependence of E[B(t)] on w is cumbersome to characterize. We, however, illustrate this through extensive simulations in Section IV-C.

IV. CACHING WITH PARTIAL OBSERVATION

We now consider the problem of optimal caching under the partial observation regime. As described earlier, here the algorithm can observe a file request only if the requested file is in the cache. Hence, the caching algorithm has to perform active exploration by placing a file in the cache sufficiently often to learn its popularity, in order to decide if that file belongs to the set of the most popular files. This procedure is in sharp contrast to the full observation structure, where the popularity estimate of each file in the library can be improved after each time step due to full visibility of all the requests. However, the exploration is costly, because the algorithm incurs regret every time that a sub-optimal file is placed in the cache for exploration. Hence, the algorithm also has to perform active exploitation, i.e., place the most popular items according to the current estimate in the cache. The optimal exploration vs. exploitation trade-off for minimizing the regret is at the core of most online learning algorithms. The Multi-Armed Bandit (MAB) model is a canonical formalism for this class of problems. Here, there are multiple arms (actions) that yield random rewards independently over time, with the (unknown) mean of arm i being μ_i. The objective is to learn the mean reward of each arm by exploration and maximize the cumulative reward by exploitation.

We formulate the caching problem as a multi-player multi-armed bandit problem, in which content placement in the cache is viewed as equivalent to arm pulls. The request for an item is considered as its reward, which is a {0, 1} random variable sampled from the popularity vector. We call this formulation "Caching Bandits". Unlike in the multi-armed bandit problem, the rewards in the caching bandit problem are not independent, as the request for one item in the cache indicates that there is no request for the other items.

A. Caching Bandit With Full Posterior Sampling

Posterior sampling based algorithms for the MAB [25], [35] typically use a Beta prior (with Bernoulli likelihood) or a Gaussian prior (with Gaussian likelihood) in order to exploit the conjugate pair property of the prior and likelihood (reward) distributions. Hence, the posterior at any time will have the same form as the prior distribution, albeit with different parameters. This provides a computationally tractable and memory efficient way to keep track of the posterior distribution evolution. However, in the optimal caching problem, the unknown popularity vector μ has interdependent components through the constraint Σ_i μ_i = 1. Hence, standard prior distributions like the Beta will not be able to capture the full posterior evolution in the caching problem.

We use a Dirichlet prior on the popularity distribution μ = (μ_1, . . . , μ_L), parametrized by α = (α_1, . . . , α_L). More precisely,

f_0(μ; α) = ( 1/B(α) ) Π_{i=1}^{L} μ_i^{α_i − 1},  where  B(α) = Π_{i=1}^{L} Γ(α_i) / Γ( Σ_{i=1}^{L} α_i ),

and Γ(·) is the Gamma function.

Let f_t be the posterior distribution at time t with parameter α(t). The posterior is updated according to the observed information s(t). In the case of a hit, the file request x(t) is observed and s(t) = x(t). It is easy to see that the correct posterior update is α(t + 1) = α(t) + e_{x(t)}, where e_{x(t)} is the unit vector with its non-zero element at index x(t).

The posterior update is complex in the case of a cache miss. In the case of a miss, we code s(t) = 0. Given the current parameter α(t) = α, we can show that the posterior distribution in the case of a miss can be computed as

f_{t+1}(μ | s(t) = 0) ∝ P(s(t) = 0 | μ) f_t(μ; α) = ( 1/Σ_{i=1}^{L} α_i ) Σ_{j ∉ C(t)} α_j f_t(μ; α + e_j).

Algorithm 2 CB-FPS Algorithm
Initialize the prior distribution f_0
for t = 1, . . . , T do
  Sample μ̂(t) ∼ f_t(·)
  Select C_FPS(t) = arg max_C μ̂(t)
  Receive the observation s(t)
  Update the posterior f_{t+1}(μ) ∝ P(s(t) | μ) f_t(μ)
end for

Hence, the posterior update in the case of a miss is a combination of (L − C) Dirichlet priors from the previous step. With the first miss, the algorithm needs to store a set of size (L − C) consisting of Dirichlet parameters. With each miss, parameter sets of size (L − C) need to be stored, one such set for each of the parameters at the previous step. Thus, at the t-th miss, the number of parameters that we need to store is (L − C)^t, growing exponentially in t. Hence, as the number of misses increases, the memory required to store these parameters will increase exponentially, rendering the full posterior update algorithm infeasible from an implementation perspective.
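A small sketch of the Dirichlet bookkeeping behind CB-FPS (our own code; the function names and the explicit mixture representation are illustrative assumptions). A hit is the usual conjugate update α ← α + e_x(t); a miss turns each posterior component into a mixture over the (L − C) non-cached items, which is exactly the (L − C)^t blow-up described above.

    import numpy as np

    rng = np.random.default_rng(0)

    def fps_update(components, cache, s):
        # One CB-FPS posterior update. `components` is a list of (weight, alpha)
        # pairs representing a mixture of Dirichlet densities; s is the request
        # on a hit, or None on a miss (s(t) = 0).
        new = []
        if s is not None:                     # hit: conjugate update of each component
            for w, alpha in components:
                a = alpha.copy(); a[s] += 1.0
                new.append((w, a))
        else:                                 # miss: each component splits into L - C pieces
            for w, alpha in components:
                a0 = alpha.sum()
                for j in range(len(alpha)):
                    if j not in cache:
                        a = alpha.copy(); a[j] += 1.0
                        new.append((w * alpha[j] / a0, a))  # weight proportional to alpha_j
        total = sum(w for w, _ in new)
        return [(w / total, a) for w, a in new]

    def fps_sample(components):
        # Draw mu_hat(t): pick a mixture component, then a Dirichlet sample from it.
        weights = np.array([w for w, _ in components])
        k = rng.choice(len(components), p=weights)
        return rng.dirichlet(components[k][1])

    # Example: the cache is the top-C of fps_sample(components); note how the
    # component list multiplies by (L - C) at every miss.
    L, C = 6, 2
    components = [(1.0, np.ones(L))]                      # Dirichlet(1, ..., 1) prior
    components = fps_update(components, cache={0, 1}, s=None)   # one miss
    print(len(components), fps_sample(components))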
We present CB-FPS in Algorithm 2. At each time t, the algorithm takes a sample μ̂(t) according to the current posterior f_t(·). It places the top C items, in order of decreasing μ̂(t), in the cache.

In Section IV-C, we will see that a Monte Carlo version of this algorithm can be implemented for small values of L and C, which seems to achieve an O(1) regret. A rigorous proof that shows such regret, even in some special cases and neglecting computational tractability, is an interesting open problem.

B. Caching Bandit With Marginal Posterior Sampling

We now propose an algorithm that only performs a marginal posterior update. Instead of maintaining a Dirichlet prior for the popularity vector μ, we use a Beta prior for the popularity of each individual item μ_i. CB-MPS is described in Algorithm 3.

Algorithm 3 CB-MPS Algorithm
Initialize α_i(0) = 1, β_i(0) = 1, ∀i ∈ L.
for t = 1, . . . , T do
  Generate samples μ̂_i(t) ∼ Beta(α_i(t), β_i(t))
  C_MPS(t) ← arg max_C μ̂(t)
  if x(t) ∈ C_MPS(t) then
    α_{x(t)}(t + 1) ← α_{x(t)}(t) + 1
    β_i(t + 1) ← β_i(t) + 1, ∀i ∈ C_MPS(t), i ≠ x(t)
  end if
end for

The CB-MPS algorithm generates samples for each item from an independent Beta distribution, and places the C items with the largest samples into the cache. The algorithm then maintains independent Beta posteriors for each item in the library. However, it updates the posteriors only for those items currently in the cache. The posteriors for all the other items remain the same.

We now provide a performance guarantee for the CB-MPS algorithm.

Theorem 6: Under the marginal posterior sampling algorithm, E[R(T)] = O((L − C)C log T).

It is clear that the CB-MPS algorithm disregards the inherent structure in the Caching Bandit formulation by choosing not to update the non-cached items. Note that, in the caching formulation, the popularity vector has interdependent components (it is a probability distribution), so a cache miss means that one of the non-cached items was surely requested. Hence, the CB-MPS algorithm views the caching bandit problem almost like a multi-player multi-armed bandit problem, regardless of the additional structure imposed by the caching bandit. But we emphasize that this loss of information in the CB-MPS algorithm is unavoidable due to the partial observation structure limiting the observations to the cached items. Note that the Beta posterior in CB-MPS will be corrupted if we update the missed requests for the non-cached items in any heuristic way.

We omit the proof of this theorem because the analysis is similar to that of the multi-player multi-armed bandit algorithm. In particular, the posterior sampling method proposed in [36] can be used with small modifications to show the above result.
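Algorithm 3 is Thompson sampling with independent Beta posteriors that are updated only for cached items; a minimal sketch (our own code and naming):

    import numpy as np

    def cb_mps(requests, L, C, seed=0):
        # Sketch of Algorithm 3 (CB-MPS): Thompson sampling with independent
        # Beta posteriors that are updated only for cached items.
        rng = np.random.default_rng(seed)
        alpha, beta = np.ones(L), np.ones(L)       # Beta(1, 1) priors
        hits = 0
        for x in requests:
            samples = rng.beta(alpha, beta)        # one posterior sample per item
            cache = np.argpartition(samples, -C)[-C:]
            if x in cache:                         # hit: success for x, failure for the rest
                hits += 1
                alpha[x] += 1
                beta[cache[cache != x]] += 1
            # a miss (s(t) = 0) is discarded: no posterior is touched
        return hits

The discarded misses are what limit the learning rate here and lead to the O(log T) regret of Theorem 6.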
C. Caching Bandit With Structural Information

Even though the CB-MPS algorithm is easy to implement, it suffers an O(log T) regret, which is much worse than the O(1) regret incurred by LFU and LFU-Lite. This is due to the partial observation structure that limits the rate of learning. We now propose an algorithm that we call Caching Bandit with Structural Information (CB-SI). We show that with a minimal assumption on the availability of structural information about the popularity distribution, CB-SI can achieve an O(1) regret even in the partial observation regime.

We assume that the algorithm knows the values of μ_C and Δ_min, the popularity of the C-th most popular item and the optimality gap. Note that we do not assume knowledge of the identity of the C-th most popular file. We note that our proof approach follows the techniques developed in [37], which can be considered as a special case with C = 1. The CB-SI algorithm is given in Algorithm 4.

The algorithm maintains an empirical estimate for each item, and places into the cache the items whose empirical estimate crosses a certain threshold decided by μ_C and Δ_min. If there are not enough items that cross this threshold, the algorithm samples the rest of the items without replacement, according to a probability inversely proportional to the square of the item's gap from the C-th most popular item. The intuition is that with the knowledge of the threshold, one can reduce the amount of exploration needed, and hence reduce the regret. To see this, once an item's empirical estimate is more than μ_C − Δ_min/2, we do not need to explore other items to resolve the uncertainty about the item in question, as this item belongs to the C most popular items. We still need to explore to observe the other popular items. In this way, the algorithm reduces the exploration of less popular items, while exploiting the information accrued so far about the popular items, by the knowledge of the threshold. As shown in the theorem below, this directed exploration suffers only a constant regret.

Algorithm 4 CB-SI Algorithm
Initialize α_i(0) = 0, β_i(0) = 0, μ̂_i(0) = 1/L, ∀i ∈ L.
Initialize n_i(t) = 0, ∀i ∈ L
for t = 1, . . . , T do
  Compute the set A(t) = {i ∈ L : μ̂_i(n_i(t)) ≥ μ_C − Δ/2}
  if |A(t)| ≥ C then
    C_SI(t) = arg max_C (μ̂_j(n_j(t)), j ∈ A(t))
    Z_t ← 1
  else
    For each i ∈ L \ A(t), compute p_i(t) = c/(μ_C − μ̂_i(n_i(t)))², where c = ( Σ_{i∈L\A(t)} 1/(μ_C − μ̂_i(n_i(t)))² )^{−1} is the normalizing constant
    Sample C − |A(t)| elements from the set L \ A(t) according to the probabilities p_i(t). Denote these elements as B(t)
    C_SI(t) = A(t) ∪ B(t)
    Z_t ← 2
  end if
  Place the files C_SI(t) in the cache
  if x(t) ∈ C(t) then
    α_{x(t)}(t + 1) ← α_{x(t)}(t) + 1
    β_i(t + 1) ← β_i(t) + 1, ∀i ∈ C(t), i ≠ x(t)
  end if
  n_i(t + 1) = α_i(t + 1) + β_i(t + 1)
  μ̂_i(n_i(t + 1)) = α_i(t + 1)/n_i(t + 1)
end for
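A sketch of one round of Algorithm 4 (our own code; in particular, we normalize p_i(t) over the sampled set, which is our reading of the constant c, and the parameter values in the comments are illustrative). Items whose empirical mean clears μ_C − Δ/2 are cached directly, and any remaining slots are filled by directed exploration.

    import numpy as np

    def cb_si_place(mu_hat, mu_C, Delta, C, rng):
        # One placement step of Algorithm 4. mu_C and Delta are the assumed
        # structural information (popularity of the C-th item and the gap).
        L = len(mu_hat)
        A = np.flatnonzero(mu_hat >= mu_C - Delta / 2)       # items above the threshold
        if len(A) >= C:
            return A[np.argsort(mu_hat[A])[::-1][:C]], 1     # Z_t = 1: exploitation only
        rest = np.setdiff1d(np.arange(L), A)
        gaps = np.maximum(mu_C - mu_hat[rest], 1e-12)        # guard against a zero gap
        p = 1.0 / gaps**2
        p /= p.sum()                                         # p_i(t) proportional to 1/(mu_C - mu_hat_i)^2
        explore = rng.choice(rest, size=C - len(A), replace=False, p=p)
        return np.concatenate([A, explore]), 2               # Z_t = 2: directed exploration

    def cb_si_observe(cache, x, alpha, beta):
        # Counts are updated only for cached items, exactly as in Algorithm 4.
        if x in cache:
            alpha[x] += 1
            beta[np.setdiff1d(cache, [x])] += 1
        n = alpha + beta
        mu_hat = np.divide(alpha, n, out=np.full(len(alpha), 1.0 / len(alpha)), where=n > 0)
        return mu_hat, n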
Theorem 7: The expected cumulative regret of the CB-SI algorithm is

E[R(T)] ≤ C Σ_{j∈L\C} ( 2/Δ² + (4/Δ²) [ 4 + (32/Δ²) exp(−Δ²/8) ] ),

where Δ = μ_C − μ_{C+1}.

Proof: We denote Δ_j = μ_C − μ_j. In this proof, we will show that the expected regret of the CB-SI algorithm is bounded above by a constant. From the proof of Theorem 1 (c.f. (6)), we get that the expected regret is bounded as below:

E[R(T)] ≤ E[ Σ_{t=1}^{T} Σ_{i=1}^{C} Σ_{j=C+1}^{L} Δ_{i,j} 𝟙{i ∉ C(t), j ∈ C(t)} ]
  ≤ C Σ_{j=C+1}^{L} E[ Σ_{t=1}^{T} 𝟙{j ∈ C(t)} ] = C Σ_{j=C+1}^{L} Σ_{t=1}^{T} P(j ∈ C(t))
  = C Σ_{j=C+1}^{L} Σ_{t=1}^{T} [ P( μ̂_j(n_j(t)) > μ_C − Δ_j/2, j ∈ C(t) ) + P( μ̂_j(n_j(t)) ≤ μ_C − Δ_j/2, j ∈ C(t) ) ].   (17)

We address each term in the above summation separately. First, observe that for any j ∈ {C + 1, . . . , L}, the following inequality holds:

Σ_{t=1}^{T} P( μ̂_j(n_j(t)) > μ_C − Δ_j/2, j ∈ C(t) )
  ≤ Σ_{t=1}^{T} P( μ̂_j(n_j(t)) > μ_j + Δ_j/2, j ∈ C(t) )
  ≤(a) Σ_{t=1}^{T} P( μ̂_j(t) > μ_j + Δ_j/2 ) ≤ Σ_{t=1}^{T} e^{−Δ_j² t/2} ≤ 2/Δ_j²,   (18)

where the inequality (a) follows from Hoeffding's inequality.

For bounding the second term in (17), we use the policy definition. Since Δ_j ≥ Δ, the first inequality below follows trivially. The equality (b) follows from the fact that when the mean estimate of the j-th item is smaller than μ_C − Δ/2, the only means by which it can enter the cache is through the exploration part of the algorithm, which is denoted by Z_t = 2.

P( μ̂_j(n_j(t)) ≤ μ_C − Δ_j/2, j ∈ C(t) )
  ≤ P( μ̂_j(n_j(t)) ≤ μ_C − Δ/2, j ∈ C(t) )
  =(b) P( μ̂_j(n_j(t)) ≤ μ_C − Δ/2, j ∈ C(t), Z_t = 2 )
  =(c) P( j ∈ C(t) | μ̂_j(n_j(t)) ≤ μ_C − Δ/2, Z_t = 2 ) P( μ̂_j(n_j(t)) ≤ μ_C − Δ/2, Z_t = 2 )
  = p_{j,t} P( μ̂_j(n_j(t)) ≤ μ_C − Δ/2, Z_t = 2 )
  = E[ p_{j,t} 𝟙{μ̂_j(n_j(t)) ≤ μ_C − Δ/2, Z_t = 2} ]
  = E[ p_{i,t} (p_{j,t}/p_{i,t}) 𝟙{μ̂_j(n_j(t)) ≤ μ_C − Δ/2, Z_t = 2} ],

for any i ∈ C. Note that, in the equality (c), we used the definition of p_{j,t}. Now, substituting the value of the sampling probability p_{i,t}, we obtain

  ≤ E[ p_{i,t} ( |μ_C − μ̂_i(n_i(t))|² / (Δ/2)² ) 𝟙{μ̂_j(n_j(t)) ≤ μ_C − Δ/2, Z_t = 2} ]
  ≤ (4/Δ²) E[ |μ_C − μ̂_i(n_i(t))|² p_{i,t} 𝟙{Z_t = 2} ]
  ≤ (4/Δ²) E[ |μ_C − μ̂_i(n_i(t))|² P( i ∈ C(t) | μ̂_i(n_i(t)) ≤ μ_C − Δ/2, Z_t = 2 ) ]
  = (4/Δ²) E[ |μ_C − μ̂_i(n_i(t))|² E[ 𝟙{i ∈ C(t)} | μ̂_i(n_i(t)) < μ_C − Δ/2, Z_t = 2 ] ]
  = (4/Δ²) E[ |μ_C − μ̂_i(n_i(t))|² E[ 𝟙{i ∈ C(t), μ̂_i(n_i(t)) < μ_C − Δ/2} | {μ̂_i(n_i(t)) < μ_C − Δ/2, Z_t = 2} ] ]
  = (4/Δ²) E[ |μ_C − μ̂_i(n_i(t))|² 𝟙{i ∈ C(t), μ̂_i(n_i(t)) < μ_C − Δ/2} ].   (19)

Here, the inequalities follow from the properties of conditional expectation. Now, we obtain a bound for the second term in (17), using (19), as below:

Σ_{t=1}^{T} E[ |μ_C − μ̂_i(n_i(t))|² 𝟙{μ̂_i(n_i(t)) < μ_C − Δ/2, i ∈ C(t)} ]
  ≤ Σ_{t=1}^{T} E[ |μ_C − μ̂_i(t)|² 𝟙{|μ̂_i(t) − μ_C| > Δ/2} ]
  = Σ_{t=1}^{T} ∫_0^∞ P( |μ_C − μ̂_i(t)|² 𝟙{|μ̂_i(t) − μ_C| > Δ/2} ≥ x ) dx
  = Σ_{t=1}^{T} ∫_0^∞ P( |μ_C − μ̂_i(t)|² 𝟙{|μ̂_i(t) − μ_C|² > Δ²/4} ≥ x ) dx
  = Σ_{t=1}^{T} [ ∫_0^{Δ²/4} P( |μ_C − μ̂_i(t)|² 𝟙{|μ̂_i(t) − μ_C|² > Δ²/4} ≥ x ) dx
      + ∫_{Δ²/4}^{∞} P( |μ_C − μ̂_i(t)|² 𝟙{|μ̂_i(t) − μ_C|² > Δ²/4} ≥ x ) dx ]
  = Σ_{t=1}^{T} [ ∫_0^{Δ²/4} P( |μ_C − μ̂_i(t)|² ≥ x, |μ̂_i(t) − μ_C|² > Δ²/4 ) dx
      + ∫_{Δ²/4}^{∞} P( |μ_C − μ̂_i(t)|² 𝟙{|μ̂_i(t) − μ_C|² > Δ²/4} ≥ x ) dx ]
  = Σ_{t=1}^{T} [ ∫_0^{Δ²/4} P( |μ̂_i(t) − μ_C|² > Δ²/4 ) dx
      + ∫_{Δ²/4}^{∞} P( |μ_C − μ̂_i(t)|² 𝟙{|μ̂_i(t) − μ_C|² > Δ²/4} ≥ x ) dx ]
  = Σ_{t=1}^{T} [ (Δ²/4) 2 e^{−tΔ²/8} + ∫_{Δ²/4}^{∞} P( |μ_C − μ̂_i(t)|² ≥ x ) dx ]
  ≤ 4 + (32/Δ²) exp(−Δ²/8).   (20)

Combining equations (17), (18), (19), and (20), we observe:

E[R(T)] ≤ C Σ_{j∈L\C} ( 2/Δ² + (4/Δ²) [ 4 + (32/Δ²) exp(−Δ²/8) ] ).

Remark 8: We introduce another version of the CB-SI algorithm, which is similar in spirit to LFU-Lite. Following a similar rule to LFU-Lite, we maintain a window of the W past observations, and at each time, the C most frequently requested items in the window are added to the counter bank, if those items are not already present in it. The mean estimates of CB-SI are calculated only for items in the counter bank. We call this algorithm CB-SILite. In Section V-B, we will observe that CB-SILite drastically reduces the number of counters needed to give a similar hit performance to CB-SI.

We now compare the performance of CB-MPS and CB-SI via a lower bound argument. The known lower bound on the regret of a classical multi-player multi-armed bandit is Ω(log T), and the Thompson sampling algorithm is known to achieve this. As discussed previously, the CB-MPS algorithm is equivalent to the Thompson sampling algorithm for the multi-player multi-armed bandit formulation. Clearly, the multi-player multi-armed bandit is at least as hard as the classical multi-armed bandit problem and will have a regret of Ω(log T). Thus, we argue that the performance of CB-MPS is worse than that of CB-SI, which achieves O(1) regret.

V. SIMULATIONS

In this section, we start by conducting simulations with requests generated under the IRM model to verify our insights on regret obtained in the earlier sections. In each instance, we present results averaged over ten runs of the algorithm under test. We then use two data traces to compare the performance of our proposed algorithms when exposed to a non-stationary arrival process. Since these requests change with time, we modify the algorithms to "forget" counts, by halving the counts at a fixed periodicity. In the full observation regime, we also compare the performance against LRU, which is widely deployed and implicitly has a finite memory (i.e., it automatically "forgets"). We also further explore the reaction of our approaches to non-stationary requests by creating a synthetic trace that exhibits changes at a faster timescale than the data traces. We also test an online change detection mechanism, along with the forgetting rule, under a non-stationary request arrival process.

Fig. 2. Regret of LFU, WLFU, LFU-Lite.
Fig. 3. Growth of counters for LFU, LFU-Lite.
Fig. 4. Regret of LFU-Lite for varying W.

A. IRM Simulations

1) Full Observation: We first conduct simulations for an IRM request process following a Zipf distribution with parameter β under the full observation setting. Figure 2 compares the regret suffered by LFU, WLFU and LFU-Lite for C = 10, L = 1000, W = C² log L [8] and β = 1. As expected, the regret suffered by the WLFU algorithm grows linearly with time, while LFU and LFU-Lite suffer a constant regret. Figure 3 shows the growth of the number of counters used to keep an estimate of files. The merits of LFU-Lite are clearly seen here, as it uses approximately 35 counters to achieve a constant regret, while LFU uses 1000.

The growth of counters and the regret suffered by LFU-Lite depend on W, the window of observation. In Figures 4 and 5, we compare the growth of regret and counters with W for L = 1000, C = 10, β = 1. We see that the number of counters
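The "forgetting" rule used in these experiments, and the two-window change detector described later in Section V-C, are simple to state in code; a sketch with illustrative parameter values of our own choosing (the halving period, window lengths, and threshold are not the paper's):

    from collections import Counter

    def forget(counts, t, period=10000):
        # Periodically halve every counter so that stale popularity evidence decays.
        if t % period == 0:
            for k in counts:
                counts[k] //= 2
        return counts

    def tv_distance(window_a, window_b):
        # Total variation distance between the empirical request distributions
        # of two windows (lists of item ids).
        pa, pb = Counter(window_a), Counter(window_b)
        na, nb = len(window_a), len(window_b)
        support = set(pa) | set(pb)
        return 0.5 * sum(abs(pa[i] / na - pb[i] / nb) for i in support)

    def change_detected(reference, current, threshold=0.2):
        # Two-window detector in the spirit of [40]: flag a change when the
        # current window's distribution drifts too far from the reference; the
        # caller then halves its counters and resets the reference window.
        return tv_distance(reference, current) > threshold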
Fig. 11. Hit rates for full observation for IBM trace.
Fig. 12. Hit rates for full observation for YouTube trace.
Fig. 14. Hit rates for partial observation, IBM trace.
Fig. 15. Hit rates for partial observation, YouTube trace.
Fig. 18. Hit rates for full observation model.
Fig. 19. Hit rates for partial observation model.

CB-SILite uses a small counter bank for all the traces, as shown in Table II.

C. Online Change Detection for Non-Stationary Requests

In this section, we use an online change detection algorithm to find out if there is any change in the request arrival distribution. We augment our caching algorithms with this online change detection mechanism to adapt them to scenarios where the requests are non-stationary. We devise this mechanism in such a way that it works independently of the specific caching algorithm we use.

The detection mechanism is based on a two-window paradigm proposed in [40]. It maintains a reference window and a current window. The current window slides forward with each incoming data point, and the reference window is updated whenever a change is detected. Our scheme compares requests in the reference window to the requests in the current window. We measure the total variation distance between the empirical distribution of requests in the reference window and that in the current window. If this distance is greater than a threshold, a change detection is announced. When a change is detected, it signals the caching algorithm, which employs a heuristic to reduce its counters by half. This reduction ensures that the algorithms reflect the changes in the request distribution.

To evaluate the performance of our approach, as in the previous subsections, we first create a synthetic trace of 1 million requests with a library size of 50000. These requests are generated by sampling a Zipf distribution with parameter 1. We then induce non-stationarity in this trace by cycling the

VI. CONCLUSION

We considered the question of caching algorithm design and analysis from the perspective of online learning. We focused on algorithms that estimate popularity by maintaining counts of requests seen, in both the full and partial observation regimes. Our main findings were as follows. In the context of full observation, it is possible to follow this approach and obtain O(1) regret using the simple LFU-Lite approach that only needs a small number of counters. In the context of partial observations, our finding using the CB-SI approach was that structure greatly enhances the learning ability of the caching algorithm, and is able to make up for incomplete observations to yield O(1) regret. We verified these insights using both simulations and data traces. In particular, we showed that even if the request distribution changes with time, our approach (enhanced with a simple "forgetting" rule) is able to outperform established algorithms such as LRU. We have also augmented our algorithms with an online change detection mechanism, independent of the proposed algorithms. This approach enhanced our algorithms to detect changes in the distributions on the fly. When we enabled the algorithms with this approach along with a simple forgetting rule, we were able to achieve good empirical performance under non-stationary traffic.

REFERENCES

[1] E. G. Coffman and P. J. Denning, Operating Systems Theory. Englewood Cliffs, NJ, USA: Prentice-Hall, 1973.
[2] G. Einziger, R. Friedman, and B. Manes, "TinyLFU: A highly efficient cache admission policy," ACM Trans. Storage, vol. 13, no. 4, p. 35, 2017.
[3] J. L. Hennessy and D. A. Patterson, Computer Architecture: A Quantitative Approach. Amsterdam, The Netherlands: Elsevier, 2011.
[4] A. Chankhunthod, P. B. Danzig, C. Neerdaels, M. F. Schwartz, and K. J. Worrell, "A hierarchical internet object cache," in Proc. USENIX Annu. Tech. Conf., 1996, pp. 153–164.
[5] P. Blasco and D. Gunduz, "Learning-based optimization of cache content in a small cell base station," in Proc. IEEE Int. Conf. Commun. (ICC), Jun. 2014, pp. 1897–1903.
[6] A. Sharma, X. Tie, H. Uppal, A. Venkataramani, D. Westbrook, and A. Yadav, "A global name service for a highly mobile internetwork," ACM SIGCOMM Comput. Commun. Rev., vol. 44, no. 4, pp. 247–258, 2015.
[7] A. Venkataramani, J. Kurose, D. Raychaudhuri, K. Nagaraja, M. Mao, and S. Banerjee, "MobilityFirst: A mobility-centric and trustworthy internet architecture," SIGCOMM Comput. Commun. Rev., vol. 44, no. 3, pp. 74–80, Jul. 2014.
[8] G. Karakostas and D. N. Serpanos, "Exploitation of different types of locality for web caches," in Proc. 7th Int. Symp. Comput. Commun. (ISCC), Jul. 2002, pp. 207–212.
[9] W. F. King III, "Analysis of demand paging algorithms," in Proc. IFIP Congr. Amsterdam, The Netherlands: North-Holland, 1971, pp. 485–490.
[10] E. Gelenbe, "A unified approach to the evaluation of a class of replacement algorithms," IEEE Trans. Comput., vol. C-22, no. 6, pp. 611–618, Jun. 1973.
[11] D. Starobinski and D. Tse, "Probabilistic methods for web caching," Perform. Eval., vol. 46, nos. 2–3, pp. 125–137, Oct. 2001.
[12] E. J. Rosensweig, J. Kurose, and D. Towsley, "Approximate models for general cache networks," in Proc. IEEE INFOCOM, Mar. 2010, pp. 1–9.
[13] R. Fagin, "Asymptotic miss ratios over independent references," J. Comput. Syst. Sci., vol. 14, no. 2, pp. 222–250, 1977.
[14] H. Che, Y. Tung, and Z. Wang, "Hierarchical web caching systems: Modeling, design and experimental results," IEEE J. Sel. Areas Commun., vol. 20, no. 7, pp. 1305–1314, Sep. 2002.
[15] D. S. Berger, P. Gland, S. Singla, and F. Ciucu, "Exact analysis of TTL cache networks," Perform. Eval., vol. 79, pp. 2–23, Sep. 2014.
[16] N. Gast and B. Van Houdt, "Asymptotically exact TTL-approximations of the cache replacement algorithms LRU(m) and h-LRU," in Proc. 28th Int. Teletraffic Congr. (ITC), Sep. 2016, pp. 157–165.
[17] S. Basu, A. Sundarrajan, J. Ghaderi, S. Shakkottai, and R. Sitaraman, "Adaptive TTL-based caching for content delivery," IEEE/ACM Trans. Netw., vol. 26, no. 3, pp. 1063–1077, Jun. 2018.
[18] J. Li, S. Shakkottai, J. C. S. Lui, and V. Subramanian, "Accurate learning or fast mixing? Dynamic adaptability of caching algorithms," IEEE J. Sel. Areas Commun., vol. 36, no. 6, pp. 1314–1330, Jun. 2018.
[19] G. S. Paschos, A. Destounis, L. Vigneri, and G. Iosifidis, "Learning to cache with no regrets," in Proc. IEEE INFOCOM Conf. Comput. Commun., Apr. 2019, pp. 235–243.
[20] R. Bhattacharjee, S. Banerjee, and A. Sinha, "Fundamental limits on the regret of online network-caching," Proc. ACM Meas. Anal. Comput. Syst., vol. 4, no. 2, pp. 1–31, Jun. 2020.
[21] S. Ioannidis and E. Yeh, "Jointly optimal routing and caching for arbitrary network topologies," IEEE J. Sel. Areas Commun., vol. 36, no. 6, pp. 1258–1275, Jun. 2018.
[22] T. L. Lai and H. Robbins, "Asymptotically efficient adaptive allocation rules," Adv. Appl. Math., vol. 6, no. 1, pp. 4–22, Mar. 1985.
[23] P. Auer, N. Cesa-Bianchi, and P. Fischer, "Finite-time analysis of the multiarmed bandit problem," Mach. Learn., vol. 47, no. 2, pp. 235–256, 2002.
[24] W. R. Thompson, "On the likelihood that one unknown probability exceeds another in view of the evidence of two samples," Biometrika, vol. 25, nos. 3–4, pp. 285–294, 1933.
[25] S. Agrawal and N. Goyal, "Further optimal regret bounds for Thompson sampling," in Artificial Intelligence and Statistics. Scottsdale, AZ, USA: PMLR, 2013, pp. 99–107.
[26] S. Bubeck and N. Cesa-Bianchi, "Regret analysis of stochastic and nonstochastic multi-armed bandit problems," Found. Trends Mach. Learn., vol. 5, no. 1, pp. 1–122, 2012.
[27] N. Cesa-Bianchi and G. Lugosi, "Combinatorial bandits," J. Comput. Syst. Sci., vol. 78, no. 5, pp. 1404–1422, Sep. 2012.
[28] K. Jamieson and R. Nowak, "Best-arm identification algorithms for multi-armed bandits in the fixed confidence setting," in Proc. 48th Annu. Conf. Inf. Sci. Syst. (CISS), Mar. 2014, pp. 1–6.
[29] D. Shah, T. Choudhury, N. Karamchandani, and A. Gopalan, "Sequential mode estimation with Oracle queries," in Proc. AAAI Conf. Artif. Intell., vol. 34, no. 4, 2020, pp. 5644–5651.
[30] V. Martina, M. Garetto, and E. Leonardi, "A unified approach to the performance analysis of caching systems," in Proc. IEEE INFOCOM Conf. Comput. Commun., Apr. 2014, pp. 2040–2048.
[31] M. Zink, K. Suh, Y. Gu, and J. Kurose, "Watch global, cache local: YouTube network traffic at a campus network: Measurements and implications," Proc. SPIE, vol. 6818, Jan. 2008, Art. no. 681805.
[32] N. Megiddo and D. S. Modha, "ARC: A self-tuning, low overhead replacement cache," in Proc. FAST, 2003, pp. 115–130.
[33] A. Dvoretzky, J. Kiefer, and J. Wolfowitz, "Asymptotic minimax character of the sample distribution function and of the classical multinomial estimator," Ann. Math. Statist., vol. 27, no. 3, pp. 642–669, 1956.
[34] W. Hoeffding, "Probability inequalities for sums of bounded random variables," in The Collected Works of Wassily Hoeffding. New York, NY, USA: Springer, 1994, pp. 409–426.
[35] S. Agrawal and N. Goyal, "Thompson sampling for contextual bandits with linear payoffs," in Proc. Int. Conf. Mach. Learn., 2013, pp. 127–135.
[36] J. Komiyama, J. Honda, and H. Nakagawa, "Optimal regret analysis of Thompson sampling in stochastic multi-armed bandit problem with multiple plays," 2015, arXiv:1506.00779. [Online]. Available: http://arxiv.org/abs/1506.00779
[37] S. Bubeck, V. Perchet, and P. Rigollet, "Bounded regret in stochastic multi-armed bandits," in Proc. Conf. Learn. Theory, 2013, pp. 122–134.
[38] P. Zerfos, M. Srivatsa, H. Yu, D. Dennerline, H. Franke, and D. Agrawal, "Platform and applications for massive-scale streaming network analytics," IBM J. Res. Develop., vol. 57, nos. 3–4, pp. 1–11, 2013.
[39] X. Cheng, C. Dale, and J. Liu, "Statistics and social network of YouTube videos," in Proc. 16th Int. Workshop Qual. Service, Jun. 2008, pp. 229–238.
[40] D. Kifer, S. Ben-David, and J. Gehrke, "Detecting change in data streams," in Proc. VLDB, vol. 4. Toronto, ON, Canada, 2004, pp. 180–191.

Archana Bura is currently pursuing the Ph.D. degree with the Department of Electrical and Computer Engineering, Texas A&M University. Her research interests include reinforcement learning, optimization, and their applications to wireless networks.

Desik Rengarajan is currently pursuing the Ph.D. degree with the Department of Electrical and Computer Engineering, Texas A&M University. His research interests include reinforcement learning and game theory, with a focus on their application to the real world.

Dileep Kalathil (Senior Member, IEEE) received the Ph.D. degree from the University of Southern California (USC) in 2014. From 2014 to 2017, he was a Post-Doctoral Researcher with the Department of Electrical Engineering and Computer Sciences, University of California, Berkeley. He is currently an Assistant Professor with the Department of Electrical and Computer Engineering, Texas A&M University, College Station, TX, USA. His research interests include reinforcement learning, with applications in communication networks, power systems, and intelligent transportation systems. He was a recipient of the NSF CAREER Award in 2021, the NSF CRII Award in 2019, the Best Ph.D. Dissertation Award from the Department of Electrical Engineering, USC, from 2014 to 2015, and the Best Academic Performance Award from the EE Department, IIT Madras, in 2008.

Srinivas Shakkottai (Senior Member, IEEE) received the Ph.D. degree in electrical and computer engineering from the University of Illinois at Urbana-Champaign in 2007. He was a Post-Doctoral Scholar in management science and engineering with Stanford University in 2007. He joined Texas A&M University in 2008, where he is currently a Professor of computer engineering with the Department of Electrical and Computer Engineering. His research interests include caching and content distribution, wireless networks, multi-agent learning and game theory, and network data collection and analytics. He was a recipient of the Defense Threat Reduction Agency Young Investigator Award (2009), the NSF CAREER Award (2012), and Research Awards from Cisco (2008) and Google (2010). He also received an Outstanding Professor Award (2013), the Select Young Faculty Fellowship (2014), and the Engineering Genesis Award (2019) at Texas A&M University.

Jean-Francois Chamberland (Senior Member, IEEE) received the Ph.D. degree from the University of Illinois at Urbana-Champaign. He is currently a Professor with the Department of Electrical and Computer Engineering, Texas A&M University. His research interests include computing, information, and inference. He was a recipient of the IEEE Young Author Best Paper Award from the IEEE Signal Processing Society and the Faculty Early Career Development (CAREER) Award from the National Science Foundation. He served as an Associate Editor for the IEEE TRANSACTIONS ON INFORMATION THEORY from 2017 to 2020.