systems, such as chip-level L1, L2 caches leading up to the main memory or storage [3], and sequences of content caches beginning close to a user and leading up to a data center [4]. In the representative scenario where there is just a single caching element in the hierarchy, it would see all requests, as illustrated in Fig. 1 (a). A cache hit would mean that the item can be serviced from the cache, while a miss would mean that the request would be forwarded to the library. Since the cache sees all requests, whether they result in hits or misses, the caching algorithm simply needs to quickly and efficiently determine the top items to cache.

B. Caching to Learn With Partial Observations

In a cache routing approach, requests for content are routed towards caches that are believed to possess the content (or along a default path without any information). Such routing might be done via a name server such as a DNS redirect, and a cache only sees requests that are directed towards it. In turn, this depends on whether the cache has the content in question. Applications of this model include caching at cellular base stations [5], and content dissemination using a Global Name Service [6] (similar to DNS) such as MobilityFirst [7]. The representative single caching element scenario is shown in Fig. 1 (b), and we refer to it as the partial observation regime.

In a typical cache network, requests are seen at multiple caches, but request information is either not aggregated, or only partially aggregated. Specifically, only request summaries might be disseminated over time. The availability of such summaries implies that the content provider often has a good idea of the nature of the arrival distribution, for instance the Zipf parameter that it follows and the timescale at which changes are observed. However, each cache does not see the hits and misses of the other caches in real time, nor does it know the identity of the popular items in advance. Thus, explicit caching actions must be taken in order for a cache to learn what is popular. Hence, we have a regime in which structural information about the distribution could be known, but individual caches are not up to date on each other's hits and misses as they occur, and need to learn popularity as the arrival process changes, corresponding to the partial observation case.

The goal of this work is to conduct a systematic analysis of caching from the perspective of regret, with the idea that a low-regret algorithm implies fast and accurate learning in finite time, and hence should be usable in a setting where the popularity distribution changes with time. Can we design regret-optimal algorithms that apply to each of our learning paradigms?

1) Main Results: In our analytical model, we consider a system in which one request arrives at each discrete time unit, i.e., the total number of requests is the same as the elapsed time T. We begin with the insight that, under the full observation regime, the empirical frequency is a sufficient statistic of all information on the popularity distribution received thus far. Here, there is no exploration problem, and the goal is simply to exploit the observations received by estimating the empirical frequencies. Hence, the appropriate use of this estimate is to choose the top C most frequent items to cache. This approach is identical to LFU, since it evicts the item with the least empirical frequency at each time. Our first result is to show that LFU has an O(1) regret, not only with respect to time T, but also with respect to library size. It can also be shown that the regret of LRU is provably high. Intuitively, LRU does not learn the true popularity of the requests; rather, it keeps track of the recently arrived requests. Thus it suffers a constant regret at each time step, resulting in Ω(T) cumulative regret.

While LFU is known to attain high hit rates, it suffers from the fact that the number of counters is the same as the library size, since every request must be counted. This is clearly prohibitive, and has given rise to approximations such as W-LFU [8], which only keeps counts within a moving window of requests, and TinyLFU [2], which uses a sketch for approximate counting. Our next result is to show that these approximations never entirely eliminate the error in estimating the popularity distribution, leading to the worst possible regret of Ω(T).

We then propose a variant of LFU that we term LFU-Lite, under which we use a moving window of requests to decide whether or not a particular item appears to be popular enough to be counted accurately. Thus, we maintain a counter bank, and only count those content items that meet a threshold frequency in any window of requests thus far. The counter bank size grows in a concave manner with time, and we find the expected size needed to ensure O(1) regret for a target time T. Thus, given a time constant of change in popularity, we can decide on the ideal number of counters.

We next consider the partial observation regime, wherein the cache can only see requests for items currently cached in it. We relate this problem to the classical multi-armed bandit (MAB), under which actions must be taken to learn the value of pulling the different arms. Hence, explicit exploration actions are needed in this regime. We first consider an algorithm that builds up the correct posterior probabilities given the requests seen thus far, and caches the most frequent items in a sample of this posterior distribution. Although its empirical performance is excellent, maintaining the full posterior sampling (FPS) quickly becomes prohibitively difficult.

We then consider an algorithm that simply conducts marginal posterior sampling (MPS) by updating counts only for the items that are in the cache. Here, counts of hits and misses are awarded to the appropriate cached item, but a miss (which manifests itself as no request being made to the cache) is not used to update the posterior distribution of items not in the cache. Clearly, we are not using the information effectively, and this is reflected in the regret scaling as O(log T). This result is similar to earlier work [5].

We then ask whether we can exploit the structure of the problem to do better. In particular, suppose that we know that requests will follow a certain probability distribution (e.g., Zipf), although we do not know the ranking of items (i.e., we do not know which ones are the most popular). We develop a Structured Information (SI) algorithm that uses this information about the distribution to reduce the regret to O(1). We also describe a "Lite" version of the SI algorithm, similar to LFU-Lite, to reduce the number of counters.

We first verify our analytical results via numerical simulations conducted using an IRM model drawn from Zipf distributions with different parameters, library sizes, and cache sizes. We also find that Lite-type schemes appear to empirically perform even better than predicted by the analytical results. We then construct versions of the algorithms that are capable of following a changing popularity distribution by simply "forgetting" counts, which takes the form of periodically halving the counts in the counters. The expectation is that a
low-regret algorithm, augmented with such a forgetting rule with an appropriately chosen periodicity, should be able to track a moving popularity distribution accurately. We conduct trace-based simulations using (non-stationary) data sets obtained from IBM and YouTube, and compare hit performance against the ubiquitous LRU algorithm. We show that the LFU variants outperform LRU, and that incorporating forgetting enhances their hit rates.

Since the amount of change over time in the existing traces is low, we stress test our algorithms by creating a synthetic trace that has higher changes in popularity over time. Again, we show that the versions of our algorithms that incorporate forgetting are able to track such changing distributions, and are still able to outperform LRU, which builds a case for their eventual adoption. We further incorporate an online change detection mechanism into our algorithms to detect changes in popularity on the fly. We show via a synthetic non-stationary trace that the online change detection scheme, when combined with the simple forgetting rule, makes our algorithms robust to changes in popularity under non-stationary traffic.

2) Related Work: Existing analytical studies of caching algorithms largely follow the IRM model, with the focus being on closed-form results of the stationary hit probabilities of LRU, FIFO, RANDOM, and CLIMB [1], [9]–[11]. The expressions are often hard to compute for large caches, and approximations have been proposed for larger cache sizes [12]. Of particular interest is the Time-To-Live (TTL) approximation [13]–[16] that associates each cached item with a lifetime after which it is evicted. Appropriate choice of this lifetime enables the accurate approximation of different caching schemes [15].

Recent work on performance analysis of caching algorithms has focused on the online learning aspect. For instance, [17] proposes TTL-based schemes to show that a desired hit rate can be achieved under non-stationary arrivals. Other work such as [18] characterizes the mixing times of several simple caching schemes such as LRU, CLIMB, k-LRU, etc., with the goal of identifying their learning errors as a function of time. However, the algorithms studied all have stationary error (they never learn perfectly) and so regret in our context would be Ω(T).

An alternative approach is taken in [19], [20], where the request arrival process is taken to be adversarial. These works present asymptotic and non-asymptotic regret lower bounds, respectively, and show that a coded and an uncoded policy, respectively, achieve this bound. As in many algorithm design and analysis settings, the adversarial model and the stochastic (Bayesian) model produce significantly different results that are not directly comparable. For instance, [20] shows that the LFU algorithm incurs an Ω(T) regret in the adversarial setting, i.e., the bound suggests poor performance. In contrast, in the stochastic arrival setting, we show among other results that the LFU algorithm will achieve the best possible regret of O(1) in the full observation regime. Our results are supported through empirical trace-based (non-adversarial) simulations.

Information Centric caching has gained much recent interest, and is particularly relevant to edge wireless networks. Joint caching and routing is studied in [21], where the objective is to show asymptotic accuracy of the placements, rather than the finite time performance that we focus on. Closest to our ideas on the partial observation model is work such as [5], which draws a parallel between bandit algorithms and caching under this setting. However, the algorithms considered are in the manner of the traditional Multi-Armed Bandit (MAB) approach that does not account for problem structure, and hence can only attain O(log T) regret.

With regard to the MAB problem, Lai and Robbins [22] showed in seminal work the Ω(log T) regret lower bound pertaining to any online learning algorithm. An index-based algorithm using the upper confidence idea (the UCB1 algorithm) was proposed in [23], which enabled a simple implementation while achieving the optimal regret. The posterior sampling approach, first proposed by Thompson in [24], has recently been shown to attain optimal regret [25]. For a detailed survey, we point to a monograph [26] and a recent book [27]. Another line of work in bandits is related to best-arm identification [28], [29], which can be considered as a pure exploration problem. In our manuscript, the full observation setting does not need to perform exploration. The exploration vs. exploitation trade-off naturally arises in the partial observation regime; hence, in that theme, we follow approaches inspired by the multi-armed bandit literature. Although there are similarities, the basic approaches and theoretical guarantees provided by MAB and the best-arm identification problems are different.

Much work also exists on the empirical performance evaluation of caching algorithms using traces gathered from different applications. While several discover fundamental insights [30]–[32], our goal in this work is on analytical performance guarantees, and we do not provide a comprehensive review.

II. SYSTEM MODEL

We consider the optimal cache content placement problem in a communication network. The library, which is the set of all files, is denoted by L = {1, . . . , L}. We assume for expositional simplicity that all files are of the same size, and that the cache has a capacity of C, i.e., it can store C files at a given time. We denote the popularity of the files by the profile μ = (μ_1, μ_2, . . . , μ_L), with Σ_i μ_i = 1. Without loss of generality, we assume that μ_1 > μ_2 > · · · > μ_L. Let x(t) ∈ L be the file request received at time t. We assume that requests are generated independently according to the popularity profile μ, i.e., P(x(t) = i) = μ_i.

Let C(t) denote the set of files placed in the cache by the caching algorithm at time t. We say that the cache gets a hit if x(t) ∈ C(t) and a miss if x(t) ∉ C(t). The goal of the caching algorithm is to maximize the expected cumulative hits over time, E[ Σ_{t=1}^{T} 𝟙{x(t) ∈ C(t)} ], where the expectation is over the randomness in the requests and the ensuing choices of C(t) made by the caching algorithm. Clearly, if the popularity distribution μ is known, the optimal caching policy is to place the most popular items in the cache at all times, i.e., C*(t) = C, where C = {1, 2, . . . , C}. However, in most real world applications, the popularity distribution is unknown to the caching algorithm a priori. So the goal of a caching algorithm is to learn the popularity distribution (or part of it) from the sequential observations, and to place files in the cache by judiciously using the available information at each time in order to maximize the expected cumulative hits.

In the literature on multi-armed bandits, it is common to characterize the performance of an online learning algorithm using the metric of regret, which is defined as the performance loss of the algorithm as compared to the optimal strategy with complete information. Since C*(t) = C, the cumulative regret
of a caching algorithm after T time steps is defined as

R(T) = Σ_{t=1}^{T} [ 𝟙{x(t) ∈ C} − 𝟙{x(t) ∈ C(t)} ].   (1)

Let s(t) be the observation available to the caching algorithm at time t and let h(t) = (s(1), . . . , s(t − 1)) be the history of observations until time t. The optimal caching problem is defined as the problem of finding a policy π that maps h(t) to C(t), i.e., C(t) = π_t(h(t)), in order to minimize the expected cumulative regret, E[R(T)].

The choice of the caching policy will clearly depend on the nature of the sequential observations available to it. We consider two different observation structures that are most common in communication networks.

1) Full Observation: In the full observation structure, we assume that the caching algorithm is able to observe the file request at each time, i.e., s(t) = x(t). In the setup of a cache and library, this regime corresponds to all requests being sent to the cache, which can then forward the request to the library in case of a miss.

2) Partial Observation: In the partial observation structure, the caching algorithm can observe the request only in the case of a hit, i.e., only if the requested item is already in the cache. More precisely, we define s(t) = x(t) 𝟙{x(t) ∈ C(t)} under this observation structure. In the case of a miss, s(t) = 0. In the setup of a cache and library, this regime corresponds to the context of information centric caching, wherein requests are forwarded to the cache only if the corresponding content is cached.

Below, we propose different caching algorithms to address the optimal caching problem under these two observation structures.

III. CACHING WITH FULL OBSERVATION

We first consider the full observation structure, where the caching algorithm can observe every file request. Our focus is on a class of algorithms following Least Frequently Used (LFU) eviction, since it uses cumulative statistics of all received requests (unlike other popular algorithms such as Least Recently Used (LRU)), and so is likely to have low regret. As indicated earlier, this is purely an exploitation problem, since every request is seen at the cache, and so all hits and misses are known regardless of the cached items. This regime can be compared to a multi-armed bandit in which the reward of every arm is revealed irrespective of the arm that is pulled, i.e., no exploration is needed. LFU places in the cache the top C items according to the empirical frequency estimate μ̂(t), i.e., C_LFU(t) = arg max_C μ̂(t), where arg max_C indicates the indices of the top C elements of the vector μ̂(t). Having established these notions, we present below the finite time performance guarantee for the LFU algorithm.
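Before stating the guarantee, the following is a minimal Python sketch of this setting (our own illustration, not code from the paper; names such as simulate_lfu are ours): LFU keeps one counter per library item, caches the top C counts, and the regret of (1) is tallied against the static top-C placement. The popularity profile mu is visible only to the request generator.

    import numpy as np

    def simulate_lfu(mu, C, T, seed=0):
        # LFU under full observation: one counter per library item, cache the
        # C items with the largest empirical counts, and track the regret (1)
        # against the static top-C placement.
        rng = np.random.default_rng(seed)
        L = len(mu)
        counts = np.zeros(L)
        optimal = set(np.argsort(mu)[::-1][:C].tolist())
        cache = set(range(C))                      # arbitrary initial placement
        regret, cum = [], 0.0
        for _ in range(T):
            x = int(rng.choice(L, p=mu))           # P(x(t) = i) = mu_i
            cum += (x in optimal) - (x in cache)   # one term of R(T) in (1)
            counts[x] += 1                         # every request is observed
            cache = set(np.argsort(counts)[::-1][:C].tolist())  # evict least frequent
            regret.append(cum)
        return regret

    # Example: Zipf(1) popularities over a small library.
    w = 1.0 / np.arange(1, 201)
    print(simulate_lfu(w / w.sum(), C=10, T=5000)[-1])

In such a run the cumulative regret flattens after an initial learning phase, which is the behavior that Theorem 1 quantifies.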
Theorem 1: The LFU algorithm has an expected regret of O(1). More precisely,

E[R(T)] < min { 16/Δ_min², 4C(L − C)/Δ_min },

where Δ_min = μ_C − μ_{C+1}.

Remark 1: We note that both terms of the regret upper bound are distribution dependent, i.e., they depend on Δ_min. Roughly, if LC < 1/Δ_min, then the second term dominates.

We will use the following lemma for proving Theorem 1.

Lemma 2: For ε > 0, we have

P( max_i |μ̂_i(t) − μ_i| > ε ) ≤ 2 e^{−tε²/2}.

The lemma is obtained through an application of the Dvoretzky-Kiefer-Wolfowitz inequality [33]. We omit the proof due to the page limitation.

We proceed with the proof of Theorem 1.

Proof: We denote C_LFU(t) simply as C(t) for notational convenience. We first argue that if max_i |μ̂_i(t) − μ_i| < Δ_min/2, then C(t) = C. Indeed, if max_i |μ̂_i(t) − μ_i| < Δ_min/2, for any j ∈ C and for any k ∈ L \ C,

μ̂_j(t) ≥ μ_j − Δ_min/2 ≥ μ_C − Δ_min/2 ≥ μ_{C+1} + Δ_min/2 ≥ μ_k + Δ_min/2 ≥ μ̂_k(t),

and hence C(t) = C. The expected regret can then be bounded, with

E[R(T)] = E[ Σ_{t=1}^{T} ( 𝟙{x(t) ∈ C} − 𝟙{x(t) ∈ C(t)} ) ]
  ≤ E[ Σ_{t=1}^{T} 𝟙{C(t) ≠ C} ] = Σ_{t=1}^{T} P(C(t) ≠ C)
  ≤ Σ_{t=1}^{T} P( max_i |μ̂_i(t) − μ_i| ≥ Δ_min/2 )
  ≤ Σ_{t=1}^{T} 2 e^{−tΔ_min²/8} ≤ Σ_{t=0}^{∞} 2 e^{−tΔ_min²/8} ≤ 16/Δ_min².   (2)

We can also upper bound E[R(T)] using a different approach, to show the trade-off between Δ_min² and Δ_min, L, and C. This approach makes use of Hoeffding's inequality:

E[R(T)] ≤ E[ Σ_{t=1}^{T} Σ_{j=1}^{C} Σ_{k=C+1}^{L} Δ_{j,k} 𝟙( μ̂_k(t) > μ̂_j(t) ) ]
  ≤ E[ Σ_{t=1}^{T} Σ_{j=1}^{C} Σ_{k=C+1}^{L} Δ_{j,k} ( 𝟙{μ̂_j(t) − μ_j ≤ −Δ_{j,k}/2} + 𝟙{μ̂_k(t) − μ_k > Δ_{j,k}/2} ) ].   (5)

Using the Hoeffding inequality [34], we obtain

P( μ̂_j(t) − μ_j ≤ −Δ_{j,k}/2 ) ≤ e^{−tΔ_{j,k}²/2},
P( μ̂_k(t) − μ_k > Δ_{j,k}/2 ) ≤ e^{−tΔ_{j,k}²/2}.

Now, continuing from (5) and taking the expectation inside the summation, we obtain

E[R(T)] ≤ Σ_{j=1}^{C} Σ_{k=C+1}^{L} Δ_{j,k} Σ_{t=1}^{T} 2 e^{−tΔ_{j,k}²/2}
  ≤ Σ_{j=1}^{C} Σ_{k=C+1}^{L} 4/Δ_{j,k} ≤ 4C(L − C)/Δ_min.   (6)

Combining (2) and (6), we obtain the desired result.

B. WLFU Algorithm

LFU achieves a regret of O(1), but its implementation is expensive in terms of memory requirements. This cost arises because LFU maintains a popularity estimate for each item in the library (μ̂_i(t)), and the library size L is extremely large for most practical applications. Typically, allocating memory to maintain the popularity distribution estimate for the whole library is impractical.

There are many approaches proposed to address this issue [2], [8]. However, most approaches rely on heuristic approximations of the empirical estimate, often with a tight pre-determined constraint on the memory. This leads to non-optimal use of the available information, and could result in poor performance of the corresponding algorithms.

In this article, we consider the Window-LFU (WLFU) algorithm [8], which has been proposed as a way to overcome the expensive memory requirement of LFU. WLFU employs a sliding window approach. At each time t, the algorithm keeps track of only the past w file requests. This is equivalent to maintaining a time window from t − w to t, denoted by W[t − w, t). Caching decisions are made based on the file requests that appeared within this window. In particular, the items to be placed in the cache at time t, C_WLFU(t), are the top C files with the maximum appearances in the window W[t − w, t).

We now show that the expected cumulative regret incurred by WLFU increases linearly in time (Ω(T)), as opposed to the constant regret (O(1)) of the standard LFU. Since Ω(T) is the worst possible regret for any learning algorithm, it suggests that in practice there will occasionally be arbitrarily bad sample paths with many misses.

Theorem 3: Under the WLFU algorithm, E[R(T)] = Ω(T).

Proof: This result can be established by finding a lower bound on the probability that the cache does not match the most likely items. From the proof of Theorem 1 (c.f. (3)), we have

E[R(T)] = E[ Σ_{t=1}^{T} ( 𝟙{x(t) ∈ C} − 𝟙{x(t) ∈ C(t)} ) ]
  = Σ_{t=1}^{T} ( Σ_{j∈C} μ_j − E[𝟙{x(t) ∈ C(t)}] )
  = E[ Σ_{t=1}^{T} ( Σ_{j∈C\C(t)} μ_j − Σ_{k∈C(t)\C} μ_k ) ]
  ≥ E[ Σ_{t=1}^{T} Σ_{k∈C(t)\C} ( μ_C − μ_k ) ]
  = E[ Σ_{t=1}^{T} Σ_{k∈L\C} ( μ_C − μ_k ) 𝟙{k ∈ C(t)} ]
  = Σ_{t=1}^{T} Σ_{k∈L\C} ( μ_C − μ_k ) P(k ∈ C(t))
  ≥ Σ_{t=1}^{T} ( μ_C − μ_{C+1} ) P(C + 1 ∈ C(t)),   (7)

where the last inequality follows by focusing on a sub-event. Given that the probability of item C + 1 being in the cache is non-zero, we can establish the desired lower bound using the window W[t − w, t):

P(C + 1 ∈ C(t)) ≥ P( {x(τ) = C + 1 : τ ∈ [t − w, t)} ) = (μ_{C+1})^w.

Combining this result with (7), we get the expression

E[R(T)] ≥ (μ_C − μ_{C+1}) (μ_{C+1})^w T,

which has order T. Since the cost per stage is bounded, we obtain the statement of the theorem.

C. LFU-Lite Algorithm

We now propose a new scheme that we call the LFU-Lite algorithm. Unlike the LFU algorithm, the LFU-Lite algorithm does not maintain an estimate of the popularity of each item in the library. Instead, it maintains the popularity estimate only for a subset of the items that it has observed. This approach significantly reduces the memory required as compared to the standard LFU implementation. At the same time, we show that LFU-Lite achieves an O(1) regret similar to that of LFU, and thus has a superior performance compared to WLFU, which suffers an Ω(T) regret.

We achieve this 'best of both' performance by a clever combination of a window-based approach to decide the items for which to maintain an estimate, and a separate counter bank to keep track of these estimates. At each time t, LFU-Lite selects the top C items with the maximum appearances in the window of observation W[t − w, t]. We denote this set of files as A(t). Let B(t − 1) be the set of items in the counter bank at the beginning of t. Then, if any item j ∈ A(t) is not present in B(t − 1), it is added to the counter bank, and the counter bank is updated to B(t). Once an item is placed in the counter bank, it is never removed from the counter bank.
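Both WLFU and the admission step of LFU-Lite reduce to one primitive: the top C items of the sliding window W[t − w, t). A minimal sketch of that primitive (our own code and naming, not the authors' implementation):

    from collections import Counter, deque

    class SlidingWindowTopC:
        # Maintain the last w requests and report the C items with the most
        # appearances in W[t - w, t). WLFU caches this set directly; LFU-Lite
        # only uses it to decide which items earn a long-lived counter.
        def __init__(self, w, C):
            self.w, self.C = w, C
            self.window = deque()
            self.freq = Counter()

        def observe(self, x):
            self.window.append(x)
            self.freq[x] += 1
            if len(self.window) > self.w:          # slide the window forward
                old = self.window.popleft()
                self.freq[old] -= 1
                if self.freq[old] == 0:
                    del self.freq[old]

        def top_c(self):
            return [item for item, _ in self.freq.most_common(self.C)]

Because WLFU discards everything older than w requests, unlucky windows (for instance w consecutive requests for item C + 1, the event used in the proof of Theorem 3) recur at a constant rate, which is the source of its Ω(T) regret.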
LFU-Lite maintains an estimate of the popularity of each item in the counter bank. The popularity estimate of item i ∈ B(t), μ̂_i(t), is defined as

μ̂_i(t) = ( 1/(t − t_i) ) Σ_{τ=t_i+1}^{t} 𝟙{x(τ) = i},   (8)

where t_i is the time at which item i was added to the counter bank. The set of items to be placed in the cache at time t, C_LL(t), is then selected as

C_LL(t) = arg max_C ( μ̂_j(t), j ∈ B(t) ).

A description of LFU-Lite is also given in Algorithm 1.

Algorithm 1 LFU-Lite
for t = 1, . . . , T do
  Observe x(t)
  Select A(t), the top C files with the maximum appearances in the window W[(t − w)+, t)
  for each j ∈ A(t) do
    if j ∈ A(t) is not in B(t − 1) then
      t_j ← t
      Add file j into B(t)
    end if
  end for
  Select the files C_LL(t) = arg max_C (μ̂_j(t), j ∈ B(t)) and place them in the cache
end for
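A compact Python rendering of Algorithm 1 (a sketch under our own naming; the class LFULite and its fields are not from the paper). The estimate follows (8): counting for item i starts only at the time t_i at which i enters the counter bank, and the bank only grows.

    from collections import Counter, deque

    class LFULite:
        # Sketch of Algorithm 1: a sliding window admits items into a counter
        # bank B (which only grows); only items in B are counted, and the cache
        # holds the C bank items with the largest estimates (8).
        def __init__(self, C, w):
            self.C, self.w = C, w
            self.window, self.win_freq = deque(), Counter()
            self.entry_time = {}        # t_i: when item i entered the bank
            self.bank_hits = Counter()  # requests for i observed after t_i
            self.t = 0

        def _estimate(self, i):         # mu_hat_i(t) as in (8)
            elapsed = self.t - self.entry_time[i]
            return self.bank_hits[i] / elapsed if elapsed > 0 else 0.0

        def step(self, x):
            self.t += 1
            # full observation: every request updates the window ...
            self.window.append(x); self.win_freq[x] += 1
            if len(self.window) > self.w:
                old = self.window.popleft(); self.win_freq[old] -= 1
                if self.win_freq[old] == 0:
                    del self.win_freq[old]
            # ... and the bank counter of x, if x is already tracked
            if x in self.entry_time:
                self.bank_hits[x] += 1
            # admission A(t): the top-C window items join the bank, never to leave
            for i, _ in self.win_freq.most_common(self.C):
                self.entry_time.setdefault(i, self.t)
            # cache C_LL(t): the C tracked items with the largest estimates
            ranked = sorted(self.entry_time, key=self._estimate, reverse=True)
            return set(ranked[:self.C])

The memory cost is the size of entry_time, i.e., the counter bank B(t), whose expected growth is characterized later in (16).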
We now present the performance guarantee for the LFU-Lite algorithm.

Theorem 4: The expected regret under the LFU-Lite algorithm is O(1).

The first term in (10) can be bounded as

E[ Σ_{t=1}^{T} Σ_{j=1}^{C} Σ_{k=C+1}^{L} Δ_{j,k} 𝟙{μ̂_j(t) − μ_j ≤ −Δ_{j,k}/2} ]
  = Σ_{j=1}^{C} Σ_{k=C+1}^{L} E[ E[ Σ_{t=1}^{T} Δ_{j,k} 𝟙{μ̂_j(t) − μ_j ≤ −Δ_{j,k}/2} | t_j ] ]
  ≤ Σ_{j=1}^{C} Σ_{k=C+1}^{L} E[ Δ_{j,k} ( t_j + E[ Σ_{t=t_j}^{T} 𝟙{μ̂_j(t) − μ_j ≤ −Δ_{j,k}/2} | t_j ] ) ]
  ≤ Σ_{j=1}^{C} Σ_{k=C+1}^{L} E[ Δ_{j,k} ( t_j + Σ_{t=t_j}^{T} P( μ̂_j(t) − μ_j ≤ −Δ_{j,k}/2 | t_j ) ) ]
  ≤ Σ_{j=1}^{C} Σ_{k=C+1}^{L} E[ Δ_{j,k} ( t_j + Σ_{t=t_j}^{T} e^{−(t−t_j)Δ_{j,k}²/2} ) ]
  ≤ Σ_{j=1}^{C} Σ_{k=C+1}^{L} ( Δ_{j,k} E[t_j] + 2/Δ_{j,k} ).   (11)

Similarly, the second term in (10) can be bounded as

E[ Σ_{t=1}^{T} Σ_{j=1}^{C} Σ_{k=C+1}^{L} Δ_{j,k} 𝟙{μ̂_k(t) − μ_k > Δ_{j,k}/2} ]
  = Σ_{j=1}^{C} Σ_{k=C+1}^{L} E[ E[ Σ_{t=1}^{T} Δ_{j,k} 𝟙{μ̂_k(t) − μ_k > Δ_{j,k}/2} | t_k ] ]
  = Σ_{j=1}^{C} Σ_{k=C+1}^{L} E[ Σ_{t=t_k}^{T} Δ_{j,k} P( μ̂_k(t) − μ_k > Δ_{j,k}/2 | t_k ) ]
  ≤ Σ_{j=1}^{C} Σ_{k=C+1}^{L} E[ Σ_{t=t_k}^{T} Δ_{j,k} e^{−(t−t_k)Δ_{j,k}²/2} ].

Proof: Let p̄_i^t be the probability with which file i enters the counter bank by time t. Note that p̄_i^t = 1 − (1 − p_i)^{t/w}, where

p_i = Σ_{n=μ_{C+1}w+1}^{w} ( w choose n ) μ_i^n (1 − μ_i)^{w−n}

is the probability that item i enters the counter bank in any given window. Then

E[B(t)] = E[ Σ_{i=1}^{L} 𝟙{i ∈ B(t)} ] = Σ_{i=1}^{L} p̄_i^t = Σ_{i=1}^{L} ( 1 − (1 − p_i)^{t/w} ).   (16)

Observe that 1 − (1 − p_i)^{t/w} is concave in t and E[B(t)] is a sum of L concave functions, and is hence concave.

Remark 2: Intuitively, the counter bank will keep counts only for the more popular items. It is also straightforward to show that the expected size of the counter bank decreases with the window length w. To see this, consider two different window lengths w_1 and w_2 such that w_1 ≥ w_2. Let p_i(w) be the probability that item i enters the counter bank in any given window when the window length is w. Then, for i ∉ C, p_i(w_1) ≤ p_i(w_2). Intuitively, a larger window length leads to more observations and hence to a smaller probability of observing item i ∉ C more than the threshold μ_{C+1}w times. Now, (1 − p_i(w_2))^{t/w_2} ≤ (1 − p_i(w_1))^{t/w_2} ≤ (1 − p_i(w_1))^{t/w_1}. So, 1 − (1 − p_i(w_2))^{t/w_2} ≥ 1 − (1 − p_i(w_1))^{t/w_1}. Hence, the contribution of i ∉ C to the expected size of the counter bank according to (16) is smaller for a larger window length. The exact dependence of E[B(t)] on w is cumbersome to characterize. We, however, illustrate this through extensive simulations in Section IV-C.

IV. CACHING WITH PARTIAL OBSERVATION

We now consider the problem of optimal caching under the partial observation regime. As described earlier, here the algorithm can observe a file request only if the requested file is in the cache. Hence, the caching algorithm has to perform active exploration by placing a file in the cache sufficiently often to learn its popularity, in order to decide if that file belongs to the set of the most popular files. This procedure is in sharp contrast to the full observation structure, where the popularity estimate of each file in the library can be improved after each time step due to full visibility of all the requests. However, the exploration is costly, because the algorithm incurs regret every time that a sub-optimal file is placed in the cache for exploration. Hence, the algorithm also has to perform active exploitation, i.e., place the most popular items according to the current estimate in the cache. The optimal exploration vs. exploitation trade-off for minimizing the regret is at the core of most online learning algorithms. The Multi-Armed Bandit (MAB) model is a canonical formalism for this class of problems. Here, there are multiple arms (actions) that yield random rewards independently over time, with the (unknown) mean of arm i being μ_i. The objective is to learn the mean reward of each arm by exploration and maximize the cumulative reward by exploitation.

We formulate the caching problem as a multi-player multi-armed bandit problem, in which content placement in the cache is viewed as equivalent to arm pulls. The request for an item is considered as its reward, which is a {0, 1} random variable sampled from the popularity vector. We call this formulation "Caching Bandits". Unlike in the multi-armed bandit problem, the rewards in the caching bandit problem are not independent, as the request for one item in the cache indicates that there is no request for the other items.

A. Caching Bandit With Full Posterior Sampling

Posterior sampling based algorithms for the MAB [25], [35] typically use a Beta prior (with Bernoulli likelihood) or a Gaussian prior (with Gaussian likelihood) in order to exploit the conjugate pair property of the prior and likelihood (reward) distributions. Hence, the posterior at any time will have the same form as the prior distribution, albeit with different parameters. This provides a computationally tractable and memory efficient way to keep track of the posterior distribution evolution. However, in the optimal caching problem, the unknown popularity vector μ has interdependent components through the constraint Σ_i μ_i = 1. Hence, standard prior distributions like the Beta will not be able to capture the full posterior evolution in the caching problem.

We use a Dirichlet prior on the popularity distribution μ = (μ_1, . . . , μ_L), parametrized by α = (α_1, . . . , α_L). More precisely,

f_0(μ; α) = ( 1/B(α) ) Π_{i=1}^{L} μ_i^{α_i − 1},  where  B(α) = Π_{i=1}^{L} Γ(α_i) / Γ( Σ_{i=1}^{L} α_i ),

and Γ(·) is the Gamma function.

Let f_t be the posterior distribution at time t with parameter α(t). The posterior is updated according to the observed information s(t). In the case of a hit, the file request x(t) is observed and s(t) = x(t). It is easy to see that the correct posterior update is α(t + 1) = α(t) + e_{x(t)}, where e_{x(t)} is the unit vector with its non-zero element at index x(t).

The posterior update is complex in the case of a cache miss. In the case of a miss, we code s(t) = 0. Given the current parameter α(t) = α, we can show that the posterior distribution in the case of a miss can be computed as

f_{t+1}(μ | s(t) = 0) ∝ P(s(t) = 0 | μ) f_t(μ; α) = ( 1/Σ_{i=1}^{L} α_i ) Σ_{j ∉ C(t)} α_j f_t(μ; α + e_j).

Algorithm 2 CB-FPS Algorithm
Initialize the prior distribution f_0
for t = 1, . . . , T do
  Sample μ̂(t) ∼ f_t(·)
  Select C_FPS(t) = arg max_C μ̂(t)
  Receive the observation s(t)
  Update the posterior f_{t+1}(μ) ∝ P(s(t) | μ) f_t(μ)
end for

Hence, the posterior update in the case of a miss is a combination of (L − C) Dirichlet priors from the previous step. With the first miss, the algorithm needs to store a set of size (L − C) consisting of Dirichlet parameters. With each miss, parameter sets of size (L − C) need to be stored, one such set for each of the parameters at the previous step. Thus, at the t-th miss, the number of parameters that we need to store is (L − C)^t, growing exponentially in t. Hence, as the number of misses increases, the memory required to store these parameters will increase exponentially, rendering the full posterior update algorithm infeasible from an implementation perspective.
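A small sketch of the Dirichlet bookkeeping behind CB-FPS (our own code; the function names and the explicit mixture representation are illustrative assumptions). A hit is the usual conjugate update α ← α + e_x(t); a miss turns each posterior component into a mixture over the (L − C) non-cached items, which is exactly the (L − C)^t blow-up described above.

    import numpy as np

    rng = np.random.default_rng(0)

    def fps_update(components, cache, s):
        # One CB-FPS posterior update. `components` is a list of (weight, alpha)
        # pairs representing a mixture of Dirichlet densities; s is the request
        # on a hit, or None on a miss (s(t) = 0).
        new = []
        if s is not None:                     # hit: conjugate update of each component
            for w, alpha in components:
                a = alpha.copy(); a[s] += 1.0
                new.append((w, a))
        else:                                 # miss: each component splits into L - C pieces
            for w, alpha in components:
                a0 = alpha.sum()
                for j in range(len(alpha)):
                    if j not in cache:
                        a = alpha.copy(); a[j] += 1.0
                        new.append((w * alpha[j] / a0, a))  # weight proportional to alpha_j
        total = sum(w for w, _ in new)
        return [(w / total, a) for w, a in new]

    def fps_sample(components):
        # Draw mu_hat(t): pick a mixture component, then a Dirichlet sample from it.
        weights = np.array([w for w, _ in components])
        k = rng.choice(len(components), p=weights)
        return rng.dirichlet(components[k][1])

    # Example: the cache is the top-C of fps_sample(components); note how the
    # component list multiplies by (L - C) at every miss.
    L, C = 6, 2
    components = [(1.0, np.ones(L))]                      # Dirichlet(1, ..., 1) prior
    components = fps_update(components, cache={0, 1}, s=None)   # one miss
    print(len(components), fps_sample(components))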
We present CB-FPS in Algorithm 2. At each time t, the algorithm takes a sample μ̂(t) according to the current posterior f_t(·). It places the top C items, in order of decreasing μ̂(t), in the cache.

In Section IV-C, we will see that a Monte Carlo version of this algorithm can be implemented for small values of L and C, which seems to achieve an O(1) regret. A rigorous proof that shows such regret, even in some special cases and neglecting computational tractability, is an interesting open problem.

B. Caching Bandit With Marginal Posterior Sampling

We now propose an algorithm that only performs a marginal posterior update. Instead of maintaining a Dirichlet prior for the popularity vector μ, we use a Beta prior for the popularity of each individual item μ_i. CB-MPS is described in Algorithm 3.

Algorithm 3 CB-MPS Algorithm
Initialize α_i(0) = 1, β_i(0) = 1, ∀i ∈ L.
for t = 1, . . . , T do
  Generate samples μ̂_i(t) ∼ Beta(α_i(t), β_i(t))
  C_MPS(t) ← arg max_C μ̂(t)
  if x(t) ∈ C_MPS(t) then
    α_{x(t)}(t + 1) ← α_{x(t)}(t) + 1
    β_i(t + 1) ← β_i(t) + 1, ∀i ∈ C_MPS(t), i ≠ x(t)
  end if
end for

The CB-MPS algorithm generates samples for each item from an independent Beta distribution, and places the C items with the largest samples into the cache. The algorithm then maintains independent Beta posteriors for each item in the library. However, it updates the posteriors only for those items currently in the cache. The posteriors for all the other items remain the same.

We now provide a performance guarantee for the CB-MPS algorithm.

Theorem 6: Under the marginal posterior sampling algorithm, E[R(T)] = O((L − C)C log T).

It is clear that the CB-MPS algorithm disregards the inherent structure in the Caching Bandit formulation by choosing not to update the non-cached items. Note that, in the caching formulation, the popularity vector has interdependent components (it is a probability distribution), so a cache miss means that one of the non-cached items was surely requested. Hence, the CB-MPS algorithm views the caching bandit problem almost like a multi-player multi-armed bandit problem, regardless of the additional structure imposed by the caching bandit. But we emphasize that this loss of information in the CB-MPS algorithm is unavoidable due to the partial observation structure limiting the observations to the cached items. Note that the Beta posterior in CB-MPS will be corrupted if we update the missed requests for the non-cached items in any heuristic way.

We omit the proof of this theorem because the analysis is similar to that of the multi-player multi-armed bandit algorithm. In particular, the posterior sampling method proposed in [36] can be used with small modifications to show the above result.
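Algorithm 3 is Thompson sampling with independent Beta posteriors that are updated only for cached items; a minimal sketch (our own code and naming):

    import numpy as np

    def cb_mps(requests, L, C, seed=0):
        # Sketch of Algorithm 3 (CB-MPS): Thompson sampling with independent
        # Beta posteriors that are updated only for cached items.
        rng = np.random.default_rng(seed)
        alpha, beta = np.ones(L), np.ones(L)       # Beta(1, 1) priors
        hits = 0
        for x in requests:
            samples = rng.beta(alpha, beta)        # one posterior sample per item
            cache = np.argpartition(samples, -C)[-C:]
            if x in cache:                         # hit: success for x, failure for the rest
                hits += 1
                alpha[x] += 1
                beta[cache[cache != x]] += 1
            # a miss (s(t) = 0) is discarded: no posterior is touched
        return hits

The discarded misses are what limit the learning rate here and lead to the O(log T) regret of Theorem 6.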
C. Caching Bandit With Structural Information

Even though the CB-MPS algorithm is easy to implement, it suffers an O(log T) regret, which is much worse than the O(1) regret incurred by LFU and LFU-Lite. This is due to the partial observation structure that limits the rate of learning. We now propose an algorithm that we call Caching Bandit with Structural Information (CB-SI). We show that with a minimal assumption on the availability of structural information about the popularity distribution, CB-SI can achieve an O(1) regret even in the partial observation regime.

We assume that the algorithm knows the values of μ_C and Δ_min, the popularity of the C-th most popular item and the optimality gap. Note that we do not assume knowledge of the identity of the C-th most popular file. We note that our proof approach follows the techniques developed in [37], which can be considered as a special case with C = 1. The CB-SI algorithm is given in Algorithm 4.

The algorithm maintains an empirical estimate for each item, and places into the cache the items whose empirical estimate crosses a certain threshold decided by μ_C and Δ_min. If there are not enough items that cross this threshold, the algorithm samples the rest of the items without replacement, according to a probability inversely proportional to the square of the item's gap from the C-th most popular item. The intuition is that with the knowledge of the threshold, one can reduce the amount of exploration needed, and hence reduce the regret. To see this, once an item's empirical estimate is more than μ_C − Δ_min/2, we do not need to explore other items to resolve the uncertainty about the item in question, as this item belongs to the C most popular items. We still need to explore to observe the other popular items. In this way, the algorithm reduces the exploration of less popular items, while exploiting the information accrued so far about the popular items, by the knowledge of the threshold. As shown in the theorem below, this directed exploration suffers only a constant regret.

Algorithm 4 CB-SI Algorithm
Initialize α_i(0) = 0, β_i(0) = 0, μ̂_i(0) = 1/L, ∀i ∈ L.
Initialize n_i(t) = 0, ∀i ∈ L
for t = 1, . . . , T do
  Compute the set A(t) = {i ∈ L : μ̂_i(n_i(t)) ≥ μ_C − Δ/2}
  if |A(t)| ≥ C then
    C_SI(t) = arg max_C (μ̂_j(n_j(t)), j ∈ A(t))
    Z_t ← 1
  else
    For each i ∈ L \ A(t), compute p_i(t) = c/(μ_C − μ̂_i(n_i(t)))², where c = ( Σ_{i∈L\A(t)} 1/(μ_C − μ̂_i(n_i(t)))² )^{−1} is the normalizing constant
    Sample C − |A(t)| elements from the set L \ A(t) according to the probabilities p_i(t). Denote these elements as B(t)
    C_SI(t) = A(t) ∪ B(t)
    Z_t ← 2
  end if
  Place the files C_SI(t) in the cache
  if x(t) ∈ C(t) then
    α_{x(t)}(t + 1) ← α_{x(t)}(t) + 1
    β_i(t + 1) ← β_i(t) + 1, ∀i ∈ C(t), i ≠ x(t)
  end if
  n_i(t + 1) = α_i(t + 1) + β_i(t + 1)
  μ̂_i(n_i(t + 1)) = α_i(t + 1)/n_i(t + 1)
end for
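A sketch of one round of Algorithm 4 (our own code; in particular, we normalize p_i(t) over the sampled set, which is our reading of the constant c, and the parameter values in the comments are illustrative). Items whose empirical mean clears μ_C − Δ/2 are cached directly, and any remaining slots are filled by directed exploration.

    import numpy as np

    def cb_si_place(mu_hat, mu_C, Delta, C, rng):
        # One placement step of Algorithm 4. mu_C and Delta are the assumed
        # structural information (popularity of the C-th item and the gap).
        L = len(mu_hat)
        A = np.flatnonzero(mu_hat >= mu_C - Delta / 2)       # items above the threshold
        if len(A) >= C:
            return A[np.argsort(mu_hat[A])[::-1][:C]], 1     # Z_t = 1: exploitation only
        rest = np.setdiff1d(np.arange(L), A)
        gaps = np.maximum(mu_C - mu_hat[rest], 1e-12)        # guard against a zero gap
        p = 1.0 / gaps**2
        p /= p.sum()                                         # p_i(t) proportional to 1/(mu_C - mu_hat_i)^2
        explore = rng.choice(rest, size=C - len(A), replace=False, p=p)
        return np.concatenate([A, explore]), 2               # Z_t = 2: directed exploration

    def cb_si_observe(cache, x, alpha, beta):
        # Counts are updated only for cached items, exactly as in Algorithm 4.
        if x in cache:
            alpha[x] += 1
            beta[np.setdiff1d(cache, [x])] += 1
        n = alpha + beta
        mu_hat = np.divide(alpha, n, out=np.full(len(alpha), 1.0 / len(alpha)), where=n > 0)
        return mu_hat, n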
Theorem 7: The expected cumulative regret of the CB-SI algorithm is

E[R(T)] ≤ C Σ_{j∈L\C} ( 2/Δ² + (4/Δ²) [ 4 + (32/Δ²) exp(−Δ²/8) ] ),

where Δ = μ_C − μ_{C+1}.

Proof: We denote Δ_j = μ_C − μ_j. In this proof, we will show that the expected regret of the CB-SI algorithm is bounded above by a constant. From the proof of Theorem 1 (c.f. (6)), we get that the expected regret is bounded as below:

E[R(T)] ≤ E[ Σ_{t=1}^{T} Σ_{i=1}^{C} Σ_{j=C+1}^{L} Δ_{i,j} 𝟙{i ∉ C(t), j ∈ C(t)} ]
  ≤ C Σ_{j=C+1}^{L} E[ Σ_{t=1}^{T} 𝟙{j ∈ C(t)} ] = C Σ_{j=C+1}^{L} Σ_{t=1}^{T} P(j ∈ C(t))
  = C Σ_{j=C+1}^{L} Σ_{t=1}^{T} [ P( μ̂_j(n_j(t)) > μ_C − Δ_j/2, j ∈ C(t) ) + P( μ̂_j(n_j(t)) ≤ μ_C − Δ_j/2, j ∈ C(t) ) ].   (17)

We address each term in the above summation separately. First, observe that for any j ∈ {C + 1, . . . , L}, the following inequality holds:

Σ_{t=1}^{T} P( μ̂_j(n_j(t)) > μ_C − Δ_j/2, j ∈ C(t) )
  ≤ Σ_{t=1}^{T} P( μ̂_j(n_j(t)) > μ_j + Δ_j/2, j ∈ C(t) )
  ≤(a) Σ_{t=1}^{T} P( μ̂_j(t) > μ_j + Δ_j/2 ) ≤ Σ_{t=1}^{T} e^{−Δ_j² t/2} ≤ 2/Δ_j²,   (18)

where the inequality (a) follows from Hoeffding's inequality.

For bounding the second term in (17), we use the policy definition. Since Δ_j ≥ Δ, the first inequality below follows trivially. The equality (b) follows from the fact that when the mean estimate of the j-th item is smaller than μ_C − Δ/2, the only means by which it can enter the cache is through the exploration part of the algorithm, which is denoted by Z_t = 2.

P( μ̂_j(n_j(t)) ≤ μ_C − Δ_j/2, j ∈ C(t) )
  ≤ P( μ̂_j(n_j(t)) ≤ μ_C − Δ/2, j ∈ C(t) )
  =(b) P( μ̂_j(n_j(t)) ≤ μ_C − Δ/2, j ∈ C(t), Z_t = 2 )
  =(c) P( j ∈ C(t) | μ̂_j(n_j(t)) ≤ μ_C − Δ/2, Z_t = 2 ) P( μ̂_j(n_j(t)) ≤ μ_C − Δ/2, Z_t = 2 )
  = p_{j,t} P( μ̂_j(n_j(t)) ≤ μ_C − Δ/2, Z_t = 2 )
  = E[ p_{j,t} 𝟙{μ̂_j(n_j(t)) ≤ μ_C − Δ/2, Z_t = 2} ]
  = E[ p_{i,t} (p_{j,t}/p_{i,t}) 𝟙{μ̂_j(n_j(t)) ≤ μ_C − Δ/2, Z_t = 2} ],

for any i ∈ C. Note that, in the equality (c), we used the definition of p_{j,t}. Now, substituting the value of the sampling probability p_{i,t}, we obtain

  ≤ E[ p_{i,t} ( |μ_C − μ̂_i(n_i(t))|² / (Δ/2)² ) 𝟙{μ̂_j(n_j(t)) ≤ μ_C − Δ/2, Z_t = 2} ]
  ≤ (4/Δ²) E[ |μ_C − μ̂_i(n_i(t))|² p_{i,t} 𝟙{Z_t = 2} ]
  ≤ (4/Δ²) E[ |μ_C − μ̂_i(n_i(t))|² P( i ∈ C(t) | μ̂_i(n_i(t)) ≤ μ_C − Δ/2, Z_t = 2 ) ]
  = (4/Δ²) E[ |μ_C − μ̂_i(n_i(t))|² E[ 𝟙{i ∈ C(t)} | μ̂_i(n_i(t)) < μ_C − Δ/2, Z_t = 2 ] ]
  = (4/Δ²) E[ |μ_C − μ̂_i(n_i(t))|² E[ 𝟙{i ∈ C(t), μ̂_i(n_i(t)) < μ_C − Δ/2} | {μ̂_i(n_i(t)) < μ_C − Δ/2, Z_t = 2} ] ]
  = (4/Δ²) E[ |μ_C − μ̂_i(n_i(t))|² 𝟙{i ∈ C(t), μ̂_i(n_i(t)) < μ_C − Δ/2} ].   (19)

Here, the inequalities follow from the properties of conditional expectation. Now, we obtain a bound for the second term in (17), using (19), as below:

Σ_{t=1}^{T} E[ |μ_C − μ̂_i(n_i(t))|² 𝟙{μ̂_i(n_i(t)) < μ_C − Δ/2, i ∈ C(t)} ]
  ≤ Σ_{t=1}^{T} E[ |μ_C − μ̂_i(t)|² 𝟙{|μ̂_i(t) − μ_C| > Δ/2} ]
  = Σ_{t=1}^{T} ∫_0^∞ P( |μ_C − μ̂_i(t)|² 𝟙{|μ̂_i(t) − μ_C| > Δ/2} ≥ x ) dx
  = Σ_{t=1}^{T} ∫_0^∞ P( |μ_C − μ̂_i(t)|² 𝟙{|μ̂_i(t) − μ_C|² > Δ²/4} ≥ x ) dx
  = Σ_{t=1}^{T} [ ∫_0^{Δ²/4} P( |μ_C − μ̂_i(t)|² 𝟙{|μ̂_i(t) − μ_C|² > Δ²/4} ≥ x ) dx
      + ∫_{Δ²/4}^{∞} P( |μ_C − μ̂_i(t)|² 𝟙{|μ̂_i(t) − μ_C|² > Δ²/4} ≥ x ) dx ]
  = Σ_{t=1}^{T} [ ∫_0^{Δ²/4} P( |μ_C − μ̂_i(t)|² ≥ x, |μ̂_i(t) − μ_C|² > Δ²/4 ) dx
      + ∫_{Δ²/4}^{∞} P( |μ_C − μ̂_i(t)|² 𝟙{|μ̂_i(t) − μ_C|² > Δ²/4} ≥ x ) dx ]
  = Σ_{t=1}^{T} [ ∫_0^{Δ²/4} P( |μ̂_i(t) − μ_C|² > Δ²/4 ) dx
      + ∫_{Δ²/4}^{∞} P( |μ_C − μ̂_i(t)|² 𝟙{|μ̂_i(t) − μ_C|² > Δ²/4} ≥ x ) dx ]
  = Σ_{t=1}^{T} [ (Δ²/4) 2 e^{−tΔ²/8} + ∫_{Δ²/4}^{∞} P( |μ_C − μ̂_i(t)|² ≥ x ) dx ]
  ≤ 4 + (32/Δ²) exp(−Δ²/8).   (20)

Combining equations (17), (18), (19), and (20), we observe:

E[R(T)] ≤ C Σ_{j∈L\C} ( 2/Δ² + (4/Δ²) [ 4 + (32/Δ²) exp(−Δ²/8) ] ).

Remark 8: We introduce another version of the CB-SI algorithm, which is similar in spirit to LFU-Lite. Following a similar rule to LFU-Lite, we maintain a window of the W past observations, and at each time, the C most frequently requested items in the window are added to the counter bank, if those items are not already present in it. The mean estimates of CB-SI are calculated only for items in the counter bank. We call this algorithm CB-SILite. In Section V-B, we will observe that CB-SILite drastically reduces the number of counters needed to give a similar hit performance to CB-SI.

We now compare the performance of CB-MPS and CB-SI via a lower bound argument. The known lower bound on the regret of a classical multi-player multi-armed bandit is Ω(log T), and the Thompson sampling algorithm is known to achieve this. As discussed previously, the CB-MPS algorithm is equivalent to the Thompson sampling algorithm for the multi-player multi-armed bandit formulation. Clearly, the multi-player multi-armed bandit is at least as hard as the classical multi-armed bandit problem and will have a regret of Ω(log T). Thus, we argue that the performance of CB-MPS is worse than that of CB-SI, which achieves O(1) regret.

V. SIMULATIONS

In this section, we start by conducting simulations with requests generated under the IRM model to verify our insights on regret obtained in the earlier sections. In each instance, we present results averaged over ten runs of the algorithm under test. We then use two data traces to compare the performance of our proposed algorithms when exposed to a non-stationary arrival process. Since these requests change with time, we modify the algorithms to "forget" counts, by halving the counts at a fixed periodicity. In the full observation regime, we also compare the performance against LRU, which is widely deployed and implicitly has a finite memory (i.e., it automatically "forgets"). We also further explore the reaction of our approaches to non-stationary requests by creating a synthetic trace that exhibits changes at a faster timescale than the data traces. We also test an online change detection mechanism, along with the forgetting rule, under a non-stationary request arrival process.

Fig. 2. Regret of LFU, WLFU, LFU-Lite.
Fig. 3. Growth of counters for LFU, LFU-Lite.
Fig. 4. Regret of LFU-Lite for varying W.

A. IRM Simulations

1) Full Observation: We first conduct simulations for an IRM request process following a Zipf distribution with parameter β under the full observation setting. Figure 2 compares the regret suffered by LFU, WLFU and LFU-Lite for C = 10, L = 1000, W = C² log L [8] and β = 1. As expected, the regret suffered by the WLFU algorithm grows linearly with time, while LFU and LFU-Lite suffer a constant regret. Figure 3 shows the growth of the number of counters used to keep an estimate of files. The merits of LFU-Lite are clearly seen here, as it uses approximately 35 counters to achieve a constant regret, while LFU uses 1000.

The growth of counters and the regret suffered by LFU-Lite depend on W, the window of observation. In Figures 4 and 5, we compare the growth of regret and counters with W for L = 1000, C = 10, β = 1. We see that the number of counters
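The "forgetting" rule used in these experiments, and the two-window change detector described later in Section V-C, are simple to state in code; a sketch with illustrative parameter values of our own choosing (the halving period, window lengths, and threshold are not the paper's):

    from collections import Counter

    def forget(counts, t, period=10000):
        # Periodically halve every counter so that stale popularity evidence decays.
        if t % period == 0:
            for k in counts:
                counts[k] //= 2
        return counts

    def tv_distance(window_a, window_b):
        # Total variation distance between the empirical request distributions
        # of two windows (lists of item ids).
        pa, pb = Counter(window_a), Counter(window_b)
        na, nb = len(window_a), len(window_b)
        support = set(pa) | set(pb)
        return 0.5 * sum(abs(pa[i] / na - pb[i] / nb) for i in support)

    def change_detected(reference, current, threshold=0.2):
        # Two-window detector in the spirit of [40]: flag a change when the
        # current window's distribution drifts too far from the reference; the
        # caller then halves its counters and resets the reference window.
        return tv_distance(reference, current) > threshold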
Fig. 11. Hit rates for full observation for IBM trace.
Fig. 12. Hit rates for full observation for YouTube trace.
Fig. 14. Hit rates for partial observation, IBM trace.
Fig. 15. Hit rates for partial observation, YouTube trace.
Fig. 18. Hit rates for full observation model.
Fig. 19. Hit rates for partial observation model.

CB-SILite uses a small counter bank for all the traces, as shown in Table II.

C. Online Change Detection for Non-Stationary Requests

In this section, we use an online change detection algorithm to find out if there is any change in the request arrival distribution. We augment our caching algorithms with this online change detection mechanism to adapt them to scenarios where the requests are non-stationary. We devise this mechanism in such a way that it works independently of the specific caching algorithm we use.

The detection mechanism is based on a two-window paradigm proposed in [40]. It maintains a reference window and a current window. The current window slides forward with each incoming data point, and the reference window is updated whenever a change is detected. Our scheme compares requests in the reference window to the requests in the current window. We measure the total variation distance between the empirical distribution of requests in the reference window and that in the current window. If this distance is greater than a threshold, a change detection is announced. When a change is detected, it signals the caching algorithm, which employs a heuristic to reduce its counters by half. This reduction ensures that the algorithms reflect the changes in the request distribution.

To evaluate the performance of our approach, as in the previous subsections, we first create a synthetic trace of 1 million requests with a library size of 50000. These requests are generated by sampling a Zipf distribution with parameter 1. We then induce non-stationarity in this trace by cycling the

VI. CONCLUSION

We considered the question of caching algorithm design and analysis from the perspective of online learning. We focused on algorithms that estimate popularity by maintaining counts of requests seen, in both the full and partial observation regimes. Our main findings were as follows. In the context of full observation, it is possible to follow this approach and obtain O(1) regret using the simple LFU-Lite approach that only needs a small number of counters. In the context of partial observations, our finding using the CB-SI approach was that structure greatly enhances the learning ability of the caching algorithm, and is able to make up for incomplete observations to yield O(1) regret. We verified these insights using both simulations and data traces. In particular, we showed that even if the request distribution changes with time, our approach (enhanced with a simple "forgetting" rule) is able to outperform established algorithms such as LRU. We have also augmented our algorithms with an online change detection mechanism, independent of the proposed algorithms. This approach enhanced our algorithms to detect changes in the distributions on the fly. When we enabled the algorithms with this approach along with a simple forgetting rule, we were able to achieve good empirical performance under non-stationary traffic.

REFERENCES

[1] E. G. Coffman and P. J. Denning, Operating Systems Theory. Englewood Cliffs, NJ, USA: Prentice-Hall, 1973.
[2] G. Einziger, R. Friedman, and B. Manes, "TinyLFU: A highly efficient cache admission policy," ACM Trans. Storage, vol. 13, no. 4, p. 35, 2017.
[3] J. L. Hennessy and D. A. Patterson, Computer Architecture: A Quantitative Approach. Amsterdam, The Netherlands: Elsevier, 2011.
[4] A. Chankhunthod, P. B. Danzig, C. Neerdaels, M. F. Schwartz, and K. J. Worrell, "A hierarchical internet object cache," in Proc. USENIX Annu. Tech. Conf., 1996, pp. 153–164.
[5] P. Blasco and D. Gunduz, "Learning-based optimization of cache content in a small cell base station," in Proc. IEEE Int. Conf. Commun. (ICC), Jun. 2014, pp. 1897–1903.
[6] A. Sharma, X. Tie, H. Uppal, A. Venkataramani, D. Westbrook, and A. Yadav, "A global name service for a highly mobile internetwork," ACM SIGCOMM Comput. Commun. Rev., vol. 44, no. 4, pp. 247–258, 2015.
[7] A. Venkataramani, J. Kurose, D. Raychaudhuri, K. Nagaraja, M. Mao, and S. Banerjee, "MobilityFirst: A mobility-centric and trustworthy internet architecture," SIGCOMM Comput. Commun. Rev., vol. 44, no. 3, pp. 74–80, Jul. 2014.
[8] G. Karakostas and D. N. Serpanos, "Exploitation of different types of locality for web caches," in Proc. 7th Int. Symp. Comput. Commun. (ISCC), Jul. 2002, pp. 207–212.
[9] W. F. King III, "Analysis of demand paging algorithms," in Proc. IFIP Congr. Amsterdam, The Netherlands: North-Holland, 1971, pp. 485–490.
[10] E. Gelenbe, "A unified approach to the evaluation of a class of replacement algorithms," IEEE Trans. Comput., vol. C-22, no. 6, pp. 611–618, Jun. 1973.
[11] D. Starobinski and D. Tse, "Probabilistic methods for web caching," Perform. Eval., vol. 46, nos. 2–3, pp. 125–137, Oct. 2001.
[12] E. J. Rosensweig, J. Kurose, and D. Towsley, "Approximate models for general cache networks," in Proc. IEEE INFOCOM, Mar. 2010, pp. 1–9.
[13] R. Fagin, "Asymptotic miss ratios over independent references," J. Comput. Syst. Sci., vol. 14, no. 2, pp. 222–250, 1977.
[14] H. Che, Y. Tung, and Z. Wang, "Hierarchical web caching systems: Modeling, design and experimental results," IEEE J. Sel. Areas Commun., vol. 20, no. 7, pp. 1305–1314, Sep. 2002.
[15] D. S. Berger, P. Gland, S. Singla, and F. Ciucu, "Exact analysis of TTL cache networks," Perform. Eval., vol. 79, pp. 2–23, Sep. 2014.
[16] N. Gast and B. Van Houdt, "Asymptotically exact TTL-approximations of the cache replacement algorithms LRU(m) and h-LRU," in Proc. 28th Int. Teletraffic Congr. (ITC), Sep. 2016, pp. 157–165.
[17] S. Basu, A. Sundarrajan, J. Ghaderi, S. Shakkottai, and R. Sitaraman, "Adaptive TTL-based caching for content delivery," IEEE/ACM Trans. Netw., vol. 26, no. 3, pp. 1063–1077, Jun. 2018.
[18] J. Li, S. Shakkottai, J. C. S. Lui, and V. Subramanian, "Accurate learning or fast mixing? Dynamic adaptability of caching algorithms," IEEE J. Sel. Areas Commun., vol. 36, no. 6, pp. 1314–1330, Jun. 2018.
[19] G. S. Paschos, A. Destounis, L. Vigneri, and G. Iosifidis, "Learning to cache with no regrets," in Proc. IEEE INFOCOM Conf. Comput. Commun., Apr. 2019, pp. 235–243.
[20] R. Bhattacharjee, S. Banerjee, and A. Sinha, "Fundamental limits on the regret of online network-caching," Proc. ACM Meas. Anal. Comput. Syst., vol. 4, no. 2, pp. 1–31, Jun. 2020.
[21] S. Ioannidis and E. Yeh, "Jointly optimal routing and caching for arbitrary network topologies," IEEE J. Sel. Areas Commun., vol. 36, no. 6, pp. 1258–1275, Jun. 2018.
[22] T. L. Lai and H. Robbins, "Asymptotically efficient adaptive allocation rules," Adv. Appl. Math., vol. 6, no. 1, pp. 4–22, Mar. 1985.
[23] P. Auer, N. Cesa-Bianchi, and P. Fischer, "Finite-time analysis of the multiarmed bandit problem," Mach. Learn., vol. 47, no. 2, pp. 235–256, 2002.
[24] W. R. Thompson, "On the likelihood that one unknown probability exceeds another in view of the evidence of two samples," Biometrika, vol. 25, nos. 3–4, pp. 285–294, 1933.
[25] S. Agrawal and N. Goyal, "Further optimal regret bounds for Thompson sampling," in Artificial Intelligence and Statistics. Scottsdale, AZ, USA: PMLR, 2013, pp. 99–107.
[26] S. Bubeck and N. Cesa-Bianchi, "Regret analysis of stochastic and nonstochastic multi-armed bandit problems," Found. Trends Mach. Learn., vol. 5, no. 1, pp. 1–122, 2012.
[27] N. Cesa-Bianchi and G. Lugosi, "Combinatorial bandits," J. Comput. Syst. Sci., vol. 78, no. 5, pp. 1404–1422, Sep. 2012.
[28] K. Jamieson and R. Nowak, "Best-arm identification algorithms for multi-armed bandits in the fixed confidence setting," in Proc. 48th Annu. Conf. Inf. Sci. Syst. (CISS), Mar. 2014, pp. 1–6.
[29] D. Shah, T. Choudhury, N. Karamchandani, and A. Gopalan, "Sequential mode estimation with Oracle queries," in Proc. AAAI Conf. Artif. Intell., vol. 34, no. 4, 2020, pp. 5644–5651.
[30] V. Martina, M. Garetto, and E. Leonardi, "A unified approach to the performance analysis of caching systems," in Proc. IEEE INFOCOM Conf. Comput. Commun., Apr. 2014, pp. 2040–2048.
[31] M. Zink, K. Suh, Y. Gu, and J. Kurose, "Watch global, cache local: YouTube network traffic at a campus network: Measurements and implications," Proc. SPIE, vol. 6818, Jan. 2008, Art. no. 681805.
[32] N. Megiddo and D. S. Modha, "ARC: A self-tuning, low overhead replacement cache," in Proc. FAST, 2003, pp. 115–130.
[33] A. Dvoretzky, J. Kiefer, and J. Wolfowitz, "Asymptotic minimax character of the sample distribution function and of the classical multinomial estimator," Ann. Math. Statist., vol. 27, no. 3, pp. 642–669, 1956.
[34] W. Hoeffding, "Probability inequalities for sums of bounded random variables," in The Collected Works of Wassily Hoeffding. New York, NY, USA: Springer, 1994, pp. 409–426.
[35] S. Agrawal and N. Goyal, "Thompson sampling for contextual bandits with linear payoffs," in Proc. Int. Conf. Mach. Learn., 2013, pp. 127–135.
[36] J. Komiyama, J. Honda, and H. Nakagawa, "Optimal regret analysis of Thompson sampling in stochastic multi-armed bandit problem with multiple plays," 2015, arXiv:1506.00779. [Online]. Available: http://arxiv.org/abs/1506.00779
[37] S. Bubeck, V. Perchet, and P. Rigollet, "Bounded regret in stochastic multi-armed bandits," in Proc. Conf. Learn. Theory, 2013, pp. 122–134.
[38] P. Zerfos, M. Srivatsa, H. Yu, D. Dennerline, H. Franke, and D. Agrawal, "Platform and applications for massive-scale streaming network analytics," IBM J. Res. Develop., vol. 57, nos. 3–4, pp. 1–11, 2013.
[39] X. Cheng, C. Dale, and J. Liu, "Statistics and social network of YouTube videos," in Proc. 16th Int. Workshop Qual. Service, Jun. 2008, pp. 229–238.
[40] D. Kifer, S. Ben-David, and J. Gehrke, "Detecting change in data streams," in Proc. VLDB, vol. 4. Toronto, ON, Canada, 2004, pp. 180–191.

Archana Bura is currently pursuing the Ph.D. degree with the Department of Electrical and Computer Engineering, Texas A&M University. Her research interests include reinforcement learning, optimization, and their applications to wireless networks.

Desik Rengarajan is currently pursuing the Ph.D. degree with the Department of Electrical and Computer Engineering, Texas A&M University. His research interests include reinforcement learning and game theory, with a focus on their application to the real world.

Dileep Kalathil (Senior Member, IEEE) received the Ph.D. degree from the University of Southern California (USC) in 2014. From 2014 to 2017, he was a Post-Doctoral Researcher with the Department of Electrical Engineering and Computer Sciences, University of California, Berkeley. He is currently an Assistant Professor with the Department of Electrical and Computer Engineering, Texas A&M University, College Station, TX, USA. His research interests include reinforcement learning, with applications in communication networks, power systems, and intelligent transportation systems. He was a recipient of the NSF CAREER Award in 2021, the NSF CRII Award in 2019, the Best Ph.D. Dissertation Award from the Department of Electrical Engineering, USC, from 2014 to 2015, and the Best Academic Performance Award from the EE Department, IIT Madras, in 2008.

Srinivas Shakkottai (Senior Member, IEEE) received the Ph.D. degree in electrical and computer engineering from the University of Illinois at Urbana-Champaign in 2007. He was a Post-Doctoral Scholar in management science and engineering with Stanford University in 2007. He joined Texas A&M University in 2008, where he is currently a Professor of computer engineering with the Department of Electrical and Computer Engineering. His research interests include caching and content distribution, wireless networks, multi-agent learning and game theory, and network data collection and analytics. He was a recipient of the Defense Threat Reduction Agency Young Investigator Award (2009), the NSF CAREER Award (2012), and Research Awards from Cisco (2008) and Google (2010). He also received an Outstanding Professor Award (2013), the Select Young Faculty Fellowship (2014), and the Engineering Genesis Award (2019) at Texas A&M University.

Jean-Francois Chamberland (Senior Member, IEEE) received the Ph.D. degree from the University of Illinois at Urbana-Champaign. He is currently a Professor with the Department of Electrical and Computer Engineering, Texas A&M University. His research interests include computing, information, and inference. He was a recipient of the IEEE Young Author Best Paper Award from the IEEE Signal Processing Society and the Faculty Early Career Development (CAREER) Award from the National Science Foundation. He served as an Associate Editor for the IEEE TRANSACTIONS ON INFORMATION THEORY from 2017 to 2020.