
The following paper was originally published in the

Proceedings of the USENIX Symposium on Internet Technologies and Systems


Monterey, California, December 1997

Cost-Aware WWW Proxy Caching Algorithms

Pei Cao
University of Wisconsin-Madison
Sandy Irani
University of California-Irvine

For more information about USENIX Association contact:


1. Phone: 510 528-8649
2. FAX: 510 548-5738
3. Email: [email protected]
4. WWW URL: http://www.usenix.org/
Cost-Aware WWW Proxy Caching Algorithms
Pei Cao, Department of Computer Science, University of Wisconsin-Madison. [email protected]
Sandy Irani, Information and Computer Science Department, University of California-Irvine. [email protected]

Abstract

Web caches can not only reduce network traffic and downloading latency, but can also affect the distribution of web traffic over the network through cost-aware caching. This paper introduces GreedyDual-Size, which incorporates locality with cost and size concerns in a simple and non-parameterized fashion for high performance. Trace-driven simulations show that with the appropriate cost definition, GreedyDual-Size outperforms existing web cache replacement algorithms in many aspects, including hit ratios, latency reduction and network cost reduction. In addition, GreedyDual-Size can potentially improve the performance of main-memory caching of Web documents.

1 Introduction

As the World Wide Web has grown in popularity in recent years, the percentage of network traffic due to HTTP requests has steadily increased. Recent reports show that Web traffic constituted 40% of the network traffic in 1996, compared to only 19% in 1994. Since the majority of Web documents requested are static documents (i.e. home pages, audio and video files), caching at various network points provides a natural way to reduce web traffic. A common form of web caching is caching at HTTP proxies, which are intermediaries between browser processes and web servers on the Internet (for example, one can choose a proxy by setting the network preference in the Netscape Navigator [1]).

[1] Navigator is a trademark of Netscape Inc.

There are many benefits of proxy caching. It reduces network traffic, the average latency of fetching Web documents, and the load on busy Web servers. Since documents are stored at the proxy cache, many HTTP requests can be satisfied directly from the cache instead of generating traffic to and from the Web server. Numerous studies [WASAF96] have shown that the hit ratio for Web proxy caches can be as high as over 50%. This means that if proxy caching is utilized extensively, the network traffic can be reduced significantly.

Key to the effectiveness of proxy caches is a document replacement algorithm that can yield a high hit ratio. Unfortunately, techniques developed for file caching and virtual memory page replacement do not necessarily transfer to Web caching.

There are three primary differences between Web caching and conventional paging problems. First, web caching is variable-size caching: due to the restriction in HTTP protocols that support whole file transfers only, a cache hit only happens if the entire file is cached, and web documents vary dramatically in size depending on the information they carry (text, image, video, etc.). Second, web pages take different amounts of time to download, even if they are of the same size. A proxy that wishes to reduce the average latency of web accesses may want to adjust its replacement strategy based on the download latency. Third, access streams seen by the proxy cache are the union of web access streams from tens to thousands of users, instead of coming from a few programmed sources as in the case of virtual memory paging.

Proxy caches are in a unique position to affect web traffic on the Internet. Since the replacement algorithm decides which documents are cached and which documents are replaced, it affects which future requests will be cache hits. Thus, if the institution employing the proxy must pay more on some network links than others, the replacement algorithm can favor expensive documents (i.e. those travelling through the expensive links) over cheap documents. If it is known that certain network paths are heavily congested, the caching algorithm can retain more documents which must travel on congested paths. The proxy cache can reduce its contribution to the network router load by preferentially caching documents that travel more hops.
Web cache replacement algorithms can incorporate these considerations by associating an appropriate network cost with every document, and minimizing the total cost incurred over a particular access stream.

Today, most proxy systems use some form of the Least-Recently-Used replacement algorithm. Though some proxy systems also consider the time-to-live fields of the documents and replace expired documents first, studies have found that time-to-live fields rarely correspond exactly to the actual lifetime of the document, and that it is better to keep expired-but-recently-used documents in the cache and validate them by querying the server [LC97]. The advantage of LRU is its simplicity; the disadvantage is that it does not take into account file sizes or latency and might not give the best hit ratio.

Many Web caching algorithms have been proposed to address the size and latency concerns. We are aware of at least nine algorithms, from the simple to the very elaborate, proposed and evaluated in separate papers, some of which give conflicting conclusions. This naturally leads to a state of confusion over which algorithm should be used. In addition, none of the existing algorithms address the network cost concerns.

In this paper, we introduce a new algorithm, called GreedyDual-Size, which combines locality, size and latency/cost concerns effectively to achieve the best overall performance. GreedyDual-Size is a variation on a simple and elegant algorithm called GreedyDual [You91b], which handles uniform-size variable-cost cache replacement. Using trace-driven simulation, we show that GreedyDual-Size with appropriate cost definitions outperforms the various "champion" web caching algorithms in existing studies on a number of performance issues, including hit ratios, latency reduction, and network cost reduction.

2 Existing Results

The size and cost concerns make web caching a much more complicated problem than traditional caching. Below we first summarize the existing theoretical results, then take a look at a variety of web caching algorithms proposed so far.

2.1 Existing Theoretical Results

There are a number of results on optimal offline replacement algorithms and online competitive algorithms for simplified versions of the Web caching problem.

The variable document sizes in web caching make it much more complicated to determine an optimal offline replacement algorithm. If one is given a sequence of requests to uniform size blocks of memory, it is well known that the simple rule of evicting the block whose next request is farthest in the future will yield the optimal performance [Bel66]. In the variable-size case, no such offline algorithm is known. In fact, it is known that determining the optimal performance is NP-hard [Ho97], although there is an algorithm which can approximate the optimal to within a logarithmic factor [Ir97]. The approximation factor is logarithmic in the maximum number of bytes that can fit in the cache, which we will call k.

For the cost consideration, there have been several algorithms developed for the uniform-size variable-cost paging problem. GreedyDual [You91b] is actually a range of algorithms which include a generalization of LRU and a generalization of FIFO. The name GreedyDual comes from the technique used to prove that this entire range of algorithms is optimal according to its competitive ratio. The competitive ratio is essentially the maximum ratio of the algorithm's cost to the optimal offline algorithm's cost over all possible request sequences. (For an introduction to competitive analysis, see [ST85].)

We have generalized the result in [You91b] to show that our algorithm GreedyDual-Size, which handles documents of differing sizes and differing costs (described in Section 4), also has an optimal competitive ratio. Interestingly, it is also known that LRU has an optimal competitive ratio when the page size can vary and the cost of fetching a document is the same for all documents or proportional to the size of a document [FKIP96].

2.2 Existing Document Replacement Algorithms

We describe nine cache replacement algorithms proposed in recent studies, which attempt to minimize various cost metrics, such as miss ratio, byte miss ratio, average latency, and total cost. Below we give a brief description of all of them. In describing the various algorithms, it is convenient to view each request for a document as being satisfied in the following way: the algorithm brings the newly requested document into the cache and then evicts documents until the capacity of the cache is no longer exceeded. Algorithms are then distinguished by how they choose which documents to evict. This view allows for the possibility that the requested document itself may be evicted upon its arrival into the cache, which means it replaces no other document in the cache.

- Least-Recently-Used (LRU) evicts the document which was requested the least recently.
- Least-Frequently-Used (LFU) evicts the document which is accessed least frequently.

- Size [WASAF96] evicts the largest document.

- LRU-Threshold [ASAWF95] is the same as LRU, except documents larger than a certain threshold size are never cached.

- Log(Size)+LRU [ASAWF95] evicts the document which has the largest log(size) and is the least recently used document among all documents with the same log(size).

- Hyper-G [WASAF96] is a refinement of LFU with last access time and size considerations.

- Pitkow/Recker [WASAF96] removes the least-recently-used document, except if all documents are accessed today, in which case the largest one is removed.

- Lowest-Latency-First [WA97] tries to minimize average latency by removing the document with the lowest download latency first.

- Hybrid, introduced in [WA97], is aimed at reducing the total latency. A function is computed for each document which is designed to capture the utility of retaining a given document in the cache. The document with the smallest function value is then evicted. The function for a document p located at server s depends on the following parameters: c_s, the time to connect with server s; b_s, the bandwidth to server s; n_p, the number of times p has been requested since it was brought into the cache; and z_p, the size (in bytes) of document p. The function is defined as:

      (c_s + W_b/b_s) (n_p)^(W_n) / z_p

  where W_b and W_n are constants. Estimates for c_s and b_s are based on the times to fetch documents from server s in the recent past.

- Lowest Relative Value (LRV), introduced in [LRV97], includes the cost and size of a document in the calculation of a value that estimates the utility of keeping a document in the cache. The algorithm evicts the document with the lowest value. The calculation of the value is based on extensive empirical analysis of trace data. For a given i, let P_i denote the probability that a document is requested i+1 times given that it is requested i times. P_i is estimated in an online manner by taking the ratio D_{i+1}/D_i, where D_i is the total number of documents seen so far which have been requested at least i times in the trace. P_i(s) is the same as P_i except the value is determined by restricting the count only to pages of size s. Furthermore, let 1 - D(t) be the probability that a page is requested again as a function of the time t (in seconds) since its last request; D(t) is estimated as

      D(t) = 0.035 (log(t + 1) + 0.45 (1 - e^(-t/2e6))).

  Then for a particular document d of size s and cost c, if the last request to d is the i-th request to it, and the last request was made t seconds ago, d's value in LRV is calculated as:

      V(i, t, s) = P_1(s) (1 - D(t)) c/s    if i = 1
      V(i, t, s) = P_i (1 - D(t)) c/s       otherwise

  Among all documents, LRV evicts the one with the lowest value. Thus, LRV takes into account locality, cost and size of a document. (Both value functions are rendered as code after this section's overview.)

Existing studies using actual Web proxy traces narrowed down the choice for proxy replacement algorithms to LRU, SIZE, Hybrid and LRV. Results in [WASAF96, ASAWF95] show that SIZE performs better than LFU, LRU-Threshold, Log(Size)+LRU, Hyper-G and Pitkow/Recker. Results in [WASAF96] also show that SIZE outperforms LRU in most situations. However, a different study [LRV97] shows that LRU outperforms SIZE in terms of byte hit rate. Comparing LFU and LRU, our experiments show that though LFU can outperform LRU slightly when the cache size is very small, in most cases LFU performs worse than LRU. In terms of minimizing latency, [WA97] shows that Hybrid performs better than Lowest-Latency-First. Finally, [LRV97] shows that LRV outperforms both LRU and SIZE in terms of hit ratio and byte hit ratio. One disadvantage of both Hybrid and LRV is their heavy parameterization, which leaves one uncertain about their performance across access streams.

However, the studies offer no conclusion on which algorithm a proxy should use. Essentially, the problem is finding an algorithm that can combine the observed access pattern with the cost and size considerations.

2.2.1 Implementation Concerns

The above "champion" algorithms vary in time and space complexity. In the cases when there are a large number of documents in the cache, this can have a dramatic effect on the time required to determine which document to evict.
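To make the two parameterized value functions above concrete, here is a small Python rendering of them. It is our illustration, not code from [WA97] or [LRV97]: the constant values are placeholders, the function names are invented, and the grouping of D(t) is reconstructed from the formula above.

    import math

    W_B, W_N = 8000, 0.9   # placeholders for the constants W_b and W_n

    def hybrid_value(c_s, b_s, n_p, z_p):
        # Hybrid's utility of retaining document p from server s:
        # (c_s + W_b/b_s) * n_p^W_n / z_p; the smallest value is evicted.
        return (c_s + W_B / b_s) * (n_p ** W_N) / z_p

    def lrv_D(t):
        # LRV's D(t); 1 - D(t) estimates the chance that a page is
        # requested again t seconds after its last request.
        return 0.035 * (math.log(t + 1) + 0.45 * (1 - math.exp(-t / 2e6)))

    def lrv_value(i, t, s, c, P1_of_s, P_i):
        # V(i, t, s): P_1(s) for first re-references, P_i otherwise,
        # scaled by (1 - D(t)) and by cost per byte c/s.
        p = P1_of_s if i == 1 else P_i
        return p * (1 - lrv_D(t)) * c / s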
- LRU can be implemented easily with O(1) overhead per cached file and O(1) time per access.

- Size can be implemented by maintaining a priority queue on the documents in memory based on their size. Since the size of a document does not change, handling a hit requires O(1) time and handling an eviction requires O(log k) time, where k is the number of cached documents.

- Hybrid is also implemented using a priority queue, thus requiring O(log k) time to find a replacement. Furthermore, it requires an array keeping track of the average latency and bandwidth for every Web server, which is used in estimating the downloading latency of a web page. This requires extra storage. In addition, since the estimate is updated every time a connection to the server is made, a faithful implementation requires updating many pages' latency estimations. We found this prohibitively time-consuming, and we omit the step in the implementation. We find that omitting the step does not affect our results significantly.

- LRV requires O(1) storage per cached file plus some bookkeeping information. If the Cost in LRV is proportional to Size, the authors of the algorithm suggest an efficient method that can find the replacement in O(1) time, though the constants can be large. If Cost is arbitrary, then O(k) time is needed to find a replacement. We also found that the cost of calculating D(t) is very high, since it uses log and exp.

Another concern about both Hybrid and LRV is that they employ constants which might have to be tuned to the patterns in the request stream. For Hybrid, we use the values which were used in [WA97] in our simulations. We did not experiment with tuning those constants to improve the performance of Hybrid.

Though LRV can incorporate arbitrary network costs associated with documents, the O(k) computational complexity of finding a replacement can be prohibitively expensive. The problem is that D(t) has to be recalculated for every document every time some document has to be replaced. The overhead makes LRV impractical for proxy caches that wish to take network costs into consideration.

3 Web Proxy Traces

As the conclusions from a trace-driven study inevitably depend on the traces, we tried to gather as many traces as possible. We were successful in obtaining the following traces of HTTP requests going through Web proxies:

- Digital Equipment Corporation Web proxy server traces [DEC96] (Aug-Sep 1996), servicing about 17,000 workstations, for a period of 25 days, containing a total of about 24,000,000 accesses;

- University of Virginia proxy server and client traces [WASAF96] (Feb-Oct 1995), containing four sets of traces, each servicing from 25 to 61 workstations, containing from 13,127 to 227,210 accesses;

- Boston University client traces [CBC95] (Nov 1994 - May 1995), containing two sets of traces, one servicing 5 workstations (17,008 accesses), the other 32 workstations (118,105 accesses).

We are in the process of obtaining more traces from other sources.

We present the results of fourteen traces. They include all of the Virginia Tech and Boston University traces, and eight subsets of the DEC traces. The subsets are Web accesses made by users 0-512, and users 1024-2048, in each week, for the three and a half week period from Aug. 29 to Sep. 22, 1996. The use of the subsets is partly due to our current simulator's limitation (it cannot simulate more than two million requests at a time), and partly due to our observation that a caching proxy server built out of a high-end workstation can only service about 512 users at a time.

We perform some necessary pre-processing over the traces. For the DEC traces, we simulated only those requests whose replies are cacheable as specified in HTTP 1.1 [HT97] (i.e. GET or HEAD requests with status 200, 203, 206, 300, or 301, and not a "cgi-bin" request). In addition, we do not include those requests that are queries (i.e. "?" appears in the URL), though such requests are a small fraction of total cacheable requests (around 3% to 5%). For the Virginia Tech traces, we simulated only the "GET" requests with reply status 200 and a known reply size. Thus, our numbers differ from what are reported in [WASAF96]. The Virginia Tech traces unfortunately do not come with latency information. For the Boston University traces, we simulated only those requests that are not serviced out of browser caches. (The filtering rules are sketched in code below.)
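The filtering rules above reduce to a short predicate. The sketch below is our illustration: the record fields (method, status, url, size) are an assumed simplified layout, not the actual trace formats, and it combines the DEC rules with the known-reply-size rule used for the Virginia Tech traces.

    CACHEABLE_STATUS = {200, 203, 206, 300, 301}

    def simulate_request(method, status, url, size):
        # DEC rules: cacheable GET/HEAD replies, no cgi-bin, no queries.
        if method not in ("GET", "HEAD"):
            return False
        if status not in CACHEABLE_STATUS:
            return False
        if "cgi-bin" in url or "?" in url:
            return False
        # Virginia Tech rule: require a known, positive reply size.
        return size is not None and size > 0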
Figure 1: Percentage of references to documents whose last accesses are t minutes ago, for t from 5 to 10000. (Y-axis: percentage of references, log scale; x-axis: time since last access in minutes.)

Figure 2: Percentage of references as a function of time since last access by the same user.
3.1 Locality in Web Accesses

In the search for an effective replacement algorithm, we analyzed the traces to understand the access patterns of Web requests seen by the proxies. The striking property we found is that all traces exhibit excellent long-term locality.

Figure 1 shows the percentage of references to a document whose last reference is t minutes ago, for t from 5 to 10000, in the DEC traces for the period from Sep. 12 to Sep. 18. In other words, the figure shows the probability of a document being accessed again as a function of the time since the last access to this document. The graphs for other traces are similar to the one shown here. Clearly, the probability of reference drops significantly as the time since last reference increases (note that the y-axis is in logarithmic scale), with occasional spikes around multiples of 24 hours.

Figure 3 shows the cumulative percentage of references to documents whose last references are less than t minutes ago, for the entire DEC traces from Aug. 29 to Sep. 22. The dashed curve on the graph shows the corresponding percentage of bytes referenced. In Figure 3, which uses a linear scale for the y-axis and a logarithmic scale for the x-axis, we see that the curves are nearly linear. That is, the probability of a document being referenced again within t minutes is proportional to log(t), indicating that the probability of re-reference to documents referenced exactly t minutes ago can be modeled as k/t, where k is a constant.

A different study [LRV97] reached very similar conclusions on a different set of traces. Indeed, it is this observation that prompted the design of the function D(t) in LRV. Since the studies find similar temporal locality patterns in the Web access traces, the probability density function of k/t has been used to simulate temporal locality behavior in a recent Web proxy benchmark [WPB].

There are two reasons for the good locality in Web accesses seen by the proxy. One is that each user's accesses tend to exhibit locality: Figure 2 shows the probability that a document is accessed by a user t minutes after the last access by the same user, for DEC traces in the period from Sep. 12 to Sep. 18 (again, the figures for other traces are similar). Clearly, each user tends to re-access recently-read documents, and to re-access documents that are read on a daily basis (note the spikes around 24 hours, 48 hours, etc. in the figure). Though one might expect that browsers' caches absorb the locality among the same user's accesses seen by the proxy, the results seem to indicate that this is not necessarily the case, and that users are using proxy caches as an extension to the browser cache. [LRV97] observes the same phenomenon.

The other reason is that users' interests overlap in time: comparing Figures 2 and 1, we can see that for the same t, the percentage in Figure 1 is higher than that in Figure 2. This indicates that part of the locality observed by the proxy comes from the fact that the proxy sees a merged stream of accesses from many independent users, who share a certain amount of common interests. Thus, we believe the locality observed is not particular to the traces described here, but rather a common characteristic of accesses seen by proxies with a large enough user community.
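The inter-reference-time statistics behind Figures 1 and 3 require only simple bookkeeping. A minimal sketch, assuming the trace is given as (minute, url) pairs in time order (an assumed format, not the actual trace layout):

    from collections import Counter

    def reference_gaps(trace):
        # trace: iterable of (minute, url) pairs in time order
        last_seen = {}
        gaps = Counter()
        for minute, url in trace:
            if url in last_seen:
                gaps[minute - last_seen[url]] += 1
            last_seen[url] = minute
        total = sum(gaps.values())
        # percentage of references whose previous access was t minutes ago
        return {t: 100.0 * n / total for t, n in sorted(gaps.items())}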
Figure 3: Percentage of references and percentage of bytes referenced to documents accessed less than t minutes ago (the cumulative version of Figure 1). Note that the y-axis is in linear scale and the x-axis is in log scale.

To further demonstrate the effect of inter-user sharing on hit ratios, Figure 4 shows the hit ratio and byte hit ratio as a function of the size of the user group sharing the cache. The figures are quartile graphs [Tufte], the middle curve showing the median of the hit ratios of individual groups of clients in the DEC traces, and the other four points for each group size showing the minimum, the 25th percentile, the 75th percentile, and the maximum of the hit ratios of individual groups. The median hit ratios show an almost linear increase as the group size doubles.

In the absence of cost and size concerns, LRU is the optimal online algorithm for reference streams exhibiting good locality [CD73] (strictly speaking, those conforming to the LRU-stack model). However, in the Web context, replacing a more recently used but large file can yield a higher hit ratio than replacing a less recently used but small file. Similarly, replacing a more recently used but inexpensive file may yield a lower total cost than replacing a less recently used but expensive file. Thus, we need an algorithm that combines locality, size and cost considerations in a simple, online way that does not require tuning parameters according to the particular traces, and yet maximizes the overall performance.

4 GreedyDual-Size Algorithm

The original GreedyDual algorithm was proposed by Young [You91b]. It is actually a range of algorithms, but we focus on one particular version which is a generalization of LRU. It is concerned with the case when pages in a cache have the same size, but incur different costs to fetch from secondary storage. The algorithm associates a value, H, with each cached page p. Initially, when a page is brought into cache, H is set to the cost of bringing the page into the cache (the cost is always non-negative). When a replacement needs to be made, the page with the lowest H value, min_H, is replaced, and then all pages reduce their H values by min_H. If a page is accessed, its H value is restored to the cost of bringing it into the cache. Thus, the H values of recently accessed pages retain a larger portion of the original cost than those of pages that have not been accessed for a long time. By reducing the H values as time goes on and restoring them upon access, the algorithm integrates the locality and cost concerns in a seamless fashion.

To incorporate the different sizes of documents, we extend the GreedyDual algorithm by setting H to cost/size upon an access to a document, where cost is the cost of bringing the document, and size is the size of the document in bytes. We call the extended version the GreedyDual-Size algorithm. The definition of cost depends on the goal of the replacement algorithm: cost is set to 1 if the goal is to maximize hit ratio, it is set to the downloading latency if the goal is to minimize average latency, and it is set to the network cost if the goal is to minimize the total cost.

At first glance, GreedyDual-Size would require k subtractions when a replacement is made, where k is the number of documents in cache. However, a different way of recording H removes these subtractions. The idea is to keep an "inflation" value L, and to let all future settings of H be offset by L. Figure 5 shows an efficient implementation of the algorithm.

Using this technique, GreedyDual-Size can be implemented by maintaining a priority queue on the documents, based on their H value. Handling a hit requires O(log k) time and handling an eviction requires O(log k) time, since in both cases a single item in the queue must be updated. More efficient implementations can be designed that make the common case of handling a hit require only O(1) time.
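The following Python sketch shows one way to realize this priority-queue formulation with the inflation value L. It is our illustration, not the authors' implementation: the class and parameter names are invented, and cost_fn stands in for whichever cost definition (1, latency, or network cost) the proxy chooses.

    import heapq

    class GreedyDualSizeCache:
        # A sketch of GreedyDual-Size using the "inflation" value L.
        # Sizes are assumed to be positive byte counts.

        def __init__(self, capacity_bytes, cost_fn):
            self.capacity = capacity_bytes
            self.cost_fn = cost_fn   # maps (url, size) -> c(p)
            self.L = 0.0             # inflation value
            self.H = {}              # url -> current H value
            self.sizes = {}          # url -> size in bytes
            self.used = 0
            self.heap = []           # (H, url) entries; some may be stale

        def _set_H(self, url, size):
            # H(p) <- L + c(p)/s(p)
            h = self.L + self.cost_fn(url, size) / size
            self.H[url] = h
            heapq.heappush(self.heap, (h, url))

        def access(self, url, size):
            if url in self.H:                  # hit: restore H, offset by L
                self._set_H(url, self.sizes[url])
                return True
            # miss: evict lowest-H documents until the new one fits
            while self.used + size > self.capacity and self.H:
                h, victim = heapq.heappop(self.heap)
                if self.H.get(victim) != h:    # stale entry left by a hit
                    continue
                self.L = h                     # L <- min H
                self.used -= self.sizes.pop(victim)
                del self.H[victim]
            if size <= self.capacity:          # a document larger than the
                self.sizes[url] = size         # whole cache is not admitted
                self.used += size
                self._set_H(url, size)
            return False

A hit leaves the document's old heap entry behind; the entry is recognized as stale and skipped when popped. This keeps a hit down to a dictionary update plus one heap push, in the spirit of the cheap-hit refinement mentioned above.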
Figure 4: Hit ratio and byte hit ratio as a function of the size of the user group sharing the cache. The x-axis is in log scale.

Online-Optimality of GreedyDual-Size

It can be proven that GreedyDual-Size is online-optimal. For any sequence of accesses to documents with arbitrary sizes and arbitrary costs, the cost of cache misses under GreedyDual-Size is at most k times that under the offline optimal replacement algorithm, where k is the ratio of the cache size to the size of the smallest page. This ratio is the lowest achievable by any online replacement algorithm. Below is a proof of the online-optimality of GreedyDual-Size.

    Algorithm GreedyDual-Size:
        Initialize L <- 0.
        Process each request in turn; let the current request be for document p:
        (1)  if p is already in memory,
        (2)      H(p) <- L + c(p)/s(p).
        (3)  if p is not in memory,
        (4)      while there is not enough room in memory for p,
        (5)          let L <- min_{q in M} H(q),
        (6)          evict q such that H(q) = L.
        (7)      bring p into memory and set H(p) <- L + c(p)/s(p).

Figure 5: The GreedyDual-Size algorithm.

Neal Young proved in [You91b] that GreedyDual for pages of uniform size is k-competitive. We prove here that the version which handles pages of multiple sizes is also k-competitive. (In both cases, k is defined to be the ratio of the size of the cache to the size of the smallest page.) The proof below is based on a proof that another algorithm called BALANCE, which also solves the multi-cost uniform-size paging problem, is k-competitive [CKPV91].

All of the above bounds are tight, since we can always assume that all pages are as small as possible and have the same cost, and invoke the lower bound of k on the competitive ratio for the uniform-size uniform-cost paging problem found in [ST85].

It should also be noted that the same bound can be proven for the version of the algorithm which uses c(p) instead of c(p)/s(p) in the description of the algorithm in Figure 5. Young has independently proven a generalization of the result below [You97]. The generalization covers the whole range of algorithms described in his original paper [You91b] instead of the particular version covered here.

Theorem 1  GreedyDual-Size is k-competitive, where k is the ratio of the size of the cache to the size of the smallest document.

Proof. We will charge each algorithm for the documents they evict instead of the documents they bring in. The difference between the two cost measures is at most an additive constant.
Each request for a document happens in a series of steps. First the optimal algorithm serves the request. Then each of the steps of GreedyDual-Size is executed in a separate step. Each step of each request happens at a different point in time.

It is straightforward to show by induction on time that for every document q in the cache,

    L <= min_{p in M} H(p) <= H(q) <= L + c(q)/s(q).

Let L_final be the last value of L. Let s_min denote the size of the smallest document. Let s_cache be the total size of the cache. Note that k = s_cache/s_min. We will first prove that the total cost of all documents which OPT evicts is at least s_min * L_final. Then we will show that the total cost of all documents evicted by GreedyDual-Size is at most s_cache * L_final.

Every time L increases, there is some document which GreedyDual-Size has in the cache which the optimal does not have in the cache. This is because L only increases when GreedyDual-Size has exceeded the size of its cache and must evict a document. Since the optimal algorithm has already satisfied the request, it has the requested document in the cache. Since the newly requested document does not fit in GreedyDual-Size's cache, GreedyDual-Size must have some document in the cache which the optimal does not have in the cache.

Consider a period of time in which GreedyDual-Size has p in its cache and the optimal does not. Such a period begins with the optimal evicting p from the cache and ends when either we evict p from the cache or the optimal brings p back into the cache. We will attribute any increase in L which occurs during this period to the cost the optimal incurred in evicting p at the beginning of the period. The cost of evicting p is c(p). The only thing we have to prove in establishing that the optimal cost is at least s_min * L_final is that L increases by at most c(p)/s(p) <= c(p)/s_min during the period.

Let L1 be the value of L when the period begins. We know that at this time H(p) <= L1 + c(p)/s(p). Furthermore, H(p) does not change during this period. This is because H(p) only increases when p is requested, and p can only be requested on the last request of the period (because the period is defined to be the period of time in which GreedyDual-Size has p in its cache and the optimal does not). If the last request of the period is to document p, then the optimal brings p into its cache, and hence the period ends before H(p) increases. If the period ends by p's eviction, H(p) remains the same until p is evicted. Since H(p) is an upper bound for L, L cannot increase to more than L1 + c(p)/s(p) during the entire period.

Now we must establish that the total cost of all documents evicted by GreedyDual-Size is at most s_cache * L_final. Consider an eviction that GreedyDual-Size performs. Let p be the document that is evicted and let L1 and L2 be the values for L when it is brought in and evicted from the cache, respectively. The value of H(p) when p is brought into the cache is L1 + c(p)/s(p), and p can only be evicted if L equals H(p). Since H(p) can only increase during the time that p is in the cache, we know that L2 - L1 >= c(p)/s(p).

Draw an interval on the real line from L1 to L2 that is closed on the left end and open on the right end. Assign the interval a weight of s(p). If we draw an interval for every such eviction, the cost of GreedyDual-Size is upper bounded by the sum over all intervals of their length times their weight. By definition, all intervals lie in the range from 0 to L_final.

The final observation is that the total weight of all the intervals which contain any single point on the real line is at most s_cache. Consider a point L' on the line where an interval begins or ends. The total weight of the intervals that cover this point is the sum of the sizes of the documents which are in the cache when L reaches a value of L'. Since the size of the cache is at most s_cache, the sum of the weights of the intervals which cover L' is at most s_cache. □

5 Performance Comparison

Using trace-driven simulation, we compare the performance of GreedyDual-Size with LRU, Size, Hybrid, and LRV. Size, Hybrid, and LRV are all "champion" algorithms from previously published studies [WASAF96, LRV97, WA97]. In addition, for LRV, we first go through the whole trace to obtain the necessary parameters, thus giving it the advantage of perfect statistical information. In contrast, GreedyDual-Size takes into account cost, size and locality in a more natural manner and does not require tuning to a particular set of traces.

5.1 Performance Metrics

We consider five aspects of web caching benefits: hit ratio, byte hit ratio, latency reduction, hop reduction, and weighted-hop reduction. By hit ratio, we mean the number of requests that hit in the proxy cache as a percentage of total requests. By byte hit ratio, we mean the number of bytes that hit in the proxy cache as a percentage of the total number of bytes requested.
By latency reduction, we mean the percentage of the sum of downloading latency for the pages that hit in cache over the sum of all downloading latencies. Hop reduction and weighted-hop reduction are used to measure the effectiveness of the algorithm at reducing network costs, as explained below.

To investigate the regulatory role that can be played by proxy caches, we model the network cost associated with each document as "hops". The "hops" value can be the number of network hops travelled by a document, to model the case when the proxy tries to reduce the overall load on Internet routers, or it can be the monetary cost associated with fetching the document, to model the case when the proxy has to pay for documents travelling through different network carriers.

We evaluate the algorithms in a situation where there is a skew in the distribution of "hops" values among the documents. The skewed distribution models the case when a particular part of the network is congested, or the proxy has to pay a different amount of money for documents travelling through different networks (for example, if the proxy is at an Internet Service Provider). In our particular simulation, we assign each Web server a hop value equal to 1 or 32 [2], so that 1/8 of the servers have hop value 32 and 7/8 of the servers have hop value 1. This simulates the scenario, for example, that 1/8 of the web servers contacted are located in Asia, or can only be reached through an expensive or congested link.

[2] These numbers are chosen partly because, at one time, the maximum number of hops along a packet's route was 32.

Associated with the "hop" value are two metrics: hop reduction and weighted-hop reduction. Hop reduction is the ratio between the total number of hops of cache hits and the total number of hops of all accesses; weighted-hop reduction is the corresponding ratio for the total number of hops times the "packet savings" on cache hits. A cache hit's packet saving is 2 + file_size/536, as an estimate of the actual number of network packets required if the request is a cache miss (1 packet for the request, 1 packet for the reply, and size/536 for extra data packets, assuming a 536-byte TCP segment size).

For each trace, we first calculate the benefit obtained if the cache size is infinite. The values for all traces are shown in Table 1. In the table, BU-272 and BU-B19 are two sets of traces from Boston University [CBC95]; VT-BL, VT-C, VT-G and VT-U are four sets of traces from Virginia Tech [WASAF96]; DEC-U1:8/29-9/4 through DEC-U1:9/19-9/22 are the requests made by users 0-512 (user group 1) for each week in the three and a half week period; and DEC-U2:8/29-9/4 through DEC-U2:9/19-9/22 are the traces for users 1024-2048 (user group 2). We experimented with other subsets of DEC traces and the results are quite similar to those obtained from these subsets.

Below, we divide our results into three subsections. In Section 5.2, we measure the hit rate and byte hit rate of the different algorithms. In Section 5.3 we compare the latency reduction. In Section 5.4 we compare the hop reduction and weighted-hop reduction. The corresponding values under the infinite cache are listed in Table 1. Since these simulations assume limited cache storage, their ratios cannot be higher than the infinite cache ratios.

The cache sizes investigated in the simulation were chosen by taking a fixed percentage of the total sizes of all distinct documents requested in the sequence. The percentages are 0.05%, 0.5%, 5%, 10% and 20%. For example, for trace DEC-U1:8/29-9/4, which includes the requests made by users 0-512 for the week of 8/29 to 9/4 and has a total data set size of 9.21GB, the cache sizes experimented with are 4.6MB, 46.1MB, 461MB, 921MB and 1.84GB.

Due to space limitations, we organize the traces into four groups: Boston University traces, Virginia Tech traces, DEC-U1 traces, and DEC-U2 traces, and present the averaged results per trace group. The results for individual traces are similar.

5.2 Hit Rate and Byte Hit Rate

We introduce two versions of the GreedyDual-Size algorithm, GD-Size(1) and GD-Size(packets). GD-Size(1) sets the cost for each document to 1, and GD-Size(packets) sets the cost for each document to 2 + size/536 (that is, the estimated number of network packets sent and received if a miss to the document happens). In other words, GD-Size(1) tries to minimize miss ratio, and GD-Size(packets) tries to minimize the network traffic resulting from the misses.

Figure 6(a) shows the average hit ratio of the four groups of traces under LRU, Size, LRV, GD-Size(1), and GD-Size(packets). The graphs from left to right show the results for Boston University traces, Virginia Tech traces, DEC-U1 traces and DEC-U2 traces, respectively. Figure 6(b) is a simplified version of 6(a) showing only the curves of LRU and GD-Size(1), highlighting the differences between the two. Graph (c) shows the average byte hit ratio for the four trace groups under the different algorithms.
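Collecting the definitions of Section 5.1, the five metrics reported in Table 1 reduce to ratio computations over per-request counters. A minimal sketch, assuming each simulated request is summarized as an assumed (hit, size, latency, hops) tuple:

    def packets(size):
        # estimated packets for a miss: request + reply + data packets
        return 2 + size / 536.0

    def summarize(events):
        tot = {"req": 0, "hit": 0, "byte": 0, "byte_hit": 0,
               "lat": 0.0, "lat_hit": 0.0, "hop": 0.0, "hop_hit": 0.0,
               "whop": 0.0, "whop_hit": 0.0}
        for hit, size, latency, hops in events:
            tot["req"] += 1
            tot["byte"] += size
            tot["lat"] += latency
            tot["hop"] += hops
            tot["whop"] += hops * packets(size)
            if hit:
                tot["hit"] += 1
                tot["byte_hit"] += size
                tot["lat_hit"] += latency
                tot["hop_hit"] += hops
                tot["whop_hit"] += hops * packets(size)
        return {
            "hit ratio": tot["hit"] / tot["req"],
            "byte hit ratio": tot["byte_hit"] / tot["byte"],
            "latency reduction": tot["lat_hit"] / tot["lat"],
            "hop reduction": tot["hop_hit"] / tot["hop"],
            "weighted-hop reduction": tot["whop_hit"] / tot["whop"],
        }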
Trace              Clients  Total Requests  Total GBytes  Hit Rate  Byte HR  Reduced Latency  Reduced Hops  Reduced Weighted-Hops
BU-272                   5           17007          0.39      0.25     0.15             0.13          0.16                   0.09
BU-B19                  32          118104          1.59      0.47     0.27             0.20          0.48                   0.25
VT-BL                   59           53844         0.674      0.43     0.33                -          0.35                   0.16
VT-C                    26           11250         0.159      0.45     0.38                -          0.33                   0.15
VT-G                    26           47802         0.630      0.50     0.30                -          0.49                   0.31
VT-U                    74          164160          2.30      0.46     0.33                -          0.40                   0.25
DEC-U1:8/29-9/4        512          633881          9.21      0.42     0.35             0.24          0.34                   0.25
DEC-U1:9/5-9/11        512          691211          9.32      0.40     0.31             0.23          0.32                   0.23
DEC-U1:9/12-9/18       512          658166          9.23      0.39     0.31             0.19          0.39                   0.32
DEC-U1:9/19-9/22       512          280087          3.86      0.38     0.31             0.16          0.25                   0.21
DEC-U2:8/29-9/4       1024          455858          5.57      0.33     0.22             0.20          0.27                   0.19
DEC-U2:9/5-9/11       1024          428719          5.13      0.30     0.21             0.18          0.25                   0.16
DEC-U2:9/12-9/18      1024          408503          4.94      0.29     0.19             0.15          0.24                   0.17
DEC-U2:9/19-9/22      1024          170397          2.00      0.26     0.19             0.15          0.17                   0.11

Table 1: Benefits under a cache of infinite size for each trace, measured as hit ratio, byte hit ratio, latency reduction, hop reduction, and weighted-hop reduction.

The results clearly show that GD-Size(1) achieves the best hit ratio among all algorithms across traces and cache sizes. It approaches the maximal achievable hit ratio very fast, being able to achieve over 95% of the maximal hit ratio when the cache size is only 5% of the total data set size. It performs particularly well for small caches, suggesting that it would be a good replacement algorithm for main memory caching of web pages.

However, Figure 6(c) reveals that GD-Size(1) achieves its high hit ratio at the price of a lower byte hit ratio. This is because GD-Size(1) considers the saving for each cache hit as 1, regardless of the size of the document. GD-Size(packets), on the other hand, achieves the overall highest byte hit ratio and the second highest hit ratio (only moderately lower than GD-Size(1)). GD-Size(packets) seeks to minimize (estimated) network traffic, in which both hit ratio and byte hit ratio play a role.

For the Virginia Tech traces, LRV outperforms GD-Size(packets) in terms of hit ratio and byte hit ratio. This is due to the fact that those traces have significant skews in the probability of references to different sized files, and LRV knows the distribution beforehand and includes it in the calculation. However, for all other traces where the skew is less significant, LRV performs worse than GD-Size(packets) in terms of both hit ratio and byte hit ratio, despite its heavy parameterization and foreknowledge.

LRU performs better than SIZE in terms of hit ratio when the cache size is small (less than or equal to 5% of the total data set size), but performs slightly worse when the cache size is large. The relative comparison of LRU and Size differs from the results in [WASAF96], but agrees with those in [LRV97].

In summary, for proxy designers that seek to maximize hit ratio, GD-Size(1) is the appropriate algorithm. If both high hit ratio and high byte hit ratio are desired, GD-Size(packets) is the appropriate algorithm.
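Expressed against the GreedyDualSizeCache sketch given in Section 4 (a hypothetical class, not the authors' code), the two recommended policies differ only in the cost plug-in; here capacity stands for the proxy's byte budget:

    # cost = 1: maximize hit ratio (GD-Size(1))
    gd_size_1 = GreedyDualSizeCache(capacity, cost_fn=lambda url, size: 1)

    # cost = 2 + size/536: minimize estimated packet traffic (GD-Size(packets))
    gd_size_packets = GreedyDualSizeCache(
        capacity, cost_fn=lambda url, size: 2 + size / 536.0)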
Figure 6: Hit ratio and byte hit ratio comparison of the algorithms. (a) Hit ratios of LRU, Size, LRV, GD-Size(1) and GD-Size(packets) for each trace group (Boston University, Virginia Tech, DEC-U1, and DEC-U2 traces; x-axis: relative cache size, %). (b) A simplified version of (a) showing only the curves for LRU and GD-Size(1). (c) Byte hit ratios of LRU, Size, LRV, GD-Size(1), and GD-Size(packets) for each trace group.
5.3 Reduced Latency

Another major concern for proxies is to reduce the latency of HTTP requests through caching, as numerous studies have shown that waiting time has become the primary concern of Web users. One study [WA97] introduced a proxy replacement algorithm called Hybrid, which takes into account the different latencies incurred to load different web pages, and attempts to minimize the average latency. The study [WA97] further showed that in general the algorithm has a lower average latency than LRU, LFU and SIZE.

We also designed two versions of GreedyDual-Size that take latency into account. One, called GD-Size(latency), sets the cost of a document to the latency that was required to download the document. The other, called GD-Size(avg latency), sets the cost to the estimated download latency of a document, using the same method of estimating latency as in Hybrid [WA97].

Figure 7(a) shows the latency reductions for LRU, Hybrid, GD-Size(1), GD-Size(latency) and GD-Size(avg latency). The graphs, from left to right, show the results for Boston University traces, DEC-U1 traces and DEC-U2 traces. The figure unfortunately does not include results for Virginia Tech traces because they do not have latency information for each HTTP request.

Clearly, GD-Size(1) performs the best, yielding the highest latency reduction. GD-Size(latency) and GD-Size(packets) finish second, with LRU following close behind. GD-Size(avg latency) performs badly for small cache sizes, but performs very well for relatively large cache sizes. Finally, Hybrid performs the worst.

Examination of the results shows that the reason for Hybrid's poor performance is its low hit ratio. In the Boston University traces, Hybrid's hit ratio is much lower than LRU's for cache sizes <= 5% of the total data set sizes, and only slightly higher for larger cache sizes. For all DEC traces, Hybrid's hit ratio is much lower than LRU's, under all cache sizes. Hybrid has a low hit ratio because it does not consider how recently a document has been accessed during replacement.

Since [WA97] reports that Hybrid performs well, our results here seem to suggest that Hybrid's performance is perhaps trace-dependent. In our simulation of Hybrid we used the same constants as in [WA97], without tuning them to our traces. Unfortunately, we were not able to obtain the traces used in [WA97].

It is a surprise to us that GD-Size(1), which does not take latency into account, performs better than GD-Size(latency) and GD-Size(avg latency). Detailed examination of the traces shows that the latency of loading the same document varies significantly. In fact, for each of the DEC traces, the variance among latencies of the same document ranges from 5% to over 500%, with an average around 71%. Thus, a document that was considered cheap (taking less time to download) may turn out expensive at the next miss, while a document that was considered expensive may actually take less time to download. The best bet for the replacement algorithm, it seems, is to maximize hit ratio.

In summary, GD-Size(1) is the best algorithm to reduce average latency. The high variance among loading latencies for the same document reduces the effectiveness of latency-conscious algorithms.

5.4 Network Costs

To incorporate network cost considerations, GD-Size(hops) sets the cost of each document to the hop value associated with the Web server of the document, and GD-Size(weightedhops) sets the cost to hops * (2 + file_size/536). Figures 7(b) and 7(c) show the hop reduction and weighted-hop reduction for LRU, GD-Size(1), GD-Size(hops), and GD-Size(weightedhops).

The results show that algorithms that consider network costs do perform better than algorithms that are oblivious to them. The results here are different from the latency results because the network cost associated with a document does not change during our simulation. The results also show that the specifically designed algorithms achieve their effect. For hop reduction, GD-Size(hops) performs the best, and for weighted-hop reduction, GD-Size(weightedhops) performs the best. This shows that GreedyDual-Size not only can combine cost concerns nicely with size and locality, but is also very flexible and can accommodate a variety of performance goals.

Thus, we recommend GD-Size(hops) as the replacement algorithm for the regulatory role of proxy caches. If the network cost is proportional to the number of bytes or packets, then GD-Size(weightedhops) is the appropriate algorithm.

5.5 Summary

Based on the above results, we have the following recommendations. If the proxy wants high hit ratio or low average latency, GD-Size(1) is the appropriate algorithm. If the proxy desires high byte hit ratio as well, then GD-Size(packets) achieves a good balance among the different goals. If the documents have associated network or monetary costs that do not change over time, or change slowly over time, then GD-Size(hops) or GD-Size(weightedhops) is the appropriate algorithm. Finally, in the case of main memory caching of web documents, GD-Size(1) should be used because of its superior performance under small cache sizes.

6 Conclusion

This paper introduces a simple web cache replacement algorithm, GreedyDual-Size, and shows that it outperforms existing replacement algorithms in many performance aspects, including hit ratios, latency reduction, and network cost reduction. GreedyDual-Size combines locality, cost and size considerations in a unified way without using any weighting function or parameter. It is simple to implement and accommodates a variety of performance goals. Through trace-driven simulations, we identify the cost definitions for GreedyDual-Size that maximize different performance gains. GreedyDual-Size can also be applied to main memory caching of Web documents to further improve performance.

The GreedyDual-Size algorithms shown so far can only optimize one performance measure at a time. We are looking into how to adjust the algorithm when the goal is to optimize more than one performance measure (for example, both hit ratio and byte hit ratio). We also plan to study the integration of hint-based prefetching with the cache replacement algorithm.
Figure 7: Latency, hop, and weighted-hop reductions under various algorithms. (a) Latency reduction under LRU, Hybrid, GD-Size(1), GD-Size(packets), GD-Size(latency) and GD-Size(avg_latency) for Boston University, DEC-U1, and DEC-U2 traces. (b) Hop reduction under LRU, GD-Size(1), GD-Size(hops) and GD-Size(weightedhops). (c) Weighted-hop reduction under LRU, GD-Size(1), GD-Size(hops) and GD-Size(weightedhops). (Panels (b) and (c) cover the Boston University, Virginia Tech, DEC-U1, and DEC-U2 traces; x-axis: relative cache size, %.)
Finally, we have shown that if an appropriate network cost can be associated with a document, the GreedyDual-Size algorithm can be used to adjust the caching of different documents to affect the Web traffic. In other words, if proxy caches use the GreedyDual-Size algorithm, and they can be informed of the congestion on the network, the caches can cooperate to reduce the traffic over the congested links. However, how to detect congested paths on the network and how to assign appropriate cost values for the affected documents are topics beyond the scope of this paper, and remain our future work.

Acknowledgement

This research would not be possible without the support of the people who make their proxy traces available. Sandy Irani is supported in part by NSF Grant CCR-9309456. Our shepherd, Carl Staelin, contributed the cumulative percentage graphs and the quartile graphs and greatly improved the paper. Finally, the anonymous referees provided very helpful comments.

References

[ASAWF95] M. Abrams, C.R. Standbridge, G. Abdulla, S. Williams and E.A. Fox. Caching Proxies: Limitations and Potentials. WWW-4, Boston Conference, December 1995.

[Bel66] L.A. Belady. A study of replacement algorithms for virtual storage computers. IBM Systems Journal, 5:78-101, 1966.

[CD73] Edward G. Coffman, Jr. and Peter J. Denning. Operating Systems Theory. Prentice-Hall, Inc., 1973.

[CKPV91] M. Chrobak, H. Karloff, T. H. Payne and S. Vishwanathan. New results on server problems. SIAM Journal on Discrete Mathematics, 4:172-181, 1991.

[DEC96] Digital Equipment Corporation. Digital's Web Proxy Traces. ftp://ftp.digital.com/pub/DEC/traces/proxy/webtraces.html.

[FKIP96] A. Feldman, A. Karlin, S. Irani and S. Phillips. Private communication.

[Ho97] Saied Hosseini. Private communication.

[LC97] Chengjie Liu and Pei Cao. Maintaining Strong Cache Consistency in the World-Wide Web. In Proceedings of the 1997 International Conference on Distributed Computing Systems, May 1997.

[Ir97] S. Irani. Page replacement with multi-size pages and applications to web caching. In Proceedings of the 29th Symposium on the Theory of Computing, 1997, pages 701-710.

[LRV97] P. Lorenzetti, L. Rizzo and L. Vicisano. Replacement Policies for a Proxy Cache. http://www.iet.unipi.it/~luigi/research.html.

[CBC95] Carlos R. Cunha, Azer Bestavros and Mark E. Crovella. Characteristics of WWW Client-based Traces. BU-CS-96-010, Boston University.

[LM96] Paul Leach and Jeff Mogul. The Hit Metering Protocol. Manuscript.

[HT97] IETF. The HTTP 1.1 Protocol (Draft). http://www.ietf.org.

[ST85] D. Sleator and R. E. Tarjan. Amortized efficiency of list update and paging rules. Communications of the ACM, 28:202-208, 1985.

[Tufte] Edward Tufte. The Visual Display of Quantitative Information. Graphics Press, February 1992.

[W3C] The Notification Protocol. http://www.w3c.org.

[WASAF96] S. Williams, M. Abrams, C.R. Standbridge, G. Abdulla and E.A. Fox. Removal Policies in Network Caches for World-Wide Web Documents. In Proceedings of ACM SIGCOMM '96, August 1996, Stanford University.

[WA97] R. Wooster and M. Abrams. Proxy Caching That Estimates Page Load Delays. In the 6th International World Wide Web Conference, April 7-11, 1997, Santa Clara, CA. http://www6.nttlabs.com/HyperNews/get/PAPER250.html.

[WPB] Jussara Almeida and Pei Cao. The Wisconsin Proxy Benchmark (WPB). http://www.cs.wisc.edu/~cao/wpb1.0.html.

[You91b] N. Young. The k-server dual and loose competitiveness for paging. Algorithmica, 11(6):525-541, June 1994. Rewritten version of "On-line caching as cache size varies", in the 2nd Annual ACM-SIAM Symposium on Discrete Algorithms, pages 241-250, 1991.

[You97] N. Young. Online file caching. To appear in the Proceedings of the 9th Annual ACM-SIAM Symposium on Discrete Algorithms, 1998.
