Revisiting Cacheability in Times of User Generated Content


Bernhard Ager, Fabian Schneider, Juhoon Kim, Anja Feldmann
{bernhard|fabian|jkim|anja}@net.t-labs.tu-berlin.de
TU Berlin, Deutsche Telekom Laboratories, Ernst-Reuter-Platz 7, 10589 Berlin

Abstract: Today's Internet traffic is dominated by users' demand for exchanging content. In particular, multi-media content, i.e., photos, music, and video, as well as software downloads and updates contribute substantially to today's Internet traffic. One option for reducing network costs is to use caches, exploiting the observation that content popularity is consistent with Zipf's law. Yet, Web caching became unprofitable due to the increase in popularity of dynamic Web content. However, since rich content is currently not very dynamic, caching appears to be worthwhile again.

We base our analysis on anonymized traces from a large European ISP connecting more than 20,000 residential DSL customers to the Internet, collected in 2009. We focus on the most prominent protocols in this environment (HTTP, BitTorrent (BT), eDonkey, and NNTP) and estimate the potential of caching for traffic reduction. On the one hand, our results show that the potential for caching most client/server-based applications like HTTP and NNTP is small. On the other hand, P2P-based applications such as BitTorrent and certain HTTP-based applications have high content duplication ratios.

I. INTRODUCTION

The Internet has evolved into a system where users can easily share content with their friends and/or other users via applications such as wikis, blogs, online social networks, P2P file-sharing applications, One-Click Hosters, or video portals, to name a few of the most well-known user generated content (UGC) services. In terms of volume, multi-media content including photos, music, and videos, as well as software downloads and updates, are major contributors and together responsible for most of the Internet traffic [11], [12], [16], [17]. Indeed, HTTP again accounts for more than 50 % of the traffic [11], [12], [16], [17] and is rarely (mis)used as a transport protocol for other applications [12]. Among the causes for the increase of HTTP traffic are One-Click Hosters such as rapidshare.com or uploaded.to and the increase of streaming content, e.g., offered by youtube.com.

In the early stages of the Internet, Web caches were very popular. However, the efficiency of Web caches [7], [15] decreased drastically as the popularity of advanced features increased: dynamic/personalized Web pages (via cookies), AJAX-based applications, etc. Re-examining today's content, we find that a large fraction, especially multi-media content, is static, and therefore it might be rewarding to re-evaluate the caching potential. This is confirmed by recent caching efficiency studies for specific applications, e.g., Gnutella [8], Fasttrack [20], YouTube [22], BitTorrent [9], and Web [6]. Rather than focusing on a specific application, in this paper we study a set of application protocols. For these we investigate the potential of caches, traffic redirection for sharing content, as well as causes of non-cacheability.

In this paper we present observations based on passive packet-level monitoring of more than 20,000 residential DSL lines from a major European ISP. This unique vantage point, coupled with our application protocol analysis capabilities, enables more comprehensive and detailed characterizations than previously possible. We focus on the cacheability of multiple applications that are predominantly used for sharing content in our environment¹: (i) HTTP, (ii) BitTorrent, (iii) eDonkey, and (iv) NNTP. Note that HTTP and NNTP are client/server protocols while BitTorrent and eDonkey are P2P protocols. It might seem odd to include NNTP, the protocol used by Usenet, but we surprisingly find that NNTP accounts for more than 2 % of the total traffic volume and seems to be used as an alternative for file-sharing [10]. In previous work, Karagiannis et al. [9] studied the cacheability of BitTorrent, before it became one of the dominant file-sharing applications, based on data from a residential university environment. Erman et al. [6] focus only on the cacheability of Web traffic.

We find that the story for caching is ambivalent. For client/server-based applications, including NNTP and some Web domain classes, e.g., One-Click Hosters, caching is ineffective. For other HTTP services, e.g., Software/Updates, we observe caching efficiencies of up to 90 %. In addition, for some domains, using opportunistic cache heuristics improves cacheability substantially. For P2P protocols, especially BitTorrent, there is substantial potential for caching if the cache actively participates in the protocol. Moreover, traffic localization via mechanisms such as those proposed by Aggarwal et al. [1], Xie et al. [21], and Choffnes and Bustamante [3], and currently under discussion within the IETF ALTO [18] working group, is promising². Traffic localization in effect uses local peers as a cache.

¹ In previous work, Maier et al. [12] found that at our vantage point HTTP is responsible for more than 50 % of the traffic, while P2P (BitTorrent and eDonkey) contributes less than 15 %. Even if all unclassified traffic is assumed to be P2P, P2P accounts for less than 30 % in total.
² The key idea is that ISPs and P2P users collaborate to locate nearby peers.

II. DATA SETS

We base our study on multiple sets of anonymized packet-level observations of residential DSL connections collected at
aggregation points within a large European ISP. Our monitor, using Endace monitoring cards, allows us to observe the traffic of more than 20,000 DSL lines to the Internet. Data anonymization, classification, and application-protocol-specific header extraction are performed immediately on the secured measurement infrastructure using the Bro NIDS [13] with dynamic protocol detection (DPD) [5]. For HTTP, we extend the standard protocol analyzer to compute MD5 hashes across the HTTP bodies. In addition, we developed DPD-based analyzers for NNTP, eDonkey, and BitTorrent [4], including the Azureus Messaging Protocol [2] and the LibTorrent Extension Protocol.

We use an anonymized 24 h packet trace collected in April 2009 (APR09) and an anonymized 48 h trace collected in August 2009 (AUG09). These are the same data sets as analyzed by Maier et al. [12]. While we typically do not experience any packet loss, there are several multi-second periods (less than 5 minutes overall per packet trace) with no packets due to OS/file-system interactions.

For studying long-term effects of cacheability, we used Bro's online analysis capabilities to collect several anonymized protocol-specific trace summaries (BitTorrent-14d, NNTP-15d, HTTP-14d), each spanning at least 2 weeks. Due to the amount of traffic at our vantage point and the resource-intensive analysis, we gather the online trace summaries one at a time. Table I summarizes characteristics of the traces, including their start, duration, size, and application mix.

TABLE I
OVERVIEW OF ANONYMIZED PACKET TRACES AND SUMMARIES

                                                       | Application volume
Name           | Start date        | Dur  | Size     | HTTP | BitTorrent | eDonkey | NNTP
APR09          | Wed 01 Apr'09 2am | 24 h | > 4 TB   | 58 % | 9 %        | 3 %     | 2 %
AUG09          | Fri 21 Aug'09 2am | 48 h | > 11 TB  | 63 % | 9 %        | 3 %     | 2 %
HTTP-14d       | Wed 09 Sep'09 3am | 14 d | > 200 GB | corresponds to > 40 TB of HTTP
NNTP-15d       | Wed 05 Aug'09 3am | 15 d | > 2 GB   | corresponds to > 2 TB of NNTP
BitTorrent-14d | Sat 20 Jun'09 3am | 14 d | > 80 GB  | corresponds to > 5 TB of BitTorrent

With regard to the application mix (see Table I), Maier et al. [12] find that HTTP, NNTP, BitTorrent, and eDonkey each contribute a significant amount of traffic. Moreover, together they add up to more than 72 % of the overall traffic at our vantage point. Similar protocol distributions have been observed at different times and at other locations of the same ISP.

Surprisingly, NNTP accounts for more than 2 % of the traffic. However, a detailed investigation [10] shows that 99 % of the current NNTP traffic is bound to/from fee-based NNTP servers. These fee-based offers compete with One-Click Hosters as the client/server-based alternative for file-sharing.

III. TERMINOLOGY AND APPROACH

Before delving into the details of the application-specific cacheability analysis, we introduce our terminology. Caching refers to saving a copy of a reply to a request on a server (the cache) with the intention to satisfy subsequent requests for the same content from the cache instead of the origin. For us, all requests to a content item except the first one are in principle cacheable. In this paper we explore the potential for cacheability rather than focusing on specific caching heuristics; thus, we are never limited by disk space.

We use two cacheability metrics: the number of cacheable requests and the cacheable volume. If $k_i$ denotes the total number of downloads of item $i$ with size $s_i$, the cacheable volume of $n$ items is computed according to Equation (1). The number of cacheable requests is calculated using the same equation with all sizes equal to 1.

$$\mathrm{cacheability} = \frac{\sum_{i=1}^{n} (k_i - 1)\, s_i}{\sum_{i=1}^{n} k_i\, s_i} \qquad (1)$$

To estimate the impact of caching, it is important to consider which mechanisms are available to redirect requests to the cache and where the cache is located. If P2P content is available at multiple different locations, one can use mechanisms [1], [3], [21] currently under discussion within the IETF ALTO [18] working group. In addition, it may be beneficial or even necessary to set up dedicated caches within the network. With regard to network location (see Figure 1), we distinguish between (i) hosts downstream from the monitor (local hosts), (ii) hosts within the ISP's autonomous system (AS) that are not local (AS hosts), and (iii) external hosts.

[Figure: Fig. 1. Vantage point and sets of hosts: local, AS, and external. Diagram elements: Internet/other ASes (external hosts), AS edge router, other customers of the AS (AS hosts), broadband access router, monitoring point, and customers downstream of the monitoring point (local hosts).]

A. Cacheability: Peer-to-Peer

In P2P the unit for caching is either the complete object or the transfer unit, if the P2P protocol splits the file into chunks, e.g., for more efficient transfers. The former captures the number of users interested in an object; the latter corresponds to the observed traffic. Since users may not download complete objects while online, these metrics can differ significantly. In BitTorrent the complete objects are the torrents and the transfer units are the (fixed-size) blocks. In eDonkey the objects are the files and the transfer units are blocks.

Given that a BitTorrent retrieval can last multiple days, it is likely that any trace includes some partial downloads. For a cacheability analysis, the lack of knowledge about transfers that occurred before the start of the observation
period is problematic, as these transfers could have primed the cache. Fortunately, for BitTorrent we can leverage the bitfield and have messages to infer which transfers occurred prior to the trace start, using a methodology similar to that of Karagiannis et al. [9].

Given that P2P clients are also P2P servers, we can in addition estimate a lower bound for the cacheability for AS hosts and external hosts. Thus, we can study the potential of P2P cacheability with regard to (i) the number of peers interested in an object, referred to as peers, (ii) transferred blocks, called blocks, and (iii) transferred blocks given a primed cache, called primed, for all three classes of hosts.

B. Cacheability: Client-Server

For HTTP and NNTP we note the following differences: (i) each expression of interest in an object corresponds to a full or partial download of the whole object; (ii) we cannot infer information for clients outside of our local network. Moreover, cache control in HTTP 1.1 offers many options via explicit cache-control headers. To identify HTTP objects we use both the size and the MD5 sum of the HTTP body (object ID).

With regard to cacheability, the units of interest for NNTP are articles. NNTP articles are comparable to objects in HTTP or emails in IMAP. The commands for accessing articles are ARTICLE, HEAD, and BODY, which request the whole article, only the headers, and only the body, respectively. Although NNTP includes a large number of commands, these are the only ones not used for navigating, controlling, and listing. Accordingly, the other commands contribute only a tiny fraction (less than 2 ‰) of the overall NNTP volume. NNTP differs from HTTP in that article IDs are unique and articles cannot be modified except by the administrator. To identify objects we thus use the article ID.

We study a range of different cacheability scenarios, from ideal to realistic (see Table II). The ideal scenario is used to derive an upper bound on the cacheability using Equation (1). This is the only scenario applicable to NNTP. However, for HTTP this scenario is practically infeasible, as one would have to reliably predict that two requests (for different URLs) are for the same content. We observe that CDNs often perform load balancing and thus serve the same content under URLs that differ only in the host name. Moreover, a realistic cache is typically unable to cache objects if two clients use different request methods or if the return code differs. The domain scenario thus adds these three criteria (second-level domain, HTTP method, and return code) to the object ID, and therefore gives us insight into the impact of load balancing over different servers. To account for the observation that identical object IDs can be hosted by the same providers at different URLs (host name and/or path), e.g., to help with load balancing or to accommodate GET parameters, we use the scenario complete.

In the fourth scenario, full, we simulate an actual HTTP cache that respects the cache control headers Pragma, Cache-Control, Expires, Last-Modified, Etag, and Authorization, as well as the HTTP methods, as specified in RFC 2616. However, unlike a real cache, we still use the object ID. We use this scenario only to evaluate the negative impact of the object ID on cacheability.

In the fifth and last scenario, realistic, the cache behaves like an ideal caching proxy server with unlimited disk space. The cache always performs the best possible caching option. For example, if an object is stale, it is not purged from the cache. Rather, the next query for the same object causes a conditional request to the HTTP server, thus allowing the refresh of a cached object without actually downloading it. If an object is already partially cached when a request occurs, only the missing parts are fetched from the server, provided the object has not changed in the meantime. If no expiration time is set, we use a heuristic motivated by RFC 2616: $t_{\mathrm{expiry}} = t_{\mathrm{now}} + \min\bigl(0.1 \cdot (t_{\mathrm{now}} - t_{\mathrm{Last\text{-}Modified}}),\ 1\,\mathrm{day}\bigr)$. Table II summarizes the five HTTP scenarios. Note that only the scenarios full and realistic use timeouts.

TABLE II
CRITERIA FOR IDENTIFYING CACHEABLE OBJECTS IN HTTP

scenario  | object ID | HTTP method | return code | 2nd-level domain | host | path | cache control
ideal     | ✔         |             |             |                  |      |      |
domain    | ✔         | ✔           | ✔           | ✔                |      |      |
complete  | ✔         | ✔           | ✔           |                  | ✔    | ✔    |
full      | ✔         | ✔           | ✔           |                  | ✔    | ✔    | ✔
realistic |           | ✔           | ✔           |                  | ✔    | ✔    | ✔

IV. RESULTS

Our analysis shows that the potential cacheability differs substantially among the application protocols.

A. Cacheability of Peer-to-Peer

Even though P2P no longer dominates residential traffic, it still contributes a significant amount of traffic. In fact, BitTorrent and eDonkey are the second and third most voluminous protocols after HTTP, together accounting for more than 10 % of the total traffic volume.

In Figure 2 we show the complementary cumulative distribution functions (CCDFs) of the distributions we use to calculate the cacheability of BitTorrent for BitTorrent-14d. The first indicator for estimating the potential of caching is the number of peers per torrent swarm, scenario peers. Figure 2 (a) shows this distribution for local, AS, external, and all hosts. We see that many torrents have a sizable number of peers within the AS. If all peers within the AS downloaded the complete content, 97 % of the bytes would be downloadable from peers within the AS, corresponding to an AS-wide caching efficiency of 97 %. This is a lower bound, as there may be other peers within the AS. When we consider only local hosts, caching efficiency drops substantially to 27 %. Nevertheless, this is still promising for caching.

When focusing on scenario primed, we find that 85 % of the download volume is in principle cacheable. However, this cacheability result relies on the availability of hosts that are torrent seeders (learned via the bitfield and have messages). We find that some hosts are seeders for a large number of torrents,
see Figure 2 (b), and that these hosts are online for substantial time periods. This is consistent with the results of Stutzbach and Rejaie [19]. However, if we only consider actual downloads, scenario blocks, the cacheability drops to roughly 9 % (see Figure 2 (c)), in contrast to the previous results. Part of the reason is that we may not have observed a flash-crowd effect during the 14-day observation period. Another reason is that clients often are seeders for many torrents but are only actively involved in a small number of downloads. For data from 2004 from a residential complex at a university, Karagiannis et al. [9] found that the cacheability for actual downloads is between 6 and 11.8 %, which is consistent with our results. However, they saw a substantially lower overall cache efficiency of less than 18.5 %.

We also note that almost all (> 89 %) of the chunks are downloaded from and uploaded to external hosts even though the content is available locally (see Figure 2 (c)). We are able to identify such content by inspecting the bitfield messages of the peers. This shows the potential of both P2P neighbor selection strategies [1], [3], [21] and caches, and confirms the results of Plissonneau et al. [14] for eDonkey and of Karagiannis et al. [9].

[Figure: Fig. 2. Complementary cumulative distribution functions (CCDFs) of BitTorrent analysis for BitTorrent-14d (note the distinct scales). Panels: (a) peers per torrent, (b) torrents per peer, (c) downloads per block; y-axis: CCDF (log scale); legend entries: within POP, within AS, extern, overall, from AS, from extern.]

To check whether our observations are biased by the language of the content, we examine the geolocation of the BitTorrent traffic with the help of the ISP. Figure 3 shows the amount of BitTorrent traffic downloaded from or uploaded to different locations, normalized by the overall BitTorrent traffic volume. We distinguish between continents, and within Europe we separate the local language region of the vantage point from the rest of Europe. Only one third of the content is downloaded from the same language region. This indicates that while there may be some bias due to language, it is not the dominating effect with regard to cacheability. The overall download volume is roughly twice the upload volume. This effect may be due to the asymmetry of the underlying DSL lines. In addition, Europe seems to be preferable, as its upload-to-download ratio is better.

[Figure: Fig. 3. Normalized BitTorrent traffic volume. Bar chart of download and upload volume (% of total BitTorrent volume) per region: Antarctica, Africa, Asia, Lang. region, Europe (rest), North America, Oceania, South America.]

For eDonkey, the cacheability results for peers per file and transferred blocks are not quite as good. For a 24 h trace from September 2008, we find that the cacheability per file is 71 % within the AS. However, when considering blocks, the cacheability again drops substantially, to roughly 4 %.

B. Cacheability of NNTP

Our caching analysis of NNTP shows that the majority of all articles are requested exactly once (see Table III). We find that at most 7 % of all articles are downloaded multiple times. With respect to volume we observe similar results: the cacheability is again at most 7 %. While we cannot identify a temporal trend, cacheability increases with the length of the observation period (NNTP-15d), and thus with the number of different DSL lines using NNTP. Note, however, that only a small fraction of the lines, less than 2 %, use NNTP, although NNTP users are usually among the top volume users.

TABLE III
CACHEABILITY OF NNTP ARTICLES

Cacheability          | APR09 | AUG09 | NNTP-15d
by number of requests | 2.0 % | 2.6 % | 7.0 %
by volume             | 2.1 % | 2.7 % | 6.9 %

C. Cacheability of HTTP

Given the number of different HTTP scenarios, we first discuss some general observations. Next, we explore whether cacheability differs by Web service, how it scales with population size, and what cache optimizations might be possible. In the following we report results only for HTTP-14d, as the results are consistent across all traces.
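The HTTP scenarios differ only in which request attributes must match before a repeated download counts as a cache hit. The following Python sketch computes Equation (1) under the ideal, domain, and complete keys of Table II. It is a minimal illustration, not the authors' Bro-based analysis. The `Request` record, the helper names, and the approximation of the second-level domain by the last two host labels are assumptions. The cache-control handling needed for the full and realistic scenarios is left out.

```python
import hashlib
from collections import Counter
from typing import Callable, Dict, Iterable, NamedTuple, Tuple


class Request(NamedTuple):
    """One observed HTTP download (hypothetical record layout)."""
    method: str
    status: int
    host: str
    path: str
    body: bytes


def object_id(req: Request) -> Tuple[int, str]:
    """Object ID as in Section III: body size plus MD5 of the body."""
    return len(req.body), hashlib.md5(req.body).hexdigest()


def second_level_domain(host: str) -> str:
    """Naive approximation: the last two labels (ignores suffixes like co.uk)."""
    return ".".join(host.split(".")[-2:])


# Cache keys for the three scenarios that need no cache-control logic.
# Two requests are treated as the same item if and only if their keys are equal.
SCENARIO_KEYS: Dict[str, Callable[[Request], tuple]] = {
    "ideal": lambda r: (object_id(r),),
    "domain": lambda r: (object_id(r), r.method, r.status,
                         second_level_domain(r.host)),
    "complete": lambda r: (object_id(r), r.method, r.status, r.host, r.path),
}


def cacheability(requests: Iterable[Request],
                 key: Callable[[Request], tuple],
                 by_volume: bool = True) -> float:
    """Equation (1): sum_i (k_i - 1) * s_i / sum_i k_i * s_i.

    k_i is the number of downloads of item i and s_i its size; with
    by_volume=False every s_i is 1, giving the cacheable-request share.
    """
    counts: Counter = Counter()
    sizes: Dict[tuple, int] = {}
    for req in requests:
        item = key(req)
        counts[item] += 1
        # All three keys contain the object ID, so the size is fixed per item.
        sizes[item] = len(req.body) if by_volume else 1
    total = sum(k * sizes[item] for item, k in counts.items())
    repeated = sum((k - 1) * sizes[item] for item, k in counts.items())
    return repeated / total if total else 0.0


if __name__ == "__main__":
    reqs = [
        # The same 100-byte object served by two CDN hosts, plus one other object.
        Request("GET", 200, "a.cdn.example.com", "/x.bin", b"A" * 100),
        Request("GET", 200, "b.cdn.example.com", "/x.bin", b"A" * 100),
        Request("GET", 200, "a.cdn.example.com", "/y.bin", b"B" * 300),
    ]
    for name, key in SCENARIO_KEYS.items():
        print(name, round(cacheability(reqs, key), 3))
```

Run as a script, it prints `ideal 0.2`, `domain 0.2`, and `complete 0.0`. In this toy input the duplicate 100-byte object counts as a hit when only the second-level domain must match, but not once the host name becomes part of the key.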
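The full and realistic scenarios use the RFC 2616-motivated freshness heuristic $t_{\mathrm{expiry}} = t_{\mathrm{now}} + \min\bigl(0.1 \cdot (t_{\mathrm{now}} - t_{\mathrm{Last\text{-}Modified}}),\ 1\,\mathrm{day}\bigr)$. A minimal Python sketch of this rule follows. Only the formula comes from the text. The function names, the `explicit_expiry` parameter (a stand-in for an expiry derived from headers such as Expires or Cache-Control), and the clamping of future Last-Modified dates to an age of zero are assumptions made for this illustration.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional

LM_FACTOR = 0.1                   # fraction of the object's age used as lifetime
MAX_LIFETIME = timedelta(days=1)  # the 1-day cap from the formula


def heuristic_expiry(now: datetime, last_modified: datetime) -> datetime:
    """t_expiry = t_now + min(0.1 * (t_now - t_Last-Modified), 1 day)."""
    # Assumption: a Last-Modified date in the future (clock skew) counts as age 0.
    age = max(now - last_modified, timedelta(0))
    return now + min(age * LM_FACTOR, MAX_LIFETIME)


def expiry_time(now: datetime,
                explicit_expiry: Optional[datetime] = None,
                last_modified: Optional[datetime] = None) -> Optional[datetime]:
    """Use an explicit expiry if the reply has one, else the heuristic.

    Returns None when the reply carries neither, i.e. no freshness information.
    """
    if explicit_expiry is not None:
        return explicit_expiry
    if last_modified is not None:
        return heuristic_expiry(now, last_modified)
    return None


def is_fresh(expiry: Optional[datetime], at: datetime) -> bool:
    """True if a cached copy may be served at time `at` without revalidation."""
    return expiry is not None and at < expiry


if __name__ == "__main__":
    now = datetime(2009, 9, 9, 12, 0, tzinfo=timezone.utc)
    # Modified 2 hours ago: 10 % of the age gives 12 minutes of freshness.
    print(heuristic_expiry(now, now - timedelta(hours=2)) - now)
    # Modified 30 days ago: 10 % would be 3 days, so the 1-day cap applies.
    print(heuristic_expiry(now, now - timedelta(days=30)) - now)
```

Run as a script, it prints `0:12:00` and `1 day, 0:00:00`. In the realistic scenario a stale copy is not discarded but revalidated with a conditional request. A `False` from `is_fresh` therefore signals revalidation rather than a guaranteed full download.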
General Observations: In principle, there is substantial potential for caching HTTP (see Table VI). In the ideal scenario, 71 % of the requests and 28 % of the bytes are cacheable. This is consistent with previous results [7], which also observe a significantly higher request hit rate than byte hit rate. Disabling caching across different second-level domains, scenario domain, does not decrease the caching efficiency by much: it is still 71 % for requests and 27 % for bytes. Disabling caching across different URLs for the same object, scenario complete, causes the efficiency to drop to 57 % and 21 %, respectively. This shows that identical objects are usually not hosted by different providers, while for each provider it is quite common to host the same object on different hosts or under different paths. Including cache control headers, scenario full, reduces cache efficiency drastically, to 16 % and 9.5 %. However, in the realistic scenario overall cacheability increases to 47 % and 22 %. The omission of the object ID is responsible for this increase: (i) the cache may be allowed to serve an object that has changed on the server; (ii) aborted downloads and partial requests lead to different object IDs and are thus only cacheable in the realistic scenario.

TABLE VI
OVERALL HTTP CACHEABILITY

         | ideal | domain | complete | full  | realistic
Bytes    | 28 %  | 27 %   | 21 %     | 9.5 % | 22 %
Requests | 71 %  | 71 %   | 57 %     | 16 %  | 47 %

The results of Erman et al. [6] for the ideal scenario are even more promising: they found a cacheability of 92 % for requests and 68 % for bytes. Their final results after considering cache control also show a substantial drop, but again indicate a better cache hit rate, with 32 % of the bytes. One reason for their more promising results is that they assume the size of a download to be equal to its Content-Length header. However, we find that many downloads are interrupted prematurely; in particular, large downloads are often aborted. We find that this can lead to a measurement error of over 40 % in download volume. Moreover, Erman et al. only consider the Cache-Control header. However, one third of the replies with cache control are controlled by a different header, e.g., Expires, as listed in Table IV.

TABLE IV
EFFECTIVE CACHE CONTROL HEADERS

Cache control header | Frequency
Cache-Control        | 57.2 %
Pragma               | 0.5 %
Expires              | 1.7 %
Etag                 | 22.8 %
Last-Modified        | 6.8 %
none                 | 12.0 %

Individual Sites: Table V shows the cacheability of the top 15 domains (by bytes), classified according to the Web service that they offer, for the scenarios complete, full, and realistic. In column UGC we mark whether a domain is dominated by user generated content. Comparing scenarios complete and full, we see that respecting cache control headers has a devastating effect on cacheability for some domains. Comparing scenarios full and realistic, we see that the negative impact of the object ID also differs across sites. The site Software1 stands out: its realistic cacheability is lower than predicted by the full scenario. This can occur if an intermediate HTTP request invalidates the cache before allowing access to a resource, e.g., when a login page is delivered instead of the requested object because authentication cookies are missing.

TABLE V
CACHEABILITY OF TOP 15 DOMAINS (BY BYTES)

type of service | UGC | fraction | complete | full   | realistic
OCH1            | ✔   | 12.6 %   | 2.6 %    | 0.0 %  | 1.5 %
OCH2            | ✔   | 1.3 %    | 6.0 %    | 1.1 %  | 2.0 %
OCH3            | ✔   | 1.0 %    | 7.1 %    | 0.0 %  | 0.2 %
OCH4            | ✔   | 0.9 %    | 2.6 %    | 0.0 %  | 0.1 %
Video1          | ✔   | 10.8 %   | 13.0 %   | 0.0 %  | 3.8 %
Video2          | ✔   | 2.2 %    | 43.4 %   | 0.1 %  | 5.2 %
Video3          | ✔   | 1.4 %    | 7.0 %    | 0.0 %  | 0.2 %
Video4          | ✔   | 1.4 %    | 14.7 %   | 1.5 %  | 3.6 %
Video5          | ✔   | 1.1 %    | 11.7 %   | 2.5 %  | 9.0 %
Software1       |     | 2.8 %    | 63.0 %   | 68.6 % | 64.8 %
Software2       |     | 1.8 %    | 22.0 %   | 2.9 %  | 87.4 %
Software3       |     | 0.9 %    | 12.9 %   | 3.3 %  | 54.7 %
Software4       |     | 0.8 %    | 56.9 %   | 42.7 % | 65.4 %
CDN1            | ?   | 1.5 %    | 34.8 %   | 12.8 % | 25.4 %
Search          | ?   | 1.0 %    | 56.0 %   | 5.3 %  | 32.7 %
overall         |     | 100.0 %  | 21.0 %   | 9.5 %  | 21.7 %

There are significant differences among the Web hosters. Sites offering software appear to ensure good cacheability (e.g., > 50 %) and can take advantage of caching (e.g., > 60 % for the realistic scenario). Some sites hosting videos have substantial potential for caching (> 40 %) but do not take advantage of it. CDNs also have potential but realize it only partially (34.8 vs. 12.8 %). One-Click Hosters (OCH) have hardly any caching potential (< 8 %) and do not even exploit this small potential: their hit rate in the realistic scenario is at most 2 %. We observe that sites dominated by user generated content exhibit considerably lower cacheability than other sites, e.g., software hosters.

[Figure: Fig. 4. Cacheability dependent on fraction of population. x-axis: fraction of population (1/8, 1/4, 1/2, 1); y-axis: cacheability in % (axis range 16 to 22).]

Population Size: Next, we explore the impact of the population size on cacheability. We randomly subdivide our population into smaller sub-populations and recompute the cacheability for the realistic scenario. We observe that cacheability appears to increase with population size: when doubling the population size, cacheability increases by 1.6 to 2.9 percentage points (cf. Figure 4). We presume that the caching potential further increases with an increase in population. However,
there may be saturation effects. Also note that the variability of cacheability increases with decreasing population size.

Cache Optimizations: Next, we explore why the potential for HTTP caching is not used. For this purpose we allow violations of the strict caching semantics for two high-volume sites. For Video1 we study the impact of personalization and of load balancing over servers with different host names. We start at a baseline of 3.8 %. Removing personalization, i.e., parameters, from URLs increases cacheability to 20.1 %. Unifying host names increases cacheability further to 24.6 %. Thus, we conclude that personalization can be a major cause for non-cacheability of objects.

Some objects do not include any information regarding their cacheability. Thus, in principle they cannot be cached. This may be either intentional or due to negligence of the operator. We now explore whether opportunistic caching, i.e., setting artificial expiry times and thereby violating strict cache semantics, can help for such objects. More specifically, we examine expiry times of 10 s, 10 min, 1 h, 1 d, and infinite.

The overall effect of opportunistic expiries is small: an increase of 2.6 percentage points for a ten-minute timeout and of 4.0 percentage points for infinite caching (Table VII). However, OCH1 and CDN1 show a large increase in cacheability. For OCH1, even ten minutes are sufficient to gain most of the benefits. Further investigation shows that misconfigured download accelerators are responsible. Such accelerators download large objects across multiple parallel connections. While this is not harmful per se, these accelerators issue partial requests for overlapping regions. As soon as the desired data is fetched, the accelerator closes the connection. However, it takes time to cancel the transaction, and therefore additional data is downloaded. We observe clients that open up to 300 parallel connections, resulting in an increase of the download volume by a factor of three. With some cache tuning, such extra downloads can be eliminated with a small opportunistic timeout.

TABLE VII
OPPORTUNISTIC CACHING FOR TOP 15 DOMAINS (SCENARIO realistic). DOMAINS WITH NO IMPROVEMENT ARE NOT SHOWN.

          |          | improvement (percentage points) per expiry time
type      | baseline | 10 s | 10 min | 1 h  | 1 d  | ∞
OCH1      | 1.5 %    | 8.1  | 16.6   | 17.0 | 17.8 | 18.4
OCH2      | 2.0 %    | 0.0  | 0.1    | 0.1  | 0.1  | 0.1
Video4    | 3.6 %    | 0.2  | 1.0    | 1.2  | 1.3  | 1.3
Video5    | 9.0 %    | 0.0  | 0.1    | 0.1  | 0.3  | 0.4
Software3 | 54.6 %   | 0.0  | 0.0    | 0.0  | 0.0  | 0.1
CDN1      | 25.4 %   | 0.1  | 3.0    | 9.5  | 19.5 | 23.7
Search    | 32.7 %   | 0.0  | 0.3    | 0.5  | 0.7  | 0.7
overall   | 21.7 %   | 1.1  | 2.6    | 2.9  | 3.5  | 4.0

V. SUMMARY

Our analysis of 20,000 residential broadband DSL lines of a large European ISP shows that, contrary to recent work, caching is not necessarily beneficial. For NNTP and some Web domain classes, including sites dominated by user generated content, we hardly find any potential. However, some Web service providers can take advantage of caches; e.g., software download providers and CDNs have substantial potential, even if it is unused due to cache control, personalization, and load balancing.

Caching for P2P protocols is in principle very promising, especially when combined with P2P neighbor selection strategies. However, taking advantage of this potential is non-trivial, as the cacheability for simple chunk downloads drops to 9 %. Therefore, we plan to explore in future work how to take advantage of the potential caching rates of more than 95 % for BitTorrent.

REFERENCES

[1] Aggarwal, V., Feldmann, A., and Scheideler, C. Can ISPs and P2P users cooperate for improved performance? SIGCOMM Comput. Commun. Rev. 37, 3 (2007).
[2] Azureus messaging protocol. http://www.azureuswiki.com/index.php/Azureus_messaging_protocol, Apr 2009.
[3] Choffnes, D. R., and Bustamante, F. E. Taming the torrent: a practical approach to reducing cross-ISP traffic in peer-to-peer systems. In Proc. ACM SIGCOMM (2008), pp. 363–374.
[4] Cohen, B. The BitTorrent Protocol Specification. http://bittorrent.org/beps/bep_0003.html, 2008.
[5] Dreger, H., Feldmann, A., Mai, M., Paxson, V., and Sommer, R. Dynamic application-layer protocol analysis for network intrusion detection. In Proc. Usenix Security Symp. (2006).
[6] Erman, J., Gerber, A., Hajiaghayi, M. T., Pei, D., and Spatscheck, O. Network-aware forward caching. In Proc. World Wide Web Conference (2009).
[7] Feldmann, A., Caceres, R., Douglis, F., Glass, G., and Rabinovich, M. Performance of web proxy caching in heterogeneous bandwidth environments. In Proc. IEEE INFOCOM (1999).
[8] Hefeeda, M., and Saleh, O. Traffic Modeling and Proportional Partial Caching of Peer-to-Peer Systems. IEEE/ACM Trans. Networking 16, 6 (2008).
[9] Karagiannis, T., Rodriguez, P., and Papagiannaki, D. Should Internet Service Providers fear peer-assisted content distribution? In Proc. ACM Internet Measurement Conference (2005).
[10] Kim, J., Schneider, F., Ager, B., and Feldmann, A. Today's usenet usage: Characterizing NNTP traffic. In Proceedings of the 13th IEEE Global Internet Symposium (March 2010).
[11] Labovitz, C., McPherson, D., and Iekel-Johnson, S. NANOG 47: 2009 internet observatory report. http://www.nanog.org/meetings/nanog47/abstracts.php?pt=MTQ1MyZuYW5vZzQ3&nm=nanog47.
[12] Maier, G., Feldmann, A., Paxson, V., and Allman, M. On dominant characteristics of residential broadband internet traffic. In Proc. ACM Internet Measurement Conference (Nov 2009).
[13] Paxson, V. Bro: A system for detecting network intruders in real-time. Computer Networks 31, 23–24 (1999).
[14] Plissonneau, L., Costeux, J.-L., and Brown, P. Detailed analysis of eDonkey transfers on ADSL. In Proc. 2nd Conference on Next Generation Internet Design and Engineering (NGI '06) (2006).
[15] Rabinovich, M., and Spatscheck, O. Web Caching and Replication. Addison-Wesley Professional, 2001.
[16] Sandvine Inc. 2009 global broadband phenomena. http://www.sandvine.com/news/global_broadband_trends.asp, 2009.
[17] Schulze, H., and Mochalski, K. Internet study 2008/2009. http://www.ipoque.com/resources/internet-studies/ (need to register), 2009.
[18] Seedorf, J., and Burger, E. W. Application-layer traffic optimization (ALTO) problem statement. RFC 5693, Oct 2009.
[19] Stutzbach, D., and Rejaie, R. Understanding churn in peer-to-peer networks. In Proc. ACM Internet Measurement Conference (2006).
[20] Wierzbicki, A., Leibowitz, N., Ripeanu, M., and Woźniak, R. Cache Replacement Policies Revisited: The Case of P2P Traffic. In Cluster Computing and the Grid (2004), pp. 182–189.
[21] Xie, H., Yang, Y. R., Krishnamurthy, A., Liu, Y. G., and Silberschatz, A. P4P: Provider portal for applications. In Proc. ACM SIGCOMM (2008), pp. 351–362.
[22] Zink, M., Kyoungwon, S., Yu, G., and Kurose, J. Watch Global, Cache Local: YouTube Network Traffic at a Campus Network. In Proc. SPIE (2008), vol. 6818.
