Experiments of Large File Caching and Comparisons of Caching Algorithms
conducted based on simulation of data traces collected through experiments. The total number of requests for these files (each at least 1 MB) was 130,866, and the total size of the files was 380.81 GBytes. There were 62,530 unique requests and their total size was 205.86 GBytes. The files in the trace file were collected between January 05 and January 14, 2004 inclusive.

Each algorithm is further explained as follows.

LRU – Least Recently Used. LRU removes the file that was least recently accessed. In this method, time is used to determine the file to be removed. More than one file may be evicted if the incoming file is larger than the least recently used file being replaced.

LRU Size – Least Recently Used with Size. This method determines the file to be removed by looking at both how recently a file was used and its size. The file to be removed must have a size greater than or equal to the size of the incoming file to be cached. The main aim of the algorithm is to replace only one file if the cache is full.

LRU Threshold – LRU with Threshold. In this algorithm a value known as a threshold and the time of the last access to the file are used to choose a file to replace from the cache. If a new file to be cached cannot fit in the current cache, the file that was accessed least recently and whose size is less than the threshold value is evicted. Files that are larger than the threshold value are not replaced from the cache. The idea is to keep larger files in cache to save bandwidth.

LFU – Least Frequently Used. This algorithm removes the file that has been accessed the least. Frequency is the key parameter that is used to determine the object to be replaced.

LFU Size – Least Frequently Used with Size. This algorithm replaces the file that has been accessed the least. Frequency is the key parameter that is used to determine the file to be replaced. The file to be removed must have a size greater than or equal to the size of the incoming file to be cached. Similar to LRU Size, the main aim of the algorithm is to replace just one file if the cache is full.

LFU Threshold – LFU with Threshold. This algorithm removes the file that has been accessed the least and whose size is less than or equal to the threshold value. The idea is similar to LRU Threshold.

Greedy Dual Size Frequency [13]. The key parameters in this method are the file size, the number of references (frequency) to the file, and the cost. The cost of downloading the file from the remote server is equal to the average bandwidth usage of the file. Each file that is cached is assigned a key calculated using the above parameters. If the cache is full, the file that has the smallest key will be evicted. The key is calculated using the formula shown below:

Key(f) = Lvalue + freq(f) * bandwidth(f) / size(f)    (A)

The value of Lvalue starts from 0 and is updated every time a file is evicted. The new value of Lvalue will be the key value of the removed file. If a request is a miss, the file will be fetched from the remote server. If the free space in the cache is smaller than the size of the incoming file, one or more files that have the minimum key will be removed. The file will then be cached and:
- the frequency count freq(f) is set to 1;
- the key is recalculated from formula (A);
- cache used = cache used + size(f).
If a request is a hit:
- the file frequency is increased by 1;
- the key is recalculated from formula (A);
- the cache space used does not change.

Static Cache. The static cache algorithm is a simple algorithm that caches all the popular requests that were made on the current day. The cache is deleted and reset at the end of each day.

Static Algorithm
  Gather all the requests for the day
  For each requested file {
    Set value = #references / size of file
  }
  Sort these files in descending order of value
  Populate the cache from the top of the sorted list

Largest Size. This algorithm determines the victim file by removing the largest file from the cache. Size is the key parameter in this method. The rationale here is that more disk space will be available by replacing the largest file.

Table 2 summarizes the implemented caching algorithms and the rationale each one uses to determine the file for replacement.

Table 2. Summary of the Implemented Algorithms

Algorithm            Replacement Policy
LRU                  Least recently accessed first
LRU Size             LRU and file size in a specified range with respect to incoming file size
LRU Threshold        LRU and file size less than a certain threshold
LFU                  Least frequently accessed first
LFU Size             LFU and file size in a specified range with respect to incoming object
LFU Threshold        LFU and file size less than a certain threshold
GreedyDual SizeFreq  Least value first according to Key(f) = L + (F*B)/S
LargestSize          Largest file first
Static               No files are evicted, the cache is reset at the end of the day, based on yesterday's request value (frequency/# of files)

5. Results and Analysis

Hit Ratios. Figure 3 shows the hit ratios for each of the implemented algorithms.
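
To make formula (A) and the hit ratio measurement concrete, the following Python sketch replays a small request trace through a Greedy Dual Size Frequency cache and reports the hit ratio (hits divided by requests) and the byte hit ratio (bytes served from the cache divided by bytes requested). It is only an illustration under stated assumptions, not the simulator used in the experiments: the names GDSFCache, Entry, replay and TRACE, the made-up trace, the constant bandwidth values, and the rule that a file larger than the whole cache is never cached are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Entry:
    size: int          # file size (any consistent unit)
    bandwidth: float   # cost term of formula (A)
    freq: int = 1      # frequency count, set to 1 when the file is cached
    key: float = 0.0


class GDSFCache:
    """Illustrative cache that evicts the file with the smallest key."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.used = 0
        self.l_value = 0.0   # Lvalue starts from 0
        self.entries = {}    # file name -> Entry

    def _key(self, e):
        # Formula (A): Key(f) = Lvalue + freq(f) * bandwidth(f) / size(f)
        return self.l_value + e.freq * e.bandwidth / e.size

    def request(self, name, size, bandwidth):
        """Process one request; return True on a hit, False on a miss."""
        entry = self.entries.get(name)
        if entry is not None:
            # Hit: bump the frequency and recalculate the key.
            entry.freq += 1
            entry.key = self._key(entry)
            return True
        if size > self.capacity:
            return False  # assumption: oversized files are never cached
        # Miss: evict minimum-key files until the incoming file fits.
        while self.capacity - self.used < size:
            victim = min(self.entries, key=lambda n: self.entries[n].key)
            self.l_value = self.entries[victim].key  # Lvalue := evicted key
            self.used -= self.entries[victim].size
            del self.entries[victim]
        entry = Entry(size=size, bandwidth=bandwidth)
        entry.key = self._key(entry)
        self.entries[name] = entry
        self.used += size
        return False


def replay(trace, capacity):
    """Return (hit ratio, byte hit ratio) for a trace of (name, size, bandwidth)."""
    cache = GDSFCache(capacity)
    hits = hit_bytes = total_bytes = 0
    for name, size, bandwidth in trace:
        total_bytes += size
        if cache.request(name, size, bandwidth):
            hits += 1
            hit_bytes += size
    return hits / len(trace), hit_bytes / total_bytes


# Made-up trace: file "a" is small and popular, "b" and "c" are larger.
TRACE = [("a", 2, 1.0), ("b", 4, 1.0), ("a", 2, 1.0), ("a", 2, 1.0),
         ("c", 6, 1.0), ("a", 2, 1.0), ("b", 4, 1.0)]

if __name__ == "__main__":
    hit_ratio, byte_hit_ratio = replay(TRACE, capacity=10)
    print(f"hit ratio {hit_ratio:.3f}, byte hit ratio {byte_hit_ratio:.3f}")
```

With this trace and a capacity of 10, three of the seven requests are hits (all for the small file "a"), so the hit ratio is 3/7 while the byte hit ratio is only 6/22. This shows how the two metrics can diverge when the popular files are small.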
Figure 3. Hit Ratio (Cache Sizes vs. Hit Ratios; x-axis: Cache Sizes (GBytes); y-axis: Hit Ratios; series include LRU, LFU, and GreedyDual)
Figure 5. Bytes Saved (Cache Sizes vs. Saved Bytes; x-axis: Cache Sizes (GBytes); y-axis: Saved Bytes (Bytes); series include LRU, LFU, and GreedyDual)
[Plots whose captions did not survive extraction: byte hit ratio (BHR) series for LRUSize, LRUThresh, LFUSize, LFUThresh, LargestSize, and Static; and bandwidth (Kbits/sec) against average time (30 minutes) with a NoCache series, annotated "bandwidth saved 64GB".]
collected in our experiments. GreedyDualSize-Freq performed better than LFU, because the GreedyDualSize-Freq method improves both the byte hit ratio and the hit ratio relative to the LFU results.

LFU can be used if the user is only interested in the popular files that are used frequently. If the hit ratio is the main concern, then LRUSize could be the appropriate algorithm to use. For byte hit ratio and bandwidth saved, LRU generates the highest values.

We are planning to repeat the experimental work as the nature of our network traffic has changed significantly since 2004. When the experiments were conducted in January 2004, the university had 30 Mbps of Internet bandwidth. For the 2006/2007 academic year the bandwidth was expanded to 60 Mbps in September and by April it had been doubled again to 120 Mbps [24]. It is expected that more bandwidth has been consumed for identical large files.

References
[1] S. Annapureddy, M. J. Freedman, and D. Mazières, “Shark: Scaling File Servers via Cooperative Caching”, Proc. of 2nd USENIX/ACM Symp. on Networked Sys Design and Implementation, May 2005.
[2] S. Birrer, D. Lu, F. E. Bustamante, Y. Qiao, and P. Dinda, “FatNemo: Building a resilient multi-source multicast fat-tree”, Proc. of 9th Int’l Workshop on Web Content Caching and Distribution, October 2004.
[3] M. Castro, P. Druschel, A. Kermarrec, A. Nandi, A. Rowstron, and A. Singh, “SplitStream: High-bandwidth content distribution in a cooperative environment”, Proc. of SOSP, Oct 2003.
[4] L. Cherkasova and J. Lee, “FastReplica: Efficient Large File Distribution within Content Delivery Networks”, Proc. of the 4th USITS, March 2003.
[5] “Open Chord Specification”, http://open-chord.sourceforge.net.
[6] Y. Chu, S. G. Rao, S. Seshan, and H. Zhang, “A case for end system Multicast”, IEEE J. on Selected Areas in Communication, 2002.
[7] Cisco Systems, “Network Caching Technologies”, http://www.cisco.com/univercd/cc/td/doc/cisintwk/ito_doc/net_cach.htm.
[8] B. Cohen, BitTorrent, 2003, http://bitconjurer.org/BitTorrent.
[9] “The Coral Content Distribution Network”, http://www.coralcdn.org/.
[10] “Gnutella Protocol Definition”, http://rfcgnutella.sourceforge.net/.
[11] Internet Caching Resource Center, http://www.caching.com/caching101.htm
[12] Internet Systems Consortium Inc., “ISC Domain Survey: Number of Internet Hosts”, 2004, http://www.isc.org/index.pl?/ops/ds/host-count history.php.
[13] S. Jin and A. Bestavros, “Popularity-aware GreedyDual-Size Web Proxy Caching Algorithms”, Proc. of ICDCS, 2000.
[14] L. Karsten, K. Sven, “Open-CHORD: Distributed and Mobile Systems Group”, http://www.lspi.wiai.unibamberg.de/dmsg/software/open_chord/.
[15] Kazza's Architecture, “How a Kazza’s client finds a song”, http://computer.howstuffworks.
[16] B. Kim, K. Kim, “Keyword Search in DHT-Based Peer-to-Peer Networks”, Proc. of 7th Int’l Conf. on Algorithms and Arch. for Parallel Processing, 2007, pp. 338-347.
[17] D. Kostić, A. Rodriguez, J. Albrecht, and A. Vahdat, “Bullet: high bandwidth data dissemination using an overlay mesh”, Proc. of 19th ACM SOSP, 2003.
[18] T. Leighton, Akamai Technologies Inc., “The Challenges of Delivering Content on the Internet”, Keynote Speech, IASTED Conf. on Parallel and Distributed Systems, 2002.
[19] K. Park and V. S. Pai, “Scale and Performance in the CoBlitz Large-File Distribution Service”, Proc. of the 3rd Symp. on Networked Sys Design and Implementation, 2006.
[20] M. Rabinovich and O. Spatscheck, Web Caching and Replication, Addison Wesley, 2002.
[21] R. van Renesse, K. Birman, and W. Vogels, “Astrolabe: A robust and scalable technology for distributed system monitoring, management and data mining”, ACM Transactions on Computer Systems, May 2003.
[22] S. van Rompaey, K. Spaey, and C. Blondia, “Bandwidth versus Storage Trade-off in a Content Distribution Network and a Single Server System”, Proc. of the 7th Int’l Conf. on Telecommunications, 2003, pp. 315-320.
[23] W. Shi and Y. Mao, “Performance Evaluation of Peer-to-Peer Web Caching Systems”, Journal of Systems and Software, May 2005, pp. 714-726.
[24] J. Stewart, Personal communication (email) with C.-H. Lung, May 29th 2007.
[25] D. C. Verma, “Overview of Content Distribution Networks” and “Site Design and Scalability Issues in Content Distribution Networks” in Content Distribution Networks, New York, NY: John Wiley & Sons, Inc., 2002, pp. 1-61.
[26] D. C. Verma, “CDN Data Sharing Schemes” in Content Distribution Networks, John Wiley & Sons, Inc., 2002.
[27] “Web Caching: Making the Most of Your Internet Connection”, http://www.web-cache.com/.
[28] B. Whitehead, A Scalable Anycast Technology for Caching Content Distribution Networks, Project Report, Dept of Sys and Comp Eng, Carleton Univ., Canada, 2004.
[29] K.-H. Yang and J.-M. Ho, “Proof: A Novel DHT-Based Peer-to-Peer Search Engine”, IEICE Trans. on Communications, E90-B4, 2007, pp. 817-825.

Acknowledgements:
We are grateful to John Stewart at Computer Computing Services, and Narendra Mehta and Dave Sword in the Systems and Computer Engineering Department for their trust, support, and encouragement.