CSD Final Report
Some learning-based caching algorithms already exist, such as LeCaR, which has access to
two simple policies, LRU and LFU. LeCaR uses regret minimization, a machine learning
technique, to dynamically select one of these policies on each cache miss.
However, LeCaR has been shown to underperform state-of-the-art algorithms
such as ARC and LIRS on many production workloads.
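The regret-minimization idea behind LeCaR can be sketched as a pair of multiplicative weights over the two policies: when an item that one policy evicted is requested again, that policy is "blamed" and its weight is shrunk. The class name, the update rule details, and the default learning rate below are illustrative assumptions, not LeCaR's exact formulation.

```python
import random

class PolicySelector:
    """Illustrative sketch of regret-minimization policy selection
    over two eviction policies, LRU and LFU (LeCaR-style)."""

    def __init__(self, learning_rate=0.05):
        self.lr = learning_rate
        # Start with no preference between the two policies.
        self.weights = {"LRU": 0.5, "LFU": 0.5}

    def choose(self):
        # On a cache miss, sample a policy in proportion to its weight.
        return "LRU" if random.random() < self.weights["LRU"] else "LFU"

    def penalize(self, policy):
        # An item evicted by `policy` was re-requested: shrink that
        # policy's weight multiplicatively, then renormalize so the
        # weights remain a probability distribution.
        self.weights[policy] *= (1.0 - self.lr)
        total = sum(self.weights.values())
        for name in self.weights:
            self.weights[name] /= total
```

Over time, repeated penalties shift probability mass toward whichever policy makes fewer regretted evictions on the current workload.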
In this project, an adaptive learning-based cache replacement algorithm has been
designed with the goal of outperforming the state of the art. Simulation
experiments were conducted on industry-level workloads, and the findings are
presented comprehensively below. In addition, a mirror of wikipedia.org was created by
running a caching Nginx proxy in front of it, which helps reduce latency and keeps wiki
content available even during DDoS attacks.
Please refer to the GitHub repo for the full implementation of the algorithm.
In this project, a learning-based caching algorithm has been designed and used to
cache wikipedia.org using a virtual private server (VPS).
The high-level design of the system is given below:
Wikipedia's infrastructure provides regular database dumps and static HTML archives to
the public, and has permissive licensing that allows for rehosting with modification. In
this project, these publicly available dumps have been used to host Wikipedia mirrors.
An Nginx caching proxy was set up in front of wikipedia.org to serve clients. If a
user's request is present in the cache server, it is served directly by the proxy;
otherwise, the request is forwarded to the original Wikipedia server. This reduces
latency and improves the user experience. It also means that if a DDoS attack hits the
original Wikipedia servers, user requests can still be served from the proxy as long as
the content is present in the cache.
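A minimal sketch of such a caching-proxy configuration is shown below. The cache path, zone name, sizes, and timings are illustrative assumptions, not the project's exact configuration; `proxy_cache_use_stale` is what lets the proxy keep serving cached pages when the origin is unreachable.

```nginx
# Illustrative Nginx caching-proxy sketch; paths and sizes are assumptions.
proxy_cache_path /var/cache/nginx levels=1:2 keys_zone=wiki_cache:10m
                 max_size=1g inactive=60m use_temp_path=off;

server {
    listen 80;

    location / {
        proxy_cache wiki_cache;
        proxy_pass https://en.wikipedia.org;
        proxy_set_header Host en.wikipedia.org;

        # Serve stale cached content if the origin errors out or
        # times out, e.g. during a DDoS on the upstream.
        proxy_cache_use_stale error timeout http_500 http_502 http_503;
        proxy_cache_valid 200 10m;

        # Expose HIT/MISS status for debugging cache behaviour.
        add_header X-Cache-Status $upstream_cache_status;
    }
}
```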
Note: Due to the unavailability of cloud platforms, the proxy server in this project was
set up in a virtual machine (VM) on a local machine.
When the learning rate was initialized at 0.05 and the cache size was set to 7 and 10:
When the learning rate was initialized at 0.05 and the cache size was set to 15 and 20:
When the learning rate was initialized at 0.1 and the cache size was set to 7 and 10:
When the learning rate was initialized at 0.1 and the cache size was set to 15 and 20:
When the learning rate was initialized at 0.3 and the cache size was set to 7 and 10:
When the learning rate was initialized at 0.3 and the cache size was set to 15 and 20:
These are the results obtained for the MSR Cambridge workload. The learning-based
algorithm was compared against several state-of-the-art algorithms at varying learning
rates. As can be seen, the learning-based caching algorithm performed better than many
standard algorithms.
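The hit-ratio comparison in such simulations boils down to replaying a request trace against each policy and counting hits. As a hedged sketch (not the project's simulator), the baseline LRU measurement could look like this:

```python
from collections import OrderedDict

def lru_hit_ratio(trace, cache_size):
    """Replay a request trace against a simple LRU cache and
    return the fraction of requests served from the cache.
    Illustrative baseline only; keys in `trace` are item IDs."""
    cache = OrderedDict()
    hits = 0
    for key in trace:
        if key in cache:
            hits += 1
            cache.move_to_end(key)      # mark as most recently used
        else:
            if len(cache) >= cache_size:
                cache.popitem(last=False)  # evict least recently used
            cache[key] = None
    return hits / len(trace)
```

The learning-based policy would be measured the same way, replaying the identical trace so the hit ratios are directly comparable.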
Note: All these results are also available in the GitHub repo.