A Methodology For Web Cache Deception Vulnerability Discovery

Filippo Berto (a), Francesco Minetti (b), Claudio A. Ardagna (c) and Marco Anisetti (d)
Department of Computer Science, University of Milan, Milan, Italy
{firstname.lastname}@unimi.it, [email protected]

Keywords: Web Cache Deception, Web Cache, Web Security.

Abstract: In recent years, the use of caching techniques in web applications has increased significantly, in line with their
expanding user base. The logic of web caches is closely tied to the application logic, and misconfigurations
can lead to security risks, including unauthorized access to private information and session hijacking. In
this study, we examine Web Cache Deception as a technique for attacking web applications. We develop a
solution for discovering vulnerabilities that expands upon and encompasses prior research in the field. We
experimentally evaluate the attack’s efficacy against real-world targets and present a new
attack vector via web-client-based email services.

1 INTRODUCTION

Content distribution is a common problem in modern web applications, as rapidly increasing numbers of users access the same resources. In recent years, many software products have integrated web caching technologies and services to enhance the performance of their infrastructure. The increasing popularity of these techniques has prompted researchers to investigate the associated security aspects, leading to the identification of a new type of attack: Web Cache Deception (WCD) (Gil, 2017; Mirheidari et al., 2022; Mirheidari et al., 2020; Nguyen et al., 2019a)¹. These attacks exploit vulnerabilities in caching services to exfiltrate information, bypassing access control features and obtaining stored data intended for other users (Mirheidari et al., 2022). Researchers have identified several alternative attacks, revealing a variety of vulnerabilities in enterprise content distribution systems (Mirheidari et al., 2020). Although some solutions have been developed to automatically scan for WCD vulnerabilities, they do not consider all the possible attack vectors. The contribution of this paper is threefold: i) a novel solution for detecting WCD vulnerabilities, capable of covering a wide range of important novel cases affecting modern applications; ii) a new attack vector using web mail clients; iii) an experimental evaluation of the effectiveness of our solution in real scenarios.

1.1 Motivation and Goals

Existing tools, including the one proposed in (Mirheidari et al., 2022), do not cover a number of relevant cases and are not automated. In most cases they do not cover scenarios where there are no caching-related HTTP headers, or where there are responses with different content for the same cached resource, such as when the vulnerable application uses Cloudflare email obfuscation. In addition, they do not cover specific cases of certain software product versions, such as the one described in advisory CVE-2020-15151². Also not covered are the rare cases in which the web application caches all the resources that have an HTTP 200 response code, as well as cases in which the classic payloads cannot be used and a simple unique query string must be used instead. Our solution aims to cover all the above cases while providing a certain degree of automation. In addition, we propose a novel attack vector for WCD vulnerabilities that exploits web mail clients automatically loading web content. To the best of our knowledge, this approach has never been discussed before in literature or in public domain resources.

a https://orcid.org/0000-0002-2720-608X
b https://orcid.org/0009-0007-1272-956X
c https://orcid.org/0000-0001-7426-4795
d https://orcid.org/0000-0002-5438-9467
1 Practical Web Cache Attacks: https://portswigger.net/research/practical-web-cache-poisoning
2 https://nvd.nist.gov/vuln/detail/CVE-2020-15151
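The cases listed in Section 1.1 differ mainly in how a WCD payload is attached to a URL that returns private content. The following is a minimal Python sketch of that step, assuming a tool that tries a few payload families per crawled URL. The helper `wcd_candidates` is hypothetical: it builds a classic non-existent static file, encoded-delimiter variants in the spirit of path confusion (Mirheidari et al., 2020), and a plain unique query string. The extension and delimiter lists are illustrative assumptions only.

```python
import secrets
from typing import List, Optional
from urllib.parse import urlsplit, urlunsplit

# Illustrative payload families (assumed values, not a published payload list).
STATIC_EXTENSIONS = ("css", "js", "jpg")
# Encoded delimiters that some origin servers treat as the end of the path while a
# cache keeps parsing, in the spirit of path confusion. Illustrative, not exhaustive.
PATH_CONFUSION_DELIMITERS = ("%0A", "%3B", "%23", "%3F")


def wcd_candidates(url: str, token: Optional[str] = None) -> List[str]:
    """Build candidate WCD URLs for one page that returns private content."""
    # A random, unique name makes it unlikely that the static file really exists
    # or that an earlier cached copy is returned.
    token = token or secrets.token_hex(4)
    scheme, netloc, path, query, _fragment = urlsplit(url)
    path = path.rstrip("/")
    candidates: List[str] = []

    # Classic payload: a non-existent static file appended as a new path segment.
    for ext in STATIC_EXTENSIONS:
        candidates.append(urlunsplit((scheme, netloc, f"{path}/{token}.{ext}", query, "")))

    # Path confusion: an encoded delimiter glued to the path before the fake file.
    for delim in PATH_CONFUSION_DELIMITERS:
        candidates.append(urlunsplit((scheme, netloc, f"{path}{delim}{token}.css", query, "")))

    # Unique query string only (assumed case: a cache that stores every 200
    # response and keys it on path plus query).
    unique_query = f"{query}&{token}" if query else token
    candidates.append(urlunsplit((scheme, netloc, path or "/", unique_query, "")))
    return candidates
```

For example, `wcd_candidates("https://www.vulnerable.com/account/profile", token="abc123")` yields, among others, `https://www.vulnerable.com/account/profile/abc123.css`, `https://www.vulnerable.com/account/profile%3Babc123.css` and `https://www.vulnerable.com/account/profile?abc123`.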

1.2 Paper Structure

The rest of the paper is organized as follows: Section 2 analyses the literature related to web cache vulnerabilities and exploitation techniques, later focusing on WCDs. Section 3 describes the background of WCD, explaining how web caches work and how their misconfiguration can be exploited, and the motivation of this paper. Section 4 presents our proposed methodology, describing how our solution works and the new edge cases being covered. Section 5 describes the cases covered by our solution and introduces a novel attack vector. Section 6 contains our experimental validation of the developed solutions against real web services. Finally, in Section 7 we discuss the results obtained and report our conclusions.

2 RELATED WORK

The security of web caches is a well-known problem that still lacks a comprehensive solution. The difficulty lies in finding the right middle ground between caching all content and caching none; since this choice is strictly linked to the application logic, securing web caches is generally a hard task. Initial works on the security of web caches date back to the end of the last century, when the rapid growth in popularity of the Internet required systems that could handle large-scale distribution of content (Chankhunthod et al., 1996; Smith et al., 1999).

The configuration of web caching services varies according to the requirements of the target web application, as it manages the caching logic for its content. Correctly configuring web caches is error-prone: several considerations must be taken into account depending on the application, such as the type of content returned by the web application, the resource path, and the HTTP code returned. Attacks against web caches usually fall into two classes:
1. Cache poisoning, where the attacker sends a crafted request to a vulnerable web cache, forcing it to store a particular version of the server’s response. Eventually, a user who sends a request that matches the attacker’s caching keys will receive the cached content. This type of attack can be used as a form of Denial of Service (DoS), preventing the user from retrieving the correct content, or even as misinformation, providing the user with outdated information (Ghaznavi et al., 2021; Nguyen et al., 2019b; Nguyen et al., 2019a).
2. Cache deception, covered in this paper, which occurs when web caches are incorrectly configured, allowing an attacker to deceive users into storing private content within the cache. The attacker can then retrieve the data using a matching caching key.
In both cases, the misconfiguration of the web cache allows the attacker to generate false positive matches with other users’ caching keys.

WCD’s literature is still in its infancy, with only a few related publications available. Omer Gil was the first to describe the attack, showing how misconfigured web caches and Content Distribution Networks (CDNs) may incorrectly store users’ response data, mistaking it for static files such as stylesheets and scripts (Gil, 2017). In this scenario the attacker can then send a key-matching request, retrieving the cached response, which may contain sensitive information or session data. Mirheidari et al. introduced a WCD search methodology focusing on markers in HTTP responses (Mirheidari et al., 2020). In the same article, the authors proposed innovative WCD payloads containing special characters that stop the origin application server from parsing the rest of the path, while the intermediary nodes parse the path entirely. These payloads have broadened the spectrum of possible WCD scenarios. These techniques have been named path confusion, as they lead the origin application server to believe that the requested resource is the correct one, while the intermediary nodes carry out the complete parsing of the path and conclude that a static resource has been requested and can therefore be saved locally.

Mirheidari et al. also implemented a WCD detection tool based on the presence and semantics of the HTTP caching headers. They also conducted the largest WCD detection experiment. To do so, they developed an automation tool that does not require a manual registration phase on the website to be tested. The tool relies on the assumption that if a web application mistakenly saves non-authenticated resources in the web cache, it will probably do the same with authenticated resources. This gave them a solution well suited to large-scale experiments (Mirheidari et al., 2022). Nguyen et al. conducted a large-scale exploration of the security of commonly used cache systems and identified many vulnerabilities that can be attributed to misconfiguration, misinterpretation of standards, and bypassing of security features (Nguyen et al., 2019a).

Several solutions for the security of web caches and CDNs have been proposed in the literature. In (Jabiyev et al., 2021), Jabiyev et al. proposed a fuzzing-based approach to cache vulnerability discovery. Anomaly detection techniques have been proposed to counteract crafted requests (Yang et al., 2022; Ghaznavi et al., 2021; Zolfaghari et al., 2020). Others have proposed automated techniques for identifying cache poisoning vulnerabilities (Hildebrand, 2021) or substituted standard HTTP-based caches with ones based on custom network protocols (Lin et al., 2022).

Different network stacks have implemented various approaches to the security of CDNs. For instance, networks based on the Information Centric Networking (ICN) paradigm require producers to sign all content, which mitigates the risk of cache poisoning. Additionally, automated verification solutions have been implemented to verify non-functional properties, such as security and performance (Anisetti et al., 2021; Anisetti et al., 2022). Similar solutions could be applied transparently to traditional web cache services, reducing the risk of misconfiguration.

Figure 1: Web caching mechanism.
Figure 2: Web cache deception mechanism.

3 BACKGROUND

In this section we summarize the main concepts of web caching and web cache deception vulnerabilities.

3.1 Web Caching

Web caching refers to the process through which HTTP responses are saved by intermediary nodes of a multi-level web infrastructure and then served by them when needed. This process optimizes HTTP traffic, reducing latency and network usage and improving performance. Figure 1 represents the behavior of a generic web caching mechanism.

In steps 1 and 2 a client requests a resource from a server using an HTTP GET request. The request travels from the client to the originating application server, passing through an intermediary node that provides the caching service. Subsequently, the origin application server replies with an HTTP 200 response, which travels back to the intermediary node. The intermediary node saves a copy of the response locally and forwards the response to the client. At a later time, if a client requests the same resource, it is returned directly by the intermediary node, eliminating useless traffic between the intermediary node and the originating application server. In a real environment, the intermediary is an edge server of a CDN or a reverse proxy at the edge of a Demilitarized Zone (DMZ). HTTP response codes and headers may differ from the standard ones shown in Figure 1. Resources saved in web caches are identified through configuration variables called cache keys, whose values are generated from various parts of the HTTP requests and responses related to a given resource. Depending on the deployment configuration, the task of continuously managing the key-handling mechanism falls either to the organization’s or to the CDN provider’s system administrators, as complete automation often leads to false positives.

3.2 Web Cache Deception

Web cache deceptions are vulnerabilities that arise from an incorrect configuration of the web caching mechanisms. Figure 2 shows a diagram of the WCD attack mechanism.

In step 1 the attacker tricks the victim user into sending a request to a crafted URL. This can be achieved through Cross Site Scripting (XSS) (Cui et al., 2020; Liu et al., 2019), phishing (Barron et al., 2021; Gupta et al., 2016) or other common attack vectors. The URL consists of a first part, indicated in green in Figure 2, which points to a resource that exists on the application server, concatenated to a second part, the WCD payload indicated in red, which points to a non-existent static resource on the server.

In step 2, the victim, authenticated on the vulnerable site, opens the malicious link and sends an HTTP GET request with session cookies to the origin application server. In step 3, after receiving the HTTP request, the server replies with an HTTP 200 response containing the victim user’s private information. The response travels back to the intermediate node which, due to its misconfiguration, treats the end of the URL path as a static file and saves the response locally. Finally, the content is forwarded to the client. As the last step, the attacker requests the same resource with the same cache keys, fetching Personally
Identifiable Information (PII) of the victim user.

WCD vulnerabilities have the following characteristics:
• they require only a simple HTTP GET request with session cookies. Other web application vulnerabilities usually have stricter requirements, e.g. the Reflected XSS vulnerabilities described in (Cui et al., 2020; Shrivastava et al., 2016), where the response must also be interpreted by the browser;
• they are widespread, as many web infrastructures today use one or more web caching mechanisms internally;
• their impact varies according to the type of content mistakenly saved in web caches;
• they have a finite attack surface, given by the combination of the set of resources returned by the application server that contain private user information with the set of WCD payloads.
Given the above characteristics, the process of searching for these vulnerabilities can be partially automated.

4 METHODOLOGY

In this section we describe our scanning methodology, which is subdivided into 3 phases.

Registration. In the first phase, we register a new user on the target application. Normally this step requires manual intervention by the researcher, who enters specific markers during registration that the program will later search for. The researcher then logs onto the target website and copies the session cookies to the scanning tool using the provided browser extension.

Crawling. During the second phase, our solution performs an authenticated recursive crawl of the domain using the provided cookies and a headless browser. We found this to provide the most effective results, even with single page applications. The tool saves all the links it finds within anchor elements and forms, creating a representation of the application’s attack surface. Algorithm 1 describes the steps of the crawling phase.

def getPageLinks(url, depth: int = 0)
    if isInLinkList(url) and isAmplitudeAllowed(url) then
        save(url);
        if depth < MAX_DEPTH and totalLinks < MAX_LINKS then
            hrefs ← getAllAnchorHref();
            actions ← getAllFormAction();
            foreach href ∈ hrefs do
                getPageLinks(href, depth+1);
            end
            foreach action ∈ actions do
                getPageLinks(action, depth+1);
            end
        end
    end
Algorithm 1: Crawling phase algorithm.

Detection. The third and final phase focuses on the detection of WCD vulnerabilities using the collected URLs. For each collected link and for each WCD payload, the tool performs the following actions:
• send an HTTP GET request with session cookies to the given link concatenated with the WCD payload;
• send the same request again, but without the session cookies;
• finally, check whether the response to the unauthenticated request contains cookies, markers or Cross Site Request Forgery (CSRF) tokens of the victim user and, if so, warn the user that a possible WCD has been found.
Algorithm 2 describes the steps of the detection phase.

def detection(payloadList)
    foreach p ∈ payloadList do
        authResponse ← sendAuthRequest(p);
        unauthResponse ← sendUnauthRequest(p);
        if isAuthContent(unauthResponse) then
            possible WCD found
        end
    end
Algorithm 2: Detection phase algorithm.

We note that, although not implemented in our solution, the first phase could also be fully automated by programmatically recognizing sign-up and sign-in forms and inserting the appropriate tokens and credentials. We also note that some sites require two-factor authentication or obstruct robots and scrapers with detection and prevention techniques, such as captchas, often placed precisely on the sign-in and sign-up forms. It has recently been demonstrated that these anti-bot puzzles can be bypassed in an automated way with the help of neural networks (Mirheidari et al., 2022; Ma et al., 2020).
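Algorithms 1 and 2 leave browser automation and HTTP plumbing abstract. The following minimal Python sketch, assuming that link extraction and HTTP fetching are supplied as callables, runs the same control flow end to end against a toy target. The names `crawl`, `detect` and `MisconfiguredSite`, the limit values, and the path-only cache key of the toy cache are all hypothetical. The crawler deduplicates visited URLs and applies a caller-supplied scope predicate; treating these as the meaning of the two conditions in Algorithm 1 is our assumption.

```python
from typing import Callable, Dict, Iterable, List
from urllib.parse import urlsplit

Fetch = Callable[[str, Dict[str, str]], str]  # (url, cookies) -> response body

MAX_DEPTH = 3    # hypothetical crawl limits, in the role of MAX_DEPTH / MAX_LINKS
MAX_LINKS = 500


def crawl(start: str,
          links_of: Callable[[str], Iterable[str]],
          in_scope: Callable[[str], bool]) -> List[str]:
    """Depth-limited recursive crawl shaped like Algorithm 1."""
    found: List[str] = []
    seen = set()

    def visit(url: str, depth: int) -> None:
        if url in seen or not in_scope(url):
            return
        seen.add(url)
        found.append(url)  # save(url)
        if depth < MAX_DEPTH and len(found) < MAX_LINKS:
            # links_of stands in for collecting anchor hrefs and form actions
            for link in links_of(url):
                visit(link, depth + 1)

    visit(start, 0)
    return found


def detect(urls: Iterable[str],
           payloads_for: Callable[[str], Iterable[str]],
           fetch: Fetch,
           cookies: Dict[str, str],
           markers: Iterable[str]) -> List[str]:
    """Detection loop shaped like Algorithm 2: authenticated, then cookieless request."""
    needles = set(markers) | set(cookies.values())
    findings: List[str] = []
    for url in urls:
        for candidate in payloads_for(url):
            fetch(candidate, cookies)           # victim-like request, may populate the cache
            unauth_body = fetch(candidate, {})  # attacker-like request, no session cookies
            if any(n in unauth_body for n in needles):
                findings.append(candidate)      # possible WCD
    return findings


class MisconfiguredSite:
    """Toy target: the cache keys on path only and stores anything ending in .css/.js."""

    SESSION = "s3cr3t-session-value"

    def __init__(self) -> None:
        self.cache: Dict[str, str] = {}

    def origin(self, path: str, cookies: Dict[str, str]) -> str:
        # The toy origin renders the account page for any path under /account.
        if path.startswith("/account"):
            if cookies.get("session") == self.SESSION:
                return "<p>Hello MARKER-42</p>"
            return "<p>Please log in</p>"
        return "<a href='/account'>account</a>"

    def fetch(self, url: str, cookies: Dict[str, str]) -> str:
        path = urlsplit(url).path
        if path in self.cache:
            return self.cache[path]  # cache hit: cookies are ignored
        body = self.origin(path, cookies)
        if path.endswith((".css", ".js")):
            self.cache[path] = body
        return body


if __name__ == "__main__":
    site = MisconfiguredSite()
    graph = {"https://t.example/": ["https://t.example/account", "https://other.example/"],
             "https://t.example/account": ["https://t.example/"]}
    urls = crawl("https://t.example/", lambda u: graph.get(u, []),
                 lambda u: u.startswith("https://t.example/"))
    hits = detect(urls, lambda u: [u.rstrip("/") + "/nonexistent.css"],
                  site.fetch, {"session": MisconfiguredSite.SESSION}, ["MARKER-42"])
    print(hits)  # ['https://t.example/account/nonexistent.css']
```

The toy cache reproduces the misconfiguration of Section 3.2: it keys entries on the path alone and stores any response whose path ends in `.css` or `.js`, so the authenticated request primes the cache and the cookieless request reads the victim’s marker back.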

5 COVERED CASES

Our solution covers the classic cases with and without caching-related HTTP headers, including path confusion techniques. Furthermore, it covers cases in which the response body differs for the same resource saved in the web cache, e.g. in web applications that use Cloudflare email obfuscation.

Our solution also covers a specific case affecting certain versions of the Content Management System (CMS) OpenMage LTS, based on Magento, with advisory CVE-2020-15151. For certain versions of this CMS, the default installation includes a WCD-vulnerable web cache local to the application server. Specifically, with the default configuration, all 404 responses to requests whose path ends with a static extension are saved in the cache. The problem is that these 404 responses contain the CSRF token of the victim user who requested them. As a result, an attacker could steal the victim’s CSRF token and, if the same-site attributes of the cookies allow it, perpetrate a CSRF attack. These versions of the CMS require a cookie called X-Magento-Vary³ with a precise value (the same for all users) to access the web cache. Our tool detects whether the X-Magento-Vary cookie is present in the session cookies, saves its value, and subsequently includes it in the cookies of every HTTP request it makes. This case, like the others mentioned previously, could not have been identified using current tools.

3 Magento’s default caching policies: https://devdocs.magento.com/guides/v2.4/extension-dev-guide/cache/page-caching/public-content.html

5.1 Novel Attack Vector: Web Mail Client

We identified web mail clients as possible vectors for WCD-type attacks. The hypothesis is that web mail clients can, under certain conditions, send HTTP GET requests with session cookies while loading email contents. At the time of writing, we are unaware of any previous discussion of the topic in literature or public domain resources.

The attack is summarized in Figure 3. In step 1, an attacker sends an email with two images in the body to the victim’s email address. The first image has its src attribute set to the URL crafted for the WCD attack, while the second image points to a server owned by the attacker. When the victim user opens the malicious email in a web mail client that does not filter third-party content in the email body, two HTTP GET requests are sent. The first request initiates the WCD attack, and the second notifies the attacker’s server that the WCD attack has started and with which payload value it was performed. At this point the attacker’s server can request the resource erroneously stored in the cache and steal the victim’s private information. It should be noted that throughout this process the interaction of the victim user is almost non-existent, as they only have to view an email, without clicking any link.

Figure 3: Web mail client as attack vector.

We have experimentally demonstrated that, in order to perpetrate the attack just described, 3 conditions must hold:
• the site vulnerable to WCD must have session cookies with the same-site attribute set to none and the secure flag set to true;
• the web mail client must not filter content in the email body that could generate GET requests to third-party sites;
• the victim user must be using the Chrome browser, which as of this writing has not yet implemented state partitioning.
Moreover, we have demonstrated that before the introduction of the same-site cookie attribute in browsers (before mid-2020) it was possible to perpetrate the attack against any site vulnerable to WCD and with any browser. This was demonstrated by testing the attack with a mid-2020 standalone release of the Firefox browser and a popular web mail client that does not filter third-party content in the body of emails. Furthermore, flaws in the filtering of email-body content that can generate HTTP GET requests can be exploited by an attacker to use web mail clients that normally could not serve as an attack vector. This experiment has shown how, with the advent of new types of vulnerabilities, a privacy problem, such as the loading of unfiltered third-party content in the body of emails, can also become a security problem.

6 EXPERIMENTS

This section describes the experiments carried out to validate the efficacy of the methodology, focusing on the detection of WCD vulnerabilities, their classification, and the exploitation of the vulnerable targets. These are followed by additional experiments on the exploitation of web mail clients as vectors for WCD attacks.

6.1 Detecting WCD

The developed tool has been tested on 100 domains, chosen from organizations that explicitly allow security testing, e.g. by providing bug bounty programs. Furthermore, these domains were selected so that they all have private content returned directly in HTML by the application server. Of these 100 domains, 13 were affected by WCD. Figures 4a, 4b and 4c show pie charts representing, respectively, the consequences of the WCD vulnerabilities, the web caching technologies found to be misconfigured, and the HTTP response codes of the resources erroneously saved in the web caches.

As shown in Figure 4a, the impact of the WCDs found varies, ranging from CSRF attacks, allowing changes to the personal information of the victim account, to takeover of the victim account in the worst case. Figure 4b shows that most of the vulnerable web caching software belonged to CDNs (Cloudflare, Akamai, CloudFront), while a smaller part belonged to reverse proxies (Nginx, Magento). Finally, Figure 4c shows that the HTTP response codes of resources erroneously saved in the cache are mainly 200 and 404. In smaller numbers, responses with HTTP response code 410 erroneously saved in the cache were also detected.

A separate analysis was carried out regarding CVE-2020-15151. First, 15 domains using the vulnerable version of OpenMage LTS were identified; these domains were chosen from a different pool than the one used in the previous experiment. In all 15 domains the default configuration was active and all were vulnerable to WCD, allowing exfiltration of the victim’s CSRF token. Of these 15 domains, 5 did not have the same-site session cookie attribute set, allowing an attacker to perpetrate a CSRF attack. The other domains had the session cookies’ same-site attribute set to LAX or STRICT, thus preventing the attack. Figure 4d contains a pie chart representing the consequences of the WCDs found during this analysis.

Figure 4: WCD detection experimental results.
(a) WCD consequences: CSRF (38.36%), PII theft (23.07%), Account takeover (23.07%), Useless (15.4%).
(b) Misconfigured technologies: Cloudflare (38.46%), Akamai (38.46%), CloudFront (7.69%), Nginx (7.69%), Magento local cache (7.69%).
(c) Cached HTTP response codes: 200 (38.46%), 404 (38.46%), 410 (23.07%).
(d) OpenMage LTS WCD: CSRF (33.33%), Useless (66.66%).

The same domains were also tested manually in order to verify the effectiveness of the tool. We identified false negative cases, where the tool could not detect the vulnerability because it was blocked by anti-bot software that checks the JavaScript navigator.webdriver property, thereby identifying the Selenium driver and halting the crawling phase. We also noticed false positive cases in which cookies with unimportant values, such as the domain name or domain URLs, were returned by the application server, misleading the tool into detecting a possible WCD with cookie exfiltration. The issue has been corrected by filtering the returned cookies, keeping only those with more than 16 characters whose initial part differs from any variant of the domain name (such as domain.com, http://domain.com, https://domain.com).

Following the detection of WCD vulnerabilities in authenticated mode, the tool was run on a list of 750 domains in unauthenticated mode. In this execution mode, the tool only checked for the presence of CSRF tokens erroneously stored in the web cache. This experiment is based on the hypothesis that if the tool detects CSRF tokens mistakenly stored in the web cache for a particular anonymous session, the same will most likely happen for authenticated users. Of these 750 domains, only 3 were found to be vulnerable. This lower number of positive cases is mainly due to the fact that the research targeted a specific subset of possible WCD cases. In addition, the tool uses simple regular expressions to match CSRF tokens, so it is likely that some were missed. We have found that for large-scale unauthenticated analysis, an approach based on the analysis of the HTTP caching headers is less computationally expensive and more effective. We note that all the experiments conducted in this study solely targeted web applications of organizations that granted explicit permission for security testing. Additionally, the developed tool generated minimal network overhead compared to a normal browser, preventing accidental flooding of the target, and it is available on our GitHub repository under a Creative Commons license⁴.

4 Source code of the proposed tool: https://github.com/SESARLab/WCD prober

6.1.1 Global Scale Attacks

Various researchers have tackled the issue of exploiting WCDs in a CDN-like environment in a globally scalable way. The limit they encountered, from the attacker’s point of view, was retrieving resources that were initially saved in one specific edge server somewhere in the world, without knowing in advance in which geographical region the victim is located. We found that the attacker can easily overcome this issue by inducing the user to connect to one of their servers, using attacks similar to the ones used for WCD (e.g. XSS or phishing). When the victim user connects to the attacker’s server, their IP address is collected and the victim is redirected to the URL of the WCD attack. The attacker can then identify the victim’s location using IP-to-location services. Alternatively, an attacker could simultaneously send several HTTP requests to each CDN edge server in the victim’s presumed region.

6.2 Discussion

In terms of defense against WCD vulnerabilities, the primary solution relies on a proper configuration of web caching technologies, preventing the caches from storing private information. This requires specific consideration of how the application server handles HTTP requests, even for non-existing endpoints. It is paramount to check both the test and production environments whenever the version or configuration of the application or of the caching technology changes.

Considering web mail clients as vectors for WCD attacks, a substantial level of defense can be achieved by filtering any content in the body of emails that could automatically generate HTTP GET requests to third-party sites, such as images and styling files. The current widespread practice of whitelisting entire (sub)domains could pose a threat if the configuration of the cache technology changes. Finally, some vendors, such as Cloudflare, have implemented software products that mitigate WCD vulnerabilities by performing content type checks on HTTP responses, verifying that the content-type header matches the type declared in the path, if any, before deeming the response suitable for being stored.

7 CONCLUSION

In this paper we presented a novel methodology for detecting WCD vulnerabilities and experimentally evaluated its effectiveness, covering as many WCD cases as possible. Our detection solution demonstrated better reliability for authenticated analyses than for unauthenticated ones. Finally, a novel attack vector for WCDs using web mail clients has been proposed and experimentally verified. We also verified that some privacy-preserving techniques introduced by default in web browsers in mid-2020 have incidentally reduced part of the attack surface of WCDs.

ACKNOWLEDGEMENTS

The work was partially supported by the projects i) MUSA – Multilayered Urban Sustainability Action – project, funded by the European Union – NextGenerationEU, under the National Recovery and Resilience Plan (NRRP) Mission 4 Component 2 Investment Line 1.5: Strengthening of research structures and creation of R&D “innovation ecosystems”, set up of “territorial leaders in R&D” (CUP G43C22001370007, Code ECS00000037); ii) SERICS (PE00000014) under the NRRP MUR program funded by the EU – NextGenerationEU; iii) 1H-HUB and SOV-EDGE-HUB funded by Università degli Studi di Milano – PSR 2021/2022 – GSA – Linea 6; and iv) program “piano sostegno alla ricerca” funded by Università degli Studi di Milano.

REFERENCES

Anisetti, M., Ardagna, C. A., Berto, F., and Damiani, E. (2021). Security Certification Scheme for Content-centric Networks. In 2021 IEEE International Conference on Services Computing (SCC), pages 203–212, Chicago, IL, USA. IEEE.
Anisetti, M., Ardagna, C. A., Berto, F., and Damiani, E. (2022). A Security Certification Scheme for Information-Centric Networks. IEEE Trans. Netw. Serv. Manage., 19(3):2397–2408.
Barron, T., So, J., and Nikiforakis, N. (2021). Click This, Not That: Extending Web Authentication with Deception. In Proceedings of the 2021 ACM Asia Conference on Computer and Communications Security, ASIA CCS ’21, pages 462–474, New York, NY, USA. Association for Computing Machinery.
Chankhunthod, A., Danzig, P. B., Neerdaels, C., Schwartz,

tions Security, CCS ’21, pages 1805–1820, New York, NY, USA. Association for Computing Machinery.
Lin, S., Xin, R., Goel, A., and Yang, X. (2022). InviCloak: An End-to-End Approach to Privacy and Performance in Web Content Distribution. In Proceedings of the 2022 ACM SIGSAC Conference on Computer and Communications Security, CCS ’22, pages 1947–1961, New York, NY, USA. Association for Computing Machinery.
Liu, M., Zhang, B., Chen, W., and Zhang, X. (2019). A Survey of Exploitation and Detection Methods of XSS Vulnerabilities. IEEE Access, 7:182004–182016.
Ma, Y., Zhong, G., Liu, W., Sun, J., and Huang, K. (2020). Neural CAPTCHA networks. Applied Soft Computing, 97:106769.
Mirheidari, S. A., Arshad, S., Onarlioglu, K., Crispo, B., Kirda, E., and Robertson, W. (2020). Cached and Confused: Web Cache Deception in the Wild. In Proceedings of the 29th USENIX Security Symposium (USENIX Security 20), pages 665–682.
Mirheidari, S. A., Golinelli, M., Onarlioglu, K., Kirda, E., and Crispo, B. (2022). Web Cache Deception Escalates! In Proceedings of the 31st USENIX Security Symposium (USENIX Security 22), pages 179–196.
Nguyen, H. V., Iacono, L. L., and Federrath, H. (2019a). Mind the cache: large-scale explorative study of web caching. In Proceedings of the 34th ACM/SIGAPP
M. F., and Worrell, K. J. (1996). A Hierarchical Inter- Symposium on Applied Computing, SAC ’19, pages
net Object Cache. In Proceedings of the 1996 annual 2497–2506, New York, NY, USA. Association for
conference on USENIX Annual Technical Conference, Computing Machinery.
volume 164 of ATEC ’96, page 13, USA. USENIX Nguyen, H. V., Iacono, L. L., and Federrath, H. (2019b).
Association. Your Cache Has Fallen: Cache-Poisoned Denial-of-
Cui, Y., Cui, J., and Hu, J. (2020). A Survey on XSS At- Service Attack. In Proceedings of the 2019 ACM
tack Detection and Prevention in Web Applications. SIGSAC Conference on Computer and Communica-
In Proceedings of the 2020 12th International Confer- tions Security, CCS ’19, pages 1915–1936, New York,
ence on Machine Learning and Computing, ICMLC NY, USA. Association for Computing Machinery.
’20, pages 443–449, New York, NY, USA. Associa- Shrivastava, A., Choudhary, S., and Kumar, A. (2016). XSS
tion for Computing Machinery. vulnerability assessment and prevention in web appli-
Ghaznavi, M., Jalalpour, E., Salahuddin, M. A., Boutaba, cation. In 2016 2nd International Conference on Next
R., Migault, D., and Preda, S. (2021). Content Deliv- Generation Computing Technologies (NGCT), pages
ery Network Security: A Survey. IEEE Communica- 850–853.
tions Surveys & Tutorials, 23(4):2166–2190. Confer- Smith, J., Calvert, K., Murphy, S., Orman, H., and Pe-
ence Name: IEEE Communications Surveys & Tuto- terson, L. (1999). Activating networks: a progress
rials. report. Computer, 32(4):32–41. Conference Name:
Gil, O. (2017). Web Cache Deception Attack. In Proceed- Computer.
ings of Black Hat 2017 US. Yang, L., Moubayed, A., Shami, A., Heidari, P.,
Gupta, S., Singhal, A., and Kapoor, A. (2016). A liter- Boukhtouta, A., Larabi, A., Brunner, R., Preda, S.,
ature survey on social engineering attacks: Phishing and Migault, D. (2022). Multi-Perspective Content
attack. In Proceedings of 2016 International Confer- Delivery Networks Security Framework Using Op-
ence on Computing, Communication and Automation timized Unsupervised Anomaly Detection. IEEE
(ICCCA), pages 537–540. Transactions on Network and Service Management,
19(1):686–705. Conference Name: IEEE Transac-
Hildebrand, M. (2021). Automated Scanning for Web Cache tions on Network and Service Management.
Poisoning Vulnerabilities. PhD thesis, Technische
Universität Dortmund. Zolfaghari, B., Srivastava, G., Roy, S., Nemati, H. R.,
Afghah, F., Koshiba, T., Razi, A., Bibak, K., Mitra,
Jabiyev, B., Sprecher, S., Onarlioglu, K., and Kirda, E. P., and Rai, B. K. (2020). Content Delivery Networks:
(2021). T-Reqs: HTTP Request Smuggling with Dif- State of the Art, Trends, and Future Roadmap. ACM
ferential Fuzzing. In Proceedings of the 2021 ACM Comput. Surv., 53(2):34:1–34:34.
SIGSAC Conference on Computer and Communica-