A Methodology For Web Cache Deception Vulnerability Discovery
Filippo Berto, Francesco Minetti, Claudio A. Ardagna and Marco Anisetti
Department of Computer Science, University of Milan, Milan, Italy
{firstname.lastname}@unimi.it, [email protected]
Abstract: In recent years, the use of caching techniques in web applications has increased significantly, in line with their
expanding user base. The logic of web caches is closely tied to the application logic, and misconfigurations
can lead to security risks, including the unauthorized access of private information and session hijacking. In
this study, we examine Web Cache Deception as a technique for attacking web applications. We develop a
solution for discovering vulnerabilities that expands upon and encompasses prior research in the field. We
conducted an experimental evaluation of the attack’s efficacy against real-world targets, and present a new
attack vector via web-client-based email services.
Figure 1: Web caching mechanism.

[Diagram for Figure 2: a request for https://www.vulnerable.com/non-existent.js (GET /non-existent.js, Host: www.vulnerable.com) is answered with HTTP/1.1 200 OK, Content-type: text/html and <SENSITIVE CONTENT>, first with X-Cache: miss and later with X-Cache: hit.]
Figure 2: Web cache deception mechanism.

navi et al., 2021; Zolfaghari et al., 2020). Others have proposed automated techniques for identifying cache poisoning vulnerabilities (Hildebrand, 2021) or substituted standard HTTP-based caches with ones based on custom network protocols (Lin et al., 2022).

Different network stacks have implemented various approaches to the security of CDNs. For instance, networks based on the Information Centric Networking (ICN) paradigm require producers to sign all content, which mitigates the risk of cache poisoning. Additionally, automated verification solutions have been implemented to verify non-functional properties, such as security and performance (Anisetti et al., 2021; Anisetti et al., 2022). Similar solutions could be applied transparently to traditional web cache services, reducing the risk of misconfiguration.

server. In a real environment, the intermediary is an edge server of a CDN or a reverse proxy at the edge of a Demilitarized zone (DMZ). HTTP response codes and headers may differ from the standard ones shown in Figure 1. Resources saved in web caches are identified through configuration variables called cache keys, whose values are generated from various parts of the HTTP requests and responses related to a given resource. Depending on the deployment configuration, the duty of continuously managing the key-handling mechanism falls to either the organization's or the CDN provider's system administrators, as complete automation often leads to false positives.
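The interaction between cache keys and application logic behind Web Cache Deception can be illustrated with a minimal simulation. This is a sketch under stated assumptions, not the tool described in this paper: the `origin` function, the `EdgeCache` class, and the extension-only caching rule are all hypothetical, chosen to mirror the request for `/non-existent.js` shown in Figure 2.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional, Tuple

Response = Tuple[int, str, str]  # (status code, content type, body)

# Assumed caching rule: anything that "looks" static is stored.
STATIC_EXTENSIONS = (".js", ".css", ".png")


def origin(path: str, session: Optional[str]) -> Response:
    """Hypothetical origin that answers unknown paths with the user's page."""
    if session is None:
        return 200, "text/html", "<LOGIN PAGE>"
    return 200, "text/html", f"<SENSITIVE CONTENT of {session}>"


@dataclass
class EdgeCache:
    """Hypothetical misconfigured cache placed in front of the origin."""
    store: Dict[str, Response] = field(default_factory=dict)

    def get(self, host: str, path: str,
            session: Optional[str] = None) -> Tuple[Response, str]:
        key = host + path  # cache key ignores the session cookie
        if key in self.store:
            return self.store[key], "hit"
        response = origin(path, session)
        # Misconfiguration: the decision looks only at the path extension,
        # not at the content type returned by the origin.
        if path.endswith(STATIC_EXTENSIONS):
            self.store[key] = response
        return response, "miss"


cache = EdgeCache()
# The victim, while logged in, follows the attacker's link.
victim = cache.get("www.vulnerable.com", "/non-existent.js", session="victim")
# The attacker later requests the same URL without any session.
attacker = cache.get("www.vulnerable.com", "/non-existent.js")
```

In this sketch the second request is served from the cache and contains the victim's page, because the key is built from host and path only while the stored body depends on the session.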
<!DOCTYPE html>
<html><body>
<img src="https://www.vulnerable.com/xyz.js">
<img src="https://www.attacker.com/xyz.js">
...
</body></html>

5 COVERED CASES
mail clients that normally could not be used as an attack vector. This experiment has shown how, with the advent of new types of vulnerabilities, a privacy problem, such as filtering third-party content in the body of emails, can also become a security problem.

6 EXPERIMENTS

The developed tool has been tested on 100 domains, chosen from organizations that explicitly allow security testing, e.g. by providing bug bounty programs. Furthermore, these domains were selected so that they all have private content returned directly in HTML by the application server. Of these 100 domains, 13 were affected by WCD. Figures 4a, 4b and 4c show pie charts representing, respectively, the consequences of the WCD vulnerabilities, the web caching technologies found to be misconfigured, and the HTTP response codes of the resources erroneously saved in the web caches.

Figure 4: WCD detection experimental results. Recoverable legend entries of the pie charts:
(a) WCD consequences: PII theft (23.07%), Account takeover (23.07%), Useless (15.4%).
(b) Cloudflare (38.46%), Akamai (38.46%).
(c) Cached HTTP response codes: 410 (23.07%).
(d) OpenMage LTS WCD: CSRF (33.33%), Useless (66.66%).
As shown in Figure 4a, the impact of the WCDs found varies, ranging from CSRF attacks, which allow changes to the personal information of the victim account, to the theft of the victim account in the worst case. Figure 4b shows that most of the web caching software found vulnerable belonged to CDNs (Cloudflare, Akamai, CloudFront), while a smaller part belonged to reverse proxies (Nginx, Magento). Finally, Figure 4c shows that the HTTP response codes of resources erroneously saved in the cache are mainly 200 and 404. A smaller number of responses erroneously saved in the cache carried the HTTP 410 response code.

A separate analysis was carried out regarding CVE-2020-15151. First, 15 domains using the vulnerable version of OpenMage LTS were identified; these domains were chosen from a different pool than those used in the previous experiment. In all 15 domains the default configuration was active and all were vulnerable to WCD, allowing exfiltration of the session token. Of these 15 domains, 5 did not have the same-site session cookie attribute set, allowing an attacker to perpetrate a CSRF attack. The other domains had the session cookies' same-site attribute set to LAX or STRICT, thus preventing the attack. Figure 4d contains a pie chart representing the consequences of the WCDs found during this analysis.

The same domains were also tested manually in order to verify the effectiveness of the tool. We identified false negative cases, where the tool could not detect the vulnerability because it was blocked by anti-bot software that checks the JavaScript navigator.webdriver property, thus identifying the Selenium driver and halting the crawling phase. We also noticed false positive cases in which cookies with unimportant values, such as the domain name or domain URLs, were returned by the application server, misleading the tool into detecting a possible WCD with cookie exfiltration. The issue has been corrected by filtering the returned cookies, keeping only those that have more than 16 characters and whose initial part differs from every variant of the domain name (such as domain.com, http://domain.com, https://domain.com).
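The cookie filter used to suppress these false positives can be sketched as follows. This is only an illustration of the rule stated in the text (more than 16 characters, and an initial part that differs from the domain-name variants); the function names and the case-insensitive comparison are assumptions and do not come from the published tool.

```python
from typing import Dict


def is_significant_cookie(value: str, domain: str, min_length: int = 16) -> bool:
    """Return True if a cookie value looks like a secret worth reporting."""
    # Rule 1: keep only values with more than `min_length` characters.
    if len(value) <= min_length:
        return False
    # Rule 2: drop values whose initial part is a variant of the domain name.
    variants = (domain, f"http://{domain}", f"https://{domain}")
    lowered = value.lower()  # assumption: compare case-insensitively
    return not any(lowered.startswith(v.lower()) for v in variants)


def filter_cookies(cookies: Dict[str, str], domain: str) -> Dict[str, str]:
    """Keep only the cookies that pass both rules."""
    return {
        name: value
        for name, value in cookies.items()
        if is_significant_cookie(value, domain)
    }


example = {
    "session": "9f8c2b7e4d1a6c3f0b5e",    # 20 characters, unrelated to domain
    "landing": "https://domain.com/home",  # long, but starts with a domain URL
    "lang": "en",                          # too short
}
kept = filter_cookies(example, "domain.com")
```

With these inputs only the `session` cookie survives, which matches the intent of discarding domain-like values before reporting a possible cookie exfiltration.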
Following the detection of WCD vulnerabilities in authenticated mode, the tool was run on a list of 750 domains in unauthenticated mode. In this execution mode, the tool only checked for the presence of CSRF tokens that were erroneously stored in the web cache. This experiment is based on the hypothesis that if the tool detects CSRF tokens mistakenly stored in the web cache for a particular anonymous session, the cache will most likely store them for authenticated users as well. Of these 750 domains, only 3 were found to be vulnerable. This lower number of positive cases is mainly due to the fact that the research targeted a specific subset of possible WCD cases. In addition, the tool uses simple regular expressions to match CSRF tokens, so it is likely that some were missed. We have found that, for large-scale analysis of the unauthenticated type, an approach based on the analysis of the HTTP caching headers is less computationally expensive and more effective. We note that all the experiments conducted in this study solely targeted web applications of organizations that granted explicit permission for security testing. Additionally, the tool developed generated minimal network overhead in comparison to a normal browser, preventing accidental flooding of the target, and is available on our GitHub repository under the Creative Commons license⁴.

6.2 Discussion

In terms of defense against WCD vulnerabilities, the primary solution relies on a proper configuration of web caching technologies, preventing the caches from storing private information. This implies specific considerations on the handling of HTTP requests by the application server, even for non-existing endpoints. It is paramount to check both the test and production environments whenever the version or configuration of the application or the caching technology changes.

Considering web mail clients as vectors for WCD attacks, a substantial level of defense can be achieved by filtering any content in the body of the emails that could automatically generate HTTP GET requests to third-party sites, such as images and styling files. The current widespread behavior of allowing whitelisting of entire (sub)domains could pose a threat in the event of changes to the cache technology configuration. Finally, some vendors, such as Cloudflare, have implemented software products that mitigate WCD vulnerabilities by performing content type checks on HTTP responses, verifying that the content-type header matches the type declared in the path (its file extension), if any, before deeming the response suitable for being stored.
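The content type check described for such mitigations can be sketched as a comparison between the file extension in the request path and the Content-Type header of the response. This is a hedged illustration, not Cloudflare's implementation: the `EXPECTED_TYPES` table and the function name are assumptions, and a real deployment would cover many more extensions.

```python
import posixpath
from urllib.parse import urlsplit

# Assumed mapping from path extension to acceptable media types.
EXPECTED_TYPES = {
    ".js": {"text/javascript", "application/javascript"},
    ".css": {"text/css"},
    ".png": {"image/png"},
    ".jpg": {"image/jpeg"},
}


def content_type_matches_path(url_path: str, content_type: str) -> bool:
    """Return False when the path declares a type the response does not have."""
    # Ignore the query string and extract the extension, if any.
    extension = posixpath.splitext(urlsplit(url_path).path)[1].lower()
    if extension not in EXPECTED_TYPES:
        # No known type declared in the path: this check does not apply.
        return True
    # Drop parameters such as "; charset=utf-8" before comparing.
    media_type = content_type.split(";")[0].strip().lower()
    return media_type in EXPECTED_TYPES[extension]


# A request for /non-existent.js answered with an HTML page is rejected,
# so the private page would not be stored under a static-looking key.
deceptive = content_type_matches_path("/non-existent.js", "text/html; charset=utf-8")
genuine = content_type_matches_path("/static/app.js?v=1", "text/javascript")
```

A cache applying this check would refuse to store the response whenever the function returns False and would fall back to its normal caching rules otherwise.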