Tracking Mechanism
12, 2018.
Digital Object Identifier 10.1109/ACCESS.2018.2869251
ABSTRACT As the usage of the Web increases, so do the threats an everyday user faces. One of the most
pervasive threats a Web user faces is tracking, which enables an entity to gain unauthorized access to the
user’s personal data. Through the years, many client storage technologies, such as cookies, have been used
for this purpose and have been extensively studied in the literature. The focus of this paper is on three newer
client storage mechanisms, namely, Web Storage, Web SQL Database, and Indexed Database API. Initially,
a large-scale analysis of their usage on the Web is conducted to appraise their usage in the wild. Then, this
paper examines the extent that they are used for tracking purposes. The results suggest that Web Storage is
the most used among the three technologies. More importantly, to the best of our knowledge, this paper is the
first to suggest Web tracking as the main use case of these technologies. Motivated by these results, this paper
examines whether popular desktop and mobile browsers protect their users from tracking mechanisms that
use Web Storage, Web SQL Database, and Indexed Database API. Our results uncover many cases where the relevant security controls are ineffective, thus making it virtually impossible for certain users to avoid tracking.
INDEX TERMS Web tracking, web security, privacy, indexed database, indexedDB, web storage, Web SQL
database.
2169-3536 © 2018 IEEE. Translations and content mining are permitted for academic research only. Personal use is also permitted, but republication/redistribution requires IEEE permission. See http://www.ieee.org/publications_standards/publications/rights/index.html for more information.
VOLUME 6, 2018 52779
S. Belloro, A. Mylonas: I Know What You Did Last Summer: New Persistent Tracking Mechanisms in the Wild
user’s device. Our results uncover multiple cases where users are exposed to privacy violations, as: a) they are unable to delete data created via the Web Storage, Web SQL Database or Indexed Database APIs even when they attempt to clear the locally stored data of their browsing, and b) they unknowingly store potentially tracking data created by these APIs while browsing the web in a private session. These findings have serious privacy implications, as they highlight that it is virtually impossible for certain users to avoid web tracking.
Our contributions include:
• We perform a large-scale analysis of the usage of the Web Storage, Web SQL Database and Indexed Database APIs on the web. We quantify their pervasiveness in the context of tracking code and find that these technologies are mostly used by trackers. To the best of our knowledge, we are the first to uncover that the main use case of these technologies is web tracking.
• We investigate the capability of modern, popular browsers for desktops and mobile devices to delete data that can be stored locally via these APIs. Moreover, we examine whether data from these APIs remain after a private browsing session. In both cases, we find instances where users would be exposed to privacy violations if a tracker used the Web Storage, Web SQL Database or Indexed Database APIs as the tracking vector, as we identified many cases in which the relevant security control has questionable effectiveness.
The rest of the paper is organized as follows. Section II briefly provides the required background on client storage technologies. Section III investigates how frequently and for which purposes these APIs are used in the wild. Section IV reviews the controls offered to users over these APIs. Finally, Section V presents the related work and Section VI concludes the paper and discusses future work.

II. BACKGROUND
Since the early days of the Web, HTTP cookies have been used as a client-side storage mechanism. As the web evolved, a need emerged for different and more capacious ways to store structured data on the web client. Over the years, several client-based storage technologies appeared. Most of them, such as the Local Shared Objects of Adobe Flash [10], Oracle Java [11], Microsoft Silverlight [12] and Google Gears (Google Code, 2008), were made available through third-party plug-ins. However, with the advent of HTML5, browsers started to support native functionality that could replace these third-party plug-ins, and client-side persistent data storage technologies were introduced, such as Web Storage [13], Web SQL Database [15] and Indexed Database API [20]. This section briefly introduces these three technologies, as well as cookies.

A. COOKIES
An HTTP cookie is a short piece of data (typically up to 4 KB in size) that a website sends to a client, either via HTTP response headers or by using client-side scripting. The client is expected to save this data and send it back to the server in subsequent HTTP requests. Each cookie is associated with an origin, i.e., a combination of the hostname, the port number and the protocol used by the web application [5]. This is based on a concept known as the ‘same-origin policy’, which has been the cornerstone of browser security since the early days of the web [6].
For performance reasons, web browsers not only limit the size of HTTP cookies, but also constrain their number, allowing only a few dozen per origin. Several online studies provide an overview of the limits that different browser vendors set on HTTP cookies [8], [9].
Since a webpage can contain resources from multiple origins, HTTP cookies are often used to identify and track users, not only across different browsing sessions, but also across different websites. Over the years, both Internet users and legislators have become more aware of the privacy implications of third-party tracking [7].

B. WEB STORAGE
Web Storage [13] is a specification that allows web applications to create a persistent key-value store in the browser, the content of which is maintained either until the end of a session (i.e., sessionStorage) or beyond it (i.e., localStorage). This technology enables web applications to store much more data than HTTP cookies: the storage capacity provided by Web Storage varies from 5MB to 25MB, depending on the browser. An innovative feature of Web Storage is that a web application can use a client-side JavaScript API to retrieve locally stored data, even when the browser is offline. Web Storage is in fact entirely based on client-side scripting and, unlike HTTP cookies, its data is not sent automatically in HTTP headers.
Similarly to HTTP cookies, the security model of Web Storage is based on the same-origin policy. This means that each origin has a unique storage object assigned to it. For this reason, the specification does not recommend using this technology on websites that use a shared host name or do not use HTTPS. Otherwise, information leakage or spoofing may occur, for example in the case of DNS spoofing attacks. Moreover, the specification recommends treating persistently stored data as potentially sensitive, as they could contain email addresses, calendar appointments, etc.
As with HTTP cookies, a third-party tracking agent could use Web Storage to profile users across multiple sessions [13]. The specification recommends that browser vendors treat Web Storage content in the same manner as HTTP cookies. In particular, vendors are encouraged to organize the user interfaces for clearing data in a way that allows users to clear all different types of persistent data simultaneously. It is also important to point out that, while Web Storage is a much less well-known technology than HTTP cookies, its usage is not exempt from regulations on personal user data [14].
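To make the origin-based isolation described above more concrete, the following Python sketch models it under simplifying assumptions. An origin is taken to be the (scheme, host, port) triple, and each origin gets its own localStorage-like and sessionStorage-like dictionary. By default, third-party storage is not partitioned by the top-level site; this is assumed to match typical browser behaviour around the time of this study. The class `ToyBrowser`, the function `tracker_frame` and the example domains are hypothetical illustrations. They are not part of the Web Storage specification or of the authors' tooling.

```python
import uuid
from urllib.parse import urlsplit

# Default ports let "https://a.example" and "https://a.example:443" compare equal.
DEFAULT_PORTS = {"http": 80, "https": 443}


def origin(url: str) -> tuple[str, str, int]:
    """Return the (scheme, host, port) triple that identifies a URL's origin."""
    parts = urlsplit(url)
    scheme = parts.scheme.lower()
    port = parts.port or DEFAULT_PORTS.get(scheme, 0)
    return (scheme, parts.hostname or "", port)


class ToyBrowser:
    """Toy model of Web Storage isolation (a simplification, not a real browser).

    Each storage area is a plain dict. Without partitioning, the key is only the
    origin of the frame running the script; with partitioning, the top-level
    page's origin is added to the key.
    """

    def __init__(self, partitioned: bool = False) -> None:
        self.partitioned = partitioned
        self.local: dict[tuple, dict[str, str]] = {}    # survives end_session()
        self.session: dict[tuple, dict[str, str]] = {}  # emptied by end_session()

    def _key(self, top_url: str, frame_url: str) -> tuple:
        if self.partitioned:
            return (origin(top_url), origin(frame_url))
        return origin(frame_url)

    def storage(self, top_url: str, frame_url: str, persistent: bool = True) -> dict[str, str]:
        areas = self.local if persistent else self.session
        return areas.setdefault(self._key(top_url, frame_url), {})

    def end_session(self) -> None:
        """Model closing the browser: session-scoped data is discarded."""
        self.session.clear()


def tracker_frame(browser: ToyBrowser, top_url: str, frame_url: str,
                  persistent: bool = True) -> str:
    """Hypothetical tracker iframe script: reuse a stored ID or mint a new one."""
    store = browser.storage(top_url, frame_url, persistent)
    if "uid" not in store:
        store["uid"] = uuid.uuid4().hex
    return store["uid"]


# Same tracker iframe embedded on two unrelated sites, across a browser restart.
browser = ToyBrowser()
on_news = tracker_frame(browser, "https://news.example/", "https://tracker.example/f.html")
browser.end_session()
on_shop = tracker_frame(browser, "https://shop.example/", "https://tracker.example:443/f.html")
print(on_news == on_shop)  # True: one identifier links both visits
```

With `partitioned=True`, the same tracker frame receives a separate store on each top-level origin, which is the idea behind the storage partitioning that some browsers later adopted. For brevity, the sketch keys on the top-level origin rather than the registrable site. It also treats sessionStorage as browser-wide rather than per tab.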
The dataset in use comes from the HTTP Archive project created by [21]. Every fortnight, it crawls a list of webpages that is loosely based on the Alexa Top Sites [22]. HTTP Archive collects data such as the payload content, and it logs the interaction between the browser and the crawler. It also captures the body of the responses for each subresource (i.e., any file fetched by an HTML page, such as scripts or stylesheets) used by the website. Since the dataset generated by HTTP Archive can reach several hundred gigabytes, Google BigQuery [23] was used to process it.
For each of the three client-storage APIs, one matching rule was used to create a series of SQL queries, which were run against the HTTP Archive dataset using Google BigQuery. These rules, which are summarized in Table 1, were defined using the constructs required to perform basic operations, such as creating a data store and reading and writing data. Appendix A lists the constructs identified in this work and used in our matching rules.
In order to identify whether a subresource belongs to a tracker, we created a database of tracking domains by aggregating three well-known tracking blacklists, namely Disconnect (2017), No Track [26] and Easy List (2017). To this end, we developed scripts that merge the domains listed in these blacklists after their files have been properly parsed and sanitized.
We run our experiments against: a) the whole dataset provided by HTTP Archive on the 15th of May 2018 and b) the Alexa top 10,000 sites. Table 2 summarizes the number of websites, subresources and truncated or empty subresources in our experiments. We highlight the low percentage of truncated or blank subresources, since the matching rules are not applicable to them.
TABLE 2. Data used from HTTP Archive.

B. EXPERIMENTAL RESULTS
Table 3 shows the usage of the primitives considered on the whole dataset provided by HTTP Archive for the 15th of May 2018. An interesting result is that more than two thirds of the websites analysed contain Web Storage related constructs. Another noteworthy result is that the constructs analysed are very often found in third-party subresources.
TABLE 3. Results for the whole dataset.
Similarly, Table 4 shows the results for Alexa’s top 10,000 sites. Interestingly, in this case the values for the usage of the Indexed Database API are almost double those for the whole dataset. The use of Web SQL remains low in our experiments, which is expected as this API is deprecated.
TABLE 4. Results for the Alexa top 10K.
Table 5 summarizes the number of domains that include at least one tracking subresource using one of the three client-side storage APIs. As can be seen, a high percentage of websites contain at least one tracking subresource in which constructs belonging to Web Storage (localStorage) can be found. The figures are much smaller for the Indexed Database API and considerably smaller for Web SQL Database.
TABLE 5. Websites and tracking subresources.
Finally, Table 6 highlights the usage of the client-side storage techniques in the context of tracking from a different angle. For each API, it shows the percentage of analysed subresources containing that API’s constructs that belong to a tracking domain. In other words, this table answers the question: ‘‘how frequently are those storage techniques used as tracking vectors?’’. In all cases, the frequencies are surprisingly high, ranging from around 30% for the Indexed Database API to more than 70% for Web Storage (localStorage). This significant finding suggests that user tracking is currently a major use case for the APIs that have been examined. Surprisingly, this is also the case for a deprecated standard, i.e., Web SQL DB.
TABLE 6. Tracking subresources and primitives.

C. DISCUSSION
This section has shown that a significant number of the websites analysed contain at least one tracking subresource with code constructs that belong to at least one of the three APIs considered. More importantly, it has shown that tracking scripts currently seem to be the main use case of the three storage APIs considered. Indeed, subresources that contain constructs of the analysed APIs are often identified as trackers. As our experiments used a dataset that represents a significant portion of the World Wide Web, we consider that our results shed some light on the usage of Web Storage, IndexedDB and Web SQL in user tracking.
However, the use of HTTP Archive as the dataset for our experiments introduces a number of limitations to our work. HTTP Archive can only provide snapshots of the front pages of openly available websites. Its scanning engine does not perform operations such as logging in or following links in a menu. Primitives such as the Indexed Database API are designed to support advanced web applications, so it is reasonable to assume that some websites use these storage techniques only once the user is logged in. However, this is an accepted limitation. To quantify the usage of client-side storage techniques in the context of user tracking, it is far more important to focus on the large-scale adoption of the technologies in question than on specific use cases.
Another limitation of our work stems from the scanning engine of HTTP Archive, as it truncates payloads larger than 2 MB. This means that if the constructs defined in the matching rules happen to be in the part of the payload that HTTP Archive could not capture, they will not be found by our queries. However, as shown in Table 2, truncated and empty subresources seldom appear in our dataset. Moreover, the missing content does not invalidate our findings. On the contrary, capturing it successfully might provide additional subresources that match our rules, thus reinforcing our results.
In addition, HTTP Archive does not contain snapshots of every one of the Alexa top one million sites. The set of websites scanned is loosely based on the Alexa list, but any individual can ask HTTP Archive to add sites to, or remove sites from, the dataset. The actual number of websites included in each scan is specified in the results section.
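The matching and attribution steps described in Section III can be approximated locally in a few lines of Python. The paper itself ran its rules as SQL queries on Google BigQuery, so this sketch only illustrates the logic, under several assumptions:
- The regular expressions in `RULES` are hypothetical stand-ins. They are not the constructs of Table 1 or Appendix A.
- The blacklists are assumed to have already been converted to one domain per line. The real Disconnect and EasyList files use their own formats.
- A subresource counts as a tracker if its host, or any parent domain of it, is listed.

The function `tracking_share` computes, per API, the kind of ratio reported in Table 6: the share of matching subresources that come from a tracking domain.

```python
import re
from collections import Counter
from urllib.parse import urlsplit

# Hypothetical matching rules, one per API; NOT the constructs from Appendix A.
RULES = {
    "localStorage": re.compile(r"\blocalStorage\b"),
    "sessionStorage": re.compile(r"\bsessionStorage\b"),
    "webSQL": re.compile(r"\bopenDatabase\s*\("),
    "indexedDB": re.compile(r"\bindexedDB\s*\.\s*open\s*\("),
}


def load_blocklists(*lists: str) -> set[str]:
    """Merge plain-text domain lists (one domain per line, '#' comments allowed)."""
    domains: set[str] = set()
    for text in lists:
        for line in text.splitlines():
            entry = line.split("#", 1)[0].strip().lower().rstrip(".")
            if entry:
                domains.add(entry)
    return domains


def is_tracker(url: str, trackers: set[str]) -> bool:
    """True if the URL's host, or any parent domain of it, appears in the list."""
    labels = (urlsplit(url).hostname or "").split(".")
    return any(".".join(labels[i:]) in trackers for i in range(len(labels)))


def tracking_share(subresources, trackers: set[str]) -> dict[str, float]:
    """Per API: fraction of matching subresources that come from a tracker.

    `subresources` yields (url, body, truncated) tuples; truncated or empty bodies
    are skipped because a rule cannot be applied to content that was not captured.
    """
    matched: Counter = Counter()
    tracked: Counter = Counter()
    for url, body, truncated in subresources:
        if truncated or not body:
            continue
        from_tracker = is_tracker(url, trackers)
        for api, rule in RULES.items():
            if rule.search(body):
                matched[api] += 1
                tracked[api] += from_tracker
    return {api: tracked[api] / matched[api] for api in matched}


trackers = load_blocklists("tracker.example  # list A\n", "ADS.example.\n")
subresources = [
    ("https://cdn.tracker.example/t.js", "localStorage.setItem('uid', id)", False),
    ("https://shop.example/app.js", "localStorage.getItem('theme')", False),
    ("https://ads.example/s.js", "indexedDB.open('ids', 1)", False),
    ("https://shop.example/big.js", "", True),
]
print(tracking_share(subresources, trackers))  # {'localStorage': 0.5, 'indexedDB': 1.0}
```

A purely textual rule such as `\blocalStorage\b` also fires on code that only probes for an API, as feature-detection libraries do. It can also miss obfuscated accesses, such as computed property names. Counts produced this way are therefore approximations of actual use in both directions.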
Finally, this work suffers from a limitation that is common to any static analysis approach. Our work verifies the presence of certain constructs in client-side scripts, but cannot verify the actual usage of the primitives unless the web application is executed in the browser, which falls outside the scope of our work. For example, a website could include a JavaScript library that relies on Web Storage but never execute its code in the browser. Moreover, some websites include third-party libraries that perform a set of basic operations using a given primitive with the sole purpose of assessing browser capabilities. This practice is known as ‘feature detection’, and one of the most well-known libraries used for this purpose is Modernizr [27].

IV. USER CONTROL OVER LOCALLY STORED DATA
The previous section showed that Web Storage, Indexed Database API and Web SQL Database are currently used frequently as tracking vectors. In this context, this section examines: i) whether popular desktop and smartphone browsers support the three aforementioned APIs, ii) how effectively the mechanism that clears browsing data deletes the data stored by these APIs, and iii) whether data created in private browsing mode persist.

A. METHODOLOGY
As mentioned in Section II.B, the specifications recommend that browser vendors handle the removal of the various client-side persistent data features in the same way as HTTP cookies. This means that browsers are expected to make it easy, or at least possible, for users to remove all locally stored user data. In addition, all browsers nowadays offer their users the ability to browse the web in a private session (often referred to as private or incognito mode). The primary aim of the private session is to allow users to browse the web without the browser saving data about the ‘private’ browsing history.
We built a simple web application, called Storage Watcher,¹ in order to verify: a) the level of API support in a given browser, and b) the effectiveness of data deletion.
¹ Available at: https://github.com/stefano-belloro/storage-watcher
The tests were performed in June 2018 on a broad selection of desktop (Windows, Mac OS) and smartphone (Android, iOS, Windows Phone) browsers. These include the most popular browsers on these platforms, such as Firefox, Chrome, Safari, Opera, and Edge/Internet Explorer. Tables 11 and 12 in Appendix B include the details of the browsers that were analysed and the results of the aforementioned experiments.

B. EXPERIMENTAL RESULTS
Our results uncover inconsistencies with regard to the support of the client-side storage APIs by the different browsers (see Tables 11 and 12 in Appendix B). For example, among the desktop browsers, Firefox and Edge disable the IndexedDB API when used in private browsing mode. In both cases, the other two storage APIs remain available. In contrast, certain versions of iOS WebKit-based browsers (Safari, Chrome and Firefox for iOS) and Firefox for Android seem to do the exact opposite, as they disable the Web Storage and Web SQL Database APIs when in private mode, but not the IndexedDB API. It is, however, worth mentioning that more recent versions of iOS WebKit-based browsers have introduced a more consistent approach, in which all three APIs are disabled in private browsing mode.
Our results also uncover multiple cases in which current popular browsers cannot protect the privacy of their users, as they fail to delete or isolate data stored via the Web Storage, Web SQL DB or IndexedDB APIs. As summarized in Table 7, our results suggest that: a) the process of removing private data from a browser does not always delete data stored via all three client-side storage APIs, or requires an extra step in the browser’s user interface; and b) some browsers do not fully isolate client-side stored data when used in private mode.
TABLE 7. Results for user control over locally stored data.
Specifically, certain versions of iOS WebKit-based browsers (Safari² and Chrome for iOS³) and some Android browsers (Firefox for Android⁴ and MiuiBrowser) retain IndexedDB API content even after a user requests data deletion. In all the cases considered, the user interface not only fails to make clear that IndexedDB API content will persist, but also gives the impression that all ‘offline web site data’ will be deleted (Fig. 3). Furthermore, in MiuiBrowser v.9.1.3, Web Storage (localStorage) content is also maintained after a user requests the deletion of private data. Fortunately, in the case of iOS browsers, this issue seems to be resolved in the latest version of the software considered in this work. However, this behavior can still be seen in other recent browsers (e.g., Firefox 60 on Android 8).
² Reported: https://bugs.webkit.org/show_bug.cgi?id=188164
³ Reported: https://bugs.chromium.org/p/chromium/issues/detail?id=868857
⁴ Reported: https://bugzilla.mozilla.org/show_bug.cgi?id=1479403
FIGURE 3. Firefox 57 on Android 6.0. The user interface suggests that offline data will be removed.
It is also worth pointing out that some browsers require the user to perform an extra action in order to include IndexedDB API content in the process of clearing private data. Consider all the desktop versions of Firefox⁵ within the scope of this work. Their user interface allows deleting data stored via the IndexedDB API from the same panel used to remove HTTP cookies, but this option is disabled by default. This means that users have to expand the ‘details’ dropdown menu and manually add ‘offline website data’ if they wish to remove IndexedDB API content. On an earlier version of Firefox analysed (Firefox 47 on Windows XP), this was also the case for Web Storage (localStorage). This default setting could mislead an inexperienced user and give a sense of anonymity that cannot be guaranteed, especially considering that the IndexedDB API could be used as a backdoor to reinstate the content of HTTP cookies [35].
⁵ Reported: https://bugzilla.mozilla.org/show_bug.cgi?id=1479414
Similarly, Internet Explorer for Windows Phone 8.10 by HTC requires a separate action to remove IndexedDB API content. In this case, the user needs to navigate to a different menu item called ‘‘advanced settings’’ and choose the option ‘‘manage storage’’.
Furthermore, Opera 43 on Android allows data stored using the IndexedDB API and Web SQL Database to persist across different private browsing sessions.⁶ Similarly, Opera for iOS exhibits the same behavior for Web Storage (localStorage), and MiuiBrowser 9.1.3 does so for both Web Storage (localStorage) and the IndexedDB API.
⁶ Reported: Bug reference: DNAWIZ-38391
Moreover, in Google Chrome’s guest mode, content stored in each of the three APIs persists across different windows opened in guest mode.⁷ This means that a user needs to quit Chrome completely in order to discard locally stored data accumulated in a guest browsing session. This behavior might mislead users who assume that simply closing the browsing window, without quitting the application, is enough to remove locally stored private data.
⁷ Reported: https://bugs.chromium.org/p/chromium/issues/detail?id=868870
Lastly, when running the experiment on MiuiBrowser 9.1.3, we noticed that the browser carries over IndexedDB API content created while using the application in normal browsing mode. As a result, if a private browsing session is preceded by regular use of the browser in normal mode, MiuiBrowser allows a third-party tracker to resume and recreate tracking values set while the user was browsing in previous non-private sessions, and to identify the user even in private mode.

C. DISCUSSION
Our findings suggest that in many cases web users are exposed to privacy violations if the website they visit, or any of its third-party subresources, uses Web Storage, IndexedDB or Web SQL DB as a tracking vector. Our experiments uncovered instances in which:
a) data persists after clearing local data or after closing a private session;
b) data persists unless the user configures the browser appropriately;
c) persistent data from a non-private session is leaked to the private session; and
d) data stored in guest mode is deleted only after quitting Chrome.
It is worth stressing that users who are not security-savvy or technically savvy are more likely to rely on the default data-clearing settings. They thus fail to delete data that potentially violate their privacy in the cases described in Table 7.
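The ‘backdoor’ scenario mentioned above can be illustrated with a toy model of evercookie-style respawning. In that scenario, IndexedDB content survives a data-clearing action and is used to reinstate HTTP cookies [35]. The sketch below is a simplification and not the evercookie library itself. Each storage vector is a dictionary. The hypothetical function `respawning_id` reads an identifier from whichever vector still holds it and writes it back to all of them. `clear` models a data-clearing dialog that empties only the selected vectors. For example, it can clear cookies and Web Storage but leave IndexedDB untouched, as with the default Firefox desktop setting described above.

```python
import uuid


class ClientState:
    """Toy browser profile: one dict per client-side storage vector."""

    VECTORS = ("cookie", "localStorage", "indexedDB")

    def __init__(self) -> None:
        self.stores: dict[str, dict[str, str]] = {v: {} for v in self.VECTORS}

    def clear(self, vectors) -> None:
        """Model a 'clear browsing data' action that empties only some vectors."""
        for v in vectors:
            self.stores[v].clear()


def respawning_id(state: ClientState, key: str = "uid") -> str:
    """Hypothetical tracker logic: recover the ID from any surviving vector,
    otherwise mint a new one, then copy it back into every vector."""
    survivors = [store[key] for store in state.stores.values() if key in store]
    uid = survivors[0] if survivors else uuid.uuid4().hex
    for store in state.stores.values():
        store[key] = uid
    return uid


profile = ClientState()
original = respawning_id(profile)
profile.clear(["cookie", "localStorage"])  # IndexedDB left untouched
restored = respawning_id(profile)
print(restored == original, profile.stores["cookie"]["uid"] == original)  # True True
```

In this model, clearing every vector (`profile.clear(ClientState.VECTORS)`) breaks the chain. This is why a single gap is enough to undermine the clearing of all the others: either a deletion mechanism that silently skips one API, or a private mode that inherits one API's content.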
Our work also uncovers inconsistencies with regards to identifier of the user. Finally, they discovered instances of
disabling certain client-storage APIs in private mode. If the cookies values containing personal identifiable information
reasoning for disabling the APIs is to prevent user tracking, such as users’ IP and email address, which, represent a serious
it should be noted that advanced tracking mechanisms employ breach of privacy.
multi-tier approaches based on a combination of various Soltani et al. [31] conducted a study on the usage of Flash
storage vectors [35]. Therefore, blocking certain APIs whilst Local Shared Object, often referred to as ‘Flash cookies’, as a
allowing the usage of others might not produce the desired tracking vector. They analysed the top 100 domains ranked
level of privacy. Another interesting aspect is the way that by QuantCast. On 31 of them, they found at least a case
browsers have implemented the security controls that han- of data overlap between HTTP cookies and Flash cookies,
dle the data of the APIs, namely private browsing and data meaning that the same value appeared on the data stored in
clearing, is inconsistent across different versions of the same both technologies. Moreover, they found several occurrences
browsers and across different platforms (c.f. Table 11 and of what they defined as ‘‘cookie respawning’’, in which the
submitted bugs). value of a deleted HTTP cookie is restored in the background,
Moreover, our experiments include a) the most popu- taken from a Flash cookie that keeps its back up. On a follow-
lar browsers of the popular operating systems for desktops up study, Ayenson et al. [32] observed the emerging usage
(i.e., Windows, Mac OS) and b) the most popular mobile of Web Storage (localStorage) as a tracking vector. While
browsers, which can be found in different types of mobile the authors did not find if this storage system was directly
devices, such as smartphone and tablets, for the most popular employed as part of respawning mechanisms, they noticed
platforms (i.e., Android, iOS, Windows Phone). As these several cases of matching values among HTTP cookies and
browsers currently hold the majority of the user share, Web Storage data, which they named ‘HTML5 cookies’.
we consider our results representative. Furthermore, as sum- Roesner et al. [33] presented an in-depth investigation of
marized in Table 7, it is worth noting that the majority of our web tracking performed by third-party actors. The work anal-
findings concern popular mobile browsers, such as Chrome, ysed a corpus of around 1000 websites, spanning from very
Firefox and Safari. Given the popularity of these browsers popular to lesser-used websites, and found the presence of
and the fact that mobile devices are nowadays the primary over 500 unique trackers. The authors proposed a classifica-
vector to access the web [28], this increases the impact of our tion of trackers that goes beyond the usual notion of first-party
findings. and third-party trackers. Instead, they introduced a classifica-
tion system based on the tracking behavior that is observable
from the client. This system challenges the significance of
V. RELATED WORK classifying cookies as either third-party or first-party. In fact,
A. CLIENT-SIDE STORAGE SYSTEMS AS all cookies could be classified as first-party in the context of
TRACKING VECTORS their own origins and often users visit those origins as ‘first-
Krishnamurthy and Wills [29] studied the diffusion of private party clients’, such as in the case of social networks. For this
user information performed by third-party trackers that use reason, the authors suggested the usage of terms like ‘‘tracker-
a combination of HTTP cookies and other elements of the owned’’ cookies and ‘‘site-owned’’ cookies. The work also
DOM. The authors analysed a selection of 1200 popular web- documented the occurrence of ‘‘cookie leaks’’, in which
sites and collected statistical data over a period of four years. the contents of a cookie associated to a given origin are
The results showed that the collection of user data increased passed as parameters in a request to another origin, with the
over time, even in websites where the user is expected to purpose of circumventing the browser’s same-origin policy.
provide confidential information such as medical or financial Furthermore, the authors attempted to quantify the usage of
details. More specifically, during the latest period that was alternatives to HTTP cookies. The authors found ‘‘remark-
analysed, September 2008, the penetration was 70%. Further- ably little use’’ of Web Storage (localStorage). In fact, out
more, it was discovered that 52% of the websites considered, of the 524 trackers identified, this storage mechanism was
contained code from at least two third-party tracking entities. used in only 8 cases. Moreover, only 5 of them were found to
Gonzalez et al. [30] performed a large-scale study on the contain unique identifies. All of those 5 cases were instances
usage, content and format of HTTP cookies in the wild. Their of cookie respawning, meaning that the user identifiers were
work analysed a large dataset of network data that comprised copies of the values found on HTTP cookies. Finally, Flash
of 5.6 billion HTTP requests. The authors determined the LSOs were used by 35 trackers, but only 9 of them were
reach of cookies by measuring the number of referrers that identified as instances of cookie respawning.
generate an HTTP request to the same cookie-setting end- Acar et al. [34] performed a large-scale analysis of a
point. They found that, while the vast majority of cookies selection of advanced persistent tracking mechanisms. They
relate to a unique referrer domain, there is a long tail of reported the usage of Indexed Database API as a storage
cookies whose originating requests come from a significantly mechanism of tracking data, albeit in a small number of
high number of different domains. Moreover, the authors cases (20 out of the 100 000 analysed - 0.02%). The authors
analysed the names of the cookies and found instances of claimed to be the first to document evidence of the usage
websites that use cookies whose names include a unique of IndexedDB as an evercookie vector. ‘‘Evercookie’’ is a
technique that significantly increases the resilience of goodwill of the tracker. Moreover, it appears that many of the
tracking HTTP cookies [35]. The mechanism consists of a parties involved with user tracking argue that their behavior
client-side API that replicates the HTTP cookie data across should not be considered tracking as it is defined by the DNT
several types of client-side storage systems. specification, and consequentially refuse to implement it.
Derksen et al. [36] also discussed the usage of Web Stor- Furthermore, the authors pointed out that neither blocking
age (localStorage) and Indexed Database API for tracking. third-party cookies is an effective method as some browsers
The authors analysed the behavior of twenty popular tracking services on a selection of about a thousand websites. They found that localStorage was used by 15% of the trackers analysed. Moreover, none of the websites analysed used the Indexed Database API as a tracking vector. The authors also studied the implementation of data deletion and found that the browsers they analysed allowed both Web Storage (localStorage) and IndexedDB data to be deleted via the same mechanism that removes cookies. Similarly, Bujlow et al. [37] seem to imply that data stored using these techniques is automatically deleted when cookies are cleared. However, as this work uncovers, in some popular browsers data deletion currently either requires an extra step by the user to include HTML5-related client-side storage or does not happen at all.
Another known practice used by trackers is cookie matching (or cookie syncing). This technique is used in real-time bidding for online advertising and allows trackers to associate different tracking profiles that relate to the same user. Olejnik et al. [38] quantified both the frequency and the breadth of data leakage related to cookie matching. They analysed a sample of 100 user profiles and found that 91 of them were subject to cookie matching, with instances of trackers leaking 27% of a user’s browsing history. Moreover, they showed that the market value of parts of a user’s browsing history can be as low as a fraction of a US cent. Englehardt [39] also discussed cookie syncing, warning that it can allow personal data to be shared between different tracking servers without the user’s direct consent. Cookie syncing can also amplify the impact of cookie respawning: while most major trackers do not use mechanisms such as the aforementioned evercookie, they might share user information with trackers that do use cookie resurrection techniques.
TABLE 8. Constructs used by Web Storage (localStorage).
TABLE 9. Constructs used by Indexed Database API.
... only block the writing operation of a cookie, but not the reading. Therefore, a tracker would still be able to read the value of a cookie set during a previous visit to a social media site or by an advertising pop-up. Finally, the authors mentioned that private browsing mode is not an effective anti-tracking method because it is primarily designed to protect users from attackers with physical access to the machine and not necessarily from remote tracking. To protect users’ privacy, the authors propose ShareMeNot, a browser extension that limits third-party tracking code belonging to social media sites while keeping the functionality visible to the user unaffected. In practice, the extension allows tracking requests to be sent only when the user clicks an embedded social media button (such as Facebook’s ‘‘Like’’). The authors’ solution was subsequently incorporated into another privacy tool, ‘‘Privacy Badger’’, a browser extension that uses algorithmic methods to decide which resources are tracking the user and checks whether scripts belonging to a given domain collect unique identifiers even after the extension sends
a ‘‘Do Not Track’’ message. If they do, it automatically disallows content from that third-party tracker [42].
TABLE 11. API support and data deletion results in the examined mobile browsers.
Mayer [43] studied a series of technologies developed to protect users from third-party trackers. The author found that community-maintained blacklists are the most effective way to prevent undesired user tracking. These lists mainly consist of URLs or domains and are generally used in conjunction with browser extensions, such as AdBlock Plus [44]. The author also claimed that tracking is often inextricably entangled with third-party advertising; therefore, blocking trackers often also entails blocking code that serves advertisements.
TABLE 12. API support and data deletion results in the examined desktop browsers.
Mylonas et al. [45] analyzed the security controls of several mobile and desktop browsers. According to their results, desktop browsers generally provide better protection, as the controls available on them perform better than those available on their mobile counterparts. For example, users of mobile browsers do not have the option to opt out of third-party cookies, and in many cases the interface that allows the user to control security features can be confusing. Finally, the authors found a number of security issues in two major
mobile browsers and also pointed out that in most mobile browsers the ‘‘Do Not Track’’ header is unavailable.
Virvilis et al. [46] compared the protection measures against rogue sites offered by desktop and mobile browsers. According to their results, mobile browsers often offer a lower level of protection than their desktop-based counterparts and in some cases offer no protection at all. Furthermore, the authors introduced Secure Proxy, a new browser-independent countermeasure that overcomes the technical limitations of each specific browser without the need for browser extensions. Secure Proxy consists of an HTTP forward proxy that operates at the network level to filter content before it reaches the user’s device. The filtering is delegated to a third-party service that assesses the reliability of content providers based on the aggregation of multiple blacklists and antivirus engines.
Building on this previous work, Nisioti et al. [47] revisited the anti-phishing mechanisms available to users of mobile browsers on three popular operating systems. The study revealed that the protection provided by pre-installed web browsers is still very poor and in most cases non-existent. The only browsers that offer an adequate level of protection are Firefox and Chrome on Android. Moreover, on iOS, neither the default browser nor any of the third-party browsers offers any protection against phishing attacks. In this context, the authors proposed TRAWL (TRAnsparent Web protection for alL), an extension of Secure Proxy. Like Secure Proxy, TRAWL is implemented outside the user’s device in order to avoid resource consumption and to offer cross-platform compatibility. The tool provides DNS and URL filtering based on a collection of curated blacklists, but instead of delegating the filtering to a third-party service it performs it locally. In this way, the user’s privacy is preserved and third-party limitations are overcome.
Similarly, Kontaxis and Chew [48] present a new anti-tracking mechanism of Mozilla Firefox, called Tracking Protection. The mechanism is similar to ad-blocking browser extensions such as AdBlock Plus: it analyses all outgoing HTTP requests and matches them against a blacklist based on a curated list of tracking origins. The authors evaluated their approach against 200 popular news sites and, according to their results, it produced a 67.5% reduction in the number of HTTP cookies. Moreover, it resulted in a 44% median reduction in page load time and a 39% reduction in data usage for the tested sites.
VI. CONCLUSION
Online tracking is an everyday practice and, when it is performed against the user’s will, it is a major privacy violation. While older client-side storage technologies such as cookies have been studied extensively as tracking vectors, newer technologies, i.e., Web Storage, Indexed Database API and Web SQL Database, have not received the same level of attention. In this paper, we measure the frequency of use of these technologies on an HTTP Archive dataset, which constitutes a representative sample of the World Wide Web, and examine the extent to which they are used for tracking purposes. As shown by the results, a large fraction of websites currently use the three primitives, with Web Storage being the most used. However, the most alarming result is the frequency with which these APIs appear to be used by trackers, which seems to exceed 30% for all three technologies and reaches almost 70% for Web Storage. Finally, we examined whether current popular web browsers for desktop and mobile devices can protect their users from privacy violations that use the aforementioned three technologies as the tracking vector. Our results suggest that in many cases the relevant security controls (i.e., data clearing and private mode) are ineffective in deleting the relevant data and in ensuring the isolation of data used in private sessions. The bugs that were identified in this work have been reported to the relevant browser vendors, as indicated in Section IV-B.
APPENDIX A
MATCHING RULES USED IN STATIC ANALYSIS
The Web Storage API provides two storage mechanisms: one for handling data within the current session (sessionStorage) and another that persists beyond the current session (localStorage). In this work, only the constructs used by localStorage were considered, as content stored using sessionStorage expires at the end of a browsing session. Table 8 shows the constructs needed to read or write data using localStorage.
The same process was followed for the Indexed Database API. The constructs listed in Table 9 are part of the steps necessary to create a local database containing an object store and to access the store to either read or write data. Similarly, Table 10 shows the constructs necessary to read and write data using the now-deprecated Web SQL Database API.
APPENDIX B
FULL RESULTS OF SECTION IV
Tables 11 and 12 provide all the results from the experiments that were described, summarized, and discussed in Section IV.
REFERENCES
[1] Types of Personal Information and Images Shared Digitally by Global Internet Users as of January 2017. Accessed: Jan. 2018. [Online]. Available: https://www.statista.com/statistics/266835/sharing-content-among-us-internet-users/
[2] C. Castelluccia and A. Narayanan, ‘‘Privacy considerations of online behavioural tracking,’’ Eur. Union Agency Netw. Inf. Secur., Heraklion, Greece, Tech. Rep. Deliverable–2012-10-19, 2012.
[3] S. Englehardt et al., ‘‘Cookies that give you away: The surveillance implications of Web tracking,’’ in Proc. 24th Int. Conf. World Wide Web, May 2015, pp. 289–299.
[4] T. Bujlow, V. Carela-Español, J. Solé-Pareta, and P. Barlet-Ros. (2015). ‘‘Web tracking: Mechanisms, implications, and defenses.’’ [Online]. Available: https://arxiv.org/pdf/1507.07872.pdf
[5] A. Barth. (2011). The Web Origin Concept, IETF RFC 6454. [Online]. Available: https://tools.ietf.org/html/rfc6454
[6] E. Shepherd. (2017). Same-Origin Policy in MDN Web Docs. [Online]. Available: https://developer.mozilla.org/en-US/docs/Web/Security/Same-origin_policy
[7] D. M. Kristol, ‘‘HTTP cookies: Standards, privacy, and politics,’’ ACM Trans. Internet Technol., vol. 1, no. 2, pp. 151–198, 2001.
[8] J. Manico. (2009). Real World Cookie Length Limits. Manicode. [Online]. Available: http://manicode.blogspot.hk/2009/08/real-world-cookie-length-limits.html
[9] I. Roberts. (2013). Browser Cookie Limits. [Online]. Available: http://browsercookielimits.squawky.net/
[10] Adobe Systems. (2012). What Are Local Shared Objects? Security and Privacy. [Online]. Available: http://web.archive.org/web/20121230094342/http://www.adobe.com/security/flashplayer/articles/lso/
[11] Oracle. (2017). Java Documentation. [Online]. Available: http://docs.oracle.com/en/java
[12] Microsoft. (2017). What is Silverlight? [Online]. Available: https://www.microsoft.com/silverlight/what-is-silverlight/default
[13] Web Hypertext Application Technology Working Group. (2017). Web Storage in HTML Living Standard. [Online]. Available: https://html.spec.whatwg.org/multipage/webstorage.html
[14] European Commission. Cookies. European Commission. Accessed: Jan. 2018. [Online]. Available: http://ec.europa.eu/ipg/basics/legal/cookies/index_en.htm
[15] I. Hickson. (2010). Web SQL Database. W3C Working Group Note 18 November 2010. Accessed: Jan. 2018. [Online]. Available: https://www.w3.org/TR/2010/NOTE-webdatabase-20101118
[16] N. R. Mehta. (2009). WebSimpleDB API. W3C Working Draft. [Online]. Available: https://www.w3.org/TR/2009/WD-WebSimpleDB-20090929
[17] M. Owens, ‘‘Introducing SQLite,’’ in The Definitive Guide to SQLite. New York, NY, USA: Apress LP, 2006, pp. 1–16.
[18] Chromium Blog. (2010). More Resources for Developers. [Online]. Available: https://blog.chromium.org/2010/01/more-resources-for-developers.html
[19] A. Ranganathan. (2010). Beyond HTML5: Database APIs and the Road to IndexedDB. Mozilla Hacks. Accessed: Jan. 2018. [Online]. Available: https://hacks.mozilla.org/2010/06/beyond-html5-database-apis-and-the-road-to-indexeddb
[20] A. Alabbas and J. Bell. (2017). Indexed Database API 2.0, W3C Proposed Recommendation. Accessed: Nov. 16, 2017. [Online]. Available: https://www.w3.org/TR/IndexedDB-2
[21] S. Souders. (2011). Announcing the HTTP Archive. High Performance Web Sites Blog. [Online]. Available: https://www.stevesouders.com/blog/2011/03/30/announcing-the-http-archive/
[22] Alexa Internet, Inc. (2017). Alexa Top 1,000,000 Sites. [Online]. Available: http://s3.amazonaws.com/alexa-static/top-1m.csv.zip
[23] I. Grigorik. (2013). HTTP Archive + BigQuery = Web Performance Answers. Author’s Blog. [Online]. Available: https://www.igvita.com/2013/06/20/http-archive-bigquery-web-performance-answers/
[24] Google Cloud Platform. (2017). SQL Reference. [Online]. Available: https://cloud.google.com/bigquery/docs/reference/standard-sql/
[25] Information Technology—Database Languages—SQL—Part 11: Information and Definition Schemas (SQL/Schemata), Standard ISO/IEC 9075-11:2011, International Organization for Standardization, ISO/IEC JTC 1/SC 32, 2011. [Online]. Available: https://www.iso.org/standard/53685.html
[26] Quidsup. (2017). NoTrack. [Online]. Available: https://github.com/quidsup/notrack
[27] F. Ateş. (2017). What is Modernizr? [Online]. Available: https://modernizr.com/docs/#what-is-modernizr
[28] R. V. D. Meulen and C. Pettey. (2012). Gartner Survey Highlights Top Five Daily Activities on Media Tablets. [Online]. Available: https://www.gartner.com/newsroom/id/2070515
[29] B. Krishnamurthy and C. Wills, ‘‘Privacy diffusion on the Web: A longitudinal perspective,’’ in Proc. 18th Int. Conf. World Wide Web, 2009, pp. 541–550.
[30] R. Gonzalez et al., ‘‘The cookie recipe: Untangling the use of cookies in the wild,’’ in Proc. IEEE Netw. Traffic Meas. Anal. Conf. (TMA), Jun. 2017, pp. 1–9.
[31] A. Soltani, S. Canty, Q. Mayo, L. Thomas, and C. J. Hoofnagle, ‘‘Flash cookies and privacy,’’ in Proc. AAAI Spring Symp., Intell. Inf. Privacy Manage., Mar. 2010, pp. 158–163.
[32] M. D. Ayenson, D. J. Wambach, A. Soltani, N. Good, and C. J. Hoofnagle. (2011). Flash Cookies and Privacy II: Now With HTML5 and ETag Respawning. [Online]. Available: https://www.truststc.org/education/reu/11/Posters/AyensonMWambachDpaper.pdf
[33] F. Roesner, T. Kohno, and D. Wetherall, ‘‘Detecting and defending against third-party tracking on the Web,’’ in Proc. 9th USENIX Conf. Netw. Syst. Design Implement., Apr. 2012, pp. 1–12.
[34] G. Acar, C. Eubank, S. Englehardt, M. Juarez, A. Narayanan, and C. Diaz, ‘‘The Web never forgets: Persistent tracking mechanisms in the wild,’’ in Proc. ACM SIGSAC Conf. Comput. Commun. Secur., Nov. 2014, pp. 674–689.
[35] S. Kamkar. (2010). Evercookie. [Online]. Available: http://samy.pl/evercookie
[36] I. Derksen, I. E. Poll, and F. van den Broek. (2016). HTML5 Tracking Techniques in Practice. [Online]. Available: http://www.cs.ru.nl/bachelorscripties/2016/Ivar_Derksen___4375408___HTML5_Tracking_Techniques_in_Practice.pdf
[37] T. Bujlow, V. Carela-Español, J. Solé-Pareta, and P. Barlet-Ros. (2015). Web Tracking: Mechanisms, Implications, and Defences. [Online]. Available: https://arxiv.org/abs/1507.07872
[38] L. Olejnik, T. Minh-Dung, and C. Castelluccia. (2013). Selling off Privacy at Auction. [Online]. Available: https://hal.inria.fr/hal-00915249
[39] S. Englehardt. (2014). The Hidden Perils of Cookie Syncing. Freedom to Tinker. [Online]. Available: https://freedom-to-tinker.com/2014/08/07/the-hidden-perils-of-cookie-syncing/
[40] C. Soghoian. (2011). Slight Paranoia: The History of the Do Not Track Header. Accessed: Jan. 2018. [Online]. Available: http://paranoia.dubfire.net/2011/01/history-of-donot-track-header.html
[41] R. Fielding and D. Singer. (2017). Tracking Preference Expression (DNT). [Online]. Available: https://www.w3.org/TR/tracking-dnt/
[42] (2017). Privacy Badger. [Online]. Available: https://www.eff.org/privacybadger
[43] J. Mayer. (2011). Tracking the Trackers: Self-Help Tools. The Center for Internet & Society. [Online]. Available: http://cyberlaw.stanford.edu/blog/2011/09/tracking-trackers-self-help-tools
[44] Eyeo GmbH. (2017). Getting Started with Adblock Plus. [Online]. Available: https://adblockplus.org/getting_started#general
[45] A. Mylonas, N. Tsalis, and D. Gritzalis, ‘‘Evaluating the manageability of Web browsers controls,’’ in Proc. Int. Workshop Secur. Trust Manage. Berlin, Germany: Springer, Sep. 2013, pp. 82–98.
[46] N. Virvilis et al., ‘‘Security Busters: Web browser security vs. rogue sites,’’ Comput. Secur., vol. 52, pp. 90–105, 2015.
[47] A. Nisioti, M. Heydari, A. Mylonas, V. Katos, and V. H. F. Tafreshi, ‘‘TRAWL: Protection against rogue sites for the masses,’’ in Proc. 11th Int. Conf. Res. Challenges Inf. Sci. (RCIS), May 2017, pp. 120–127.
[48] G. Kontaxis and M. Chew. (2015). ‘‘Tracking protection in Firefox for privacy and performance.’’ [Online]. Available: https://arxiv.org/abs/1506.04104
STEFANO BELLORO received the M.Sc. degree in software engineering and Internet architecture with a dissertation in cybersecurity. He led the Web teams responsible for the BBC World Service Web portfolio, providing news in more than 40 different languages. He is currently a Software Engineering Manager with the BBC. He also looks after the teams that build and support software and tools for broadcasting.
ALEXIOS MYLONAS (M’09) received the B.Sc. degree (Hons.) in computer science from the Athens University of Economics and Business, the M.Sc. degree in information security from Royal Holloway, University of London, and the Ph.D. degree in information and communication security from the Athens University of Economics and Business. He was a Security Consultant with VeriSign’s PKI Trust Network. He was a Lecturer with Staffordshire University. He is currently a Lecturer with Bournemouth University. He is also an expert in cybersecurity. He has more than 20 well-referenced publications in esteemed conferences and journals. His current research interests include cybersecurity, threat intelligence, and Web security. He is also a member of ACM. He has served as a technical committee member for conferences and journals.