Tracking Protection For Android's WebView - Andrzej Hunt

Andrzej Hunt

Tracking Protection for Android’s WebView
Posted on May 10, 2017 by Andrzej Hunt

Unlike iOS (really just Safari), Android has no content blocking API. Tracking protection is
available in some browsers, e.g. Firefox in combination with addons (and also in Firefox’s
private browsing which includes tracking protection enabled by default). For fun, we
decided to look into whether it’s possible to provide Tracking Protection when using
Android’s default WebView implementation. This blog post describes how that was done,
and explores some of the implementation details of our URL matching algorithm.

It turns out that Firefox Focus on iOS also had to build its own URL matching
implementation: iOS content blocking is currently only available in Safari, and not in the iOS
WebView equivalent. That implementation was influenced by the design of iOS’s content
blocking APIs and file formats. When you’re not subject to that restriction it’s possible to
build a faster approach, so my ignorance of that version wasn’t necessarily a bad thing, as
I’ll describe later in this post.

Why would you want to do this? One reason is that browser engines are large – and we
wanted to see whether it’s possible to build a privacy focused browser whose size
measures in megabytes instead of tens of megabytes – which would require reusing
whatever engine the platform provides (in the case of iOS you actually have no choice in
the matter; fortunately Android is a little more free). There are actually some drawbacks to
using platform-provided browser engines – which will be the topic of a future post – but it’s
certainly possible to implement tracking protection on top of Android’s WebView.

Tracking Protection Lists


Firefox and Focus use the Disconnect tracking protection lists: these are lists of domains
hosting trackers that should be blocked, categorised by tracker type, e.g. Social trackers,
Analytics Trackers, Advertising Trackers, etc. Further to this there’s an override “entity” list,
which unblocks domains that are owned by a given company whenever you are browsing
a site owned by that company. (E.g. if FooBar Tracker Corp owns both foo.com and
bar.com, we would allow loading of resources from bar.com while browsing foo.com,
even though we’d block all other sites from loading resources from foo.com and
bar.com.) You can read more about these lists at the repo where the Mozilla copies of
these lists are maintained.

As such, tracking protection is fairly simple: every time a given webpage requests a
resource, we match the resource URL’s host against the blocklist. If it’s blocked, we check
the entitylist to verify whether there’s an override in place for the current site. Android’s
WebView provides a callback that is called every time it wants to load a resource, allowing
you to override resource loading.
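
The flow described above can be sketched in plain Java as follows. The class and method names (`TrackingDecision`, `shouldBlock`) and the flattened data structures are assumptions made for illustration, not the actual Focus code, and the linear scan is only there to keep the sketch short. On Android, a check like this would presumably be called from `WebViewClient.shouldInterceptRequest()`, returning an empty `WebResourceResponse` for blocked requests and `null` otherwise.

```java
import java.net.URI;
import java.util.Map;
import java.util.Set;

/** Hypothetical helper: decides whether a resource request should be blocked. */
final class TrackingDecision {
    // Assumed shape: all blocked domains, flattened into one set.
    private final Set<String> blockedDomains;
    // Assumed shape: site domain -> domains that are unblocked while browsing that site.
    private final Map<String, Set<String>> entityList;

    TrackingDecision(Set<String> blockedDomains, Map<String, Set<String>> entityList) {
        this.blockedDomains = blockedDomains;
        this.entityList = entityList;
    }

    /** True if host equals domain or is a subdomain of it (foo.bar.com matches bar.com). */
    static boolean matches(String host, String domain) {
        return host.equals(domain) || host.endsWith("." + domain);
    }

    private static boolean matchesAny(String host, Set<String> domains) {
        for (String domain : domains) {
            if (matches(host, domain)) {
                return true;
            }
        }
        return false;
    }

    boolean shouldBlock(String pageUrl, String resourceUrl) {
        String pageHost = URI.create(pageUrl).getHost();
        String resourceHost = URI.create(resourceUrl).getHost();
        if (pageHost == null || resourceHost == null) {
            return false; // nothing to match against
        }
        // Step 1: is the resource host on the blocklist at all?
        if (!matchesAny(resourceHost, blockedDomains)) {
            return false;
        }
        // Step 2: does the entity list override the block for the current site?
        for (Map.Entry<String, Set<String>> entity : entityList.entrySet()) {
            if (matches(pageHost, entity.getKey()) && matchesAny(resourceHost, entity.getValue())) {
                return false;
            }
        }
        return true;
    }
}
```

With foo.com and bar.com both blocked and listed as one entity, a request for bar.com is blocked from an unrelated page but allowed while browsing foo.com.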

The iOS content blocking API actually allows for regex based matching on the entire
resource URL, which is more complex than what we needed for basic tracking protection.
The Disconnect lists only work using domains/hosts, which simplifies the implementation
somewhat. Focus on iOS originally only supported the content blocking API, and added
the browser later – the browser implementation therefore simply reused the same bundled
list format. The content blocking lists aren’t used for iOS’s WebView equivalent, although
that is apparently changing.

Implementing URL matching


The simple (but not particularly efficient) method would be to iterate over the list of hosts
every time a resource is fetched. In fact, we could just iterate over the regexes in the iOS
content blocking lists, and check those directly to avoid implementing our own matching.

The original Android implementation was actually a rushed afternoon (or two) hacky proof
of concept from our December All Hands – it turned out to be robust and fast enough, so it
was kept beyond that time. It might be possible to build an even faster implementation, but
this one hasn’t provoked any user complaints yet.

As mentioned, iterating over the list of blocked hosts is expensive: O(nh), where n is the
number of blocked hosts (very large) and h is the host length (small). Fortunately at
some point or another I had learned about Tries (contrary to what some might assume, an
Information and Computer Engineering degree at my alma mater doesn’t actually involve
any Data Structures and Algorithms – but that’s nothing a little independent study can’t
quickly fix).

Those offer much smaller memory consumption (not that memory consumption is
particularly significant compared to what a web engine will need), and much faster lookup
[O(h)]:

[Figure: A trie containing multiple domains.]

(In reality, the Trie possibly consumes more memory because of the overhead of each
node being an object. More efficient representations are available in order to avoid one
node per character, but that didn’t seem worthwhile given that this implementation is
already performant enough.)
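
A minimal sketch of such a trie, assuming one node object per character and hostnames stored in reverse so that domains sharing a suffix share a path. The `HostTrie` name and the iterative index loop are illustrative choices rather than the Focus implementation, which is described as recursive.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical trie of hostnames, stored back to front ("bar.com" is walked as m-o-c-.-r-a-b). */
final class HostTrie {
    private final Map<Character, HostTrie> children = new HashMap<>();
    private boolean terminal; // true if a blocked domain ends at this node

    void insert(String host) {
        HostTrie node = this;
        for (int i = host.length() - 1; i >= 0; i--) {
            node = node.children.computeIfAbsent(host.charAt(i), c -> new HostTrie());
        }
        node.terminal = true;
    }

    /** O(h) lookup: true if host is a stored domain or a subdomain of one. */
    boolean matches(String host) {
        HostTrie node = this;
        for (int i = host.length() - 1; i >= 0; i--) {
            node = node.children.get(host.charAt(i));
            if (node == null) {
                return false;
            }
            // Only accept a match at a label boundary: either the whole host was
            // consumed, or the next character (to the left) is a full stop.
            if (node.terminal && (i == 0 || host.charAt(i - 1) == '.')) {
                return true;
            }
        }
        return false;
    }
}
```

Lookup cost depends only on the length of the host being checked, not on the number of domains stored.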

There’s still a bunch of overhead in various places: we’re using the Android/Java URL
classes to extract the hostname from the resource URL, which could well be more costly
than the actual act of searching the tree. I haven’t measured in detail yet.

(Building this completed the bi-yearly cycle of proper Data Structures and
Algorithms construction – I’d last been able to build some trees for a bookmarks folder UI
the preceding summer.)

As mentioned above, there’s also the entitylist: this consists of sets of hosts (A), for which
another set of hosts (B) is whitelisted (usually those sets would be the same, but that isn’t
guaranteed or necessary). This is simply an extension of the same tree: the set of
whitelisted domains (B) is another Trie. That Trie is then attached to every node
representing one of the domains in (A) – we simply extend the default Node to
have a WhitelistNode, which has a reference to the whitelisted-domains Trie.
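
The idea can be sketched as below. As a simplification, this version uses a nullable `whitelist` field on every node instead of a separate WhitelistNode subclass. `EntityTrie`, `addEntity` and `isWhitelisted` are hypothetical names.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical entity list: a trie of site domains (A), each pointing at a trie of allowed hosts (B). */
final class EntityTrie {
    static final class Node {
        final Map<Character, Node> children = new HashMap<>();
        boolean terminal;
        Node whitelist; // root of the (B) trie; only set on nodes that end a domain in (A)
    }

    private final Node root = new Node();

    /** Inserts host back to front and returns the node where it ends. */
    private static Node insert(Node start, String host) {
        Node node = start;
        for (int i = host.length() - 1; i >= 0; i--) {
            node = node.children.computeIfAbsent(host.charAt(i), c -> new Node());
        }
        node.terminal = true;
        return node;
    }

    /** Returns the first terminal node that matches host at a label boundary, or null. */
    private static Node find(Node start, String host) {
        Node node = start;
        for (int i = host.length() - 1; i >= 0; i--) {
            node = node.children.get(host.charAt(i));
            if (node == null) {
                return null;
            }
            if (node.terminal && (i == 0 || host.charAt(i - 1) == '.')) {
                return node;
            }
        }
        return null;
    }

    /** While browsing any of the sites (A), resources from the allowed hosts (B) are unblocked. */
    void addEntity(List<String> sites, List<String> allowed) {
        Node whitelist = new Node();
        for (String host : allowed) {
            insert(whitelist, host);
        }
        for (String site : sites) {
            insert(root, site).whitelist = whitelist; // one shared (B) trie per entity
        }
    }

    boolean isWhitelisted(String pageHost, String resourceHost) {
        Node site = find(root, pageHost);
        return site != null && site.whitelist != null && find(site.whitelist, resourceHost) != null;
    }
}
```

Sharing one whitelist trie between all of an entity's site nodes means the (B) set is built once, however many domains the company owns.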

Every real project needs its own String implementation
Searching and inserting into our hostname tries involves walking strings backwards. That
would require either some annoying index arithmetic, or reversing the String before
insertion/search (i.e. creating a copy of the String). Neither of those sounded like fun, so I
decided to add a String wrapper. This is arguably completely unnecessary, but made
things a little simpler (and perhaps more efficient). The String wrapper also meant that the
Trie implementation didn’t need to have much knowledge about subdomains either: we
can just start at the start of our reversed String. (Because we need to correctly match
subdomains, but not other domains, the Trie still needs to be aware of the full stop being used
for domain separation, so it isn’t completely domain agnostic.)

We only need to access the String character by character, which is why we can avoid a
complete string copy/reversal – if this weren’t the case, there would be little value in a
wrapper.

The wrapper takes care of index arithmetic for reversed strings – and implements support
for getChar(int) and substring(int). That’s pretty much all there was to FocusString. (I no
longer need to miss the amazing days of many C++ string classes…)

substring() copies…
Somewhat naively, I’d assumed that our Java implementation doesn’t create a copy when
calling String.substring() – in other words that it would just adjust internal indexes while
reusing the same String buffer and/or equivalent behaviour. Without that assumption, there
would be little point in avoiding a String copy on reversal, since – thanks to our recursive
Trie traversal – we’d be creating copies when traversing that Trie.

It turns out that assumption was wrong: it was true for Java 6, and also for earlier versions
of Java 7 – before changing in Java 7u6. I don’t really know where Android’s
implementation originates, but it also creates copies. Thus, FocusString was expanded to
include offsets, and FocusString.substring() merely fiddles those offsets.
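
A sketch of what such a wrapper could look like, assuming a single offset into the reversed view is enough. `ReversedView` is a hypothetical stand-in, not the real FocusString, and `charAt` plays the role of getChar(int).

```java
/** Hypothetical read-only view of a String, seen back to front, without copying it. */
final class ReversedView {
    private final String source;
    private final int offset; // how many characters of the reversed string have been skipped

    ReversedView(String source) {
        this(source, 0);
    }

    private ReversedView(String source, int offset) {
        this.source = source;
        this.offset = offset;
    }

    int length() {
        return source.length() - offset;
    }

    /** Character at the given index of the reversed, offset view. */
    char charAt(int index) {
        if (index < 0 || index >= length()) {
            throw new StringIndexOutOfBoundsException(index);
        }
        return source.charAt(source.length() - 1 - offset - index);
    }

    /** Shares the underlying String and only moves the offset, so no characters are copied. */
    ReversedView substring(int start) {
        if (start < 0 || start > length()) {
            throw new StringIndexOutOfBoundsException(start);
        }
        return new ReversedView(source, offset + start);
    }
}
```

Each `substring` call allocates one small wrapper object rather than a new character buffer, which is what makes recursive traversal cheap.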

It was hard to predict in advance what the impact of this change might be, since I didn’t
have much experience in this area – I discovered that it was actually a noticeable
improvement: on my fairly modern Nexus 6P, average URL matching time dropped by
about 20% – from approximately 1.2ms to 1.0ms (these numbers are for debug builds with
code coverage enabled – for coverage free debug builds it drops from 0.42ms to 0.26ms,
which is even more significant). We already had tests in place which helped verify that
things wouldn’t break, so this was a fairly low risk change (I did use this as an opportunity
to extend those tests though).

Results
As mentioned above, the iOS equivalent implementation is a lot simpler. It iterates over the
lists of hosts, and does regex matching for each host. I decided to port that implementation
to Android, primarily to check for consistency of results. Fortunately the Trie based
implementation was mostly correct, except for our subdomain matching. Both bar.com
and foo.bar.com should be blocked if bar.com is in the blocklist. My Trie based
implementation also blocked foobar.com. Ooops. That was a quick fix, albeit one which
required making the Trie search implementation hostname aware. Other than that, results
have been the same in our testing.
These parallel implementations allowed for performance comparisons. (Note: the
underlying regex and other library implementations on each platform might be different, so
the gap between the two could be very different if both algorithms were running on an
iPhone.) On my N6P, the Trie based implementation took an average of 0.3ms per
resource URL check, while the ported iterative/regex approach took 42ms. Some pages like to
load a lot of resources – so that’s a difference you’d notice quickly. It’s possible that my
ported implementation was suboptimal, but it’s certainly clear that the Trie based approach
was worth it from a performance perspective.

To be fair, this implementation did take more work – and you have to remember that the
iOS implementation was influenced by the blocklist file format that iOS uses for its tracking
protection API, whereas the Android version was a clean-sheet design.

Edits:

Trie Diagram corrected on 10th May 2017, thank you to Gervase Markham for spotting the
mistake.

Tagged with: android, firefox, focus, mozilla
Posted in Firefox for Android, Mozilla
2 comments on “Tracking Protection for Android’s WebView”

Gervase Markham says:


May 11, 2017 at 09:24

Tiny bug in diagram – u and k should be reversed.

Andrzej Hunt says:


May 12, 2017 at 05:56

Well spotted! Ooops… and Thanks!

© 2012, 2013 Andrzej Hunt