25k Xss Vulnerabilities Overview

The document presents a system for automatically detecting DOM-based cross-site scripting (XSS) vulnerabilities. It describes a taint-aware JavaScript engine and DOM implementation that tracks data flows to identify vulnerabilities. An evaluation of the Alexa top 5000 sites found over 6000 unique vulnerabilities in 480 domains, showing DOM XSS issues are more prevalent than assumed.

25 Million Flows Later - Large-scale Detection of DOM-based XSS

Sebastian Lekies, SAP AG
Ben Stock, FAU Erlangen-Nuremberg
Martin Johns, SAP AG

Abstract

In recent years, the Web witnessed a move towards sophisticated client-side functionality. This shift caused a significant increase in complexity of deployed JavaScript code and thus, a proportional growth in potential client-side vulnerabilities, with DOM-based Cross-site Scripting being a high impact representative of such security issues. In this paper, we present a fully automated system to detect and validate DOM-based XSS vulnerabilities, consisting of a taint-aware JavaScript engine and corresponding DOM implementation as well as a context-sensitive exploit generation approach. Using these components, we conducted a large-scale analysis of the Alexa top 5000. In this study, we identified 6,167 unique vulnerabilities distributed over 480 domains, showing that 9.6% of the examined sites carry at least one DOM-based XSS problem.

Categories and Subject Descriptors

H.4.3 [Communications Applications]: Information browsers; H.6.5 [Security and Protection]: Unauthorized access

Keywords

DOM-based XSS, Taint Tracking, Vulnerability Detection, Exploit Generation

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.
CCS’13, November 4–8, 2013, Berlin, Germany.
Copyright is held by the owner/author(s). Publication rights licensed to ACM.
ACM 978-1-4503-2477-9/13/11 ...$15.00.
http://dx.doi.org/10.1145/2508859.2516703.

1. INTRODUCTION

The times in which JavaScript was mainly used for eye candy and small site enhancements are long gone. Since the advent of the so-called Web 2.0, the Web browser is the host of sophisticated, complex applications, such as Gmail or Google Docs, written entirely in JavaScript, that rival their desktop equivalents in scope and features. More and more functionality, which in traditional Web applications would have been implemented on the server, moves to the client. Consequently, the amount of required JavaScript code is increasing proportionally to this shift. Furthermore, the capabilities of client-side JavaScript are continuously increasing, due to the steady stream of new “HTML5” APIs being added to the Web browsers.

In parallel to this ever growing complexity of the Web’s client side, one can observe an increasing number of security problems that manifest themselves only on the client [26, 11, 17]. One of these purely client-side security problems is DOM-based XSS [16], a vulnerability class subsuming all Cross-site Scripting problems that are caused by insecure handling of untrusted data through JavaScript. DOM-based XSS is caused by unsafe data flows from attacker-controlled sources, such as the document.location property, into security sensitive APIs, e.g., document.write.

While the existence of DOM-based XSS has been known since 2005 [16], this vulnerability class is frequently still perceived as a minor, fringe issue, especially when compared to reflected and persistent XSS. In this paper, we re-evaluate this assumption and examine how prevalent DOM-based XSS is in the wild.

Unfortunately, testing of client-side security properties in general, and DOM-based XSS in particular, is difficult. In comparison to the conditions on the server side, the Web’s client side has several challenges that affect both static and dynamic security testing approaches: For one, all server-side code is completely under the control of the application’s operator and available for processing, monitoring and analysis. This is not the case on the Web’s client side, where the code execution occurs on the user’s machine. Furthermore, compared to server-side languages such as Java or C#, a large portion of JavaScript code frequently relies on runtime interpretation of string data as executable code via APIs such as eval(). The resulting code is interpreted and executed on the client, making it invisible to the server. Finally, it is common practice for modern Web applications to include third-party JavaScript code using script tags that point to cross-domain hosts. In 2012, Nikiforakis et al. [22] measured that 88.45% of the Alexa top 10,000 web sites included at least one remote JavaScript resource from a cross-domain host. This JavaScript is transported directly from the third-party provider to the user’s Web browser and gets executed immediately. Thus, this code is neither directly controlled by the application nor is it visible at the server.

In this paper, we propose a fully automated system to identify DOM-based XSS issues that overcomes the outlined obstacles by integrating the vulnerability detection directly into the browser’s execution environment. Our system consists of a taint-aware JavaScript engine and DOM implementation as well as a context-sensitive exploit generation technique.

The main contributions of this paper are the following:

• We present the design and implementation of a dynamic, byte-level taint-tracking approach in JavaScript engines. Through directly altering the JavaScript engine’s implementation of the low-level string type, we achieve complete coverage of all JavaScript language features and the full DOM API.

• We propose a novel, fully automatic vulnerability validation mechanism that leverages the fine-grained context information provided by our taint-aware JavaScript engine. Due to our exact knowledge of data source and syntactical context of the final data sink, our system can create attack payloads that match the syntactic surroundings of the injection point. This in turn allows unambiguous vulnerability validation through verification that our injected JavaScript was indeed executed. Thus, our system reports no false positives.

• We report on a large-scale empirical study on insecure data flows in client-side JavaScript and the resulting DOM-based XSS vulnerabilities. In total, we examined 504,275 URLs resulting from a shallow crawl of the Alexa top 5000 sites. In this study we observed a total of 24,474,306 flows out of which 69,987 caused validated DOM-based XSS exploits, resulting in 6,167 unique vulnerabilities affecting 9.6% of the examined sites.

The remainder of this paper is organized as follows: First we briefly revisit the technical background of DOM-based XSS (Sec. 2) and give a high-level overview of our approach (Sec. 3). Then, we describe our techniques for vulnerability detection (Sec. 4) and validation (Sec. 5). In Section 6 we present the methodology and results of our empirical study. We end the paper with a discussion of related work (Sec. 7) and a conclusion (Sec. 8).

2. DOM-BASED XSS

Cross-Site Scripting is an attack in which an attacker is able to inject his own JavaScript code into a Web application, in such a way that the code is executed within a victim’s browser in the context of the application. Since 2000, when one of the first XSS vulnerabilities was reported [3], novel attack variants were discovered. In 2005, Amit Klein published a paper in which he first mentioned the term DOM-based XSS and described the basic characteristics of this vulnerability [16]. In contrast to traditional (reflected and persistent) Cross-Site Scripting, DOM-based XSS is caused by incorrect client-side code rather than by server-side code. As described earlier, the dynamic nature of this client-side code makes it hard to detect or verify this kind of vulnerability.

In order to trigger a DOM-based XSS exploit an attacker is able to utilize a multitude of different attack vectors to inject his malicious payload (such as location.href, document.referrer, window.name, and many, many more). Depending on the Web application’s program logic, it processes attacker-controllable inputs and at some point in time conducts a string-to-code conversion. As shown in our empirical study, this is a very common scenario. If input is not sanitized correctly, the attacker may be able to inject his own code into the application. Different subtypes of DOM-based XSS exist, depending on the method used for converting the string to code:

HTML context.
Web applications commonly insert generated HTML code into the DOM via functions such as document.write, innerHTML or insertAdjacentHTML. When these functions are called, the browser parses the string parameter and interprets the contents as HTML code, which is then inserted into a certain position within the DOM. If user input flows into these sinks, sanitization or encoding functions have to be used in order to avoid code injection vulnerabilities. If the input is not sanitized correctly, an attacker is able to inject his own HTML tags including <script>, which enables JavaScript execution. For the specific differences between innerHTML and document.write, we refer the reader to Sec. 5.2.1.

JavaScript context.
Another commonly used method, which is sometimes vulnerable to DOM-based XSS, is the eval function. eval takes a string parameter, interprets it as JavaScript code and executes it. Besides eval and its aliases setTimeout and setInterval, there are also other contexts in which strings are converted into JavaScript code such as script.innerText, script.text, script.textContent and the assignment of strings to event handler attributes.

URL context.
If an attacker-controlled input flows into a URL attribute of any DOM node (such as img.src, iframe.src, object.data or a.href), an immediate conversion from a string to code does not occur. However, there are still several security problems related to this kind of flow. For example, if the attacker is able to control the complete URL, he could make use of the javascript: or data: schemes to execute script code. If only parts of the URL are controlled, the attacker could still conduct redirects or phishing and in some cases even achieve JavaScript code execution as shown in Section 6.5.1.

Other contexts.
Besides those contexts that allow code execution, there are further sinks/contexts that are security sensitive such as document.cookie, the Web Storage API, postMessage or setAttribute. In Section 6.5.3, for example, we present a persistent DOM-based XSS vulnerability via document.cookie, which was discovered by our framework.

3. APPROACH OVERVIEW

In this paper, we propose a system to automatically detect and validate DOM-based XSS vulnerabilities. To address the outlined challenges in the assessment of client-side security problems (see Sec. 1), we proceed as follows: Instead of building analytical processes that complement [29] or emulate [25] the client-side behavior, we chose to integrate our techniques directly into a full browser.
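
As a small illustration of what such a system has to observe, the sketch below follows one attacker-controllable source value into strings destined for the code-executing contexts described in Section 2. It is a minimal sketch under stated assumptions, not code from any examined site: the function name sinkInputsFor, the markup and the variable names are hypothetical, and the sinks are only named in comments so that the snippet also runs outside a browser.

```javascript
// Hypothetical page logic; all names and markup are illustrative assumptions.
function sinkInputsFor(href) {
  // Source: the URL fragment, as a page would read it from location.hash.
  const fragment = decodeURIComponent(new URL(href).hash.slice(1));
  return {
    // HTML context: a string a page might pass to document.write or innerHTML.
    html: "<div>Hello " + fragment + "</div>",
    // JavaScript context: a string a page might pass to eval.
    javascript: "var user = '" + fragment + "';",
    // URL context: a string a page might assign to a.href or iframe.src.
    url: fragment,
  };
}

// A benign fragment and two attacker-chosen fragments.
const benign = sinkInputsFor("http://example.org/#Alice");
const htmlAttack = sinkInputsFor("http://example.org/#<script>alert(1)</script>");
const jsAttack = sinkInputsFor("http://example.org/#';alert(1);//");

console.log(benign.html);
console.log(htmlAttack.html);
console.log(jsAttack.javascript);
```

For the benign fragment the resulting strings are harmless. For the other two inputs, the attacker-supplied characters would become markup or code as soon as the string reached a sink such as document.write or eval, which is the kind of flow the system is meant to detect.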
More precisely, our system consists of two separate components: For vulnerability detection, we utilize a modified browsing engine that supports dynamic, byte-level taint-tracking of suspicious flows. Through directly altering the engine’s string type implementation, we achieve complete coverage of all JavaScript language features and the full DOM API. We discuss the design and implementation in Section 4.

The second component is a fully automated vulnerability validation mechanism that leverages the fine-grained context information provided by our taint-aware browsing engine. Due to the exact knowledge of data source and syntactical context of the final data sink, our system is able to create attack payloads that match the syntactic surroundings of the injection point. This in turn allows unambiguous vulnerability validation through verification that our injected JavaScript was indeed executed. This component is presented in Section 5.

4. VULNERABILITY DETECTION: MODIFIED CHROME

To automatically detect the flow of potentially attacker-controllable input (called a source) into a sink in the sense of DOM-based XSS, we decided to implement a dynamic taint-tracking approach. To ensure that edge cases, which might not be implemented properly in pure testing engines like HTMLUnit, are properly executed, we chose to implement taint-tracking in a real browser. For this, we modified the open-source browser Chromium in such a manner that its JavaScript engine V8 as well as the DOM implementation in WebKit were enhanced with taint-tracking capabilities. For both components of the browser, we chose to use a byte-wise taint-tracking approach built directly into the respective string representations. In this fashion, we enabled our tool to not only distinguish between a completely untainted string and a string containing any potentially harmful content, but also to specifically get information on the origin of each given character in said string.

4.1 Labeling sources and encoding functions

To keep the memory overhead as small as possible, we chose to implement our approach in such a way that information on a given character’s source is encoded in just one byte. We therefore assigned a numerical identifier to each of the 14 identified sources (e.g. location.href, location.hash or document.referrer). Hence, we were able to encode this information into the lower half of the byte. To also be able to determine whether a given character was encoded using the built-in functions encodeURI, encodeURIComponent and escape, we used the lower three of the four remaining bits to store whether one or more of these functions were applied to the string. To represent a benign character, the lower four bits are set to 0.

4.2 Patching the V8 JavaScript engine

Google’s JavaScript engine V8 is highly optimized in regards to both memory allocation and execution speed. Although the code is written in C++, V8 for the most part does not make use of a class concept using member variables when representing JavaScript objects like strings or arrays. Instead, a small header is used and an object’s components are addressed by only using given offsets relative to the object’s address.

After careful examination of the given code, we chose to only encode the desired information directly into the header. Every object in V8 stores a pointer to its map. The map describes the class of an object. In V8, there are maps for each type of object. We found an unused part of a bitmap in the maps and used it to create new map objects for tainted strings. Obviously, for strings of dynamic length, additional memory must be allocated to store the actual data. Based on whether a string is pure ASCII or also contains two-byte characters, this memory is allocated on creation of the object. The address of this newly created space is then written to one of the aforementioned offsets in the header. Along with the information that a string is tainted, we also need to store the taint bytes described above. To do this, we changed the string implementation such that additional length bytes are allocated. Since we wanted to keep the changes to existing code as small as possible, we chose to store the taint bytes into the last part of the allocated memory. This way, the functionality for normal access to a string’s characters did not have to be changed and only functionality for taint information access had to be added.

As mentioned before, the V8 engine is optimized for performance. It therefore employs so-called generated code, which is assembler code directly created from macros. This way, simple operations such as string allocation can be done without using the more complex runtime code written in C++. However, for our approach to easily integrate into the existing code, we chose to disable the optimizations for all string operations such as creation or sub-string access.

After patching the string implementation itself, we also instrumented the string propagation functions such as substring, concat, charAt, etc. This is necessary to ensure that the byte-wise taint-tracking information is also propagated during string conversions.

4.3 Patching the WebKit DOM implementation

In contrast to the V8 engine, WebKit makes frequent use of the concept of member variables for its classes. Therefore, to allow for the detection of a tainted string, we were able to add such a member denoting whether a string is tainted or not. The string implementation of WebKit uses an array to store the character data. Hence, we added a second array to hold our taint bytes. Since strings coming from V8 are converted before being written into the DOM, we patched the corresponding functions to allow the propagation of the taint information. This is necessary because tainted data might be temporarily stored in the DOM before flowing to a sink, e.g. by setting the href attribute of an anchor and later using this in a document.write. To allow for correct propagation of the taint information, we not only needed to change the string implementation but also modify the HTML tokenizer. When HTML content is set via JavaScript (e.g. using innerHTML), it is not just stored as a string but rather parsed and split up into its tree structure. Since we want to ensure that taint information is carried into the tag names and attributes in the generated tree, these changes were also necessary.

4.4 Detection of sink access

Until now we discussed the tracking of tainted data inside the V8 JavaScript engine and WebKit. The next step in our implementation was to detect a tainted flow and to notify
the user. Therefore, we modified all DOM-based Cross-Site Scripting sinks, such as document.write, innerHTML or eval. We changed them in such a way that a reporting function is called each time a tainted string is passed to such a sink. In order to pass on the report to the user interface, we implemented a Chrome extension that injects the JavaScript reporting function into the DOM. As such a function is callable from inside the runtime engine, we are able to report the flow to the extension. The details on the layout and implementation of this extension are presented in Section 6.1.

In WebKit’s API used to provide access to the DOM tree for V8, the passed arguments are of V8’s string class and are then converted to WebKit’s string type. Hence, we chose to implement our reporting function into V8’s string class, therefore allowing us to invoke it from the DOM API as well as directly from V8 using the provided string reference. When called, this function gathers information on the code location of the currently executed instruction and reports these alongside the taint information and details on the type of sink to the extension.

Figure 1 depicts this layout. Both the indicated functions eval and document.write use the reference to the passed string to invoke the reporting function which in turn passes on the information to the Chrome extension shown at the top.

[Figure 1: Report functionality. The eval sink in V8 JS and the document.write sink in WebKit both call report, which notifies the Extension.]

5. VULNERABILITY VERIFICATION: AUTOMATIC EXPLOIT GENERATION

Although the taint-tracking engine delivers first indications for potential Cross-Site Scripting vulnerabilities, detecting a flow alone is not sufficient to ensure that a vulnerability was discovered. There are various reasons why a successful exploitation is not possible for an existing flow. For example, the Web site could use built-in or custom encoding or filter functions that are capable of defusing a malicious payload. Furthermore, other, random circumstances can occur that prevent an exploit from executing. For example, if the tainted value originates from a GET parameter, tampering with this parameter could trigger the Web server to load a different page or to display an error message in which the vulnerable flow is not present anymore. Therefore, a verification step is needed to tell vulnerable data flows apart from non-exploitable flows. In order to do so, our system uses the data received from the taint-tracking engine to reliably generate valid Cross-Site Scripting exploits. In this Section we describe the inner workings of the generation process.

5.1 Anatomy of a Cross-Site Scripting Exploit

To develop a system that is capable of generating valid XSS payloads, we first analyzed the nature of a Cross-Site Scripting exploit. In general, an exploit is context dependent. This means that a payload, which an attacker seeks to execute, depends on how the Web application processes the attacker’s input. So, if the input flows into the eval sink, it has to utilize a different syntax than an exploit targeting flows into document.write (more details on context-dependent exploit generation can be found in Section 5.2). However, the structure of an exploit can be generalized to a non-context-dependent form.

Listing 1 shows two typical exploits. The first exploit targets a JavaScript context (e.g. eval), while the second one contains an exploit for an HTML sink (e.g. document.write). In many cases a tainted value was concatenated from several different strings, which are hard coded (benign/non-attacker-controllable) or coming from one or more sources (attacker-controllable). Therefore, an attacker is only able to control parts of the string that flows into the sink. Immediate execution of JavaScript is often not possible at the location where the tainted/controllable parts are inserted into the string/code (e.g. within quoted strings). Therefore, the exploit first has to break out of the current context to be able to execute the malicious script code. The first part of each exploit serves as a “break out sequence” to escape to a context where JavaScript execution is possible. In the cases presented in Listing 1 these sequences are “');” and “"></a>”, respectively. Following the break out sequence, an arbitrary JavaScript payload or <script> tag can be executed. Afterwards, the exploit has to take care of trailing string fragments in such a way that these fragments do not interfere with the execution of the payload. For example, if a string that is passed to eval contains a syntax error, no code will be executed at all, even if the syntax error occurs at the very end of the string. To prevent this from happening, an exploit has to include an escape sequence that renders trailing characters harmless. In the JavaScript case we simply comment out everything that follows our payload and in the HTML case we close the script block and include a <textarea> to interpret the rest of the string as simple text instead of HTML. To summarize our analysis, we conclude that a Cross-Site Scripting exploit takes the following generalized form:

    exploit := breakOutSequence payload escapeSequence    (1)

In this, only the breakOutSequence and the escapeSequence are context-specific. While the escapeSequence is trivial to choose, the breakOutSequence needs careful crafting to result in a successful exploit.

Listing 1 Example Cross-Site Scripting exploits

    ');alert('XSS');//
    "></a><script>alert('XSS')</script><textarea>
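
The generalized form (1) can be sketched as a small composition function. This is a minimal sketch under stated assumptions rather than the authors' generator: the names ESCAPE_SEQUENCES and buildExploit are hypothetical, the break out sequences are passed in by hand instead of being derived from taint information, and the two escape sequences are the ones described in the text (a trailing comment for JavaScript, a <textarea> for HTML).

```javascript
// Hypothetical helper; only the escape sequences are taken from the text.
const ESCAPE_SEQUENCES = {
  javascript: "//",   // comment out everything that follows the payload
  html: "<textarea>", // treat the rest of the string as plain text
};

// exploit := breakOutSequence payload escapeSequence
function buildExploit(context, breakOutSequence, payload) {
  const escapeSequence = ESCAPE_SEQUENCES[context];
  if (escapeSequence === undefined) {
    throw new Error("unknown context: " + context);
  }
  // In an HTML context the payload has to be wrapped in a script element.
  const body =
    context === "html" ? "<script>" + payload + "</script>" : payload;
  return breakOutSequence + body + escapeSequence;
}

// The two break out sequences named in the text, supplied manually here.
const jsExploit = buildExploit("javascript", "');", "alert('XSS');");
const htmlExploit = buildExploit("html", "\"></a>", "alert('XSS')");

console.log(jsExploit);
console.log(htmlExploit);
```

With these inputs the function reproduces the shape of the two exploits in Listing 1. The hard part, choosing the break out sequence for a given injection point, is deliberately left out of this sketch.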
5.2 Context-Dependent Generation of Breakout Sequences

After discovering a data flow from one or more sources to a sink, the taint-tracking engine delivers three pieces of information to the exploit generation framework:

1. Information on the data flow (sources, sink, applied built-in filters)

2. Tainted value: the complete string that flowed into the sink (including benign and tainted parts from one or more sources)

3. Byte-wise taint information for each byte contained in the tainted string.

Based on the given sink the framework first determines the target context. Depending on this context, the tainted value and the taint information are passed to a context-sensitive break out sequence generation function. In the next step, the generator adds the desired payload and a context-specific fixed escape sequence. After constructing the exploit, the system builds a test case that can be executed in a completely automated fashion and reports back to the framework in case of successful exploitation.

5.2.1 HTML context-specific generation

An HTML context is present whenever a tainted string is directly converted into HTML code. This is the case for many DOM manipulation functions such as document.write or innerHTML.

As mentioned before, often only parts of a string may be tainted. Therefore, our system first determines the exact location of the tainted parts by analyzing the taint information. In order to create a valid exploit, the system needs to determine into which DOM node the tainted parts will be transformed when the string-to-HTML conversion takes place. In order to do so, the generator parses the complete string and identifies the corresponding nodes. Based on the node types the generator is able to plan the next step within the generation process. In this first step we distinguish between three different node types (see Listing 2 for examples):

1. Tainted TagNode: The tainted value is located inside an HTML tag. Either it is part of the tag name, an attribute name, an attribute value or a combination of those three possibilities.

2. Tainted CommentNode: The tainted value is contained within an HTML comment.

3. Tainted TextNode: The tainted value is placed outside of an HTML tag or in between a closing and an opening tag.

Listing 2 Example Vulnerabilities

    document.write('<script src="//example.org/' + taintedValue + '"></script>')
    document.write('<div>' + taintedValue + '</div>')
    document.write('<!--' + taintedValue + '-->')

Depending on the node type, break out sequences have to be generated differently. In the following, we explain the three different approaches:

TagNode generation.
If the tainted value is included within an HTML tag, we first need to break out of this tag. Otherwise, opening a <script> tag would have no effect. If the tainted value is directly located within the tag, we can simply do so by adding a “>” sign to the break out sequence. If the tainted value resides within an attribute of the tag, the system first needs to determine the delimiter of the attribute. Most of the time such attributes are enclosed by either single or double quotes; sometimes, however, no delimiter is present. So in order to break out of the tag in this case we need to add the delimiter of the attribute node before the angle bracket.

Now our payload is able to break out of the current (opening) tag and would be able to open a script tag to execute the payload. However, some tags have special semantics for the text between the opening and the closing tag. For example, HTML markup between an opening and closing iframe tag is only rendered in case iframes are not supported by the browser. Therefore, our generator optionally adds one or more additional closing tags at the end of the break out sequence for all present tags with special semantics. To summarize this, a TagNode break out sequence looks as follows:

    TagNodeBS := [delimiter] > [closingTags]    (2)

CommentNode generation.
The generation of CommentNode break out sequences is trivial in most cases. As comments in HTML do not have any special semantics for their content, we can simply break out of a comment by adding “-->” to our break out sequence. However, such a comment could in rare cases be placed in between opening and closing tags of scripts, iframes, etc. So, again our system analyzes the string and adds closing tags for these elements if necessary. Summing up, a CommentNode break out sequence takes the following form:

    CommentNodeBS := --> [closingTags]    (3)

TextNode generation.
Every character sequence that is placed outside a tag or a comment or that is located in between an opening and a closing tag is regarded as a TextNode by the HTML parser. In many cases executing a payload within a TextNode is straightforward. As we do not need to break out of the node itself, we can simply open a script tag and execute a payload. However, if the TextNode is placed between an opening and a closing tag of a script or iframe, we again have to add closing tags if necessary.

    TextNodeBS := [closingTags]    (4)

innerHTML vs document.write.
After we have generated the break out sequence for HTML context exploits, the system needs to choose a payload to
execute. When doing so, some subtle differences in the handling of string-to-HTML conversion come into play. When using innerHTML, outerHTML or insertAdjacentHTML, browsers react differently than document.write in terms of script execution. While document.write inserts script elements into the DOM and executes them immediately, innerHTML only performs the first step, but does not execute the script. So adding the following payload for an innerHTML flow would not result in a successful exploit:

    <script>__reportingFunction__()</script>

However, it is still possible to execute scripts via an injection through innerHTML. In order to do so, the framework makes use of event handlers:

    <img src="none_existent_resource"
         onerror="__reportingFunction__()">

When innerHTML inserts the img tag, the browser creates an HTTP request to the non-existing resource. Obviously, this request will fail and trigger the onerror event handler that executes the given payload. Depending on the sink we simply choose one of these two payloads.

5.2.2 JavaScript context-specific generation

JavaScript context-specific generation is necessary whenever a data flow ends within a sink that interprets a string as JavaScript code. This is the case for functions such as eval and Function, event handlers (such as onload and onerror) and DOM properties such as script.textContent, script.text and script.innerText. While browsers are very forgiving when parsing and executing syntactically incorrect HTML, they are quite strict when it comes to JavaScript code execution. If the JavaScript parser encounters a syntax error, it cancels the script execution for the complete block/function. Therefore, the big challenge for the exploit generator is to generate a syntactically correct exploit that will not cause the parser to cancel the execution. In order to do so, the system again has to determine the exact location of the tainted bytes.

Listing 3 shows a very simple vulnerable piece of JavaScript code. In the first step, the code constructs a string of benign/hard coded and tainted (location.href) parts. In a second step, it executes the code using eval. This code can be exploited in slightly different ways. Either the attacker could break out of the variable x and inject his code into the function named test, or he could break out of the

Listing 3 JavaScript context example

    var code = 'function test(){' +
        'var x = "' + location.href + '";'
        // inside function test
        + 'doSomething(x);'
        + '}'; // top level
    eval(code)

Listing 4 JavaScript Syntax Tree

    FunctionDeclaration
      Identifier : test
      FunctionConstructor
        Identifier : test
        Block
          Declaration
            Identifier : x
            StringLiteral : "http://example.org"
          ExpressionStmt
            SpecialOperation : FUNCTION_CALL
              Reference
                Identifier : doSomething

the extracted branch (in gray). For each of the extracted branches the generator creates one break out sequence by traversing the branch from top to bottom and adding a fixed sequence of closing/break out characters for each node. So in our example the following steps are taken:

1. FunctionDeclaration: ';'
2. FunctionConstructor: ''
3. Block: '}'
4. Declaration: ';'
5. StringLiteral: '"'
6. Resulting Breakout Sequence: '";};'

To trigger the exploit we can simply construct the test case as follows: Based on the source (location.href), the system simply adds the break out sequence, an arbitrary payload and the escape sequence to the URL of the page:

    http://example.org/#";};__reportingFunction__();//

When executed within a browser, the string construction process from Listing 3 is conducted and the following string flows into the eval call (Note: Line breaks are only inserted for readability reasons):
variable x and the function test and inject his code into the
function test (){
top level JavaScript space. While the first method requires var x = " http :// example . org /# " ;
an additional invocation of the test function, the second ex- };
ploit executes as soon as eval is called with a syntactically _ _ r e p o r t i n g F u n c t i o n _ _ ();
correct code. However, for the last case, the complexity of // doSomething ( x );}
the break out sequence grows with the complexity of con-
structed code. Nevertheless, we do not want do rely on any
behavior of other non-controllable code or wait for a user in-
teraction to trigger an invocation of the manipulated code. 6. EMPIRICAL STUDY
As mentioned earlier, an important motivation for our
Therefore, we always seek to break out to the top level work was to gain insight into the prevalence and nature of
of the JavaScript execution. In order to do so, our system potentially insecure data flows in current JavaScript applica-
first parses the JavaScript string and creates a syntax tree tions leading to DOM-based XSS. For this reason, we created
of the code. Based on this tree and the taint information a Web crawling infrastructure capable of automatically ap-
we extract the branches that contain tainted values. Listing plying our vulnerability detection and validation techniques
4 shows the resulting syntax tree for our example code and to a large set of real-world Web sites.
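The validation technique applied by this infrastructure rests on the break-out generation of Section 5.2.2, which can be sketched in a few lines. The following is a minimal illustration, not the authors' implementation: the table CLOSERS and the functions breakOutSequence and buildExploitUrl are hypothetical names, and the closing characters per node type are taken from the example steps in Section 5.2.2. It assumes the tainted branch is already available as a list of node types from the root down to the tainted leaf.

```javascript
// Hypothetical table: closing characters contributed by each node type,
// following the example steps listed in Section 5.2.2.
const CLOSERS = {
  FunctionDeclaration: ";",
  FunctionConstructor: "",
  Block: "}",
  Declaration: ";",
  StringLiteral: '"',
};

// branch: node types from the root of the syntax tree down to the tainted leaf.
function breakOutSequence(branch) {
  // The innermost construct has to be closed first, so the closers are
  // concatenated from the leaf back up to the root.
  return branch
    .slice()
    .reverse()
    .map((type) => {
      if (!(type in CLOSERS)) throw new Error("unknown node type: " + type);
      return CLOSERS[type];
    })
    .join("");
}

// Append break-out sequence, payload, and a trailing "//" that comments out
// the remainder of the original code (source assumed to be location.href).
function buildExploitUrl(pageUrl, branch, payload) {
  return pageUrl + "#" + breakOutSequence(branch) + payload + "//";
}

const branch = [
  "FunctionDeclaration",
  "FunctionConstructor",
  "Block",
  "Declaration",
  "StringLiteral",
];
const exploitUrl = buildExploitUrl(
  "http://example.org/",
  branch,
  "__reportingFunction__();"
);
console.log(exploitUrl);
// http://example.org/#";};__reportingFunction__();//
```

For the branch of Listing 4 the sketch reproduces the break out sequence '";};' and the test URL given in Section 5.2.2. A real generator would need closers for every node type the parser can emit, which is why unknown types raise an error here instead of being silently skipped.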
6.1 Methodology & Architecture Overview
To obtain a realistic picture of the commonness of insecure data flows that might lead to DOM-based XSS, it is essential to sample a sufficiently large set of real-world Web sites.

We designed our experiment set-up to meet these requirements, utilizing the following components: Our flow-tracking rendering engine to identify and record potentially unsafe JavaScripts (as discussed in Sec. 4), our exploit generation and validation framework (as presented in Sec. 5), and a crawling infrastructure that automatically causes the browsing engine to visit and examine a large set of URLs.

Our crawling infrastructure consisted of several browser instances and a central backend, which steered the crawling process. Each browser was outfitted with an extension that provided the browser with the required external interface for communication with the backend (see Fig. 2). In the following paragraphs, we briefly document both the backend's and the extension's functionality.

Figure 2: Crawling infrastructure (browsers 1..m, each with tabs 1..n holding a Web page, a user script and a content script; one background script per browser; a central control backend)

6.1.1 Analysis engine: Central server backend
The main duty of the central analysis backend is to distribute the URLs of the examination targets to the browser instances and to process the returned information. The backend maintains a central URL queue, which was initially populated with the Alexa Top 5000 domains and subsequently filled with the URLs that were found by the browsers during the crawling process.

The browser instances transmit their analysis report and their findings to the backend. For each analyzed URL, analysis reports for several URLs are returned, as the browser instances not only check the main page but also all contained iframes. In our study, we received results for an average of 8.64 (sub-)frames for each URL that was given to a browser instance. After pre-processing and initial filtering, the backend passes the suspicious flows to the exploit generation unit (see Sec. 6.3).

6.1.2 Data collection: Browser Extension
As discussed in Section 4, we kept direct changes to the browser's core engine as small as possible, to avoid unwanted side effects and provide maintainability of our modifications. Our patches to the browser's internal implementation consisted mainly in adding the taint-tracking capabilities to the JavaScript engine and DOM implementation. All further browser features that were needed for the crawling and analyzing processes were realized in the form of a browser extension.

Following the general architecture of Chrome's extension model [8], the extension consists of a background and a content script (see Fig. 2). The background script's purpose is to request target URLs from the backend, assign these URLs to the browser's tabs (for each browser instance, the extension opened a predefined number of separate browser tabs to parallelize the crawling process), and report the findings to the backend. The content script conducts all actions that directly apply to individual Web documents, such as collecting the hyperlinks contained in the page for the further crawling process and processing the data flow reports from the taint-tracking engine (see Sec. 4.4). Furthermore, the content script injects a small userscript into each Web document that prevents the examined page from displaying modal dialogues, such as alert() or confirm() message boxes, which could interrupt the unobserved crawling process.

After the background script assigns a URL to a tab, the content script instructs the tab to load the URL and render the corresponding Web page. This implicitly causes all further external (script) resources to be retrieved and all scripts that are contained in the page to be executed. After the page loading process has finished, a timeout is set to allow asynchronous loading processes and script execution to terminate. After the timeout has passed, the content script packs all suspicious data flows, which were reported during execution of the analyzed page, and communicates them to the background script for further processing.

In addition to data flow and hyperlink data, the extension also collects statistical information in respect to size and nature of the JavaScripts that are used by the examined sites.

6.2 Observed Data Flows
As mentioned above, our initial set of URLs consisted of the Alexa top 5000. For each of these URLs we conducted a shallow crawl, i.e., all same-domain links found in the respective homepages were followed, resulting in 504,275 accessed Web pages. On average each of those Web documents consisted of 8.64 frames, resulting in a final number of 4,358,031 (not necessarily unique) URLs.

In total our infrastructure captured 24,474,306 data flows from potentially tainted sources to security sensitive sinks. Please refer to Table 1 for details on the distribution of flows, depicted by their sources and sinks.

                     URL         Cookie       document.referrer   window.name   postMessage   Web Storage   Total
HTML Sinks           1,356,796   1,535,299    240,341             35,466        35,103        16,387        3,219,392
JavaScript Sinks     22,962      359,962      511                 617,743       448,311       279,383       1,728,872
URL Sinks            3,798,228   2,556,709    313,617             83,218        18,919        28,052        6,798,743
Cookie Sink          220,300     10,227,050   25,062              1,328,634     2,554         5,618         11,809,218
Web Storage Sinks    41,739      65,772       1,586               434           194           105,440       215,165
postMessage Sink     451,170     77,202       696                 45,220        11,053        117,575       702,916
Total                5,891,195   14,821,994   581,813             2,110,715     516,134       552,455       24,474,306

Table 1: Data flow overview, mapping sources (top) to sinks (left)

6.3 Selective Exploit Generation
As shown in the previous Section, the total number of potentially vulnerable data flows from insecure sources to security sensitive sinks is surprisingly high. In our study, the sheer number of found flows exceeds the number of analyzed pages by a factor of about 48.5.

Both our exploit generation and validation processes are efficient. Generating and testing an XSS exploit for a selected data flow requires roughly as much time as the initial analyzing process of the corresponding Web page. However, due to the large amount of observed flows, testing all data flows would have required significantly more time than the actual crawling process. Hence, to balance our coverage and broadness goals, we selected a subset out of all recorded, potentially vulnerable data flows, based on the following criteria:
(C1) The data flow ended in a sink that allows, if no further sanitization steps were taken, direct JavaScript execution. Hence, all flows into cookies, Web Storage, or DOM attribute values were excluded.
(C2) The data flow originates from a source that can immediately be controlled by the adversary, without programmatic preconditions or assumptions in respect to the processing code. This criterion effectively excluded all flows that come from second order sources, such as cookies or Web Storage, as well as flows from the postMessage API.
(C3) Only data flows without any built-in escaping methods and data flows with non-matching escaping methods were considered. Data flows for which the observed built-in escaping methods indeed provide appropriate protection for the flow's final sink were excluded.
(C4) For each of the remaining data flows we generated exploits. However, many flows led to the generation of exactly the same exploit payloads for exactly the same URL, e.g. when a web page inserts three scripts via document.write and always includes location.hash at a similar location. In order to decrease the overhead for testing the exploits, our system only validates one of these exploits.

Starting from the initial 24,474,306 flows, we successively applied the outlined criteria to establish the set of relevant flows:

\[ 24,474,306 \xrightarrow{C1} 4,948,264 \xrightarrow{C2} 1,825,598 \xrightarrow{C3} 313,794 \xrightarrow{C4} 181,238 \tag{5} \]

Thus, in total we generated 181,238 test payloads, out of which a total of 69,987 successfully caused the injected JavaScript to execute. We discuss the specifics of these results in the next Section.

6.4 Found vulnerabilities
In total, we generated a dataset of 181,238 test payloads utilizing several combinations of sources and sinks. As discussed in Section 6.3 (C3), all flows which are encoded are filtered early on. For Google Chromium, which we used in our testing infrastructure, adhering to this rule we also must filter all those exploits that use either location.search or document.referrer to carry the payloads. This is due to the fact that both these values are automatically encoded by Chromium. Hence, we chose to test these vulnerabilities in Internet Explorer 10 whereas the rest of the URLs were verified using our aforementioned crawling infrastructure. Since the number of exploits utilizing search vulnerabilities amounts to 38,329 and the sum for referrer reached 5,083, the total number of exploits tested in Chromium was reduced to 137,826, whereas the remaining 43,412 exploits were tested using Internet Explorer.

Out of these, a total number of 58,066 URLs tested in Chromium triggered our verification payload. Additionally, we could exploit 11,921 URLs visited in Internet Explorer. This corresponds to a success rate of 38.61% in total, and a success rate of 42.13% when only considering vulnerabilities exploitable in Chromium.

As we discussed earlier, we crawled down one level from the entry page. We assume that a high number of Web sites utilize content management systems and thus include the same client-side code in each of their sub pages. Hence, to zero in on the number of actual vulnerabilities we decided to reduce the data set by applying a uniqueness criterion. For any finding that triggered an exploit, we therefore retrieved the URL, the used break out sequence, the type of code (inline, eval or external) and the exact location. Next, we normalized the URL to its corresponding second-level domain. To be consistent in regards to our selection of domains, we used the search feature on alexa.com to determine the corresponding second-level domain for each URL. We then determined for each of the results the tuple:

    {domain, break out sequence, code type, code location}

In regards to the code location, we chose to implement the uniqueness to be the exact line and column offset in case of external scripts and evals, and the column offset in inline scripts. Applying the uniqueness filter to the complete dataset including those pages only exploitable on Internet Explorer, we found a total of 8,163 unique exploits on 701 different domains, whereas a domain corresponds to the aforementioned normalized domain. Due to the nature of our approach, among these were also domains not contained in the top 5000 domains. Thus, we applied another filter, removing all exploits from these domains outside the top 5000. This reduced the number of unique exploits to 6,167, stemming from 480 different domains. In respect to the number of domains we originally crawled, this means that our infrastructure found working exploits on 9.6% of the 5000 most frequented Web sites and their sub-domains.

When considering only exploits that work in Chromium, we found 8,065 working exploits on 617 different domains, including those outside the top 5000. Again filtering out domains not contained in the 5000 most visited sites, we found 6,093 working exploits on 432 of the top 5000 domains or their sub-domains.

Among the domains we exploited were several online banking sites, a popular social networking site as well as governmental domains and a large internet-service provider running a bug bounty program. Furthermore, we found vulnerabilities on Web sites of two well-known AntiVirus products.
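The uniqueness criterion of this section can be illustrated with a short sketch. It is an assumption-laden example and not the code used in the study: the record fields (domain, breakOut, codeType, line, column) and the function names uniquenessKey and uniqueExploits are hypothetical, the sample findings are invented for illustration, and the domain is assumed to be normalized to its second-level domain already.

```javascript
// Build the tuple {domain, break out sequence, code type, code location}
// as a string key. Field names are illustrative only.
function uniquenessKey(finding) {
  // Inline scripts are identified by their column offset alone; external
  // scripts and evals by the exact line and column offset.
  const location =
    finding.codeType === "inline"
      ? "c" + finding.column
      : "l" + finding.line + ":c" + finding.column;
  return JSON.stringify([
    finding.domain,
    finding.breakOut,
    finding.codeType,
    location,
  ]);
}

// Keep the first finding for every distinct tuple.
function uniqueExploits(findings) {
  const seen = new Map();
  for (const finding of findings) {
    const key = uniquenessKey(finding);
    if (!seen.has(key)) seen.set(key, finding);
  }
  return [...seen.values()];
}

// Invented sample data: the same external script found on two sub pages,
// the same inline snippet at different lines, and one eval on another domain.
const findings = [
  { domain: "example.org", breakOut: '";};', codeType: "external", line: 10, column: 4 },
  { domain: "example.org", breakOut: '";};', codeType: "external", line: 10, column: 4 },
  { domain: "example.org", breakOut: '">', codeType: "inline", line: 3, column: 17 },
  { domain: "example.org", breakOut: '">', codeType: "inline", line: 8, column: 17 },
  { domain: "example.net", breakOut: '";};', codeType: "eval", line: 1, column: 4 },
];

const unique = uniqueExploits(findings);
const domains = new Set(unique.map((f) => f.domain));
console.log(unique.length, domains.size); // 3 2
```

Ignoring the line number for inline scripts reflects that the same snippet can sit at different lines of otherwise similar pages, so keying on the line would count one vulnerability several times.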
6.5 Selected Case Studies
During the analysis of our findings, we encountered several vulnerabilities which exposed interesting characteristics. In the following subsections, we provide additional insight into these cases.

6.5.1 JSONP + HTTP Parameter Pollution
As stated in Section 2, flows into URL sinks are not easily exploitable. Only if the attacker controls the complete string can he make use of data and javascript URLs to execute JavaScript code. However, in our dataset we found a particularly interesting coding pattern that allows script execution despite the fact that the attacker only controls parts of the injected URLs. In order to abuse this pattern a Web page must assign a partly tainted string to a script.src attribute that includes a JSONP script with a callback parameter (see Listing 5).

Listing 5 JSONP script include

    var script = document.createElement('script')
    script.src = "http://example.org/data.json?u="
        + taintedValue + "&callback=cb_name";

In many cases the callback parameter is reflected back into the script in an unencoded/unfiltered fashion. Hence, the attacker could inject his own code into the script via this parameter. However, the callback parameter is hard coded and the attacker is not able to tamper with it at first sight. Nevertheless, it is possible to inject a second callback parameter into the script URL via the taintedValue. As a result, two parameters with the same name and different values are sent to the server when requesting the script. Depending on the server-side logic the server will either choose the first or the second parameter (we found both situations, and depending on the position of the taintedValue we were able to exploit both situations). Hence, by conducting this so-called HTTP Parameter Pollution attack, the attacker is able to inject his value into the content of the script, which is afterwards embedded into the Web page.

One particularly interesting fact is that simply encoding the taintedValue will not protect against exploitation. Instead, the JSONP callback parameter needs to be sanitized. During our experiments we found one vulnerable callback parameter quite often on many different Web sites, which seemed to stem from jQuery (or at least, always called the same jQuery function).

6.5.2 Broken URL parsing
As browsers sometimes auto-encode certain parts of user controlled values, it is not possible to inject code into some of the analyzed sources. One example for this is location.search, which is auto-encoded by all browsers except Internet Explorer. Another source that is encoded by every modern browser is location.pathname. An injection via location.pathname is in general not possible until the application itself decodes the value. An additional encoding or sanitization step is therefore not necessary for these values. This fact, however, also leads to security vulnerabilities when Web developers trust in this auto-encoding feature while at the same time conducting incorrect URL parsing. In our analysis, we found many examples where this fact leads to vulnerabilities. In the following we cover some examples where code fragments seemed to extract automatically encoded values (and hence no sanitization is needed), but due to non-standard parsing, also extracted unencoded parts in malicious cases.

1. Task: Extract host from URL
2. What it really does: Extract everything between www. and .com (e.g. whole URL)
3. e.g. http://www.example.com/#notEncoded.com

    var regex = new RegExp("/www\..*\.com/g");
    var result = regex.exec(location.href);

1. Task: Extract GET parameter foo
2. What it really does: Extracts something that starts with foo=
3. e.g. http://www.example.com/#?foo=notEncoded

    var regex = new RegExp("[\\?&]foo=([^&#]*)");
    var result = regex.exec(location.href);

1. Task: Extract all GET parameters
2. What it really does: Last GET parameter contains the unencoded Hash
3. e.g. http://example.com/?foo=bar#notEncoded

    location.href.split('?')[1].split('&')[x]
        .split('=')

6.5.3 Persistent DOM-based XSS
As seen in Table 1, our system also captured some flows into cookies and into the Web Storage API. However, we did not include them in our automatic exploit generation. Nevertheless, we were able to manually find several persistent DOM-based XSS vulnerabilities. We detected flows that first came from user input and went into Cookie or Web Storage sinks, effectively persisting the data within the user's browser. In the cases where we could trigger a successful exploit, this data was then used in a call to eval, hence exposing the Web site to persistent DOM-based XSS.

6.5.4 window.name flows
Within our dataset, we detected a surprisingly high number (>2 million) of flows originating from window.name that we couldn't explain at first sight. Although some of them were exploitable, we soon discovered the reason for this number. Most of these flows are not exploitable via DOM XSS as they are caused by a simple programming error. When declaring a local variable a developer has to use the var keyword. If someone declares a variable named name inside a function and misses the var keyword or if a local variable is created directly within a script block that is executed in the global scope, the variable is declared global (see Listings 6 and 7). Since inside a browser the global object is window, the data is written to window.name. If the same variable is used within a call to a sink within the same script block, the corresponding flow is not exploitable as window.name was overwritten with benign data. However, this fact represents another serious issue: window.name is one of the very few
properties that can be accessed across domain boundaries. Hence, any data that is written to this property can be accessed by third parties. This programming error, therefore, represents a serious information leakage problem if sensitive data is written to such a global name variable. Given the huge amount of flows, it is very likely that this pattern could be misused to steal sensitive information.

Listing 6 window.name bug 1: Missing var keyword

    function test(){
        name = doSomething();
        document.write(name);
    };

Listing 7 window.name bug 2: Declaration within the global scope

    <script>
        var name = doSomething();
        document.write(name);
    </script>

6.6 Effectiveness of Chromium's XSS Filter
Modern browsers like Chromium and its commercial counterpart Google Chrome are equipped with client-side filter capabilities aiming at preventing XSS attacks [1]. In order to analyze the effectiveness of Chromium's XSS Filter, we utilized our successful exploits and tried to execute them with the activated filter. Out of the 701 domains we found, 300 domains were still susceptible to XSS even with Chromium's auditor enabled.

After further examination, we found three distinguishing characteristics for these working exploits. For one, none of the exploits abusing JavaScript sinks, such as eval(), were detected by XSS Auditor. This stems from the fact that the auditor is implemented inside the HTML parser and thus cannot detect direct JavaScript sinks. Furthermore, exploits that were caused by remote script includes were not detected. The third type of undetected exploits was caused by JSONP vulnerabilities as discussed in Section 6.5.1.

On a positive note, in our study, we found that none of the exploits that targeted inline vulnerabilities passed through the filter. However, please note that this experiment carries no reliable indication of protection robustness in respect to the exploits that were stopped. We did not make any attempts to obfuscate the exploit payload [12] or use other filter evasion tricks [13].

In 2011 Nikiforakis demonstrated that Chrome's filter is not able to cope with exploits that utilize more than one injection point at once [21]. If we take our figures from Section 6.2 into account, we see that a tainted string consists, on average, of about three tainted substrings. Thus, an attacker has on average three possible injection points in order to leverage the techniques presented by Nikiforakis. Therefore, we have good reasons to believe that the numbers presented in this Section must rather be seen as a lower bound.

7. RELATED WORK
To the best of our knowledge, DOMinator [7] was the first browser-based tool to test for DOM-based XSS via dynamic taint-tracking. For this purpose, DOMinator instruments Firefox's SpiderMonkey JavaScript engine. Unlike our technique, DOMinator does not track data flows on a byte level. Instead, it employs a function tracking history to store the operations which were called on the original, tainted input to result in the final, still tainted, string flowing into a sink. Also, it does not feature a fully automated vulnerability validation.

FLAX [25] is the conceptually closest approach to our work. Similar to our system, FLAX also utilizes byte-level taint-tracking to identify insecure data flows in JavaScript. However, there are several key differences in which we improve over FLAX: For one, FLAX's taint analysis is not fully integrated in the JavaScript engine. Instead, the actual analysis is done on program slices which are translated into JASIL, a simplified version of JavaScript, which expresses the operational semantics of only a subset of JavaScript. In contrast, through extending JavaScript's low-level string type, we achieve full language and API coverage. Furthermore, FLAX employs fuzzing for vulnerability testing, while our approach leverages the precise source and sink context information to create validation payloads that deterministically match the respective data flow's specifics. Finally, using a large scale study we successfully demonstrated that our system is compatible with the current code practices in today's Web. In contrast, FLAX was only practically evaluated on a set of 40 Web applications and widgets.

Criscione [5] presented an automatic tool to find XSS problems in a scalable black box fashion. Similar to our approach, they also use actual browser instances for test execution and vulnerability validation. However, they don't utilize taint propagation or precise payload generation. Instead, the tests are done in a fuzzing fashion.

Finally, a related approach was presented by Vogt et al. [30], which utilizes a combination of static analysis and dynamic information flow tracking to mitigate XSS exploits. However, instead of following the flow of untrusted data, the focus is on security sensitive values, such as the user's cookie, and the potential exfiltration of those.

Server-side approaches and static taint analysis: On the server-side various approaches using dynamic taint-tracking to detect and mitigate XSS vulnerabilities have been proposed [20, 23, 4, 27, 19, 33, 2]. Furthermore, as an alternative to dynamic taint-tracking, static analysis of source code to identify insecure data flows is a well established tool [9, 28, 31, 14, 32, 10].

Attack generation: In order to decrease false positive rates several approaches have been studied to automatically generate valid exploit payloads for validation purposes. In 2008, Martin et al. [18] presented a method to generate XSS and SQL injection exploits based on goal-directed model checking. Their system QED is capable of performing a goal-directed analysis of any Java Web application which adheres to the standard servlet specification. Based on the constructed model, a model checker is able to generate a valid exploit that can be used to validate the finding. As opposed to our approach the system operates on the server-side code and thus focuses on server-side injection vulnerabilities. Similar to this approach, Kieyzun et al. [15] also focus on the automatic generation of attacks
targeting server-side injection vulnerabilities. In order to do so, the authors use symbolic taint-tracking and input mutations to generate example exploits. Several test inputs are transmitted to the target service and depending on the registered data flows, inputs are mutated to generate malicious payloads. As opposed to our approach, their tool Ardilla also only works on server-side code and thus rather targets traditional XSS vulnerabilities. As it requires several HTTP requests for generating a valid exploit, scaling is far more difficult than with our approach. In [6], d'Amore et al. present the tool snuck that is capable of automatically evading server-side XSS filters. To function, however, the tool needs input from a human tester that identifies the application's intended workflows and possible injection vectors. The tool then automatically verifies whether the filter functions work in a correct manner. In order to do so the system identifies the exact injection context by using XPath queries.

Empirical studies on JavaScript security: Due to its ever growing importance in the Web application paradigm, several security-relevant aspects of client-side JavaScript execution have been studied empirically. For one, Yue and Wang [34] examined the commonness of JavaScript practices that could lead to unauthorized code execution, namely cross-domain inclusion of external JavaScript files and usage of APIs that could lead to XSS. Their study is purely statistical and no real vulnerability validation was conducted. Zooming in on eval, Richards et al. [24] study how this problematic API is used in the wild, identifying both usage patterns that could be solved with safe alternatives as well as instances in which replacing eval would not be a straightforward task. Furthermore, selected "HTML5" JavaScript APIs have been studied in detail: Lekies & Johns [17] surveyed the Alexa top 500,000 for potentially insecure usage of JavaScript's localStorage for code caching purposes and Son & Shmatikov [26] examined the Alexa top 10,000 for vulnerabilities occurring from unsafe utilization of the postMessage API.

8. CONCLUSION
In this paper, we presented a fully automated approach to detect and validate DOM-based XSS vulnerabilities. By direct integration into the browser's JavaScript engine, we achieve reliable identification of potentially insecure data flows while maintaining full compatibility with productive JavaScript code. Furthermore, the precise, byte-level context information of the resulting injection points enables us to create attack payloads which are tailored to the vulnerability's specific conditions, thus allowing for robust exploit generation.

Using our system, we conducted a large scale empirical study, resulting in the identification of 6,167 unique vulnerabilities distributed over 480 domains, demonstrating that 9.6% of the Alexa top 5000 carry at least one DOM-based XSS problem.

Acknowledgments
This work was in parts supported by the EU Projects WebSand (FP7-256964) and STREWS (FP7-318097). The support is gratefully acknowledged.

9. REFERENCES
[1] Bates, D., Barth, A., and Jackson, C. Regular expressions considered harmful in client-side XSS filters. In WWW '10: Proceedings of the 19th international conference on World wide web (New York, NY, USA, 2010), ACM, pp. 91–100.
[2] Bisht, P., and Venkatakrishnan, V. N. XSS-GUARD: Precise dynamic detection of cross-site scripting attacks. In Detection of Intrusions and Malware & Vulnerability Assessment (DIMVA'08) (2008).
[3] CERT. Advisory ca-2000-02 malicious html tags embedded in client web requests, February 2000.
[4] Conti, J. J., and Russo, A. A taint mode for python via a library. In NordSec (2010), T. Aura, K. Järvinen, and K. Nyberg, Eds., vol. 7127 of Lecture Notes in Computer Science, Springer, pp. 210–222.
[5] Criscione, C. Drinking the Ocean - Finding XSS at Google Scale. Talk at the Google Test Automation Conference, (GTAC'13), http://goo.gl/8qqHA, April 2013.
[6] d'Amore, F., and Gentile, M. Automatic and context-aware cross-site scripting filter evasion. Department of Computer, Control, and Management Engineering Antonio Ruberti Technical Reports 1, 4 (2012).
[7] Di Paola, S. DominatorPro: Securing Next Generation of Web Applications. [software], https://dominator.mindedsecurity.com/, 2012.
[8] Google Developers. Chrome Extensions - Developer's Guide. [online], http://developer.chrome.com/extensions/devguide.html, last access 06/05/13, 2012.
[9] Guarnieri, S., Pistoia, M., Tripp, O., Dolby, J., Teilhet, S., and Berg, R. Saving the world wide web from vulnerable javascript. In ISSTA (2011), M. B. Dwyer and F. Tip, Eds., ACM, pp. 177–187.
[10] Guha, A., Krishnamurthi, S., and Jim, T. Using static analysis for Ajax intrusion detection. In Proceedings of the 18th international conference on World wide web (WWW'09) (New York, NY, USA, 2009), ACM, pp. 561–570.
[11] Hanna, S., Chul, E., Shin, R., Akhawe, D., Boehm, A., Saxena, P., and Song, D. The emperor's new apis: On the (in) secure usage of new client-side primitives. In Web 2.0 Security and Privacy (W2SP 2010) (2010).
[12] Heiderich, M., Nava, E., Heyes, G., and Lindsay, D. Web Application Obfuscation: -/WAFs..Evasion..Filters//alert (/Obfuscation/)-. Elsevier/Syngress, 2010.
[13] Heyes, G. Bypassing XSS Auditor. [online], http://www.thespanner.co.uk/2013/02/19/bypassing-xss-auditor/, last accessed 08/05/13, February 2013.
[14] Jovanovic, N., Kruegel, C., and Kirda, E. Pixy: A Static Analysis Tool for Detecting Web Application Vulnerabilities. In IEEE Symposium on Security and Privacy (May 2006).
[15] Kieyzun, A., Guo, P. J., Jayaraman, K., and Ernst, M. D. Automatic creation of sql injection and cross-site scripting attacks. In Proceedings of the 31st
International Conference on Software Engineering [25] Saxena, P., Hanna, S., Poosankam, P., and
(Washington, DC, USA, 2009), ICSE ’09, IEEE Song, D. FLAX: Systematic Discovery of Client-side
Computer Society, pp. 199–209. Validation Vulnerabilities in Rich Web Applications.
[16] Klein, A. Dom based cross site scripting or xss of the In NDSS (2010), The Internet Society.
third kind. Web Application Security Consortium, [26] Son, S., and Shmatikov, V. The Postman Always
Articles 4 (2005). Rings Twice: Attacking and Defending postMessage
[17] Lekies, S., and Johns, M. Lightweight Integrity in HTML5 Websites. In Network and Distributed
Protection for Web Storage-driven Content Caching. System Security Symposium (NDSS’13) (2013).
In 6th Workshop on Web 2.0 Security and Privacy [27] Su, Z., and Wassermann, G. The Essence of
(W2SP 2012) (May 2012). Command Injection Attacks in Web Applications. In
[18] Martin, M., and Lam, M. S. Automatic Generation Proceedings of POPL’06 (January 2006).
of XSS and SQL Injection Attacks with Goal-Directed [28] Tripp, O., Pistoia, M., Fink, S. J., Sridharan,
Model Checking. In Usenix Security (2008). M., and Weisman, O. TAJ: Effective Taint Analysis
[19] Nadji, Y., Saxena, P., and Song, D. Document for Java. In ACM SIGPLAN 2009 Conference on
Structure Integrity: A Robust Basis for Cross-site Programming Language Design and Implementation
Scripting Defense. In Network & Distributed System (PLDI 2009) (June 2009).
Security Symposium (NDSS 2009) (2009). [29] Vikram, K., Prateek, A., and Livshits, B. Ripley:
[20] Nguyen-Tuong, A., Guarnieri, S., Greene, D., Automatically securing distributed Web applications
Shirley, J., and Evans, D. Automatically hardening through replicated execution. In Conference on
web applications using precise tainting. In 20th IFIP Computer and Communications Security (Oct. 2009).
International Information Security Conference (May [30] Vogt, P., Nentwich, F., Jovanovic, N., Kruegel,
2005). C., Kirda, E., and Vigna, G. Cross Site Scripting
[21] Nikiforakis, N. Bypassing Chrome’s Anti-XSS filter. Prevention with Dynamic Data Tainting and Static
[online], http://blog.securitee.org/?p=37, last Analysis. In 14th Annual Network and Distributed
access 08/05/13, September 2011. System Security Symposium (NDSS 2007) (2007).
[22] Nikiforakis, N., Invernizzi, L., Kapravelos, A., [31] Wassermann, G., and Su, Z. Sound and Precise
Acker, S. V., Joosen, W., Kruegel, C., Piessens, Analysis of Web Applications for Injection
F., and Vigna, G. You Are What You Include: Vulnerabilities. In Proceedings of Programming
Large-scale Evaluation of Remote JavaScript Language Design and Implementation (PLDI’07) (San
Inclusions. In 19th ACM Conference on Computer and Diego, CA, June 10-13 2007).
Communications Security (CCS 2012) (2012). [32] Xie, Y., and Aiken, A. Static Detection of Security
[23] Pietraszek, T., and Berghe, C. V. Defending Vulnerabilities in Scripting Languages. In 15th
against Injection Attacks through Context-Sensitive USENIX Security Symposium (2006).
String Evaluation. In Recent Advances in Intrusion [33] Xu, W., Bhatkar, S., and Sekar, R.
Detection (RAID2005) (2005). Taint-Enhanced Policy Enforcement: A Practical
[24] Richards, G., Hammer, C., Burg, B., and Vitek, Approach to Defeat a Wide Range of Attacks. In 15th
J. The eval that men do - a large-scale study of the USENIX Security Symposium (August 2006).
use of eval in javascript applications. In ECOOP [34] Yue, C., and Wang, H. Characterizing insecure
(2011), M. Mezini, Ed., vol. 6813 of Lecture Notes in javascript practices on the web. In WWW (2009),
Computer Science, Springer, pp. 52–78. J. Quemada, G. León, Y. S. Maarek, and W. Nejdl,
Eds., ACM, pp. 961–970.
