J-Force: Forced Execution On Javascript
ABSTRACT
Web-based malware equipped with stealthy cloaking and obfuscation techniques is becoming more sophisticated. In this paper, we propose J-Force, a crash-free forced JavaScript execution engine that systematically explores possible execution paths and reveals malicious behaviors in such malware. In particular, J-Force records branch outcomes and mutates them for further exploration. J-Force inspects function parameter values that may reveal malicious intentions and expose suspicious DOM injections. We address a number of technical challenges. For instance, we keep track of missing objects and DOM elements, and create them on demand. To verify the efficacy of our techniques, we apply J-Force to detect Exploit Kit (EK) attacks and malicious Chrome extensions. We observe that J-Force is more effective than existing tools.

Keywords
JavaScript; Security; Malware; Evasion

1. INTRODUCTION
Web-based applications powered by JavaScript are becoming more widespread, interactive and powerful. Meanwhile, they are attractive targets of various attacks. Unfortunately, detecting and analyzing malicious web apps against diverse combinations of exploits and evasive techniques is complicated and challenging. Although various detection schemes have been proposed [14, 27, 13], they still suffer from sophisticated attacks such as cloaking attacks [21, 35, 22].

Both static and dynamic approaches have been applied to detect JavaScript malware. Static analysis (e.g., [9, 8]) considers multiple execution paths and usually achieves better code coverage. However, JavaScript is highly dynamic. Static approaches may be imprecise or even ineffective due to over-approximation and obfuscation. This is a critical limitation since obfuscation is the most common practice for hiding the real intentions of code, whether for protection or for malicious reasons. By contrast, dynamic analysis techniques (e.g., [16, 32]) execute the program and thus can reveal concrete behaviors even in an obfuscated program. However, a downside is that they can only cover one concrete execution path in one run and may be unable to hit the spot that conceals malicious behaviors.

To address these limitations, symbolic and concolic execution based techniques [32, 31, 33] have also been proposed to analyze JavaScript programs. While they can generate program inputs and drive the execution along various feasible paths, due to the limitations of constraint solvers, overcoming state explosion and handling complex JavaScript operations (e.g., dynamic type conversions, arithmetic/string operations) are still open problems, especially for non-trivial programs built atop various frameworks and other obfuscated programs.

In this paper, we propose J-Force, a crash-free¹ JavaScript forced execution engine. J-Force combines the advantages of static and dynamic approaches: similar to dynamic analysis, J-Force executes the program so that obfuscation is no longer an obstacle. To increase coverage, J-Force forces the execution to go along different paths. In particular, J-Force records the outcomes of branch predicates, mutates them, and explores unvisited paths via multiple executions. This iterative path exploration process continues until all possible paths are explored. Hence, J-Force can expose not only malicious code that is triggered only by conditions that are hard to meet, but also code blocks that are dynamically created and injected. Additionally, J-Force uncovers paths hidden in event and exception handlers. J-Force can detect evasive attacks triggered by non-deterministic events.

We evaluate J-Force on 50 real-world exploits in popular EKs [1, 2] and over 12,000 Chrome extensions. J-Force successfully exposed the hidden code of 41 exploits and found that more than 300 Chrome extensions inject advertisements. We also run J-Force on 100 JavaScript samples and measure its code coverage capacity. The results show that J-Force can cover 95% of the code with 2-8x overhead, which is significantly more effective than a popular concolic execution technique (68% coverage, 10-10,000x overhead).

In summary, this paper makes the following contributions.
• We propose J-Force, a JavaScript forced execution engine that explores all possible paths to expose hidden malware behaviors. J-Force records and switches branch outcomes to explore new paths. J-Force unveils function parameter values to detect malicious intentions and DOM injection attacks.
• We address several technical challenges to avoid crashes during the continuous path explorations. For instance, J-Force keeps track of missing objects/DOM nodes and creates them on demand. J-Force can tolerate critical exceptions and handle infinite loops/recursions.
• We validate the efficacy of J-Force through an extensive set of experiments on real-world exploits and web browser ex-

¹ In our paper, crash-free is about avoiding or handling JavaScript exceptions.

© 2017 International World Wide Web Conference Committee (IW3C2), published under Creative Commons CC BY 4.0 License. WWW 2017, April 3–7, 2017, Perth, Australia. ACM 978-1-4503-4913-0/17/04. http://dx.doi.org/10.1145/3038912.3052674
[Figure 1: EK attack chain. The page http://bbb.com/shop2.html loads the obfuscated script http://ppp.org/abc.js, whose eval'd code registers the timer handler EDXGD (setTimeout(EDXGD, 10)); the handler injects the obfuscated script http://ggg.net/opq.js, which injects a further script (d.appendChild(k)) that tests navigator.userAgent for msie before delivering the exploit payload.]
[Figure 2: J-Force components: J-Force Driver (Exec #1, Exec #2, ...), Object Management, Exception Management, DOM Management.]
Recently, Exploit Kits (EKs) have been favored by cybercriminals to perform web-based attacks. In the last year alone, more than 14 attacks were reported to CVE². Since EKs are specially designed to exploit known browser-related defects, such attacks are highly effective: once a vulnerable client reaches the actual EK landing page, the EK silently downloads and installs malware. Therefore, as a defense, it is critical to identify suspicious EK delivery in the first place. Among various delivery vectors, malvertising [10, 37] is one of the most dangerous and successful delivery approaches. In this section, we show a real-world EK delivery equipped with layered obfuscation and cloaking techniques to demonstrate our approach.

Fig. 1 presents a carefully designed multi-layer EK attack chain featuring collaborative cloaking techniques such as code obfuscation, dynamically created scripts and evasive paths: (1) The first obfuscated JS (JavaScript) snippet (http://ppp.org/abc.js) is delivered to a legitimate website via malvertising. (2) When it is evaluated during page loading, it creates a piece of dynamic code from strings using eval. (3) The function EDXGD in the resulting snippet injects code for the next step. Interestingly, EDXGD is injected as an event handler and can only be invoked when the timeout event is fired. Once evaluated, the second obfuscated snippet (http://ggg.net/opq.js) is injected into the DOM tree and executed. (4) As a result, another dynamic script is created and injected (d.appendChild(k)). (5) The injected code uses a cloaking method to hide the malicious payload: it first checks whether the client browser can be the target (navigator.userAgent and msie). The hidden code is executed only if the check result (browser) is true.

Existing Approaches. As two pieces of JavaScript (abc.js and opq.js) in the chain are obfuscated, static analysis based detection mechanisms [14, 9, 28, 11] may have difficulties in understanding the real semantics and thus are ineffective in handling such cases. Discovering the execution path that reveals the final exploit payload using dynamic approaches is also difficult. In particular, it requires invocations of event handlers and proper environment settings (e.g., IE browser), which are conditions not easily met in general. Symbolic and concolic execution techniques [32, 31, 33] can be used to explore multiple feasible paths. However, it is challenging for such techniques to scale to complicated and large real-world JavaScript programs due to the limitations imposed by the underlying constraint solvers.

Unfortunately, as shown in Table 1, existing JavaScript malware detection tools are not effective at detecting such malware in a scalable way. In particular, while Rozzle [22] performs path explorations on JavaScript programs to reveal evasive malicious behaviors, it cannot disclose code in event handlers as its analysis scope is limited to functions that are explicitly invoked.

J-Force Overview. J-Force employs a forced execution technique by switching branch outcomes and invoking event handlers. As shown in Fig. 2, J-Force explores feasible paths and reveals all the instructions irrespective of branch conditions in multiple concrete executions. Also, event and exception handlers are forcibly invoked without emulating the events. By doing so, J-Force is able to reach and expose malicious logic that can only be triggered by a particular combination of events and inputs. Moreover, J-Force is a dynamic analysis. Hence, it can handle obfuscations and disclose concrete function parameter values, which could further reveal malware behaviors (e.g., identifying eval content).

² CVE-2015-3090, CVE-2015-3105, CVE-2015-5122, CVE-2015-1671, CVE-2015-5119, CVE-2015-5560, CVE-2015-7645, CVE-2015-8651, CVE-2015-8446, CVE-2016-1019, CVE-2016-1001, CVE-2016-0189, CVE-2016-0034, CVE-2016-4117

3. DESIGN OF J-FORCE
In this section, we present the details of J-Force. We first discuss the J-Force execution model. Then we describe how J-Force explores multiple execution paths.

3.1 J-Force Execution Model
The execution model of J-Force is designed based on the default page rendering model.

11.   document.body.appendChild(btn);
12. </script>
13. ...
14. <script>
15.   x = document.getElementById("mybutton");
16.   if (...) {...}
17.   if (...) {...}
18. </script>
Figure 3: Example for per-block path exploration.

3.1.1 Per-block Exploration
The default page rendering order drives the execution of J-Force. Once a <script> block is evaluated, J-Force starts exploring
all other possible paths within the block. In particular, when J-Force reaches the exit of the block, it goes back and explores another unvisited path. Consider the example in Fig. 3. J-Force explores the two paths in lines 1-12 before exploring the paths in the next <script> block in 14-18.

Table 1: The comparison of the approaches for JavaScript malware detection.
Name           Category                   Obfuscation  Path Exploration  State Explosion  Events   Exceptions  Target Scope
                                          Resilient    Support           Free             Covered  Covered
WebEval [18]   Static & Dynamic Analysis  ✓            ✗                 ✓                ✗        ✗           Chrome Extension
Expector [37]  Dynamic Analysis           ✓            ✗                 ✓                ✓        ✗           Chrome Extension
Hulk [20]      Static & Dynamic Analysis  ✓            ✗                 ✓                ✓        ✗           Chrome Extension
Revolver [21]  Static & Dynamic Analysis  ✓            ✗                 ✓                ✗        ✗           Generic
JSAND [13]     Dynamic Analysis           ✓            ✗                 ✓                ✗        ✗           Generic
Nozzle [27]    Dynamic Analysis           ✓            ✗                 ✓                ✗        ✗           Generic
Zozzle [14]    Static Analysis            ✗            ✗                 N/A              ✗        ✗           Generic
Rozzle [22]    Dynamic (Symbolic Value)   ✓            ✓                 ✗                ✗        ✗           Generic
J-Force        Forced Execution           ✓            ✓                 ✓                ✓        ✓           Generic

An alternative is to consider all code blocks as one giant block and explore paths in the “merged” block. However, it can hardly scale because the total number of paths to be explored is the product of the path numbers in every individual block, whereas in the per-block strategy it is the sum of the number of paths in every block. Please note that an external JS script is essentially a single code block and hence can be explored in a similar way.

3.1.2 Handling Inter-Block Dependencies
One challenge brought by the per-block design is how to consider the dependencies across code blocks. For example, in Fig. 3, the same button is set with different texts (Remove and Skip) along different paths in lines 2-11. Without storing states along different execution paths, our analysis may miss critical states that may lead to malicious behavior. For instance, if we explore the path 7-9 after 2-5, “Remove” will be overwritten by “Skip” and become invisible to the blocks afterwards.

While exploring paths globally is the ideal solution, it is unscalable and impractical. Instead, we develop the following technique based on the observation that most inter-block dependencies are caused by DOM objects. Since it is valid to have multiple elements with the same name or id on the DOM tree, J-Force allows any DOM injections along any paths. Also, J-Force intercepts relevant DOM APIs (e.g., getElementById) and injects choice points, which are conceptually equivalent to switch-case statements. So, each execution returns a DOM element (with the same id or name) until all such elements are explored. For example, in Fig. 3, both buttons will be appended to the DOM tree. J-Force further inserts a choice point at line 15. As a result, a total of 8 paths are explored in the second block, where 4 correspond to the “Remove” button and the remaining 4 to the “Skip” button.

In theory, dependencies caused by global variables are handled in the same way. However, it is very expensive to do so for all global variables. Given that our focus is on stealthy behaviors that are usually based on string operations, we selectively support global string variables. Furthermore, J-Force also overrides container interfaces (e.g., hashmap) to support inserting multiple strings with the same key into a global container. String attributes of DOM objects are handled similarly, where choice points are injected to access the different versions.

3.1.3 Handling Event Handlers
Some event handlers, such as onload, are automatically executed when the corresponding DOM objects are loaded or created. The exploration is driven by the rendering procedure. However, another set of handlers can only be triggered by user and timer events. In our experience, JS malware extensively leverages the event handling mechanism to lay out the attack agenda. Fig. 4 shows a simplified step in the malware delivery chain. __necdel() is registered as an event handler of the mouseover event. The script for the next step will not be injected unless the event is triggered. Indeed, we observed that many malicious payloads only get triggered by a series of carefully organized user or timer events to escape detection by honey-client systems or other automatic detection tools. Therefore, exploring event handlers is critical.

function __necdel()
{
  var script = document.createElement("script");
  //...
  script.src = "http://xxx.xxxxxxx.net/";
  var protocol = ("https:" == document.location.protocol) ? "https://" : "http://";

  var head = document.getElementsByTagName("head")[0];
  if ((protocol === "http://") && head)
    head.appendChild(script);
}
window.addEventListener("mouseover", __necdel, false);
Figure 4: Code injection upon “mouseover” event.

J-Force remembers functions registered as event handlers and forces them to be executed. In particular, after the exploration of the current code block, handlers that are registered during exploration are executed, without requiring the triggering events. The individual handlers are considered as code blocks that are explored separately. To the best of our knowledge, most existing honey-client systems and JS symbolic execution engines (e.g., [31]) do not emulate events. Hence, they cannot reveal sophisticated handler-related behaviors.

3.1.4 Handling Asynchronous Execution
Currently, J-Force does not focus on exposing race conditions caused by asynchronous execution [29, 38]. In fact, most JS races are transient [24]. In our experience, we have not observed any real-world malicious attacks leveraging race conditions, due to their non-deterministic and unreliable nature.

J-Force respects the browser’s decision on which block runs first. Note that JavaScript execution is single threaded and the execution of a code block cannot be interrupted. J-Force only steps in when a block is being evaluated for the purpose of per-block code exploration.

3.1.5 Handling Dynamic Code Evaluation
JavaScript is highly dynamic. Malicious JS snippets can be dynamically created from strings. For example, a common practice is to create a <script> element, specify its source and attach it to the DOM tree. eval() is another way to run dynamic code.

J-Force admits all code injections found along different paths during the path exploration. Consequently, they will be explored like other code on the DOM tree. Some code snippets may be added
to DOM elements that have already been rendered and explored by J-Force. For such cases, J-Force restarts the rendering procedure but only explores the uncovered injected snippets.

For code dynamically evaluated by functions like eval, J-Force explores the code snippet concealed in the function parameter, as a part of the parent code block exploration. Note that J-Force provides versioning support for strings so that different but concrete parameter values produced by previous logic will be explored.

3.2 Path Exploration
J-Force explores different paths in multiple runs. In each run, it looks for opportunities where mutating a predicate leads to unexplored instructions. Once found, it forces the execution to cover them in future iterations. It repeats this procedure until all instructions are covered. We designed two exploration strategies depending on the needs.
• L-path executes each instruction at least once with linear time complexity. Exploring all distinct paths is not its priority. For JS malware analysis, this strategy is sufficient in most cases as malicious behaviors are usually hidden in blocks.
• E-path aims at exploring all possible execution paths with exponential time complexity. We observed that only a few advanced malware examples require the E-path strategy.

1. obj = new XMLHttpRequest(); // D1
2. //...
3. if (cond)
4.   obj = null; // D2
5. if (obj == null)
6.   return;
7. obj.send();

Line #   Defines
1        D1
2        D1
3        D1
4        D2
5        D1 | D2
6        D1 | D2
7        D1 | D2

Line #   Execution #1            Execution #2            Value (obj)
1        obj := XMLHttpRequest   obj := XMLHttpRequest   XMLHttpRequest
2        ----                    ----                    XMLHttpRequest
3        (taken)                 (taken)                 XMLHttpRequest
4        obj := null             obj := null             null
5        (taken)                 (untaken)               null
6        return                  ----                    null
7        ----                    obj.send (crash!)       null
Figure 5: Handling crashes caused by missing objects.

...plored instructions. At line 16, J-Force starts the execution with no forced execution scheme and just runs the whole program normally. The purpose of this step is to obtain a list of predicates on one path. Then, J-Force can develop a new scheme by mutating a predicate at line 22 to execute uncovered instructions (line 21). The driver repeats this until the worklist is empty, meaning that no further opportunities can be discovered. Although the exploration algorithm stems from the L-path strategy, E-path follows the same procedure except at line 21. In particular, at the given branch, instead of checking whether its feasible targets have been disclosed, E-path makes sure the branch is followed to both of its targets.
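The worklist-driven exploration described in Sec. 3.2 can be illustrated with a small sketch. This is not J-Force's algorithm as implemented in WebKit; it is a minimal model under stated assumptions: a program is any function that routes each branch decision through a hook branch(id, actual), a forcing scheme is a list of [predicate id, forced outcome] pairs, and the names explore, branch and sample are hypothetical. The first run uses an empty scheme (a normal run); each later run replays a recorded prefix and flips one predicate whose other outcome has not been covered, which mirrors the L-path goal of reaching every branch target at least once.

```javascript
// Hypothetical L-path style driver; names and data layout are assumptions for illustration.
function explore(program) {
  const covered = new Set();    // "id:outcome" pairs that have been executed
  const scheduled = new Set();  // pairs already queued for a forced run
  const worklist = [[]];        // schemes to run; [] means a normal, unforced run
  let runs = 0;

  while (worklist.length > 0) {
    const scheme = new Map(worklist.shift());
    const trace = [];           // predicates met in this run, in order
    const branch = (id, actual) => {
      // Use the forced outcome if the scheme has one, else the real one.
      const outcome = scheme.has(id) ? scheme.get(id) : Boolean(actual);
      trace.push([id, outcome]);
      covered.add(`${id}:${outcome}`);
      return outcome;
    };
    try {
      program(branch);
    } catch (e) {
      // A forced path may crash; this sketch just moves on to the next scheme.
    }
    runs++;

    // Queue one new scheme per predicate whose other outcome is still unseen.
    trace.forEach(([id, outcome], i) => {
      const key = `${id}:${!outcome}`;
      if (!covered.has(key) && !scheduled.has(key)) {
        scheduled.add(key);
        worklist.push([...trace.slice(0, i), [id, !outcome]]);
      }
    });
  }
  return { runs, covered };
}

// Example: a cloaked payload guarded by two checks that fail in a normal run.
const reached = [];
function sample(branch) {
  if (branch("isIE", false)) {
    reached.push("payload");
    if (branch("hasPlugin", false)) reached.push("exploit");
  }
}
const result = explore(sample);
console.log(result.runs, reached);
```

In this model, an E-path variant would drop the covered/scheduled filter and instead enqueue a flipped scheme for every predicate on every new path, which is where the exponential cost comes from.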
if (window.attachEvent) {
  window.attachEvent("onload", window["load" + initialize]); // ...
} else {
  window.addEventListener("load", initialize, false); // ...
}
Figure 6: Browser-compatibility exception in forced execution.

if (...) {
  var script = document.createElement("script");
  script.src = "http://.../a.js";
  document.body.appendChild(script);
} else {
  window.location = "http://.../b.html"; /* page redirection */
}
Table 2: Comparing detection techniques on EKs.
                           # of samples whose obfuscations / evasions can be handled
Exploit Kits  # of samples  Native run  Rozzle [22]  WebEval [18]  J-Force
Angler        10            2 / 1       7 / 6        3 / 3         10 / 10
RIG           10            5 / 0       7 / 2        5 / 0         10 / 10
Nuclear       10            3 / 0       6 / 2        3 / 1         10 / 7
Magnitude     10            6 / 2       10 / 6       6 / 4         10 / 10
SweetOrange   10            2 / 0       8 / 4        4 / 4         10 / 6

Our solution is to load the target page in a separate frame so that J-Force can continue exploring the current page. Since frames are isolated from each other, the effect of loading the destination page in a frame is functionally equivalent to a page redirection. In this particular example, J-Force loads b.html in an iframe and thus is able to explore the behaviors in a.js.
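The frame-based handling of redirections can be pictured with a short sketch. J-Force does this inside the browser engine; the page-level helper below is only an illustration, and the name redirectIntoFrame and the doc parameter (any object offering createElement and body.appendChild, such as the browser's document) are assumptions introduced here.

```javascript
// Hypothetical helper: load a redirect target in an iframe instead of navigating away,
// so the scripts of the current page remain available for further exploration.
function redirectIntoFrame(doc, url) {
  const frame = doc.createElement("iframe");
  frame.src = url;              // the destination page loads in its own isolated frame
  doc.body.appendChild(frame);  // the current page is left in place
  return frame;
}
```

In a browser, an analysis harness could call redirectIntoFrame(document, target) wherever the original code would have assigned the target to window.location, so that both the redirect branch and its sibling branch can be explored in the same session.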
Table 3: The analysis result of 12,132 Chrome extensions.
               # of Ad-injecting               # of Info. leakage
               Total  Ajax  Script Injection   Total  Ajax  Script Injection
Hulk [20]      195    29    166                14     9     5
Expector [37]  187    28    159                9      6     3
WebEval [18]   158    15    143                8      5     3
J-Force        322    45    277                30     21    9

4.5 Infinite Loop and Recursion
J-Force may suffer from infinite loops or endless recursions because it ignores the loop and recursion conditions. To handle this issue, we set an upper bound on the number of times a loop can iterate or a recursive function can be invoked. For loops, J-Force monitors the loop executions and makes sure that they do not go beyond the threshold. Otherwise, J-Force forces the execution to skip the loop. Similarly, for recursions, we use a threshold to limit recursion depth. We make sure that whenever a new stack frame is created, the stack depth is smaller than the threshold.
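The two bounds can be sketched as a small guard object. The class name ExecutionGuard, its methods, and the default limits are assumptions made for illustration; the paper does not state the thresholds J-Force uses, and the real checks live in the engine rather than in page-level code.

```javascript
// Hypothetical guard for forced execution; limits and names are illustrative assumptions.
class ExecutionGuard {
  constructor(maxIterations = 1000, maxDepth = 100) {
    this.maxIterations = maxIterations;
    this.maxDepth = maxDepth;
    this.iterations = new Map(); // loop id -> number of iterations seen so far
    this.depth = 0;              // current depth of guarded calls
  }

  // Use as the loop condition: once the budget is spent, the loop is skipped.
  loop(id, cond) {
    const n = (this.iterations.get(id) || 0) + 1;
    this.iterations.set(id, n);
    return n <= this.maxIterations && Boolean(cond);
  }

  // Wrap a call: no new frame is created once the depth limit is reached.
  call(fn, ...args) {
    if (this.depth >= this.maxDepth) return undefined;
    this.depth++;
    try {
      return fn(...args);
    } finally {
      this.depth--;
    }
  }
}
```

For example, a loop whose condition has been forced to true can be written as while (guard.loop("L1", true)) { ... }, and a recursive function can route its self-calls through guard.call so that the recursion stops at the configured depth.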
5. EVALUATION
J-Force is implemented atop WebKit-r171233 with GTK+ port. Our evaluation consists of two experiments. The first one is a systematic study on 50 EK samples and 12,132 Chrome extensions to see if J-Force is able to detect (malicious) behaviors covered by sophisticated cloaking and obfuscation techniques. Also, since being able to explore more code is important, in the second experiment, we further quantify J-Force’s performance by measuring the coverage and the overhead on 100 real-world JavaScript programs. All experiments are performed on a machine with an Intel Core i7 3.40 GHz CPU and 12 GB RAM running Ubuntu 14.04 LTS.

5.1 Detecting Suspicious Hidden Behaviors

5.1.1 Detecting Obfuscations and Evasions in EKs
We have collected 50 EK samples from various sources [1, 2], and classified them based on the underlying EKs, namely Angler, RIG, Nuclear, Magnitude, and SweetOrange. Although they differ, we observed that they all share the following mechanisms:
• Obfuscation. Obfuscation conceals program functionalities using string operations to make detecting malware challenging. In EKs, obfuscation is used more than once throughout multiple layers of code injection.
• Evasion. To minimize the possibility of being caught (e.g., by honey-pot based approaches), an EK only invokes the malicious logic when certain conditions are satisfied. Specifically, an EK usually scans the visitor’s system (e.g., the signatures of browsers, extensions, etc.) before moving on to the next stage. An example is shown in Fig. 1 in Sec. 2.
• Exploiting Vulnerabilities. An EK is designed to exploit particular vulnerabilities in browsers or add-ons by hijacking the control flow and elevating permissions. The typical targets of such exploitation are Adobe Flash, MS Silverlight and the Java runtime as well as browsers themselves.
• Payload Delivery. As the last step, a malicious binary is downloaded and executed without the user’s consent. Ransomware [7] and click fraud [6] are two common examples.
As J-Force focuses on detecting malicious JavaScript behaviors, only the JavaScript parts (obfuscation and evasion) are included for evaluation. Analyzing non-JavaScript code, such as exploiting vulnerabilities in the web browser or plug-ins, is beyond the scope of this paper. The results of experiments on 50 EK samples (10 for each EK type) are presented in Table 2. It shows the number of samples that can be handled by each tool, in terms of obfuscation handled and evasion passed. Since we know the ground truth about deobfuscation, counting successful de-obfuscations is straightforward. For evasions, if the exploitation entry point (e.g., <object>) is reached, we say the evasion is detected.

The results show that J-Force is able to handle more obfuscations and evasions than the others, and hence can expose more hidden malicious behaviors in EK attacks. In particular, J-Force is significantly more effective in detecting evasions. While J-Force outperforms other techniques, it misses a few evasions in Nuclear and SweetOrange. We manually inspected these cases and found that they use Visual Basic (VB) scripts, which are not currently supported by J-Force. However, our design is general and can be implemented on VB scripts too.

5.1.2 Detecting Ads Injections in Chrome Extensions
Browser extensions are commonly used nowadays to enhance user experience and have thus become a target of adversaries. Several recent works [20, 18, 37] have been proposed to analyze extensions. In this section, we show how J-Force can effectively disclose suspicious behaviors in Chrome extensions.

We crawled and obtained 12,132 extensions from Chrome Web Store [5] in July 2016. The analysis is done offline. As the JavaScript APIs used in extensions are slightly different from those in web applications, we enhance J-Force to support such Chrome APIs (e.g., chrome.browserAction.onClicked). In this experiment, we are particularly interested in detecting ad-injections and information leaks. We also compare with recent work on Chrome extension analysis [20, 18, 37].

Table 3 summarizes the experiment results. J-Force detected 322 extensions that inject advertisements, where 277 deliver ad contents using script injections and the remaining ones bring in ads via Ajax. Compared to the other techniques, J-Force finds at least 127 more ad-injecting extensions (322 versus at most 195), which confirms its effectiveness in handling cloaking and fingerprinting techniques. In addition, J-Force detected 30 extensions that send out sensitive information such as passwords and cookies via Ajax, while other techniques can detect at most 14 of them.

Table 4 presents the statistics of the Chrome extension execution analysis. We report the minimum, average and maximum number of JavaScript IR instructions, script injections, Ajax requests, eval function invocations, event handlers and page redirections observed in exploring one extension. The results show that J-Force can exercise more instructions and discover more behaviors than the native run. We also report the number of runs required by J-Force to cover all instructions (using the L-path search strategy explained
in Sec. 3.2). We show the number of potential crashes caused by the forced execution. We observed 2.74 crashes per extension on average and they are mostly caused by missing objects and DOM elements. All of them are handled correctly using the approach discussed in Sec. 4.

Table 4
                    J-Force                   Native run
                    avg     min  max          avg    min  max
JavaScript IR       1,478   10   31,248       406    10   14,151
Script Injections   0.71    0    28           0.46   0    13
Ajax                0.21    0    5            0.03   0    2
Eval                0.27    0    10           0.15   0    8
Event Handlers      1.57    0    19           0.85   0    12
Redirections        0.15    0    5            0.02   0    2
Handled Crashes     2.74    0    117          N/A
# of Runs           11.32   1    609          N/A

5.1.3 Case Study - Anti-adblocker
Unlike traditional programs, web applications have various external dependencies. For example, they can navigate the execution depending on the browser’s environment settings. They can download and load different external JavaScript on the fly from third parties during execution. Therefore, although mutating input values may change the execution paths, in general, it is highly nontrivial or even infeasible for an automatic exploration tool to satisfy the triggering conditions of the execution environment and third party scripts. In this case study, we showcase a real-world anti-adblocker [4] to demonstrate how J-Force bypasses sophisticated predicates and thus can be helpful for understanding stealthy program behaviors.

An ad-blocker (e.g., [3]) is a piece of software that allows clients to roam the web without encountering any ads. In particular, it utilizes network control and in-page manipulation to help users block advertisements loaded from ad networks. As many content publishers make their primary legitimate income from ads, there are growing demands for delivering ads even when ad-blockers are running in client browsers. As a result, anti-adblockers have been developed and deployed by publishers on their websites. Anti-adblockers are usually scripts delivered by publishers to detect if adblockers are enabled in the client browsers. Once an adblocker is found, the anti-adblocker either hides the content or delivers the ads by circumventing the ad filters.

Fig. 8 presents a simplified version of a popular anti-adblocker, BlockAdblock [4], where the arrows denote important call edges. It first detects if an adblocker is enabled on the client side and loads the real ads contents that are delivered as an image. In particular, line 1 includes an external script (“advertising.js”). If it can be successfully loaded, variable __haz will be set to false. If an adblocker is present, the script will be blocked and the value of __haz remains undefined. Therefore, BlockAdblock can tell if an adblocker is running by checking the value of __haz. At line 4, it invokes function __ac() and defines the function to be invoked for the next step. Depending on the presence of an adblocker, it will invoke a function (defined in lines 13-23) or do nothing. In function __dec, it loads an image, where its URL is specified at line 3 and further transformed at line 4. Interestingly, instead of displaying the image, it uses this image to circumvent ad-blocking rules and loads the raw data of the image. At line 21, function __cb is invoked, which creates a div element and displays the HTML hidden in the image at line 27.

It is highly nontrivial for static analysis based approaches to precisely analyze such complicated call relations, as it requires advanced alias and string analysis (e.g., the operations in lines 4 and 20). More importantly, as the ads contents are actually hidden in an image, they may not even be in the analysis scope. As a result, it is very unlikely that static analysis can handle such cases. An- ... ternal script included at line 1 must be blocked by an adblocker, which is highly dependent on the execution environment. If the adblocker has not been configured correctly or the URL of the external resource is not on the blacklist anymore, dynamic analysis cannot unveil the stealthy operations either.

By contrast, J-Force decouples the dependencies on the environment and hence allows us to effectively and deterministically observe unusual behaviors. On the left hand side of Fig. 8, we compare the control flow graphs that highlight the differences between J-Force and dynamic analysis based approaches. J-Force is able to explore both paths while the dynamic analysis only covers one path. As such, J-Force is able to discover the real ads contents by forced execution, without the complicated system settings that traditional dynamic approaches require to actually trigger the logic.

More importantly, through J-Force, we can uncover the actual values of function parameters (the right side of Fig. 8) and track the origin of suspicious values. With such capabilities (especially the hidden contents that can only be obtained dynamically), it is straightforward to conclude that the ads are included in the image file.

5.2 Efficiency
As described in Sec. 3.2, J-Force can be configured to improve coverage on instructions (the L-path strategy) or paths (the E-path strategy). To measure its efficiency, we extracted 100 examples (from Alexa.com) and evaluate J-Force on these real-world JavaScript programs. We compare J-Force with Jalangi, a concolic JavaScript execution engine [32], which is one of the closest alternative approaches available at present.

Fig. 9 presents the code coverage comparison results. The number of branches of the benchmarks varies from 109 to 1,200. In Fig. 9, the JavaScript benchmarks on the X-axis are sorted by the branch count in ascending order. The result shows that, on average, J-Force is able to cover 95% of the code (the same result for both exploration strategies), which is significantly more than Jalangi (less than 68%). We found that the main reason for the improvement is that the concolic execution based approach does not explore the code in event and timer handlers. In addition, Jalangi often fails to handle complex arithmetic operations such as division and modulo. By contrast, J-Force does not suffer from such limitations and is able to expand its analysis scope to event and exception handlers. Besides, J-Force does not miss conditional blocks as our exploration technique is designed to cover both branches by switching branch outcomes. We also manually inspected the scenarios where J-Force fails to cover all instructions. We found that this is mainly due to coding errors in the sample JavaScript programs.

Besides the coverage, we also measure the runtime performance of J-Force. Fig. 10 summarizes the comparison result of the overheads collected during the coverage test. For each approach, the overhead is normalized to the native run. The result shows that the overhead of J-Force is 2-8x (2-300x for E-path) whereas Jalangi has a much higher overhead of 10-10,000x. Observe that such a difference is caused by the fact that concolic execution based approaches may not scale well with the number of branches, showing expo-
other option is to actually run the program. However, one important nentially increasing overhead. Particularly, generating and solving
triggering condition of the secret loading procedure is that the ex- path constraints is more expensive than mutating branch outcomes.
Fig. 8 (code listing; the line numbers are those referenced in the text):

     1  <script src="http://.../advertising.js" ..></script>   // "var __haz = false;"
     2  ...
     3  __durl = '//.../hallon-p12065a-:r:.gif';
     4  __ac(function(){ __dec(__durl.replace(":r:", __s(5, 12)), __cb);
     5  });
     6  function __ac(f) {
     7    …
     8    if (typeof __haz === 'undefined')
     9      return f();
    10    ...
    11    return;
    12  }
    13  function __dec(src, callback) {
    14    i = new Image();
    15    i.onload = function() {
    16      …
    17      t.drawImage(i, 0, 0);
    18      b = __p24(t.getImageData(...).data);
    19      for (...)
    20        if (b[x]) s+= str.fromCharCode(b[x]);
    21      callback(s);
    22    }
    23    i.src = src;
    24  __cb = function (s) {
    25    …
    26    _new = d.createElement('div');
    27    _new.innerHTML = s.html;
    28    k.insertBefore(_new, k);
    29    …

[Figure annotations: callouts labeled "J-Force" and "Native exec" are attached to the branch "if (typeof …) return f();" (lines 8-9). A further callout labeled "J-Force" is attached to callback(s) (line 21) and shows the argument s: "..html: <div class=\fram\></div>\n<div class=\k3rwpj9jwhynv\>\n<div class=\gbqfwapg\>\n<span class=\gbqfwaabemdey ….</div>"]
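The contrast between a native run and a forced run on the guard at lines 8-9 of Fig. 8 can be sketched as follows. This is a minimal illustration under stated assumptions, not J-FORCE's implementation: the names `makeExplorer`, `branch`, `page`, and `explore` are hypothetical, the predicate is instrumented by hand rather than inside a JavaScript engine, and `dec` merely logs a message instead of decoding an image.

```javascript
// Hypothetical stand-in for forced execution: every instrumented predicate
// goes through branch(), which records the natural outcome and, if an
// override exists for that predicate, returns the forced outcome instead.
function makeExplorer(forced) {
  const trace = [];   // natural outcomes observed during this run
  let index = 0;      // position of the next predicate in the run
  function branch(cond) {
    const natural = Boolean(cond);
    const i = index++;
    trace.push(natural);
    return i in forced ? forced[i] : natural;
  }
  return { branch, trace };
}

// Simplified analogue of __ac and __dec from Fig. 8.
function page(env, explorer, log) {
  function dec() {
    log.push('hidden ad loader reached');
  }
  function ac(f) {
    // Guard corresponding to lines 8-9 of the listing.
    if (explorer.branch(typeof env.__haz === 'undefined')) {
      return f();
    }
    return;
  }
  ac(dec);
}

// Run once natively, then rerun once per recorded predicate with that
// predicate's outcome flipped.
function explore(env) {
  const reached = [];
  const first = makeExplorer({});
  page(env, first, reached);
  first.trace.forEach((outcome, i) => {
    page(env, makeExplorer({ [i]: !outcome }), reached);
  });
  return reached;
}

// Environment without an adblocker: advertising.js loaded, __haz is false.
const env = { __haz: false };

const nativeLog = [];
page(env, makeExplorer({}), nativeLog);   // nativeLog stays empty

const forcedLog = explore(env);           // the flipped run reaches dec()
console.log(nativeLog.length, forcedLog);
```

In this sketch the native run never reaches `dec` because `__haz` is defined, while the rerun with the flipped outcome does, which mirrors the "Native exec" and "J-Force" callouts in the figure.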
Figure 9: Coverage of J-FORCE in comparison with native run and concolic execution. [Plot: x-axis "JS files", y-axis "Coverage (%)"; series: J-Force, Native, Concolic.]

Figure 10: Performance overhead of J-FORCE in comparison with concolic execution. [Plot: x-axis "JS files", y-axis "Overhead (times)"; series: Concolic, J-Force(L-path), J-Force(E-path).]

6. RELATED WORK
Multiple Path Execution. The concept of forced execution was employed in previous research [26, 15, 36, 19]. Although the concept has been applied in various domains, such as native binary programs [26], mobile apps [15, 19], and identifying kernel rootkits [36], our work is, to the best of our knowledge, the first to propose a forced execution engine for JavaScript. Furthermore, the challenges that J-FORCE solves, such as handling missing objects/DOM, handling event/exception handlers and more (Sec. 4), are unique to JavaScript and are not proposed (or solved) by previous work. Rozzle [22] also places emphasis on analyzing self-revealing program behaviors. It explores multiple execution paths within a single execution. However, it does so via a different approach, which is based on symbolic values. More importantly, it has limited support for handling program faults and exceptions. By contrast, our tool can explore all feasible paths without being interrupted by exceptions. Symbolic (or concolic) execution has been applied to analyze JavaScript based Web applications [32, 31, 33]. Due to the limitations in underlying constraint solvers, it is challenging to support dynamic nature and scale to real-world applications built atop various JavaScript frameworks.

[...] evasive malware by comparing with a large amount of JavaScript collected in advance. It heavily resorts to the result of pre-classification by oracle, and may not be robust against newly crafted malware (e.g., zero-day exploit). MineSpider [34] extracts URLs from JS snippets equipped with evasion techniques that perform drive-by download attacks. It collects execution paths relevant to redirections using program slicing methods. While it is useful for tracking page redirections, it is not able to handle dynamic remote code injection using an iframe or a simple <script> tag. Lekies et al. [23] show attack methods enabled by the object scoping and dynamic nature of JavaScript. They investigate a set of high-ranked domains and verify that those are vulnerable to Cross-Site Script Inclusion (XSSI) attacks. ScriptInspector [39] examines third-party script injection to restrict access to critical resources. This is achieved by allowing site administrators to establish their own security policies. WebCapsule [25] records and replays web content executions for forensic analysis. It records all non-deterministic inputs to the core web rendering engine, including user interactions. RAIL [12] can verify security patches of web applications by rerunning patched web applications with previous bug-inducing inputs such as exploits. The system can tolerate state divergences caused by the patches. Unlike the record and replay approaches, J-FORCE explores all possible paths to reveal evasive malicious logic that is difficult to expose.

Browser Extensions. Hulk [20] analyzes Chrome browser extensions and detects malicious (or suspicious) behaviors, such as ad injection and information leakage. Expector [37] tries to figure out the correlation between malvertising and plug-ins. It shows that malvertising is more likely to appear when a specific extension is running. WebEval [18] inspects Chrome extensions using a combination of static and dynamic analysis. In order to trigger malicious activities, it sets up simulations by recording complex interactions between web pages and network events. Although such techniques have their own ways to increase coverage and unveil hidden malicious actions, they would not be sufficient to induce all possible behaviors.
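The crash-free property that this section contrasts with prior work (handling missing objects/DOM so that forced paths are not interrupted by exceptions) can be illustrated with a small sketch. It rests on assumptions: J-FORCE handles missing objects inside the execution engine, whereas this example swaps in a standard JavaScript `Proxy` as a stand-in. The names `makePlaceholder`, `accesses`, and `adFrame` are hypothetical.

```javascript
// Hypothetical helper: a placeholder that tolerates any property access or
// call, so that code on a forced path does not throw when an object or DOM
// element it expects is missing. Every access is logged for later inspection.
function makePlaceholder(name, accesses) {
  const target = function () {};   // function target so the proxy is callable
  return new Proxy(target, {
    get(_target, prop) {
      if (typeof prop === 'symbol') return undefined;
      const path = name + '.' + prop;
      accesses.push(path);
      return makePlaceholder(path, accesses);
    },
    apply(_target, _thisArg, _args) {
      const path = name + '()';
      accesses.push(path);
      return makePlaceholder(path, accesses);
    },
  });
}

const accesses = [];

// Assume a forced path dereferences an element that was never created.
// With a plain `undefined` value the next line would raise a TypeError.
const adFrame = makePlaceholder('adFrame', accesses);
adFrame.contentWindow.document.write('<p>ad</p>');

console.log(accesses);
```

The recorded access chain shows which missing objects the forced path touched, in the same spirit as keeping track of missing objects and creating them on demand.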
7. DISCUSSION
As our solution aims to expose malware hidden under a certain program path, detecting data driven attacks is still challenging. Although diverting control flow by forced execution occasionally breaks the program semantics, due to the stealthy pattern and conditional nature of the hidden code, we are confident that J-FORCE is able to disclose most evasive malware in the wild. Since J-FORCE is currently designed to detect client-side JavaScript malware, handling cloaking schemes in server-side scripts (e.g., SQL, PHP, etc. [30]) is beyond the scope of this paper.

8. CONCLUSION
In this paper, we proposed J-FORCE, a forced execution engine for JavaScript to expose hidden and even malicious program behaviors. J-FORCE explores all possible execution paths by mutating the outcomes of branch predicates. We solved multiple technical challenges and made J-FORCE a practical, robust and crash-free tool. We validated the efficacy of J-FORCE through an extensive set of experiments. J-FORCE has been evaluated on 50 exploits of popular exploit kits and more than 12,000 Chrome extensions. It successfully unveiled the hidden code in 41 exploits and detected more than 300 Chrome extensions injecting advertisements. The experiments on 100 real-world JavaScript samples show that J-FORCE is able to achieve 95% code coverage and perform 2-8x better than existing approaches.

9. ACKNOWLEDGMENTS
We thank the anonymous reviewers for their constructive comments. This research was supported, in part, by DARPA under contract FA8650-15-C-7562, NSF under awards 1409668, 1320444, and 1320306, ONR under contract N000141410468, and Cisco Systems under an unrestricted gift. Any opinions, findings, and conclusions in this paper are those of the authors only and do not necessarily reflect the views of our sponsors.

10. REFERENCES
[1] http://malware.dontneedcoffee.com.
[2] http://malware-traffic-analysis.net.
[3] Adblock plus. https://adblockplus.org.
[4] Blockadblock. http://blockadblock.com.
[5] Chrome Web Store. https://chrome.google.com/webstore.
[6] Clickfraud. http://digitalmarketingmagazine.co.uk/digital-marketing-advertising/the-crooks-willing-to-put-you-out-of-business-for-5/1740.
[7] Cryptolocker: What is and how to avoid it. http://www.pandasecurity.com/mediacenter/malware/cryptolocker/.
[8] JSHint. http://jshint.com.
[9] JSLint. http://www.jslint.com.
[10] Malvertising, Exploit Kits, ClickFraud & Ransomware: A Thriving Underground Economy. https://www.zscaler.com/blogs/research/malvertising-exploit-kits-clickfraud-ransomware-thriving-underground-economy.
[11] Y. Cao, X. Pan, Y. Chen, and J. Zhuge. Jshield: towards real-time and vulnerability-based detection of polluted drive-by download attacks. In Proceedings of the 30th Annual Computer Security Applications Conference, pages 466–475. ACM, 2014.
[12] H. Chen, T. Kim, X. Wang, N. Zeldovich, and M. F. Kaashoek. Identifying information disclosure in web applications with retroactive auditing. In OSDI, pages 555–569, 2014.
[13] M. Cova, C. Kruegel, and G. Vigna. Detection and analysis of drive-by-download attacks and malicious javascript code. In Proceedings of the 19th international conference on World wide web, pages 281–290. ACM, 2010.
[14] C. Curtsinger, B. Livshits, B. G. Zorn, and C. Seifert. Zozzle: Fast and precise in-browser javascript malware detection. In USENIX Security Symposium, pages 33–48, 2011.
[15] Z. Deng, B. Saltaformaggio, X. Zhang, and D. Xu. iris: Vetting private api abuse in ios applications. In Proceedings of the 22nd ACM SIGSAC Conference on Computer and Communications Security, pages 44–56. ACM, 2015.
[16] L. Gong, M. Pradel, M. Sridharan, and K. Sen. Dlint: Dynamically checking bad coding practices in javascript. In Proceedings of the 2015 International Symposium on Software Testing and Analysis, pages 94–105. ACM, 2015.
[17] L. Invernizzi and P. M. Comparetti. Evilseed: A guided approach to finding malicious web pages. In Security and Privacy (SP), 2012 IEEE Symposium on, pages 428–442. IEEE, 2012.
[18] N. Jagpal, E. Dingle, J.-P. Gravel, P. Mavrommatis, N. Provos, M. A. Rajab, and K. Thomas. Trends and lessons from three years fighting malicious extensions. In 24th USENIX Security Symposium (USENIX Security 15), pages 579–593, 2015.
[19] R. Johnson and A. Stavrou. Forced-path execution for android applications on x86 platforms. In Software Security and Reliability-Companion (SERE-C), 2013 IEEE 7th International Conference on, pages 188–197. IEEE, 2013.
[20] A. Kapravelos, C. Grier, N. Chachra, C. Kruegel, G. Vigna, and V. Paxson. Hulk: Eliciting malicious behavior in browser extensions. In Proceedings of the 23rd Usenix Security Symposium, 2014.
[21] A. Kapravelos, Y. Shoshitaishvili, M. Cova, C. Kruegel, and G. Vigna. Revolver: An automated approach to the detection of evasive web-based malware. In USENIX Security, pages 637–652. Citeseer, 2013.
[22] C. Kolbitsch, B. Livshits, B. Zorn, and C. Seifert. Rozzle: De-cloaking internet malware. In Security and Privacy (SP), 2012 IEEE Symposium on, pages 443–457. IEEE, 2012.
[23] S. Lekies, B. Stock, M. Wentzel, and M. Johns. The unexpected dangers of dynamic javascript. In 24th USENIX Security Symposium (USENIX Security 15), pages 723–735, Washington, D.C., Aug. 2015. USENIX Association.
[24] E. Mutlu, S. Tasiran, and B. Livshits. Detecting javascript races that matter. In Proceedings of the 2015 10th Joint Meeting on Foundations of Software Engineering, ESEC/FSE 2015, pages 381–392, New York, NY, USA, 2015. ACM.
[25] C. Neasbitt, B. Li, R. Perdisci, L. Lu, K. Singh, and K. Li. Webcapsule: Towards a lightweight forensic engine for web browsers. In Proceedings of the 22nd ACM SIGSAC Conference on Computer and Communications Security, pages 133–145. ACM, 2015.
[26] F. Peng, Z. Deng, X. Zhang, D. Xu, Z. Lin, and Z. Su. X-force: Force-executing binary programs for security applications. In Proceedings of the 2014 USENIX Security Symposium, San Diego, CA (August 2014), 2014.
[27] P. Ratanaworabhan, V. B. Livshits, and B. G. Zorn. Nozzle: A defense against heap-spraying code injection attacks. In USENIX Security Symposium, pages 169–186, 2009.
[28] V. Raychev, M. Vechev, and A. Krause. Predicting program properties from big code. In ACM SIGPLAN Notices, volume 50, pages 111–124. ACM, 2015.
[29] V. Raychev, M. Vechev, and M. Sridharan. Effective race detection for event-driven programs. In ACM SIGPLAN Notices, volume 48, pages 151–166. ACM, 2013.
[30] K. Sadalkar, R. Mohandas, and A. R. Pais. Model based hybrid approach to prevent sql injection attacks in php. In Security Aspects in Information Technology, pages 3–15. Springer, 2011.
[31] P. Saxena, D. Akhawe, S. Hanna, F. Mao, S. McCamant, and D. Song. A symbolic execution framework for javascript. In Security and Privacy (SP), 2010 IEEE Symposium on, pages 513–528. IEEE, 2010.
[32] K. Sen, S. Kalasapur, T. Brutch, and S. Gibbs. Jalangi: A selective record-replay and dynamic analysis framework for javascript. In Proceedings of the 2013 9th Joint Meeting on Foundations of Software Engineering, pages 488–498. ACM, 2013.
[33] K. Sen, G. Necula, L. Gong, and W. Choi. Multise: Multi-path symbolic execution using value summaries. In Proceedings of the 2015 10th Joint Meeting on Foundations of Software Engineering, pages 842–853. ACM, 2015.
[34] Y. Takata, M. Akiyama, T. Yagi, T. Hariu, and S. Goto. Minespider: Extracting urls from environment-dependent drive-by download attacks. In Computer Software and Applications Conference (COMPSAC), 2015 IEEE 39th Annual, volume 2, pages 444–449. IEEE, 2015.
[35] D. Y. Wang, S. Savage, and G. M. Voelker. Cloak and dagger: dynamics of web search cloaking. In Proceedings of the 18th ACM conference on Computer and communications security, pages 477–490. ACM, 2011.
[36] J. Wilhelm and T.-c. Chiueh. A forced sampled execution approach to kernel rootkit identification. In International Workshop on Recent Advances in Intrusion Detection, pages 219–235. Springer, 2007.
[37] X. Xing, W. Meng, B. Lee, U. Weinsberg, A. Sheth, R. Perdisci, and W. Lee. Understanding malvertising through ad-injecting browser extensions. In Proceedings of the 24th International Conference on World Wide Web, pages 1286–1295. International World Wide Web Conferences Steering Committee, 2015.
[38] Y. Zheng, T. Bao, and X. Zhang. Statically locating web application bugs caused by asynchronous calls. In Proceedings of the 20th international conference on World wide web, pages 805–814. ACM, 2011.
[39] Y. Zhou and D. Evans. Understanding and monitoring embedded web scripts. In Security and Privacy (SP), 2015 IEEE Symposium on, pages 850–865. IEEE, 2015.