Automated JavaScript Unit Test Generation
Abstract—The event-driven and highly dynamic nature of JavaScript, as well as its runtime interaction with the Document Object Model (DOM) make it challenging to test JavaScript-based applications. Current web test automation techniques target the generation of event sequences, but they ignore testing the JavaScript code at the unit level. Further, they either ignore the oracle problem completely or simplify it through generic soft oracles such as HTML validation and runtime exceptions. We present a framework to automatically generate test cases for JavaScript applications at two complementary levels, namely events and individual JavaScript functions. Our approach employs a combination of function coverage maximization and function state abstraction algorithms to efficiently generate test cases. In addition, these test cases are strengthened by automatically generated mutation-based oracles. We empirically evaluate the implementation of our approach, called JSEFT, to assess its efficacy. The results, on 13 JavaScript-based applications, show that the generated test cases achieve a coverage of 68% and that JSEFT can detect injected JavaScript and DOM faults with a high accuracy (100% precision, 70% recall). We also find that JSEFT outperforms an existing JavaScript test automation framework both in terms of coverage and detected faults.

Keywords—Test generation; oracles; JavaScript; DOM

I. INTRODUCTION

JavaScript plays a prominent role in modern web applications. To test their JavaScript applications, developers often write test cases using web testing frameworks such as Selenium (GUI tests) and QUnit (JavaScript unit tests). Although such frameworks help to automate test execution, the test cases still need to be written manually, which is tedious and time-consuming.

Further, the event-driven and highly dynamic nature of JavaScript, as well as its runtime interaction with the Document Object Model (DOM) make JavaScript applications error-prone [1] and difficult to test.

Researchers have recently developed automated test generation techniques for JavaScript-based applications [2], [3], [4], [5], [6]. However, current web test generation techniques suffer from two main shortcomings, namely, they:

1) Target the generation of event sequences, which operate at the event-level or DOM-level to cover the state space of the application. These techniques fail to capture faults that do not propagate to an observable DOM state. As such, they potentially miss this portion of code-level JavaScript faults. In order to capture such faults, effective test generation techniques need to target the code at the JavaScript unit-level, in addition to the event-level.
2) Either ignore the oracle problem altogether or simplify it through generic soft oracles, such as W3C HTML validation [2], [5], or JavaScript runtime exceptions [2]. A generated test case without assertions is not useful since coverage alone is not the goal of software testing. For such generated test cases, the tester still needs to manually write many assertions, which is time and effort intensive. On the other hand, soft oracles target generic fault types and are limited in their fault finding capabilities. However, to be practically useful, unit testing requires strong oracles to determine whether the application under test executes correctly.

To address these two shortcomings, we propose an automated test case generation technique for JavaScript applications.

Our approach, called JSEFT (JavaScript Event and Function Testing), operates through a three-step process. First, it dynamically explores the event-space of the application using a function coverage maximization method, to infer a test model. Then, it generates test cases at two complementary levels, namely, DOM event and JavaScript functions. Our technique employs a novel function state abstraction algorithm to minimize the number of function-level states needed for test generation. Finally, it automatically generates test oracles, through a mutation-based algorithm.

A preliminary version of this work appeared in a short New Ideas paper [7]. In this current paper, we present the complete technique with conceptually significant improvements, including detailed new algorithms (Algorithms 1–2), a fully-functional tool implementation, and a thorough empirical analysis on 13 JavaScript applications, providing evidence of the efficacy of the approach.

This work makes the following main contributions:

• An automatic technique to generate test cases for JavaScript functions and events;
• A combination of function coverage maximization and function state abstraction algorithms to efficiently generate unit test cases;
• A mutation-based algorithm to effectively generate test oracles, capable of detecting regression JavaScript and DOM-level faults;
• The implementation of our technique in a tool called JSEFT, which is publicly available [8];
• An empirical evaluation to assess the efficacy of JSEFT using 13 JavaScript applications.

The results of our evaluation show that on average (1) the test suite generated by JSEFT achieves a 68% JavaScript code coverage, (2) compared to Artemis, a feedback-directed JavaScript testing framework [2], JSEFT achieves 53% better coverage, and (3) the test oracles generated by JSEFT are able to detect injected faults with 100% precision and 70% recall.

II. RELATED WORK

Web application testing. Marchetto and Tonella [3] propose a search-based algorithm for generating event-based sequences
to test Ajax applications. Mesbah et al. [9] apply dynamic analysis to construct a model of the application's state space, from which event-based test cases are automatically generated. In subsequent work [5], they propose generic and application-specific invariants as a form of automated soft oracles for testing Ajax applications. Our earlier work, JSART [10], automatically infers program invariants from JavaScript execution traces and uses them as regression assertions in the code. Sen et al. [11] recently proposed a record and replay framework called Jalangi. It incorporates selective record-replay as well as shadow values and shadow execution to enable writing of heavy-weight dynamic analyses. The framework is able to track generic faults such as null and undefined values as well as type inconsistencies in JavaScript. Jensen et al. [12] propose a technique to test the correctness of communication patterns between client and server in Ajax applications by incorporating server interface descriptions. They construct server interface descriptions through an inference technique that can learn communication patterns from sample data. Saxena et al. [6] combine random test generation with the use of symbolic execution for systematically exploring a JavaScript application's event space as well as its value space, for security testing. Our work is different in two main aspects from these: (1) they all target the generation of event sequences at the DOM level, while we also generate unit tests at the JavaScript code level, which enables us to cover more and find more faults, and (2) they do not address the problem of test oracle generation and only check against soft oracles (e.g., invalid HTML). In contrast, we generate strong oracles that capture application behaviours, and can detect a much wider range of faults.

     1  var currentDim=20;
     2  function cellClicked() {
     3    var divTag = '<div id="divElem" />';
     4    if($(this).attr('id') == 'cell0'){
     5      $('#cell0').after(divTag);
     6      $('div #divElem').click(setup);
     7    }
     8    else if($(this).attr('id') == 'cell1'){
     9      $('#cell1').after(divTag);
    10      $('div #divElem').click(function(){setDim(20)});
    11    }
    12  }
    13
    14  function setup() {
    15    setDim(10);
    16    $('#startCell').click(start);
    17  }
    18
    19  function setDim(dimension) {
    20    var dim=($('#endCell').width() + $('#endCell').height())/dimension;
    21    currentDim += dim;
    22    $('#endCell').css('height', dim+'px');
    23    return dim;
    24  }
    25
    26  function start() {
    27    if(currentDim > 40)
    28      $(this).css('height', currentDim+'px');
    29    else $(this).remove();
    30  }
    31
    32  $(document).ready(function() {
    33    ...
    34    $('#cell0').click(cellClicked);
    35    $('#cell1').click(cellClicked);
    36  });

Fig. 1. JavaScript code of the running example.
Perhaps the most closely related work to ours is Artemis [2], which supports automated testing of JavaScript applications. Artemis considers the event-driven execution model of a JavaScript application for feedback-directed testing. In this paper, we quantitatively compare our approach with that of Artemis (Section V).

Oracle generation. There has been limited work on oracle generation for testing. Fraser et al. [13] propose µTEST, which employs a mutant-based oracle generation technique. It automatically generates unit tests for Java object-oriented classes by using a genetic algorithm to target mutations with high impact on the application's behaviour. They further identify [14] relevant pre-conditions on the test inputs and post-conditions on the outputs to ease human comprehension. Differential test case generation approaches [15], [16] are similar to mutation-based techniques in that they aim to generate test cases that show the difference between two versions of a program. However, mutation-based techniques such as ours do not require two different versions of the application. Rather, the generated differences are in the form of controllable mutations that can be used to generate test cases capable of detecting regression faults in future versions of the program. Staats et al. [17] address the problem of selecting oracle data, which is formed as a subset of internal state variables as well as outputs for which the expected values are determined. They apply mutation testing to produce oracles and rank the inferred oracles in terms of their fault finding capability. This work is different from ours in that they merely focus on supporting the creation of test oracles by the programmer, rather than fully automating the process of test case generation. Further, (1) they do not target JavaScript; (2) in addition to the code-level mutation analysis, we propose DOM-related mutations to capture error-prone [1] dynamic interactions of JavaScript with the DOM.

III. CHALLENGES AND MOTIVATION

In this section, we illustrate some of the challenges associated with test generation for JavaScript applications.

Figure 1 presents a snippet of a JavaScript game application that we use as a running example throughout the paper. This simple example uses the popular jQuery library [18] and contains four main JavaScript functions:

1) cellClicked is bound as the event handler of the DOM elements with IDs cell0 and cell1 (Lines 34–35). These two DOM elements become available when the DOM is fully loaded (Line 32). Depending on the element clicked, cellClicked inserts a div element with ID divElem (Line 3) after the clicked element and makes it clickable by attaching either setup or setDim as its event-handler function (Lines 5–6, 9–10).
2) setup calls setDim (Line 15) to change the value of the global variable currentDim. It further makes an element with ID startCell clickable by setting its event-handler to start (Line 16).
3) setDim receives an input variable. It performs some computations to set the height value of the css property of a DOM element with ID endCell and the value of currentDim (Lines 20–22). It also returns the computed dimension.
4) start is called at runtime when the element with ID startCell is clicked (Line 16), which either updates the height dimension of the element on which it was called, or removes the element (Lines 27–29).

There are four main challenges in testing JavaScript applications.
The first challenge is that a fault may not immediately propagate into a DOM-level observable failure. For example, if the '+' sign in Line 21 is mistakenly replaced by '-', the affected [...] help to detect the fault in this case.

The second challenge is related to fault localization; even if the fault propagates to a future DOM state and a DOM-level test case detects it, finding the actual location of the fault is challenging for the tester as the DOM-level test case is agnostic of the JavaScript code. However, a unit test case that targets individual functions, e.g., setDim in this running example, helps a tester to spot the fault, and thus easily resolve it.
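
The first two challenges can be made concrete with a small sketch. It re-implements setDim from Figure 1 without jQuery, against a hypothetical in-memory stand-in for the endCell element. The helper names makeEndCell, makeSetDim, and runSetDim, as well as the 100x100 element size, are assumptions made for illustration and are not part of the original application. Under these assumptions, replacing '+' with '-' in Line 21 leaves both the return value and the height written to the element unchanged, whereas a function-level check on currentDim exposes the fault and points directly at setDim.

```javascript
// Hypothetical stand-in for the DOM element with ID endCell (assumed 100x100).
function makeEndCell() {
  return { width: 100, height: 100, css: {} };
}

// Builds a jQuery-free variant of setDim from Figure 1.
// `update` models Line 21: the original code adds dim to currentDim.
function makeSetDim(state, endCell, update) {
  return function setDim(dimension) {
    var dim = (endCell.width + endCell.height) / dimension; // Line 20
    state.currentDim = update(state.currentDim, dim);       // Line 21
    endCell.css.height = dim + 'px';                        // Line 22
    return dim;                                             // Line 23
  };
}

// Runs setDim(10) from a fresh entry state and collects the exit state.
function runSetDim(update) {
  var state = { currentDim: 20 };
  var endCell = makeEndCell();
  var ret = makeSetDim(state, endCell, update)(10);
  return { ret: ret, height: endCell.css.height, currentDim: state.currentDim };
}

var original = runSetDim(function (a, b) { return a + b; });
var mutant = runSetDim(function (a, b) { return a - b; }); // '+' replaced by '-'

// Neither the return value nor the DOM-visible height reveals the fault...
console.log(original.ret === mutant.ret);        // true
console.log(original.height === mutant.height);  // true
// ...but the global state observed at the function level differs.
console.log(original.currentDim, mutant.currentDim); // 40 0
```

In this sketch the fault only becomes DOM-visible later, if at all, through start (Line 27), which is why an assertion at the exit point of setDim is the more direct check.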
The third challenge pertains to the event-driven dynamic nature of JavaScript, and its extensive interaction with the DOM resulting in many state permutations and execution paths. In the initial state of the example, clicking on cell0 or cell1 takes the browser to two different states as a result of the if-else statement in Lines 4 and 8 of the function cellClicked. Even in this simple example, expanding either of the resulting states has different consequences due to different functions that can be potentially triggered. Executing either setup or setDim in Lines 6 and 10 results in different execution paths, DOM states, and code coverage. It is this dynamic interaction of the JavaScript code with the DOM (and indirectly CSS) at runtime that makes it challenging to generate test cases for JavaScript applications.

The fourth important challenge in unit testing JavaScript functions that have DOM interactions, such as setDim, is that the DOM tree in the state expected by the function has to be present during unit test execution. Otherwise the test will fail due to a null or undefined exception. This situation arises often in modern web applications that have many DOM interactions.

IV. APPROACH

Our main goal in this work is to generate client-side test cases coupled with effective test oracles, capable of detecting regression JavaScript and DOM-level faults. Further, we aim to achieve this goal as efficiently as possible. Hence, we make two design decisions. First, we assume that there is a finite amount of time available to generate test cases. Consequently, we guide the test generation to maximize coverage under a given time constraint. The second decision is to minimize the number of test cases and oracles generated to only include those that are essential in detecting potential faults. Consequently, to examine the correctness of the test suite generated, the tester would only need to examine a small set of assertions, which minimizes their effort.

Our approach generates test cases and oracles at two complementary levels:

DOM-level event-based tests consist of DOM-level event sequences and assertions to check the application's behaviour from an end-user's perspective.
Function-level unit tests consist of unit tests with assertions that verify the functionality of JavaScript code at the function level.

Fig. 2. Overview of our test generation approach.

An overview of the technique is depicted in Figure 2. At a high level, our approach is composed of three main steps:

1) In the first step (Section IV-A), we dynamically explore various states of a given web application, in such a way as to maximize the number of functions that are covered throughout the program execution. The output of this initial step is a state-flow graph (SFG) [5], capturing the explored dynamic DOM states and event-based transitions between them.
2) In the second step (Section IV-B), we use the inferred SFG to generate event-based test cases. We run the generated tests against an instrumented version of the application. From the execution trace obtained, we extract DOM element states as well as JavaScript function states at the entry and exit points, from which we generate function-level unit tests. To reduce the number of generated test cases to only those that are constructive, we devise a state abstraction algorithm that minimizes the number of states by selecting representative function states.
3) To create effective test oracles for the two test case levels, we automatically generate mutated versions of the application (Section IV-C). Assuming that the original version of the application is fault-free, the test oracles are then generated at the DOM and JavaScript code levels by comparing the states traced from the original and the mutated versions.

A. Maximizing Function Coverage

In this step, our goal is to maximize the number of functions that can be covered, while exercising the program's event space. To that end, our approach combines static and dynamic analysis to decide which state and event(s) should be selected for expansion to maximize the probability of covering uncovered JavaScript functions. While exploring the web application under test, our function coverage maximization algorithm selects a next state for exploration, which has the maximum value of the sum of the following two metrics:
1. Potential Uncovered Functions. This pertains to the total number of unexecuted functions that can potentially be visited through the execution of DOM events in a given DOM state s_i. When a given function f_i is set as the event-handler of a DOM element d ∈ s_i, it makes the element a potential clickable element in s_i. This can be achieved through various patterns in web applications depending on which DOM event model level is adopted. To calculate this metric, our algorithm identifies all JavaScript functions that are directly or indirectly attached to DOM elements as event handlers in s_i, through code instrumentation and execution trace monitoring.

2. Potential Clickable Elements. The second metric, used to select a state for expansion, pertains to the number of DOM elements that can potentially become clickable elements. If the event-handlers bound to those clickables are triggered, new (uncovered) functions will be executed. To obtain this number, we statically analyze the previously obtained potential uncovered functions within a given state in search of such elements.

While exploring the application, the next state for expansion is selected by adding the two metrics and choosing the state with the highest sum. The procedure repeats the aforementioned steps until the designated time limit, or state space size is reached.

In the running example of Figure 1, in the initial state, clicking on elements with IDs cell0 and cell1 results in two different states due to an if-else statement in Lines 4 and 8 of cellClicked. Let's call the state in which a DIV element is located after the element with ID cell0 as s_0, and the state in which a DIV element is placed after the element with ID cell1 as s_1. If state s_0, with the clickable cell0, is chosen for expansion, function setup is called. As shown in Line 15, setup calls setDim, and thus, by expanding s_0 both of the aforementioned functions get called by a single click. Moreover, a potential clickable element is also created in Line 16, with start as the event-handler. Therefore, expanding s_1 results only in the execution of setDim, while expanding s_0 results in the execution of functions setup, setDim, and a potential execution of start in future states. At the end of this step, we obtain a state-flow graph of the application that can be used in the next test generation step.

B. Generating Test Cases

In the second step, our technique first extracts sequences of events from the inferred state-flow graph. These sequences of events are used in our test case generation process. We generate test cases at two complementary levels, as described below.

DOM-level event-based testing. To verify the behaviour of the application at the user interface level, each event path, taken from the initial state (Index) to a leaf node in the state-flow graph, is used to generate DOM event-based test cases. Each extracted path is converted into a JUnit Selenium-based test case, which executes the sequence of events, starting from the initial DOM state. Going back to our running example, one possible event sequence to generate is: $('#cell0').click→$('div #divElem').click→$('#startCell').click.

To collect the required trace data, we capture all DOM elements and their attributes after each event in the test path is fired. This trace is later used in our DOM oracle comparison, as explained in Section IV-C.

JavaScript function-level unit testing. To generate unit tests that target JavaScript functions directly (as opposed to event-triggered function executions), we log the state of each function at its entry and exit points, during execution. To that end, we instrument the code to trace various entities. At the entry point of a given JavaScript function we collect (1) function parameters including passed variables, objects, functions, and DOM elements, (2) global variables used in the function, and (3) the current DOM structure just before the function is executed. At the exit point of the JavaScript function and before every return statement, we log the state of the (1) return value of the function, (2) global variables that have been accessed in that function, and (3) DOM elements accessed (read/written) in the function. At each of the above points, our instrumentation records the name, runtime type, and actual values. The dynamic type is stored because JavaScript is a dynamically typed language, meaning that the variable types cannot be determined statically. Note that complex JavaScript objects can contain circular or multiple references (e.g., in JSON format). To handle such cases, we perform a de-serialization process in which we replace such references by an object in the form of $ref : Path, where Path denotes a JSONPath string (see http://goessner.net/articles/JsonPath/) that indicates the target path of the reference.

In addition to function entry and exit points, we log information required for calling the function from the generated test cases. JavaScript functions that are accessible in the public scope are mainly defined in (1) the global scope directly (e.g., function f(){...}), (2) variable assignments in the global scope (e.g., var f = function(){...}), (3) constructor functions (e.g., function Constructor() {this.member = function(){...}}), and (4) prototypes (e.g., Constructor.prototype.f = function() {...}). Functions in the first and second case are easy to call from test cases. For the third case, the constructor function is called via the new operator to create an object, which can be used to access the object's properties (e.g., container = new Constructor(); container.member();). This allows us to access the inner function, which is a member of the constructor function in the above example. For the prototype case, the function can be invoked through container.f() from a test case.

Going back to our running example in Figure 1, at the entry point of setDim, we log the value and type of both the input parameter dimension and global variable currentDim, which is accessed in the function. Similarly, at the exit point, we log the values and types of the returned variable dim and currentDim.

In addition to the values logged above, we need to capture the DOM state for functions that interact with the DOM. This is to address the fourth challenge outlined in Section III. To mitigate this problem, we capture the state of the DOM just before the function starts its execution, and include that as a test fixture [19] in the generated unit test case.

In the running example, at the entry point of setDim, we log the innerHTML of the current DOM as the function contains several calls to the DOM, e.g., retrieving the element with ID endCell in Line 22. We further include in our execution trace the way DOM elements and their attributes are modified by the JavaScript function at runtime. The information that we
log for accessed DOM elements includes the ID attribute, the XPath position of the element on the DOM tree, and all the modified attributes. Collecting this information is essential for oracle generation in the next step. We use a set to keep the information about DOM modifications, so that we can record the latest changes to a DOM element without any duplication within the function. For instance, we record ID as well as both width and height properties of the endCell element.

Once our instrumentation is carried out, we run the generated event sequences obtained from the state-flow graph. This way, we produce an execution trace that contains:

• Information required for preparing the environment for each function to be executed in a test case, including its input parameters, used global variables, and the DOM tree in a state that is expected by the function;
• Necessary entities that need to be assessed after the function is executed, including the function's output as well as the touched DOM elements and their attributes (the actual assessment process is explained in Section IV-C).

Function State Abstraction. As mentioned in Section III, the highly dynamic nature of JavaScript applications can result in a huge number of function states. Capturing all these different states can potentially hinder the technique's scalability for large applications. In addition, generating too many test cases can negatively affect test suite comprehension. We apply a function state abstraction method to minimize the number of function-level states needed for test generation.

Our abstraction method is based on classification of function (entry/exit) states according to their impact on the function's behaviour, in terms of covered branches within the function, the function's return value type, and characteristics of the accessed DOM elements.

Branch coverage: Taking different branches in a given function can change its behaviour. Thus, function entry states that result in a different covered branch should be taken into account while generating test cases. Going back to our example in Figure 1, executing either of the branches in lines 27 and 29 clearly takes the application into a different DOM state. In this example, we need to include the states of the start function that result in different covered branches, e.g., two different function states where the value of the global variable currentDim at the entry point falls into different boundaries.

Return value type: A variable's type can change in JavaScript at runtime. This can result in changes in the expected outcome of the function. Going back to our example, if dim is mistakenly assigned a string value before adding it to currentDim (Line 21) in function setDim, the returned value of the function becomes the string concatenation of the two values rather than the expected numerical addition.

Accessed DOM properties: DOM elements and their properties accessed in a function can be seen as entry states. Changes in such DOM entry states can affect the behaviour of the function. For example, in line 29 the this keyword refers to the clicked DOM element of which function start is an event-handler. Assuming that currentDim ≤ 40, depending on which DOM element is clicked, by removing the element in line 29 the resulting state of the function start differs. Therefore, we take into consideration the DOM elements accessed by the function as well as the type of accessed DOM properties.

Algorithm 1: Function State Abstraction
input : The set of function states st_i ∈ ST_f for a given function f
output: The obtained abstracted states set AbsStates
begin
 1  for st_i ∈ ST_f do
 2    L = 1; StSet_L ← ∅
 3    if BRNCOVLNS[st_i] ≠ BRNCOVLNS[StSet]_{l=1}^{L} then
 4      StSet_{L+1} ← st_i
 5      L++
 6    else
 7      StSet_l ← st_i ∪ StSet_l
 8    K = L + 1; StSet_K ← ∅
 9    if DOMPROPS[st_i] ≠ DOMPROPS[StSet]_{k=L+1}^{K} || RETTYPE[st_i] ≠ RETTYPE[StSet]_{k=L+1}^{K} then
10      StSet_{K+1} ← st_i
11      K++
12    else
13      StSet_k ← st_i ∪ StSet_k
14  while StSet_{K+L} ≠ ∅ do
15    SelectedSt ← SELECTMAXST(st_i | st_i ∩ StSet_{j=1}^{K+L})
16    AbsStates.ADD(SelectedSt)
17    StSet_{K+L} ← StSet_{K+L} − SelectedSt
18  return AbsStates

Algorithm 1 shows our function state abstraction algorithm. The algorithm first collects covered branches of individual functions per entry state (BRNCOVLNS[st_i] in Line 3). Function states exhibiting the same covered branches are categorized under the same set of states (Lines 4 and 7). StSet_l corresponds to the set of function states, which are classified according to their covered branches, where l = 1, ..., L and L is the number of current classified sets in the covered branch category. Similarly, function states with the same accessed DOM characteristics as well as return value type are put into the same set of states (Lines 10 and 13). StSet_k corresponds to the set of function states, which are classified according to their DOM/return value type, where k = 1, ..., K and K is the number of current classified sets in that category. After classifying each function's states into several sets, we cover each set by selecting one of its common states. The state selection step is a set cover problem [20], i.e., given a universe U and a family S of subsets of U, a cover is a subfamily C ⊆ S of sets whose union is U. Sets to be covered in our algorithm are StSet_{K+L}, where st_i ∈ StSet_{K+L}. We use a common greedy algorithm to approximate the minimum number of states that can cover all the possible sets (Lines 15–17). Finally, the abstracted list of states is returned in Line 18.

C. Generating Test Oracles

In the third step, our approach automatically generates test oracles for the two levels of test cases generated in the previous step, as depicted in the third step of Figure 2. Instead of randomly generating assertions, our oracle generation uses a mutation-based process.

Mutation testing is typically used to evaluate the quality of a test suite [21], or to generate test cases that kill mutants [13]. In our approach, we adopt mutation testing to (1) reduce the number of assertions automatically generated, and (2) target critical and error-prone portions of the application. Hence, the tester would only need to examine a small set of effective assertions to verify the correctness of the generated oracles. Algorithm 2 shows our algorithm for generating test oracles. At a high level, the technique iteratively executes the following steps:
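
Before the algorithm itself, the core comparison behind the function-level oracles can be sketched in a few lines of JavaScript. This is a minimal illustration under stated assumptions, not JSEFT's implementation: traceExit, diffStates, and generateAssertions are hypothetical names, the mutant is written by hand, and the function under test is a jQuery-free simplification of setDim from Figure 1 in which the DOM reads are replaced by constants. The sketch runs the original and the mutated function from the same entry state and emits an assertion only for those exit-state entries that the mutation changes.

```javascript
// Runs fn from the given entry state and returns the traced exit state:
// the return value plus the global variable the function may modify.
function traceExit(fn, entry) {
  var globals = { currentDim: entry.currentDim };
  var ret = fn(globals, entry.dimension);
  return { ret: ret, currentDim: globals.currentDim };
}

// Simplified original, modelled on setDim in Figure 1
// (element width and height assumed to be 100 each).
function original(g, dimension) {
  var dim = (100 + 100) / dimension;
  g.currentDim += dim;
  return dim;
}

// Hand-written mutant: '+=' replaced by '='.
function mutant(g, dimension) {
  var dim = (100 + 100) / dimension;
  g.currentDim = dim;
  return dim;
}

// Returns the keys whose values differ between two exit states.
function diffStates(a, b) {
  return Object.keys(a).filter(function (k) { return a[k] !== b[k]; });
}

// Emits one assertion string per exit-state entry changed by the mutant.
// An empty result means the mutant is equivalent for this entry state.
function generateAssertions(entry) {
  var expected = traceExit(original, entry);
  var changed = diffStates(expected, traceExit(mutant, entry));
  return changed.map(function (k) {
    return 'equal(' + k + ', ' + JSON.stringify(expected[k]) + ');';
  });
}

var assertions = generateAssertions({ dimension: 10, currentDim: 20 });
console.log(assertions); // [ 'equal(currentDim, 40);' ]
```

Because the return value is identical in both runs, no assertion is produced for it; only the affected global yields an oracle, which is the sense in which a mutation-based process keeps the set of generated assertions small. The assertion text is formatted as a QUnit-style equal(...) call purely for illustration.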
Algorithm 2: Oracle Generation
input : A Web application (App), list of event sequences obtained from SFG (EvSeq), maximum number of mutations (n)
output: Assertions for function-level (FcAsserts) and DOM event-level tests (DomAsserts)
 1  App ← INSTRUMENT(App)
begin
 2  while GenMuts < n do
 3    foreach EvSeq ∈ SFG do
 4      OnEvDomSt ← Trace.GETONEVDOMST(Ev ∈ EvSeq)
 5      AfterEvDomSt ← Trace.GETAFTEREVDOMST(Ev ∈ EvSeq)
 6      AccdDomProps ← GETACCDDOMNDS(OnEvDomSt)
 7      EquivalentDomMut ← true
 8      while EquivalentDomMut do
 9        MutDom ← MUTATEDOM(AccdDomProps, OnEvDomSt)
10        ChangedSt ← EvSeq.EXECEVENT(MutDom)
11        Diff_{ChangedSt,AfterEvDomSt} ← DIFF(ChangedSt, AfterEvDomSt)
12        if Diff_{ChangedSt,AfterEvDomSt} ≠ ∅ then
13          EquivalentDomMut ← false
14          DomAssert_i = Diff_{ChangedSt,AfterEvDomSt}
15          DomAsserts_{Ev,AfterEvDomSt} = DomAssert_i
16      AbsFcSts ← Trace.GETABSFCSTS()
17      EquivalentCodeMut ← true
18      while EquivalentCodeMut do
19        MutApp ← MUTATEJSCODE(App)
20        MutFcSts ← EvSeq.EXECEVENT(MutApp)
21        foreach FcEntry ∈ AbsFcSts.GETFCENTRIES do
22          FcExit ← AbsFcSts.GETFCEXIT(FcEntry)
23          MutFcExit ← MutFcSts.GETMUTFCEXIT(FcEntry)
24          Diff_{FcExit,MutFcExit} ← DIFF(FcExit, MutFcExit)
25          if Diff_{FcExit,MutFcExit} ≠ ∅ then
26            EquivalentCodeMut ← false
27            FcAssert_i = Diff_{FcExit,MutFcExit}
28            FcAsserts_{FcEntry} = FcAssert_i
29  return {FcAsserts, DomAsserts}

naive approach would be to compare the DOM tree in its entirety, after the event execution. Not only is this approach inefficient, it results in brittle test cases, i.e., the smallest update on the user interface can break the test case. We propose an alternative approach that utilizes DOM mutation testing to detect and selectively compare only those DOM elements and attributes that are affected by an injected fault at the DOM-level of the application. Our DOM mutations target only the elements that have been accessed (read/written) during execution, and thus have a larger impact on the application's behaviour. To select proper DOM elements for mutation, we instrument JavaScript functions that interact with the DOM, i.e., code that either accesses or modifies DOM elements.

We execute the instrumented application by running the generated Selenium test cases and record each accessed DOM element, its attributes, the triggered event on the DOM state, and the DOM state after the event is triggered (GETONEVDOMST in line 4, GETAFTEREVDOMST in line 5, and GETACCDDOMNDS in line 6 to retrieve the original DOM state, DOM state after event Ev is triggered, and the accessed DOM properties as event Ev is triggered, respectively, in Algorithm 2). To perform the actual mutation, as the application is re-executed using the same sequence of events, we mutate the recorded DOM elements, one at a time, before the corresponding event is fired. MUTATEDOM in line 9 mutates the DOM elements, and EvSeq.EXECEVENT in line 10 executes the event sequence on the mutated DOM. The mutation operators include (1) deleting a DOM element, and (2) changing the attribute, accessed during the original execution. As we mutate the DOM, we collect the current state of DOM elements and attributes.

Figure 3 shows part of a DOM-level test case generated for the running example. Going back to our running example, as a result of clicking on $('div #divElem') in our previously obtained event sequence ($('#cell0').click→$('div #divElem').click→$('#startCell')), the
height and width properties of DOM element with
ID endCell, and the DOM element with ID startCell
1) A mutant is created by injecting a single fault into the are accessed. One possible DOM mutation is altering the
original version of the web application (Line 9 and 19 in width value of the endCell element before click on
Algorithm 2 for DOM mutation and code-level mutation, $(‘div #divElem’) happens. We log the consequences
respectively), of this modification after the click event on $(‘div
2) Related entry/exit program states at the DOM and Java- #divElem’) as well as the remaining events. This mutation
Script function levels of the mutant and the original affects the height property of DOM element with ID
version are captured. OnEvDomSt in Line 4 is the endCell in the resulting DOM state from clicking on
original DOM state on which the event Ev is triggered, $(‘div #divElem’). Line 6 in Figure 3 shows the
Af terEvDomSt in line 5 is the observed DOM state corresponding assertion. Furthermore, Assuming that the
after the event is triggered, M utDom in line 9 is the DOM mutation makes currentDim≤ 40 in line 27, after
mutated DOM, and ChangedSt in line 10 is the corre- click on element #startCell happens, the element is
sponding affected state for DOM mutations. F cExit in removed and no longer exists in the resulting DOM state. The
Line 22 is the exit state of the function in the original ap- generated assertion is shown in line 10 of Figure 3.
plication and M utF cExit in line 23 is the corresponding
exit state for that function after the application is mutated Hence, we obtain two sets of execution traces that contain
for function-level mutations. information about the state of DOM elements for each fired
3) Relevant observed state differences at each level are event in the original and mutated application. By compar-
detected and abstracted into test oracles (D IFF in Line 11 ing these two traces (D IFF in line 11 in Algorithm 2), we
and 24 for DOM and function-level oracles, respectively), identify all changed DOM elements and generate assertions
4) The generated assertions (Lines 15 and 28) are injected for these elements. Note that any changes detected by the
into the corresponding test cases. D IFF operator (line 12 in Algorithm 2) is an indication
that the corresponding DOM mutation is not equivalent (line
DOM-level event-based test oracles. After an event is trig- 13); if no change is detected, another DOM mutation is
gered in the generated S ELENIUM test case, the resulting DOM generated. We automatically place the generated assertion
state needs to be compared against the expected structure. One immediately after the corresponding line of code that executed
the event, in the generated event-based (SELENIUM) test case. DomAsserts_{Ev,AfterEvDomSt} in line 15 contains all DOM assertions for the state AfterEvDomSt and the triggered event Ev.

 1  @Test
 2  public void testCase1(){
 3    WebElement divElem=driver.findElement(By.id("divElem"));
 4    divElem.click();
 5    int endCellHeight=driver.findElement(By.id("endCell")).getSize().height;
 6    assertEquals(endCellHeight, 30);
 7    WebElement startCell=driver.findElement(By.id("startCell"));
 8    startCell.click();
 9    boolean exists=driver.findElements(By.id("startCell")).size()!=0;
10    assertTrue(exists);
11    int startCellHeight=driver.findElement(By.id("startCell")).getSize().height;
12    assertEquals(startCellHeight, 50);
13  }

Fig. 3. Generated SELENIUM test case.

    test("Testing setDim",4,function(){
      var fixture = $("#qunit-fixture");
      fixture.append("<button id=\"cell0\"> <div id=\"divElem\"/> </button> <div id=\"endCell\" style=\"height:200px;width:100px;\"/>");
      var currentDim=20;
      var result= setDim(10);
      equal(result, 30);
      equal(currentDim, 50);
      ok($("#endCell").length > 0);
      equal($("#endCell").css('height'), 30);
    });

Fig. 4. Generated QUNIT test case.

Function-level test oracles. To seed code-level faults, we use our recently developed JavaScript mutation testing tool, MUTANDIS [22]. Mutations generated by MUTANDIS are selected through a function rank metric, which ranks functions in terms of their relative importance from the application's behaviour point of view. The mutation operators are chosen from a list of common operators, such as changing the value of a variable or modifying a conditional statement. Once a mutant is produced (MUTATEJSCODE in line 19), it is automatically instrumented. We collect a new execution trace from the mutated program by executing the same sequence of events that was used on the original version of the application. This way, the state of each JavaScript function is extracted at its entry and exit points. AbsFcSts.GETFCENTRIES in line 21 retrieves the function's entries from the abstracted function's states. GETFCEXIT in line 22 and GETMUTFCEXIT in line 23 retrieve the corresponding function's exit state in the original and mutated application. This process is similar to the function state extraction algorithm explained in Section IV-B.

After the execution traces are collected for all the generated mutants, we generate function-level test oracles by comparing the execution trace of the original application with the traces we obtained from the modified versions (DIFF in line 24 in Algorithm 2). If the DIFF operator detects no changes (line 25 of the algorithm), an equivalent mutant is detected, and thus another mutant will be generated.

Our function-level oracle generation targets postcondition assertions. Such postcondition assertions can be used to examine the expected behaviour of a given function after it is executed in a unit test case. Our technique generates postcondition assertions for all functions that exhibit a different exit-point state but the same entry-point state, in the mutated execution traces. FcAssert_i in line 27 contains all such postcondition assertions. Due to the dynamic and asynchronous behaviour of JavaScript applications, a function with the same entry state can exhibit different outputs when called multiple times. In this case, we need to combine assertions to make sure that the generated test cases do not mistakenly fail. FcAsserts_{FcEntry} in line 28 contains the union of function assertions generated for the same entry but different outputs during multiple executions. Let's consider a function f with an entry state entry in the original version of the application (A), with two different exit states exit1 and exit2. If in the mutated version of the application (Am), f exhibits an exit state exitm that is different from both exit1 and exit2, then we combine the resulting assertions as follows: assert1(exit1, expRes1) || assert2(exit2, expRes2), where the expected values expRes1 and expRes2 are obtained from the execution trace of A.

Each assertion for a function contains (1) the function's returned value, (2) the used global variables in that function, and/or (3) the accessed DOM element in that function. Each assertion is coupled with the expected value obtained from the execution trace of the original version.

The generated assertions that target variables compare the value as well as the runtime type against the expected ones. An oracle that targets a DOM element first checks the existence of that DOM element. If the element exists, it checks the attributes of the element by comparing them against the observed values in the original execution trace. Assuming that width and height are 100 and 200 respectively in Figure 1, and the '+' sign is mutated to '-' in line 20 of the running example in Figure 1, the mutation affects the global variable currentDim, the height property of the element with ID endCell, and the returned value of the function setDim. Figure 4 shows a QUNIT test case for the setDim function according to this mutation with the generated assertions.

D. Tool Implementation

We have implemented our JavaScript test and oracle generation approach in an automated tool called JSEFT. The tool is written in Java and is publicly available for download [8]. Our implementation requires no browser modifications, and is hence portable. For JavaScript code interception, we use a web proxy, which enables us to automatically instrument JavaScript code before it reaches the browser. The crawler for JSEFT extends and builds on top of the event-based crawler, CRAWLJAX [9], with random input generation enabled for form inputs. As mentioned before, to mutate JavaScript code, we use our recently developed mutation testing tool, MUTANDIS [22]. The upper bound for the number of mutations can be specified by the user; the default is 50 for code-level and 20 for DOM-level mutations. We observed that these default numbers provide a balanced trade-off between oracle generation time and the fault finding capability of the tool. DOM-level test cases are generated in a JUNIT format that uses SELENIUM (WebDriver) APIs to fire events on the application's DOM inside the browser. JavaScript function-level tests are generated in the QUNIT unit testing framework [19], capable of testing any generic JavaScript code.

V. EMPIRICAL EVALUATION

To quantitatively assess the efficacy of our test generation approach, we have conducted an empirical study, in which we address the following research questions:
RQ1 How effective is JSEFT in generating test cases with high coverage?
RQ2 How capable is JSEFT of generating test oracles that detect regression faults?
RQ3 How does JSEFT compare to existing automated JavaScript testing frameworks?

JSEFT and all our experimental data in this paper are available for download [8].

TABLE I. CHARACTERISTICS OF THE EXPERIMENTAL OBJECTS.

ID   Name        LOC      URL
1    SameGame    206      crawljax.com/same-game/
2    Tunnel      334      arcade.christianmontoya.com/tunnel/
...
11   WymEditor   3,035    http://www.wymeditor.org
12   TuduList    1,963    http://tudu.ess.ch/tudu
13   TinyMCE     26,908   http://www.tinymce.com

TABLE II. RESULTS SHOWING THE EFFECTS OF OUR FUNCTION COVERAGE MAXIMIZATION, FUNCTION STATE ABSTRACTION, AND MUTATION-BASED ORACLE GENERATION ALGORITHMS.

App ID | St. Coverage | State Abstraction  | Oracles
1      | 99     80    | 447    33    93    | 5101     136
2      | 78     78    | 828    21    97    | 23212    81
3      | 90     66    | 422    14    96    | 3520     45
4      | 75     75    | 43     19    56    | 1232     109
5      | 49     45    | 534    23    95    | 150      79
6      | 78     75    | 797    30    96    | 1648     125
7      | 63     58    | 1653   54    97    | 198202   342
8      | 56     50    | 32     18    43    | 78       51
9      | 82     82    | 1509   49    97    | 65403    253
10     | 71     69    | 71     23    67    | 6584     96
11     | 56     54    | 1383   131   90    | 2530     318
12     | 41     38    | 1530   62    96    | 3521     184
13     | 51     47    | 1401   152   89    | 2481     335
AVG    | 68.4   62.8  | -      -     85.5  | -        -
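As a quick sanity check on Table II, the sketch below recomputes the percentage in the State Abstraction group from the two counts that precede it. It rests on an assumption: the column sub-headers are not legible here, so the three columns are read as the number of function states without abstraction, the number with abstraction, and the resulting reduction in percent. The helper name `reductionPercent` and the `rows` array are illustrative only and are not part of JSEFT.

```javascript
// Hypothetical helper: share of function states removed by abstraction, in percent.
function reductionPercent(statesWithout, statesWith) {
  return ((statesWithout - statesWith) / statesWithout) * 100;
}

// [app ID, states without abstraction, states with abstraction, reported reduction (%)]
// Values copied from the State Abstraction columns of Table II.
// The meaning assigned to each column is an assumption.
const rows = [
  [1, 447, 33, 93],
  [2, 828, 21, 97],
  [3, 422, 14, 96],
  [4, 43, 19, 56],
  [5, 534, 23, 95],
  [6, 797, 30, 96],
  [7, 1653, 54, 97],
  [8, 32, 18, 43],
  [9, 1509, 49, 97],
  [10, 71, 23, 67],
  [11, 1383, 131, 90],
  [12, 1530, 62, 96],
  [13, 1401, 152, 89],
];

// Recompute the reduction for every application.
const recomputed = rows.map(([id, without, withAbs, reported]) => ({
  id,
  computed: reductionPercent(without, withAbs),
  reported,
}));

// Mean of the reported per-application percentages (compare with the AVG row).
const avgReported = rows.reduce((sum, row) => sum + row[3], 0) / rows.length;

for (const r of recomputed) {
  console.log(`App ${r.id}: computed ${r.computed.toFixed(1)}%, reported ${r.reported}%`);
}
console.log(`Mean of reported reductions: ${avgReported.toFixed(1)}%`);
```

Under this reading, the recomputed values stay within one percentage point of the reported ones, so small differences are best attributed to rounding rather than to a different formula.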
A. Objects

Our study includes thirteen JavaScript-based applications in total. Table I presents each application's ID, name, lines of custom JavaScript code (LOC, excluding JavaScript libraries) and resource. The first five are web-based games. AjaxTabs is a JQUERY plugin for creating tabs. NarrowDesign and JointLondon are websites. FractalViewer is a fractal tree zoom application. SimpleCart is a shopping cart library, WymEditor is a web-based HTML editor, TuduList is a web-based task management application, and TinyMCE is a JavaScript-based WYSIWYG editor control. The applications range from 206 to 27K lines of JavaScript code.

The experimental objects are open-source and cover different application types. All the applications are interactive in nature and extensively use JavaScript on the client-side.

B. Setup

To address our research questions, we provide the URL of each experimental object to JSEFT. Test cases are then automatically generated by JSEFT. We give JSEFT 10 minutes in total for each application. Five minutes of the total time are designated for the dynamic exploration step.

Test Case Generation (RQ1). To measure client-side code coverage, we use JSCOVER [23], an open-source tool for measuring JavaScript code coverage. We report the average results over five runs to account for the non-deterministic behaviour that stems from crawling the application. In addition, we assess each step in our approach separately as follows: (1) compare the statement coverage achieved by our function coverage maximization with a method that chooses the next state/event for the expansion uniformly at random, (2) assess the efficacy of our function state abstraction method (Algorithm 1), and (3) evaluate the effectiveness of applying mutation techniques (Algorithm 2) to reduce the number of assertions generated.

Test Oracles (RQ2). To evaluate the fault finding capability of JSEFT (RQ2), we simulate web application faults by automatically seeding each application with 50 random faults. We automatically pick a random program point and seed a fault at that point according to our fault category. While mutations used for oracle generation have been selectively generated (as discussed in Section IV-C), mutations used for the purpose of evaluation are randomly generated from the entire application. Note that if the mutation used for the purpose of evaluation and the mutation used for generating oracles happen to be the same, we remove the mutant from the evaluation set. Next we run the whole generated test suite (including both function-level and event-based test cases) on the faulty version of the application. The fault is considered detected if an assertion generated by JSEFT fails and our manual examination confirms that the failed assertion is detecting the seeded fault. We measure the precision and recall as follows:

Precision is the rate of injected faults found by the tool that are actual faults: $\frac{TP}{TP+FP}$

Recall is the rate of actual injected faults that the tool finds: $\frac{TP}{TP+FN}$

where TP (true positives), FP (false positives), and FN (false negatives) respectively represent the number of faults that are correctly detected, falsely reported, and missed.

Comparison (RQ3). To assess how JSEFT performs with respect to existing JavaScript test automation tools, we compare its coverage and fault finding capability to that of ARTEMIS [2]. Similar to JSEFT, we give ARTEMIS 10 minutes in total for each application; we observed no improvements in the results obtained from running ARTEMIS for longer periods of time. We run ARTEMIS from the command line by setting the iteration option to 100 and enabling the coverage priority strategy, as described in [2]. Similarly, JSCOVER is used to measure the coverage of ARTEMIS (over 5 runs). We use the output provided by ARTEMIS to determine if the seeded mutations are detected by the tool, by following the same procedure as described above for JSEFT.

C. Results

Test Case Generation (RQ1). Figure 5 depicts the statement coverage achieved by JSEFT for each application. The results show that the test cases generated by JSEFT achieve a coverage of 68.4% on average, ranging from 41% (ID 12) up to 99% (ID
TABLE III. FAULT DETECTION.

       |            | JSEFT                                       |      | ARTEMIS
App ID | # Injected | #FN  #FP  #TP  Precision (%)  Recall (%)    | ?    | Precision (%)  Recall (%)
       | Faults     |                                             |      |
1      | 50         | 0    0    50   100            100           | 30   | 100            20
2      | 50         | 9    0    41   100            82            | 73   | 100            12
3      | 50         | 4    0    46   100            92            | 17   | 100            8
4      | 50         | 15   0    35   100            70            | 28   | 100            22
5      | 50         | 26   0    24   100            48            | 25   | 100            0
6      | 50         | 9    0    41   100            82            | 15   | 100            16
7      | 50         | 17   0    33   100            66            | 24   | 100            0
8      | 50         | 23   0    27   100            54            | 26   | 100            0
9      | 50         | 6    0    44   100            88            | 41   | 100            24
10     | 50         | 16   0    34   100            68            | 65   | 100            8
11     | 50         | 21   0    29   100            58            | 27   | 100            6
12     | 50         | 26   0    24   100            48            | 17   | 100            22
13     | 50         | 23   0    27   100            54            | 26   | 100            28
AVG    | -          | 15   0    35   100            70            | 32   | 100            12.8

[Figure: bar chart, y-axis "Coverage (%)", legend: JSeft, Artemis]