
JSEFT: Automated JavaScript Unit Test Generation

Shabnam Mirshokraie Ali Mesbah Karthik Pattabiraman


University of British Columbia
Vancouver, BC, Canada
{shabnamm, amesbah, karthikp}@ece.ubc.ca

Abstract—The event-driven and highly dynamic nature of JavaScript, as well as its runtime interaction with the Document Object Model (DOM), make it challenging to test JavaScript-based applications. Current web test automation techniques target the generation of event sequences, but they ignore testing the JavaScript code at the unit level. Further, they either ignore the oracle problem completely or simplify it through generic soft oracles such as HTML validation and runtime exceptions. We present a framework to automatically generate test cases for JavaScript applications at two complementary levels, namely events and individual JavaScript functions. Our approach employs a combination of function coverage maximization and function state abstraction algorithms to efficiently generate test cases. In addition, these test cases are strengthened by automatically generated mutation-based oracles. We empirically evaluate the implementation of our approach, called JSEFT, to assess its efficacy. The results, on 13 JavaScript-based applications, show that the generated test cases achieve a coverage of 68% and that JSEFT can detect injected JavaScript and DOM faults with a high accuracy (100% precision, 70% recall). We also find that JSEFT outperforms an existing JavaScript test automation framework both in terms of coverage and detected faults.

Keywords—Test generation; oracles; JavaScript; DOM

I. INTRODUCTION

JavaScript plays a prominent role in modern web applications. To test their JavaScript applications, developers often write test cases using web testing frameworks such as SELENIUM (GUI tests) and QUNIT (JavaScript unit tests). Although such frameworks help to automate test execution, the test cases still need to be written manually, which is tedious and time-consuming.

Further, the event-driven and highly dynamic nature of JavaScript, as well as its runtime interaction with the Document Object Model (DOM), make JavaScript applications error-prone [1] and difficult to test.

Researchers have recently developed automated test generation techniques for JavaScript-based applications [2], [3], [4], [5], [6]. However, current web test generation techniques suffer from two main shortcomings, namely, they:

1) Target the generation of event sequences, which operate at the event-level or DOM-level to cover the state space of the application. These techniques fail to capture faults that do not propagate to an observable DOM state. As such, they potentially miss this portion of code-level JavaScript faults. In order to capture such faults, effective test generation techniques need to target the code at the JavaScript unit-level, in addition to the event-level.

2) Either ignore the oracle problem altogether or simplify it through generic soft oracles, such as W3C HTML validation [2], [5], or JavaScript runtime exceptions [2]. A generated test case without assertions is not useful since coverage alone is not the goal of software testing. For such generated test cases, the tester still needs to manually write many assertions, which is time and effort intensive. On the other hand, soft oracles target generic fault types and are limited in their fault finding capabilities. However, to be practically useful, unit testing requires strong oracles to determine whether the application under test executes correctly.

To address these two shortcomings, we propose an automated test case generation technique for JavaScript applications. Our approach, called JSEFT (JavaScript Event and Function Testing), operates through a three-step process. First, it dynamically explores the event space of the application using a function coverage maximization method, to infer a test model. Then, it generates test cases at two complementary levels, namely, DOM events and JavaScript functions. Our technique employs a novel function state abstraction algorithm to minimize the number of function-level states needed for test generation. Finally, it automatically generates test oracles through a mutation-based algorithm.

A preliminary version of this work appeared in a short New Ideas paper [7]. In this current paper, we present the complete technique with conceptually significant improvements, including detailed new algorithms (Algorithms 1–2), a fully functional tool implementation, and a thorough empirical analysis on 13 JavaScript applications, providing evidence of the efficacy of the approach.

This work makes the following main contributions:

• An automatic technique to generate test cases for JavaScript functions and events;
• A combination of function coverage maximization and function state abstraction algorithms to efficiently generate unit test cases;
• A mutation-based algorithm to effectively generate test oracles, capable of detecting regression JavaScript and DOM-level faults;
• The implementation of our technique in a tool called JSEFT, which is publicly available [8];
• An empirical evaluation to assess the efficacy of JSEFT using 13 JavaScript applications.

The results of our evaluation show that on average (1) the test suite generated by JSEFT achieves a 68% JavaScript code coverage, (2) compared to ARTEMIS, a feedback-directed JavaScript testing framework [2], JSEFT achieves 53% better coverage, and (3) the test oracles generated by JSEFT are able to detect injected faults with 100% precision and 70% recall.

II. RELATED WORK

Web application testing. Marchetto and Tonella [3] propose a search-based algorithm for generating event-based sequences
to test Ajax applications. Mesbah et al. [9] apply dynamic analysis to construct a model of the application's state space, from which event-based test cases are automatically generated. In subsequent work [5], they propose generic and application-specific invariants as a form of automated soft oracles for testing AJAX applications. Our earlier work, JSART [10], automatically infers program invariants from JavaScript execution traces and uses them as regression assertions in the code. Sen et al. [11] recently proposed a record and replay framework called Jalangi. It incorporates selective record-replay as well as shadow values and shadow execution to enable writing of heavy-weight dynamic analyses. The framework is able to track generic faults such as null and undefined values as well as type inconsistencies in JavaScript. Jensen et al. [12] propose a technique to test the correctness of communication patterns between client and server in AJAX applications by incorporating server interface descriptions. They construct server interface descriptions through an inference technique that can learn communication patterns from sample data. Saxena et al. [6] combine random test generation with the use of symbolic execution for systematically exploring a JavaScript application's event space as well as its value space, for security testing. Our work is different in two main aspects from these: (1) they all target the generation of event sequences at the DOM level, while we also generate unit tests at the JavaScript code level, which enables us to cover more and find more faults, and (2) they do not address the problem of test oracle generation and only check against soft oracles (e.g., invalid HTML). In contrast, we generate strong oracles that capture application behaviours, and can detect a much wider range of faults.

 1  var currentDim = 20;
 2  function cellClicked() {
 3    var divTag = '<div id="divElem" />';
 4    if ($(this).attr('id') == 'cell0') {
 5      $('#cell0').after(divTag);
 6      $('div #divElem').click(setup);
 7    }
 8    else if ($(this).attr('id') == 'cell1') {
 9      $('#cell1').after(divTag);
10      $('div #divElem').click(function() { setDim(20); });
11    }
12  }

14  function setup() {
15    setDim(10);
16    $('#startCell').click(start);
17  }

19  function setDim(dimension) {
20    var dim = ($('#endCell').width() + $('#endCell').height()) / dimension;
21    currentDim += dim;
22    $('#endCell').css('height', dim + 'px');
23    return dim;
24  }

26  function start() {
27    if (currentDim > 40)
28      $(this).css('height', currentDim + 'px');
29    else $(this).remove();
30  }

32  $(document).ready(function() {
33    ...
34    $('#cell0').click(cellClicked);
35    $('#cell1').click(cellClicked);
36  });

Fig. 1. JavaScript code of the running example.
Perhaps the most closely related work to ours is ARTEMIS [2], which supports automated testing of JavaScript applications. ARTEMIS considers the event-driven execution model of a JavaScript application for feedback-directed testing. In this paper, we quantitatively compare our approach with that of ARTEMIS (Section V).

Oracle generation. There has been limited work on oracle generation for testing. Fraser et al. [13] propose µTEST, which employs a mutant-based oracle generation technique. It automatically generates unit tests for Java object-oriented classes by using a genetic algorithm to target mutations with high impact on the application's behaviour. They further identify [14] relevant pre-conditions on the test inputs and post-conditions on the outputs to ease human comprehension. Differential test case generation approaches [15], [16] are similar to mutation-based techniques in that they aim to generate test cases that show the difference between two versions of a program. However, mutation-based techniques such as ours do not require two different versions of the application. Rather, the generated differences are in the form of controllable mutations that can be used to generate test cases capable of detecting regression faults in future versions of the program. Staats et al. [17] address the problem of selecting oracle data, which is formed as a subset of internal state variables as well as outputs for which the expected values are determined. They apply mutation testing to produce oracles and rank the inferred oracles in terms of their fault finding capability. This work is different from ours in that they merely focus on supporting the creation of test oracles by the programmer, rather than fully automating the process of test case generation. Further, (1) they do not target JavaScript; (2) in addition to the code-level mutation analysis, we propose DOM-related mutations to capture error-prone [1] dynamic interactions of JavaScript with the DOM.

III. CHALLENGES AND MOTIVATION

In this section, we illustrate some of the challenges associated with test generation for JavaScript applications.

Figure 1 presents a snippet of a JavaScript game application that we use as a running example throughout the paper. This simple example uses the popular jQuery library [18] and contains four main JavaScript functions:

1) cellClicked is bound to the event-handlers of DOM elements with IDs cell0 and cell1 (Lines 34–35). These two DOM elements become available when the DOM is fully loaded (Line 32). Depending on the element clicked, cellClicked inserts a div element with ID divElem (Line 3) after the clicked element and makes it clickable by attaching either setup or setDim as its event-handler function (Lines 5–6, 9–10).

2) setup calls setDim (Line 15) to change the value of the global variable currentDim. It further makes an element with ID startCell clickable by setting its event-handler to start (Line 16).

3) setDim receives an input variable. It performs some computations to set the height value of the css property of a DOM element with ID endCell and the value of currentDim (Lines 20–22). It also returns the computed dimension.

4) start is called at runtime when the element with ID startCell is clicked (Line 16), which either updates the width dimension of the element on which it was called, or removes the element (Lines 27–29).

There are four main challenges in testing JavaScript applications.
The first challenge is that a fault may not immediately propagate into a DOM-level observable failure. For example, if the '+' sign in Line 21 is mistakenly replaced by '-', the affected result does not immediately propagate to the observable DOM state after the function exits. While this mistakenly changes the value of a global variable, currentDim, which is later used in start (Line 27), it neither affects the returned value of the setDim function nor the css value of element endCell. Therefore, a GUI-level event-based testing approach may not help to detect the fault in this case.

The second challenge is related to fault localization; even if the fault propagates to a future DOM state and a DOM-level test case detects it, finding the actual location of the fault is challenging for the tester as the DOM-level test case is agnostic of the JavaScript code. However, a unit test case that targets individual functions, e.g., setDim in this running example, helps a tester to spot the fault, and thus easily resolve it.
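This divergence can be reproduced outside the browser with a small plain-JavaScript sketch (ours, not part of the original example): the jQuery reads and writes of setDim are replaced by fields on a plain state object, where endCellWidth, endCellHeight, and endCellCssHeight are hypothetical stand-ins for the DOM accesses in Lines 20–22:

```javascript
// Plain-JavaScript sketch of the running example's setDim (Fig. 1), with
// DOM reads/writes replaced by fields on a plain state object.
// sign = +1 is the correct version (Line 21); sign = -1 is the injected fault.
function makeSetDim(sign) {
  return function setDim(state, dimension) {
    var dim = (state.endCellWidth + state.endCellHeight) / dimension; // Line 20
    state.currentDim += sign * dim;                                   // Line 21
    state.endCellCssHeight = dim + 'px';                              // Line 22
    return dim;                                                       // Line 23
  };
}

var correct = makeSetDim(+1);
var faulty = makeSetDim(-1);
var s1 = { currentDim: 20, endCellWidth: 50, endCellHeight: 50 };
var s2 = { currentDim: 20, endCellWidth: 50, endCellHeight: 50 };

// Identical return value and identical DOM write, so an event-level oracle
// observing #endCell right after the event sees no difference...
console.log(correct(s1, 10) === faulty(s2, 10));          // true
console.log(s1.endCellCssHeight === s2.endCellCssHeight); // true

// ...but the global state has silently diverged; only a later event
// (start's currentDim > 40 check) or a function-level oracle exposes it.
console.log(s1.currentDim, s2.currentDim);                // 30 10
```

The faulty run is indistinguishable at the GUI level until a later event happens to read currentDim, which is exactly why function-level assertions on entry/exit states are needed.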
The third challenge pertains to the event-driven dynamic nature of JavaScript, and its extensive interaction with the DOM, resulting in many state permutations and execution paths. In the initial state of the example, clicking on cell0 or cell1 takes the browser to two different states as a result of the if-else statement in Lines 4 and 8 of the function cellClicked. Even in this simple example, expanding either of the resulting states has different consequences due to different functions that can be potentially triggered. Executing either setup or setDim in Lines 6 and 10 results in different execution paths, DOM states, and code coverage. It is this dynamic interaction of the JavaScript code with the DOM (and indirectly CSS) at runtime that makes it challenging to generate test cases for JavaScript applications.

The fourth important challenge in unit testing JavaScript functions that have DOM interactions, such as setDim, is that the DOM tree in the state expected by the function has to be present during unit test execution. Otherwise, the test will fail due to a null or undefined exception. This situation arises often in modern web applications that have many DOM interactions.

IV. APPROACH

Our main goal in this work is to generate client-side test cases coupled with effective test oracles, capable of detecting regression JavaScript and DOM-level faults. Further, we aim to achieve this goal as efficiently as possible. Hence, we make two design decisions. First, we assume that there is a finite amount of time available to generate test cases. Consequently, we guide the test generation to maximize coverage under a given time constraint. The second decision is to minimize the number of test cases and oracles generated to only include those that are essential in detecting potential faults. Consequently, to examine the correctness of the test suite generated, the tester would only need to examine a small set of assertions, which minimizes their effort.

Our approach generates test cases and oracles at two complementary levels:

DOM-level event-based tests consist of DOM-level event sequences and assertions to check the application's behaviour from an end-user's perspective.

Function-level unit tests consist of unit tests with assertions that verify the functionality of JavaScript code at the function level.

Fig. 2. Overview of our test generation approach. (Flow diagram not reproduced: crawl and instrument the application to infer the SFG, derive event-based tests and function-level unit tests from collected traces, and diff original against mutated runs to obtain DOM and function oracles.)

An overview of the technique is depicted in Figure 2. At a high level, our approach is composed of three main steps:

1) In the first step (Section IV-A), we dynamically explore various states of a given web application, in such a way as to maximize the number of functions that are covered throughout the program execution. The output of this initial step is a state-flow graph (SFG) [5], capturing the explored dynamic DOM states and event-based transitions between them.

2) In the second step (Section IV-B), we use the inferred SFG to generate event-based test cases. We run the generated tests against an instrumented version of the application. From the execution trace obtained, we extract DOM element states as well as JavaScript function states at the entry and exit points, from which we generate function-level unit tests. To reduce the number of generated test cases to only those that are constructive, we devise a state abstraction algorithm that minimizes the number of states by selecting representative function states.

3) To create effective test oracles for the two test case levels, we automatically generate mutated versions of the application (Section IV-C). Assuming that the original version of the application is fault-free, the test oracles are then generated at the DOM and JavaScript code levels by comparing the states traced from the original and the mutated versions.

A. Maximizing Function Coverage

In this step, our goal is to maximize the number of functions that can be covered, while exercising the program's event space. To that end, our approach combines static and dynamic analysis to decide which state and event(s) should be selected for expansion to maximize the probability of covering uncovered JavaScript functions. While exploring the web application under test, our function coverage maximization algorithm selects a next state for exploration, which has the maximum value of the sum of the following two metrics:
1. Potential Uncovered Functions. This pertains to the total number of unexecuted functions that can potentially be visited through the execution of DOM events in a given DOM state si. When a given function fi is set as the event-handler of a DOM element d ∈ si, it makes the element a potential clickable element in si. This can be achieved through various patterns in web applications depending on which DOM event model level is adopted. To calculate this metric, our algorithm identifies all JavaScript functions that are directly or indirectly attached to DOM elements as event-handlers in si, through code instrumentation and execution trace monitoring.

2. Potential Clickable Elements. The second metric, used to select a state for expansion, pertains to the number of DOM elements that can potentially become clickable elements. If the event-handlers bound to those clickables are triggered, new (uncovered) functions will be executed. To obtain this number, we statically analyze the previously obtained potential uncovered functions within a given state in search of such elements.

While exploring the application, the next state for expansion is selected by adding the two metrics and choosing the state with the highest sum. The procedure repeats the aforementioned steps until the designated time limit or state space size is reached.

In the running example of Figure 1, in the initial state, clicking on elements with IDs cell0 and cell1 results in two different states due to the if-else statement in Lines 4 and 8 of cellClicked. Let's call the state in which a DIV element is located after the element with ID cell0 s0, and the state in which a DIV element is placed after the element with ID cell1 s1. If state s0, with the clickable cell0, is chosen for expansion, function setup is called. As shown in Line 15, setup calls setDim, and thus, by expanding s0 both of the aforementioned functions get called by a single click. Moreover, a potential clickable element is also created in Line 16, with start as the event-handler. Therefore, expanding s1 results only in the execution of setDim, while expanding s0 results in the execution of functions setup and setDim, and a potential execution of start in future states. At the end of this step, we obtain a state-flow graph of the application that can be used in the next test generation step.

B. Generating Test Cases

In the second step, our technique first extracts sequences of events from the inferred state-flow graph. These sequences of events are used in our test case generation process. We generate test cases at two complementary levels, as described below.

DOM-level event-based testing. To verify the behaviour of the application at the user interface level, each event path, taken from the initial state (Index) to a leaf node in the state-flow graph, is used to generate DOM event-based test cases. Each extracted path is converted into a JUNIT SELENIUM-based test case, which executes the sequence of events, starting from the initial DOM state. Going back to our running example, one possible event sequence to generate is: $('#cell0').click → $('div #divElem').click → $('#startCell').click.

To collect the required trace data, we capture all DOM elements and their attributes after each event in the test path is fired. This trace is later used in our DOM oracle comparison, as explained in Section IV-C.

JavaScript function-level unit testing. To generate unit tests that target JavaScript functions directly (as opposed to event-triggered function executions), we log the state of each function at its entry and exit points, during execution. To that end, we instrument the code to trace various entities. At the entry point of a given JavaScript function we collect (1) function parameters, including passed variables, objects, functions, and DOM elements, (2) global variables used in the function, and (3) the current DOM structure just before the function is executed. At the exit point of the JavaScript function and before every return statement, we log the state of the (1) return value of the function, (2) global variables that have been accessed in that function, and (3) DOM elements accessed (read/written) in the function. At each of the above points, our instrumentation records the name, runtime type, and actual values. The dynamic type is stored because JavaScript is a dynamically typed language, meaning that variable types cannot be determined statically. Note that complex JavaScript objects can contain circular or multiple references (e.g., in JSON format). To handle such cases, we perform a de-serialization process in which we replace such references by an object in the form of $ref: Path, where Path denotes a JSONPath string¹ that indicates the target path of the reference.

¹ http://goessner.net/articles/JsonPath/

In addition to function entry and exit points, we log information required for calling the function from the generated test cases. JavaScript functions that are accessible in the public scope are mainly defined in (1) the global scope directly (e.g., function f(){...}), (2) variable assignments in the global scope (e.g., var f = function(){...}), (3) constructor functions (e.g., function Constructor() {this.member = function(){...}}), and (4) prototypes (e.g., Constructor.prototype.f = function() {...}). Functions in the first and second case are easy to call from test cases. For the third case, the constructor function is called via the new operator to create an object type, which can be used to access the object's properties (e.g., container = new Constructor(); container.member();). This allows us to access the inner function, which is a member of the constructor function in the above example. For the prototype case, the function can be invoked through container.f() from a test case.

Going back to our running example in Figure 1, at the entry point of setDim, we log the value and type of both the input parameter dimension and the global variable currentDim, which is accessed in the function. Similarly, at the exit point, we log the values and types of the returned variable dim and currentDim.

In addition to the values logged above, we need to capture the DOM state for functions that interact with the DOM. This is to address the fourth challenge outlined in Section III. To mitigate this problem, we capture the state of the DOM just before the function starts its execution, and include that as a test fixture [19] in the generated unit test case.

In the running example, at the entry point of setDim, we log the innerHTML of the current DOM as the function contains several calls to the DOM, e.g., retrieving the element with ID endCell in Line 22. We further include in our execution trace the way DOM elements and their attributes are modified by the JavaScript function at runtime. The information that we
log for accessed DOM elements includes the ID attribute, the XPath position of the element on the DOM tree, and all the modified attributes. Collecting this information is essential for oracle generation in the next step. We use a set to keep the information about DOM modifications, so that we can record the latest changes to a DOM element without any duplication within the function. For instance, we record the ID as well as both the width and height properties of the endCell element.

Once our instrumentation is carried out, we run the generated event sequences obtained from the state-flow graph. This way, we produce an execution trace that contains:

• Information required for preparing the environment for each function to be executed in a test case, including its input parameters, used global variables, and the DOM tree in a state that is expected by the function;
• Necessary entities that need to be assessed after the function is executed, including the function's output as well as the touched DOM elements and their attributes (the actual assessment process is explained in Section IV-C).

Function State Abstraction. As mentioned in Section III, the highly dynamic nature of JavaScript applications can result in a huge number of function states. Capturing all these different states can potentially hinder the technique's scalability for large applications. In addition, generating too many test cases can negatively affect test suite comprehension. We apply a function state abstraction method to minimize the number of function-level states needed for test generation.

Our abstraction method is based on classification of function (entry/exit) states according to their impact on the function's behaviour, in terms of covered branches within the function, the function's return value type, and characteristics of the accessed DOM elements.

Branch coverage: Taking different branches in a given function can change its behaviour. Thus, function entry states that result in a different covered branch should be taken into account while generating test cases. Going back to our example in Figure 1, executing either of the branches in Lines 27 and 29 clearly takes the application into a different DOM state. In this example, we need to include the states of the start function that result in different covered branches, e.g., two different function states where the value of the global variable currentDim at the entry point falls into different boundaries.

Return value type: A variable's type can change in JavaScript at runtime. This can result in changes in the expected outcome of the function. Going back to our example, if dim is mistakenly assigned a string value before adding it to currentDim (Line 21) in function setDim, the returned value of the function becomes the string concatenation of the two values rather than the expected numerical addition.

Accessed DOM properties: DOM elements and their properties accessed in a function can be seen as entry states. Changes in such DOM entry states can affect the behaviour of the function. For example, in Line 29 the this keyword refers to the clicked DOM element of which function start is an event-handler. Assuming that currentDim ≤ 40, depending on which DOM element is clicked, by removing the element in Line 29 the resulting state of the function start differs. Therefore, we take into consideration the DOM elements accessed by the function as well as the type of accessed DOM properties.

Algorithm 1: Function State Abstraction
  input : The set of function states sti ∈ STf for a given function f
  output: The obtained abstracted states set AbsStates
  begin
  1     for sti ∈ STf do
  2         L = 1; StSetL ← ∅
  3         if BrnCovLns[sti] ≠ BrnCovLns[StSetl], l = 1..L then
  4             StSetL+1 ← sti
  5             L++
  6         else
  7             StSetl ← sti ∪ StSetl
  8         K = L + 1; StSetK ← ∅
  9         if DomProps[sti] ≠ DomProps[StSetk], k = L+1..K || RetType[sti] ≠ RetType[StSetk], k = L+1..K then
  10            StSetK+1 ← sti
  11            K++
  12        else
  13            StSetk ← stk ∪ StSetk
  14    while StSetK+L ≠ ∅ do
  15        SelectedSt ← SelectMaxSt(sti | sti ∩ StSetj, j = 1..K+L)
  16        AbsStates.add(SelectedSt)
  17        StSetK+L ← StSetK+L − SelectedSt
  18    return AbsStates

Algorithm 1 shows our function state abstraction algorithm. The algorithm first collects covered branches of individual functions per entry state (BrnCovLns[sti] in Line 3). Each function's states exhibiting the same covered branches are categorized under the same set of states (Lines 4 and 7). StSetl corresponds to the set of function states classified according to their covered branches, where l = 1, ..., L and L is the number of current classified sets in the covered-branch category. Similarly, function states with the same accessed DOM characteristics as well as return value type are put into the same set of states (Lines 10 and 13). StSetk corresponds to the set of function states classified according to their DOM/return value type, where k = 1, ..., K and K is the number of current classified sets in that category. After classifying each function's states into several sets, we cover each set by selecting one of its common states. The state selection step is a set cover problem [20], i.e., given a universe U and a family S of subsets of U, a cover is a subfamily C ⊆ S of sets whose union is U. Sets to be covered in our algorithm are StSet1, ..., StSetK+L, where sti ∈ StSetK+L. We use a common greedy algorithm for obtaining the minimum number of states that can cover all the possible sets (Lines 15–17). Finally, the abstracted list of states is returned in Line 18.

C. Generating Test Oracles

In the third step, our approach automatically generates test oracles for the two levels of test cases generated in the previous step, as depicted in the third step of Figure 2. Instead of randomly generating assertions, our oracle generation uses a mutation-based process.

Mutation testing is typically used to evaluate the quality of a test suite [21], or to generate test cases that kill mutants [13]. In our approach, we adopt mutation testing to (1) reduce the number of assertions automatically generated, and (2) target critical and error-prone portions of the application. Hence, the tester would only need to examine a small set of effective assertions to verify the correctness of the generated oracles. Algorithm 2 shows our algorithm for generating test oracles. At a high level, the technique iteratively executes the following steps:
1) A mutant is created by injecting a single fault into the original version of the web application (lines 9 and 19 in Algorithm 2 for DOM mutation and code-level mutation, respectively),
2) Related entry/exit program states at the DOM and JavaScript function levels of the mutant and the original version are captured. OnEvDomSt in line 4 is the original DOM state on which the event Ev is triggered, AfterEvDomSt in line 5 is the observed DOM state after the event is triggered, MutDom in line 9 is the mutated DOM, and ChangedSt in line 10 is the corresponding affected state for DOM mutations. FcExit in line 22 is the exit state of the function in the original application, and MutFcExit in line 23 is the corresponding exit state for that function after the application is mutated, for function-level mutations.
3) Relevant observed state differences at each level are detected and abstracted into test oracles (DIFF in lines 11 and 24 for DOM and function-level oracles, respectively),
4) The generated assertions (lines 15 and 28) are injected into the corresponding test cases.

Algorithm 2: Oracle Generation
input : A web application (App), list of event sequences obtained from the SFG (EvSeq), maximum number of mutations (n)
output: Assertions for function-level (FcAsserts) and DOM event-level (DomAsserts) tests
begin
 1   App ← INSTRUMENT(App)
 2   while GenMuts < n do
 3     foreach EvSeq ∈ SFG do
 4       OnEvDomSt ← Trace.GETONEVDOMST(Ev ∈ EvSeq)
 5       AfterEvDomSt ← Trace.GETAFTEREVDOMST(Ev ∈ EvSeq)
 6       AccdDomProps ← GETACCDDOMNDS(OnEvDomSt)
 7       EquivalentDomMut ← true
 8       while EquivalentDomMut do
 9         MutDom ← MUTATEDOM(AccdDomProps, OnEvDomSt)
10         ChangedSt ← EvSeq.EXECEVENT(MutDom)
11         Diff_{ChangedSt,AfterEvDomSt} ← DIFF(ChangedSt, AfterEvDomSt)
12         if Diff_{ChangedSt,AfterEvDomSt} ≠ ∅ then
13           EquivalentDomMut ← false
14           DomAssert_i = Diff_{ChangedSt,AfterEvDomSt}
15           DomAsserts_{Ev,AfterEvDomSt} = ∪_i DomAssert_i
16     AbsFcSts ← Trace.GETABSFCSTS()
17     EquivalentCodeMut ← true
18     while EquivalentCodeMut do
19       MutApp ← MUTATEJSCODE(App)
20       MutFcSts ← EvSeq.EXECEVENT(MutApp)
21       foreach FcEntry ∈ AbsFcSts.GETFCENTRIES do
22         FcExit ← AbsFcSts.GETFCEXIT(FcEntry)
23         MutFcExit ← MutFcSts.GETMUTFCEXIT(FcEntry)
24         Diff_{FcExit,MutFcExit} ← DIFF(FcExit, MutFcExit)
25         if Diff_{FcExit,MutFcExit} ≠ ∅ then
26           EquivalentCodeMut ← false
27           FcAssert_i = Diff_{FcExit,MutFcExit}
28           FcAsserts_{FcEntry} = ∪_i FcAssert_i
29   return {FcAsserts, DomAsserts}

DOM-level event-based test oracles. After an event is triggered in the generated Selenium test case, the resulting DOM state needs to be compared against the expected structure. One naive approach would be to compare the DOM tree in its entirety after the event execution. Not only is this approach inefficient, it also results in brittle test cases, i.e., the smallest update on the user interface can break the test case. We propose an alternative approach that utilizes DOM mutation testing to detect and selectively compare only those DOM elements and attributes that are affected by an injected fault at the DOM level of the application. Our DOM mutations target only the elements that have been accessed (read/written) during execution, and thus have a larger impact on the application's behaviour. To select proper DOM elements for mutation, we instrument JavaScript functions that interact with the DOM, i.e., code that either accesses or modifies DOM elements.

We execute the instrumented application by running the generated Selenium test cases and record each accessed DOM element, its attributes, the triggered event on the DOM state, and the DOM state after the event is triggered (GETONEVDOMST in line 4, GETAFTEREVDOMST in line 5, and GETACCDDOMNDS in line 6 of Algorithm 2 retrieve the original DOM state, the DOM state after event Ev is triggered, and the accessed DOM properties as event Ev is triggered, respectively). To perform the actual mutation, as the application is re-executed using the same sequence of events, we mutate the recorded DOM elements, one at a time, before the corresponding event is fired. MUTATEDOM in line 9 mutates the DOM elements, and EvSeq.EXECEVENT in line 10 executes the event sequence on the mutated DOM. The mutation operators include (1) deleting a DOM element, and (2) changing an attribute accessed during the original execution. As we mutate the DOM, we collect the current state of the DOM elements and attributes.

Figure 3 shows part of a DOM-level test case generated for the running example. As a result of clicking on $('div #divElem') in our previously obtained event sequence ($('#cell0').click→$('div #divElem').click→$('#startCell')), the height and width properties of the DOM element with ID endCell, and the DOM element with ID startCell, are accessed. One possible DOM mutation is altering the width value of the endCell element before the click on $('div #divElem') happens. We log the consequences of this modification after the click event on $('div #divElem') as well as the remaining events. This mutation affects the height property of the DOM element with ID endCell in the resulting DOM state from clicking on $('div #divElem'). Line 6 in Figure 3 shows the corresponding assertion. Furthermore, assuming that the DOM mutation makes currentDim ≤ 40 in line 27, after the click on element #startCell happens, the element is removed and no longer exists in the resulting DOM state. The generated assertion is shown in line 10 of Figure 3.

Hence, we obtain two sets of execution traces that contain information about the state of DOM elements for each fired event in the original and mutated application. By comparing these two traces (DIFF in line 11 in Algorithm 2), we identify all changed DOM elements and generate assertions for these elements. Note that any change detected by the DIFF operator (line 12 in Algorithm 2) is an indication that the corresponding DOM mutation is not equivalent (line 13); if no change is detected, another DOM mutation is generated. We automatically place the generated assertion immediately after the corresponding line of code that executed
the event, in the generated event-based (Selenium) test case. DomAsserts_{Ev,AfterEvDomSt} in line 15 contains all DOM assertions for the state AfterEvDomSt and the triggered event Ev.

 1  @Test
 2  public void testCase1(){
 3    WebElement divElem = driver.findElement(By.id("divElem"));
 4    divElem.click();
 5    int endCellHeight = driver.findElement(By.id("endCell")).getSize().height;
 6    assertEquals(endCellHeight, 30);
 7    WebElement startCell = driver.findElement(By.id("startCell"));
 8    startCell.click();
 9    boolean exists = driver.findElements(By.id("startCell")).size() != 0;
10    assertTrue(exists);
11    int startCellHeight = driver.findElement(By.id("startCell")).getSize().height;
12    assertEquals(startCellHeight, 50);
13  }

Fig. 3. Generated Selenium test case.

Function-level test oracles. To seed code-level faults, we use our recently developed JavaScript mutation testing tool, Mutandis [22]. Mutations generated by Mutandis are selected through a function rank metric, which ranks functions in terms of their relative importance from the application's behaviour point of view. The mutation operators are chosen from a list of common operators, such as changing the value of a variable or modifying a conditional statement. Once a mutant is produced (MUTATEJSCODE in line 19), it is automatically instrumented. We collect a new execution trace from the mutated program by executing the same sequence of events that was used on the original version of the application. This way, the state of each JavaScript function is extracted at its entry and exit points. AbsFcSts.GETFCENTRIES in line 21 retrieves the function's entries from the abstracted function states. GETFCEXIT in line 22 and GETMUTFCEXIT in line 23 retrieve the corresponding function's exit state in the original and mutated application, respectively. This process is similar to the function state extraction algorithm explained in Section IV-B.

After the execution traces are collected for all the generated mutants, we generate function-level test oracles by comparing the execution trace of the original application with the traces we obtained from the modified versions (DIFF in line 24 in Algorithm 2). If the DIFF operator detects no changes (line 25 of the algorithm), an equivalent mutant is detected, and thus another mutant will be generated.

Our function-level oracle generation targets postcondition assertions. Such postcondition assertions can be used to examine the expected behaviour of a given function after it is executed in a unit test case. Our technique generates postcondition assertions for all functions that exhibit a different exit-point state but the same entry-point state in the mutated execution traces. FcAssert_i in line 27 contains all such postcondition assertions. Due to the dynamic and asynchronous behaviour of JavaScript applications, a function with the same entry state can exhibit different outputs when called multiple times. In this case, we need to combine assertions to make sure that the generated test cases do not mistakenly fail. FcAsserts_{FcEntry} in line 28 contains the union of function assertions generated for the same entry but different outputs during multiple executions. Let's consider a function f with an entry state entry in the original version of the application (A), with two different exit states exit1 and exit2. If in the mutated version of the application (Am), f exhibits an exit state exitm that is different from both exit1 and exit2, then we combine the resulting assertions as follows: assert1(exit1, expRes1) ∨ assert2(exit2, expRes2), where the expected values expRes1 and expRes2 are obtained from the execution trace of A.

Each assertion for a function contains (1) the function's returned value, (2) the global variables used in that function, and/or (3) the DOM elements accessed in that function. Each assertion is coupled with the expected value obtained from the execution trace of the original version.

The generated assertions that target variables compare the value as well as the runtime type against the expected ones. An oracle that targets a DOM element first checks the existence of that DOM element. If the element exists, it checks the attributes of the element by comparing them against the observed values in the original execution trace. Assuming that width and height are 100 and 200, respectively, in Figure 1, and the '+' sign is mutated to '-' in line 20 of the running example in Figure 1, the mutation affects the global variable currentDim, the height property of the element with ID endCell, and the returned value of the function setDim. Figure 4 shows a QUnit test case for the setDim function according to this mutation, with the generated assertions.

 1  test("Testing setDim", 4, function(){
 2    var fixture = $("#qunit-fixture");
 3    fixture.append("<button id=\"cell0\"> <div id=\"divElem\"/> </button> <div id=\"endCell\" style=\"height:200px;width:100px;\"/>");
 4    var currentDim = 20;
 5    var result = setDim(10);
 6    equal(result, 30);
 7    equal(currentDim, 50);
 8    ok($('#endCell').length > 0);
 9    equal($('#endCell').css('height'), 30); });

Fig. 4. Generated QUnit test case.

D. Tool Implementation

We have implemented our JavaScript test and oracle generation approach in an automated tool called JSEFT. The tool is written in Java and is publicly available for download [8]. Our implementation requires no browser modifications, and is hence portable. For JavaScript code interception, we use a web proxy, which enables us to automatically instrument JavaScript code before it reaches the browser. The crawler for JSEFT extends and builds on top of the event-based crawler, Crawljax [9], with random input generation enabled for form inputs. As mentioned before, to mutate JavaScript code, we use our recently developed mutation testing tool, Mutandis [22]. The upper bound for the number of mutations can be specified by the user. However, the default is 50 for code-level and 20 for DOM-level mutations. We observed that these default numbers provide a balanced trade-off between oracle generation time and the fault finding capability of the tool. DOM-level test cases are generated in a JUnit format that uses Selenium (WebDriver) APIs to fire events on the application's DOM inside the browser. JavaScript function-level tests are generated in the QUnit unit testing framework [19], capable of testing any generic JavaScript code.

V. EMPIRICAL EVALUATION

To quantitatively assess the efficacy of our test generation approach, we have conducted an empirical study, in which we address the following research questions:
RQ1 How effective is JSEFT in generating test cases with high coverage?
RQ2 How capable is JSEFT of generating test oracles that detect regression faults?
RQ3 How does JSEFT compare to existing automated JavaScript testing frameworks?

JSEFT and all our experimental data in this paper are available for download [8].

TABLE I. CHARACTERISTICS OF THE EXPERIMENTAL OBJECTS.

ID  Name           LOC     URL
1   SameGame       206     crawljax.com/same-game/
2   Tunnel         334     arcade.christianmontoya.com/tunnel/
3   GhostBusters   282     10k.aneventapart.com/2/Uploads/657/
4   Peg            509     www.cccontheweb.org/peggame.htm
5   BunnyHunt      580     themaninblue.com/experiment/BunnyHunt/
6   AjaxTabs       592     github.com/amazingSurge/jquery-tabs/
7   NarrowDesign   1,005   http://www.narrowdesign.com
8   JointLondon    1,211   http://www.jointlondon.com
9   FractalViewer  1,245   onecm.com/projects/canopy/
10  SimpleCart     1,900   github.com/wojodesign/simplecart-js/
11  WymEditor      3,035   http://www.wymeditor.org
12  TuduList       1,963   http://tudu.ess.ch/tudu
13  TinyMCE        26,908  http://www.tinymce.com

TABLE II. RESULTS SHOWING THE EFFECTS OF OUR FUNCTION COVERAGE MAXIMIZATION, FUNCTION STATE ABSTRACTION, AND MUTATION-BASED ORACLE GENERATION ALGORITHMS.

            St. Coverage               State Abstraction                          Oracles
App   Fun. cov.      Random        #Func. states   #Func. states   Func. state    #Assertions    #Assertions
ID    maximize (%)   explor. (%)   w/o abstr.      with abstr.     reduction (%)  w/o mutation   with mutation
1     99             80            447             33              93             5101           136
2     78             78            828             21              97             23212          81
3     90             66            422             14              96             3520           45
4     75             75            43              19              56             1232           109
5     49             45            534             23              95             150            79
6     78             75            797             30              96             1648           125
7     63             58            1653            54              97             198202         342
8     56             50            32              18              43             78             51
9     82             82            1509            49              97             65403          253
10    71             69            71              23              67             6584           96
11    56             54            1383            131             90             2530           318
12    41             38            1530            62              96             3521           184
13    51             47            1401            152             89             2481           335
AVG   68.4           62.8          -               -               85.5           -              -

A. Objects

Our study includes thirteen JavaScript-based applications in total. Table I presents each application's ID, name, lines of custom JavaScript code (LOC, excluding JavaScript libraries), and resource. The first five are web-based games. AjaxTabs is a jQuery plugin for creating tabs. NarrowDesign and JointLondon are websites. FractalViewer is a fractal tree zoom application. SimpleCart is a shopping cart library, WymEditor is a web-based HTML editor, TuduList is a web-based task management application, and TinyMCE is a JavaScript-based WYSIWYG editor control. The applications range from 206 to 27K lines of JavaScript code.

The experimental objects are open source and cover different application types. All the applications are interactive in nature and extensively use JavaScript on the client side.

B. Setup

To address our research questions, we provide the URL of each experimental object to JSEFT. Test cases are then automatically generated by JSEFT. We give JSEFT 10 minutes in total for each application; 5 minutes of the total time is designated for the dynamic exploration step.

Test Case Generation (RQ1). To measure client-side code coverage, we use JSCover [23], an open-source tool for measuring JavaScript code coverage. We report the average results over five runs to account for the non-determinism that stems from crawling the application. In addition, we assess each step in our approach separately as follows: (1) compare the statement coverage achieved by our function coverage maximization with a method that chooses the next state/event for the expansion uniformly at random, (2) assess the efficacy of our function state abstraction method (Algorithm 1), and (3) evaluate the effectiveness of applying mutation techniques (Algorithm 2) to reduce the number of assertions generated.

Test Oracles (RQ2). To evaluate the fault finding capability of JSEFT (RQ2), we simulate web application faults by automatically seeding each application with 50 random faults. We automatically pick a random program point and seed a fault at that point according to our fault category. While mutations used for oracle generation have been selectively generated (as discussed in Section IV-C), mutations used for the purpose of evaluation are randomly generated from the entire application. Note that if the mutation used for the purpose of evaluation and the mutation used for generating oracles happen to be the same, we remove the mutant from the evaluation set. Next, we run the whole generated test suite (including both function-level and event-based test cases) on the faulty version of the application. The fault is considered detected if an assertion generated by JSEFT fails and our manual examination confirms that the failed assertion is detecting the seeded fault. We measure precision and recall as follows:

Precision is the rate of injected faults found by the tool that are actual faults: TP/(TP+FP)
Recall is the rate of actual injected faults that the tool finds: TP/(TP+FN)

where TP (true positives), FP (false positives), and FN (false negatives) respectively represent the number of faults that are correctly detected, falsely reported, and missed.

Comparison (RQ3). To assess how JSEFT performs with respect to existing JavaScript test automation tools, we compare its coverage and fault finding capability to that of Artemis [2]. Similar to JSEFT, we give Artemis 10 minutes in total for each application; we observed no improvements in the results obtained from running Artemis for longer periods of time. We run Artemis from the command line by setting the iteration option to 100 and enabling the coverage priority strategy, as described in [2]. Similarly, JSCover is used to measure the coverage of Artemis (over 5 runs). We use the output provided by Artemis to determine if the seeded mutations are detected by the tool, by following the same procedure as described above for JSEFT.

C. Results

Test Case Generation (RQ1). Figure 5 depicts the statement coverage achieved by JSEFT for each application. The results show that the test cases generated by JSEFT achieve a coverage of 68.4% on average, ranging from 41% (ID 12) up to 99% (ID
1). We investigated why JSEFT has low coverage for some of the applications. For instance, we observed that in JointLondon (ID 7), the application contains JavaScript functions that are browser/device specific, i.e., they are exclusively executed in Internet Explorer or iDevices. As a result, we are unable to cover them using JSEFT. We also noticed that some applications required more time to achieve higher statement coverage (e.g., NarrowDesign, ID 8), or have a large DOM state space (e.g., BunnyHunt, ID 5), and hence JSEFT is only able to cover a portion of these applications in the limited time it had available.

The Table II columns under "St. Coverage" present the JavaScript statement coverage achieved by our function coverage maximization algorithm versus a random strategy. The results show a 9% improvement on average for our algorithm, across all the applications. We observed that our technique achieves the highest improvement when there are many dynamically generated clickable DOM elements in the application, for example, GhostBusters (ID 3).

The columns under "State Abstraction" in Table II present the number of function states before and after applying our function state abstraction algorithm. The results show that the abstraction strategy reduces function states by 85.5% on average. NarrowDesign (ID 7) and FractalViewer (ID 9) benefit the most, with a 97% state reduction rate. Note that despite this huge reduction, our state abstraction does not adversely influence the coverage, as we include at least one function state from each of the covered branch sets, as described in Section IV-B.

The last two columns of Table II, under "Oracles", present the number of assertions obtained by capturing the whole application's state without any mutations, and with our mutation-based oracle generation algorithm, respectively. The results show that the number of assertions is decreased by 86.5% on average due to our algorithm. We observe the most significant reduction of assertions for JointLondon (ID 7), from more than 198,000 to 342.

Fault finding capability (RQ2). Table III presents the results on the fault finding capabilities of JSEFT. The table shows the total number of injected faults, the number of false negatives, false positives, true positives, and the precision and recall of JSEFT.

TABLE III. FAULT DETECTION.

                                        JSEFT                                       ARTEMIS
App   # Injected   #FN   #FP   #TP   Precision   Recall   By func-level   Precision   Recall
ID    Faults                         (%)         (%)      tests (%)       (%)         (%)
1     50           0     0     50    100         100      30              100         20
2     50           9     0     41    100         82       73              100         12
3     50           4     0     46    100         92       17              100         8
4     50           15    0     35    100         70       28              100         22
5     50           26    0     24    100         48       25              100         0
6     50           9     0     41    100         82       15              100         16
7     50           17    0     33    100         66       24              100         0
8     50           23    0     27    100         54       26              100         0
9     50           6     0     44    100         88       41              100         24
10    50           16    0     34    100         68       65              100         8
11    50           21    0     29    100         58       27              100         6
12    50           26    0     24    100         48       17              100         22
13    50           23    0     27    100         54       26              100         28
AVG   -            15    0     35    100         70       32              100         12.8

JSEFT achieves 100% precision, meaning that all the detected faults reported by JSEFT are real faults; in other words, there are no false positives. This is because the assertions generated by JSEFT are all stable, i.e., they do not change from one run to another. However, the recall of JSEFT is 70% on average, and ranges from 48 to 100%. This is due to false negatives, i.e., missed faults by JSEFT, which occur when the injected fault either falls in the uncovered region of the application, or is not properly captured by the generated oracles.

The table also shows that on average 32% of the injected faults (ranging from 15–73%) are detected by function-level test cases, but not by our DOM event-based test cases. This shows that a considerable number of faults do not propagate to observable DOM states, and thus cannot be captured by DOM-level event-based tests. For example, in the SimpleCart application (ID 10), if we mutate the mathematical operation that is responsible for computing the total amount of purchased items, the resulting error is not captured by event-based tests, as the fault involves internal computations only. However, the fault is detected by a function-level test that directly checks the returned value of the function. This points to the importance of incorporating function-level tests in addition to event-based tests for JavaScript web applications. We also observed that even when an event-based test case detects a JavaScript fault, localizing the error to the corresponding JavaScript code can be quite challenging. However, function-level tests pinpoint the corresponding function when an assertion fails, making it easier to localize the fault.

Comparison (RQ3). Figure 5 shows the code coverage achieved by both JSEFT and Artemis on the experimental objects running for the same amount of time, i.e., 10 minutes. The test cases generated by JSEFT achieve 68.4% coverage on average (ranging from 41–99%), while those generated by Artemis achieve only 44.8% coverage on average (ranging from 0–92%). Overall, the test cases generated by JSEFT achieve 53% more coverage than Artemis, which points to the effectiveness of JSEFT in generating high coverage test cases. Further, as can be seen in the bar plot of Figure 5,

[Figure 5: a bar plot of statement coverage (%), per experimental object (1–13), comparing JSEFT and Artemis.]
Fig. 5. Statement coverage achieved.
for all the applications, the test cases generated by JSEFT achieve higher coverage than those generated by Artemis. This increase was more than 226% in the case of BunnyHunt (ID 5). For two of the applications, NarrowDesign (ID 7) and JointLondon (ID 8), Artemis was not able to complete the testing task within the allocated time of ten minutes. Thus, we let Artemis run for an additional 10 minutes for these applications (i.e., 20 minutes in total). Even then, neither application completes under Artemis.

Table III shows the precision and recall achieved by JSEFT and Artemis. With respect to fault finding capability, unlike Artemis, which detects only generic faults such as runtime exceptions and W3C HTML validation errors, JSEFT is able to accurately distinguish faults at the code level and DOM level through the test oracles it generates. Both tools achieve 100% precision; however, JSEFT achieves five-fold higher recall (70% on average) compared with Artemis (12.8% recall on average).

D. Threats to Validity

An external threat to the validity of our results is the limited number of web applications that we use to evaluate our approach. We mitigated this threat by using JavaScript applications that cover various application types. Another threat is that we validate the failed assertions through manual inspection, which can be error-prone. To mitigate this threat, we carefully inspected the code in which the assertion failed to make sure that the injected fault was indeed responsible for the assertion failure. Regarding the reproducibility of our results, JSEFT and all the applications used in this study are publicly available, thus making the study replicable.

VI. CONCLUSIONS

In this paper, we presented a technique to automatically generate test cases for JavaScript applications at two complementary levels: (1) individual JavaScript functions, and (2) event sequences. Our technique is based on algorithms to maximize function coverage and minimize the function states needed for efficient test generation. We also proposed a method for effectively generating test oracles along with the test cases, for detecting faults in JavaScript code as well as on the DOM tree. We implemented our approach in an open-source tool called JSEFT. We empirically evaluated JSEFT on 13 web applications. The results show that the tests generated by JSEFT achieve high coverage (68.4% on average), and that the injected faults can be detected with a high accuracy rate (recall 70%, precision 100%).

ACKNOWLEDGMENT

This work was supported in part by a Strategic Project Grant (SPG) from the Natural Sciences and Engineering Research Council of Canada (NSERC), and a research gift from Intel Corporation. We thank the anonymous reviewers of ICST'15 for their insightful comments.

REFERENCES

[1] F. Ocariza, K. Bajaj, K. Pattabiraman, and A. Mesbah, "An empirical study of client-side JavaScript bugs," in Proc. ACM/IEEE International Symposium on Empirical Software Engineering and Measurement (ESEM'13). IEEE Computer Society, 2013, pp. 55–64.
[2] S. Artzi, J. Dolby, S. Jensen, A. Møller, and F. Tip, "A framework for automated testing of JavaScript web applications," in Proc. 33rd International Conference on Software Engineering (ICSE'11), 2011, pp. 571–580.
[3] A. Marchetto and P. Tonella, "Using search-based algorithms for Ajax event sequence generation during testing," Empirical Software Engineering, vol. 16, no. 1, pp. 103–140, 2011.
[4] A. Marchetto, P. Tonella, and F. Ricca, "State-based testing of Ajax web applications," in Proc. 1st Int. Conference on Sw. Testing Verification and Validation (ICST'08). IEEE Computer Society, 2008, pp. 121–130.
[5] A. Mesbah, A. van Deursen, and D. Roest, "Invariant-based automatic testing of modern web applications," IEEE Transactions on Software Engineering (TSE), vol. 38, no. 1, pp. 35–53, 2012.
[6] P. Saxena, D. Akhawe, S. Hanna, F. Mao, S. McCamant, and D. Song, "A symbolic execution framework for JavaScript," in Proc. Symp. on Security and Privacy (SP'10). IEEE Computer Society, 2010, pp. 513–528.
[7] S. Mirshokraie, A. Mesbah, and K. Pattabiraman, "Pythia: Generating test cases with oracles for JavaScript applications," in Proc. International Conference on Automated Software Engineering (ASE), New Ideas Track. IEEE Computer Society, 2013, pp. 610–615.
[8] "JSeft," http://salt.ece.ubc.ca/software/jseft/.
[9] A. Mesbah, A. van Deursen, and S. Lenselink, "Crawling Ajax-based web applications through dynamic analysis of user interface state changes," ACM Transactions on the Web (TWEB), vol. 6, no. 1, pp. 3:1–3:30, 2012.
[10] S. Mirshokraie and A. Mesbah, "JSART: JavaScript assertion-based regression testing," in Proc. International Conference on Web Engineering (ICWE'12). Springer, 2012, pp. 238–252.
[11] K. Sen, S. Kalasapur, T. Brutch, and S. Gibbs, "Jalangi: A selective record-replay and dynamic analysis framework for JavaScript," in Proc. European Software Engineering Conference and ACM SIGSOFT International Symposium on Foundations of Software Engineering (ESEC/FSE'13). ACM, 2013.
[12] C. Jensen, A. Møller, and Z. Su, "Server interface descriptions for automated testing of JavaScript web applications," in Proc. European Software Engineering Conference and ACM SIGSOFT International Symposium on Foundations of Software Engineering (ESEC/FSE'13). ACM, 2013.
[13] G. Fraser and A. Zeller, "Mutation-driven generation of unit tests and oracles," IEEE Transactions on Software Engineering (TSE), vol. 38, no. 2, pp. 278–292, 2012.
[14] G. Fraser and A. Zeller, "Generating parameterized unit tests," in Proc. International Symposium on Software Testing and Analysis (ISSTA'11), 2011, pp. 364–374.
[15] K. Taneja and T. Xie, "DiffGen: Automated regression unit-test generation," in Proc. International Conference on Automated Software Engineering (ASE'08). IEEE Computer Society, 2008, pp. 407–410.
[16] S. Elbaum, H. N. Chin, M. B. Dwyer, and M. Jorde, "Carving and replaying differential unit test cases from system test cases," IEEE Transactions on Software Engineering (TSE), vol. 35, no. 1, pp. 29–45, 2009.
[17] M. Staats, G. Gay, and M. Heimdahl, "Automated oracle creation support, or: How I learned to stop worrying about fault propagation and love mutation testing," in Proc. International Conference on Software Engineering (ICSE'11), 2011, pp. 870–880.
[18] "jQuery API," http://api.jquery.com.
[19] "QUnit," http://qunitjs.com.
[20] T. H. Cormen, C. Stein, R. L. Rivest, and C. E. Leiserson, Introduction to Algorithms, 2nd ed. McGraw-Hill Higher Education, 2001.
[21] R. DeMillo, R. Lipton, and F. Sayward, "Hints on test data selection: Help for the practicing programmer," Computer, vol. 11, no. 4, pp. 34–41, 1978.
[22] S. Mirshokraie, A. Mesbah, and K. Pattabiraman, "Efficient JavaScript mutation testing," in Proc. 6th International Conference on Software Testing Verification and Validation (ICST'13). IEEE Computer Society, 2013, pp. 74–83.
[23] "JSCover," http://tntim96.github.io/JSCover/.