Using Multi-Locators to Increase the Robustness of Web Test Cases


Maurizio Leotta, Andrea Stocco, Filippo Ricca, Paolo Tonella

Abstract:
The main reason for the fragility of web test cases is the inability of web element locators to
work correctly when the web page DOM evolves. Web element locators are used in web test
cases to identify all the GUI objects to operate upon and eventually to retrieve web page
content that is compared against some oracle in order to decide whether the test case has
passed or not. Hence, web element locators play an extremely important role in web testing
and when a web element locator gets broken developers have to spend substantial time and
effort to repair it.
While algorithms exist to produce robust web element locators to be used in web test scripts,
no algorithm is perfect and different algorithms are exposed to different fragilities when the
software evolves. Based on such observation, we propose a new type of locator, named multi-
locator, which selects the best locator among a candidate set of locators produced by different
algorithms. Such selection is based on a voting procedure that assigns different voting
weights to different locator generation algorithms. Experimental results obtained on six web
applications, for which a subsequent release was available, show that the multi-locator is
more robust than the single locators (about –30% of broken locators w.r.t. the most robust
kind of single locator) and that the execution overhead required by the multiple queries done
with different locators is negligible (2-3% at most).

Digital Object Identifier (DOI):

http://dx.doi.org/10.1109/ICST.2015.7102611

Copyright:

© 2015 IEEE. Personal use of this material is permitted. Permission from IEEE must be obtained for all
other uses, in any current or future media, including reprinting/republishing this material for advertising or
promotional purposes, creating new collective works, for resale or redistribution to servers or lists, or
reuse of any copyrighted component of this work in other works.
Maurizio Leotta¹, Andrea Stocco¹, Filippo Ricca¹, Paolo Tonella²

¹ Dipartimento di Informatica, Bioingegneria, Robotica e Ingegneria dei Sistemi (DIBRIS), Università di Genova, Italy
² Fondazione Bruno Kessler, Trento, Italy

Keywords: Web Testing, Testware Evolution, Test Case Robustness, Web Element Locators, XPath Locators.

I. INTRODUCTION

The cost of software testing is impressively high and estimated between 40% and 80% of the total development cost [1]. Test automation plays a key role in reducing such cost [5]. Software testers implement the testing logics by writing scripts that provide input data, set the values of GUI components, operate on such components by changing their state and retrieve information to be compared with oracles, to determine if the program behaves correctly. Automated testing tools, that run such scripts, interact with the application in a similar way as the real user does. Reuse of testing logics for the same functionalities across successive releases (i.e., for regression purposes) is the main benefit of adopting a test automation technique. Unfortunately, new releases of the application with modified GUIs can easily break the corresponding test scripts, hindering the benefits of test automation [2]. This problem is particularly dramatic in the context of the web applications, because these are subject to a tremendous pressure for change [19]. New releases are continuously produced, often accommodating just style improvements or presentation changes.

Typically, DOM-based web test automation tools (e.g., Selenium toolkit) are used to test a web application. Thanks to their rich APIs, testers can easily implement the test cases by invoking commands that operate on web elements localised by means of DOM properties (attributes, textual information, XPaths, etc). The choice of appropriate web element locators is fundamental, because it impacts enormously the test script resilience to change (i.e., their robustness) when the application evolves. Previous works [8], [9] show that often minor changes between releases, resulting in changes in the DOM structure, are responsible for most of the cases when test cases are broken and cannot be executed any more. The manual effort to repair such test scripts is tedious, time-consuming and intellectually frustrating, so that often existing test suites are abandoned, despite their potential value for catching regressions.

For this reason, in the literature several researchers [11], [16] have attacked the test script fragility problem by proposing algorithms able to compute locators that are resilient to the evolution of the software. These robust locators, based on the XPath language, have been shown to be more resilient to changes than those available from state of the practice tools, such as for example FirePath [11]. When web pages change because of a new release of the web application, they continue to select the target web element correctly.

While experimenting with different locator generation algorithms, we have noticed that locators are resilient to different types of changes and that they tend to be fragile individually, not collectively. Even locators produced by algorithms with the highest robustness performance can occasionally be broken (i.e., they do not locate the target web element correctly in the new DOM) by specific code changes, while other locators, based on different web page properties, may remain valid. In other words, different locators, built by different algorithms, tend to be complementary with each other. The idea of this work is to compensate for one locator’s fragility by resorting to the capabilities of another locator.

In this paper, we overcome the potential weaknesses of a single locator by means of a novel locator type, which we call multi-locator and which is capable of aggregating the results produced by a set of different locators (generated by different algorithms) into a single web element localisation, the most voted one. In the test cases, we replace each single locator with a multi-locator, i.e., a set of locators all selecting the same web element and all automatically generated by different algorithms. When the web application evolves, some locators in this set become broken while others may work correctly and return the right element. By applying a voting decision procedure, the multi-locator will select the web element receiving the highest number of weighted votes from all locators. We expect that the multi-locator will be more
robust than each individual locator taken in isolation. On the other hand, the advantage of adopting the multi-locator is also in its automatic repair capability: when the multi-locator is able to locate the desired web element, the broken locators belonging to the set and generated by the various algorithms can be automatically repaired. This accounts just for re-running the locator creation algorithms with the web element returned by the multi-locator as target. In this way, the test scripts are continuously evolved by the automated repair procedure, hence better accommodating the future changes occurring in the next software releases.

The paper is organised as follows: Section II introduces the problems associated with web testware evolution and shows how locators are generated by state of the art algorithms. Section III describes the multi-locator, our novel contribution. Empirical results about the robustness and execution time overhead of the multi-locator as compared to single locators are reported in Section IV, followed by related works (Section V) and conclusions (Section VI).

II. BACKGROUND

Software testers are required to execute manual repair actions on the test cases, whenever these are affected by the changes that have been performed on the web application under test (WAUT). For the sake of simplicity, the changes to the WAUT can be categorised into two families: logical and structural [9], [10]. A logical change involves the modification of the web application logics for the introduction of new features or the modification of existing features. On the tester side, this means for example creating new test cases, or modifying the existing ones. A structural change, instead, impacts the web page structure, modified to beautify the web page appearance or to reorganise its content (e.g., switching from a table-based to a table-less layout). In the test suite, the tester has to modify one or more test script lines containing locators that are affected by the structural changes.

In this paper, we focus on reducing the web test suite maintenance effort due to structural changes, since such effort is heavily affected by the fragility of the locators. On the other hand, logical changes require manual interventions on the test suite that go beyond the creation of robust locators. Structural changes are indeed quite important, since web site re-styling, a frequently occurring activity, tends to affect the DOM structure, leaving the application logics unaffected. In the following, we assume that XPath locators are used to retrieve the web elements required by the test cases (form fields, buttons, check boxes, textual output, etc.) and that the tester’s effort can be reduced when structural changes occur, by making such locators robust.

A. XPath Locators

Considering only XPath locators is by no means restrictive, for the following reasons:

– XPath is a powerful and expressive language. If properly generated, XPath locators can be highly expressive and compact. To the best of our knowledge, most of the localisation methods provided by DOM-based tools can be easily rewritten as an XPath locator with no substantial impact on its understandability. For example, the Selenium WebDriver locator: By.name("xy") is equivalent to By.xpath("//*[@name='xy']").

– XPath locators are sometimes the only option. Selenium WebDriver offers different localisation methods¹ beyond XPath, mostly tailored on specific DOM attributes and properties (e.g., id or text). A tester may prefer the use of these methods, instead of generating/writing XPath expressions that may be perceived as more complex. However, sometimes these methods cannot be employed, since no unique attribute value or textual information are available to uniquely identify the web element of interest. In such cases, the only way to get a locator is by specifying a navigational path on the DOM tree. As an example, in our previous work [9], we considered six Selenium WebDriver test suites and we were forced to use XPath locators (i.e., no specific localisation methods can be used) for about one-third of the considered web elements.

B. Algorithms/Tools for Generating XPath Locators

We populate our multi-locator with the most used and most promising XPath locators (to the best of our knowledge), generated by state of the practice tools and by state of the art research algorithms. In particular we considered:

FirePath Absolute: FirePath² is a browser-integrated plugin for XPath expressions generation. For each web element, it is able to generate a corresponding absolute XPath locator. An absolute XPath consists of the full navigational path from the root of the DOM (i.e., the html tag) to the target web element. Only when strictly necessary, element position values are used to select the correct node among a set of siblings.

FirePath Relative ID-based: When a unique value for the id attribute exists for the target element or one of its ancestors, FirePath can also generate a relative ID-based XPath locator. Otherwise an absolute XPath is returned. In case a unique value for the id attribute exists, the XPath locator starts by selecting the node (closest to the target) that contains id and then navigates the remaining portion of the DOM to the target element.

Selenium IDE³ is a capture/replay tool for quick development of web test cases. During the test case recording phase, it is able to generate locators for the web page elements on which the tester is performing actions. Selenium IDE contains an advanced XPath locators generator algorithm⁴ that generates locators using different strategies and that ranks them depending on an internal robustness heuristic estimate.

Montoto et al. [16] proposed an algorithm for identifying the target elements during the navigation of AJAX websites. The algorithm starts from a simple XPath expression, progressively augmented with textual and attribute information. The algorithm first tries to identify the element according to its associated text (if the element is a leaf node) and the value of its attributes. If the XPath produced does not uniquely identify the element, every ancestor (and the value of their attributes) is considered until the root of the DOM is reached.

ROBULA+ is an extension of our previous algorithm ROBULA (ROBUst Locator Algorithm) [11]. Basically, ROBULA starts with a generic XPath expression that returns all nodes ("//*"). It then iteratively refines the expression until only the element of interest is selected. In such iterative refinement, the algorithm

¹ http://docs.seleniumhq.org/docs/02_selenium_ide.jsp#locating-elements
² https://addons.mozilla.org/firefox/addon/firepath/
³ http://seleniumhq.org/projects/ide/
⁴ https://code.google.com/p/selenium/source/browse/ide/main/src/content/locatorBuilders.js
applies four refinement transformations, according to a set of heuristic XPath specialisation steps [11]. ROBULA has been developed in order to create very short and simple XPath expressions, with the goal to increase their resilience to changes. ROBULA+ enhances ROBULA with: (i) a prioritisation strategy, to rank candidate XPath expressions by heuristically estimated attribute robustness, when multiple attributes are available; (ii) a blacklisting technique, to exclude attributes that are intrinsically fragile; and (iii) textual information, potentially a reliable anchor when the web application evolves [18]. A technical report describing ROBULA+ is available on our web site: http://sepl.dibris.unige.it/TR/ROBULA+.pdf.

The outputs of the five algorithms considered in our work are usually different, as depicted in the example in Fig. 1. Focusing only on the two algorithms proposed by the research community, Montoto and ROBULA+, they both adopt a top-down approach in the construction of the XPaths. However, remarkable differences exist; thus, the XPath expressions generated by ROBULA+ are usually very different from the ones generated by Montoto. For instance, to localise the target div element in the web page used as example in the Montoto et al. paper [16], their algorithm generates //td/a[@href="#"]/div[@class="c1" and text()="More Info"] while ROBULA+ generates the following simpler XPath expression: //td/*/div.

C. XPath Locators and Software Evolution: an Example

Let us consider Ver. 1 of a simplified web application composed of two web pages (insertInfo.php and showInfo.php) that allow users to insert and visualise some personal information previously stored in a database. A test case for this functionality may open the insertInfo.php page, fill a form, submit the information and verify that the inserted data are correctly displayed in the resulting showInfo.php page, shown in Fig. 1 (top).

For the test case implementation, it is necessary to locate some web page elements as, for instance, the field of the table showing the mobile phone number (see the underlined td in Fig. 1 (center)). Fig. 1 (bottom) lists the XPath locators provided by the algorithms considered in this work. With the exception of the absolute (abs) XPath locator generated by FirePath, the others are relative XPaths. Different XPath generation strategies are adopted resulting in different expressions.

    Page:
        Name:     John
        Surname:  Doe
        Mobile:   123456789    (Target Element)

    Source:
        <html>
        <body>
        <table id="userInfo">
        <tr><td>Name: </td><td title="name">John</td></tr>
        <tr><td>Surname:</td><td title="surname">Doe</td></tr>
        <tr><td>Mobile: </td><td title="mobile">123456789</td></tr>
        </table>
        </body>
        </html>

    Locators:
        Tool           Kind   Generated XPath Locators for the Target Element
        FirePath Abs   abs    /html/body/table/tr[3]/td[2]
        FirePath Rel   rel    //*[@id="userInfo"]/tr[3]/td[2]
        Selenium IDE   rel    //table[@id="userInfo"]/tr[3]/td[2]
        Montoto        rel    //td[text()="123456789"]
        ROBULA+        rel    //*[contains(text(),'123456789')]

Fig. 1. showInfo.php – Ver. 1 – Page, Source, Locators

We now consider a new version of the web application (Ver. 2), in which a new text box is present, allowing the user to insert gender information (see Fig. 2 (top)). Depending on the robustness of the XPath locator used to select the target element, the test case described above will be broken (and will have to be repaired) or will work without problems. Looking at Fig. 2 (bottom), we can see that only the locators generated by ROBULA+ and Montoto work, while all the other locators are broken. Indeed, all of them include node tr[3] that in the new version becomes tr[4]. Hence, they locate the wrong element (i.e., the “gender” field).

    Page:
        Name:     John
        Surname:  Doe
        Gender:   Male
        Phone:    123456789    (Target Element)

    Source:
        <html>
        <body>
        <table id="userInfo">
        <tr><td>Name: </td><td title="name">John</td></tr>
        <tr><td>Surname:</td><td title="surname">Doe</td></tr>
        <tr><td>Gender: </td><td title="gender">Male</td></tr>
        <tr><td>Phone: </td><td title="mobile">123456789</td></tr>
        </table>
        </body>
        </html>

    Locators:
        Tool           Robustness   XPath Locators
        FirePath Abs   broken       /html/body/table/tr[3]/td[2]
        FirePath Rel   broken       //*[@id="userInfo"]/tr[3]/td[2]
        Selenium IDE   broken       //table[@id="userInfo"]/tr[3]/td[2]
        Montoto        robust       //td[text()="123456789"]
        ROBULA+        robust       //*[contains(text(),'123456789')]

Fig. 2. showInfo.php – Ver. 2 – Page, Source, Locators

III. THE MULTI-LOCATOR APPROACH

In this section we describe the multi-locator approach, which selects a web element using a candidate set of locators performing a vote decision procedure (weighted or unweighted). We also show that the multi-locator can be employed to automatically repair the broken locators in the candidate set, so as to produce a better candidate set of locators, to be used by the multi-locator on the successive versions of the application.

A. Multi-locator Definition

Let us assume that a candidate set L (with |L| > 1) of alternative locators can be obtained to extract the web element e from the DOM D. Such alternative locators can be generated in different ways: by alternative algorithms or tools that help web testers to produce robust locators for their test cases, manually, or they can be defined according to simple rules (e.g., use the absolute XPath or the web element identifier/name). When they are initially defined, all such locators select element e uniquely:

$$\forall l \in L : \mathit{query}(l, D) = \{e\} \tag{1}$$

i.e., all XPath queries using such locators return a result set containing exactly one entity, element e.

When the web application evolves, some locators in L may become unusable because they return more than one element or no element in the new DOM D′. Among those that return a single web element (i.e., |query(l, D′)| = 1), there might
be disagreement. The idea of the multi-locator is to establish a voting procedure that involves all locators still returning exactly one element from the new DOM D′. The multi-locator will select the web element receiving the highest weighted vote from all locators that uniquely select a single element in the DOM D′. Since different locators may have different “reputations” (e.g., the absolute XPath locator is known to be quite fragile [8], [11]), it makes sense to assign different weights to the voters. The web element returned by the multi-locator will be the one with the highest weighted vote. We decided to compute the aggregate vote v for the element e′ using the following formula:

$$v[e'] = 1 - \prod_{l \in L_{e'}} \bigl(1 - nw[l]\bigr) \tag{2}$$

where L_{e′} is the set of locators returning uniquely the element e′ in the new DOM D′, while nw[l] is the normalised weight assigned to locator l. Weights are normalized between 0 and 1, and each of them is interpreted as the locator reliability, i.e., probability of correct localisation. A reliable locator, generated by an algorithm with high robustness performance, will have a high weight (i.e., close to 1). Such interpretation justifies formula (2): the result of the formula gives the probability that the aggregate vote localises correctly element e′.

Algorithm 1: Multi-locator DOM Selection
    Input:
        D′: DOM of the evolved web application.
        L: set of locators selecting uniquely web element e in the initial DOM D. The information about the algorithms that generated each locator l ∈ L is stored in an auxiliary data structure.
    Result:
        e′: a web element from DOM D′ or null if no web element can be located
     1  begin
     2      L′c := {l ∈ L : |query(l, D′)| = 1}
     3      // candidate locators
     4      E′ := {e′ ∈ D′ : e′ ∈ query(l′, D′), l′ ∈ L′c}
     5      // candidate target web elements
     6      if |E′| = 0 then return null
     7
     8      foreach e′ ∈ E′ do v[e′] := 1
     9
    10      foreach l′ ∈ L′c do
    11          e′ := elementOf(query(l′, D′))
    12          // elementOf returns the unique element of the set
    13          v[e′] := v[e′] · (1 − nw(l′))
    14          // nw returns the weight, normalised between 0 and 1, relative to the algorithm used to generate l′
    15      foreach e′ ∈ E′ do v[e′] := 1 − v[e′]
    16
    17      return e′ ∈ E′ : v[e′] = max_{k ∈ E′} v[k]

The pseudocode for the multi-locator procedure is shown in Algorithm 1. At lines 2-4, candidate web elements are determined as the results of all XPath queries that return one element. Formula (2) is implemented from line 8 to line 15. In particular, the loop at line 10 attributes the weight of each locator l′ to the selected web element e′ according to formula (2). At line 17, the multi-locator returns the web element that was attributed the highest vote. In case of parity, a randomly selected element among those with highest weight is chosen.

An important component of Algorithm 1 is the voting weight assigned to each locator (loop at line 10). We suggest three strategies to determine such weights: (1) uniform weights; (2) learned weights; (3) heuristic weights. Uniform weights are obtained by trivially assigning the same weight (e.g., 0.5) to each method used to generate the locators in L, hence, to each locator. Learned weights are obtained by training them on a corpus of web applications for which successive versions are available. Weights can be optimised so as to minimise the number of broken locators that is measured when the multi-locator algorithm is applied to the next versions of the web applications in the corpus. Otherwise, a simpler method consists of measuring the robustness of the locators (number of non-broken locators) on the corpus and using such measurement as the weight for the algorithm that generated such locators. We implemented the latter method. As usual with training, the training corpus must be different from the web applications on which the multi-locator performance is assessed. This can be achieved, for instance, through the cross-validation (aka, leave-one-out) procedure. Heuristic weights are produced manually, based on a-priori knowledge about the expected fragility of the locators created by the various locator generation methods [11].

In Fig. 1 (bottom), there are five different XPath locators generated using different algorithms. When evaluated on the new version of the web page (see Fig. 2 (bottom)) three of them are broken, while two select the correct target element (i.e., the phone number field). It should be noticed that in the new version of the web page all three broken locators select the same web element, i.e., the gender field. Thus, in this case, if the unweighted (i.e., uniformly weighted, with weight=0.5 for all the considered algorithms) version of the multi-locator is adopted, the result will select the wrong element. In fact, the gender field obtains three votes (corresponding to v[gender]=0.875, see equation (2)) while the phone number field only two (corresponding to v[phone]=0.75). On the other hand, using a weighted version of the multi-locator, depending on the weights assigned, it is possible to select the correct target element. For instance, we can assign the following weights based on a-priori knowledge about the expected robustness of the locators created by the various locator generation methods: (1) for absolute XPaths, weight is 0.25, since they are known to be quite fragile; (2) for locators obtained by tools for ID-based DOM navigation, weight is 0.50, since they are probably more robust than absolute XPaths, but they are not designed specifically to make test cases resilient to the evolution of the web application (e.g., FirePath Relative ID-based); (3) for locators obtained by algorithms specifically designed for producing robust locators for web testing, weight is 0.90 (e.g., Selenium IDE, Montoto and ROBULA+). Using these weights the result is different. Indeed, the gender field obtains the lowest weighted vote (corresponding to v[gender]=0.9625) while the phone number field the highest (corresponding to v[phone]=0.99). Thus, this version of the weighted multi-locator adopting “knowledge based” weights is able to select the correct element.

When the web application evolves, there is a theoretical limit to the capabilities of any multi-locator, defined as an arbitrary procedure to select among the XPaths of the set L of locators. In fact, if all locators in L are broken when used to query the evolved DOM D′, the multi-locator has no chance of being able to select the right element, since any selection of a locator l ∈ L will result in a broken locator. This sets an upper bound to the robustness achievable by any multi-locator.
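
To make the voting of Algorithm 1 and the arithmetic of the Fig. 2 example concrete, the following is a minimal Python sketch, not the implementation used in the paper. It assumes that the XPath queries on the evolved DOM D′ have already been run, so each locator is represented only by the list of elements it returned. The names aggregate_votes and multi_locate, the dictionary layout and the element labels "gender" and "phone" are hypothetical. Unlike Algorithm 1, ties are resolved by keeping the first maximum instead of a random choice, so that the result is reproducible.

```python
from typing import Dict, Hashable, Optional, Sequence


def aggregate_votes(
    results: Dict[str, Sequence[Hashable]],
    weights: Dict[str, float],
) -> Dict[Hashable, float]:
    """Compute v[e'] = 1 - prod(1 - nw[l]) over the locators selecting e' uniquely.

    `results` maps a locator name to the elements its query returned on D'
    (an assumed, simplified stand-in for query(l, D')).
    """
    miss: Dict[Hashable, float] = {}
    for locator, elements in results.items():
        if len(elements) != 1:
            # Locators returning no element or several elements do not vote.
            continue
        element = elements[0]
        miss[element] = miss.get(element, 1.0) * (1.0 - weights[locator])
    return {element: 1.0 - m for element, m in miss.items()}


def multi_locate(
    results: Dict[str, Sequence[Hashable]],
    weights: Dict[str, float],
) -> Optional[Hashable]:
    """Return the element with the highest aggregate vote, or None."""
    votes = aggregate_votes(results, weights)
    if not votes:
        return None
    # Assumption of this sketch: on a tie, the first maximum is kept.
    return max(votes, key=votes.get)


# Outcome of the five locators of Fig. 1 when evaluated on Ver. 2 (Fig. 2).
FIG2_RESULTS = {
    "FirePath Abs": ["gender"],
    "FirePath Rel": ["gender"],
    "Selenium IDE": ["gender"],
    "Montoto": ["phone"],
    "ROBULA+": ["phone"],
}

# Uniform weights (0.5 each) and the heuristic weights quoted in the text.
UNIFORM = {name: 0.5 for name in FIG2_RESULTS}
HEURISTIC = {
    "FirePath Abs": 0.25,
    "FirePath Rel": 0.50,
    "Selenium IDE": 0.90,
    "Montoto": 0.90,
    "ROBULA+": 0.90,
}

if __name__ == "__main__":
    print(multi_locate(FIG2_RESULTS, UNIFORM))    # the gender field wins
    print(multi_locate(FIG2_RESULTS, HEURISTIC))  # the phone field wins
```

With uniform weights the sketch reproduces v[gender] = 1 − 0.5³ = 0.875 and v[phone] = 1 − 0.5² = 0.75, so the wrong element is selected. With the heuristic weights it gives v[gender] = 1 − 0.75 · 0.50 · 0.10 = 0.9625 and v[phone] = 1 − 0.10 · 0.10 = 0.99, so the phone number field is selected, as stated above.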
B. Automated Multi-locator Repair

When the multi-locator procedure succeeds, by selecting the most voted locator and returning the web element identified by such locator, it is possible to automatically repair all other broken locators. In fact, the algorithms that have been used to produce the locator set L for the initial DOM D can be re-executed on the new DOM D′ to locate the web element e′ returned by the multi-locator. Each algorithm whose locator is considered broken on D′ by the multi-locator is re-executed with e′ (i.e., the element selected by the multi-locator) as target. The new locators that these algorithms produce replace the locators considered broken. When the web application evolves to a new version, the multi-locator will be able to use the automatically repaired locators instead of the original ones, hence further increasing its chances of robustly identifying the web element used by the test cases.

Algorithm 2: Multi-locator Repair
    Input:
        D′: DOM of the evolved web application.
        e′: the web element in the evolved DOM D′ selected by multi-locator.
        L: set of locators selecting uniquely web element e in the initial DOM D. The information about the algorithm that generated each locator l ∈ L is stored in an auxiliary data structure.
    Result:
        L′: the repaired set of locators selecting web element e′ from D′
     1  begin
     2      L′ := ∅
     3      foreach l ∈ L do
     4          if query(l, D′) = {e′} then
     5              L′ := L′ ∪ {l}
     6          else
     7              nl := createNewLocator(algoForLocator(l), D′, e′)
     8              // algoForLocator uses information from the auxiliary data structure associated with L
     9              L′ := L′ ∪ {nl}
    10      return L′

Algorithm 2 shows the pseudo-code of the automated repair method. The input web element e′ is uniquely identified by Algorithm 1. The repaired set of locators L′ is constructed iteratively at lines 3-9, by just keeping unmodified the locators that uniquely identify e′ in the new DOM D′ (line 5), and by creating a new locator nl (lines 7-9) in the other cases. To generate such new locators, the same algorithm originally used to produce the locator l (such algorithm is returned by function algoForLocator(l)) is run on the new DOM D′, with web element e′ as target.

For example, in the case of the XPath locators shown in Fig. 2 (bottom), using the weighted version of the multi-locator described above (i.e., using weights 0.25, 0.50 and

related to the correctness of the multi-locator, when it returns a non-null element. One of the research questions investigated in our empirical evaluation deals with the correctness of the automated repair actions.

C. Setting up the Multi-locator

For setting up the multi-locator it is necessary to execute two steps: (1) generating the set L (i.e., a list of XPath locators) for each target element employed in the original test suite (i.e., located by a single locator) and (2) defining the set of weights in case the weighted multi-locator is applied. Step (1) can be completed by resorting to aspect oriented programming: during the execution of the test suite an aspect will intercept each locator invocation on the current page DOM D, that selects the target element e, and will generate the corresponding set L by using the XPath generation algorithms implementations (i.e., L={FirePathAbs(D, e), FirePathID(D, e), Montoto(D, e), SeleniumIDE(D, e), RobulaPlus(D, e)}). Step (2) can be completed by estimating the robustness of the various kinds of XPath locators on the considered web application. Alternatively, heuristic weights can be used. Indeed, in Section IV-E, we show that for a practical adoption of the multi-locator it is not even necessary to execute the cross-validation procedure to obtain the weights.

D. Deploying the Multi-locator

For deploying the multi-locator the test code has to be changed so as to replace single locator invocations on the DOM D (e.g., By.xpath(xp,D)), with invocations to the multi-locator, which requires a set of XPaths L={xp1,...,xp5} instead of a single XPath (e.g., By.multiLocator({xp1,...,xp5}, D)) and a list of weights in case non-uniform weights are to be applied (e.g., By.multiLocator({xp1,...,xp5}, {w1,...,w5}, D)). This can be done automatically by means of code transformations. In this way, the multi-locator can be used to locate the target element in the evolved DOM D′ using the previously computed set L. We are currently developing a tool able to automatically migrate existing test cases to the multi-locator approach.

E. Evolving Multi-locator based Test Cases

When a new version of the web application is available, in case the multi-locator (Algorithm 1) is able to correctly select the target web element e′, the set of alternative locators L is automatically repaired (Algorithm 2) by invoking the XPath generation algorithms with the web element selected by the multi-locator and the evolved DOM D′ as input (e.g., if the
0.90), we obtain that the phone number field has the highest multi-locator selects e′ on D′ , the other locators that are not
vote. Thus, the three locators considered (in this case correctly) able to select e′ are updated as follows: xp3 = Montoto(D′,
broken (i.e., FirePath Absolute, FirePath Relative ID-based and e′ ), xp5 = RobulaPlus(D′, e′ )). Otherwise, the multi-locator is
Selenium IDE) are re-generated in order to select the phone unable to select the target element, thus the test case is broken,
number field. The related repair operation affects only the and the web tester has to manually repair only one of the
locator fragment tr[3], which in the new release becomes tr[4]. locators; the other locators can be automatically re-generated
Of course, in case the web element e′ returned by the using the implementations of the various XPath generation
multi-locator is wrong, the automated repair operation will algorithms. Finally, the test suite can be automatically updated
also produce a set of incorrectly repaired locators. Hence, with the new generated locators set L′ similarly as described
the possibility to perform correct repair operations is strictly in Section III-D.
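
The selection and repair steps (Algorithms 1 and 2) can be sketched in Python as follows. This is only an illustration under simplifying assumptions, not the authors' implementation: the DOM is modelled as a plain dictionary from XPath strings to the elements they match; `query`, `select`, `repair` and the lambda generators are hypothetical stand-ins for real XPath evaluation and for the FirePath, Selenium IDE, Montoto and ROBULA+ implementations; and only locators that match exactly one element are assumed to vote. The weights are the heuristic ones suggested in Section IV-E.

```python
from collections import defaultdict
from typing import Callable, Dict, Hashable, Optional, Set

Element = Hashable
Dom = Dict[str, Set[Element]]  # toy DOM: XPath string -> set of matched elements
Generator = Callable[[Dom, Element], str]


def query(xpath: str, dom: Dom) -> Set[Element]:
    """Toy stand-in for XPath evaluation: look the expression up in the map."""
    return dom.get(xpath, set())


def select(locators: Dict[str, str], weights: Dict[str, float],
           dom: Dom) -> Optional[Element]:
    """Weighted vote (assumption: a locator votes only if it matches one element)."""
    votes: Dict[Element, float] = defaultdict(float)
    for algo, xpath in locators.items():
        matched = query(xpath, dom)
        if len(matched) == 1:
            votes[next(iter(matched))] += weights[algo]
    if not votes:
        return None  # no locator selects anything: the multi-locator is broken
    # Ties are resolved by insertion order here, not randomly.
    return max(votes, key=votes.get)


def repair(locators: Dict[str, str], generators: Dict[str, Generator],
           dom: Dom, selected: Element) -> Dict[str, str]:
    """Algorithm 2: keep locators that still select `selected`, regenerate the rest."""
    repaired: Dict[str, str] = {}
    for algo, xpath in locators.items():
        if query(xpath, dom) == {selected}:
            repaired[algo] = xpath                            # line 5: unmodified
        else:
            repaired[algo] = generators[algo](dom, selected)  # lines 7-9: new locator
    return repaired


# Hypothetical locators created on the first release (a row is later inserted).
locators = {
    "FirePathAbs": "/html/body/table/tr[3]/td/input",
    "SeleniumIDE": "//tr[3]/td/input",
    "Montoto": "//input[@name='phone']",
    "RobulaPlus": "//input[@name='phone']",
}
weights = {"FirePathAbs": 0.33, "SeleniumIDE": 0.85, "Montoto": 0.80, "RobulaPlus": 0.90}

# Evolved DOM: tr[3] now holds a different field, the phone field moved to tr[4].
new_dom: Dom = {
    "/html/body/table/tr[3]/td/input": {"mobile"},
    "//tr[3]/td/input": {"mobile"},
    "//input[@name='phone']": {"phone"},
    "/html/body/table/tr[4]/td/input": {"phone"},
    "//tr[4]/td/input": {"phone"},
}

# Canned generators standing in for the real XPath generation algorithms.
generators: Dict[str, Generator] = {
    "FirePathAbs": lambda dom, e: "/html/body/table/tr[4]/td/input",
    "SeleniumIDE": lambda dom, e: "//tr[4]/td/input",
    "Montoto": lambda dom, e: "//input[@name='phone']",
    "RobulaPlus": lambda dom, e: "//input[@name='phone']",
}

selected = select(locators, weights, new_dom)
repaired = repair(locators, generators, new_dom, selected)
print(selected, repaired)
```

In this toy scenario the two position-based locators vote for the wrong field (total weight 1.18) and are outvoted by the two attribute-based ones (total weight 1.70), so the former are regenerated against the element chosen by the vote while the latter are kept as they are.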
TABLE I. Objects: Web Applications from SourceForge.net

                                                                                                                 | 1st Release                           | 2nd Release
Application  | Description                                    | Web Site                                         | Release | Date   | Files(a) | kLOC(b) | Release | Date   | Files(a) | kLOC(b)
MantisBT     | bug tracking system                            | http://sourceforge.net/projects/mantisbt/        | 1.1.8   | Jun-09 |      492 |      90 | 1.2.0   | Feb-10 |      733 |     115
PPMA (c)     | password manager                               | http://sourceforge.net/projects/ppma/            | 0.2     | Mar-11 |       93 |       4 | 0.3.5.1 | Jan-13 |      108 |       5
Claroline    | collaborative learning environment             | http://sourceforge.net/projects/claroline/       | 1.10.7  | Dec-11 |      840 |     277 | 1.11.5  | Feb-13 |      835 |     285
Address Book | address/phone book, contact manager, organizer | http://sourceforge.net/projects/php-addressbook/ | 4.0     | Jun-09 |       46 |       4 | 8.2.5   | Nov-12 |      239 |      30
MRBS         | system for multi-site booking of meeting rooms | http://sourceforge.net/projects/mrbs/            | 1.2.6.1 | Jan-08 |       63 |       9 | 1.4.9   | Oct-12 |      128 |      27
Collabtive   | collaboration software                         | http://sourceforge.net/projects/collabtive/      | 0.65    | Aug-10 |      148 |      68 | 1.0     | Mar-13 |      151 |      73

(a) Only PHP source files were considered.
(b) PHP LOC; comment and blank lines are not considered.
(c) Without considering the source code of the framework used by this application (Yii framework).

IV. EXPERIMENTAL RESULTS

This section presents the design, objects, research questions, metrics, procedure, quantitative and qualitative analysis, and threats to validity of the empirical study conducted to evaluate the effectiveness and execution time overhead of the multi-locator. We follow the guidelines by Wohlin et al. [20] on designing and reporting empirical studies in software engineering.

The goal of this study is to analyse the effectiveness and performance of the multi-locator in selecting the correct target element, with the purpose of understanding the strengths and the weaknesses of the proposed approach. The results of this study are interpreted according to the perspective of: (1) developers and project managers, interested in data about the benefits of adopting the multi-locator in an industrial context, in order to increase the test suite robustness; (2) researchers, interested in empirical data about the impact of the multi-locator on web testing. The software objects used in the experiment are six web applications already used in a different work [10].

A. Web Applications

We conducted our experiments over a sample of six open-source web applications from SourceForge.net. We considered only applications that: (1) are quite recent, so that they can work without problems on the latest versions of Apache, PHP and MySQL, technologies we are familiar with (since the XPath locators localise web elements in the HTML code processed by the client browser, the server side technologies do not affect the results of the study); (2) are well-known and used (some of them have been downloaded more than one hundred thousand times last year); (3) have at least two major releases (we have excluded minor releases because with small differences between versions the majority of the locators, and thus of the corresponding test cases, are expected to work without problems); (4) belong to different application domains.

Table I reports some information about the selected applications. We can notice how all of them are quite recent (ranging from 2009 to 2013) and different in terms of number of source files (ranging from 46 to 840) and number of lines of code (ranging from 4 kLOC to 285 kLOC, considering only the lines of code contained in the PHP source files, comments and blank lines excluded).

B. Research Questions and Metrics

Our study aims at answering the following research questions:

RQ1: What is the robustness of the unweighted multi-locator as compared to that of single locators? What is the effect of the random selection occurring when different elements obtain the same number of votes?

The goal of the first research question is to compare the robustness of the unweighted multi-locator with the robustness of five single locators, generated by state of the practice tools (FirePath, release 0.9.7; Selenium IDE, release 2.8.0) and by research algorithms (Montoto, ROBULA+). The aim is to give developers and project managers a precise idea of the benefits coming from the adoption of the multi-locator. The metric used to answer RQ1 is the number of broken XPath locators in the next software release.

RQ2: What is the robustness of the weighted multi-locator as compared with the unweighted multi-locator?

The second research question is about the influence of the weights used in the multi-locator. In particular, with this research question we want to compare unweighted and weighted multi-locators. The aim is to give developers, project managers and researchers an idea of the importance of a good calibration of the weights. The metric used to answer RQ2 is the same used for RQ1.

RQ3: How far is the robustness of the multi-locator from the theoretical limit?

The third research question aims at understanding whether the multi-locator strategy has any margin of further improvement. This aims at giving researchers an idea about the contributions possibly coming from more complex multi-locator strategies. The metric used to answer RQ3 is the distance between our algorithm and the theoretical limit explained in the previous section.

RQ4: What is the amount of correct repair actions triggered by the multi-locator?

The fourth research question deals with the correctness of the automated repair actions operated on the broken locators. In particular, we are interested in the total number of correctly repaired broken locators w.r.t. the incorrectly repaired ones. The metric used to answer RQ4 is the number of correctly repaired locators over the total number of repair actions.

RQ5: What is the performance overhead of the multi-locator on test case execution?

The last research question is about the additional time required for executing a test suite when the multi-locator selection (i.e., Algorithm 1) is adopted, as experienced in practical cases. This gives developers and project managers an idea of the penalty in terms of execution time coming from the adoption of the multi-locator. The metric used to answer RQ5 is the execution time expressed in seconds.

C. Procedure

To answer our RQs we proceeded as follows:

(I) We selected six open-source web applications from SourceForge.net as explained in Section IV-A.
TABLE II. Robustness of the various XPath Locators and of different kinds of Multi-Locator Algorithms

Locators                                  | Address Book   | Collabtive     | MRBS           | Claroline      | PPMA           | Mantis         | All Apps
                                          | Brk   % Weight | Brk   % Weight | Brk   % Weight | Brk   % Weight | Brk   % Weight | Brk   % Weight | Brk   %
Total Number of Target Web Elements       |  80            | 125            | 102            | 235            |  30            | 103            | 675
FirePath Absolute                         |  45  56   0.32 | 125 100   0.41 | 102 100   0.39 |  69  29   0.14 |  30 100   0.35 |  78  76   0.35 | 449  67
FirePath Relative ID-based                |  43  54   0.52 |  34  27   0.46 | 102 100   0.60 |  55  23   0.37 |  19  63   0.52 |  78  76   0.56 | 331  49
Selenium IDE                              |  12  15   0.84 |  22  18   0.85 |  35  34   0.87 |  23  10   0.81 |  11  37   0.85 |   4   4   0.82 | 107  16
Montoto                                   |  10  13   0.76 |   7   6   0.74 |  39  38   0.80 |  68  29   0.81 |  16  53   0.79 |  11  11   0.76 | 151  22
ROBULA+                                   |  10  13   0.89 |   3   2   0.86 |  30  29   0.92 |  22   9   0.87 |  10  33   0.89 |   3   3   0.87 |  78  12
Unweighted Multi-Locator (Worst Order)    |   9  11        |   3   2        |  18  18        |  24  10        |   9  30        |   5   5        |  68  10
Unweighted Multi-Locator (Best Order)     |   7   9        |   3   2        |  20  20        |  18   8        |   9  30        |   2   2        |  59   9
Weighted Multi-Locator (Cross-Validation) |   3   4        |   3   2        |  20  20        |  18   8        |   9  30        |   2   2        |  55   8
Theoretical Limit                         |   1   1        |   3   2        |  16  16        |  15   6        |   9  30        |   2   2        |  46   7

Brk: number of broken locators. %: percentage of broken locators over the total number of locators of this kind. Weight: average robustness of this kind of locator computed on the other five applications.

(II) For each application and for each web page we manually selected all the web elements: (1) on which it is possible to perform actions (e.g., links, input fields, submit buttons); (2) which report information that can be used to evaluate assertions (e.g., the number of rows in a table or a confirmation message); (3) which belong to pages related to core functionalities of the application (e.g., we did not consider the configuration and installation pages); and (4) which are present in both releases of the applications. This last requirement is particularly important for computing the number of broken locators.

In order to avoid biased results, we excluded multiple instances of the same web element present in different pages, or different web elements that can be considered the same. In detail, we excluded multiple instances of: (1) the same web element repeated in different web pages as part, for instance, of the header or the footer (e.g., the link to the home page of the web application can be found in every page and has exactly the same locator), and (2) similar web elements from common groups (e.g., for a calendar with a check box for each day we selected only one of the check boxes).

(III) For each selected web element in the first release (located by the absolute XPath abs, obtained from FirePath) we manually defined a mapping (abs→abs’) that associates it with its counterpart in the second release (located by the absolute XPath abs’, also obtained from FirePath). The absolute XPath locators defined on the second release of the applications are used as oracle to verify the robustness of the generated XPath locators for the elements of the first release of the applications.

(IV) For the first release of each web application and for each web element (located by the absolute XPath defined above), four additional XPath locators have been created by using respectively: (1) FirePath Relative ID-based, (2) Selenium IDE, (3) Montoto, and (4) ROBULA+.

(V) For each web element, we applied the unweighted and weighted variants of the multi-locator. We have defined the weights for the weighted variant of the multi-locator using k-fold cross validation [6]. In particular, we used a leave-one-out cross validation with k = 6, where 6 is the size of the original data set (i.e., the number of web applications selected for this experiment). Thus, we split the original data set into five applications used for training and one application used for testing, with the testing application rotated so as to test the multi-locator on each of the six available applications. Finally, for each test application, we evaluate the robustness of the multi-locator using weights proportional to the average robustness of the single-locator algorithms, measured when these are applied to the other five training applications (see columns “Weight” in Table II).

To answer RQ1, for each web element, the robustness of the multi-locator is automatically evaluated against the oracle on the next release of the web application by verifying whether it is still able to locate the web element of interest. To this end, we verify whether the web element selected by the multi-locator and the one selected by the absolute locator abs’ are the same. The voting procedure of the unweighted multi-locator could generate ties and in these cases we decided to randomly select one element. Thus, to measure the performance boundaries associated with such non-deterministic choice, we considered the two extreme cases that may happen when the random selection is done: the best case and the worst case. In particular, based on the results collected in our experiments, we defined the best order as: ROBULA+, Selenium IDE, Montoto, Relative ID-based, Absolute. The worst order is the reverse one. Instead of considering a random element, in case of parity, we report the results obtained in two cases: best order and worst order.

To answer RQ2, we evaluated the robustness of the weighted multi-locator as done in the previous step.

To answer RQ3, for each application, we compared the theoretical limit (computed as described in Section III-A) with the results of the multi-locator.

To answer RQ4, for each application, we counted the number of correctly repaired broken locators in the multi-locators and analysed all the actions performed by the repair algorithm described in Section III-B.

To answer RQ5, only for three different applications (Claroline, AddressBook, PPMA) we have built two Selenium WebDriver test suites: one that localizes the web elements using the absolute XPath and the other that uses the multi-locator selection algorithm. We executed each test suite three times and compared the mean times of the single locator versions with the ones of the multi-locator, so as to measure the average overhead of the multi-locator for each application.

D. Results

Table II reports the data used to answer RQ1, RQ2, and RQ3. For each application and for each target web element, it reports the number of broken locators and the corresponding breakage percentage over the total number of locators. In the last two columns, aggregate results over all the six web applications are also reported.

Based on these results we can notice that absolute XPath locators are the most fragile in the considered set of locators. In three cases (i.e., Collabtive, MRBS, PPMA) out of six, all absolute locators are broken. Considering all six applications, 449 out of 675 absolute locators (i.e., 67%) are broken.
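
The leave-one-out weights can be recomputed from the broken-locator counts in Table II. The sketch below is written under one assumption that the text does not spell out: robustness is pooled over the other five applications, i.e., one minus the total number of broken locators divided by the total number of target elements in those applications. With this reading the computed values appear to match the published "Weight" columns once rounded to two decimals; the function and variable names are illustrative.

```python
from typing import Dict, List

APPS = ["Address Book", "Collabtive", "MRBS", "Claroline", "PPMA", "Mantis"]

# Number of target web elements per application (Table II, first row).
TOTAL: List[int] = [80, 125, 102, 235, 30, 103]

# Broken locators per algorithm and application (Table II, "Broken" columns).
BROKEN: Dict[str, List[int]] = {
    "FirePath Absolute":          [45, 125, 102, 69, 30, 78],
    "FirePath Relative ID-based": [43, 34, 102, 55, 19, 78],
    "Selenium IDE":               [12, 22, 35, 23, 11, 4],
    "Montoto":                    [10, 7, 39, 68, 16, 11],
    "ROBULA+":                    [10, 3, 30, 22, 10, 3],
}


def leave_one_out_weights(broken: Dict[str, List[int]],
                          total: List[int]) -> Dict[str, Dict[str, float]]:
    """For each held-out application, pooled robustness on the other five."""
    weights: Dict[str, Dict[str, float]] = {}
    for algo, counts in broken.items():
        weights[algo] = {}
        for i, app in enumerate(APPS):
            broken_rest = sum(counts) - counts[i]   # broken on the training apps
            total_rest = sum(total) - total[i]      # elements in the training apps
            weights[algo][app] = 1.0 - broken_rest / total_rest
    return weights


weights = leave_one_out_weights(BROKEN, TOTAL)
for algo, per_app in weights.items():
    print(algo, {app: round(w, 2) for app, w in per_app.items()})
```

For instance, with Address Book held out, ROBULA+ has 68 broken locators over the 595 elements of the remaining applications, which gives a weight of about 0.89.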
TABLE III. Actions performed by the repair algorithm (Correctly Repaired, Incorrectly Repaired, Unmodified and Unrepairable)

                                             | Address Book | Collabtive | MRBS | Claroline | PPMA | Mantis | All Apps
Correct
C1  Correct Locators - No Repair Triggered   |          279 |        434 |  198 |       936 |   64 |    341 |     2252
C2  Correct Locators - Incorrectly Repaired  |            1 |          0 |    4 |         2 |    0 |      0 |        7
CT  Total Correct Locators (C1+C2)           |          280 |        434 |  202 |       938 |   64 |    341 |     2259

Broken
B1  Broken Locators - Correctly Repaired     |          106 |        176 |  212 |       149 |   41 |    164 |      848
B2  Broken Locators - No Repair Triggered    |           10 |          0 |   11 |        31 |    5 |      2 |       59
B3  Broken Locators - Incorrectly Repaired   |            4 |          0 |   20 |        22 |   10 |      3 |       59
B4  Broken Locators - Unrepairable           |            0 |         15 |   65 |        35 |   30 |      5 |      150
BT  Total Broken Locators (B1+B2+B3+B4)      |          120 |        191 |  308 |       237 |   86 |    174 |     1116

T   Total Locators (CT+BT)                   |          400 |        625 |  510 |      1175 |  150 |    515 |     3375
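
The aggregate percentages derived from Table III can be checked with a few lines of arithmetic. The snippet below uses only the "All Apps" column of the table; the variable names are illustrative.

```python
# "All Apps" column of Table III.
all_apps = {"C1": 2252, "C2": 7, "B1": 848, "B2": 59, "B3": 59, "B4": 150}

total_correct = all_apps["C1"] + all_apps["C2"]                    # CT
total_broken = sum(all_apps[k] for k in ("B1", "B2", "B3", "B4"))  # BT
total_locators = total_correct + total_broken                      # T

# Cases in which the algorithm does the right thing: leave a correct
# locator alone (C1) or correctly repair a broken one (B1).
correct_actions = all_apps["C1"] + all_apps["B1"]

# Repair actions are the cases in which a locator is actually modified.
repair_actions = all_apps["B1"] + all_apps["C2"] + all_apps["B3"]
incorrect_repairs = all_apps["C2"] + all_apps["B3"]

share_correct_actions = 100 * correct_actions / total_locators
share_broken_repaired = 100 * all_apps["B1"] / total_broken
repair_precision = 100 * all_apps["B1"] / repair_actions

print(f"correct actions: {correct_actions}/{total_locators} = {share_correct_actions:.1f}%")
print(f"broken locators correctly repaired: {all_apps['B1']}/{total_broken} = {share_broken_repaired:.1f}%")
print(f"correct repairs among repair actions: {all_apps['B1']}/{repair_actions} = {repair_precision:.1f}%")
```

The totals reproduce the CT, BT and T rows of the table, and the three ratios round to the 92%, 76% and 93% figures discussed for RQ4.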

The results of FirePath relative XPath locators are better than those of absolute XPath locators. Still, in MRBS all relative locators are broken and, over the six applications, 331 out of 675 relative locators (i.e., 49%) are broken.

The locators generated by state of the art XPath generator algorithms targeting web testware evolution are more robust than the previous ones. In particular, the most robust is ROBULA+, whose XPath locators are broken only in 12% of the cases (78 out of 675). The locators produced by Selenium IDE also achieve a very high level of robustness (16% of broken locators). The algorithm proposed by Montoto et al. is also quite good, with 151 broken locators out of 675 (i.e., 22%).

In order to answer our research questions, we analyse the experimental results quantitatively. The reasons and implications of the results are further analysed qualitatively in Section IV-E.

RQ1: The unweighted multi-locator (worst order) is at least as robust as single locators in almost all the cases, with the only exception of Selenium IDE and ROBULA+ in the cases of Mantis and Claroline. Overall, multi-locator (worst order) is able to outperform ROBULA+, the algorithm that produces the most robust locators, globally reducing the number of broken locators from 78 to 68 (12.8% reduction). Multi-locator (best order) improves the results of multi-locator (worst order) and further reduces the number of broken locators to 59, corresponding to a 24.4% reduction in the number of broken locators w.r.t. ROBULA+. Only in the case of MRBS, multi-locator (worst order) is able to perform slightly better (i.e., -2 broken) than multi-locator (best order).

RQ2: The adoption of the weights (see columns “Weight” in Table II) allows multi-locator to further improve the results provided by multi-locator (worst and best order). Indeed, weighted multi-locator is able to reduce the number of broken locators to 55, corresponding to a 29.5% reduction in the number of broken locators w.r.t. ROBULA+, and to achieve an overall percentage of only 8% broken locators. It is interesting to notice that in this case, there is no single locator algorithm, among the ones we considered, which is able to outperform weighted multi-locator in any case. Only in the case of MRBS, multi-locator (worst order) is able to perform slightly better (i.e., -2 broken) than weighted multi-locator.

RQ3: The results in Table II show that weighted multi-locator is able to reach the theoretical limit in 3 cases out of 6, i.e., for web applications Collabtive, PPMA, and Mantis. In the other cases, its results are quite close to the best achievable ones (with the considered algorithms). Indeed, applying the multi-locator using the outputs (i.e., the locators) of the five selected algorithms, it is not possible to have fewer than 46 broken locators out of 675, and weighted multi-locator has only 55 broken locators in total (i.e., it is only 1.3% from the best possible results, corresponding to only 9 locators out of 675).

RQ4: Table III reports the data about the execution of the Multi-locator Repair algorithm described in Section III-B. The analysis is conducted considering each locator composing the multi-locator, thus in our case five for each target web element (e.g., AddressBook has 80 web elements, thus there are 400 locators). The locators composing the multi-locator set L can be correct or broken depending on whether they are able to locate the correct element on D′ or not. The correct locators in L can be: (C1) simply copied in L′ if the multi-locator is correct; or (C2) incorrectly repaired, if the multi-locator selects the wrong element. The broken locators in L can be: (B1) correctly repaired if the multi-locator is correct; (B2) simply copied in L′ if the multi-locator is broken and selects the same wrong element; or (B3) incorrectly repaired, if the multi-locator selects a different wrong element. When the multi-locator is not able to select any element on D′, i.e., when all the locators in L are not able to select any element, no repair action is possible (B4). From the data, it is possible to notice that in the majority of the cases (92%), our algorithm performs the correct action (see the rows C1 and B1), i.e., for 3100 locators out of 3375. Overall, 848 locators are correctly repaired (B1) over a total of 1116 broken locators (76%). The incorrect repairs of correct locators (C2) are only 7 out of 2259 (0.31%). To answer RQ4, we focus only on the repair actions, i.e., when the locators are modified (914 cases: B1, C2, B3), and we can observe that 848 locators (i.e., 93%) are correctly repaired (B1), while only 66 locators are repaired incorrectly (C2, B3).

RQ5: The overhead of the multi-locator selection algorithm on test case execution is very low in general. Indeed, in the worst case, Claroline, the test suite is composed of 18 test cases and a complete execution of the entire test suite requires on average 84.6 seconds when using one locator per web element, which corresponds in total to 132 XPath locators being evaluated, and 87.8 seconds when using the multi-locator selection, which corresponds in total to 660 XPath locators being evaluated. Thus, the increment of the time required for a complete execution of the test suite is only 3.8%. With AddressBook and PPMA, composed respectively of 13 and 21 test cases, the overhead due to the introduction of the multi-locator is even lower, being respectively 2.9% (from 84 to 420 XPaths are evaluated) and 2.8% (from 143 to 715 XPaths are evaluated).

E. Qualitative Analysis and Discussion

Weighted multi-locator has the best performance, but even the locators selected by the unweighted multi-locator are more robust than the ones generated by the best algorithm considered
in this work (i.e., ROBULA+). In case of a tie, the unweighted multi-locator selects randomly the locator to return. Hence, the robustness of the unweighted multi-locator is intermediate between the worst order and the best order of selection considered in our experiment. Since unweighted multi-locator (worst order) is already better than ROBULA+, we conclude that the multi-locator is beneficial even when uniform weights are used. Analysing the results, we discover that often 2 or 3 locators (generally the ones generated by ROBULA+, Montoto and Selenium IDE) select the correct target element. Using the unweighted multi-locator, the votes assigned to the correct web elements typically range from 2 to 5, thus there are no cases in which an element is chosen because it was voted for by only one algorithm, and the unweighted variant is often enough to improve the performance of single locators.

The difference between multi-locator (best order) and multi-locator (worst order) is due to the cases in which there is parity in the votes assigned to the candidate web elements (in the other cases their behaviours are exactly the same). Looking at Table II, we can see that the absolute locators generated by FirePath are usually broken in the highest number of cases (67%), while the ones generated by ROBULA+ are broken in the lowest number of cases (12%). Thus, in case of parity, multi-locator (worst order) selects the locator generated by FirePath, which has a good chance of being broken, while multi-locator (best order) selects the element voted by ROBULA+, which has a much higher chance of being correct. Since the unweighted multi-locator makes a random choice in case of parity, its actual performance will be intermediate between multi-locator (best order) and multi-locator (worst order), which means that, overall, it performs better than any single locator.

Weighted multi-locator has the best performance. Its improvement over the unweighted multi-locator is a reduction of broken locators between 4 and 13 (best/worst order, respectively). It is interesting to notice that making an optimally ordered choice (best order) in case of parity leads to very similar results as those obtained with carefully determined weights. On the other hand, such optimal ordering is unknown when the unweighted multi-locator is used, so it represents just the upper bound for the performance expected when the multi-locator has to make a random choice in case of parity.

The locator repair algorithm is able to perform the correct repairs in most of the cases. In Collabtive it repairs all the broken locators without making any error. In fact, in this case every time the multi-locator is broken (3 cases) it does not select any element. Thus the repair algorithm is not triggered. Overall, the multi-locator is broken in 55 cases: in 30 cases (false negatives) it does not select any element, hence no incorrect repair is performed; in 25 cases (false positives) it selects the wrong element, hence the repair algorithm is executed using the wrong element to repair the other locators.

If we consider the weights assigned to the five locator creation algorithms by the cross-validation procedure (see columns “Weight” in Table II), we can see that they agree on the ranking of the algorithms, independently of the web application left out by the cross-validation procedure. ROBULA+ is always assigned the highest weight, followed by Selenium IDE and Montoto. FirePath Relative ID-based and FirePath Absolute close the ranking. This means that for a practical adoption of the multi-locator it is not even necessary to execute the cross-validation procedure to obtain the weights, since any reasonable weight assignment that respects the order ROBULA+, Selenium IDE, Montoto, FirePath Relative ID-based and FirePath Absolute is expected to work fine. For instance, the following weights can be used in practice: ROBULA+ 0.90, Selenium IDE 0.85, Montoto 0.80, FirePath Relative ID-based 0.50, and FirePath Absolute 0.33. On our subjects, these weights provide exactly the same result as the weighted multi-locator with cross-validated weights.

The other issue potentially affecting the adoption of the multi-locator is the test case execution overhead, but our experimental data show that this is negligible, even in the worst case. Hence, we conclude that the proposed approach (1) is very easy to adopt (e.g., weights can be assigned heuristically, without using cross-validation), and (2) offers major benefits in terms of robustness of the locators during software evolution.

F. Threats to Validity

One threat to the internal validity of our study is associated with the approach used to select the target web elements. To remove this threat, we adopted the procedure described in Section IV-C. While the choice of the releases considered in this study may have affected the results of RQ1, RQ2, RQ3, we have no reason to believe that the ranking of the algorithms in terms of broken locators, as reported in Table II, would vary significantly considering different releases, although the magnitude of our findings might change. Concerning the strategy used to assign weights to the weighted multi-locator, it is clear that other choices (e.g., minimising the number of broken locators selected by the multi-locator, instead of just measuring the robustness of each algorithm) are possible. However, from the obtained results it is clear that this choice is not so critical, since a simple weight assignment based on each algorithm’s robustness already gives results close to the theoretical limit. Finally, concerning the generalization of results, we selected real open source web applications belonging to different domains, which makes the context realistic, even though further studies with other applications are necessary to corroborate the obtained results.

V. RELATED WORK

The problem of test script maintenance and repair has been extensively studied by the research community. Grechanik et al. [7] describe an approach for maintaining and evolving test scripts by means of GUI-tree diffs, in order to find altered GUI objects. Choudhary et al. [3] propose WATER, a tool that suggests changes that can be applied to repair test scripts for web applications. It compares the test executions of two successive releases of a web application. By analysing the difference between the two executions, it suggests repairs to the script code. Thummalapenta et al. [18] present ATA, a tool to automatically repair test scripts. For certain types of applications or environment changes, they are able to repair the XPath on-the-fly. Yandrapally et al. [21] show a novel solution to the problem of test-script fragility based on what they define as “contextual clues”. Fard et al. [14] mine an existing test suite to gather input and assertion information, and extend it to the uncovered portions of the web application by means of automated crawling and test generation techniques. Mirzaaghaei et al. [15] present TestCareAssistant, a technique
that automatically repairs test cases broken due to changes in method declarations. Daniel et al. [4] use GUI change refactoring information to repair the test code accordingly. Memon et al. [13] describe a method to repair GUI test scripts by means of user-specified transformations. Zhang et al. [22] present FlowFixer, a technique able to automatically migrate scripts towards a new and evolved GUI. Differently from the aforementioned works, our work aims at strengthening the test scripts by means of a multi-locator, which is useful in two respects: (I) to make the test script more robust, by taking advantage of redundant information; (II) to repair the locator set itself, for the successive releases of the software. To the best of our knowledge, no previous work proposed and evaluated the effectiveness of a voting procedure within such context, designed to make the script more resilient to software changes. Our repair mechanism is also different from the existing ones, because it resorts to voting to select the repair action among those provided by the alternative algorithms aggregated by the multi-locator.

VI. CONCLUSIONS AND FUTURE WORK

The main sources of fragility for web test scripts are web element locators that must be repaired manually when the software evolves and locators get broken. Algorithms for the creation of robust locators have different strengths and weaknesses; they often exhibit complementary performance. For this reason, we proposed the multi-locator, a novel approach that uses a voting decision procedure to aggregate the results of multiple, alternative locators for producing a consolidated locator. Adoption of the multi-locator requires

[2] S. Berner, R. Weber, and R. Keller. Observations and lessons learned from automated testing. In Proceedings of 27th International Conference on Software Engineering, ICSE 2005, pages 571–579. IEEE, 2005.
[3] S. R. Choudhary, D. Zhao, H. Versee, and A. Orso. WATER: Web application test repair. In Proceedings of 1st International Workshop on End-to-End Test Script Engineering, ETSE 2011, pages 24–29. ACM, 2011.
[4] B. Daniel, Q. Luo, M. Mirzaaghaei, D. Dig, D. Marinov, and M. Pezze. Automated GUI refactoring and test script repair. In Proceedings of 1st International Workshop on End-to-End Test Script Engineering, ETSE 2011, pages 38–41. ACM, 2011.
[5] M. Fewster and D. Graham. Software Test Automation: Effective Use of Test Execution Tools. Addison-Wesley Longman Publishing Co., Inc., Boston, MA, USA, 1999.
[6] S. Geisser. Predictive Inference. Chapman & Hall/CRC Monographs on Statistics & Applied Probability. Taylor & Francis, 1993.
[7] M. Grechanik, Q. Xie, and C. Fu. Maintaining and evolving GUI-directed test scripts. In Proceedings of 31st International Conference on Software Engineering, ICSE 2009, pages 408–418. IEEE, 2009.
[8] M. Leotta, D. Clerissi, F. Ricca, and C. Spadaro. Comparing the maintainability of Selenium WebDriver test suites employing different locators: A case study. In Proceedings of 1st International Workshop on Joining AcadeMiA and Industry Contributions to testing Automation, JAMAICA 2013, pages 53–58. ACM, 2013.
[9] M. Leotta, D. Clerissi, F. Ricca, and P. Tonella. Capture-replay vs. programmable web testing: An empirical assessment during test case evolution. In Proceedings of 20th Working Conference on Reverse Engineering, WCRE 2013, pages 272–281. IEEE, 2013.
[10] M. Leotta, D. Clerissi, F. Ricca, and P. Tonella. Visual vs. DOM-based web locators: An empirical study. In Proceedings of 14th International Conference on Web Engineering (ICWE 2014), volume 8541 of LNCS, pages 322–340. Springer, 2014.
[11] M. Leotta, A. Stocco, F. Ricca, and P. Tonella. Reducing web test cases aging by means of robust XPath locators. In Proceedings of 25th International Symposium on Software Reliability Engineering Workshops, ISSREW 2014, pages 449–454. IEEE, 2014.
[12] M. Leotta, A. Stocco, F. Ricca, and P. Tonella. Automated generation of visual web tests from DOM-based web tests. In Proceedings of 30th
minimal effort and has minimal impact: (1) its parameters Symposium on Applied Computing, SAC 2015. ACM, 2015.
(weights) can be easily approximated heuristically; (2) the [13] A. M. Memon. Automatically repairing event sequence-based GUI
automated repair actions it performs are correct most of the test suites for regression testing. ACM Transactions on Software
times; (3) the execution overhead introduced by the multi- Engineering and Methodology (TOSEM), 18(2):4:1–4:36, Nov. 2008.
[14] A. Milani Fard, M. Mirzaaghaei, and A. Mesbah. Leveraging existing
locator selection algorithm is negligible, and (4) existing single tests in automated test generation for web applications. In Proceedings
locator test suites can be migrated to the multi-locator approach of 29th International Conference on Automated Software Engineering,
automatically. Experimental results show that the multi-locator ASE 2014, pages 67–78. ACM, 2014.
is substantially more robust than single locators, since it [15] M. Mirzaaghaei, F. Pastore, and M. Pezze. Automatically repairing
test cases for evolving method declarations. In Proceedings of 26th
reduces by 29.5% the number of broken locators w.r.t. the International Conference on Software Maintenance, ICSM 2010, pages
best single locator algorithm (ROBULA +). This may represent 1–5. IEEE, 2010.
a substantial saving of test case repair effort in case, e.g., of [16] P. Montoto, A. Pan, J. Raposo, F. Bellas, and J. Lopez. Automated
large industrial test suites. browsing in AJAX websites. Data & Knowl. Eng., 70(3):269–283,
2011.
In our future work, we plan to: (1) experiment with more [17] A. Stocco, M. Leotta, F. Ricca, and P. Tonella. PESTO: A tool
web applications, (2) analyse the effectiveness of the repair for migrating DOM-based to visual web tests. In Proceedings of
algorithm across more than two releases of the applications, 14th International Working Conference on Source Code Analysis and
(3) analyse the contribution of the various algorithms to the Manipulation, SCAM 2014, pages 65–70. IEEE, 2014.
creation of the multi-locators. Moreover, we plan to complete [18] S. Thummalapenta, P. Devaki, S. Sinha, S. Chandra, S. Gnanasundaram,
D. D. Nagaraj, and S. Sathishkumar. Efficient and change-resilient
the development of the tool for automating the migration of test automation: An industrial case study. In Proceedings of 35th
existing DOM-based test suite to the multi-locator approach International Conference on Software Engineering, ICSE 2013, pages
using a technique similar to the one we adopted in a previous 1002–1011. IEEE, 2013.
work [12], [17]. Once the tools will be completed, we plan [19] P. Tonella, F. Ricca, and A. Marchetto. Recent advances in web testing.
Advances in Computers, 93:1–51, 2014.
to make it available for download together with a Java im- [20] C. Wohlin, P. Runeson, M. Höst, M. Ohlsson, B. Regnell, and A. Wess-
plementation of each XPath generation algorithm used by the lén. Experimentation in Software Engineering - An Introduction. Kluwer
multi-locator. We will also extend our work beyond the area of Academic Publishers, 2000.
structural locators, considering visual locators [10], which take [21] R. Yandrapally, S. Thummalapenta, S. Sinha, and S. Chandra. Robust
advantage of image recognition to identify the web elements. test automation using contextual clues. In Proceedings of the 2014
International Symposium on Software Testing and Analysis, ISSTA
2014, pages 304–314. ACM, 2014.
R EFERENCES [22] S. Zhang, H. Ly, and M. D. Ernst. Automatically repairing broken
workflows for evolving GUI applications. In Proceedings of the 2013
[1] B. Beizer. Software Testing Techniques (2nd Ed.). Van Nostrand International Symposium on Software Testing and Analysis, ISSTA
Reinhold Co., New York, NY, USA, 1990. 2013, pages 45–55. ACM, 2013.
