
Automatic Selection of Test Cases for Regression Testing


Cláudio Magalhães, Flávia Barros, Alexandre Mota
Centro de Informática, Recife, Brazil
[email protected]

Eliot Maia
Motorola Mobility, Jaguariúna, Brazil
[email protected]

ABSTRACT

Regression testing is a safety measure to attest that changes made to a system preserve prior accepted behavior. Identifying which test cases must compose a regression test suite in a certain development stage is tricky, particularly when one only has test cases and change requests described in natural language, and the execution of the test suite will be performed manually. That is the case of our industrial partner. We propose a selection of regression test cases based on information retrieval, implemented as a web service. In the experiments we performed, we show that we can improve the creation of regression test suites of our industrial partner by providing more effective test cases, based on keyword analysis, in an automatic way.

Keywords

Regression testing; Test case selection; Information Retrieval

1. INTRODUCTION

Nowadays it is usual for software to evolve iteratively and incrementally. From iteration to iteration, some changes occur to fix faults or to improve functionality. Such changes are documented through Change Requests (or simply CRs). As changes can potentially introduce faults, it is good practice to test the system periodically using several different approaches. One of them, particularly related to changes through consecutive versions of a system, is regression testing. In regression testing, certain tests are re-executed to attest that the changes did not introduce misbehavior in the system.

As the system grows in size, the amount of tests needed to check its behavior grows as well. Regression testing presents an interesting opportunity to avoid executing a huge amount of tests because it can focus on the changes performed during a certain period of time. That is, given a specific development period, one can get the corresponding CRs and select the most appropriate test cases to be re-executed.

When source code is available, this selection can be performed safely and precisely [11]. Unfortunately, in some contexts, specifically where source code is unavailable and tests are executed manually (as is the case of our industrial partner, Motorola Mobility), even the set of test cases chosen by taking change requests into account can be unfeasible to execute. In such a situation, one has to consider some sort of selection criterion based on information retrieval [12].

In this paper we propose a selection process based on information retrieval, specifically using the Apache Lucene framework [8]. It is implemented in a tool named AutoTestPlan, which is already being used by our industrial partner. The development of this tool was divided into two main phases: (i) automation of several manual activities performed by our industrial partner (this represented a 75% improvement in working time); and (ii) implementation of the selection method presented in this paper (this aims at tackling the other 25% of working time). As we will see in Section 4, by implementing (ii) we obtain an additional gain in the time spent in (i) and an impressive working-time gain in (ii), because we replace the test architect activity completely. Currently, the test architect only has to validate our list of selected test cases, because this additional control was required.

The main contributions of this paper are:

• A process that receives a set of CRs and a set of test cases (TCs, for short) and returns an ordered list of TCs to create a test plan based on information retrieval;

• A prototype tool (called AutoTestPlan) where the concepts presented in this paper were implemented;

• Some experiments demonstrating the advantages of the proposed selection process to create regression test suites.

This paper is organized as follows. In Section 2 we introduce the main concepts used in this work, namely regression test selection, prioritization, and information retrieval. Section 3 presents our proposed solution to the problem of selecting regression tests. We perform some experiments to analyse the advantages of our proposed process, reported in Section 4. Our conclusions, related work and future work are discussed in Section 5.

SAST, September 19-20, 2016, Maringá, Paraná, Brazil. © 2016 ACM. ISBN 978-1-4503-4766-2/16/09. DOI: http://dx.doi.org/10.1145/2993288.2993299
and CRs. Thus we deviate slightly from the literal definition
2. BACKGROUND

This section briefly presents basic concepts related to regression test selection, test case prioritization, and information retrieval, focusing on their applications in software engineering.

2.1 Regression Test Selection

Efficient regression testing is important, even crucial, for organizations with a large share of their cost in software development. This includes, among other tasks, identifying which test cases need to be re-executed (that is, regression test selection) to check whether the behavior of the modified software was preserved. Regression test selection involves a trade-off between the cost of re-executing test cases and the risk of missing faults introduced through side effects of changes to the software. Iterative development strategies and reuse are common means of saving development time and effort. However, they both require frequent retesting of previously tested functions due to changes in related code. The need for efficient regression testing strategies is thus becoming more and more important.

A test selection technique, in the context of regression testing, tries to find a subset of test cases that satisfies certain goals, relevant to a change in the system under test. The nature of such goals can vary considerably. Following the work reported in [10], the goals that can be compared and evaluated are inclusiveness, precision, efficiency, and generality. Inclusiveness measures the extent to which a technique chooses tests that will cause the modified program to produce different output than the original program, and thereby expose faults caused by modifications. Precision measures the ability of a technique to avoid choosing tests that will not cause the modified program to produce different output than the original program. Efficiency measures the computational cost, and thus practicality, of a technique. Generality measures the ability of a technique to handle realistic and diverse language constructs, arbitrarily complex code modifications, and realistic testing applications.

It is worth noting that the categories proposed by [10] are related to source code. In our context we do not have access to source code; we can only access two kinds of artifacts: change requests and text-based test cases. Therefore our goals are somewhat different. We are interested in measuring our effectiveness in terms of keyword presence and frequency in such documents [7]. This is further detailed in the next section.

2.2 Test Case Prioritization

Test case prioritization intends to order test cases for regression testing in such a manner that test cases with higher priority execute earlier than those with lower priority, according to some performance criterion.

This works as follows. Assume that T is a test suite, PT is the set of permutations of T, and f is a function from PT to the real numbers. The problem is to find T′ ∈ PT such that, for every T″ ∈ PT, (T″ ≠ T′) ⇒ [f(T′) ≥ f(T″)]. In this definition, PT denotes the set of all possible prioritizations (orderings) of T, and f is the function which, applied to any such ordering, returns an award value for it.

In this work we use the material in the next section to create a prioritized list of test cases, where f is a similarity function based on the frequency of keywords found in test cases and CRs. Thus we deviate slightly from the literal definition of test case prioritization: we do not intend to obtain an ordered list of test cases where higher priority executes earlier than lower priority, but rather one in which the lower-priority test cases can be discarded when the frequency given by f is too low.

2.3 Information Retrieval and Impact Analysis

Information retrieval (IR) applications are increasingly being used in software engineering problems [7]. IR focuses on search: given a massive collection of documents, IR attempts to find the most relevant documents based on a user query. Generally an IR application performs three main tasks: text preprocessing, indexing, and retrieval. Preprocessing concerns text normalization, stop-word removal, and stemming. A text normalizer removes punctuation, performs case-folding, tokenizes terms, etc. In the stop-word removal phase, an IR application discards frequently used terms such as prepositions, articles, and so on, to improve efficiency and reduce spurious matches. Finally, stemming combines variants of the same term (for instance, see, seeing, saw) to improve term matching between query and document. Afterwards, documents are indexed for fast retrieval. Once indexed, queries are submitted to the search engine, which returns a ranked list of documents in response. Finally, the search engine is evaluated by measuring the quality of its output ranked list relative to each user's input query. In this paper we use the text processing framework Lucene [8].

Beyond information retrieval, our work also uses concepts from impact analysis. Impact analysis is the identification of the work products affected by a proposed change request, either a correction of flaws or a new feature request [2]. Our approach to impact analysis is based on the hypothesis that the set of revision comments of a file and the set of CRs that previously impacted it are a good descriptor of the file to support impact analysis of new CRs. We use textual similarity to retrieve past CRs similar to a new CR and to compute the impacted files.

Textual similarity is a critical part of our approach. The Information Retrieval community has dealt with text similarity for a long time. Given a set of text documents and a user information need represented as a set of words, or more generally as free text, the information retrieval problem is to retrieve all documents relevant to the user.
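To make the preprocessing pipeline concrete, here is a minimal sketch of the three tasks in plain Python. The stopword list and the suffix-stripping stemmer are illustrative stand-ins; the tool described in Section 3.2 relies on Lucene's analyzers and its Snowball stemmer instead.

```python
import re

# Illustrative stopword list; real systems derive it from corpus statistics.
STOPWORDS = {"a", "an", "the", "of", "on", "in", "and", "or", "to", "is"}

def normalize(text):
    """Case-fold, strip punctuation, and tokenize."""
    text = re.sub(r"[^\w\s]", " ", text.lower())
    return text.split()

def remove_stopwords(tokens):
    """Discard frequent terms that carry little meaning."""
    return [t for t in tokens if t not in STOPWORDS]

def stem(token):
    """Toy suffix-stripping stemmer, a stand-in for Snowball."""
    for suffix in ("ing", "es", "s"):
        if token.endswith(suffix) and len(token) > len(suffix) + 2:
            return token[:-len(suffix)]
    return token

def preprocess(text):
    return [stem(t) for t in remove_stopwords(normalize(text))]

print(preprocess("Sending messages on the messaging app."))
# ['send', 'messag', 'messag', 'app']
```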

3. PROPOSAL

In this section we present our proposal for automatic TC selection and prioritization.

3.1 Selection

The proposed solution for automatic TC selection and prioritization is to base the process on the CRs being resolved in the current regression test campaign. The aim is to automate the manual process, which is very costly and not always effective. The automated process receives as input a set of CRs manually selected by the test team, and returns as output an ordered list of TCs which will be used to perform the regression tests (a test plan).
The TCs are obtained from a general test repository maintained by the company. As the general repository is very large and frequently updated, all testing campaigns start by selecting from this repository a (possibly high) number of TCs related to the test goals, named the Master Plan (MP). This selection is traditionally executed manually, based on the test goals description and on the architect's experience. As our aim is to automate the whole process of test plan creation, we must also address the selection and creation of this initial TC subset (the Master Plan). This subprocess is part of Phase 1 of the general automation process, which is fully described below.

Finally, note that although the Master Plan is already a subset of the general repository, it is usually still large, to preserve coverage of the code to be tested. Thus, the Master Plan is later reduced by the test architect to manually create more objective/focused test plans (which is the main aim of our automation process).

3.1.1 The Manual Process

All testing campaigns start with the creation of a Master Plan (MP), a large set of TCs related to the new product to be tested. The TCs are obtained from a general test repository maintained by the company. The selection process is manually executed, based on the test goals and on the architect's experience. Note that although they are a subset of the general TC repository, Master Plans tend to be large, to preserve the coverage of the code under test.

As new versions of the software/product are released, regression tests are executed. However, it is not always feasible to execute all TCs in the MP for every new product release (build), particularly because execution is mostly performed manually. Yet, it is not necessary to test features which have not been changed in the build under test. Thus, the test architect creates Test Plans focused on the currently open CRs.

It is worth noting that the main bottleneck of this manual process is the creation of Test Plans, since each new product demands only one MP but several test plans (one for each new build). And there may be several builds released until a product reaches an acceptable stability to be released to the market. In this light, the main goal of our work is to automatically generate Test Plans for each new product release.

3.1.2 The Automated Process

The proposed solution for automatic TC selection and prioritization focuses on the generation of test plans for a regression test campaign. The aim is to automate the manual process, which is very costly and not always effective. The automated process receives as input a Master Plan (MP) and a set of open Change Requests (CRs) related to a release of the product under test, and returns one test plan per CR, which will be used to perform the regression tests. This process consists of four main phases, detailed below.

Phase 1 (MP Index File creation) - as said, MPs tend to be large, thus hardening the (manual or automatic) selection process. To ease the creation of Test Plans, the adopted solution was to automatically index the input MP, so that TCs can be retrieved based on keyword queries related to each open CR. The index file creation was carried out with the Apache Lucene indexing engine (https://lucene.apache.org/). This indexing/search engine constitutes one module of the implemented system (Section 3.2). The indexing phase is executed only once per product, since the MP does not change.

Phase 2 (Creation of CRs keyword-based representation) - this phase aims to create a keyword-based representation of each input CR, consisting of three steps:

• Step 1 (Extraction of keywords from the input CRs): The keywords are extracted from previously defined fields of the CR template, the ones with meaningful information for the task (e.g., title, product component, problem description). These fields were chosen with the help of a test architect;

• Step 2 (Stopwords elimination): Each CR representation is then filtered through the elimination of stopwords. These are words which are too frequent in the TC repository (and thus will not help to identify a particular test), or which are not relevant in the current application. The list of stopwords is usually built during the initial indexing process described above (Phase 1), since it may change according to the current MP vocabulary and word frequencies;

• Step 3 (Stemming): The list of keywords may also undergo a stemming process, to reduce inflected or derived words to their word stem/base form (for instance, messages -> message). This process favors the retrieval of a higher quantity of TCs, so it should be optional and only used when necessary. Section 3.2 brings more details about this process and its consequences.
Phase 3 (MP Index File consultation) - in this phase, the CRs' keyword representations are used to search the MP index file to create the initial TC lists. This phase is executed once for each input CR.

• Step 1 (Queries creation): This step receives each CR representation as input and delivers one query per CR to be submitted to the search engine. The duplications are eliminated; however, the duplicated words are placed at the beginning of the query, so that they will have more importance during the search process (see more details of the searching process in Section 3.2, and the sketch after this list);

• Step 2 (Queries processing): Each query created in Step 1 is individually submitted to the search engine, retrieving a list of TCs related to the corresponding CR. These lists of TCs are automatically ordered by the search engine according to the relevance of each TC to the current query (relevance measures are mentioned in Section 4). The obtained lists of TCs are given as input to Phase 4.
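The sketch below illustrates the spirit of Step 1 of Phase 3: duplicated keywords are collapsed, but keywords that occurred more than once are moved to the front of the query so that they weigh more in the search. The function and the example terms are ours, not the tool's.

```python
from collections import Counter

def build_query(keywords):
    """Collapse duplicates, placing repeated keywords first.

    keywords: preprocessed terms extracted from one CR.
    Returns a single query string for the search engine.
    """
    counts = Counter(keywords)
    # First-seen position breaks ties among equally frequent terms.
    first_seen = {t: i for i, t in enumerate(dict.fromkeys(keywords))}
    ordered = sorted(counts, key=lambda t: (-counts[t], first_seen[t]))
    return " ".join(ordered)

# 'camera' occurs three times, 'crash' twice, 'video' once.
print(build_query(["camera", "crash", "camera", "video", "crash", "camera"]))
# 'camera crash video'
```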
Phase 4 (Test Plan Creation) - in this phase, the obtained lists of TCs are merged, originating the Test Plan to be executed. Note that different queries (representing different CRs) may retrieve the same TC from the index base. During the merging step, the existing duplications are eliminated. However, we understand that if a TC was retrieved by more than one query, it may be more relevant for the testing campaign as a whole. So, the duplicated TCs are prioritized in the final ordered Test Plan. The merging strategy will be detailed and illustrated in Section 3.2.

The above process is repeated every time a new build is tested and presents defects. However, in that case the process starts from Phase 2; as said, Phase 1 runs only once per new product.

Figure 1 depicts the general architecture proposed for the whole process.

Figure 1: General architecture of the proposed process
3.2 The Test Selection Tool Prototype

The automated process described above was implemented in a tool prototype named AutoTestPlan. It was implemented using Django (https://www.djangoproject.com/), a high-level Python web framework, with Bootstrap, bearing an MVC (Model-View-Controller) architecture.

Aiming to respect software engineering principles of modularity (to provide for extensibility and easy maintenance), this prototype consists of three separate modules and one data base, the MP Index File.

3.2.1 Module 1 - Indexing/Search Engine

This module consists of an indexing and search engine. Therefore, it implements the two phases of the automated process related to Information Retrieval tasks: Phase 1 (which creates the index file) and Phase 3 (which allows the retrieval of TCs based on keyword queries).

Index file creation - Phase 1 of the automated process. The Index File is a data structure created to facilitate the retrieval of text documents based on keyword queries. In this structure, each document is indexed by the words/terms appearing in its text (this structure is also known as an inverted index file, since the words index the documents where they appear). When a query is submitted to the index file, all documents containing the words in the query are retrieved.

The words used to index the documents constitute the index base vocabulary, which is clearly dependent upon the documents being indexed. But note that not all words/terms appearing in the base are relevant to index documents, because some words are too frequent/infrequent or carry no semantic meaning (for instance, prepositions and conjunctions). Thus, the initial set of words is preprocessed to eliminate duplications and irrelevant words (that is, stopwords). The resulting set of words (vocabulary) is then used to index the documents in the base. It is worth mentioning that the user is able to edit the stopwords list to add or remove any word/term of interest.

This module was implemented using the Apache Lucene open-source information retrieval software library (https://lucene.apache.org/). We deployed PyLucene (https://lucene.apache.org/pylucene/index.html), a Python version available from the Lucene website. PyLucene is built upon the Vector Space Information Retrieval model [9]. In this algebraic model, each document in the base is represented as a vector in a multidimensional space. Each dimension corresponds to one word/term in the vocabulary of the document base.

In our system, the document base corresponds to the Master Plan, a file containing the textual descriptions of Test Cases (each TC representing a document to be indexed). The vocabulary is obtained automatically by the indexing engine during the indexing process, and it keeps the relevant words/terms found in the TC descriptions.

Finally, the vocabulary may still undergo a stemming process, to reduce each word to its base form. Stemming may reduce the vocabulary size, since two inflected or derived words may be mapped onto the same stem/base form (e.g., frequently and infrequent are both reduced to frequent). PyLucene already provides a stemmer (the Snowball algorithm), which can be activated when desired.

As said, the index file creation is executed only once per new product, since there is only one MP per product.

Index File consultation - Phase 3 of the automated process. This module receives as input the CRs' keyword representations and delivers one query per CR to be submitted to the search engine.

• Step 1 (Query creation): The duplications are eliminated; however, the duplicated words are placed at the beginning of the query, so that they will have more importance during the search process (see the running example);

• Step 2 (Queries processing): Each query submitted to the search engine is represented as a vector in the multidimensional vector space created in Phase 1. Then, the search engine measures the similarity between the query vector and the vectors representing each document in the space, to retrieve the most relevant documents for the query. Here, relevance is measured by the cosine of the angle between the vector representing the query and the vectors representing the documents (the smaller the angle between two vectors, the higher the cosine value). This strategy returns an ordered list of documents based on the cosine measure, as sketched below. The obtained lists of TCs are given as input to Module 3, which creates the final Test Plan.
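As an illustration of the cosine measure over term vectors, here is a minimal sketch using raw term counts; Lucene's actual scoring additionally applies TF-IDF weighting and length normalization, so this is a simplification.

```python
import math
from collections import Counter

def cosine(query_terms, doc_terms):
    """Cosine of the angle between two term-count vectors."""
    q, d = Counter(query_terms), Counter(doc_terms)
    dot = sum(q[t] * d[t] for t in set(q) & set(d))
    norm_q = math.sqrt(sum(c * c for c in q.values()))
    norm_d = math.sqrt(sum(c * c for c in d.values()))
    return dot / (norm_q * norm_d) if norm_q and norm_d else 0.0

# Hypothetical preprocessed TC descriptions.
query = ["camera", "crash"]
docs = {"TC1": ["camera", "crash", "reboot"], "TC2": ["audio", "playback"]}
ranked = sorted(docs, key=lambda name: cosine(query, docs[name]), reverse=True)
print(ranked)  # ['TC1', 'TC2']: TC1 shares terms with the query, TC2 does not
```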
Figure 2: The AutoTestPlan prototype

3.2.2 Module 2 - Creation of CRs Keyword-Based Representation

This module implements Phase 2 of the automated process. It receives the input CRs and delivers their individual keyword-based representations. This module was implemented using Django with Bootstrap, already mentioned above.

• Step 1 (Extraction of keywords from the input CRs): As said, the keywords are extracted from previously defined fields of the CR template, for example title, product component, problem description (Step 1 of Phase 2). The targeted fields are located within the CR using a pattern-matching strategy: the CR text is sequentially searched, looking for the field titles. When a targeted field is located, its content is extracted and appended to the initial query keyword representation (a sketch of this extraction is given after this list);

• Step 2 (Stopwords elimination): The list of keywords obtained from Step 1 is filtered through the elimination of stopwords. The list of stopwords can be obtained from the indexing engine implemented in Module 1. In our experiments, the final keyword lists usually contained up to 8 words/terms (see Section 4);

• Step 3 (Stemming): This process must only be used here when it is also used in the generation of the index base; otherwise, the matching between keywords will be wrongly reduced. Stemming may reduce the list of keywords which represent a CR, since two inflected or derived words may be mapped onto the same stem/base form. However, this process favors a higher number of matches between the query and the index base documents, thus retrieving a larger quantity of TCs from the index base. Therefore, it should be optional and only used when necessary.
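A minimal sketch of the pattern-matching extraction described in Step 1; the field titles and the CR layout below are hypothetical, since the actual CR template is proprietary.

```python
import re

# Hypothetical field titles; the real CR template fields differ.
TARGET_FIELDS = ("Title", "Product Component", "Problem Description")

def extract_keywords(cr_text):
    """Scan the CR text and collect the contents of the targeted fields."""
    keywords = []
    for field in TARGET_FIELDS:
        # Match 'Field: <content>' up to the end of that line.
        m = re.search(rf"^{re.escape(field)}:\s*(.+)$", cr_text, re.MULTILINE)
        if m:
            keywords.extend(m.group(1).split())
    return keywords

cr = """Title: Camera crash on video capture
Product Component: Camera
Problem Description: App crashes when recording 4K video
Reported By: field tester"""

print(extract_keywords(cr))
# ['Camera', 'crash', 'on', 'video', 'capture', 'Camera', 'App',
#  'crashes', 'when', 'recording', '4K', 'video']
```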
3.2.3 Module 3 - Test Plan Creation

This module implements Phase 4 of the automated process. It receives as input the ordered lists of TCs retrieved by the search engine and merges these lists, delivering the final Test Plan. The merging strategy developed in this work takes into account the position of each TC in the retrieved list plus the number of times the same TC was retrieved by different queries.

Figures 3 and 4 illustrate the merging strategy currently in use, considering as example a regression test based on 3 CRs. This strategy has 2 steps, regardless of the number of input CRs (a code sketch of the strategy closes this section):

• Step 1 (the input lists are initially merged by alternating, one by one, the TCs from each input list; Figure 3): The idea is to prioritize TCs well ranked by the search engine for each CR query. This way, the better-ranked TCs will be at the top of the Test Plan list, regardless of the CR which retrieved them. The resulting list is passed on to Step 2, to eliminate duplications;

• Step 2 (the merged list is then examined to treat duplications): As said before, the number of occurrences of each TC influences its final rank. TCs appearing three times are positioned at the top of the list, followed by the TCs with two and then one occurrence, maintaining the initial order (Figure 4).

Figure 3: Initial TC merged list

Figure 4: TC duplication elimination and reordering

The final merged list will constitute the Test Plan to be executed in the regression test. This strategy was validated via an experiment presented in the following section.
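Here is a minimal sketch of the two-step merge as described above: round-robin interleaving, then reordering by number of occurrences while preserving the interleaved order within each occurrence group. The TC names and helper names are ours.

```python
from itertools import zip_longest

def merge_round_robin(lists):
    """Step 1: alternate TCs from each per-CR ranked list."""
    merged = []
    for group in zip_longest(*lists):
        merged.extend(tc for tc in group if tc is not None)
    return merged

def reorder_by_occurrences(merged):
    """Step 2: TCs retrieved by more queries move to the top;
    ties keep the Step 1 (interleaved) order."""
    counts, first_pos = {}, {}
    for i, tc in enumerate(merged):
        counts[tc] = counts.get(tc, 0) + 1
        first_pos.setdefault(tc, i)
    return sorted(first_pos, key=lambda tc: (-counts[tc], first_pos[tc]))

# Three per-CR ranked lists (TC names invented for illustration).
cr1, cr2, cr3 = ["TC5", "TC2", "TC9"], ["TC2", "TC7"], ["TC5", "TC2", "TC1"]
merged = merge_round_robin([cr1, cr2, cr3])
# merged: TC5 TC2 TC5 TC2 TC7 TC2 TC9 TC1
print(reorder_by_occurrences(merged))
# ['TC2', 'TC5', 'TC7', 'TC9', 'TC1']: TC2 retrieved 3x, TC5 2x, the rest 1x
```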
4. EXPERIMENTS AND THREATS TO VALIDITY

In the following sections we present some performed experiments as well as the main threats to validity we have identified.

4.1 Performed Experiments

The implemented prototype was tested in two initial experiments, described in this section. The results were validated by the test team, which provided a very positive evaluation of the system's final output (the Test Plan). The experiments were created with the help of the company's test architects and test team. We have to observe the following setup information:

• CRs selection: the test architects selected two release notes (one for each experiment) to execute a regression campaign. These release notes have three CRs in the first release notes and 149 CRs in the other;

• Corpus selection: the test architects manually selected 200 TCs from the general TC repository to define the Master Plan in the first experiment, and 427 TCs to define the second Master Plan.

In Tables 1 and 2 we have data collected from experiments 1 and 2, respectively. Each of these tables has 6 columns, explained as follows:

• Architect - represents the test architect that performed the manual selection;

• Chosen - represents the TCs selected by the architect, according to the MP;

• Elected - represents the TCs suggested by AutoTestPlan, where the numbers X(Y) in Tables 1 and 2 mean that AutoTestPlan originally returned Y TCs, of which X coincided with those chosen by the test architect;

• Top - represents the amount of TCs chosen by AutoTestPlan, appearing at the top of the list, that coincide with those chosen by the test architect;

• Recall - measures the intersection between the TCs returned by AutoTestPlan and those of the architect, with respect to all TCs chosen by the architect;

• Match - measures the top TCs against the TCs chosen by the test architect.

Architect | Chosen | Elected | Top | Recall | Match
A         | 27     | 27(92)  | 20  | 100%   | 74%
B         | 25     | 25(92)  | 18  | 100%   | 72%

Table 1: 1st experiment (200 Test Cases)

Architect | Chosen | Elected  | Top | Recall | Match
A         | 195    | 148(315) | 116 | 76%    | 59.5%
B         | 128    | 100(315) | 43  | 78%    | 33.6%

Table 2: 2nd experiment (427 Test Cases)

In the first experiment (see Table 1), test architect A chose 27 TCs from the MP (a total of 200 TCs). AutoTestPlan returned 92 TCs, and all 27 TCs chosen by architect A lie within this returned set. This yielded a 100% recall. Of these 27 TCs, 20 appeared at the top of the prioritized list, resulting in a match of 74% (that is, 20/27). For architect B, we had similar percentages: 100% for recall and 72% for match. This experiment received positive acknowledgment from the test architects.

In the second experiment, the MP was bigger, composed of 427 TCs, from which architect A chose 195 TCs and architect B 128 TCs. AutoTestPlan returned 315 TCs, of which 148 coincided with the selection made by architect A and only 100 with architect B. This time the recall was not as good as in the previous experiment: 76% (148/195) for architect A and 78% (100/128) for architect B. Match was not as good either, because the top TCs ranked by AutoTestPlan were not selected by the test architects: 59.5% (116/195) for architect A and 33.6% (43/128) for architect B.
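For reference, both metrics can be computed directly from the table columns; the snippet below checks the Table 2 row for architect A (column names as defined above).

```python
def recall(coincided, chosen):
    """Fraction of the architect's TCs that AutoTestPlan also returned
    (the X in the X(Y) 'Elected' notation, over 'Chosen')."""
    return coincided / chosen

def match(top, chosen):
    """Fraction of the architect's TCs found at the top of the ranked list."""
    return top / chosen

# Table 2, architect A: Chosen = 195, Elected = 148(315), Top = 116.
print(f"recall = {recall(148, 195):.0%}, match = {match(116, 195):.1%}")
# recall = 76%, match = 59.5%
```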
4.1.1 General Discussion

As we can observe from experiments 1 and 2, experiment 1 exhibited higher recall and match percentages than experiment 2. The main reason for this is related to the number of components involved in the release notes. In experiment 1, just one component was affected, whereas in experiment 2 several components were affected. This is directly related to our merge strategy, because when we combine isolated results we lose some matching. Using a previous merge strategy, we had only 40% (against the current 59.5%) and 23% (against the current 33.6%) in experiment 2. We need to investigate other merge strategies further to improve this match percentage.

Another point that deserves attention is that in experiment 1, the choices made by architect B are a subset of the choices of architect A. But in experiment 2 this is completely different: just 20% of coincidence occurs between the choices of architects A and B. This is too low, and thus our low match cannot be criticized without a metric independent of the choices made by the architects. Currently we only get feedback from an EDA (Escaped Defect Analysis) report provided by our industrial partner. According to this report, our selections were better than those of the architects in the sense of fewer escaped defects. But we need to investigate this point further.

4.2 Threats to Validity

A first threat we identified is related to the MP's size. If an MP has fewer than 50 TCs, for example, the similarity algorithm cannot work with efficacy and the automatic selection can result in the same 50 TCs. But in practice this is not a problem, because in general an MP's size is at least 100 TCs.

Another threat is associated with the use of the Master Plan defined by the test team. By assuming such an MP, we are considering that it has the best TCs to be selected for a particular regression campaign. But similarly to what we observe with the architects' selection, such an MP may not be the best one. One future work is exactly to avoid the need for this MP, to see whether the results improve or remain the same.

We have another concern, related to the availability of just one test team to work with. We are already collaborating with other teams to perform similar experiments. This is mainly associated with the quality of the textual parts in the test cases as well as in the CRs.

As pointed out in the previous threat, the text provided in CRs is totally informal and does not follow guidelines. Fortunately, in test cases this is not a concern, because they are written carefully and reviewed to guarantee that they are clear and well defined.
5. CONCLUSION

In this work we presented an automatic TC selection and prioritization strategy based on information retrieval, applied to test cases and CRs. We developed a prototype tool, named AutoTestPlan, which implements the concepts presented in the paper. We were able to run some experiments with real data obtained from our industrial partner, and we also evaluated our results with real test architects. Although the results obtained in experiment 2 were not so good, overall the proposed strategy and tool are seen positively by our industrial partner, because they decrease considerably the effort required from humans. For instance, just the automation (automatic queries for TC selection from the MP, report generation, etc.) provided a 75% gain in the effort taken by the architects. And with the current results, we can attack the other 25%, although we still need a validation from the test architects. This means that we do not have a fully automatic solution, but it decreased the test architects' labor by at least 90%.

The goal of the work reported in [14] is completely aligned with ours. In particular, its industrial setting is very similar to ours, as is its use of natural language processing. However, it deals with source code to better find the related test cases (based on features), whereas we do not have access to source code. In our case, components are our main features, restricting considerably the amount of test cases to be executed. The refinement to components comes from the keywords appearing simultaneously in the texts of the change requests and test cases. Another difference is that in [14] the authors deal with software product lines (SPL); we do not address this problem specifically, although our context is based on SPL as well.

The work reported in [6] follows a strategy similar to ours in the sense of using information retrieval and similarity analysis. However, that work uses a specific ranking function with particular weights (in the direction of a previous work of ours [1]), whereas we use the built-in functionality provided by Lucene [9].

Concerning test selection in particular, an interesting related work is reported in [13]. In that work, test selection is indeed dealt with as test suite reduction, where several criteria are used to reduce a test suite while retaining as much as possible of its original properties in terms of source code and requirements coverage. Its approach is more general than ours; we are more closely related to what is named requirements coverage in [13]. That is, we try to create test plans that cover similar requirements (keywords) in change requests as well as in test case procedures.

Several works use mathematical models, indeed transition systems, to select regression test cases, such as [3, 5]. The work reported in [3] reduces a test plan by checking dependency in EFSM structures, whereas [5] reduces by applying a similarity algorithm on transition systems. Although both use some kind of similarity algorithm, like ourselves, they use a formal notation. The main difference to our work is exactly that we do not use any mathematical model except the similarity algorithm for natural language provided by the Lucene tool. We think our similarity criterion is more convenient because it is guided by change requests, whereas in the work reported in [5] it is related to the mathematical model itself; thus that reduction simply discards similar test cases. In our case we discard those test cases not related to the current change requests.

The closest work to ours is the one reported in [4]. Like ours, the work in [4] performs a similarity check based on test cases and change descriptions (or requests). But differently from ours, source code is used as well. In our case only informal documents are used, which complicates the similarity algorithm considerably. We intend to consider source code in the future, but this is not done currently.

As future work we intend to perform further experiments, focusing on other features of the Motorola smartphones. We also intend to make a continuous analysis of the results obtained with our tool against the architects' selections, to test a statistical hypothesis to completely replace the manual selection with the automatic one, improving the daily test process of Motorola Mobility. From the experiments, we need to try other variants of the merge strategy to see whether we can obtain better match results. Finally, we are trying to obtain access to at least the source code related to the CRs. With this we can try to calculate some kind of coverage, to see whether our selections are indeed better or not when compared to the architects'. Currently we only have information from EDA reports.

Acknowledgments

We would like to thank Alice Arashiro, Viviana Toledo and Lucas Heredia from Motorola Mobility, and Virginia Viana. This research is supported by Motorola Mobility.
6. REFERENCES

[1] Cláudio Magalhães, Alexandre Mota, and Eliot Maia. Automatically finding hidden industrial criteria used in test selection. In 28th International Conference on Software Engineering and Knowledge Engineering, SEKE'16, San Francisco, USA, pages 1-4, 2016.

[2] G. Canfora and L. Cerulo. Impact analysis by mining software and change request repositories. In 11th IEEE International Software Metrics Symposium (METRICS'05), page 29. IEEE, 2005.

[3] Cu D. Nguyen, Alessandro Marchetto, and Paolo Tonella. Model based regression test reduction using dependence analysis. In Proceedings of the International IEEE Conference on Software Maintenance, pages 214-223. IEEE, 2002.

[4] Cu D. Nguyen, Alessandro Marchetto, and Paolo Tonella. Test case prioritization for audit testing of evolving web services using information retrieval techniques. In Web Services (ICWS), 2011 IEEE International Conference on, pages 636-643. IEEE, 2011.

[5] Francisco Gomes de Oliveira Neto and Patrícia Duarte de Lima Machado. Seleção automática de casos de teste de regressão baseada em similaridade e valores [Automatic selection of regression test cases based on similarity and values]. Revista de Informática Teórica e Aplicada (RITA), 20(2), pages 139-154, 2013.

[6] Manisha Khattar, Yash Lamba, and Ashish Sureka. Sarathi: Characterization study on regression bugs and identification of regression bug inducing changes: A case study on the Google Chromium project. In Proceedings of the 8th India Software Engineering Conference, pages 50-59. ACM, 2015.

[7] Christopher D. Manning, Prabhakar Raghavan, and Hinrich Schütze. Introduction to Information Retrieval. Cambridge University Press, New York, NY, USA, 2008.

[8] Michael McCandless, Erik Hatcher, and Otis Gospodnetic. Lucene in Action: Covers Apache Lucene 3.0. Manning Publications Co., 2010.

[9] Michael McCandless, Erik Hatcher, and Otis Gospodnetic. Lucene in Action: Covers Apache Lucene 3.0. Manning Publications Co., 2010.

[10] Gregg Rothermel and Mary Jean Harrold. Analyzing regression test selection techniques. IEEE Trans. Softw. Eng., 22(8):529-551, August 1996.

[11] Gregg Rothermel and Mary Jean Harrold. A safe, efficient regression test selection technique. ACM Trans. Softw. Eng. Methodol., 6(2):173-210, April 1997.

[12] Ripon K. Saha, Lingming Zhang, Sarfraz Khurshid, and Dewayne E. Perry. An information retrieval approach for regression test prioritization based on program changes. In Proceedings of the 37th International Conference on Software Engineering - Volume 1, ICSE '15, pages 268-279. IEEE Press, 2015.

[13] August Shi, Alex Gyori, Milos Gligoric, Andrey Zaytsev, and Darko Marinov. Balancing trade-offs in test-suite reduction. In Proceedings of the 22nd ACM SIGSOFT International Symposium on Foundations of Software Engineering, FSE 2014, pages 246-256. ACM, 2014.

[14] Michael Unterkalmsteiner, Tony Gorschek, Robert Feldt, and Niklas Lavesson. Large-scale information retrieval in software engineering - an experience report from industrial application. Empirical Software Engineering, pages 1-42, 2015.
