Mutation Testing Cost Reduction Techniques
Although mutation’s main steps (mutant generation, test case execution, and result analysis) can be costly, research allows developers to apply it to industry.

From the research perspective, mutation is a mature testing technique that has often shown its value for evaluating both software and software testing techniques. However, to the best of our knowledge, there’s an important gap between its current research status and the possibilities of adopting it for the industrial world, owing to its high costs.

For three decades, researchers have made considerable effort and obtained sufficient results regarding mutation. However, neither software practitioners nor testing-tool developers have put the results to work. Here, we describe research on cost reduction in mutation testing, focusing on techniques that could easily transfer to industrial practice.

Preliminary concepts
Richard DeMillo and his colleagues proposed mutation as a testing technique in 1978 [1]. They describe this basic idea as follows:

“A programmer enters from a terminal a program, P, and a proposed test data set whose adequacy is to be determined. The mutation system first executes the program on the test data: if the program gives incorrect answers then certainly the program is in error. On the other hand, if the program gives correct answers, then it may be that the program is still in error, but the test data is not sensitive enough to distinguish that error: it is not adequate. The mutation system then creates a number of mutations of P that differ from P only in the occurrence of simple errors.” [1]

So, a mutant M of a program under test P is a copy of P that contains a small code change that’s interpreted as a fault. Mutation relies on the ability of the test data set (the test suite) to find faults in the set of mutants.

Test engineers typically use automated tools to generate mutants. These tools apply a set of mutation operators to P. They define each mutation operator to introduce some type of syntactic change to a statement. For example, a simple instruction such as return a + b (where a and b are integers) can mutate in at least 20 different ways (a − b, a × b, a / b, a + b++, −a + b, a + −b, 0 + b, a + 0, |a| + b, a + |b|, and so on), depending on the mutation operators. Thus, the number of mutants generated even for a medium-size program can be very large. Dealing with this number of mutants has implications regarding the time needed to compile, link, and execute them.

Automated tools typically execute test cases against the original program and the mutants, registering the results with each program version (original or mutant). When the result of executing a test case against a mutant M differs from the same test case against P, the test case has found the fault introduced in M, and the mutant is killed;
May/June 2010 I E E E S O F T W A R E 81
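The generate-and-kill cycle described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the mutants of return a + b are written by hand rather than produced by a tool, only a handful of the mutations listed in the text are included (a / b and a + b++ are left out to keep the sketch short and free of division by zero), and the names original, MUTANTS, killed_mutants, and mutation_score are hypothetical.

```python
from typing import Callable, Dict, List, Set, Tuple

# Program under test P: the instruction from the text, "return a + b".
def original(a: int, b: int) -> int:
    return a + b

# A few first-order mutants of P, written by hand for illustration.
# A real tool would generate them by applying mutation operators to P.
MUTANTS: Dict[str, Callable[[int, int], int]] = {
    "a - b": lambda a, b: a - b,
    "a * b": lambda a, b: a * b,
    "-a + b": lambda a, b: -a + b,
    "0 + b": lambda a, b: 0 + b,
    "a + 0": lambda a, b: a + 0,
    "|a| + b": lambda a, b: abs(a) + b,
    "a + |b|": lambda a, b: a + abs(b),
}

def killed_mutants(tests: List[Tuple[int, int]]) -> Set[str]:
    """Return the mutants whose output differs from P on at least one test."""
    killed = set()
    for name, mutant in MUTANTS.items():
        for a, b in tests:
            if mutant(a, b) != original(a, b):
                killed.add(name)  # this test case found the seeded fault
                break
    return killed

def mutation_score(tests: List[Tuple[int, int]]) -> float:
    """Fraction of the mutants above that the test data kills."""
    return len(killed_mutants(tests)) / len(MUTANTS)

if __name__ == "__main__":
    print(sorted(killed_mutants([(2, 3)])))
    print(mutation_score([(2, 3), (-1, -1)]))
```

With the single test (2, 3), the two absolute-value mutants stay alive because both inputs are positive; adding a test with negative inputs, such as (-1, -1), kills them as well. The test (0, 0) kills none of these mutants, which shows how test data can pass on P and still be inadequate.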
Offutt’s process proposes generating mutants and iteratively executing the test cases against the living ones. As long as the process doesn’t reach a

mutation, which consists of generating mutants using only a reduced subset of mutation operators. The criterion for selecting the operators
unattended, nightly batch processes, the complete execution and further application of a greedy algorithm is a good choice for approach-

Result Analysis
The most important obstacle in this third step is the presence of equivalent mutants. Phyllis Frankl and her colleagues discuss the almost prohibitive cost of detecting equivalent mutants [24]. Bernhard Grün and his colleagues report that assessing the equivalence of a single mutation takes 15 minutes [25].

From a formal point of view, the problem of detecting all equivalent mutants is undecidable, although in practice you can detect some by annotating the program under test with restrictions [26] and by program slicing [27]. However, the industry doesn’t usually apply these techniques, so they aren’t easily adaptable to common software development practice.

Many selective-mutation concepts aim to reduce the number of equivalent mutants, which implies a considerable cost reduction during result analysis. From the automatable-techniques perspective, a recent paper discusses perhaps the most significant results and relies on n-order mutation [28]. An n-order mutant contains n faults instead of 1 and proceeds from a previous generation’s combination of mutants. Thus, two first-order mutants (each with a fault) are combined into a second-order mutant with two faults, which might in turn be combined with another first-order mutant to obtain a third-order mutant.

The paper describes three algorithms for producing second-order mutants from first-order mutants and shows meaningful cost reductions in mutation testing, especially during result analysis. At first glance, the number of second-order

An industrially applicable mutation-testing tool should meet these requirements:

■ Users should be able to generate mutants with a selective set of generally applicable mutation operators, most likely AOR, ROR, UOI, ABS, and LCR. Additionally, for specific languages or environments, the tool should consider including other concrete operators.
■ Users should be able to select a random set of mutants.
■ Also depending on the specific environment, the tool should allow mutation at compiled-code level (bytecode for Java, Microsoft Intermediate Language for .NET, and so on).
■ In test execution, the tool should support both executing test cases on only the mutants remaining alive and, for unattended batch testing cycles, selecting a reduced test suite with, for example, a greedy algorithm.
■ The tool should support instrumentation of both the original program and the mutants to keep a log of the execution. Changes in a log would highlight a behavior difference, meaning that the corresponding mutant has been killed, and making this technique a type of weak mutation.
■ To reduce result analysis costs, the tool should allow n-order mutation, which is easily automatable and transferable to industry.
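The requirement to select a reduced test suite with a greedy algorithm can be illustrated with the standard greedy set-cover heuristic. The sketch below is an assumption about how such a selection might look, not the procedure of any particular tool: it takes a hypothetical killing matrix (kills, mapping each test case to the set of mutants it kills) that a complete execution would have produced, and repeatedly picks the test that kills the most mutants not yet covered.

```python
from typing import Dict, List, Set

def greedy_reduce(kills: Dict[str, Set[str]]) -> List[str]:
    """Pick tests greedily until every killable mutant is killed.

    kills maps a test-case name to the set of mutants it kills
    (a hypothetical killing matrix from a complete execution).
    """
    # Every mutant killed by at least one test must stay killed.
    remaining: Set[str] = set().union(*kills.values())
    selected: List[str] = []
    while remaining:
        # Test killing the most still-uncovered mutants; ties are broken
        # by name so the result is deterministic.
        best = max(sorted(kills), key=lambda t: len(kills[t] & remaining))
        selected.append(best)
        remaining -= kills[best]
    return selected

if __name__ == "__main__":
    matrix = {
        "t1": {"m1", "m2"},
        "t2": {"m2", "m3", "m4"},
        "t3": {"m1"},
        "t4": {"m4"},
    }
    print(greedy_reduce(matrix))
```

In this example the reduced suite keeps two of the four test cases and still kills the same four mutants. The greedy choice is a heuristic: it gives a small suite quickly but not necessarily the smallest possible one.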
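To make the idea of combining two first-order mutants into one second-order mutant concrete, here is a minimal sketch. It assumes a deliberately simple representation (a program is a list of source lines, and a fault is a replaced line) and a simple pairing rule (first with last, second with second-to-last, and so on). The pairing rule and the names Change, Mutant, apply, and pair_up are illustrative assumptions and are not claimed to match any of the three algorithms in the cited paper.

```python
from typing import List, Tuple

Change = Tuple[int, str]     # (line index, replacement line): one seeded fault
Mutant = Tuple[Change, ...]  # the order of a mutant is its number of changes

def apply(program: List[str], mutant: Mutant) -> List[str]:
    """Return a copy of the program with the mutant's changes applied."""
    lines = list(program)
    for index, replacement in mutant:
        lines[index] = replacement
    return lines

def pair_up(first_order: List[Mutant]) -> List[Mutant]:
    """Combine first-order mutants pairwise into second-order mutants.

    Limitation of this sketch: if both changes in a pair touch the same
    line, the later change overwrites the earlier one.
    """
    combined: List[Mutant] = []
    i, j = 0, len(first_order) - 1
    while i < j:
        combined.append(first_order[i] + first_order[j])
        i += 1
        j -= 1
    if i == j:
        # Odd count: the middle mutant stays first-order.
        combined.append(first_order[i])
    return combined

if __name__ == "__main__":
    program = ["x = a + b", "y = x * 2", "return y"]
    first_order = [
        ((0, "x = a - b"),),
        ((0, "x = a * b"),),
        ((1, "y = x / 2"),),
        ((1, "y = x + 2"),),
    ]
    for mutant in pair_up(first_order):
        print(apply(program, mutant))
```

Under this pairing, four first-order mutants become two second-order mutants, so there are about half as many program versions to compile, execute, and analyze.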
Int’l Conf. Software Eng. (ICSE 05), ACM Press, 2005, pp. 402–411.
15. R. DeMillo, R.J. Lipton, and F.G. Sayward, “Hints on Test Data Selection: Help for the Practicing Programmer,” IEEE Computer, vol. 11, no. 4, 1978, pp. 34–41.
16. A.J. Offutt, “Investigations of the Software Testing Coupling Effect,” ACM Trans. Software Eng. and Methodology, vol. 1, no. 1, 1992, pp. 15–20.
17. E.F. Barbosa et al., “Toward the Determination of Sufficient Mutant Operators for C,” Software Testing, Verification and Reliability, vol. 11, no. 2, 2001, pp. 113–136.
18. Y.-S. Ma, “MuJava: An Automated Class Mutation System,” Software Testing, Verification and Reliability, vol. 15, no. 2, 2005, pp. 97–133.
19. M.R. Garey and D.S. Johnson, Computers and Intractability, W.H. Freeman, 1979.
20. D. Jeffrey and N. Gupta, “Test Suite Reduction with Selective Redundancy,” Proc. 21st Int’l Conf. Software Maintenance (ICSM 05), IEEE CS Press, 2005, pp. 549–558.
21. R. DeMillo, E. Krauser, and A. Mathur, “Compiler-Integrated Program Mutation,” Proc. 15th Ann. Computer Software and Applications Conf. (Compsac 91), 1991, pp. 351–356.
22. W.E. Howden, “Weak Mutation Testing and Completeness of Test Sets,” IEEE Trans. Software Eng., vol. 8, no. 4, 1982, pp. 371–379.
23. A.J. Offutt and S.D. Lee, “An Empirical Evaluation of Weak Mutation,” IEEE Trans. Software Eng., vol. 20, no. 5, 1994, pp. 337–344.
24. P.G. Frankl, S.N. Weiss, and C. Hu, “All-Uses versus Mutation Testing: An Experimental Comparison of Effectiveness,” J. Systems and Software, vol. 38, no. 3, 1997, pp. 235–253.
25. B.J.M. Grün, D. Schuler, and A. Zeller, “The Impact of Equivalent Mutants,” Proc. IEEE Int’l Conf. Software Testing, Verification, and Validation Workshops (ICST 09), IEEE CS Press, 2009, pp. 192–199.
26. A.J. Offutt and J. Pan, “Automatically Detecting Equivalent Mutants and Infeasible Paths,” Software Testing, Verification and Reliability, vol. 7, no. 3, 1997, pp. 165–192.
27. R. Hierons and M. Harman, “Using Program Slicing to Assist in the Detection of Equivalent Mutants,” Software Testing, Verification and Reliability, vol. 9, no. 4, 1999, pp. 233–262.
28. M. Polo, M. Piattini, and I. García-Rodríguez, “Decreasing the Cost of Mutation Testing with Second-Order Mutants,” Software Testing, Verification and Reliability, vol. 19, no. 2, 2008, pp. 111–131.