Sufficient Mutant Operators: Offutt, Rothermel, Lee, Untch, and Zapf TOSEM, April 1996
Mutation testing: randomly change code, run against test suite
For example, change

    int min(int a, int b) {
        int result = a;
        if ( b < a ) result = b;
        return result;
    }

into

    int min(int a, int b) {
        int result = b;
        if ( b < a ) result = b;
        return result;
    }
Now min(6, 1) still gives 1, but min(1, 6) gives 6 (the original gives 1), so a test on min(1, 6) kills this mutant
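The comparison above can be sketched as a tiny harness. This is only an illustration: the names `min_original`, `min_mutant`, `test_case` and `mutant_killed` are hypothetical, and the original program's output is assumed to serve as the test oracle.

```c
#include <assert.h>

/* Original program under test. */
int min_original(int a, int b) {
    int result = a;
    if (b < a) result = b;
    return result;
}

/* Mutant: the initialisation "result = a" was changed to "result = b". */
int min_mutant(int a, int b) {
    int result = b;
    if (b < a) result = b;
    return result;
}

/* One test case is a pair of inputs (hypothetical type for this sketch). */
struct test_case { int a; int b; };

/* Returns 1 if some test makes the mutant's output differ from the
   original's output (the mutant is "killed"), 0 otherwise. */
int mutant_killed(const struct test_case *tests, int n) {
    assert(n >= 0);
    for (int i = 0; i < n; i++) {
        int expected = min_original(tests[i].a, tests[i].b);
        int actual = min_mutant(tests[i].a, tests[i].b);
        if (actual != expected) return 1;
    }
    return 0;
}
```

With a test set containing only (6, 1) the mutant survives; adding (1, 6) kills it.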
If new program produces same output on all test data, then either
  Test data not sufficient to capture all errors
  Program contains dead code
  New program equivalent to the old
Plan: generate code, generate tests, generate non-equivalent mutants, add to tests until all mutants killed
What operators for generating mutants?
Mothra: system developed in the 1980s which used 22 operators
Assumption: if program to be tested isn't correct, it contains at most a few small errors
Are all of these operators necessary? In general: do we need a huge number of operators to make this work?
CRP   Constant replacement
CSR   Constant for scalar variable replacement
DER   DO statement end replacement
GLR, LCR, ROR, RSR, SAN, SAR, SCR, SDL, SRC, SVR, UOI
22 operators: too many mutants!
But note some operators produce more mutants than others: ~18% by SVR, ~13% by ASR; ~60% by top five
Can we find a useful subset?
An experiment
Program    Statements  Mutants
Cal        29          3010
Euclid     11          196
Find       28          1022
Insert     14          460
Mid        16          183
Quad       10          359
Trityp     28          951
Warshall   11          305
Experiment, contd.
Used a tool to generate test data sets
Augmented these with hand-generated data sets to kill remaining incorrect mutants
Small number of hand-generated cases compared to total test cases
Also tried completely random data; results similar
Results
Tried different sets, found best results with five operators: ABS, AOR, LCR, ROR, UOI
Each test set generated using these mutant operators killed at least 98.5% (and on average 99.5%) of all mutants
Conclusion: these operators are sufficient
Results, Continued
Removing UOI, AOR, ABS would clearly weaken tests
Evidence for removing LCR is weak; few such connectives in sample programs
ROR provides branch coverage and so is hard to justify removing
ABS: insert calls to an absolute value function
AOR: replace each arithmetic operator by every syntactically legal alternative
LCR: replace AND, OR by all logical connectors
ROR: replace (modify) relational operators
UOI: insert unary operators
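As a rough illustration of ROR, the sketch below applies it to the `b < a` comparison in the earlier min example. It assumes ROR substitutes each of the five other relational operators; `relop`, `apply_relop`, `min_with` and `count_killed` are hypothetical names, and real tools may generate additional variants.

```c
#include <assert.h>
#include <stddef.h>

/* Relational operators; LT is the one used by the original min(). */
enum relop { LT, LE, GT, GE, EQ, NE, RELOP_COUNT };

int apply_relop(enum relop op, int x, int y) {
    switch (op) {
    case LT: return x < y;
    case LE: return x <= y;
    case GT: return x > y;
    case GE: return x >= y;
    case EQ: return x == y;
    case NE: return x != y;
    default: assert(0 && "unknown operator"); return 0;
    }
}

/* min() with its comparison parameterised: op == LT is the original,
   every other value of op is one ROR mutant. */
int min_with(enum relop op, int a, int b) {
    int result = a;
    if (apply_relop(op, b, a)) result = b;
    return result;
}

struct test_case { int a; int b; };

/* Counts the ROR mutants that at least one test distinguishes
   from the original program. */
int count_killed(const struct test_case *tests, size_t n) {
    int killed = 0;
    for (int op = 0; op < RELOP_COUNT; op++) {
        if (op == LT) continue;  /* skip the original operator */
        for (size_t i = 0; i < n; i++) {
            int expected = min_with(LT, tests[i].a, tests[i].b);
            if (min_with((enum relop)op, tests[i].a, tests[i].b) != expected) {
                killed++;
                break;
            }
        }
    }
    return killed;
}
```

With tests (6, 1) and (1, 6), four of the five mutants are killed. The `<=` mutant cannot be killed by any test: when b equals a, assigning b changes nothing, so it is equivalent to the original.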
Conclusion
Five operators, responsible for ~17% of all mutants, sufficient for 10 test programs
In general: get O(Lines + References) mutants
Constant for O is large!
Above 10 programs (200 lines): 231,972 mutants
Write code
Write tests, possibly using a tool to generate cases
Evaluate tests using mutants; if some non-equivalent mutant not killed, extend test set
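The last step can be sketched as a loop that keeps a candidate test only when it kills a mutant that is still live. This is a minimal sketch, not the tool used in the experiment: the three hand-written mutants, the `min_fn` type and `extend_suite` are all hypothetical, and the original program's output is assumed to be the oracle.

```c
#include <stddef.h>

typedef int (*min_fn)(int, int);

int original(int a, int b)    { int r = a; if (b < a) r = b; return r; }
/* Illustrative mutants of the original. */
int mutant_init(int a, int b) { int r = b; if (b < a) r = b; return r; }   /* r = b */
int mutant_ror(int a, int b)  { int r = a; if (b > a) r = b; return r; }   /* < to > */
int mutant_le(int a, int b)   { int r = a; if (b <= a) r = b; return r; }  /* equivalent */

struct test_case { int a; int b; };

/* Walks the candidate tests in order and copies into `suite` each one that
   kills at least one still-live mutant. On return, live[m] is 1 for every
   mutant no candidate killed. Returns the number of tests kept.
   `suite` must have room for n_candidates entries. */
size_t extend_suite(const min_fn *mutants, int *live, size_t n_mutants,
                    const struct test_case *candidates, size_t n_candidates,
                    struct test_case *suite) {
    size_t kept = 0;
    for (size_t m = 0; m < n_mutants; m++) live[m] = 1;
    for (size_t c = 0; c < n_candidates; c++) {
        int expected = original(candidates[c].a, candidates[c].b);
        int useful = 0;
        for (size_t m = 0; m < n_mutants; m++) {
            if (live[m] && mutants[m](candidates[c].a, candidates[c].b) != expected) {
                live[m] = 0;   /* this test kills mutant m */
                useful = 1;
            }
        }
        if (useful) suite[kept++] = candidates[c];
    }
    return kept;
}
```

Any mutant still live afterwards has to be inspected by hand: either it is equivalent to the original (as `mutant_le` is here) or the test set needs another case.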