Metaheuristics Applied To Automatic Software Testing: A Brief Overview
This problem can be solved using numerical optimization techniques for constrained problems. The
approach also covers programs with arrays and pointers, which are difficult to handle by symbolic
representation. This was the first interpretation of the test data generation problem as an optimization
problem.
However, this approach had difficulties: it is based on direct search methods, it requires continuity of the
branch functions because it uses information about derivatives, and it is susceptible to becoming trapped
in a local minimum.
The success of metaheuristic techniques in solving global optimization problems, especially problems
that involve several variables and multiple local optima, attracted the attention of researchers in software
testing.
Since the 1990s, several research efforts have applied metaheuristic techniques to software engineering
problems. These applications are covered in the following sections.
4. METAHEURISTICS IN SOFTWARE ENGINEERING.
The term metaheuristics can be interpreted as a group of general-purpose algorithms based on iterative
methods that guide the search for the optimum using the concepts of exploration and exploitation of the
search space. Many of them work on several points (a population of candidate solutions) and not only
on a single point in the search space.
Metaheuristics have several advantages over the classical gradient-based direct search methods: one
of the main advantages is that metaheuristics can work on problems with several variables and do not
need an explicit form of the objective function or its derivatives. All of the methods presented here are
used to carry out structural testing and use information about the program flow graph.
One of the first applications of metaheuristics was made by Pei et al. [10], who use a binary-coded
genetic algorithm to solve the test data generation problem in structural testing. Pei's approach is based
on pathwise test data generation, which consists of three steps: construction of the program control
flow graph, path selection, and test data generation with dynamic program execution.
The path selection is made manually.
They use the same branch functions defined by Korel and depicted in Table 1. As the fitness function,
they define $F = F_1 + F_2 + \cdots + F_n$, where $F$ is the sum of the branch functions along the
previously selected path.
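A fitness of this additive form can be sketched as follows. Table 1 is not reproduced in this section, so the branch functions below (`branch_gt`, `branch_eq`) and the three-predicate path are hypothetical stand-ins, not the definitions used in [10]; in this sketch each one returns zero when its condition holds and a positive penalty otherwise.

```python
# Hypothetical branch functions (assumption, not taken from Table 1):
# 0 when the branch condition holds, positive otherwise.
def branch_gt(a, b):
    """Penalty for the condition a > b."""
    return 0.0 if a > b else (b - a) + 1.0

def branch_eq(a, b):
    """Penalty for the condition a == b."""
    return float(abs(a - b))

def path_fitness(x, y, z):
    """F = F_1 + F_2 + F_3 for an assumed path with three predicates."""
    f1 = branch_gt(x, y)      # first predicate on the path: x > y
    f2 = branch_gt(y, z)      # second predicate on the path: y > z
    f3 = branch_eq(x, 2 * z)  # third predicate on the path: x == 2 * z
    return f1 + f2 + f3

print(path_fitness(4, 3, 2))  # 0.0, every predicate on the path holds
print(path_fitness(1, 3, 2))  # 6.0, the first and third predicates fail
```

Under these assumptions a smaller value of F means the input is closer to following the selected path.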
One year later, Roper et al. [12] present another application of genetic algorithms to test data
generation. The report leaves several aspects of the method unexplained and only states that the
fitness of a chromosome corresponds to the coverage it achieves of the program under
test [12], assigning high fitness to chromosomes that execute the true outcome of each branch.
In 1995, Sthamer [3] presented a doctoral thesis on test data generation using genetic algorithms,
which he compares with random testing. Sthamer considers three fitness functions based on the
predicates; one of them is defined by:
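$$Fit = \frac{1}{\left(|h - g| + \delta\right)^{2}}$$
where $h$ and $g$ are the expressions involved in the predicate and $\delta$ is a small positive quantity
that prevents division by zero.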
The function distance(x) is defined by the functions presented in Table 2 and is used as the fitness
function in [21]. The test criterion used is branch coverage.
The results of [23] show that ES and PSO have similar performance, with small differences for
some programs under test, and that both perform better than GA and random testing.
The worst results were obtained for the select program (selection of the k-th element of an unordered
list), with PSO = 88.89%, ES = 83.33%, GA = 83.33% and Rand = 11.11% coverage. The best results were
obtained for the sa program (simulated annealing), with PSO = 100%, ES = 99.94%, GA = 96.72% and
Rand = 96.67% coverage. The PSO approach needs considerably fewer evaluations than the others to
achieve these levels of coverage. Chicano concludes that the approach with the lowest number of
evaluations is better than the others provided that it also achieves the highest coverage.
Chicano's thesis is also a very good reference because of its explanation of the methods and its appendix
containing the parameter adjustment procedure for the methods.
5. A SIMPLE TESTING APPROACH.
We now consider a simple testing method, based on the one proposed by Harmen-Hinrich Sthamer in
his doctoral thesis [3], because it is a good starting point for the testing strategies, principally for the
If-Then statement. This is a first approach, which must be evaluated for acceptance or modification.
An If-Then-Else statement has the following form:
If Cond(n)
    S(n+1)
Else
    S(n+2)
End if
where Cond(n) represents the condition (node predicate) associated with node n, and S(n+k) represents
the statement that is executed, leading to node (n+k), depending on the result of the condition Cond(n).
The flow graph of this structure has the following form:
[Figure: node N, holding the test "If Cond(n)", with a "then" edge to node N+1 and an "else" edge to node N+2.]
Fig. 5. Flowgraph for an If-Then-Else structure.
The flowgraph means that if Cond(n) is true, then S(n+1) is executed; otherwise, if Cond(n) is false,
S(n+2) is executed.
We define a test case T(n) as a pair of variables (x, y), and define an objective node N for each
iteration, a condition Cond(n) on x and y, and the statements S(n+1) = x + y and S(n+2) = x - y. We
initialize T(n) with random numbers. The source code must be instrumented as follows:
branch[0] = branch[0]+1;
If Cond(n)
    branch[1] = branch[1]+1;
    x+y;
Else
    branch[2] = branch[2]+1;
    x-y;
End if
Each time the condition Cond(n) is true, the counter branch[1] is incremented and x+y is executed;
when Cond(n) is false, branch[2] is incremented and x-y is executed.
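The instrumentation can be sketched in Python as follows. This is only an illustration: the helper `make_instrumented` is a hypothetical name, and the relational operator passed in as Cond(n) (here `x > y`) is an assumption chosen for the example.

```python
def make_instrumented(cond):
    """Return (program, branch); program(x, y) updates the branch counters."""
    # branch[0]: node n reached, branch[1]: true side, branch[2]: false side
    branch = [0, 0, 0]

    def program(x, y):
        branch[0] += 1
        if cond(x, y):
            branch[1] += 1
            return x + y
        branch[2] += 1
        return x - y

    return program, branch

# The operator used for Cond(n) is an assumption made for this example.
program, branch = make_instrumented(lambda x, y: x > y)
program(5, 2)   # condition true
program(1, 7)   # condition false
program(9, 3)   # condition true
print(branch)   # [3, 2, 1]
```

Keeping the counters in a list next to the program makes it easy to inspect them after a whole suite of test cases has been run.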
A suite of test cases (called a population in evolutionary computing terms) is generated by means of a
random number generator. Each test case is evaluated in the program, and the counter branch[k] is
incremented if branch k = (n, n+1) is traversed by the test case. In each iteration, an objective
node is defined; this objective node is the node where the test starts.
The test result for a node can take one of the following statuses:
If a node n is traversed in only one direction of the condition, once all test cases have been evaluated,
we say that node n is achieved.
If a node n is traversed in both directions of the condition, once all test cases have been evaluated,
we say that node n is covered.
If a node n is not traversed, once all test cases have been evaluated, we say that node n is not
evaluated.
Of course, the last status is obtained for nodes associated with a branch that the test cases do not
cover.
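The three statuses can be derived directly from the two counters of a decision node. The following is a minimal sketch; the function name `node_status` and the string labels are illustrative choices.

```python
def node_status(true_count, false_count):
    """Classify a decision node from the counters of its two outcomes."""
    if true_count > 0 and false_count > 0:
        return "covered"        # both outcomes of the condition were exercised
    if true_count > 0 or false_count > 0:
        return "achieved"       # only one outcome was exercised
    return "not evaluated"      # the node was never reached

print(node_status(2, 1))  # covered
print(node_status(3, 0))  # achieved
print(node_status(0, 0))  # not evaluated
```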
At the end of the test procedure, the status structure containing the partial branch coverage results is
examined, and we determine the coverage criterion for the test as the ratio of branches covered to
total program branches:
$$\%\,\text{coverage} = \operatorname{Max}\left(\frac{\text{Number of branches covered}}{\text{Number of total branches}}\right) \times 100$$
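As a small worked example of the ratio of covered branches to total branches, the sketch below computes the percentage from a list of branch counters. The function name and the counter values are hypothetical.

```python
def branch_coverage_percent(branch_counters):
    """Percentage of branches traversed at least once by the test suite."""
    if not branch_counters:
        raise ValueError("the program has no branches to cover")
    covered = sum(1 for hits in branch_counters if hits > 0)
    return 100.0 * covered / len(branch_counters)

# Hypothetical counters for a program with five branches, one never taken.
print(branch_coverage_percent([4, 0, 7, 1, 2]))  # 80.0
```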
In a first stage, the test cases were generated as random numbers, and the testing was carried out as
explained above.
A scheme of the approach can be depicted as follows:
[Figure: block diagram with four elements: the program under test, the control unit, the optimization module, and the suite of test cases (initial population).]
Fig. 6. Schema of the first approach proposed.
The structure of the program under test is extracted manually and is an input to the testing program. A
suite of test cases is randomly generated. The control unit is the main module of the testing procedure,
and it conducts the search procedure for each test case. Once a test case is presented, the control unit
evaluates the fitness function and sends this information to the optimization module, which contains the
evolutionary methods that search for the best test cases. Once the search procedure has finished, the
optimum test case is evaluated in the program under test and the status is determined.
At the end of the test procedure, the control unit determines the metrics used to evaluate the
performance of the test executed.
The optimization module contains the evolutionary methods that we are developing to carry out the
search for optimal test data, considering the fitness function, which is based on the distance between the
variables of a specific predicate in the program under test. The objective of applying such
metaheuristics is to find the boundary values that make a condition change its value from true to
false.
The optimization algorithm takes the information about the predicate distance defined by each condition,
as presented in the theses of Sthamer [3] and Chicano [23], and runs the optimization by maximizing the
fitness function, which is defined as a function of this distance. The details of the evolutionary methods
applied to the testing procedure will be presented in another report about these methods.
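To make the idea of maximizing a distance-based fitness concrete, here is a minimal sketch under stated assumptions. Table 2 is not reproduced in this section, so the distance for the predicate `x == y`, the fitness `1 / (1 + distance)`, and the simple (1+1) mutation search are all illustrative choices, not the methods of [3] or [23].

```python
import random

def distance_eq(a, b):
    """Assumed distance for the predicate a == b (0 when it holds)."""
    return abs(a - b)

def fitness(test_case):
    """Assumed fitness: larger when the predicate is closer to being true."""
    x, y = test_case
    return 1.0 / (1.0 + distance_eq(x, y))

def evolve(start, generations=200, step=10, seed=0):
    """(1+1) search: keep the mutated test case only if it is not worse."""
    rng = random.Random(seed)
    best = start
    for _ in range(generations):
        child = (best[0] + rng.randint(-step, step),
                 best[1] + rng.randint(-step, step))
        if fitness(child) >= fitness(best):
            best = child
    return best

start = (120, -35)
best = evolve(start)
print(start, fitness(start))
print(best, fitness(best))
```

Because a child replaces the current test case only when its fitness is at least as high, the fitness of the returned test case is never lower than that of the starting one.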
6. OUR FIRST APPROACH.
The first approach to the strategy that will be applied in our project is similar to the approach shown
above. We consider some ideas from the doctoral works of Chicano and Sthamer.
At the beginning, we manually extract the structure of the program under test to construct the
flowgraph. Next, we generate a random initial population of test cases T(n). We consider each
node to be a partial objective to be satisfied by the generated data; that is, we divide the entire program
into partial objectives, and consider that each objective has an associated distance function that depends
on the relational operator, as well as a fitness or evaluation function defined in terms of this
distance. The distance and fitness functions are the same as those proposed by Chicano and the
University of Málaga group, and are presented in Table 2.
For each partial objective (each node in the flowgraph), it is desirable that the objective be
covered by the generated suite of test data; when we work with randomly generated test data, we obtain
several objectives covered by the test data and some objectives only achieved by the same test suite.
At this time, we have concluded the simulations with a random test data generator, and we are defining
the strategies for applying the genetic algorithm to the generation of test data using the distance
and fitness functions described above.
The first program under analysis is the triangle classifier, which classifies the type of triangle
according to the sides entered, or determines that they do not form a triangle. This is the principal
program tested in the cited references.
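For readers unfamiliar with this benchmark, a typical triangle classifier looks like the sketch below. It is a generic version written for illustration; the exact variant used in the cited references may differ in its branch structure.

```python
def classify_triangle(a, b, c):
    """Return the triangle type for side lengths a, b and c."""
    if a <= 0 or b <= 0 or c <= 0:
        return "not a triangle"
    if a + b <= c or a + c <= b or b + c <= a:
        return "not a triangle"      # triangle inequality violated
    if a == b == c:
        return "equilateral"
    if a == b or b == c or a == c:
        return "isosceles"
    return "scalene"

print(classify_triangle(3, 4, 5))  # scalene
print(classify_triangle(2, 2, 3))  # isosceles
print(classify_triangle(1, 1, 1))  # equilateral
print(classify_triangle(1, 2, 3))  # not a triangle
```

The equality branches are the hard ones for a random generator, since few random triples have two or three equal sides.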
As preliminary results of the application of the random generator, we achieve from 80% to 90%
coverage. These results are detailed in the test report.
7. CONCLUSIONS.
The representation of software engineering problems as optimization problems is a research subject
that has been studied over the last 10 years, especially by UK researchers at the beginning of the 2000s
[3, 4, 12, 14, 15, 16, 17, 18, 19], and in the last three years by Spanish researchers [5, 20, 21, 22, 23].
Metaheuristic techniques have shown better performance than classical optimization techniques,
especially in problems that involve high-dimensional search spaces and non-continuous objective
functions, such as the objective functions associated with program predicates.
We have found several approaches to defining the fitness or evaluation function, but in general it is
constructed from the predicate relationship; these functions have the particularity of being
positive-valued. The authors give no explanation for this, but we think it is due to the use of very simple
chromosome selection schemes such as the roulette wheel.
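The link between positive fitness values and roulette wheel selection can be seen in a small sketch: each individual is chosen with probability proportional to its fitness, which is only meaningful when no fitness is negative. The function name and the sample population are hypothetical.

```python
import random

def roulette_select(population, fitnesses, rng):
    """Pick one individual with probability proportional to its fitness."""
    if any(f < 0 for f in fitnesses):
        raise ValueError("roulette wheel selection needs non-negative fitness")
    total = sum(fitnesses)
    if total == 0:
        raise ValueError("at least one fitness value must be positive")
    spin = rng.uniform(0, total)
    running = 0.0
    for individual, f in zip(population, fitnesses):
        running += f
        if spin < running:
            return individual
    return population[-1]  # guard against floating-point round-off

rng = random.Random(1)
population = ["t1", "t2", "t3"]
fitnesses = [0.1, 0.6, 0.3]
picks = [roulette_select(population, fitnesses, rng) for _ in range(1000)]
print({t: picks.count(t) for t in population})
```

With a negative fitness the slices of the wheel would no longer be valid probabilities, which is why the sketch rejects such inputs.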
Two coverage approaches are presented in the literature: statement coverage and branch coverage.
Statement coverage implies that all atomic conditions take the values true and false; if a statement takes
both logical values, we say that the statement is covered, but if it takes only true or only false, we say
that the statement is achieved. A branch coverage criterion implies that all branches in the program must
be traversed by a specific test data set.
Another approach taken by the authors is the partition of the global objective into several partial
objectives, defining an objective statement that must take a logical value; each partial objective is
treated as an optimization problem with an objective function based on the predicate condition.
The metric commonly applied to measure the performance of the test methods is the percent coverage,
defined as the ratio between covered statements and total statements. Other metrics are also used, such
as the number of evaluations or the measure of how close the test data are to the boundaries of the
predicate condition. The metrics vary between researchers.
Software testing with metaheuristics is a promising field that is still in the
research phase, with several successful applications such as those at DaimlerChrysler, where it is used
to test vehicle control software. The latest research work conducted in Spain directs its efforts toward
the use of Evolution Strategies over the classical Genetic Algorithms.
As part of our project, we will propose the application of a technique to test data generation,
using a novel approach in evolutionary computing named Evolutionary Flexible Computing. This
approach will be compared with classical genetic algorithm and evolution strategy methods applied to
generic test programs.
Finally, we have presented a first approach to the testing strategy that we are developing. This
approach is expected to change, but it shows the principal ideas of the testing strategy under development.
8. REFERENCES.
[1]. Pressman, R. S., Software engineering: a practitioner's approach. McGraw-Hill, fifth edition,
2001. ISBN 0-07-365578-3.
[2]. Zhu, H., Hall, P. A. V., and May, J. H. R., Software unit test coverage and adequacy. ACM
Computing Surveys, 29(4):366-427, 1997.
[3]. Sthamer, H-H., The Automatic Generation of Software Test Data Using Genetic Algorithms,
Ph.D. Thesis, University of Glamorgan, United Kingdom, 1995.
[4]. Harman, M., Software Measurement and Testing, Class notes CS3SMT, King's College
London, England, Autumn Term, 2004.
[5]. Diaz, E., Tuya, J., Blanco, R., Dolado, J. J., A tabu search algorithm for structural software
testing, Computers and Operations Research, Vol. xx, No. xx, February 2007.
[6]. Kaner, C., Measurement Issues and Software Testing, in Testing Computer Software, 3rd
edition, XXXX
[7]. McGraw, G., Michael, C., Automatic Generation of Test-Cases for Software Testing, in
Proceedings of the 18th Annual Conference of the Cognitive Science Society, July 1996.
[8]. Edvardsson, J., A survey on Automatic Test Data Generation, in Proceedings of the Second
Conference on Computer Sciences and Engineering in Linköping, pp. 21-28, October 1999.
[9]. McGibbon, T., An Analysis of Two Formal methods VDM and Z, available at
http://www.dacs.dtic.mil, August 1997.
[10]. Pei, M., Goodman, E. D., Zongyi, G., Zhong, K., Automated Software Test Data Generation
Using A Genetic Algorithm, Technical Report GARAGE, Michigan State University, June
2004.
[11]. Korel, B., Automated Software Test Data Generation, IEEE Transactions on Software
Engineering, Vol. 16, No. 8, August 1990.
[12]. Roper, M., MacLean, I., Brooks, A., Miller, J., Wood, M., Genetic algorithms and the
automatic generation of test data, Research Report RR/95/195, Department of Computer and
Information Sciences, University of Strathclyde, UK, 1995.
[13]. Pargas, R. P., Harrold, M. J., Peck, R. R., Test-Data Generation Using Genetic Algorithms,
Journal of Software Testing, Verification and Reliability, 9(4), pp. 263-282, 1999.
[14]. Wegener, J., Overview on evolutionary testing, IEEE Seminal Workshop, Toronto, Canada,
May 2001.
[15]. Wegener, J., Overview on Evolutionary Testing at DaimlerChrysler, Seminal Meeting,
University of Glamorgan, UK, July 2000.
[16]. Harman, M., Search Based Software Engineering, Information and Software Technology,
Vol. 43, No. 14, pp. 833-839, December 2001.
[17]. McMinn, P., Search-based Software Test Data Generation: A survey, Software Testing
Verification and Reliability, Vol. 14(2), pp. 105-156, 2004.
[18]. Clarke, J., Harman, M., Hierons, R., Jones, B., Lumkin, M., Rees, K., Roper, M., Shepperd, M.,
The Application of Metaheuristic Search Techniques to Problems in Software Engineering,
Technical Report SEMINAL-TR-01-2000, Brunel University, United Kingdom, August 2000.
[19]. Clarke, J., Harman, M., Hierons, R., Jones, B., Lumkin, M., Rees, K., Roper, M., Shepperd, M.,
Reformulating software engineering as a search problem, IEEE Proceedings of Software
Engineering, Vol. 150(3), pp. 665-690, 2003.
[20]. Blanco, R., Diaz, E., Tuya, J., Generación automática de casos de prueba mediante búsqueda
dispersa, Revista Española de Innovación, Calidad e Ingeniería de Software, Vol. 2, No. 1,
2006.
[21]. Alba, E., Chicano, F., Software testing using evolutionary strategies, The 2nd Workshop on
Rapid Integration of Software Engineering Techniques (RISE 05), LNCS 3943, pp. 56-65,
Greece, September 2005.
[22]. Alba, E., Chicano, F., Janson, S., Testeo de software con dos técnicas metaheurísticas, in VX
Jornadas de Ingeniería de Software y Bases de Datos JISBD 2006, José Riquelme, Pere Botella,
eds., Barcelona, 2006.
[23]. Chicano, J. F., Metaheurísticas e Ingeniería del Software, Doctoral Thesis, Departamento de
Lenguajes y Ciencias de la Computación, Universidad de Málaga, Spain, 2007.