
Journal of Software Engineering and Applications, 2013, 6, 65-73

doi:10.4236/jsea.2013.62011 Published Online February 2013 (http://www.scirp.org/journal/jsea)

Using Genetic Algorithm as Test Data Generator for Stored PL/SQL Program Units

Mohammad A. Alshraideh, Basel A. Mahafzah, Hamzeh S. Eyal Salman, Imad Salah
The Department of Computer Science, The University of Jordan, Amman, Jordan.
Email: [email protected]

Received December 8th, 2012; revised January 9th, 2013; accepted January 20th, 2013

ABSTRACT
PL/SQL is the most common language for ORACLE database applications. It allows the developer to create stored program units (procedures, functions, and packages) to improve software reusability and hide the complexity of executing a specific operation behind a name. It also acts as an interface between the SQL database and DEVELOPER. Therefore, it is important to test these modules, which consist of procedures and functions. In this paper, a new genetic algorithm (GA) is used as a search technique to find the test data required by the branch coverage criterion for testing stored PL/SQL program units. The experimental results show that full coverage was not achieved: the test target was not reached in some branches, and the coverage percentage is 98%. A problem arises when the target branch depends on data retrieved from tables; in this case, the GA is not able to generate test cases for this branch.

Keywords: Genetic Algorithms; SQL Stored Program Units; Test Data; Structural Testing; SQL Exceptions

1. Introduction

PL/SQL is an imperative third-generation language (3GL) that was designed specifically for processing SQL commands. It provides specific syntax for this purpose and supports exactly the same data types as SQL. An ORACLE database can be accessed by calling PL/SQL named blocks, which include functions and procedures. Therefore, these blocks must execute properly in order to guarantee a reliable and trustworthy database system [1].

Software testing is an important stage of the software development life cycle (SDLC). It is an activity that helps find bugs and errors in a software system under development, in order to provide a bug-free and reliable system/solution to the customer [2]. Testing has two main types based on the knowledge of the system: black box testing (functional) and white box testing (structural) [2-4]. Functional testing treats the system as a black box and does not explicitly use knowledge of its internal structure; it usually checks that the system works according to the system requirements. Structural testing generates test data based on knowledge of the internal code of the system. During structural testing, the goal is to generate test data that satisfy a given testing criterion to cover given elements of the program. In this paper, the branch coverage criterion is considered, where each branch of the program should be reached by some test data [2].

Generally, structural testing techniques are classified into two categories: static testing (manual) and dynamic testing (automatic). In static testing, a code reviewer reads the source code statement by statement and visually follows the logical program flow for a given input, so it is costly. In contrast, dynamic testing techniques execute the program under test on test input data and then simply observe the results. Consequently, dynamic testing reduces the cost of software development and maintenance [2]. Search-based software testing is an example of a dynamic method for generating a test set that can be successfully applied in structural testing. It relies on a cost function that can be used to compare candidate test data [5].

Genetic algorithms (GA) have been a very interesting area of study in many disciplines, such as optimization, automatic programming, economics, immune systems, ecology and social systems. In this paper we apply the GA as a search technique to find test data for testing named blocks in ORACLE; specifically, IF-statements, While-statements and their combinations are considered [6].

The rest of the paper is organized as follows: Section 2 presents background and related work. Section 3 presents a strategy for applying GA to test named blocks in ORACLE. Section 4 presents the experimental environment and Section 5 presents the experimental results. Finally, Section 6 concludes the paper.

Copyright © 2013 SciRes. JSEA



2. Background and Related Work

This section presents an overview of search techniques such as random test data generation and Hill Climbing, and of meta-heuristic search algorithms, which offer a potentially better alternative for developing test data generators [7,8]. Efficient existing meta-heuristic search algorithms include Simulated Annealing, Tabu Search, GA and Ant Colony Optimization. Each of these search algorithms has its own advantages and disadvantages over the others. They are strongly domain dependent, because they use domain-dependent knowledge or heuristics related to the problem domain under consideration. Also in this section, stored program units are explained and an overview of the Jordan University Hospital computer systems is presented.

2.1. Random Test Data Generation

Random test data generation is a technique based on selecting test data randomly until suitable test data is found. It only explores the search space by randomly selecting solutions and evaluating their fitness. This is quite an unintelligent strategy, but it does not take much effort to implement [9].

2.2. Hill Climbing

Hill Climbing is a well-known local search algorithm. Hill Climbing works to improve one solution, with an initial solution randomly chosen from the search space as a starting point. The neighborhood of this solution is investigated. If a better solution is found, then the current solution is replaced. The neighborhood of the new solution is then investigated. If a better solution is found, the current solution is replaced again, and so on, until no improved neighbors can be found for the current solution. Hill Climbing is simple and gives fast results. However, the search easily yields sub-optimal results when Hill Climbing leads to a solution that is locally optimal, but not globally optimal [10].

2.3. Simulated Annealing

Simulated Annealing (SA) extends Hill Climbing such that it accepts poor solutions with low probability. SA allows for less restricted movement around the search space. The probability of acceptance (p) of an inferior solution changes as the search progresses, and is calculated as in Equation (1) [11,12].

$$p = e^{-\delta / t} \tag{1}$$

where (δ) represents the difference in the objective value between the current solution and the neighboring inferior solution being considered, and (t) is a control parameter known as the temperature. The temperature is cooled according to a cooling schedule. Initially the temperature is high, in order to allow free movement around the search space. As the search progresses, the temperature decreases. However, if cooling is too rapid, not enough of the search space will be explored, and the chance of the search becoming stuck in a local optimum is increased [12].

2.4. The Principles of Genetic Algorithms

The basic concepts of GAs were developed by Holland [13]. GAs are commonly applied to a variety of problems involving searching and optimization. GA search methods are rooted in the mechanisms of evolution and natural genetics. GAs draw inspiration from the natural search and selection processes leading to the survival of the fittest individuals. GAs generate a sequence of populations by using a selection mechanism, and use crossover and mutation as search mechanisms [14,15,16].

The principle behind GAs is that they create and maintain a population of individuals represented by chromosomes (essentially a character string analogous to the chromosomes appearing in DNA). These chromosomes are typically encoded solutions to a problem. The chromosomes then undergo a process of evolution according to the rules of selection, mutation and reproduction. Each individual in the environment (represented by a chromosome) receives a measure of its fitness. Reproduction selects individuals with high fitness values in the population, and through crossover and mutation of such individuals, a new population is derived in which individuals may be even better fitted to their environment. The process of crossover involves two chromosomes swapping chunks of data (genetic information) and is analogous to the process of sexual reproduction. Mutation introduces slight changes into a small proportion of the population and is representative of an evolutionary step. The structure of a simple GA is given in Figure 1. The algorithm in Figure 1 will iterate until the population has evolved to form a solution to the problem, or until a maximum number of iterations have occurred.

Simple Genetic algorithm ( )
{
    initialize population;
    evaluate population;
    while termination criterion not reached
    {
        select solutions for next population;
        perform crossover and mutation;
        evaluate population;
    }
}

Figure 1. The structure of a simple GA.
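The loop in Figure 1 can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the tool used in this paper: the toy cost function (distance from satisfying a hypothetical predicate a == b), the input domain, the truncation selection, and all function names are chosen only for the example.

```python
import random

LOW, HIGH = 0, 100  # assumed input domain for both genes (illustrative only)


def branch_cost(individual):
    """Illustrative cost: distance from satisfying the predicate a == b."""
    a, b = individual
    return abs(a - b)


def random_individual(rng):
    """Draw one chromosome: a pair of integers from the assumed domain."""
    return (rng.randint(LOW, HIGH), rng.randint(LOW, HIGH))


def crossover(mother, father, rng):
    """One-point crossover on a two-gene chromosome."""
    if rng.random() < 0.5:
        return (mother[0], father[1])
    return (father[0], mother[1])


def mutate(individual, rng, rate=0.2):
    """Occasionally nudge one gene by a small step, clamped to the domain."""
    genes = list(individual)
    if rng.random() < rate:
        i = rng.randrange(len(genes))
        step = rng.choice([-3, -2, -1, 1, 2, 3])
        genes[i] = min(HIGH, max(LOW, genes[i] + step))
    return tuple(genes)


def simple_ga(cost, pop_size=20, max_iterations=200, seed=0):
    """Follow Figure 1: initialize, evaluate, then select, vary, evaluate."""
    rng = random.Random(seed)
    population = [random_individual(rng) for _ in range(pop_size)]  # initialize
    population.sort(key=cost)                                       # evaluate
    history = [cost(population[0])]       # best cost seen in each generation
    for _ in range(max_iterations):       # termination: iteration budget
        if history[-1] == 0:              # termination: target reached
            break
        parents = population[: pop_size // 2]   # select (keep the better half)
        children = []
        while len(parents) + len(children) < pop_size:
            mother, father = rng.sample(parents, 2)
            children.append(mutate(crossover(mother, father, rng), rng))
        population = sorted(parents + children, key=cost)            # evaluate
        history.append(cost(population[0]))
    return population[0], history


best, history = simple_ga(branch_cost)
print(best, history[-1])
```

Because the better half of each population is carried over unchanged, the best cost never gets worse from one generation to the next; whether a zero-cost input is found within the budget depends on the seed and the parameters.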


2.5. Stored PL/SQL Program Units

There are three types of stored program units in PL/SQL: procedures, functions, and packages. Every stored program unit has a declarative part, an executable part or body, and an exception handling part, which is optional [1]. The declarative part contains variable declarations. The body of the named block contains executable SQL and PL/SQL statements. Statements that handle exceptions are written in the exception part. Subprograms provide the following advantages [13]:
• They allow you to write a PL/SQL program that meets your needs.
• They allow you to break the program into manageable modules.
• They provide reusability and maintainability for the code.

A procedure is a subprogram used to perform a specific action. A procedure contains two parts: specification and body. The procedure specification begins with the procedure name and ends with the parameter list. Procedures that do not take parameters are written without parentheses. The body of the procedure starts after the reserved word (IS) or (AS) and ends with the keyword END [17]. A function is a PL/SQL block which is similar to a procedure. The major difference between a procedure and a function is that a function must always return a value, but a procedure does not return a value [1,17]. A package is an encapsulated collection of related program objects (for example, procedures, functions, variables, constants, cursors, and exceptions) stored together in the database. Using packages is an alternative to creating procedures and functions as standalone schema objects. Packages have many advantages over standalone procedures and functions. For example, they:
• Let you organize your application development more efficiently.
• Let you grant privileges more efficiently.
• Let you modify package objects without recompiling dependent schema objects.
• Enable Oracle Database to read multiple package objects into memory at once.
• Can contain global variables and cursors that are available to all procedures and functions in the package.
• Let you overload procedures or functions. Overloading a procedure means creating multiple procedures with the same name in the same package, each taking arguments of different number or data type.

2.6. Jordan University Hospital Computer System

The core of the Jordan University Hospital (JUH) information system was bought in 1994, and then the JUH IT team developed the Hospital Information System (HIS) using Oracle Forms and upgraded it to Oracle 10g. HIS was developed to provide the best medical services for patients and physicians. Delivering these services requires hospitals to review the way they manage their business processes and to supply more efficient features to physicians, patients, and hospital officials as well as other decision makers. In order to provide such services, the health facility must focus on developing a solution that connects all its resources and makes them available to all who need them, using the latest technology. This kind of solution will enhance performance, optimize efficiency, and reduce the cost of ownership.

The IT department in JUH created a solution suite that transforms the hospital into a community, allowing access to all resources and data as needed. HIS is a comprehensive solution developed specifically for health facilities in the region. It is a flexible, comprehensive, multilingual, integrated and secure solution that supports clinical, financial, administrative and higher management needs.

In general, a hospital management system can be subcategorized into the following groups (Figure 2):
• Medical Information System (Administrative and Clinical).
• Enterprise Resource Planning (ERP) (Material, Financial and Human Resources).
• Support System.

[Figure 2. Hospital management system sub-categories: Medical System, Medical System (Clinical), Support System, and Enterprise Resource Planning (ERP).]

Medical systems are developed to deliver all needed services to the hospital community (Physicians, Patients and Administration). The systems manage all patients' data and information during their treatment episode in a professional and efficient manner. Medical systems strategically support a full range of hospital functions. They contain a repository of all patients' clinical, billing and


demographic data, reducing paperwork, manual effort and errors. Furthermore, they allow for better staff utilization, leaving more time to focus on planning and achieving goals. This enables the hospital to provide the better quality and more efficient services needed by patients and physicians. Medical systems are integrated with financial, administration, human resources, and material management systems. They contain a vast collection of data including patient data, treatment data, hospital visit data, patient transaction data, hospital data, and statistical information.

HIS medical systems provide many key functions including:
• Medical administrative including:
  - Patient master index
  - Admission, discharge and transfer
  - Scheduling and appointments
  - Medical records
  - Medical reports
  - Medical statistics
  - Catering
  - Order entry and results communication
• Medical clinical including:
  - Out-patient clinics
  - Accidents and emergency
  - Operation theater
  - Maternity
  - Doctors desktop
  - Nurse station
  - Laboratory
  - Radiology
  - Pharmacy
• Patient accounting including:
  - Pricing and package deals
  - Patient billing
  - Insurance contract management
  - Claims management.

In order to test our system in this research we select different procedures and functions, which are described later in this paper.

3. A Strategy for Applying GA to Test Stored PL/SQL Program Units

In general, the process of automatic structural test data generation for branch coverage consists of three major steps [7,18,19]:
1) Construction of a control logic graph, e.g. a control flow graph (CFG) or control dependency graph (CDG).
2) Selection of the target according to the branch coverage criterion.
3) Finding a set of test data that satisfies the selected adequacy criterion.

In order to use GA for solving an optimization problem, multiple issues must be considered, such as how to build the fitness function and how to represent the problem as a chromosome expression (individual), i.e. a sequence of binary digits that resembles the chromosome sequence, which the GA can understand and manipulate. The GA works on this encoded problem and delivers the result as the problem solution; hence, the user should provide the meaning of the encoded problem [20,21]. In this paper, integer vector and binary string representations are considered.

3.1. Branch Cost Functions

To use a control dependency path or any set of branches as a search goal, it is necessary to determine the cost values for each branch predicate. To accomplish this, each conditional node in the program is associated with a real-valued predicate cost function that is evaluated whenever the conditional node is executed. This predicate cost function returns a positive value whenever the predicate is false and a negative value if the predicate is true. The cost of an evaluation of a logical negation of a predicate is the arithmetic negation of the cost of the evaluation of the predicate. Each reached branch maintains two cost values, both derived from the associated predicate cost function. One cost value is the cost that all attempts to execute the branch are successful. This is called the cumulative and-cost. The other cost value is the cost that any attempt is successful; it is called the cumulative or-cost. These costs can be illustrated with an example showing three failed and two successful attempts to execute the predicate a ≤ b for various integer values of a and b (Table 2). The predicate cost function is a − b when the predicate is false and a − b − 1 when the predicate is true. The rules for combining two costs into an or-cost and an and-cost are shown in Table 1, where a and b are positive (false) costs and a′ and b′ are negative (true) costs; the costs are never zero.

The cost values produced by relational predicates are normalized, but the un-normalized values are used in Table 2. The cost of a conjunction of two false costs is the sum of the costs of the conjuncts. However, the cost of a disjunction of two false costs (Cost_d) is shown in Equation (2), where P and Q are the disjunct costs.

$$\mathrm{Cost}_d = \frac{PQ}{P + Q} \tag{2}$$

Table 1. Logical or-cost and logical and-cost table.

First cost   Second cost   or-cost            and-cost
a            b             (ab)/(a + b)       a + b
a            b′            b′                 a
a′           b             a′                 b
a′           b′            a′ + b′            (a′b′)/(a′ + b′)
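The combination rules of Table 1 can be checked against the cumulative values listed in Table 2 with a short Python sketch. The function names below are hypothetical and the code is only an illustration of the rules as stated in the text, not the instrumentation used in the tool; exact fractions are used so that values such as 15/8 and 30/31 are reproduced without rounding.

```python
from fractions import Fraction


def predicate_cost_le(a, b):
    """Cost of one evaluation of `a <= b`: a - b if false, a - b - 1 if true."""
    return Fraction(a - b) if a > b else Fraction(a - b - 1)


def or_cost(p, q):
    """or-cost column of Table 1 (positive cost = false, negative = true)."""
    if p > 0 and q > 0:
        return p * q / (p + q)   # both false: Equation (2)
    if p < 0 and q < 0:
        return p + q             # both true
    return min(p, q)             # exactly one true: keep the true (negative) cost


def and_cost(p, q):
    """and-cost column of Table 1."""
    if p > 0 and q > 0:
        return p + q             # both false: sum of the conjunct costs
    if p < 0 and q < 0:
        return p * q / (p + q)   # both true
    return max(p, q)             # exactly one false: keep the false (positive) cost


def cumulative_costs(attempts):
    """Fold the per-execution costs into cumulative or-cost and and-cost."""
    rows = []
    cum_or = cum_and = None
    for a, b in attempts:
        cost = predicate_cost_le(a, b)
        cum_or = cost if cum_or is None else or_cost(cum_or, cost)
        cum_and = cost if cum_and is None else and_cost(cum_and, cost)
        rows.append((a, b, cost, cum_or, cum_and))
    return rows


# The five attempts listed in Table 2: three failures, then two successes.
rows = cumulative_costs([(8, 3), (6, 3), (5, 3), (3, 3), (1, 3)])
for a, b, cost, cum_or, cum_and in rows:
    print(a, b, cost, cum_or, cum_and)
```

After the fifth attempt the cumulative or-cost is negative (at least one attempt succeeded) while the cumulative and-cost stays positive (not every attempt succeeded), which is the situation described for a node whose two branches have both been executed.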


These costs can be illustrated with an example showing three failed and two successful attempts to execute the predicate a ≤ b for various integer values of a and b, as shown in Table 2.

Table 2. Cumulative or-cost and and-cost for the predicate a ≤ b for the listed values.

a    b    cost   or-cost   and-cost
8    3    5      5         5
6    3    3      15/8      8
5    3    2      30/31     10
3    3    −1     −1        10
1    3    −3     −4        10

Note that when both branches at a conditional node have been executed, the and-cost is positive and the or-cost is negative. Moreover, the magnitude of the and-cost is an indication of the number and magnitude of the failures to satisfy the predicate. A high and-cost indicates that the predicate has hardly been satisfied. A low and-cost indicates that the predicate has not been satisfied. The cost of a search goal is calculated, according to Equation (2), as the conjunction of the individual branch goal costs. Each individual branch cost is either a branch or-cost or a branch and-cost. This method has the disadvantage that a single large branch cost may dominate the overall cost value. Normalization of costs reduces this risk. The cost values produced by relational predicates (Cost_r) are normalized to lie within [−1, 1] using Equation (3), where c is the branch distance value.

$$\mathrm{Cost}_r = \begin{cases} 1 - \dfrac{1}{1 + c} & \text{if } c > 0 \\ \dfrac{1}{1 - c} - 1 & \text{if } c < 0 \\ 0 & \text{otherwise} \end{cases} \tag{3}$$

An alternative method, not used in the work reported here, is to compute a cost consisting of two components. One component counts the branch goals that have yet to be satisfied. This cost component is analogous to the ones used by Wegener et al. [22]. The second component is applicable only if the first component is nonzero, and it is calculated as the disjunction of the unsatisfied branch goals. For each branch, there are two associated branch search goals that may be specified to guide a search, namely branch-or (on at least one occasion that the branch is reached, it is executed) and branch-and (on every occasion that the branch is reached, it is executed). A branch goal is satisfied if the associated or-cost or and-cost is negative.

If the execution of a branch is required to satisfy branch coverage or a control dependency condition, then branch-or is the relevant branch goal. Note that the goal of satisfying the branch or-cost is not adopted primarily to avoid creating an excessive number of sub-goals. Moreover, if it is necessary to find an input that executes both branches at a predicate, it is hoped that such an input will be found during the search for an input to satisfy the and-cost. Recall that the and-cost is adopted as the branch goal after one branch at a predicate has already been executed. The above rules are applied to non-loop branches only. Loops are treated differently from if-statements because, for programs that terminate, loop entries are eventually followed by a loop exit. For this reason, a sub-goal specifying that a loop predicate is always true is not sensible, and thus the two possible branch goals at a loop condition are loop entry and no loop entry.

4. Experimental Environment

An experimental study was designed to feature test goals that cause problems for evolutionary testing. The experimental study featured six real test objects from JUH. These objects are drawn from the system in use at the JUH hospital.

4.1. Test Objects

This section describes the test objects and the input domain sizes used. The following are the test objects:
• OutPricing: Determines the pricing of treatments at outpatient clinics. Depending on the patient's insurance, this function calculates the amount of money the patient has to pay (depending on the type of insurance, the patient pays different ratios for his treatment) and the amount of money the insurance has to pay for the patient's treatment.
• InPricing: This procedure calculates the invoice value of the patient inside the hospital based on the type of patient insurance and the type of medical procedure offered to patients (accommodation, scouting, doctors' fees, operations, laboratory, radiology, medicine, etc.). Also, this program calculates the percentage paid by the patient and the percentage paid by the insurance company, if any. Moreover, this function bills the patient with the amount of money he has to pay and bills the insurance company with the amount of money it has to pay.
• JU-Med-fees-deduction: This package is used for Jordan University staff, where there is an allocated account number for each staff member in the JUH system. It calculates the bill value based on Jordan University insurance. It then transfers the total bill amount, after deducting the amount collected by hand from the patients, into tables to be used by the Jordan University financial department


later on.
• Pat-info-ibr: This function calculates the invoice value for private patients (inpatients and outpatients), then bills each patient with the amount of money he or she has to pay.
• Lab-interface: The main goal of this function is to transfer the results from medical machines (lab devices) to the HIS system automatically (without user interaction). The function receives the message from the medical devices and then converts it so that it can be entered into the HIS system.
• Salup_new_calc_all: This procedure calculates staff incentives as follows: it selects the category of the employee (nursing, administrative, officer or medical technician) by department and qualifications. Then it determines the share of the incentives to which the employee is entitled according to his position (Branch Chief, Chief, Division of, etc.). It deducts days of leave without pay from the employee's share of the incentives.

4.2. Hardware and Software Environment

In this section, the specifications of the experimental environment utilized by this work are presented. These specifications include both the hardware and the software modules used in implementing the simulator. More specifically, the hardware used in the experiments includes a Dual-Core Intel Processor (CPU 2.66 GHz), 2 MB L2 Cache per CPU, and 1 GB RAM. The software used in the experiments includes Windows XP. Also, the tested programs that have been used to evaluate this algorithm are described in this section.

Moreover, in order to assess the reliability of the cost functions introduced in Section 3.1, an empirical investigation was done. A number of test programs, including functions and procedures, were assembled from the JUH system. These programs are described in Table 3. The size of each program is given as Lines of Code (LOC) and number of branches; the number of input variables ranges from 3 to 21, as shown in Table 3. The programs have been selected from the JUH HIS system.

Table 3. The functions used for empirical investigation.

Program name             Lines of code   Number of branches   Number of input variables
OutPricing               295             116                  14
InPricing                362             148                  13
JU-Med-fees-deduction    307             92                   4
Pat-info-ibr             259             48                   3
Lab-interface            1389            538                  21
Salup_new_calc_all       707             326                  6

Figure 3 shows the cyclomatic complexity for each program. The cyclomatic complexity metric, described by Watson and McCabe [23], provides an objective measure of the complexity of a given module of program code by examining its decision structure. Cyclomatic complexity is calculated as e − n + 2, where n is the number of nodes in a graph and e is the number of edges between nodes; alternatively, it can be calculated as P + 1, where P is the number of predicate nodes in the flow graph (While and If statements). Predicate nodes are those representing control structures and have two or more edges emanating from them. Cyclomatic complexity gives an upper bound on the number of test cases required to cover all feasible branches if collateral coverage is taken into account. Each program has special characteristics for investigating the performance of GA as a test data generator for testing named blocks in ORACLE. Figure 3 shows the cyclomatic complexity of the test programs. The cyclomatic complexity of the programs ranges between 25 and 270.

[Figure 3. A cyclomatic complexity of the test programs.]

These programs are available from the authors on request. Each of the cost functions and associated search operators was implemented in a prototype test data generation tool. The tool has been constructed by modifying the JScript (JavaScript) language compiler within the Shared Source Common Language Infrastructure (SSCLI) and can therefore be used to test PL/SQL units by passing test cases as parameters to these units, while these programs are connected to the JUH HIS. The program must include directives to specify any input domain constraints that are to be applied. The tool then inserts instrumentation code at each branch in the function. This instrumentation code calculates the cost of each branch predicate whenever it is executed. The cost of each relational predicate expression was calculated according to the cost functions given in Section 3.1. Where branch predicate expressions consist of two or more relational predicates joined by logical connectives (and, or and not),


and the cost values were combined according to the sch- any of the programs are higher of those typical programs
eme given in Bottaci [24]. that would used in unit testing, although the number of
The search was directed to generate data for one bran- branches is probably higher than usual, which is why it
ch at a time. The order in which the branches of the pro- were selected for the experiment.
gram were targeted was arbitrary, except that no nested
branch was targeted before the containing branch. 5. Experimental Results and Discussions
A steady-state style genetic algorithm, similar to Geni-
This section presents the results of the experiments that
tor [25], was used in this work. The cost function values
have been carried out to evaluate the effectiveness of our
computed for each candidate input were used to rank GA. Table 4 shows the number of subject program exe-
candidates within the population in which no duplicate cutions required by each genetic algorithm over 20 trials,
genotypes are allowed. A probabilistic selection function where >50,000 means that the cost function not able to
selected parent candidates from the population with a cover all the branches within this criteria. Also, in this
probability based on their rank, where the highest rank- table branch coverage ratio is shown, which is defined as
ing having the highest probability. More specifically, for the following:
a population of size n, the probability of selection (Ps) is
shown in Equation (4). number of branch executed
total number of branches
2  n  rank  1
Ps  (4)
n  n  1 The branch coverage ratio ranged from 94% in JU-
Med-fees-deduction program to 100% in Pat-info-ibr
program. The total average of branch coverage for all programs is:

1242 / 1268 ≈ 98%

In this work, a fixed population size of 100 was used. This parameter was not “tuned” to suit any particular program under test. In the steady-state update style of genetic algorithm used in this work, new individuals that are sufficiently fit are inserted into the population as soon as they are created. Full branch coverage was attempted for each of the programs under test. Each branch was taken as an individual target of the search, unless it was fortuitously covered during the search for test data for another branch. The genetic algorithm search generates inputs for the function containing the current structural target. A vector of floating-point, integer, character, and string variable values corresponding to the input data is optimized. The range of each variable is specified. The test subject is then called with this input data. The search was set to terminate after 50,000 executions of the program under test if full coverage had not been achieved by then. Individuals were recombined using binary and real-valued (one-point and uniform) recombination, and mutated using real-valued mutation. Real-valued mutation was performed using “Gaussian distribution” and “number creep”. The size of

Analyzing our results, we found that 100% condition-decision coverage is impossible to reach in some test programs because there are conditions that cannot be true or false in certain situations. A weakness of the GA can therefore be observed when generating test cases to cover a branch, especially when the branch depends strongly on data (records) retrieved from a table, such that a specific order is required. As long as only a few records play an important role in satisfying a certain condition, it is possible to find adequate test scenarios. For example, this could be observed while applying the GA to the OutPricing function, shown in Figure 4. For the condition in line 21 to be true, insurance_status must be equal to 7. This happens only if line 18 is executed, which in turn happens only if the branch in line 16 is true (so that the insert command is executed and insurance_status is set to 7). That branch is true only when the variable dummy, assigned in line 2, is equal to 'p', and this depends on the value

Table 4. The number of branches covered and uncovered with 50,000 executions of each program.

Program name            Branches covered   Branches uncovered   Branch coverage ratio   Subject program executions
OutPricing              112                4                    96%                     >50,000
InPricing               144                2                    99%                     >50,000
JU-Med-fees-deduction   87                 5                    94%                     >50,000
Pat-info-ibr            48                 0                    100%                    11,680
Lab-interface           527                11                   98%                     >50,000
salup_new_calc_all      321                5                    98%                     >50,000
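The search setup behind the results in Table 4 (a population of 100, steady-state insertion, one-point and uniform recombination, Gaussian and number-creep mutation, and a budget of 50,000 executions) can be sketched roughly as follows. This is only an illustrative Python sketch under stated assumptions, not the authors' implementation: the input vector is restricted to integers, parents are drawn from the fitter half of the population, the Gaussian step size is one tenth of the variable range, and the cost function `branch_distance` describes a hypothetical target branch. All function names are introduced here for illustration.

```python
import random

POP_SIZE = 100            # fixed population size reported in the text
MAX_EXECUTIONS = 50_000   # execution budget reported in the text


def one_point(a, b, rng):
    """One-point recombination: head of a, tail of b."""
    cut = rng.randrange(1, len(a)) if len(a) > 1 else 0
    return a[:cut] + b[cut:]


def uniform(a, b, rng):
    """Uniform recombination: each gene is taken from either parent."""
    return [x if rng.random() < 0.5 else y for x, y in zip(a, b)]


def mutate(ind, bounds, rng, sigma_frac=0.1):
    """Change one gene by a Gaussian step or a +/-1 number creep, clamped to its range."""
    child = list(ind)
    i = rng.randrange(len(child))
    lo, hi = bounds[i]
    if rng.random() < 0.5:
        step = rng.gauss(0.0, sigma_frac * (hi - lo))  # assumed step size
    else:
        step = rng.choice((-1, 1))                     # number creep
    child[i] = min(hi, max(lo, round(child[i] + step)))
    return child


def steady_state_search(cost, bounds, seed=0):
    """Minimise cost; a cost of 0 means the target branch was executed.

    Returns the best input vector and the number of executions used.
    """
    rng = random.Random(seed)
    pop = [[rng.randint(lo, hi) for lo, hi in bounds] for _ in range(POP_SIZE)]
    scored = sorted((cost(ind), ind) for ind in pop)
    executions = POP_SIZE
    while scored[0][0] > 0 and executions < MAX_EXECUTIONS:
        # Assumption: parents come from the fitter half of the population.
        (_, p1), (_, p2) = rng.sample(scored[:POP_SIZE // 2], 2)
        recombine = one_point if rng.random() < 0.5 else uniform
        child = mutate(recombine(p1, p2, rng), bounds, rng)
        child_cost = cost(child)
        executions += 1
        if child_cost < scored[-1][0]:
            # Steady state: a sufficiently fit child replaces the worst at once.
            scored[-1] = (child_cost, child)
            scored.sort()
    return scored[0][1], executions


def branch_distance(ind):
    """Hypothetical target branch: if x = 424 and y > 10 then ..."""
    x, y = ind
    return abs(x - 424) + max(0, 11 - y)
```

Calling `steady_state_search(branch_distance, [(0, 1000), (0, 100)])` returns the best input vector together with the number of executions consumed, which is the quantity reported in the last column of Table 4. The sketch only works when the cost function changes with the inputs, as it does for this hypothetical branch.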

Copyright © 2013 SciRes. JSEA



[1] …
[2] select p_insur_type into dummy from pricing where …
[3] …
[4] select decode(p_insur_type, '1', prc_limit_out_e, '2', prc_limit_out_f)
[5] into p_max_cov
[6] from prc_limits
[7] where prc_group = p_group_id
[8] and prc_division = p_div;
[9] p_max_cov := nvl(p_max_cov, 99999);
[10] exception
[11] when no_data_found then
[12] p_error_no := 1; raise exit_proc;
[13] when others then
[14] p_error_no := 11; raise exit_proc;
[15] ….
[16] if dummy = 'p' then
[17] insert into ….
[18] insurance_status :=7;
[19] end if;
[20] …
[21] if insurance_status =7 then
[22] update out_invoice
[23] set …
[24] end if;
[25] when exit_proc then
[26] if p_error_no = 1 then
[27] raise_application_error( -20001,' Coverage Limits do not Exits');
[28] elsif p_error_no = 2 then
[29] raise_application_error( -20003, ' Rate pricing does not exits for this materials!!');
[30] elsif p_error_no = 3 then
[31] raise_application_error( -20004, ' Material is not Defined in the Table Price !!');
[32] elsif p_error_no = 5 then
[33] raise_application_error( -20001, 'Pricing Data is incomplte for this patient!!');
[34] elsif p_error_no = 11 then
[35] …
[36] end if;

Figure 4. OutPricing program fragment.

retrieved from the pricing table. Similarly, the branch on p_error_no = 1 (line 26 in Figure 4) is true only when line 12 is executed. When the true outcome of that branch is the target of the search, p_error_no must have been set to 1 at line 12, and this happens only when the select statement in line 4 executes and the no_data_found exception is raised. The same applies to the elsif branches that follow (lines 28, 30, 32 and 34 in Figure 4). These situations occur in all the other programs apart from Pat-info-ibr, for which the GA generates test data for all branches. In these cases, the test generator cannot reach 100% coverage because of the program under test itself. Pat-info-ibr also contains branch conditions that depend on data retrieved from a table, but these branches happened to be covered by coincidence. The problem becomes more difficult when more than one branch depends on an uncovered branch.

Figure 5. The number of executions required to find test data to achieve branch coverage after excluding uncovered branches in Table 4.
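The data dependency traced above for OutPricing can be illustrated with a small Python model. It is a simplified sketch, not a translation of the real procedure: the pricing table is represented by a hypothetical dictionary, the lookup key stands in for the where clause that is elided in Figure 4, and a missing row simply yields no value instead of raising an exception. The point it shows is that the outcome of the branch is decided by the stored rows, so no choice of input parameters helps when the table holds no suitable record.

```python
def line_21_branch_taken(lookup_key, pricing_rows):
    """Mirror lines 2, 16, 18 and 21 of Figure 4 for one call.

    pricing_rows is a hypothetical dict standing in for the pricing table;
    the real where clause on line 2 is elided in the figure.
    """
    dummy = pricing_rows.get(lookup_key)   # line 2: select ... into dummy
    insurance_status = None
    if dummy == 'p':                       # line 16
        insurance_status = 7               # line 18
    return insurance_status == 7           # line 21


def inputs_reaching_branch(candidate_keys, pricing_rows):
    """Return the candidate inputs that execute the line 21 branch."""
    return [k for k in candidate_keys if line_21_branch_taken(k, pricing_rows)]


candidates = range(1000)                    # every input value the search may try
no_p_rows = {k: 'f' for k in candidates}    # table without any 'p' row
one_p_row = {**no_p_rows, 417: 'p'}         # same table with one suitable row

print(inputs_reaching_branch(candidates, no_p_rows))   # → []
print(inputs_reaching_branch(candidates, one_p_row))   # → [417]
```

With the first table the branch is unreachable for every input, and with the second only the single matching key reaches it, so a cost function defined on the inputs alone gives the search little guidance toward that key.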
In Figure 5, where the uncovered branches are excluded so that all remaining branches are covered, the number of program executions ranges from 11,680 for Pat-info-ibr to 36,134 for salup_new_calc_all.

6. Conclusions and Future Work
In this paper we presented how a GA can be used as a test data generator to find suitable test data according to the branch criterion to test stored program units (procedures, functions, and packages) in ORACLE. Selected procedures, functions, and packages from the Jordan University Hospital Information System were used to test the GA. The experimental results show that the test target was not reached in every program under test and that the average coverage ratio is 98%. A problem occurs when the target branch depends on data retrieved from ORACLE tables: the GA cannot generate test data to execute a target branch that depends on data retrieved from tables.
Future work will focus on testing SQL exceptions, SQL statements, and combinations of branch coverage criteria and SQL command testing, in an attempt to increase the coverage ratio to 100%.

REFERENCES
[1] M. G. Alshraideh and L. Bottaci, “Search-Based Software Test Data Generation for String Data Using Program-Specific Search Operators,” Special Issue of Software Testing, Verification and Reliability Devoted to Extended Papers from the Third UK Testing Conference (UKTest 2005), Vol. 16, No. 3, 2006, pp. 175-203.
[2] M. Alshraideh, B. A. Mahafzah and S. Al-Sharaeh, “A Multiple-Population Genetic Algorithm for Branch Coverage Test Data Generation,” Software Quality Control, Vol. 19, No. 3, 2011, pp. 489-513. doi:10.1007/s11219-010-9117-4
[3] M. Alshraideh, L. Bottaci and B. A. Mahafzah, “Using Program Data-State Scarcity to Guide Automatic Test Data Generation,” Software Quality Control, Vol. 18, No. 1, 2010, pp. 109-144. doi:10.1007/s11219-009-9083-x
[4] A. Baresel, H. Pohlheim and S. Sadeghipour, “Structural and Functional Sequence Test of Dynamic and State-Based Software with Evolutionary Algorithms,” Proceedings of the Genetic and Evolutionary Computation Conference, Chicago, 12-16 July 2003, pp. 2428-2441.
[5] B. Korel, “Automated Test Generation for Programs with Procedures,” Proceedings of the International Symposium on Software Testing and Analysis, San Diego, 8-10 January 1996, pp. 209-215.
[6] S. N. Sivanandam and S. N. Deepa, “Introduction to Genetic Algorithms,” 1st Edition, Springer, New York, 2010.
[7] C. C. Michael, G. E. McGraw and M. A. Schatz, “Generating Software Test Data by Evolution,” IEEE Transactions on Software Engineering, Vol. 27, No. 12, 2001, pp. 1085-1110. doi:10.1109/32.988709
[8] B. Korel, “Automated Software Test Data Generation,” IEEE Transactions on Software Engineering, Vol. 16, No. 8, 1990, pp. 870-879. doi:10.1109/32.57624
[9] J. A. Edvardsson, “Survey on Automatic Test Data Generation,” Proceedings of the Second Conference on Computer Science and Engineering, Linkoping, 21-22 October 1999, pp. 21-28.
[10] J. Duran and S. Ntafos, “An Evaluation of Random Testing,” IEEE Transactions on Software Engineering, Vol. 10, No. 4, 1984, pp. 438-444. doi:10.1109/TSE.1984.5010257
[11] M. Harman and P. McMinn, “A Theoretical & Empirical Analysis of Evolutionary Testing and Hill Climbing for Structural Test Data Generation,” Proceedings of the 2007 International Symposium on Software Testing and Analysis, London, 9-12 July 2007, pp. 73-83. doi:10.1145/1273463.1273475
[12] P. McMinn, “Search-Based Software Test Data Generation: A Survey: Research Articles,” Software Testing, Verification & Reliability, Vol. 14, No. 2, 2004, pp. 105-156. doi:10.1002/stvr.294
[13] H. S. Eyal Salman, “Using Genetic Algorithm in Test Data Generation for ORACLE Named Block,” Master Thesis, The University of Jordan, Amman, 2010.
[14] H. W. Arthur and J. Thomas, “Structured Testing: A Testing Methodology Using the Cyclomatic Complexity Metric,” National Institute of Standards, Gaithersburg, 1996.
[15] M. Mitchell, “An Introduction to Genetic Algorithms,” 1st Edition, Massachusetts Institute of Technology, Cambridge, London 1996.
[16] M. Srinivas and L. M. Patnaik, “Genetic Algorithms: A Survey,” IEEE Computer, Vol. 27, No. 6, 1994, pp. 17-26. doi:10.1109/2.294849
[17] S. Kirkpatrick, C. D. Gellat and M. P. Vecchi, “Optimization by Simulated Annealing,” Science, Vol. 220, No. 4598, 1983, pp. 671-680. doi:10.1126/science.220.4598.671
[18] S. Urman, R. Hardman and M. McLaughlin, “Oracle Database 10g Pl/SQL Programming,” 1st Edition, McGraw-Hill, New York, 2004.
[19] A. H. Watson and T. J. McCabe, “Structured Testing: A Testing Methodology Using the Cyclomatic Complexity Metric,” NIST Special Publication, No. 500-235. National Institute of Standards and Technology, Gaithersburg, 1996.
[20] N. J. Tracey, J. Clark, K. Mander and J. McDermid, “An Automated Framework for Structural Test-Data Generation,” Proceedings 13th IEEE Conference in Automated Software Engineering, Hawaii, 13-16 October 1998, pp. 285-288.
[21] R. Pargas, M. Harrold and R. Peck, “Test-Data Generation Using Genetic Algorithms,” Software Testing, Verification and Reliability, Vol. 9, No. 4, 1999, pp. 263-282. doi:10.1002/(SICI)1099-1689(199912)9:4<263::AID-STVR190>3.0.CO;2-Y
[22] J. Wegener, A. Baresel and H. Sthamer, “Evolutionary Test Environment for Automatic Structural Testing,” Information and Software Technology, Vol. 43, No. 14, 2001, pp. 841-854. doi:10.1016/S0950-5849(01)00190-2
[23] J. Holland, “Adaptation in Natural and Artificial Systems,” University of Michigan Press, Ann Arbor, 1975.
[24] L. Bottaci, “Predicate Expression Cost Functions to Guide Evolutionary Search for Test Data,” Genetic and Evolutionary Computation Conference (GECCO 2003), Chicago, 12-16 July 2003, pp. 2455-2464. doi:10.1007/3-540-45110-2_149
[25] D. Whitley, “The genitor Algorithm and Selective Pressure: Why Rank-Based Allocation of Reproductive Trials Is Best,” Proceedings of the Third International Conference on Genetic Algorithms (ICGA-89), 1989, Morgan Kaufmann Publishers Inc., San Francisco, pp. 116-121.
