
Bar PDF

This document proposes a generic algorithm for program repair based on the concept of relative correctness. Relative correctness defines a partial ordering of candidate programs, where more correct programs are higher in the ordering. The algorithm proceeds iteratively, increasing relative correctness with each step until absolute correctness is achieved. It is argued that considering relative correctness can improve the effectiveness and efficiency of program repair by providing a criterion for validating candidate patches. The algorithm focuses on the patch validation step and relies on a patch generator. An experiment applying the algorithm to programs from the Siemens Benchmark is described to demonstrate the approach.

A Generic Algorithm for Program Repair

Besma Khaireddine (University of Tunis Manar, Tunisia)
Aleksandr Zakharchenko (NJIT, Newark NJ)
Ali Mili (NJIT, Newark NJ)

Abstract—Relative correctness is the property of a program to be more-correct than another with respect to a specification; whereas traditional (absolute) correctness distinguishes between two classes of candidate programs with respect to a specification (correct and incorrect), relative correctness defines a partial ordering between candidate programs, whose maximal elements are the (absolutely) correct programs. In this paper we argue that relative correctness ought to be an integral part of the study of program repair, as it plays for program repair the role that absolute correctness plays for program construction: in the same way that absolute correctness is the criterion by which we judge the process of deriving a program P from a specification R, we argue that relative correctness ought to be the criterion by which we judge the process of repairing a program P to produce a program P′ that is more-correct than P with respect to R. In this paper we build on this premise to design a generic program repair algorithm, which proceeds by successive increases of relative correctness until we achieve absolute correctness. We further argue that in the same way that correctness ideas were used, a few decades ago, as a basis for correct-by-design programming, relative correctness ideas may be used, in time, as a basis for more-correct-by-design program repair.

Index Terms—Oracle design, program repair, absolute correctness, relative correctness, elementary fault removal, Siemens benchmark.

I. PROGRAM REPAIR WITH RELATIVE CORRECTNESS

Relative correctness (introduced in [19]) is the property of a program to be more-correct than another with respect to a given specification. Whereas traditional (absolute) correctness distinguishes between two classes of candidate programs (correct, incorrect), relative correctness defines a partial ordering between candidate programs, whose maximal elements are the (absolutely) correct programs. While we acknowledge the significant advances achieved in the area of automated program repair, we feel that consideration of relative correctness in the workflow of program repair methods has the potential to make these methods more effective and more efficient.

• Improved Effectiveness. Program repair proceeds through two broad steps: patch generation, when candidate patches are generated from the original faulty program; and patch validation, when the generated patches are tested to assess their validity. As we discuss in related work, section V-B, current program repair methods perform patch validation through a number of criteria, which test various aspects/ dimensions/ approximations of relative correctness, but not relative correctness per se. Yet we feel that program repair is inconceivable without a vetted definition of relative correctness. In the same way that absolute correctness is the criterion by which we judge the derivation of a program P from a specification R, relative correctness ought to be the criterion by which we can judge that a candidate P′ is a valid repair for program P with respect to specification R. In this paper, we propose a generic algorithm for program repair, which proceeds iteratively by enhancing relative correctness until it achieves absolute correctness.
• Improved Efficiency. The definition of relative correctness enables us, for a given level of granularity at which we want to model faults, to define the concept of elementary fault removal, which represents a unitary fault removal increment. This concept enables us, in turn, to distinguish between a single multi-site fault and multiple single-site faults. This distinction is important because if we are interested in removing several single-site faults (which are the most common type) then we can remove them one by one and test the program for relative correctness at each step; on the other hand, if we want to remove a multiple-site fault, then the relevant multiplicity is the number of sites in the faults (two or three, at most), not the number of faults in the program (unbounded).

In this paper, we briefly present a definition of relative correctness, due to [8, 19], then we use it to sketch an algorithm for program repair; our algorithm relies on the existence of a patch generator, and focuses exclusively on the patch validation step. In section II we introduce some elements of mathematical notation, then we present our definition of relative correctness and discuss why we feel that this definition is appropriate for our purposes. In section III we present our algorithm, and discuss its validity in light of the definitions given in section II. In section IV we show the results of an experiment where we apply the algorithm, albeit partially by hand for now (as its automation is under way) on sample programs from the Siemens Benchmark, and draw some lessons from our observations. Finally in section V we briefly summarize our findings and compare them to related work; in particular, we show how the solutions adopted by other researchers for patch validation use approximations of relative correctness, but not quite relative correctness as we define it and validate it.

II. BACKGROUND

A. Relational Mathematics

We assume the reader is familiar with simple relational mathematics and we briefly introduce some notations that we use
throughout the paper. Given a program p that operates on some variables x and y, we let the space of p be the set S of all the values that the aggregate of variables ⟨x, y⟩ may take; elements of S are called states of the program, and are usually denoted by lower case s. A relation on set S is a subset of S × S; constant relations on a set S include the empty relation (φ), the identity relation (I) and the universal relation (L = S × S); operations on relations include the set theoretic operations of union, intersection, difference and complement; other operations include the product of two relations (denoted by R ◦ R′, or RR′ for short), the converse of a relation (R̂) and the domain of a relation (dom(R)). The pre-restriction of relation R to set T is denoted by T \ R.

A relation R is said to be reflexive if and only if I ⊆ R, symmetric if and only if R ⊆ R̂, antisymmetric if and only if R ∩ R̂ ⊆ I, and transitive if and only if RR ⊆ R. A relation R is said to be deterministic if and only if R̂R ⊆ I.

B. Absolute Correctness and Relative Correctness

Refinement is a recurrent theme in the study of correctness; our version of refinement is defined as follows.

Definition 1: Given two relations R and R′, we say that R′ refines R (abbrev: R′ ⊒ R) if and only if RL ∩ R′L ∩ (R ∪ R′) = R.

Intuitively, this means that R′ captures a stronger requirement (is harder to satisfy) than R.

Given a program p on space S, we define the function of p (denoted by P) as the set of pairs (s, s′) such that if program p starts execution in state s it terminates in state s′. We may, by abuse of notation, refer to a program p by its function P.

Definition 2: A program p on space S is said to be correct with respect to specification R on S if and only if its function P refines R.

This definition is identical (modulo differences of notation) to traditional definitions of total correctness [12, 13, 17]. The following Proposition, due to [21], sets the stage for the definition of relative correctness.

Proposition 1: Given a specification R, a deterministic program p is correct with respect to R if and only if dom(R ∩ P) = dom(R).

The set dom(R ∩ P) is the set of initial states on which P behaves according to R; we call it the competence domain of P with respect to R.

Definition 3: Due to [19]. Given a specification R and two deterministic programs P and P′, we say that P′ is more-correct (resp. strictly more-correct) than P with respect to R if and only if (R ∩ P′)L ⊇ (R ∩ P)L (resp. (R ∩ P′)L ⊃ (R ∩ P)L).

In [7] we generalize this definition to non-deterministic programs, and discuss why this generalization may be the key to scaling up. To contrast relative correctness with correctness (Definition 2), we may refer to the latter as absolute correctness. For deterministic programs P and P′, P′ is more-correct than P if and only if the competence domain of P′ is a superset of that of P; note this does not mean that P′ duplicates the correct behavior of P on its competence domain (rather P′ may have a different correct behavior on the competence domain of P). See Figure 1.

Fig. 1. P′ ⊒_R P, Deterministic Programs (panels: R, P, P′)

How do we know that our definition of relative correctness is any good? Below are four properties that one would want a definition of relative correctness to satisfy; we find in [8] that our definition satisfies all of them.

• Ordering Properties. Relative correctness is reflexive and transitive, but not antisymmetric (i.e. two candidate programs could be equally correct, yet compute distinct functions).
• Relative correctness culminates in absolute correctness. An absolutely correct program is more-correct than any candidate program. We write this property as: P′ ⊒ R ⇔ (∀P : P′ ⊒_R P).
• Enhanced Correctness Implies Higher Reliability. If P′ is more-correct than P with respect to R, then it is more reliable than P; but more-reliable is not equivalent to more-correct: P′ may be more reliable because its competence domain includes states that have higher probability of occurrence than those of the competence domain of P.
• Relative Correctness and Refinement. Program P′ refines program P if and only if P′ is more-correct than P with respect to any specification R. We write this property as: P′ ⊒ P ⇔ (∀R : P′ ⊒_R P).

In order to contrast relative correctness with absolute correctness, we present an example of a specification and ten candidate programs, which we rank by relative correctness as shown in Figure 2; correct programs are at the top of the graph. We consider the specification R on space S = {a, b, c, d, e}:

R = {(a, a), (a, b), (a, c), (b, b), (b, c), (b, d), (c, c), (c, d), (c, e)},

and we consider the following candidate programs, along with their competence domains with respect to R:

• P0 = {(a, d), (b, a)}. CD0 = {}.
• P1 = {(a, b), (b, e)}. CD1 = {a}.
• P2 = {(a, d), (b, c)}. CD2 = {b}.
• P3 = {(b, e), (c, d)}. CD3 = {c}.
• P4 = {(a, b), (b, c), (c, a)}. CD4 = {a, b}.
• P5 = {(a, d), (b, c), (c, d)}. CD5 = {b, c}.
• P6 = {(a, c), (b, e), (c, d)}. CD6 = {a, c}.
• P7 = {(a, a), (b, b), (c, c), (d, d)}. CD7 = {a, b, c}.
• P8 = {(a, b), (b, c), (c, d), (d, e)}. CD8 = {a, b, c}.
• P9 = {(a, c), (b, d), (c, e), (d, a)}. CD9 = {a, b, c}.

See Figure 2; programs P7, P8, P9 are (absolutely) correct while programs P0, P1, P2, P3, P4, P5, P6 are incorrect. Figure 5 shows a more concrete example of programs ordered by relative correctness.

Fig. 2. Ordering Candidate Programs by Relative Correctness (diagram: correct programs P7, P8, P9 at the top; incorrect programs below, with P0 at the bottom)

C. Faults and Fault Removals

Any definition of a fault must imply a level of granularity at which we want to isolate faults. We use the term feature to refer to any part of the source code at an appropriate level of granularity, including non-contiguous parts.

Definition 4: Due to [19]. Given a specification R and a program P, a fault in program P is any feature f that admits a substitute f′ such that the program P′ obtained from P by replacing f with f′ is strictly more-correct than P. A fault removal in P is a pair of features (f, f′) such that f is a feature in P and program P′ obtained from P by replacing f with f′ is strictly more-correct than P.

Fig. 3. One Two-site Fault (diagram: p at the bottom, p0′ and p1′ in the middle, p′′ at the top, linked by transformations t0 and t1)
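The ranking of the ten candidate programs in Section II-B and the notion of fault removal in Definition 4 can be checked mechanically on the small example. The sketch below is only an illustration: it models relations as Python sets of pairs, and the helper names (`competence_domain`, `more_correct`, `substitute`) are hypothetical, as is the choice to treat "the final state assigned to one initial state" as a single-site feature.

```python
# Sketch only: relations are modeled as Python sets of (initial, final) pairs.
R = {("a", "a"), ("a", "b"), ("a", "c"),
     ("b", "b"), ("b", "c"), ("b", "d"),
     ("c", "c"), ("c", "d"), ("c", "e")}

PROGRAMS = {
    "P0": {("a", "d"), ("b", "a")},
    "P1": {("a", "b"), ("b", "e")},
    "P2": {("a", "d"), ("b", "c")},
    "P3": {("b", "e"), ("c", "d")},
    "P4": {("a", "b"), ("b", "c"), ("c", "a")},
    "P5": {("a", "d"), ("b", "c"), ("c", "d")},
    "P6": {("a", "c"), ("b", "e"), ("c", "d")},
    "P7": {("a", "a"), ("b", "b"), ("c", "c"), ("d", "d")},
    "P8": {("a", "b"), ("b", "c"), ("c", "d"), ("d", "e")},
    "P9": {("a", "c"), ("b", "d"), ("c", "e"), ("d", "a")},
}


def dom(rel):
    """Domain of a relation: the set of initial states."""
    return {s for (s, _) in rel}


def competence_domain(spec, prog):
    """dom(R ∩ P): initial states on which prog behaves according to spec."""
    return dom(spec & prog)


def more_correct(spec, p_new, p_old):
    # (R ∩ P')L ⊇ (R ∩ P)L reduces to inclusion of competence domains.
    return competence_domain(spec, p_new) >= competence_domain(spec, p_old)


def strictly_more_correct(spec, p_new, p_old):
    # Proper superset of competence domains.
    return competence_domain(spec, p_new) > competence_domain(spec, p_old)


def is_correct(spec, prog):
    """Proposition 1 (deterministic programs): dom(R ∩ P) = dom(R)."""
    return competence_domain(spec, prog) == dom(spec)


def substitute(prog, site, new_final):
    """Hypothetical single-site substitution: remap one initial state."""
    return {(s, t) for (s, t) in prog if s != site} | {(site, new_final)}


if __name__ == "__main__":
    for name, prog in sorted(PROGRAMS.items()):
        print(name, sorted(competence_domain(R, prog)), is_correct(R, prog))
    # Replacing the pair (c, a) of P4 by (c, d) is a fault removal.
    patched = substitute(PROGRAMS["P4"], "c", "d")
    print(strictly_more_correct(R, patched, PROGRAMS["P4"]))
```

Under this toy model, P1 and P2 are incomparable (neither competence domain contains the other), which is why the ordering is partial rather than total.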

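Definition 1 (refinement) and Proposition 1 from Section II-B can also be cross-checked by implementing the relational operators of Section II-A directly. This is a minimal sketch under the same set-of-pairs assumption; the function names are hypothetical, and only four of the ten candidate programs are included to keep it short.

```python
from itertools import product

S = {"a", "b", "c", "d", "e"}
L = set(product(S, S))        # universal relation S x S
I = {(s, s) for s in S}       # identity relation


def compose(r1, r2):
    """Relational product r1 ◦ r2."""
    return {(s, u) for (s, t) in r1 for (t2, u) in r2 if t == t2}


def converse(rel):
    return {(t, s) for (s, t) in rel}


def dom(rel):
    return {s for (s, _) in rel}


def is_deterministic(rel):
    """Deterministic iff (converse of R) ◦ R ⊆ I."""
    return compose(converse(rel), rel) <= I


def refines(r_new, r_old):
    """Definition 1: r_new ⊒ r_old iff r_old L ∩ r_new L ∩ (r_old ∪ r_new) = r_old."""
    return (compose(r_old, L) & compose(r_new, L) & (r_old | r_new)) == r_old


def correct_by_prop1(spec, prog):
    """Proposition 1: dom(R ∩ P) = dom(R), for deterministic prog."""
    return dom(spec & prog) == dom(spec)


R = {("a", "a"), ("a", "b"), ("a", "c"),
     ("b", "b"), ("b", "c"), ("b", "d"),
     ("c", "c"), ("c", "d"), ("c", "e")}
P0 = {("a", "d"), ("b", "a")}
P4 = {("a", "b"), ("b", "c"), ("c", "a")}
P7 = {("a", "a"), ("b", "b"), ("c", "c"), ("d", "d")}
P8 = {("a", "b"), ("b", "c"), ("c", "d"), ("d", "e")}

if __name__ == "__main__":
    for name, prog in (("P0", P0), ("P4", P4), ("P7", P7), ("P8", P8)):
        # Both columns should agree for deterministic programs.
        print(name, refines(prog, R), correct_by_prop1(R, prog))
```

The specification R itself is not deterministic (state a has three admissible final states), which is what allows P7, P8 and P9 to be correct while computing different functions.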
This definition of a fault encompasses cases where the feature in question is non-contiguous, i.e. it may involve two statements or for example two lexemes that are found in different locations of the source code. We consider a program P and a specification R and we assume that we have identified two statements, say f0 and f1, that admit substitutes, say f0′ and f1′, such that the program P′ obtained from P by replacing f0 by f0′ and f1 by f1′ is strictly more-correct than P with respect to R. The question that we address is: do we have two single-site faults (f0 and f1) or a single two-site fault (f = ⟨f0, f1⟩)? The answer depends on whether f0 alone is a fault, and whether f1 alone is a fault, whence the following definition.

Definition 5: Given a specification R and a program P, an elementary fault in program P is a fault such that no part of it is a fault.

All single-site faults are elementary faults; multi-site faults are elementary faults if and only if no proper subset of their elements is a fault. Figures 3 and 4 (where t0 represents the transformation f0 → f0′ and t1 represents the transformation f1 → f1′) show the contrast between a single two-site fault and two one-site faults: in Figure 3 we need to apply both transformations before the program becomes more-correct; when we apply t0 alone (resp. t1), we obtain p0′ (resp. p1′), which is not strictly more-correct than p; it is only when we apply them both that we obtain a strictly more-correct program (p′′). By contrast, Figure 4 illustrates a situation where each individual transformation raises the relative correctness of the program (both p0′ and p1′ are strictly more-correct than p, and p′′ is strictly more-correct than p0′ and p1′, hence by transitivity strictly more-correct than p).

Fig. 4. Two One-site Faults (diagram: p at the bottom, p0′ and p1′ in the middle, p′′ at the top, linked by transformations t0 and t1)

III. AN ALGORITHM FOR STEPWISE PROGRAM REPAIR

A. Oracle Design

We consider a program P′ on space S and we are interested in designing an oracle that tests the execution of P′ on some initial state; the oracle takes the form of a binary predicate in (s, s′), where s is the initial state and s′ is the final state. What form this oracle takes depends on what property we want to test about P′.

1) Absolute Correctness with respect to R: Given a specification R on space S, the oracle for absolute correctness with respect to R is denoted as Ω(s, s′) and defined by:

Ω(s, s′) ≡ (s ∈ dom(R) ⇒ (s, s′) ∈ R).

We find in [20] that if a program P satisfies the condition Ω(s, P(s)) for all s in S then it is absolutely correct with respect to R. In practice, since we cannot check Ω(s, P(s))
for all s in S, we check it for a bounded size test data T. Hence we define the following predicate:

Ω_T(P′) ≡ (∀s ∈ T : Ω(s, P′(s))).

We find in [20] that if a program P′ satisfies this predicate then it is absolutely correct with respect to T \ R.

2) Relative Correctness over a program P with respect to a specification R: Given a specification R on space S and a program P on S, the oracle for relative correctness over program P with respect to R is denoted by ω(s, s′) and defined by:

ω(s, s′) ≡ (Ω(s, P(s)) ⇒ Ω(s, s′)).

This formula stems readily from the definition of relative correctness; a program P′ is more-correct than program P with respect to R if and only if ω(s, P′(s)) holds for all s in S. In practice, since we cannot check ω(s, P′(s)) for all s in S, we check it for a bounded size data set T. Hence we define the following predicate:

ω_T(P′) ≡ (∀s ∈ T : ω(s, P′(s))).

3) Strict Relative Correctness over a program P with respect to a specification R: A program P′ is strictly more-correct than a program P with respect to a specification R if and only if P′ is more-correct than P, and there exists at least one element s in S such that Ω(s, P′(s)) ∧ ¬Ω(s, P(s)). In practice, since we cannot check Ω(s, P′(s)) ∧ ¬Ω(s, P(s)) for all s in S, we check it for a bounded size data set T. Hence we define the following predicate:

σ_T(P′) ≡ (ω_T(P′) ∧ (∃s ∈ T : Ω(s, P′(s)) ∧ ¬Ω(s, P(s)))).

B. Specification

We use the oracles discussed above to design a generic program repair algorithm; before we present the algorithm, we discuss its specification.

• Inputs:
  – A program P on S (where S can be inferred from the variable declarations of P).
  – Test data set, T, a subset of S.
  – Specification R on S, in the form of a binary C-like boolean function R(s, sprime).
  – A specification of the domain of R, in the form of a unary C-like boolean function domR(s).
• Output: Three possible outcomes, depending on patch generation:
  – A program P′ that is absolutely correct with respect to T \ R. Note that if P fails on some state of T and P′ is absolutely correct with respect to T \ R then P′ is strictly more correct than P (hence it is a repair of P) with respect to R.
  – A program P′ that is strictly more-correct than P with respect to T \ R, though possibly still incorrect.
  – A message to the effect that no correctness enhancement of P with respect to R is possible, given the existing patch generation capability.

Note that whereas other program repair methods require two test data sets (positive test data, negative test data), we do not need this information, because it can be inferred from the available input parameters: the positive test data is, actually, (T ∩ CD) \ R, and the negative test data is (T ∩ C̅D̅) \ R, where CD is the competence domain of P with respect to R and C̅D̅ is its complement.

C. Algorithm

This algorithm relies on the availability of a patch generator, which takes the form of two functions:

• nextcandidate(base). Given a baseline program base, this function returns candidate repairs of base in a deterministic sequence; this can be a mutant generator, e.g., which takes base along with some mutation parameters/ options and generates, in sequence, all the relevant mutants for the selected parameters.
• morecandidates(base). Given a baseline program base, this boolean function returns true as long as it has more candidate repairs to offer, false otherwise.

This algorithm is generic in the sense that it can be composed with any patch generator for which we can provide these two functions.

    {programtype base=P; programtype candidate=P;
     bool exhausted=false; bool enhanced=false;
     while (!abscorT(candidate) && !exhausted)
       {while (morecandidates(base) &&
               !strictrelcorT(candidate,base))
          {// no viable candidate yet, but we have more
           candidate = nextcandidate(base);}
        // if candidate is abs. correct we are done, else..
        if (!abscorT(candidate))
          {// analysis of exit condition
           if (strictrelcorT(candidate,base))
             {// we let candidate be the new base
              base = candidate; enhanced=true;
             } // also reset patch generation
           else
             {// we ran out of candidates
              exhausted = true;}}}
     if (!exhausted)
       {cout<<"Correct Program: "<<candidate<<endl;}
     else
       if (enhanced)
         {cout<<"No correct program found. "<<endl;
          // base is the last strictly more-correct program;
          // candidate is only the last rejected mutant
          cout<<"Most correct: "<<base<<endl;}
       else
         {cout<<"No correctness enhancement"<<endl;}}

The following functions are a direct reflection of the formulas presented in section III-A.

    bool abscor (candidate, inits)
     {stype s; s=inits; candidate(); // alters s
      return (!domR(inits) || R(inits, s));}

    bool abscorT (candidate)
     {bool abscorforall; abscorforall=true;
      forall (t in T)
        {abscorforall
           = abscorforall && abscor(candidate,t);}
      return abscorforall;}

    bool relcor (candidate, base, inits)
     {stype s; s=inits; base(); // alters s, not inits
      bool abscorbase = (!domR(inits)||R(inits,s));
      s=inits; candidate(); // alters s, not inits
      bool abscorcandidate=(!domR(inits)||R(inits,s));
      return (!abscorbase || abscorcandidate);}

    bool relcorT (candidate, base)
     {bool relcorforall; relcorforall=true;
      forall (t in T)
        {relcorforall = relcorforall &&
                        relcor(candidate,base,t);}
      return relcorforall;}

    bool strict (candidate, base, inits)
     {stype s; s=inits; base(); // alters s, not inits
      bool abscorbase = (!domR(inits)||R(inits,s));
      s=inits; candidate(); // alters s, not inits
      bool abscorcandidate=(!domR(inits)||R(inits,s));
      return (!abscorbase && abscorcandidate);}

    bool strictT (candidate, base)
     {bool strictforone; strictforone=false;
      forall (t in T)
        {strictforone = strictforone
                        || strict(candidate,base,t);}
      return strictforone;}

    bool strictrelcorT (candidate, base)
     {return relcorT(candidate,base)
             && strictT(candidate,base);}

IV. ILLUSTRATION

A. Experimental Setup

For the purposes of our experiment, we carry out patch generation by means of a mutation generator, specifically muJava [5, 16]. According to the specification given in section III-B, we must provide the following parameters:

• A Program to Repair. We choose the tcas program taken from the Siemens benchmark, to which we apply eight modifications (faults) provided in the same benchmark [2, 10].
• Test Data. We take the test data set T (of size 1578) provided by the benchmark for this program.
• Specification. For the sake of this experiment, we use the original fault-free version of tcas as the specification; this yields the following code for R:

    bool R(s, sprime) // initial, final states
     {tcas(); // modifies s, preserves sprime
      return (sprime==s);} // candidate = spec?

  To run this experiment with non-deterministic specifications, we are planning cases where the equality (sprime==s) is replaced by the weaker condition (EQ(s,sprime)), for some equivalence relation EQ; this is currently under way.
• Specification domain. Since we take the correct version of tcas as specification, and since this program is defined for all states in T, we let domR(s) be true.

    bool domR(s) {return true;}

Though the algorithm, as written in section III-C, seeks to build a single path from the faulty version of a program to a correct version (by successive fault removals), what we execute for this experiment is a search for all the possible paths; instead of the inner while loop of the algorithm (section III-C) we actually execute a for loop that covers all the mutants of the current base and catalogs those mutants that are strictly more-correct than the base; the organizational part of this work (management of the evolving graph) is done by hand, as it is not yet fully automatic.

B. Experimental Observations

The resulting graph is shown in Figure 5; each iteration of the outer loop generates a new layer of the graph. The bottom of the graph is the faulty version of tcas, and the top is the correct version, as found in the Siemens benchmark. Note that even though we made eight modifications to the original program, our algorithm made only seven fault removals; this may be because the eighth modification does not change the function of the program (it is not a fault) or because the test data T is not large enough to distinguish the original program from the repaired program; in either case, the program at the top of the graph is certified to be absolutely correct with respect to T \ R.

Now, note that even though the program at the bottom of the graph has seven faults, only four of them are visible (since there are four outgoing arcs from the bottom). What happened to the other three? They are masked, and can only be exposed as the first four are removed. The lesson we can draw from this observation: when we observe a failure of a program and we resolve to repair it, we should not define success as remedying that particular failure, because the fault that causes that failure may be masked by other faults; rather we should view any enhancement in the relative correctness of the program as a measure of success/ progress. In other words, we do not get to decide in what order a program exposes its faults; rather we let the program reveal its faults in the order it determines.

We must acknowledge that what made our experiment look so successful is the combination of three conditions, which do not necessarily prevail in all instances: first, the mutant generator was parameterized in such a way as to perform mutations that are of the same nature and the same scale as the benchmark faults that were introduced; second, all the faults that were introduced are single-site faults, hence we were able to remove them by single mutations; third, we assume the availability of boolean functions that capture the specification R and its domain. The first condition pertains to patch generation, and is a difficult condition to fulfill in general, because it assumes that we know the nature/ scale of the faults. The second condition pertains to patch validation, and is relatively easy to fulfill: first because most faults are single-site faults; and second, because we can run multiple mutations to cover the rare cases where they are not. For illustration, we run the same experiment described above on the replace application of the Siemens benchmark, into which we have inserted six modifications. After four iterations (four fault removals) we reach a program that is more-correct than the original, but not absolutely correct; when we deploy double mutation, we break through, generating two separate programs that are absolutely correct with respect to T \ R. Thus we were able to remove five faults (four single-site faults and one double-site fault) by doing nothing more than double mutation; if we were using only absolute correctness as the criterion of
success, we would have to apply sextuple mutations to achieve the same result, an outrageously costly proposition. As for requiring predicates R(s, s′) and domR(s), we admit that this may limit the scope of our approach; but we also argue that some form of specification is mandated by other methods to generate the required positive test data and the negative test data; all we are doing is making the requirement for R(s, s′) explicit.

Fig. 5. Stepwise Repair of tcas Faults

V. CONCLUSION

A. Summary and Prospects

Relative correctness plays for program repair the same role that absolute correctness plays for program construction: In the same way that absolute correctness is the criterion by which we judge the derivation of a program from a specification, relative correctness ought to be the criterion by which we judge the transformation of a program P into a program P′ through a repair operation. In this paper we derive the skeleton of an algorithm for program repair, which uses strict relative correctness oracles to perform patch validation. Our approach can be characterized by the following premises: it relies on formal definitions of correctness, relative correctness, and strict relative correctness; it derives test oracles from these definitions; it defines success/ progress as any strict enhancement of relative correctness, rather than the remediation of a specific failure; it controls combinatorial divergence by removing faults in sequence rather than simultaneously.

What would be more interesting, perhaps, is to explore how we can use relative correctness, not to test existing repair candidates, but rather to generate repair candidates that are more-correct by construction. In the same way that correctness ideas were used by researchers such as Dijkstra [9], Gries [12], Hehner [13], Morgan [23] and others as a basis for correct-by-design programming, we can imagine ways to use relative
correctness ideas to generate more-correct-by-design program repairs. This is clearly a long-term research goal, but one that promises great returns, since it has the potential to guide patch generation in addition to patch validation.

B. Related Work

We argue that our approach to patch validation, which is based on the concept of relative correctness, addresses some shortcomings in existing program repair technology, in terms of precision, recall, and efficiency [1, 3, 4, 6, 14, 15, 18, 22, 24, 25].

1) Loss of Recall: GenProg [11, 15], for example, generates candidate repairs by combining a set of elementary mutations and submitting each mutant to a set of positive test data (which the original program passes, and we want to preserve) and a set of negative test data (which the original program fails, and we want candidates to pass). This approach presents two impediments for good recall: First, this condition is sufficient for relative correctness but not necessary. A candidate program may fail on the positive test data and still be more-correct than the original: because specifications are not necessarily deterministic, correct behavior is not necessarily unique. See Figure 1. Second, a candidate program P′ may also fail on the negative test data and still be more-correct than the original program P; the competence domain of P′ may be a superset of that of P, yet still not overlap the negative test data. The loss of recall means that GenProg could very well generate valid program repairs, but fail to recognize them as such.

2) Loss of Precision: GenProg selects candidate repairs by maximizing a fitness function, which is computed as the weighted sum of all the test data on which the program runs; weights are assigned to test data according to their preponderance in some usage pattern, so that the fitness function is an approximation of the program’s reliability. But we see in section II-B that relative correctness logically implies, but is

REFERENCES

[1] Martinez M. and Monperrus M. Mining software repair models for reasoning on the search space of automated program fixing. Empirical Software Engineering, 2013.
[2] Benchmark. Siemens suite. Technical report, Georgia Institute of Technology, January 2007.
[3] Kim D., Nam J., Song J., and Kim S. Automatic patch generation learned from human-written patches. In ICSE 2013, pages 802–811, 2013.
[4] Vidroha Debroy and W. Eric Wong. Combining mutation and fault localization for automated program debugging. Journal of Systems and Software, 90:45–60, 2013.
[5] Marcio Eduardo Delamaro, Jose Carlos Maldonado, and Auri Marcelo Rizzo Vincenzi. Proteum /im 2.0: An integrated mutation testing environment. In W. Eric Wong, editor, Mutation Testing for the New Century, volume 24, pages 91–101. Springer Verlag, 2001.
[6] F. DeMarco, J. Xuan, D.L. Berra, and M. Monperrus. Automatic repair of buggy if conditions and missing preconditions with smt. In Proceedings, CSTVA, pages 30–39, 2014.
[7] J. Desharnais, N. Diallo, W. Ghardallou, M. F. Frias, A. Jaoua, and A. Mili. Relational mathematics for relative correctness. In RAMICS, 2015, volume 9348 of LNCS, pages 191–208, Braga, Portugal, September 2015. Springer Verlag.
[8] Nafi Diallo, Wided Ghardallou, and Ali Mili. Correctness and relative correctness. In Proceedings, 37th International Conference on Software Engineering, NIER track, Firenze, Italy, May 20–22 2015.
[9] E.W. Dijkstra. A Discipline of Programming. Prentice Hall, 1976.
[10] Hyunsook Do, Sebastian Elbaum, and Gregg Rothermel. Supporting a controlled experimentation with testing techniques: An infrastructure and its potential impact. Empirical Software Engineering: An International Journal, 10(4):405–435, 2007.
[11] C. Le Goues, T. Nguyen, S. Forrest, and W. Weimer. Genprog: A generic method for automated software repair. IEEE Transactions on Software Engineering, 31(1), 2012.
[12] David Gries. The Science of Programming. Springer Verlag, 1981.
[13] Eric C.R. Hehner. A Practical Theory of Programming. Prentice Hall, 1992.
[14] Claire LeGoues, Stephanie Forrest, and Westley Weimer. Current challenges in automatic software repair. Software Quality Journal, 21(3):421–443, 2013.
[15] Claire LeGoues, M. Dewey Vogt, S. Forrest, and W. Weimer. A systematic study of automated program repair: Fixing 55 out of 105 bugs for $8 each. In Proceedings, ICSE 2012, pages 3–13, 2012.
[16] Yu-Seung Ma, Jeff Offutt, and Yong-Rae Kwon. Mujava : An automated class mutation system. Journal of Software Testing, Verification and Reliability, 15(2):97–133, June 2005.
[17] Zohar Manna. A Mathematical Theory of Computation. McGraw-Hill, 1974.
[18] Sergey Mechtaev, Jooyong Yi, and Abhik Roychoudhury. Angelix:
Scalable multiline program patch synthesis via symbolic analysis. In
not equivalent to, enhanced reliability. So that maximizing the Proceedings, ICSE 2016, Austin, TX, May 2016.
fitness function is a necessary condition, but not a sufficient [19] A. Mili, M. Frias, and A. Jaoua. On faults and faulty programs. In
condition, of relative correctness. P. Hoefner, P. Jipsen, W. Kahl, and M. E. Mueller, editors, Proceedings,
RAMICS 2014, volume 8428 of LNCS, pages 191–207, 2014.
3) Inefficiency: We recognize two sources of inefficiency [20] Ali Mili and Fairouz Tchier. Software Testing: Operations and Concepts.
in current practice of program repair. First, as we discuss in John Wiley and Sons, 2015.
[21] Harlan D. Mills, Victor R. Basili, John D. Gannon, and Dick R. Hamlet.
section IV-B, faults are prone to mask each other; so that if Structured Programming: A Mathematical Approach. Allyn and Bacon,
we define the success of a repair operation as the remediation Boston, Ma, 1986.
of a specific failure caused by a specific fault, and that fault [22] Martin Monperrus. A critical review of patch generation learned from
human written patches: Essay on the problem statement and evaluation
is masked by others, we may have to find a combination of automatic software repair. In Proceedings, ICSE 2014, Hyderabad,
of patches that fix all the faults involved in this situation India, 2014.
before the failing behavior is corrected. A more efficient [23] Carroll C. Morgan. Programming from Specifications, Second Edition.
International Series in Computer Sciences. Prentice Hall, London, UK,
approach may be to define success as an increase in relative 1998.
correctness, and accept any patch that fulfills this criterion, [24] Hoang Duong Thien Nguyen, DaWei Qi, Abhik Roychoudhury, and
until the targeted failure is remedied. Second, whenever they Satish Chandra. Semfix: Program repair via semantic analysis. In
Proceedings, ICSE, pages 772–781, 2013.
fail to distinguish between a single multi-site fault and several [25] Zhchao Qi, Fan Long, Sara Achour, and Martin Rinard. An analysis
single-site faults, program repair methods may be pursuing of patch plausibility and correctness for generate-and-validate patch
unnecessary and costly multiple patches where successive generation systems. In Proceedings, ISSTA 2015, Baltimore, MD, July
2015.
single patches would have been sufficient.
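
The recall and precision arguments in items 1 and 2 can be illustrated with a small Python sketch. It is not the paper's algorithm or GenProg's implementation. It assumes deterministic, terminating programs over a finite input space, models a nondeterministic specification as a map from each input to a set of acceptable outputs, and takes "more-correct" to mean inclusion of competence domains, as the text above describes. All names (`SPEC`, `original`, `candidate_a`, `candidate_b`, `reliability`) are hypothetical, and the uniform weights merely stand in for a usage pattern.

```python
from typing import Callable, Dict, Set

Spec = Dict[int, Set[int]]        # input -> set of acceptable outputs (assumed model)
Program = Callable[[int], int]    # deterministic, terminating program (assumption)


def competence_domain(program: Program, spec: Spec) -> Set[int]:
    """Inputs on which the program's output is acceptable to the specification."""
    return {s for s, acceptable in spec.items() if program(s) in acceptable}


def more_correct(candidate: Program, original: Program, spec: Spec) -> bool:
    """Relative correctness modeled as inclusion of competence domains."""
    return competence_domain(original, spec) <= competence_domain(candidate, spec)


def passes_tests(program: Program, tests: Dict[int, int]) -> bool:
    """Test-based validation: each test pins one exact expected output."""
    return all(program(s) == expected for s, expected in tests.items())


def reliability(program: Program, spec: Spec, weights: Dict[int, float]) -> float:
    """Weighted share of inputs handled correctly (stand-in for a fitness function)."""
    return sum(weights[s] for s in competence_domain(program, spec))


# Hypothetical nondeterministic specification: the output must exceed the input by 1 or 2.
SPEC: Spec = {s: {s + 1, s + 2} for s in range(4)}


def original(s: int) -> int:
    return s + 1 if s < 2 else s      # correct on inputs 0 and 1 only


def candidate_a(s: int) -> int:
    return s if s == 2 else s + 2     # correct on 0, 1, 3, with different outputs


def candidate_b(s: int) -> int:
    return s if s == 0 else s + 1     # correct on 1, 2, 3, but breaks input 0


POSITIVE = {0: 1, 1: 2}               # outputs recorded from the original program
NEGATIVE = {2: 3}                     # the original fails this test
WEIGHTS = {s: 0.25 for s in range(4)}

if __name__ == "__main__":
    for name, cand in (("candidate_a", candidate_a), ("candidate_b", candidate_b)):
        print(
            name,
            "more-correct:", more_correct(cand, original, SPEC),
            "passes positive:", passes_tests(cand, POSITIVE),
            "passes negative:", passes_tests(cand, NEGATIVE),
            "reliability:", reliability(cand, SPEC, WEIGHTS),
        )
```

In this toy setting, `candidate_a` is more-correct than `original` (its competence domain {0, 1, 3} contains {0, 1}), yet it fails both the positive and the negative tests, so a purely test-based validator would discard it. Conversely, `candidate_b` has higher reliability than `original` (0.75 against 0.5) without being more-correct, since it no longer handles input 0.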
