Mining Software Repair Models
Empirical Software Engineering, Springer, 2013 (accepted for publication on Sep. 11, 2013).
and observations in that space can be valued. For instance, a standard Unix diff produces two integer values: the number of added lines and the number of deleted lines. ChangeDistiller enables us to define the following change models.

CT (Change Type) is composed of 41 features, the 41 change types of ChangeDistiller. For instance, one of these features is "Statement Insertion" (we may use the shortened name "Stmt_Insert"). CTET (Change Type Entity Type) is made of all valid combinations of the Cartesian product between change types and entity types. CTET is a refinement of CT: each repair action of CT is mapped to [1...n] repair actions of CTET, hence the labels of the repair actions of CTET always contain the label of CT. There are 104 entity types and 41 change types, but many combinations are impossible by construction; as a result, CTET contains 173 features. For instance, since there is one entity type representing assignments, one feature of CTET is "statement insertion of an assignment".

In the rest of this paper, we express versioning transactions within those two change models. There is no better change model per se: they describe versioning transactions at different granularities. We will see later that, depending on the perspective, both change models have pros and cons.

D. Measures for Change Actions

We define two measures for a change action i: αi is the absolute number of occurrences of change action i in a dataset; χi is the probability of observing change action i, given by its relative frequency over all changes (χi = αi / Σj αj). For instance, let us consider feature space CT and the change action "statement insertion" (StmtIns). If there are αStmtIns = 12 source code changes related to statement insertion among 100 changes, the probability of observing a statement insertion is χStmtIns = 12%.
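As an illustration, both measures boil down to a frequency count over the extracted changes. The following Java sketch (illustrative only; the class and method names are ours, not part of ChangeDistiller) computes αi and χi from a list of observed change actions:

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class ChangeActionMeasures {
    // Computes chi_i (relative frequency) for every change action i,
    // where alpha_i is the absolute number of occurrences of i.
    public static Map<String, Double> chi(List<String> observedChanges) {
        Map<String, Integer> alpha = new LinkedHashMap<>();
        for (String action : observedChanges) {
            alpha.merge(action, 1, Integer::sum);
        }
        double total = observedChanges.size(); // sum of all alpha_i
        Map<String, Double> chi = new LinkedHashMap<>();
        for (Map.Entry<String, Integer> e : alpha.entrySet()) {
            chi.put(e.getKey(), e.getValue() / total);
        }
        return chi;
    }

    public static void main(String[] args) {
        // 12 "Stmt_Insert" among 100 changes would yield chi = 12%.
        List<String> changes = List.of("Stmt_Insert", "Stmt_Delete", "Stmt_Insert");
        System.out.println(chi(changes)); // {Stmt_Insert=0.66..., Stmt_Delete=0.33...}
    }
}
```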
E. Empirical Results

We have run ChangeDistiller over the 62,179 Java transactions of our dataset, resulting in 1,196,385 AST-level changes for both change models. For change model CT, which is rather coarse-grained, the three most common changes are "statement insert" (28% of all changes), "statement delete" (23% of all changes) and "statement update" (14% of all changes). Certain changes are rare; for instance, "addition of class derivability" (adding the keyword final to a class declaration) appears only 99 times (0.008% of all changes). The complete results are given in the companion technical report [21].

Table I presents the top 20 change actions and the associated measures for change model CTET. The comprehensive table for all 173 change actions is given in the companion technical report [21]. In Table I, one sees that inserting method invocations as statements is the most common change, which makes sense for open-source object-oriented software that is growing.

Change Action                                         αi         Prob. χi
Statement insert of method invocation                 83,046     6.9%
Statement insert of if statement                      79,166     6.6%
Statement update of method invocation                 76,023     6.4%
Statement delete of method invocation                 65,357     5.5%
Statement delete of if statement                      59,336     5%
Statement insert of variable declaration statement    54,951     4.6%
Statement insert of assignment                        49,222     4.1%
Additional functionality of method                    49,192     4.1%
Statement delete of variable declaration statement    44,519     3.7%
Statement update of variable declaration statement    41,838     3.5%
Statement delete of assignment                        41,281     3.5%
Condition expression change of if statement           40,415     3.4%
Statement update of assignment                        34,802     2.9%
Addition of attribute                                 29,328     2.5%
Removal of method                                     26,172     2.2%
Statement insert of return statement                  24,184     2%
Statement parent change of method invocation          21,010     1.8%
Statement delete of return statement                  20,880     1.7%
Insert of else statement                              20,227     1.7%
Deletion of else statement                            17,197     1.4%
Total                                                 1,196,385

Table I. The abundance of AST-level changes of change model CTET over 62,179 versioning transactions. The probability χi is the relative frequency over all changes (e.g. 6.9% of source code changes are insertions of method invocations).

Let us now compare the results over change models CT and CTET. One can see that statement insertion mostly consists of inserting a method invocation (6.9%), inserting an "if" conditional (6.6%), and inserting a new variable (4.6%). Since change model CTET is at a finer granularity, there are fewer observations per feature: both αi and χi are lower. The probability distribution (χi) over the change model is less sharp (smaller values) since the feature space is bigger. A high value of χi means that the change action can frequently be found in real data: such change actions have a high "coverage" of the data. CTET features describe modifications of software at a finer granularity. The differences between those two change models illustrate the tension between a high coverage and the analysis granularity.

F. Project-independence of Change Models

An important question is whether the probability distribution (composed of all χi) of Table I is generalizable to Java software or not. That is, do developers evolve software in a similar manner over different projects? To answer this question, we have computed the metric values not for the whole dataset, but per project. In other words, we have computed the frequency of change actions in 14 software repositories. We would like to see that the values do not vary between projects, which would mean that the probability distributions over change actions are project-independent. Since our dataset covers many different domains, having high correlation values would be a strong point towards generalization.

As correlation metric, we use Spearman's ρ. We choose Spearman's ρ because it is non-parametric. In our case, what matters is to know whether the importance of change actions is similar (for instance, that "statement update" is more common than "condition expression change"). Contrary to parametric correlation metrics (e.g. Pearson), Spearman's ρ only focuses on the ordering between change actions, which is what we are interested in.

We compute the Spearman correlation values between the change action frequencies of each pair of projects.
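For concreteness, here is a minimal Java sketch of this pairwise comparison (illustrative only: it uses the textbook formula ρ = 1 − 6Σd²/(n(n²−1)) and breaks ties arbitrarily, without the tie correction a statistics package would apply):

```java
import java.util.Arrays;

public class SpearmanRho {
    // Ranks values in descending order: the most frequent change action
    // gets rank 1. Ties are broken by index (no tie correction).
    static int[] ranks(double[] values) {
        Integer[] idx = new Integer[values.length];
        for (int i = 0; i < values.length; i++) idx[i] = i;
        Arrays.sort(idx, (a, b) -> Double.compare(values[b], values[a]));
        int[] rank = new int[values.length];
        for (int r = 0; r < idx.length; r++) rank[idx[r]] = r + 1;
        return rank;
    }

    // freqA[i] and freqB[i] are the frequencies of change action i
    // in two different projects, over the same list of change actions.
    static double rho(double[] freqA, double[] freqB) {
        int n = freqA.length;
        int[] ra = ranks(freqA), rb = ranks(freqB);
        double sumD2 = 0;
        for (int i = 0; i < n; i++) {
            double d = ra[i] - rb[i];
            sumD2 += d * d;
        }
        return 1.0 - (6.0 * sumD2) / (n * ((double) n * n - 1));
    }
}
```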
[Figure: histogram of the pairwise Spearman correlation values; Y-axis: # of project pairs.]

The lowest correlation value is 0.80; it corresponds to the Spearman correlation between projects Tomcat and Carol. In this case, the maximum rank change is 23 (for change action "Removing Method Overridability" — removing final for methods). In total, between Tomcat and Carol, there are six change actions for which the importance changes of at
paper, we will define a mathematical criterion to tell whether one approximation is better than another.

A. Slicing Based on the Commit Message

When committing source code changes, developers may write a comment/message explaining the changes they have made. For instance, when a transaction is related to a bug fix, they may write a comment referencing the bug report or describing the fix.

To identify transaction bags related to bug fixing, previous work focused on the content of the commit text: whether it contains a bug identifier, or whether it contains some keywords such as "fix" (see [23] for a discussion of those approaches). To identify bug fix patterns, Pan et al. [24] select transactions containing at least one occurrence of "bug", "fix" or "patch". We call this transaction bag BFP. We will compute αi and χi based on this definition.

Such a transaction bag makes a strong assumption on the development process and the developers' behavior: it assumes that developers generally put syntactic features in commit texts that enable recognizing repair transactions, which is not really true in practice [23], [25], [26].

B. Slicing Based on the Change Size in Terms of Number of AST Changes

We may also define fixing transaction bags based on their "AST diffs", i.e. based on the type and number of change actions that a versioning transaction contains. This transaction bag is called N-SC (for N Abstract Syntactic Changes); e.g. 5-SC represents the bag of transactions containing five AST-level source code changes.

In particular, we assume that small transactions are very likely to only contain a bug fix and unlikely to contain a new feature. Repair actions may be those that appear atomically in transactions (i.e. the transaction contains only one AST-level source code change). "1-SC" (composed of all transactions of one single AST change) is the transaction bag that embodies this assumption. Let us verify this assumption.
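Both slicing heuristics amount to simple predicates over a transaction. A minimal Java sketch (the Transaction type and its fields are hypothetical; the keyword list is the one of Pan et al. [24] above):

```java
import java.util.Locale;

public class TransactionBags {
    // Hypothetical representation of a versioning transaction.
    record Transaction(String commitMessage, int astChangeCount) {}

    // BFP: the commit message contains "bug", "fix" or "patch".
    static boolean isBFP(Transaction t) {
        String msg = t.commitMessage().toLowerCase(Locale.ROOT);
        return msg.contains("bug") || msg.contains("fix") || msg.contains("patch");
    }

    // N-SC: the transaction contains exactly n AST-level changes.
    static boolean isNSC(Transaction t, int n) {
        return t.astChangeCount() == n;
    }
}
```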
C. Do Small Versioning Transactions Fix Bugs?

1) Experiment: We set up a study to determine whether small transactions correspond to bug fix changes. We define small transactions as those that introduce only one AST change.

2) Overview: The study consists in the manual inspection and evaluation of the source code changes of versioning transactions. First, we randomly take a sample set of transactions from our dataset (see II-A). Then, we create an "evaluation item" for each pair of files of the sample set (the file before and after the revision). An evaluation item contains data to help the raters decide whether a transaction is a bug fix or not: the syntactic line-based differencing between the revision pair of the transaction (it helps to visualize the changes), the AST change between them (type and location – e.g. insertion of a method invocation at line 42), and the commit message associated with the transaction.

3) Sampling Versioning Transactions: We use stratified sampling to randomly select 1-SC versioning transactions from the software history of 16 open source projects (mostly from [18]). Recall that a "1-SC" versioning transaction introduces only one AST change. The stratification consists of picking 10 items (if 10 are found) per project. In total, the sample set contains 144 transactions sampled over the 6,953 1-SC transactions present in our dataset.
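The stratification itself is straightforward; a sketch (hypothetical generic type for the transactions, shuffle-then-truncate per project):

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.Random;

public class StratifiedSampling {
    // Picks up to `perProject` 1-SC transactions from each project.
    static <T> List<T> sample(Map<String, List<T>> oneScByProject,
                              int perProject, long seed) {
        Random rnd = new Random(seed);
        List<T> sample = new ArrayList<>();
        for (List<T> candidates : oneScByProject.values()) {
            List<T> copy = new ArrayList<>(candidates);
            Collections.shuffle(copy, rnd);
            sample.addAll(copy.subList(0, Math.min(perProject, copy.size())));
        }
        return sample;
    }
}
```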
4) Evaluation Procedure: The 144 evaluation items were evaluated by three people, called the raters: the paper authors and a colleague, a member of the faculty at the University of Bordeaux. During the evaluation, each item (see III-C2) is presented to a rater, one by one. The rater has to answer the question Is this a bug fix change? The possible answers are: a) Yes, the change is a bug fix; b) No, the change is not a bug fix; and c) I don't know. Optionally, the rater can write a comment to explain his decision.

5) Experiment Results:

a) Level of Agreement: The three raters fully agreed that 74 of 144 (51.8%) transactions from the sample are bug fixes. If we consider the majority (at least 2 of 3 raters agree), 95 of 144 transactions (66%) were considered as bug fix transactions. The complete rating data is given in the companion technical report [21].

                                 Full Agreement (3/3)    Majority (2/3)
Transaction is a Bug Fix         74                      21
Transaction is not a Bug Fix     22                      23
I don't know                     0                       1

Table II. The results of the manual inspection of 144 transactions by three raters.

Table II presents the number of agreements. The column Full Agreement shows the number of transactions for which all raters agreed. For example, the three raters agreed that there is a bug fix in 74/144 transactions. The Majority column shows the number of transactions for which two out of three raters agree. To sum up, small transactions predominantly consist of bug fixes.

Among the transactions with full agreement on the absence of bug fix changes, the most common case found was the addition of a method. This change indeed consists of one single AST change (the addition of a "method" node). Interestingly, in some cases, adding a method was indeed a bug fix, when polymorphism is used: the new method fixes the bug by replacing the super implementation.

b) Statistics: Let us assume that pi measures the degree of agreement for a single item (in our case in {1/3, 2/3, 3/3}). The overall agreement P̄ [27] is the average over pi. We have P̄ = 0.77. Using the scale introduced by [28], this value means there is a Substantial overall agreement between the raters, close to an Almost perfect agreement.
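Given the definition above (pi is the fraction of raters in the majority category, hence pi ∈ {1/3, 2/3, 3/3} for three raters), the overall agreement P̄ is a plain average; an illustrative sketch:

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class OverallAgreement {
    // ratings.get(item) holds the three raters' answers for one item,
    // e.g. ["yes", "yes", "dontknow"].
    static double pBar(List<List<String>> ratings) {
        double sum = 0;
        for (List<String> item : ratings) {
            Map<String, Integer> counts = new HashMap<>();
            for (String answer : item) counts.merge(answer, 1, Integer::sum);
            int majority = counts.values().stream().max(Integer::compare).orElse(0);
            sum += majority / (double) item.size(); // p_i in {1/3, 2/3, 3/3}
        }
        return sum / ratings.size(); // the paper reports P-bar = 0.77
    }
}
```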
The coefficient κ (Kappa) [27], [29] measures the confidence in the agreement level by removing the chance factor⁴. The κ degree of agreement in our study is 0.517, a value distant from the critical value (which is 0). The null hypothesis is rejected: the observed agreement is not due to chance.

6) Conclusion: The manual inspection of 144 versioning transactions shows that there is a relation between one-AST-change transactions and bug fixing. Consequently, we can use the 1-SC transaction bag to estimate the probability of change actions for software repair.

IV. FROM CHANGE MODELS TO REPAIR MODELS

This section presents how we can transform a "change model" into a "repair model" usable for automated software repair. As discussed in Section II, a change model describes all types of source code changes that occur during software evolution. In contrast, we define a "repair action" as a change action that often occurs for repairing software, i.e. that is often used for fixing bugs.

By construction, a repair model is equal to a subset of a change model in terms of features. But more than the number of features, our intuition is that the probability distribution over the feature space varies between change models and repair models. For instance, one might expect that changing the initialization of a variable has a higher probability in a repair model. Hence, the difference between a change model and a repair model is a matter of perspective. Since we are interested in automated program repair, we now concentrate on the "repair" perspective and hence use the terms "repair model" and "repair action" in the rest of the paper.

A. Methodology

We have applied the same methodology as in II. We have computed the probability distributions of repair models CT and CTET based on different definitions of fix transactions, i.e. we have computed αi and χi based on the transaction bags discussed in III: ALL transactions, N-SC and BFP. For N-SC, we chose four values of N: 1-SC, 5-SC, 10-SC and 20-SC. Transactions larger than 20-SC have almost the same topology of changes as ALL, as we will show later (see section IV-C2).

The main question we ask is whether those different definitions of "repair transactions" yield different topologies for repair models.

B. Empirical Results

Furthermore, we have made the following observations from the experiment results:

First, the order of repair actions (i.e. their likelihood of contributing to bug repair) varies significantly depending on the transaction bag used for computing the probability distribution. For instance, a statement insertion is #1 when we consider all transactions (column ALL), but only #4 when considering transactions with a single AST change (column 1-SC). In this case, the probability of observing a statement insertion varies from 29% to 12%.

Second, even when the orders obtained from two different transaction bags resemble each other, such as for ALL and 20-SC, the probability distribution still varies: for instance, χStmt_Insert is 29% for transaction bag ALL, but jumps to 33% for transaction bag 20-SC.

Third, the probability distributions for transaction bags ALL and BFP are close: repair actions have similar probability values. As a consequence, transaction bag BFP may be a random subset of ALL transactions. All those observations also hold for repair model CTET; the complete table is given in the companion technical report [21].

Those results are a first answer to our question: different definitions of "repair transactions" yield different probability distributions over a repair model.

C. Discussion

We have shown that one can base repair models on different methods to extract repair transaction bags. There are certain analytical arguments for or against those different repair space topologies. For instance, selecting transactions based on the commit text makes a very strong assumption on the quality of software repository data, but ensures that the selected transactions contain at least one actual repair. Alternatively, small transactions indicate a focus on a single concern, so they are likely to be a repair. However, small transactions may only see the tip of the fix iceberg (large transactions may be bug fixing as well), resulting in a distorted probability distribution over the repair space. At the experimental level, the threats to validity are the same as for Section II.
Rank  ALL                  BFP                  1-SC                   5-SC                 10-SC                20-SC
1     Stmt_Insert 29%      Stmt_Insert 32%      Stmt_Upd 38%           Stmt_Insert 28%      Stmt_Insert 31%      Stmt_Insert 33%
2     Stmt_Del 23%         Stmt_Del 23%         Add_Funct 14%          Stmt_Upd 24%         Stmt_Upd 19%         Stmt_Del 16%
3     Stmt_Upd 15%         Stmt_Upd 12%         Cond_Change 13%        Stmt_Del 11%         Stmt_Del 14%         Stmt_Upd 16%
4     Param_Change 6%      Param_Change 7%      Stmt_Insert 12%        Add_Funct 10%        Add_Funct 8%         Param_Change 7%
5     Order_Change 5%      Order_Change 6%      Stmt_Del 6%            Cond_Change 7%       Param_Change 7%      Add_Funct 7%
6     Add_Funct 4%         Add_Funct 4%         Rem_Funct 5%           Param_Change 5%      Cond_Change 6%       Cond_Change 5%
7     Cond_Change 4%       Cond_Change 3%       Add_Obj_St 3%          Add_Obj_St 3%        Add_Obj_St 3%        Add_Obj_St 3%
8     Add_Obj_St 2%        Add_Obj_St 2%        Order_Change 2%        Rem_Funct 3%         Rem_Funct 2%         Order_Change 3%
9     Rem_Funct 2%         Alt_Part_Insert 2%   Rem_Obj_St 2%          Order_Change 1%      Order_Change 2%      Rem_Funct 2%
10    Alt_Part_Insert 2%   Rem_Funct 2%         Inc_Access_Change 1%   Rem_Obj_St 1%        Alt_Part_Insert 1%   Alt_Part_Insert 2%

Table III. Top 10 change types of change model CT and their probability χi for different transaction bags. The different heuristics used to compute the fix transaction bags have a significant impact on both the ranking and the probabilities.
The following table gives the Spearman correlation values between the probability distribution computed over transaction bag ALL and the distributions computed over the other bags:

        1-SC    5-SC    10-SC   20-SC   BFP
ALL     0.68    0.95    0.97    0.98    0.99

The value between ALL and 1-SC is 0.68. This value shows, as we have noted before, that there is not a strong correlation between the orders of the repair actions of those two transaction bags. On the contrary, the value between ALL and BFP is 0.99.

[Figure: probability χi of selected CTET repair actions (Stmt update of variable declaration, Stmt insert of method invocation, Stmt update of assignment, Stmt update of return, Remove funct of method, Stmt delete of method invocation); Y-axis: probability, ticks between 0.06 and 0.14.]
Informally, the shape of a bug fix is a kind of patch. For instance, the repair shape of adding an "if" throwing an exception for signaling an incorrect input consists of inserting an if and inserting a throw. The concept of "repair shape" is equivalent to what Wei et al. [3] call a "fix schema", and Weimer et al. [2] a "mutation operator".

In this paper, we define a "repair shape" as an unordered tuple of repair actions (from a set of repair actions called R)⁵. In the if/throw example aforementioned, in repair space CTET, the repair shape of this bug fix consists of two repair actions: statement insertion of "if" and statement insertion of "throw". The shaping space consists of all possible combinations of repair actions.

⁵ Since a bug fix may contain several instances of the same repair action (e.g. several statement insertions), the repair shape may contain the same repair action several times.

The instantiation of a repair shape is what we call fix synthesis. The complexity of the synthesis depends on the repair actions of the shaping space. For instance, the repair actions of Weimer et al. [2] (insertion, deletion, replace) have an "easy" and bounded synthesis space (random picking in the code base).

To sum up, we consider that the repair search space can be viewed as the combination of the fault localization space (where the repair is likely to be successful), the shaping space (which kind of repair may be applied) and the synthesis space (assigning concrete statements and values to the chosen repair actions). The search space can then be loosely defined as the Cartesian product of those spaces, and its size then reads:

|FAULT LOCALIZATION| × |SHAPE| × |SYNTHESIS|

In this paper, we concentrate on the shaping part of the space. If one can find efficient strategies to navigate through this shaping space, this would contribute to efficiently navigating through the repair search space as a whole, thanks to the combination.

B. Mathematical Analysis Over Repair Models

To analyze the shaping space, we now present a mathematical analysis of our probabilistic repair models. So far, we have two repair models, CT and CTET (see IV), and different ways to parametrize them.

According to our probabilistic repair model, a good navigation strategy consists in concentrating on likely repairs first: the repair shape is more likely to be composed of frequent repair actions. That is, a repair shape of size n is predicted by drawing n repair actions according to the probability distribution over the repair model. Under the pessimistic assumption that repair actions are independent⁶, our repair model makes it possible to know the exact median number of attempts N that is needed to find a given repair shape R (demonstration given in the companion technical report [21]):

N = k such that Σ_{i=1}^{k} p(1 − p)^{i−1} ≥ 0.5        (1)

with p = (n! / Π_j e_j!) × Π_{r∈R} P(r),

where e_j is the number of occurrences of repair action r_j inside R.

⁶ Equation (1) holds if and only if we consider them as independent. If they are not, it means that we under-estimate the deep structure of the repair space, hence we over-approximate the time to navigate the space to find the correct shape. In other words, even if the repair actions are not independent (which is likely for some of them), our conclusions are sound.
For instance, the repair of revision 1.2 of Eclipse's CheckedTreeSelectionDialog⁷ consists of two inserted statements. Equation 1 tells us that in repair model CT, we would need on average 12 attempts to find the correct repair shape for this real bug.

⁷ "Fix for 19346 integrating changes from Sebastian Davids" http://goo.gl/d4OSi
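Equation 1 is the median of a geometric distribution, so it is easy to evaluate mechanically. A minimal Java sketch (the probability value in main is illustrative, not the exact distribution used in our experiments):

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class RepairShapeAttempts {
    // p of Equation 1: the probability of drawing exactly the target
    // shape in one attempt, p = (n! / prod_j e_j!) * prod_{r in R} P(r).
    static double shapeProbability(List<String> shape, Map<String, Double> dist) {
        Map<String, Integer> occurrences = new HashMap<>();
        double product = 1.0;
        for (String action : shape) {
            occurrences.merge(action, 1, Integer::sum);
            product *= dist.get(action);
        }
        double coeff = factorial(shape.size());
        for (int e : occurrences.values()) coeff /= factorial(e);
        return coeff * product;
    }

    // Median number of attempts: the smallest k with 1 - (1-p)^k >= 0.5.
    static long medianAttempts(double p) {
        return (long) Math.ceil(Math.log(0.5) / Math.log1p(-p));
    }

    static double factorial(int n) {
        double f = 1;
        for (int i = 2; i <= n; i++) f *= i;
        return f;
    }

    public static void main(String[] args) {
        // A shape of two statement insertions, with an illustrative
        // P(Stmt_Insert) = 0.28 (the 5-SC column of Table III):
        Map<String, Double> dist = Map.of("Stmt_Insert", 0.28);
        double p = shapeProbability(List.of("Stmt_Insert", "Stmt_Insert"), dist);
        System.out.println(medianAttempts(p)); // 9 attempts
    }
}
```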
Having only a repair shape is far from having a real fix. However, the concept of repair shape, associated with the mathematical formula analyzing the time to navigate the repair space, is key to comparing ways to build a probability distribution over repair models.

C. Comparing Probability Distributions Over Repair Actions From Versioning History

We have seen in Section V-B that the time for finding correct repair shapes depends on a probability distribution over repair actions. The probability distribution P is crucial for minimizing the search space traversal: a good distribution P results in concentrating on likely repairs first, i.e. the repair space is traversed in a guided way, by first exploring the parts of the space that are likely to be more fruitful. This poses two important questions: first, how to set up a probability distribution over repair actions; second, how to compare the efficiency of different probability distributions in finding good repair shapes.

To compute a probability distribution over repair actions, we propose to learn it from software repositories. For instance, if many bug fixes are made of inserted method calls, the probability of applying such a repair action should be high. Despite our single method (learning the probability distributions from software repositories), we have shown in IV that there is no single way to compute them (they depend on different heuristics). To compare different distributions against each other, we set up the following process.

One first selects bug repair transactions in the versioning history. Then, for each bug repair transaction, one extracts its repair shape (as a set of repair actions of a repair model). Then one computes the average time that a maximum likelihood approach would need to find this repair shape, using Equation 1.

Let us assume two probability distributions P1 and P2 over a repair model, and four fixes (F1...F4) consisting of two repair actions and observed in a repository. Let us assume that the time (in number of attempts) to find the exact shape of F1...F4 according to P1 is (5, 26, 9, 12), and according to P2 (25, 137, 31, 45). In this case, it is clear that the probability distribution P1 enables us to find the correct repair shapes faster (the shaping times for P1 are lower). Beyond this example, by applying the same process over real bug repairs found in a software repository, our process enables us to select the best probability distribution for a given repair model.
Input: C                               ⊲ A bag of transactions
Output: The median number of attempts to find good repair shapes
begin
    Ω ← {}                             ⊲ Result set
    T, E ← split(C)                    ⊲ Cross-validation: split C into Training and Evaluation data
    M ← train_model(T)                 ⊲ Train a repair model (e.g. compute a probability distribution over repair actions)
    for s ∈ E do                       ⊲ For all repairs observed in the repository
        n ← compute_repairability(s, M)  ⊲ How long to find this repair according to the repair model
        Ω ← Ω ∪ {n}                    ⊲ Store the "repairability" value of s
    return median(Ω)                   ⊲ Return the median number of attempts to find the repair shapes

Figure 3. An algorithm to compare fix shaping strategies. There may be different flavors of the functions split, train_model and compute_repairability.
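In Java, one leave-one-project-out run of this algorithm could look like the following sketch (the Transaction record and the smoothing of unseen actions are assumptions of this sketch, not of the paper; compute_repairability is the Equation 1 computation shown earlier):

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class FixShapingComparison {
    // Hypothetical transaction: a project name and the repair shape
    // (list of repair actions) of the fix it contains.
    record Transaction(String project, List<String> repairShape) {}

    // Train on all projects but `heldOut`, evaluate on `heldOut`.
    static long medianRepairability(List<Transaction> all, String heldOut) {
        List<Transaction> training = new ArrayList<>();
        List<Transaction> evaluation = new ArrayList<>();
        for (Transaction t : all) {
            (t.project().equals(heldOut) ? evaluation : training).add(t);
        }
        Map<String, Double> model = trainModel(training);
        List<Long> attempts = new ArrayList<>();
        for (Transaction t : evaluation) {
            attempts.add(computeRepairability(t.repairShape(), model));
        }
        Collections.sort(attempts);
        return attempts.get(attempts.size() / 2); // upper median for even sizes
    }

    // The probability distribution over repair actions (chi_i).
    static Map<String, Double> trainModel(List<Transaction> training) {
        Map<String, Integer> counts = new HashMap<>();
        int total = 0;
        for (Transaction t : training) {
            for (String action : t.repairShape()) {
                counts.merge(action, 1, Integer::sum);
                total++;
            }
        }
        Map<String, Double> dist = new HashMap<>();
        for (Map.Entry<String, Integer> e : counts.entrySet()) {
            dist.put(e.getKey(), e.getValue() / (double) total);
        }
        return dist;
    }

    // Equation 1, as in the earlier sketch; unseen actions get a tiny
    // probability (a smoothing assumption of this sketch).
    static long computeRepairability(List<String> shape, Map<String, Double> dist) {
        Map<String, Integer> occurrences = new HashMap<>();
        double p = 1.0;
        for (String action : shape) {
            occurrences.merge(action, 1, Integer::sum);
            p *= dist.getOrDefault(action, 1e-9);
        }
        double coeff = 1;
        for (int i = 2; i <= shape.size(); i++) coeff *= i;
        for (int e : occurrences.values()) {
            for (int i = 2; i <= e; i++) coeff /= i;
        }
        return (long) Math.ceil(Math.log(0.5) / Math.log1p(-coeff * p));
    }
}
```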
Table V. The median number of attempts (in bold) required to find the correct repair shape of fix transactions. The values in brackets indicate the number of fix transactions tested per project and per transaction size, for repair model CT. The repair model CT is made from the probability distribution of changes included in 5-SC transaction bags. For small transactions, finding the correct repair shape in the search space is done in less than 100 attempts.
Since Equation 1 is parametrized by a number of repair actions, we instantiate this process for all bug repair transactions of a certain size (in terms of AST changes). This means that our process determines the best probability distribution for a given bug fix shape size.

D. Cross-Validation

We compute different probability distributions Px from transaction bags found in repositories. We evaluate the time to find the shape of real fixes that are also found in repositories, which may bias the results. To overcome this problem, we use cross-validation: we always use different sets of transactions to estimate P and to calculate the average number of attempts required to find a correct repair shape. Using cross-validation reduces the risk of overfitting.

Since we have a dataset of 14 independent software repositories, we use this dataset structure for cross-validation. We take one repository for extracting repair shapes and the remaining 13 projects to calibrate the repair model (i.e. to compute the probability distributions). We repeat the process 14 times, by testing each of the 14 projects separately. In other words, we try to predict real repair shapes found in one repository from data learned on the other software projects.

Figure 3 sums up this algorithm to compare fix shaping strategies. From a bag of transactions C, function split creates a set of training transactions and a set of evaluation transactions. Then, one trains a repair model (with function train_model); for repair models CT and CTET, this means computing a probability distribution on a specific bag of transactions. Finally, for each repair of the evaluation data, one computes its "repairability" according to the repair model (with Equation 1). The algorithm returns the median repairability, i.e. the median number of attempts required to repair the test data.

E. Empirical Results

We run our fix shaping process on our dataset of 14 repositories of Java software, considering two repair models: CT and CTET (see Section II-C). We recall that CT consists of 41 repair actions and CTET of 173 repair actions. For both repair models, we have tested the different heuristics of IV-A to compute the median repair time: all transactions (ALL); one AST change (1-SC); 5 AST changes (5-SC); 10 AST changes (10-SC); 20 AST changes (20-SC); transactions with commit text containing "bug", "fix" or "patch" (BFP); and a baseline of a uniform distribution over the repair model (EQP, for equally-distributed probability).
[Figure 4 (plot): The repairability of small transactions in repair model CT. X-axis: repair size (in # AST changes); Y-axis: median number of attempts; one curve per heuristic (EQ, 1-SC, 5-SC, 10-SC, 20-SC, BFP, ALL). Certain probability distributions yield a median repair time that is much lower than others.]

[Figure 5 (plot): The repairability of small transactions in repair space CTET. Same axes and heuristics as Figure 4. There is no way to find the repair shapes of transactions larger than 4 AST code changes.]
We extracted all bug fix transactions with up to 8 AST changes from our dataset. For instance, the versioning repository of DNSJava contains 165 transactions of 1 repair action, 139 transactions of size 2, 71 transactions of size 3, etc. The biggest number of available repair tests is in jdt.core (1,605 fixes consist of one AST change), while Jhotdraw has only 2 transactions of 8 AST changes. We then computed the median number of attempts to find the correct shape of those 23,048 fix transactions. Since this number highly depends on the probability distribution Px, we computed the median repair time for all combinations of fix transaction sizes, projects, and heuristics discussed above (8 × 14 × 6).

Table V presents the results of this evaluation for repair space CT and transaction bag 5-SC. For each project, the bold values give the median repairability in terms of the number of attempts required to find the correct repair shape with a maximum likelihood approach. Then, the bracketed values give the number of transactions per transaction size (size in number of AST changes) and per project. For instance, over the 996 fix transactions of size 1 in the ArgoUML repository, it takes a median of 6 attempts to find the correct repair shape. On the contrary, for the 51 transactions of size 8 in the Tomcat repository, it takes a median of 34,240 attempts to find the correct repair shape. Those results are encouraging: for small transactions, it takes a handful of attempts to find the correct repair shape. The probability distribution over the repair model seems to drive the search efficiently. The other heuristics yield similar results; the complete results (6 tables, one per heuristic) are given in [21].

Regarding cross-validation, one can see that the performance over the 14 runs (one per project) is similar (all columns of Table V contain numbers that are of a similar order of magnitude). Given our cross-validation procedure, this means that for all projects, we are able to predict the correct shapes using only knowledge mined in the other projects. This gives us confidence that one could apply our approach to any new project using the probability distributions mined in our dataset.

Furthermore, finding the correct repair shapes of larger transactions (up to 8 AST changes) has an order of magnitude of 10⁴ and not more. Theoretically, for a given fix shape of n AST changes, the size of the space of possible repair shapes is the number of repair actions of the model to the power of n (e.g. |CT|ⁿ). For CT and n = 4, this results in a space of 41⁴ = 2,825,761 possible shapes (approx. 10⁶). In practice, over all projects, for small shapes (i.e. 3 changes or fewer), a well-defined probability distribution can guide the search to the correct shape in a median time lower than 200 attempts. This again shows that the probability distribution over the repair model is so unbalanced that the likelihood of possible shapes is concentrated on less than 10⁴ shapes (i.e. the probability density over |CT|ⁿ is really sparse).

Now, what is the best heuristic, with respect to shaping, to train our probabilistic repair models? For each repair shape size of Table V and each heuristic, we computed the median repairability over all projects of the dataset (a median of median numbers of attempts). We also computed the median repairability for a baseline of a uniform distribution (EQP) over the repair model (i.e. ∀i, P(ri) = 1/|CT|). Figure 4 presents this data for repair model CT. It shows the median number of attempts required to identify correct repair shapes on the Y-axis. The X-axis is the number of repair actions in the repair test (the size). Each line represents a probability estimation heuristic.

Figure 4 gives us important pieces of information. First, the heuristics yield different repair times. For instance, the repair time for heuristic 1-SC is generally higher than for 20-SC. Overall, there is a clear order between the repairability times: for transactions with less than 5 repair actions, heuristic 5-SC gives the best results, while for bigger transactions 20-SC is the best. Interestingly, certain heuristics are inappropriate for maximum-likelihood shaping of real bug fixes: the resulting probability distributions result in a repair time that explodes even for small shapes (this is the case for the uniform distribution EQP, even for shapes of size 3).
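The gap between EQP and the learned distributions follows directly from Equation 1; a small illustration (probability values taken, for illustration, from the ALL column of Table III):

```java
public class UniformVsLearned {
    // Median number of attempts: smallest k with 1 - (1-p)^k >= 0.5.
    static long medianAttempts(double p) {
        return (long) Math.ceil(Math.log(0.5) / Math.log1p(-p));
    }

    public static void main(String[] args) {
        // Shape of 3 distinct repair actions; multinomial coefficient 3! = 6.
        // EQP: every one of the 41 CT repair actions has probability 1/41.
        double pUniform = 6.0 * Math.pow(1.0 / 41.0, 3);
        // Learned (illustrative): Stmt_Insert .29, Stmt_Del .23, Stmt_Upd .15.
        double pLearned = 6.0 * 0.29 * 0.23 * 0.15;
        System.out.println(medianAttempts(pUniform)); // thousands of attempts
        System.out.println(medianAttempts(pLearned)); // about a dozen attempts
    }
}
```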
Also, all median repair times tend toward infinity for shapes of size larger than 9. Finally, although 1-SC is not good over many shape sizes, we note that it is the best for shapes of size 1. This is explained by the empirical setup (where we also decompose transactions by shape size).

1) On The Best Heuristics for Computing Probability Distributions over Repair Actions: To sum up, for small repair shapes heuristic 1-SC is the best with respect to probabilistic repair shaping, but it is not efficient for shapes of size greater than two AST-level changes. Heuristics 5-SC and 20-SC are the best for changes of size greater than 2. An important point is that some probability distributions (in particular those built from heuristics EQP and 1-SC) are really suboptimal for quickly navigating the search space.

Do those findings hold for repair model CTET, which has a finer granularity?

2) On The Difference between Repair Models CT and CTET: We have also run the whole evaluation with the repair model CTET (see II-C). The empirical results are given in the companion technical report [21] (in the same form as Table V). Figure 5 is the sibling of Figure 4 for repair model CTET. They look rather different. The main striking point is that with repair model CTET, we are able to find the correct repair shape only for fixes that are no larger than 4 AST changes. After that, the arithmetic of very low probabilities results in a virtually infinite time to find the correct repair shape. On the contrary, in repair model CT, even for fixes of 7 changes, one could find the correct shape in a finite number of attempts. Finally, in this repair model the average time to find a correct repair shape is several times larger than in CT (in CT, the shape of fixes of size 3 can be found in approx. 200 attempts; in CTET, it is more around 6,000).

For a given repair shape, the synthesis consists of finding concrete instances of repair actions. For instance, if the predicted repair action in CTET consists of inserting a method call, it remains to predict the target object, the method and its parameters. We can assume that the more precise the repair action, the smaller the "synthesis space". For instance, in CTET, the synthesis space is smaller compared to CT, because it is only composed of enriched versions of the basic repair actions of repair model CT (for instance, inserting an "if" instead of inserting a statement).

Our results illustrate the tension between the richness of the repair model and the ease of fixing bugs automatically. When we consider CT, we find likely repair shapes quickly (less than 5,000 attempts), even for large repairs, but at the price of a larger synthesis space. In other words, there is a balance between finding correct repair actions and finding concrete repair actions. When the repair actions are more abstract, it results in a larger synthesis space; when repair actions are more concrete, it hampers the likelihood of being able to concentrate on likely repair shapes first. We conjecture that the profile based on CT is better because of the following two points: it enables us to find bigger correct repair shapes (good) in a smaller amount of time (good).

Finally, we think that our results empirically explore some of the foundations of "repairing": there is a difference between prescribing aspirin (it has a high likelihood of contributing to healing, but only partially) and prescribing a specific medicine (one can try many medicines before finding the perfect one).

VI. ACTIONABLE GUIDELINES FOR AUTOMATED SOFTWARE REPAIR

Our results blend empirical findings with theoretical insights. How can they be used within an approach for automated software repair? This section presents actionable guidelines arising from our results. We apply those guidelines in a case study that consists of reasoning on a simplified version of GenProg within our probabilistic framework.

A. Consider Using a Probability Distribution over Repair Actions

Automated software repair approaches embed a set of repair actions, either explicitly or implicitly. On two different repair models, we have shown that the importance of each repair action greatly varies. Furthermore, our mathematical analysis has proved that considering a uniform distribution over repair actions is extremely suboptimal.

Hence, from the viewpoint of the time to fix a bug, we recommend setting up a probability distribution over the considered repair actions. This probability distribution can be learned on past data, as we do in this paper, or simply tuned with an incremental evaluation process. For instance, Le Goues et al. [30] have done similar probabilistic tuning over their three repair actions. Overall, using a probability distribution over repair actions could significantly speed up the repair process.
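Operationally, this guideline amounts to replacing the uniform random choice of a repair action by weighted sampling. A minimal sketch (illustrative weights):

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Random;

public class WeightedRepairActionPicker {
    // Samples one repair action according to a probability distribution
    // by inverting the cumulative distribution function.
    static String pick(Map<String, Double> distribution, Random rnd) {
        double u = rnd.nextDouble();
        double cumulative = 0.0;
        String last = null;
        for (Map.Entry<String, Double> e : distribution.entrySet()) {
            cumulative += e.getValue();
            last = e.getKey();
            if (u < cumulative) return last;
        }
        return last; // guards against rounding when the weights sum to ~1.0
    }

    public static void main(String[] args) {
        Map<String, Double> dist = new LinkedHashMap<>();
        dist.put("Stmt_Insert", 0.29); // illustrative values, ALL column of Table III
        dist.put("Stmt_Del", 0.23);
        dist.put("Stmt_Upd", 0.15);
        dist.put("other", 0.33);
        System.out.println(pick(dist, new Random(42)));
    }
}
```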
B. Be Aware of the Interplay between Shaping and Synthesis

We have shown that having more precise shapes has a real impact on shaping time. In repair model CT, for fix shapes of size 3, the logical shaping time is approximately 150 attempts. In repair model CTET, for shapes of the same size, the average logical time jumps to around 4,000, which represents more than a ten-fold increase. Our work quantitatively highlights the impact of considering more precise repair actions. By being aware of the interplay between shaping and synthesis, the research community will be able to create a disciplined catalog of repair actions and to identify where the biggest synthesis challenges lie.

C. Analyze the Repairability depending on The Fix Size

We have shown that certain repair shapes are impossible to find because of their size. In repair model CT, the shapes of more than 10 repair actions are not found in a finite time. In repair model CTET, the repair shapes of more than 5 actions are not found either. Given that a repair shape is an abstraction over a concrete bug fix, if one cannot find the abstraction, there is no chance to find the concrete bug fix.

Our analysis for identifying this limit is agnostic of the repair actions. Hence one can use our methodology and equation to analyze the size of the "findable" fixes. Our probabilistic framework enables one to understand the theoretical limits of certain repair processes.
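One possible way to operationalize this guideline is to compute, for a given distribution and attempt budget, the largest shape size that remains findable. The sketch below (our own illustration, not a procedure from the experiments) uses the best case of a shape made of the single most likely repair action repeated n times:

```java
public class FindableFixSize {
    static long medianAttempts(double p) {
        return (long) Math.ceil(Math.log(0.5) / Math.log1p(-p));
    }

    // Largest shape size n whose best-case shape (the most likely repair
    // action repeated n times, so p = pMax^n) stays within the budget.
    static int maxFindableSize(double pMax, long attemptBudget) {
        int n = 0;
        double p = 1.0;
        while (true) {
            p *= pMax; // probability of drawing the shape in one attempt
            if (medianAttempts(p) > attemptBudget) return n;
            n++;
        }
    }

    public static void main(String[] args) {
        // Illustrative: pMax = 0.29 (Stmt_Insert in CT), budget of 10^4 runs.
        System.out.println(maxFindableSize(0.29, 10_000)); // prints 7
    }
}
```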
Let us now apply those three guidelines on a small case study.

D. Case Study: Reasoning on GenProg within our Probabilistic Framework

We now aim at showing that our model also enables us to reason on Weimer et al.'s [2] example program. This program, shown in Listing 1, implements Euclid's greatest common divisor algorithm, but runs in an infinite loop if a = 0 and b > 0. The fix consists of adding a "return" statement on line 6.

1  // insert 1
2  if (a == 0) {            // ast 1
3    // insert 2
4    System.out.println(b); // ast 2
5    // insert 3
6  }
7  // insert 4
8  while (b != 0) {         // infinite loop // ast 3
9    // insert 5
10   if (a > b) {           // ast 4
11     // insert 6
12     a = a - b;           // ast 5
13     // insert 7
14   } else {
15     // insert 8
16     b = b - a;           // ast 6
17     // insert 9
18   }
19   // insert 10
20 }
21 // insert 11
22 System.out.println(a);   // ast 7
23 // insert 12
24 return;                  // ast 8
25 // insert 13
26 }

Listing 1. The infinite loop bug of Weimer et al. [2]. Code insertions can be made at 13 places; 8 AST subtrees can be deleted or copied.

a) Probability Distribution: In Weimer et al.'s repair approach, the repair model consists of three repair actions: inserting statements, deleting statements, and swapping statements⁸. By statements, they mean AST subtrees. With a uniform probability distribution, the logical time to find the correct shape is 4 (from Equation 1). If one favors insertion over deletion and swap, for instance by setting p_insert = 0.6, the median logical time to find the correct repair action becomes 2, which is twice as fast. Between 2 and 4, the difference seems negligible, but for larger repair models, the difference might be counted in days, as we show now.

⁸ In more recent versions of GenProg, swapping has been replaced by "replacing".

b) Shaping and Synthesis: In the GCD program, there are n_place = 13 places where n_ast = 8 AST statements can be inserted. In this case, the size of the synthesis space can be formally approximated: the number of possible insertions is n_place × n_ast; the number of possible deletions is n_ast; the number of possible swaps is (n_ast)².

This enables us to apply our probabilistic reasoning at the level of concrete fixes as follows. We define the concrete repair distribution as: p_insert(ast_i, place_k) = p_insert / (n_place × n_ast); p_delete(ast_j) = p_delete / n_ast; p_swap(ast_i, ast_j) = p_swap / (n_ast)².

With a uniform distribution p_insert = p_delete = p_swap = 1/3, formula 1 yields that the logical time to fix this particular bug (insertion of node #8 at place #3) is 219 attempts (note that this is no longer a shaping time, but the real number of required runs). However, we observed over real bug fixes that p_insert > p_delete (see Table III). What if we distort the uniform distribution over the repair model to favor insertion? The following table gives the results for arbitrary distributions spanning different kinds of distribution:

p_insert   p_delete   p_swap   Logical time
.33        .33        .33      219
.39        .28        .33      185
.45        .22        .33      160
.40        .40        .20      180
.50        .30        .20      144
.60        .20        .20      120
This table shows that as soon as we favor insertion over deletion of code, the logical time to find the repair does actually decrease.
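These logical times can be reproduced by applying the geometric-median computation of Equation 1 to the concrete insertion probability defined above; a short verification sketch:

```java
public class GcdCaseStudy {
    // Median number of runs: smallest k with 1 - (1-p)^k >= 0.5.
    static long medianAttempts(double p) {
        return (long) Math.ceil(Math.log(0.5) / Math.log1p(-p));
    }

    public static void main(String[] args) {
        int nPlace = 13, nAst = 8; // 13 insertion places, 8 AST subtrees
        double[] pInsert = {.33, .39, .45, .40, .50, .60};
        for (double pi : pInsert) {
            // Probability of drawing the one concrete fix (inserting
            // node #8 at place #3) in a single attempt.
            double p = pi / (nPlace * nAst);
            System.out.println(pi + " -> " + medianAttempts(p));
        }
        // Prints 219, 185, 160, 180, 144, 120: the values of the table above.
    }
}
```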
Interestingly, the same kind of reasoning applies to fault localization. Let us assume that a fault localizer filters out half of the possible places where to modify code (i.e. n_place = 7). Under the uniform distribution and the concrete repair space, the logical time to find the fix decreases from 219 to 118 runs.

c) Repairability and Fix Size: We consider the same model but on larger programs with fault localization, for instance 100 AST nodes and 20 potential places for changes. Let us assume that the concrete fix consists of inserting node #33 at place #13. Under a uniform distribution, the corresponding repair time according to formula 1 is ≥ 20,000 runs. Let us now assume that the concrete fix consists of two repair actions: inserting node #33 at place #13 and deleting node #12. Under a uniform distribution, the repair time becomes 636,000 runs, a 30-fold increase.

Obviously, for the sake of static typing and runtime semantics, the nodes cannot be inserted anywhere, resulting in a lower number of runs. However, we think that more than the logical time, what matters is the order of magnitude of the difference between the two scenarios. Our results indicate that it is very hard to find concrete fixes that combine different repair actions.

Let us now be rather speculative. Those simulation results contribute to the debate on whether past results on evolutionary repair are either evolutionary or guided random search [31]. According to our simulation results, it seems that the evolutionary part (combining different repair actions) is indeed extremely challenging. On the other hand, our simulation does not involve fitness functions; it is only guided random search, what we would call "Monte Carlo" repair. A good fitness function might counter-balance the combinatorial explosion of repair actions.
VII. RELATED WORK

d) Empirical Studies of Versioning Transactions: Purushothaman and Perry [14] studied small commits (in terms of number of lines of code) of proprietary software at Lucent Technology. They showed the impact of small commits with respect to introducing new bugs, and whether they are oriented toward corrective, perfective or adaptive maintenance. German [11] asked different research questions on what he calls "modification requests" (small improvements or bug fixes), in particular with respect to authorship and change coupling (files that are often changed together). Alali and colleagues [13] discussed the relations between different size metrics for commits (# of files, LOC and # of hunks), along the same line as Hattori and Lanza [12], who also consider the relationship between commit keywords and engineering activities. Finally, Hindle et al. [10], [32] focus on large commits, to determine whether they reflect specific engineering activities such as license modifications. Compared to these studies on commits, which mostly focus on metadata (e.g. authorship, commit text) or size metrics (number of changed files, number of hunks, etc.), we discuss the content of commits and the kind of source code changes they contain. Fluri et al. [33] and Vaucher et al. [34] studied the versioning history to find patterns of change, i.e. groups of similar versioning transactions.

Pan et al. [24] manually identified 27 bug fix patterns on Java software. Those patterns are precise enough to be automatically extractable from software repositories. They provide and discuss the frequencies of the occurrence of those patterns in 7 open source projects. This work is closely related to ours: we both identify automatically extractable repair actions of software. The main difference is that our repair actions are discovered fully automatically, based on AST differencing (there is no prior manual analysis to find them). Furthermore, since our repair actions are meant to be used in an automated program repair setup, they are smaller and more atomic.

Kim et al. [35] use versioning history to mine project-specific bug fix patterns. Williams and Hollingsworth [36] also learn some repair knowledge from versioning history: they mine how to statically recognize where checks on return values should be inserted. Livshits and Zimmermann [15] mine co-changed method calls. The difference with those close pieces of research is that we enlarge the scope of mined knowledge: from project-specific knowledge [35] to domain-independent repair actions, and from one single repair action [36], [15] to 41 and 173 repair actions.

e) Abstract Syntax Tree Differencing: The evaluation of AST differencing tools often gives hints about common change actions of software. For instance, Raghavan et al. [19] showed the six most common types of changes for the Apache web server and the GCC compiler, the number one being "Altering existing function bodies". This example clearly shows the difference with our work: we provide change and repair actions at a much finer granularity. Similarly, Neamtiu et al. [20] give interesting numerical findings about software evolution, such as the evolution of added functions and global variables of C code; this also remains at a granularity that is coarser compared to our analysis. Fluri et al. [7] give some frequency numbers of their change types in order to validate the accuracy and the runtime performance of their distilling algorithm. Those numbers were not — and were not meant to be — representative of the overall abundance of change types. Giger et al. [37] discuss the relations between 7 categories of change types, and not the detailed change actions as we do.

f) Automated Software Repair: We have already mentioned many pieces of work on automated software repair (incl. [1], [2], [3], [4], [5], [38]). We have discussed in detail the relationship of our work with GenProg. Let us now compare with the other close papers.

Wei et al. [3] presented AutoFix-E, an automated repair tool which works with contracts. In our perspective, AutoFix-E is based on two repair actions: adding sequences of state-changing statements (called "mutators") and adding a precondition (in the form of an "if" conditional). Their fix schemas are combinations of those two elementary repair actions. In contrast, we have 173 basic repair actions and we are able to predict repair shapes that consist of combinations of 4 repair actions. However, our approach is more theoretical than theirs. Our probabilistic view on repair may speed up their repair approach: it is likely that not all "fix schemas" are equivalent. For instance, according to our experience, adding a precondition is a very common kind of fix for real bugs.

Debroy et al. [39] invented an approach to repair bugs using mutations inspired from the field of mutation testing. The approach uses a fault localization technique to obtain candidate faulty locations. For a given location, it applies mutations, producing mutants of the program. Eventually, a mutant is classified as "fixed" if it passes the test suite of the program. Their repair actions are composed of mutations of arithmetic, relational, logical, and assignment operators. Compared to our work, mutating a program is a special kind of fix synthesis where no explicit high-level repair shapes are manipulated. Also, in the light of our results, we assume that a mutation-based repair process would be faster using probabilities on top of the mutation operators.

Kim et al. [40] introduced PAR, an algorithm that generates program patches using a set of 10 manually written fix templates. Like GenProg, the approach leverages evolutionary computing techniques to generate program patches. We share with PAR the idea of extracting repair knowledge from human-written patches. Beyond this high-level point in common, there are three important differences. First, they do a manual extraction of fix patterns (by reading 62,656 patches) while we automatically mine them from past commits. Second, PAR patterns and our repair actions are expressed at a different granularity. PAR patterns contain a specification of the context that matches a piece of AST, a specification of analysis (e.g. to collect compatible expressions in the current scope), and a specification of change. Our repair actions correspond to this last part. While their patterns are operational, their change specifications are ad hoc (due to the process of manually specifying templates). On the contrary, our specification of
repair actions is systematic and automatically extracted, but our approach is more theoretical and we do not fix concrete bugs. This shows again that the foundations of their approach contain more manual work than ours: a PAR pattern is a manually identified repair schema where all the synthesis rules are manually encoded. Finally, we think it is possible to marry our approaches by decorating their templates with probability distributions (whether mined or not) so as to speed up the repair.

VIII. CONCLUSION

In this paper, we have presented the idea that one can mine repair actions from software repositories. In other words, one can learn from past bug fixes the main repair actions (e.g. adding a method call). Those repair actions are meant to be generic enough to be independent of the kinds of bugs and the software domains. We have discussed and applied a methodology to mine the repair actions of 62,179 versioning transactions extracted from the repositories of 14 open-source projects. We have discussed at length the rationales and consequences of adding a probability distribution on top of a repair model. We have shown that certain distributions over repair actions can result in an infinite time (on average) to find a repair shape, while other fine-tuned distributions enable us to find a repair shape in hundreds of repair attempts.

The main direction of future work consists of going beyond empirical results and theoretical analysis. We are now exploring how to use this learned knowledge (in the form of probabilistic repair models) to fix real bugs. In particular, we are planning to work on using probabilistic models to see whether one can repair the bugs of PAR's and GenProg's datasets faster. The latter involves having a Java implementation of GenProg and would advance our knowledge on whether GenProg's efficiency is really language-independent (segfaults and buffer overruns do not exist in Java).

REFERENCES

[1] W. Weimer, "Patches as better bug reports," in Proceedings of the International Conference on Generative Programming and Component Engineering, 2006.
[2] W. Weimer, T. Nguyen, C. L. Goues, and S. Forrest, "Automatically finding patches using genetic programming," in Proceedings of the International Conference on Software Engineering, 2009.
[3] Y. Wei, Y. Pei, C. A. Furia, L. S. Silva, S. Buchholz, B. Meyer, and A. Zeller, "Automated fixing of programs with contracts," in Proceedings of the International Symposium on Software Testing and Analysis, ACM, 2010.
[4] V. Dallmeier, A. Zeller, and B. Meyer, "Generating fixes from object behavior anomalies," in Proceedings of the International Conference on Automated Software Engineering, 2009.
[5] A. Arcuri, "Evolutionary repair of faulty software," Applied Soft Computing, vol. 11, no. 4, pp. 3494–3514, 2011.
[6] D. Gopinath, M. Z. Malik, and S. Khurshid, "Specification-based program repair using SAT," in Proceedings of the International Conference on Tools and Algorithms for the Construction and Analysis of Systems, 2011.
[7] B. Fluri, M. Wursch, M. Pinzger, and H. Gall, "Change distilling: Tree differencing for fine-grained source code change extraction," IEEE Transactions on Software Engineering, vol. 33, pp. 725–743, Nov. 2007.
[8] M. Bóna, A Walk Through Combinatorics: An Introduction to Enumeration and Graph Theory. World Scientific, 2011.
[9] M. Martinez and M. Monperrus, "Mining repair actions for guiding automated program fixing," Tech. Rep., INRIA, 2012.
[10] A. Hindle, D. M. German, and R. Holt, "What do large commits tell us?: a taxonomical study of large commits," in Proceedings of the International Working Conference on Mining Software Repositories, 2008.
[11] D. M. German, "An empirical study of fine-grained software modifications," Empirical Software Engineering, vol. 11, no. 3, pp. 369–393, 2006.
[12] L. Hattori and M. Lanza, "On the nature of commits," in Proceedings of the 4th International ERCIM Workshop on Software Evolution and Evolvability (EVOL), pp. 63–71, 2008.
[13] A. Alali, H. Kagdi, and J. Maletic, "What's a typical commit? a characterization of open source software repositories," in Proceedings of the IEEE International Conference on Program Comprehension, 2008.
[14] R. Purushothaman and D. Perry, "Toward understanding the rhetoric of small source code changes," IEEE Transactions on Software Engineering, vol. 31, pp. 511–526, June 2005.
[15] B. Livshits and T. Zimmermann, "DynaMine: finding common error patterns by mining software revision histories," in Proceedings of the European Software Engineering Conference held jointly with the International Symposium on Foundations of Software Engineering, 2005.
[16] R. Robbes, Of Change and Software. PhD thesis, University of Lugano, 2008.
[17] E. Giger, M. Pinzger, H. Gall, T. Xie, and T. Zimmermann, "Comparing fine-grained source code changes and code churn for bug prediction," in Working Conference on Mining Software Repositories, 2011.
[18] M. Monperrus and M. Martinez, "CVS-Vintage: A dataset of 14 CVS repositories of Java software," Tech. Rep. hal-00769121, INRIA, 2012.
[19] S. Raghavan, R. Rohana, D. Leon, A. Podgurski, and V. Augustine, "Dex: a semantic-graph differencing tool for studying changes in large code bases," in Proceedings of the 20th IEEE International Conference on Software Maintenance, 2004.
[20] I. Neamtiu, J. S. Foster, and M. Hicks, "Understanding source code evolution using abstract syntax tree matching," in Proceedings of the International Workshop on Mining Software Repositories, 2005.
[21] M. Martinez and M. Monperrus, "Appendix of 'On Mining Software Repair Models and their Relations to the Search Space of Automated Program Fixing'," Tech. Rep. hal-00903804, INRIA, 2013.
[22] Department of Mathematics of the University of York, "Statistical tables." http://www.york.ac.uk/depts/maths/tables/, last visited: April 9, 2013.
[23] A. Murgia, G. Concas, M. Marchesi, and R. Tonelli, "A machine learning approach for text categorization of fixing-issue commits on CVS," in Proceedings of the International Symposium on Empirical Software Engineering and Measurement, 2010.
[24] K. Pan, S. Kim, and E. J. Whitehead, "Toward an understanding of bug fix patterns," Empirical Software Engineering, vol. 14, no. 3, pp. 286–315, 2008.
[25] R. Wu, H. Zhang, S. Kim, and S.-C. Cheung, "ReLink: recovering links between bugs and changes," in Proceedings of the 2011 Foundations of Software Engineering Conference, pp. 15–25, 2011.
[26] C. Bird, A. Bachmann, E. Aune, J. Duffy, A. Bernstein, V. Filkov, and P. Devanbu, "Fair and balanced?: bias in bug-fix datasets," in Proceedings of the 7th joint meeting of the European Software Engineering Conference and the ACM SIGSOFT Symposium on the Foundations of Software Engineering, ESEC/FSE '09, pp. 121–130, ACM, 2009.
[27] J. Cohen, "A coefficient of agreement for nominal scales," Educational and Psychological Measurement, vol. 20, no. 1, pp. 37–46, 1960.
[28] J. R. Landis and G. G. Koch, "The measurement of observer agreement for categorical data," Biometrics, vol. 33, no. 1, pp. 159–174, 1977.
[29] J. L. Fleiss, "Measuring nominal scale agreement among many raters," Psychological Bulletin, vol. 76, no. 5, pp. 378–382, 1971.
[30] C. L. Goues, W. Weimer, and S. Forrest, "Representations and operators for improving evolutionary software repair," in Proceedings of GECCO, pp. 959–966, 2012.
[31] A. Arcuri and L. Briand, "A practical guide for using statistical tests to assess randomized algorithms in software engineering," in Proceedings of the 33rd International Conference on Software Engineering, pp. 1–10, ACM, 2011.
[32] A. Hindle, D. German, M. Godfrey, and R. Holt, "Automatic classification of large changes into maintenance categories," in Proceedings of the International Conference on Program Comprehension, 2009.
[33] B. Fluri, E. Giger, and H. C. Gall, "Discovering patterns of change types," in Proceedings of the International Conference on Automated Software Engineering, 2008.
[34] S. Vaucher, H. Sahraoui, and J. Vaucher, "Discovering new change patterns in object-oriented systems," in Proceedings of the Working Conference on Reverse Engineering, 2008.
[35] S. Kim, K. Pan, and E. J. Whitehead, "Memories of bug fixes," in Proceedings of the 14th ACM SIGSOFT International Symposium on Foundations of Software Engineering, 2006.
[36] C. C. Williams and J. K. Hollingsworth, "Automatic mining of source code repositories to improve bug finding techniques," IEEE Transactions on Software Engineering, vol. 31, no. 6, pp. 466–480, 2005.
[37] E. Giger, M. Pinzger, and H. C. Gall, "Can we predict types of code changes? an empirical analysis," in Proceedings of the Working Conference on Mining Software Repositories, pp. 217–226, 2012.
[38] A. Carzaniga, A. Gorla, N. Perino, and M. Pezzè, "Automatic workarounds for web applications," in Proceedings of the 2010 Foundations of Software Engineering Conference, pp. 237–246, ACM, 2010.
[39] V. Debroy and W. Wong, "Using mutation to automatically suggest fixes for faulty programs," in Proceedings of the International Conference on Software Testing, Verification and Validation (ICST), pp. 65–74, IEEE, 2010.
[40] D. Kim, J. Nam, J. Song, and S. Kim, "Automatic patch generation learned from human-written patches," in Proceedings of the 2013 International Conference on Software Engineering, pp. 802–811, IEEE Press, 2013.