RLocator
Abstract—Software developers spend a significant portion of time fixing bugs in their projects. To streamline this process, bug localization approaches have been proposed to identify the source code files that are likely responsible for a particular bug. Prior work proposed several similarity-based machine-learning techniques for bug localization. Despite significant advances in these techniques, they do not directly optimize the evaluation measures. We argue that directly optimizing evaluation measures can positively contribute to the performance of bug localization approaches. Therefore, in this paper, we utilize Reinforcement Learning (RL) techniques to directly optimize the ranking metrics. We propose RLocator, a Reinforcement Learning-based bug localization approach. We formulate RLocator using a Markov Decision Process (MDP) to optimize the evaluation measures directly. We present the technique and experimentally evaluate it based on a benchmark dataset of 8,316 bug reports from six highly popular Apache projects. The results of our evaluation reveal that RLocator achieves a Mean Reciprocal Rank (MRR) of 0.62, a Mean Average Precision (MAP) of 0.59, and a Top 1 score of 0.46. We compare RLocator with three state-of-the-art bug localization tools, FLIM, BugLocator, and BL-GAN. Our evaluation reveals that RLocator outperforms these approaches by a substantial margin, with improvements of 38.3% in MAP, 36.73% in MRR, and 23.68% in the Top K metric. These findings highlight that directly optimizing evaluation measures considerably contributes to performance improvement of the bug localization problem.
Index Terms—Reinforcement learning, bug localization, deep learning.

I. INTRODUCTION
(in terms of time and resources), especially when there are a lot of files and bug reports. Moreover, the number of bugs reported is often higher than the number of available developers [2]. Consequently, fix time and maintenance costs rise while the customer satisfaction rate decreases [3].
Bug localization refers to identifying the source code files where a particular bug originated. Given a bug report, bug localization approaches utilize the textual information in the bug report and the project source code files to shortlist the potentially buggy files. Prior work has proposed various Information Retrieval-based Bug Localization (IRBL) approaches to help developers speed up the debugging process (e.g., Deeplocator [4], CAST [5], KGBugLocator [6], BL-GAN [7]).
One common theme among these approaches is that they follow a similarity-based approach to localize bugs. Such techniques measure the similarity between bug reports and the source code files. For estimating similarity, they use various methods such as cosine distance [8], Deep Neural Networks (DNN) [9], and Convolutional Neural Networks (CNN) [5]. Then, they rank the source code files based on their similarity score. In the training phase of these approaches, the model learns to optimize the similarity metrics. In contrast, in the testing phase, the model is tested with ranking metrics (e.g., Mean Reciprocal Rank (MRR) or Mean Average Precision (MAP)).
While most of these approaches showed promising performance, they optimize a metric that indirectly represents the performance metrics. Prior studies [10], [11], [12], [13] found
0098-5589 © 2024 IEEE. Personal use is permitted, but republication/redistribution requires IEEE permission.
See https://www.ieee.org/publications/rights/index.html for more information.
Authorized licensed use limited to: Nanjing University. Downloaded on December 16,2024 at 14:18:38 UTC from IEEE Xplore. Restrictions apply.
2696 IEEE TRANSACTIONS ON SOFTWARE ENGINEERING, VOL. 50, NO. 10, OCTOBER 2024
decision. This allows RL to use evaluation measures such as MRR and MAP in the training phase and directly optimize the evaluation metrics. Moreover, because MRR/MAP is used as a signal instead of a label, the problem of overfitting will be less prevalent. The Markov Decision Process (MDP) is a foundational element of RL. An MDP is a mathematical framework that allows the formalization of discrete-time decision-making problems [15]. Real-world problems often need to be formalized as an MDP to apply RL.
In this paper, we present RLocator, an RL technique for localizing software bugs in source code files. We formulate RLocator as an MDP. In each step of the MDP, we use MRR and MAP as signals to guide the model toward the optimal choice. We evaluate RLocator on a benchmark dataset of six Apache projects and find that, compared with existing state-of-the-art bug localization techniques, RLocator achieves substantial performance improvement. While pinpointing the exact reasons for RL's superior performance over other supervised techniques can be challenging, RL learns more generalizable approaches, especially in dynamic and complex environments. In comparison to supervised learning, it learns approaches that are more adaptable to a variety of situations [16], [17], which is a form of generalization. Additionally, RL demonstrates proficiency in scenarios where the optimal solution is not clearly defined, showcasing its versatility across various tasks and domains [14]. These factors can contribute to the superior performance of RLocator.
The main contributions of our work are as follows:
• We present RLocator, an RL-based software bug localization approach. The key technical novelty of RLocator is using RL for bug localization, which includes formulating the bug localization process into an MDP.
• We provide an experimental evaluation of RLocator with 8,316 bug reports from six Apache projects. When RLocator can localize, it achieves an MRR of 0.49 - 0.62, MAP of 0.47 - 0.59, and Top 1 of 0.38 - 0.46 across all studied projects. Additionally, we compare RLocator's performance with state-of-the-art bug localization methods. RLocator outperforms FLIM [18] by 38.3% in MAP, 36.73% in MRR, and 23.68% in Top K. Furthermore, RLocator exceeds BugLocator [3] by 56.86% in MAP, 41.51% in MRR, and 26.32% in Top K. In terms of Top K, RLocator shows improved performance over BL-GAN [7], with gains ranging from 3.33% to 55.26%. The performance gains for MAP and MRR are 40.74% and 32.2%, respectively.

II. MOTIVATION
Reinforcement Learning (RL) stands out for its ability to learn from feedback, a characteristic that empowers models to self-correct based on the outcomes of their actions. This feature finds widespread application, exemplified by platforms like Spotify, an audio streaming service using RL to learn user preferences [19]. The model evolves and adapts by presenting music selections and refining recommendations through user interactions. The versatility of RL extends beyond entertainment; various companies [20] and domains [21] leverage its capacity for iterative learning and adjustment.
The proficiency required for bug localization is often acquired through experience, with seasoned developers exhibiting a faster bug-finding aptitude than their less experienced counterparts [22]. Recognizing the significance of experience in bug localization, we propose the integration of reinforcement learning into this domain. By employing RL, a model can present developers with sets of source code files as possible causes for a bug and learn from their feedback to enhance its skill in localizing bugs in the software. In contrast to conventional machine learning approaches, which rely solely on labeled data and lack easy adaptability, reinforcement learning presents two distinct advantages: firstly, the ability to learn from developer feedback, and secondly, the elimination of the requirement for labeled data in real-world scenarios. Therefore, our research aims to incorporate reinforcement learning into bug localization, leveraging its capacity to adapt and enhance performance through iterative feedback.

III. BACKGROUND
In this section, we describe terms related to the bug localization problem, which we use throughout our study. Also, we present an overview of reinforcement learning.

A. Bug Localization System
A typical bug localization system utilizes several sources of data, e.g., bug reports, stack traces, and logs, to identify the responsible source code files. One particular challenge of the system is that the bug report contains natural language, whereas source code files are written in a programming language.
Typically, bug localization systems identify whether a bug report relates to a source code file. To do so, the system extracts features from both the bug report and the source code files. Previous studies used techniques such as N-gram [23], [24] and Word2Vec [25], [26] to extract features (embeddings) from bug reports and source code files. Other studies (e.g., Devlin et al. [27]) introduced the transformer-based model BERT, which has achieved higher performance than all the previous techniques. One of the reasons transformer-based models perform better in extracting textual features is that the transformer uses multi-head attention, which can utilize long context while generating embeddings. Previous studies have proposed a multi-modal BERT model [28] for programming languages, which can extract features from both bug reports and source code files.
A bug report mainly contains information related to unexpected behavior and how to reproduce it. It mainly includes a bug ID, title, textual description of the bug, and version of the codebase where the bug exists. The bug report may have an example of code, stack trace, or logs. A bug localization system retrieves all the source code files from a source code repository at that particular version. For example, assume we have 100 source code files in a repository in a specific version. After retrieving 100 files from that version, the system will estimate the relevance between the bug report and each of the 100 files. The relevance can be measured in several ways.
CHAKRABORTY et al.: RLOCATOR: REINFORCEMENT LEARNING FOR BUG LOCALIZATION 2697
For example, a naive system can check how many words of the bug report exist in each source code file. A sophisticated system can compare embeddings using cosine distance [29]. After relevance estimation, the system ranks the files based on their relevance score. The ranked list of files is the final output of a bug localization system that developers will use.

with entropy adds the entropy of the probability of the possible action with the loss of the actor model. As a result, in the gradient descent step, the model tries to maximize the entropy of the learned policy. Maximization of entropy ensures that the agent assigns almost an equal probability to an action with a similar return.
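The two relevance estimators mentioned in Section III-A (word overlap and cosine similarity over embeddings) can be illustrated with a minimal sketch. The function names and the sample bug report and files are hypothetical; real systems would compare learned embeddings rather than raw tokens.

```python
import math
import re


def tokenize(text):
    """Lowercase and split on non-alphanumeric characters."""
    return [tok for tok in re.split(r"[^a-z0-9]+", text.lower()) if tok]


def word_overlap(report, source):
    """Naive relevance: number of distinct report words found in the file."""
    return len(set(tokenize(report)) & set(tokenize(source)))


def cosine_similarity(u, v):
    """Cosine similarity between two equal-length embedding vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def rank_files(report, files, score):
    """Return file names sorted by descending relevance to the report."""
    return sorted(files, key=lambda name: score(report, files[name]), reverse=True)


# Hypothetical inputs, for illustration only.
report = "NullPointerException when parser reads empty config"
files = {
    "Logger.java": "class Logger { void log(String msg) {} }",
    "Parser.java": "class Parser { void reads(Config config) {} }",
}
ranking = rank_files(report, files, word_overlap)
```

Here `ranking` places `Parser.java` first because it shares three words with the report, while `Logger.java` shares none. Swapping `word_overlap` for a scorer built on `cosine_similarity` changes only the `score` argument.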
words/tokens for the textual match and uses BM25 to rank the files matching the query. We use the ES index for identifying the topmost k source code files related to a bug report. Following the study by Liu et al. [32] (who used ES in the context of code search), we build an ES index using the source code files and then query the index using the bug report as the query. Then, we pick the first k files with the highest textual similarities with the bug report. We want to note that the goal of bug localization is to get the relevant files to be ranked as close to the 1st rank as possible. Hence, metrics like MAP and MRR can measure the performance of bug localization techniques. One might ask why we do not rely on ES alone to rank the relevant files; we find that the MAP and MRR of using ES are poor. Our RL-based technique learns from feedback and aims to rerank the output from ES to get higher MAP and MRR scores. In Fig. 1, we illustrate the candidate refinement step, where we query ElasticSearch using the bug report and use the outputs to refine the candidate source code files.
Filtration of bug report and source code files: One limitation of ES is that it sometimes returns irrelevant files among the top k most relevant source code files. When there are no relevant files in the first k files, it hinders RLocator training using developer feedback and introduces noise. Therefore, we use an XGBoost-based binary classifier [33] to identify cases where ES may return no relevant files in the top k files. The rationale for using XGBoost is twofold: (1) to optimize developer time by not presenting irrelevant files and (2) to filter out noise during training.
ES-based filtering is not used because its similarity values are not normalized, and cosine similarity is inapplicable to text data. We provide the XGBoost model with the bug report and the top k files retrieved by ES to determine if any are relevant. If the XGBoost model predicts no relevant files in the set, we exclude those bug reports and their associated files. Each bug report is associated with its unique set of source code files, so filtering one does not impact others.
To build the model, we study the most important features associated with the prediction task. We consult the related literature on the field of information retrieval [35], [36], [38] and bug report classification [34] for feature selection. The list of computed features is presented in Table I. For our dataset, we calculate the selected features and train the model using
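The candidate shortlisting step described above (BM25 scoring followed by keeping the first k files) can be approximated without an ElasticSearch cluster. The sketch below is a standalone Okapi BM25 scorer, not the ElasticSearch API; the function name, the defaults `k1=1.2` and `b=0.75`, and the toy documents are assumptions made for illustration.

```python
import math
from collections import Counter


def bm25_top_k(query_tokens, docs, k, k1=1.2, b=0.75):
    """Score tokenized documents against a query with BM25; return the top-k names."""
    n_docs = len(docs)
    avg_len = sum(len(toks) for toks in docs.values()) / n_docs
    # Document frequency: number of documents containing each term.
    doc_freq = Counter()
    for toks in docs.values():
        doc_freq.update(set(toks))

    scores = {}
    for name, toks in docs.items():
        term_freq = Counter(toks)
        score = 0.0
        for term in set(query_tokens):
            tf = term_freq.get(term, 0)
            if tf == 0:
                continue
            idf = math.log(1 + (n_docs - doc_freq[term] + 0.5) / (doc_freq[term] + 0.5))
            # Length normalization dampens the score of long files.
            norm = tf + k1 * (1 - b + b * len(toks) / avg_len)
            score += idf * tf * (k1 + 1) / norm
        scores[name] = score
    return sorted(scores, key=scores.get, reverse=True)[:k]


# Hypothetical tokenized source files and bug report.
docs = {
    "A.java": ["parser", "config", "reads"],
    "B.java": ["logger", "log", "string"],
    "C.java": ["config", "writer", "file"],
}
candidates = bm25_top_k(["parser", "config", "empty"], docs, k=2)
```

With these inputs `candidates` contains `A.java` followed by `C.java`; `B.java` shares no term with the query and is dropped. The shortlist is what a reranker would then reorder.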
corresponding concatenated embedding E1, E2, ..., and Ek, as shown in Fig. 1.
Note that each state of the MDP comprises two lists: a candidate list and a ranked list. The candidate list contains the concatenated list of embeddings. As shown in our example in Fig. 1, the candidate list contains E1, E2, ..., and Ek. In the candidate list, source code embeddings (code file and bug report embeddings concatenated together) are ordered randomly. The other list is the ranked list of source code files based on their relevance to the bug report R. Initially (at State1), the candidate list is full, and the ranked list is empty. In each state transition, the model moves one embedding from the candidate list to the ranked list based on its probability of being responsible for a bug. In the final state, the ranked list will be full, and the candidate list will be empty. We describe the process of selecting and ranking a file in detail in the next step.
Actions: We define Actions in our MDP as selecting a file from the candidate list and moving it to the ranked list. Suppose that at timestep t the RL model picks the embedding E1; then the rank of that particular file will be t. In Fig. 1, at timestep 1, the model picks the concatenated embedding of file F2. Thus, the rank of F2 will be 1. As we move one file from the candidate list to the ranked list in each timestep, the total number of files will be equal to the number of states and the number of actions. For identifying the potentially best action at any timestep t, we use a deep learning (DL) model (indicated as Ranking Model in Fig. 1), which is composed of a Convolutional Neural Network (CNN) followed by a Long Short-Term Memory (LSTM) [46]. Following [47], [48], [49], we use the CNN to establish the connection between source code files and bug reports and extract relevant features. As mentioned earlier, developers acquire the ability to recognize cues and subsequently employ them to establish the association between source code files and bug reports. The CNN facilitates the second stage of bug localization, which involves extracting important features. The input of the CNN is the concatenated embedding of both bug reports and each source code file, and the output of the CNN is the extracted features from the combined embedding of bug reports and source code files. The features are later used to calculate relevance.
On the other hand, the LSTM [50] is intended to make the model aware of a restriction, which we call state awareness. That is, in each timestep, the model is allowed to pick the potentially best embedding that has not been picked yet, i.e., if a file is selected at State_i, it cannot be selected again in a later state (i.e., State_{i+j}; j ≥ 1). The LSTM retains the state and aids the RL agent in choosing a subsequent action that does not conflict with prior actions. Thus, following previous studies [50], [51], we use an LSTM to make the model aware of previous actions. The LSTM takes a set of feature vectors as input and outputs the id of the source code file most suitable for the current state.
Transition: τ(S, A) is a function τ : S × A → S which maps a state s_t into a new state s_{t+1} in response to the selected action a_t. Choosing an action a_t means removing a file from the candidate list and placing it in the ranked list.
Reward: A reward is a value provided to the RL agent as feedback on its action. We refer to a reward received from one action as return. The RL technique signals the agent about the appropriate action in each step through the Reward Function, which can be modeled using the retrieval metrics. Thus, the RL agent can learn to optimize the retrieval metrics through the reward function. We consider two important factors in the ranking evaluation: the position of the relevant files and the distance between relevant files in the ranked list of embeddings. We incorporate both factors in designing the reward function shown below.

\[ R(S, A) = \frac{M \cdot \text{file relevance}}{\log_2(t + 1) \cdot \text{distance}(S)}, \quad \text{if } A \text{ is an action that has not been selected before} \tag{1} \]
\[ R(S, A) = -\log_2(t + 1), \quad \text{otherwise} \tag{2} \]
\[ \text{distance}(S) = \text{Avg.}(\text{distance between currently picked subsequent related files}) \tag{3} \]

In Equations 1 and 2, t is the timestep, S is the State, and A is the Action. Mean reciprocal rank (MRR) measures the average reciprocal rank of all the relevant files. In Equation 1, $\frac{\text{file relevance}}{\log_2(t+1)}$ represents the MRR. The use of a logarithmic function in the equation is motivated by previous studies [52], [53], which found that it leads to a stable loss. When the relevant files are ranked higher, the average precision tends to be higher. To encourage the reinforcement learning system to rank relevant files higher, we introduce a punishment mechanism if there is a greater distance between two relevant files. By imposing this punishment on the agent, we incentivize it to prioritize relevant files in higher ranks, which in turn contributes to the Mean Average Precision (MAP).
We illustrate the reward functions with an example below. Presume that the process reaches State S6, the currently picked concatenated embeddings are E1, E2, E3, E4, E5, E6, and their relevancy to the bug report is 0, 0, 1, 0, 1, 1. This means that the embeddings (or files) ranked in the 3rd, 5th, and 6th positions are relevant to the bug report. The positions of the relevant files are 3, 5, 6, and the distances between them are 1 and 0. Hence, distance(S6) = Avg.(1, 0) = 0.5. If the agent picks a new relevant file, we reward the agent M times the reciprocal rank of the file divided by the distance between the already picked related files. In our example, the relevancy of the last picked file, E6, is 1. Thus, we have the following values for Equation 1: distance(S6) = 0.5; log2(6 + 1) = 2.8074; file relevance = 1. Note that M is a hyper-parameter. We find that three as the value of M results in the highest reward for our RL model. We identify the best value for M by experimenting with different values (1, 3, 6, and 9). Fig. 2 shows the resulting reward-episode graph using different values of M. Hence, given M = 3, the value of the reward function will be $R(S, A) = \frac{3 \cdot 1}{2.8074 \cdot 0.5} = 2.14$. The reward can vary between M and approximately 0. A higher value of the reward function indicates a better action of the model. Finally, in the case of optimal ranking, distance(S) will be zero. We handle this case by assigning a value of 1 for distance(S). Even though we are using MRR and MAP as optimization goals, we do not require labeled data. Instead, the model learns from developers' feedback. It presents
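The state transition and the reward of Equations 1 to 3 can be traced on the worked example with a small simulation. This is a sketch under stated assumptions, not the authors' implementation: the function names are hypothetical, the picking order is supplied by hand instead of by the CNN-LSTM ranking model, and a state with fewer than two relevant files is treated like the zero-distance case (distance set to 1).

```python
import math


def distance(picked_relevance):
    """Eq. 3: average gap between consecutive relevant files in the ranked list."""
    positions = [i + 1 for i, rel in enumerate(picked_relevance) if rel]
    gaps = [b - a - 1 for a, b in zip(positions, positions[1:])]
    avg = sum(gaps) / len(gaps) if gaps else 0.0
    # The text assigns 1 when the distance is zero; undefined cases are
    # handled the same way here (an assumption of this sketch).
    return avg if avg > 0 else 1.0


def reward(t, picked_relevance, is_new_action, m=3):
    """Eq. 1 for a file not selected before, Eq. 2 otherwise."""
    if not is_new_action:
        return -math.log2(t + 1)
    return m * picked_relevance[-1] / (math.log2(t + 1) * distance(picked_relevance))


def run_episode(relevance_by_file, pick_order, m=3):
    """Move one file per timestep from the candidate list to the ranked list."""
    candidates = set(relevance_by_file)
    ranked, picked_relevance, rewards = [], [], []
    for t, file_id in enumerate(pick_order, start=1):
        is_new = file_id in candidates
        if is_new:
            candidates.remove(file_id)
            ranked.append(file_id)
            picked_relevance.append(relevance_by_file[file_id])
        rewards.append(reward(t, picked_relevance, is_new, m))
    return ranked, rewards


# Worked example from the text: relevancy 0, 0, 1, 0, 1, 1 for E1..E6.
example = {"E1": 0, "E2": 0, "E3": 1, "E4": 0, "E5": 1, "E6": 1}
ranked, rewards = run_episode(example, ["E1", "E2", "E3", "E4", "E5", "E6"])
```

The last entry of `rewards` is 3 / (2.8074 × 0.5), which rounds to the 2.14 reported in the text. Picking a relevant file first yields the maximum reward M, and re-picking a file triggers the negative branch of Equation 2.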
TABLE II
DATASET STATISTICS
for training and testing, respectively. Unlike previous studies [62] that used a 60:20:20 split, we repurpose validation data for testing to shorten the training duration.

B. Evaluation Measures
The dataset proposed by Ye et al. [57] provides ground truth associated with each bug report. The ground truth contains the path of the file in the project repository that has been modified to fix a particular bug. To evaluate RLocator performance, we use the ground truth and analyze the experimental results based on three criteria, which are widely adopted in bug localization studies [3], [5], [6], [7], [8].
• Mean Reciprocal Rank (MRR): To identify the average rank of the relevant file in the retrieved files set, we adopt the Mean Reciprocal Rank. MRR is the average reciprocal rank of the source code files for all the bug reports. We present the equation for calculating MRR below, where A is the set of bug reports.

\[ MRR = \frac{1}{|A|} \sum_{A} \frac{1}{\text{least rank of the relevant files}} \tag{4} \]

Suppose we have two bug reports, report1 and report2. For each bug report, the bug localization model will rank six files. For report1 the ground truth of the retrieved files is [0, 0, 1, 0, 1, 0], and for report2 the ground truth of the retrieved files is [1, 0, 0, 0, 0, 1]. In this case, the least rank of relevant files is 3 and 1, respectively, for report1 and report2. Now, $MRR = \frac{1}{2}\left(\frac{1}{3} + \frac{1}{1}\right) = 0.67$.
• Mean Average Precision (MAP): To consider the case where a bug is associated with multiple source code files, we adopt Mean Average Precision. It provides a measure of the quality of the retrieval [3], [63]. MRR considers only the best rank of relevant files; on the contrary, MAP considers the rank of all the relevant files in the retrieved files list. Thus, MAP is more descriptive and unbiased than MRR. Precision indicates how noisy the retrieval is. If we calculate the precision on the first two retrieved files, we will get precision@2. For calculating average precision, we have to compute precision@1, precision@2, ..., precision@k, and then average the precision at different points. After calculating the average precision for each bug report, we have to find the mean of the average precision to calculate the MAP.

\[ MAP = \frac{1}{|A|} \sum_{A} AvgPrecision(Report_i) \tag{5} \]

We show the MAP calculation for the previous example of two bug reports. The average precision for report1 and report2 will be 0.37 and 0.67. So, $MAP = \frac{1}{2}(0.37 + 0.67) = 0.52$.
• Top K: For fair comparison with prior studies [48], [49] and to present a straightforward understanding of performance, we calculate Top K. Top K measures the overall ranking performance of the bug localization model. It indicates the percentage of bug reports for which at least one buggy source file appears among the top K positions in the ranked list generated by the bug localization tool. Following previous studies (e.g., [48], [49]), we consider three values of K: 1, 5, and 10.

VI. RLOCATOR PERFORMANCE
We evaluate RLocator on the hold-out dataset using the metrics described in Section V-B. As there has been no RL-based bug localization tool, we compare RLocator with three state-of-the-art bug localization tools: BugLocator, FLIM, and BL-GAN.
A short description of the approaches is presented below.
• BugLocator [3]: an IR-based tool that utilizes a vector space model to identify the potentially responsible source code files by estimating the similarity between source code file and bug report.
• FLIM [18]: a deep-learning-based model that utilizes a large language model like CodeBERT.
• BL-GAN [7]: uses a generative adversarial strategy to train an attention-based transformer model.
We use the original implementations to assess the performance of BugLocator [3] and FLIM [18]. Additionally, we fine-tune a CodeBERT [28] model as a baseline to demonstrate the benefits of using reinforcement learning. For tools like CAST [5], KGBugLocator [6], and BL-GAN [7], which lack replication packages, we refer to their respective studies. These studies show that KGBugLocator outperforms CAST, and BL-GAN outperforms KGBugLocator. Consequently, we replicate BL-GAN based on its study descriptions.
Regarding FBL-BERT [8], a recent technique, we do not compare it with RLocator. This is because FBL-BERT performs bug localization at the changeset level, and applying it to our file-level dataset would disadvantage FBL-BERT, as it is designed for shorter documents. Therefore, comparing it with RLocator would be unfair.
Furthermore, other studies, such as DeepLoc [62], bjXnet [64], CAST [5], KGBugLocator [6], and Cheng et al. [65], also propose deep learning-based approaches but do not provide replication packages. Although these studies evaluate similar projects, the lack of available code or pre-trained models prevents further comparison. However, to ensure comprehensive information, we include a table in our online appendix [39] displaying their performance alongside RLocator.

A. Retrieval Performance
We use k=31 relevant files in RLocator, allowing us to rerank files for 91% of the bug reports. Table III shows RLocator's performance on 91% and 100% of the data. RLocator is not designed for 100% data as it cannot rerank files if no relevant files are in the top k files. For such cases, we estimate performance assuming zero contribution, providing a lower bound for RLocator's effectiveness. This conservative approach ensures we do not overestimate the technique's effectiveness. Table III shows that RLocator achieves better performance than BugLocator and FLIM in both MRR and MAP across all studied projects when using the 91% data.
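The three evaluation measures of Section V-B can be checked against the two-report example with a short sketch. The function names are illustrative. Average precision is computed here in the standard information-retrieval way, averaging precision at the rank of each relevant file; this is an interpretation of the description in the text, chosen because it reproduces the 0.37 and 0.67 values reported there.

```python
def reciprocal_rank(relevance):
    """1 / rank of the first relevant file, or 0 if none is relevant."""
    for rank, rel in enumerate(relevance, start=1):
        if rel:
            return 1.0 / rank
    return 0.0


def average_precision(relevance):
    """Mean of precision@rank taken at each relevant file's rank."""
    hits, precisions = 0, []
    for rank, rel in enumerate(relevance, start=1):
        if rel:
            hits += 1
            precisions.append(hits / rank)
    return sum(precisions) / len(precisions) if precisions else 0.0


def mean(values):
    return sum(values) / len(values)


def mrr(reports):
    return mean([reciprocal_rank(r) for r in reports])


def mean_average_precision(reports):
    return mean([average_precision(r) for r in reports])


def top_k(reports, k):
    """Fraction of bug reports with at least one relevant file in the top k."""
    return mean([1.0 if any(r[:k]) else 0.0 for r in reports])


# Ground truth of the six ranked files for report1 and report2.
reports = [[0, 0, 1, 0, 1, 0], [1, 0, 0, 0, 0, 1]]
summary = {
    "MRR": round(mrr(reports), 2),
    "MAP": round(mean_average_precision(reports), 2),
    "Top 1": top_k(reports, 1),
    "Top 5": top_k(reports, 5),
}
```

For the example, `summary` holds an MRR of 0.67 and a MAP of 0.52, matching the text. Top 1 is 0.5 because only report2 has a relevant file in first position, and Top 5 is 1.0.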
TABLE III
RLOCATOR PERFORMANCE
Project              Model       Top 1         Top 5         Top 10        MAP           MRR
                                 91%   100%    91%   100%    91%   100%    91%   100%    91%   100%
AspectJ              BugLocator  0.36  0.28    0.50  0.45    0.56  0.51    0.33  0.31    0.49  0.48
                     FLIM        0.51  0.36    0.65  0.60    0.72  0.67    0.41  0.35    0.47  0.45
                     CodeBERT    0.40  0.35    0.59  0.55    0.65  0.61    0.49  0.39    0.51  0.44
                     BL-GAN      0.41  0.38    0.60  0.55    0.71  0.65    0.33  0.31    0.42  0.39
Birt                 RLocator    0.65  0.25    0.46  0.41    0.53  0.48    0.47  0.38    0.49  0.41
                     BugLocator  0.61  0.15    0.27  0.21    0.34  0.29    0.30  0.30    0.39  0.38
                     FLIM        0.49  0.18    0.39  0.34    0.47  0.42    0.29  0.25    0.31  0.28
                     CodeBERT    0.33  0.22    0.39  0.35    0.46  0.43    0.41  0.33    0.42  0.35
                     BL-GAN      0.17  0.16    0.33  0.30    0.46  0.42    0.32  0.29    0.40  0.37
Eclipse Platform UI  RLocator    0.45  0.37    0.69  0.63    0.78  0.73    0.54  0.42    0.59  0.50
                     BugLocator  0.45  0.33    0.54  0.49    0.63  0.58    0.29  0.30    0.38  0.35
                     FLIM        0.48  0.41    0.72  0.67    0.80  0.75    0.51  0.48    0.52  0.53
                     CodeBERT    0.39  0.32    0.60  0.55    0.68  0.62    0.47  0.36    0.52  0.44
                     BL-GAN      0.34  0.31    0.53  0.49    0.66  0.61    0.32  0.30    0.40  0.36
JDT                  RLocator    0.44  0.33    0.67  0.61    0.78  0.75    0.51  0.44    0.53  0.45
                     BugLocator  0.34  0.21    0.51  0.45    0.60  0.55    0.22  0.20    0.31  0.28
                     FLIM        0.40  0.35    0.65  0.60    0.82  0.77    0.42  0.41    0.51  0.49
                     CodeBERT    0.38  0.29    0.59  0.54    0.68  0.66    0.44  0.38    0.46  0.39
                     BL-GAN      0.30  0.27    0.53  0.48    0.64  0.59    0.35  0.32    0.44  0.41
SWT                  RLocator    0.40  0.30    0.57  0.51    0.63  0.58    0.48  0.42    0.51  0.44
                     BugLocator  0.37  0.25    0.50  0.45    0.56  0.51    0.42  0.40    0.46  0.43
                     FLIM        0.51  0.37    0.70  0.65    0.83  0.78    0.43  0.43    0.48  0.50
                     CodeBERT    0.34  0.27    0.50  0.45    0.54  0.51    0.42  0.37    0.45  0.39
                     BL-GAN      0.31  0.29    0.53  0.48    0.60  0.55    0.37  0.34    0.44  0.40
Tomcat               RLocator    0.46  0.39    0.61  0.55    0.73  0.68    0.59  0.47    0.62  0.51
                     BugLocator  0.40  0.29    0.43  0.38    0.55  0.50    0.31  0.27    0.37  0.35
                     FLIM        0.51  0.42    0.70  0.65    0.76  0.71    0.52  0.47    0.59  0.60
                     CodeBERT    0.39  0.34    0.53  0.49    0.62  0.60    0.51  0.41    0.53  0.44
                     BL-GAN      0.38  0.35    0.61  0.55    0.65  0.61    0.43  0.40    0.55  0.50
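The percentage improvements quoted for Table III appear to be expressed relative to RLocator's own score rather than the baseline's. This reading is an inference, not something the text states: it reproduces several reported figures from score pairs in the table (for instance a MAP of 0.47 against 0.29 gives 38.3%). The helper name below is hypothetical.

```python
def relative_improvement(rlocator_score, baseline_score):
    """Gain as a percentage of RLocator's score (assumed convention)."""
    return 100.0 * (rlocator_score - baseline_score) / rlocator_score


# (RLocator, baseline) score pairs read from Table III.
checks = {
    "MAP, 91% data": round(relative_improvement(0.47, 0.29), 2),
    "MRR, 91% data": round(relative_improvement(0.49, 0.31), 2),
    "MAP, 100% data (upper)": round(relative_improvement(0.38, 0.25), 2),
    "MAP, 100% data (lower)": round(relative_improvement(0.44, 0.41), 2),
}
```

The four values come out as 38.3, 36.73, 34.21, and 6.82, in line with the percentages reported for FLIM. Dividing by the baseline score instead would give larger numbers (for example 62.07 for the first pair), so the choice of denominator matters when comparing against other papers.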
On 91% data, RLocator outperforms FLIM by 5.56-38.3% in MAP and 3.77-36.73% in MRR. Regarding Top K, the performance improvement is up to 23.68%, 15.22%, and 11.32% in terms of Top 1, Top 5, and Top 10, respectively. Compared to BugLocator, RLocator achieves performance improvement of 12.5-56.86% and 9.8-41.51% in terms of MAP and MRR, respectively. Regarding Top K, the performance improvement is up to 26.32%, 41.3%, and 35.85% in terms of Top 1, Top 5, and Top 10, respectively. The results indicate that RLocator consistently outperforms BL-GAN in the 91% setting across all metrics. Specifically, in the Top K measurements, RLocator's performance exceeds that of BL-GAN, with improvements ranging from 3.33% to 55.26%. Additionally, RLocator achieves performance gains of 40.74% in MAP and 32.2% in MRR.
Compared to the CodeBERT model trained as a classifier (CodeBERT), RLocator achieves better performance across all the metrics. The CodeBERT model achieves consistently lower performance, with drops of up to 17.65%, 15.63%, 17.95%, 17.14%, and 16.67% for Top 1, Top 5, Top 10, MAP, and MRR, respectively.
When we consider 100% of the data, RLocator has better MAP results than FLIM in three out of the six projects (AspectJ, Birt, and JDT) by 6.82-34.21%, equal to FLIM in one project (Tomcat) and worse than FLIM in 2 projects (Eclipse Platform
to learn from developers’ actions. Xie et al. [83] employed a GAN to create failing test cases, addressing the data imbalance issue within fault localization methods. White et al. [84] utilized reinforcement learning for fault localization in distributed networks. Nonetheless, their approach involves the reinforcement learning agent understanding how to interact with the network, which is different from our approach. In our approach, the agent learns to localize bugs from developers’ activity. Rezapour et al. [85] have provided a thorough exploration of reinforcement learning’s application in fault localization within power systems. However, their discussed approaches significantly differ from ours. In the context of power systems, the reinforcement learning agent can directly observe phenomenal components (e.g., current, voltage) related to the environment. Moreover, the agents are allowed to probe the environment by changing current or voltage. In contrast, bug localization entails more abstract phenomenal components in the environment (e.g., interacting code blocks), and agents are not allowed to change or execute any code. Other studies focused on associating commits with bug reports [8], [86]. For example, FBL-BERT [8] used CodeBERT embedding for estimating the similarity between source code files and changesets of a commit. Based on the similarity, it ranks the suspicious commit. FLIM [18] also used CodeBERT embedding for estimating similarity. However, FLIM works on function-level bug localization.

Our approach, RLocator, uses deep reinforcement learning for bug localization, differing from previous similarity-based methods. By formulating the problem as a Markov Decision Process (MDP), we directly optimize evaluation measures. Tested on a dataset of 8,316 bug reports from six popular Apache projects, RLocator shows significant performance improvement.

VIII. THREATS TO VALIDITY

RLocator has a number of limitations as well. We identify them and discuss how to overcome the limitations below.

Internal Validity. One limitation of our approach is that we are not able to utilize 9% of our dataset due to the limitation of text-based search. One may point out that we exclude the bug reports where we do not perform well. However, the XGBoost model in our approach automatically identifies them, and we would rather not localize the source code files for these bug reports than localize them incorrectly. Hence, developers need to rely on their manual analysis only for the 9%. Moreover, as a measure of full transparency, we estimate the lower bound of RLocator performance for the 100% data and show that the difference is negligible.

External Validity. The primary concern for the external validity of the RLocator evaluation stems from its limitation to a small number of bugs in six varied, real-world open-source projects, potentially impacting its broad applicability. However, those projects are from different domains and used by prior studies [4], [5], [6], [9], [80], [87]. Furthermore, the A2C without entropy model was only evaluated on three projects because of the substantial resources required for training, which took about four days on an Nvidia V100 16GB GPU. The uniform outcomes across these projects indicate that similar results could be expected in the remaining projects. Additionally, due to the absence of a replication package, we replicated BL-GAN based on its description in the original study, which may lead to slight performance deviations. Nevertheless, after experimenting with various hyperparameters, we selected a set that achieves comparable performance to that reported in the original study.

Construct Validity. Finally, our evaluation measures might be one threat to construct validity. The evaluation measures may not completely reflect real-world situations. The threat is mitigated by the fact that the evaluation measures we use are well-known [3], [8], [18], [57] and the best available to measure and compare the performance of information retrieval-based bug localization tools.

IX. CONCLUSION

In this paper, we propose RLocator, a reinforcement learning (RL)-based technique to rank the source code files where the bug may reside, given the bug report. The key contribution of our study is the formulation of the bug localization problem using the Markov Decision Process (MDP), which helps us to optimize the evaluation measures directly. We evaluate RLocator on 8,316 bug reports and find that RLocator performs better than the state-of-the-art techniques when using MAP as an evaluation measure. Using 91% of the bug reports in the dataset, RLocator outperforms prior tools in all the projects in terms of both MAP and MRR. When using 100% of the data, RLocator outperforms all prior approaches in four of six projects using MAP and two of the six projects using MRR. RLocator can be used along with other bug localization approaches to improve performance. Our results show that RL is a promising avenue for future exploration when it comes to advancing state-of-the-art techniques for bug localization. Future research can explore the application of advanced reinforcement learning algorithms in bug localization. Additionally, researchers can investigate how training on larger datasets impacts the performance of tools in low-similarity contexts.

DATA AVAILABILITY STATEMENT

To foster future research in the field, we make a replication package comprising our dataset and code publicly available [39].

REFERENCES

[1] T. D. LaToza and B. A. Myers, “Developers ask reachability questions,” in Proc. 32nd ACM/IEEE Int. Conf. Softw. Eng. (ICSE), New York, NY, USA: ACM, 2010, pp. 185–194.
[2] J. Anvik, L. Hiew, and G. C. Murphy, “Coping with an open bug repository,” in Proc. OOPSLA Workshop Eclipse Technol. eXchange - Eclipse, New York, NY, USA: ACM, 2005, pp. 35–39.
[3] J. Zhou, H. Zhang, and D. Lo, “Where should the bugs be fixed? More accurate information retrieval-based bug localization based on bug reports,” in Proc. 34th Int. Conf. Softw. Eng. (ICSE), Piscataway, NJ, USA: IEEE Press, Jun. 2012, pp. 14–24.
[4] Y. Xiao, J. Keung, K. E. Bennin, and Q. Mi, “Machine translation-based bug localization technique for bridging lexical gap,” Inf. Softw. Technol., vol. 99, pp. 58–61, Jul. 2018.
[5] H. Liang, L. Sun, M. Wang, and Y. Yang, “Deep learning with customized abstract syntax tree for bug localization,” IEEE Access, vol. 7, pp. 116309–116320, 2019.
[6] J. Zhang, R. Xie, W. Ye, Y. Zhang, and S. Zhang, “Exploiting code knowledge graph for bug localization via bi-directional attention,” in Proc. 28th Int. Conf. Program Comprehension, New York, NY, USA: ACM, Jul. 2020, pp. 219–229.
[7] Z. Zhu, H. Tong, Y. Wang, and Y. Li, “BL-GAN: Semi-supervised bug localization via generative adversarial network,” IEEE Trans. Knowl. Data Eng., vol. 35, no. 11, pp. 11112–11125, Nov. 2023.
[8] A. Ciborowska and K. Damevski, “Fast changeset-based bug localization with BERT,” in Proc. 44th Int. Conf. Softw. Eng., New York, NY, USA: ACM, May 2022, pp. 946–957.
[9] A. N. Lam, A. T. Nguyen, H. A. Nguyen, and T. N. Nguyen, “Combining deep learning with information retrieval to localize buggy files for bug reports (N),” in Proc. 30th IEEE/ACM Int. Conf. Automated Softw. Eng. (ASE), Piscataway, NJ, USA: IEEE Press, Nov. 2015, pp. 476–481.
[10] Z. Wei, J. Xu, Y. Lan, J. Guo, and X. Cheng, “Reinforcement learning to rank with Markov decision process,” in Proc. 40th Int. ACM SIGIR Conf. Res. Develop. Inf. Retrieval, New York, NY, USA: ACM, Aug. 2017, pp. 945–948.
[11] O. Alejo, J. M. Fernandez-Luna, J. F. Huete, and R. Perez-Vazquez, “Direct optimization of evaluation measures in learning to rank using particle swarm,” in Proc. Workshops Database Expert Syst. Appl., Piscataway, NJ, USA: IEEE Press, Aug. 2010, pp. 42–46.
[12] J. Xu, L. Xia, Y. Lan, J. Guo, and X. Cheng, “Directly optimize diversity evaluation measures,” ACM Trans. Intell. Syst. Technol., vol. 8, no. 3, pp. 1–26, Jan. 2017.
[13] Y. Yue, T. Finley, F. Radlinski, and T. Joachims, “A support vector method for optimizing average precision,” in Proc. 30th Annu. Int. ACM SIGIR Conf. Res. Develop. Inf. Retrieval, New York, NY, USA: ACM, Jul. 2007, pp. 271–278.
[14] R. S. Sutton and A. G. Barto, Reinforcement Learning: An Introduction. Cambridge, MA, USA: A Bradford Book, 2018.
[15] F. Garcia and E. Rachelson, “Markov decision processes,” in Markov Decision Processes in Artificial Intelligence. Hoboken, NJ, USA: Wiley, Mar. 2013, pp. 1–38.
[16] J. Fan, C. Xiao, Y. Gdi, “Rethinking what makes reinforcement learning different from supervised learning,” Jun. 2021, arXiv:2106.06232.
[17] J. Kober, J. A. Bagnell, and J. Peters, “Reinforcement learning in robotics: A survey,” Int. J. Robot. Res., vol. 32, no. 11, pp. 1238–1274, Aug. 2013.
[18] H. Liang, D. Hang, and X. Li, “Modeling function-level interactions for file-level bug localization,” Empirical Softw. Eng., vol. 27, no. 7, pp. 186–212, Oct. 2022.
[19] L. Maystre, D. Russo, and Y. Zhao, “Optimizing audio recommendations for the long-term: A reinforcement learning perspective,” Feb. 2023, arXiv:2302.03561.
[20] M. Chen, A. Beutel, P. Covington, S. Jain, F. Belletti, and E. Chi, “Top-K off-policy correction for a REINFORCE recommender system,” in Proc. 12th ACM Int. Conf. Web Search Data Mining (WSDM ’19), New York, NY, USA: Association for Computing Machinery, 2019, pp. 456–464.
[21] C. Yu, J. Liu, S. Nemati, and G. Yin, “Reinforcement learning in healthcare: A survey,” ACM Comput. Surv., vol. 55, no. 1, pp. 1–36, Nov. 2021.
[22] E. Winter et al., “How do developers really feel about bug fixing? Directions for automatic program repair,” IEEE Trans. Softw. Eng., vol. 49, no. 4, pp. 1823–1841, Apr. 2023.
[23] S. Wang, T. Liu, and L. Tan, “Automatically learning semantic features for defect prediction,” in Proc. 38th Int. Conf. Softw. Eng., Piscataway, NJ, USA: ACM, May 2016, pp. 297–308.
[24] N. Miryeganeh, S. Hashtroudi, and H. Hemmati, “GloBug: Using global data in fault localization,” J. Syst. Softw., vol. 177, Jul. 2021, Art. no. 110961.
[25] Y. Kim, M. Kim, and E. Lee, “Feature combination to alleviate hubness problem of source code representation for bug localization,” in Proc. 27th Asia-Pacific Softw. Eng. Conf. (APSEC), Piscataway, NJ, USA: IEEE Press, Dec. 2020, pp. 511–512.
[26] L. Chen, Z. Tang, and G. H. Yang, “Balancing reinforcement learning training experiences in interactive information retrieval,” in Proc. 43rd Int. ACM SIGIR Conf. Res. Develop. Inf. Retrieval, New York, NY, USA: ACM, Jul. 2020, pp. 1525–1528.
[27] J. Devlin, M.-W. Chang, K. Lee, and K. Toutanova, “BERT: Pre-training of deep bidirectional transformers for language understanding,” in Proc. Conf. North Amer. Chapter Assoc. Comput. Linguistics, Human Lang. Technol., Minneapolis, Minnesota: Assoc. Comput. Linguistics, Jun. 2019, pp. 4171–4186.
[28] Z. Feng et al., “CodeBERT: A pre-trained model for programming and natural languages,” in Findings Assoc. Comput. Linguistics (EMNLP), Nov. 2020, pp. 1536–1547.
[29] J. Wang and Y. Dong, “Measurement of text similarity: A survey,” Information, vol. 11, no. 9, p. 421, Aug. 2020.
[30] S. Fujimoto, D. Meger, and D. Precup, “Off-policy deep reinforcement learning without exploration,” in Proc. 36th Int. Conf. Mach. Learn., vol. 97, K. Chaudhuri and R. Salakhutdinov, Eds., PMLR, Jun. 2019, pp. 2052–2062.
[31] T. T. Nguyen and V. J. Reddi, “Deep reinforcement learning for cyber security,” IEEE Trans. Neural Netw. Learn. Syst., vol. 34, no. 8, pp. 3779–3795, Aug. 2023.
[32] C. Liu, X. Xia, D. Lo, Z. Liu, A. E. Hassan, and S. Li, “CodeMatcher: Searching code based on sequential semantics of important query words,” ACM Trans. Softw. Eng. Methodol., vol. 31, no. 1, pp. 1–37, Jan. 2022.
[33] T. Chen and C. Guestrin, “XGBoost,” in Proc. 22nd ACM SIGKDD Int. Conf. Knowl. Discovery Data Mining, New York, NY, USA: ACM, Aug. 2016, pp. 9129–9149.
[34] F. Fang, J. Wu, Y. Li, X. Ye, W. Aljedaani, and M. W. Mkaouer, “On the classification of bug reports to improve bug localization,” Soft Comput., vol. 25, no. 11, pp. 7307–7323, Mar. 2021.
[35] Y. Lv and C. Zhai, “When documents are very long, BM25 fails!” in Proc. 34th Int. ACM SIGIR Conf. Res. Develop. Inf. (SIGIR), New York, NY, USA: ACM, 2011, pp. 1103–1104.
[36] D. D. Lewis, “Naive (Bayes) at forty: The independence assumption in information retrieval,” in Proc. Mach. Learn. (ECML-98), Berlin, Heidelberg: Springer Berlin Heidelberg, 1998, pp. 4–15.
[37] A. Schröter, N. Bettenburg, and R. Premraj, “Do stack traces help developers fix bugs?” in Proc. 7th IEEE Work. Conf. Mining Softw. Repositories (MSR), Piscataway, NJ, USA: IEEE Press, May 2010, pp. 118–121.
[38] Y. Lv and C. Zhai, “Lower-bounding term frequency normalization,” in Proc. 20th ACM Int. Conf. Inf. Knowl. Manage. (CIKM), New York, NY, USA: ACM, 2011, pp. 7–16.
[39] “Rlocator: Reinforcement learning for bug localization.” Accessed: May 23, 2024. [Online]. Available: https://zenodo.org/record/7591879
[40] M. Bagherzadeh, N. Kahani, and L. Briand, “Reinforcement learning for test case prioritization,” IEEE Trans. Softw. Eng., vol. 48, no. 8, pp. 2836–2856, Aug. 2022.
[41] Y. Wan et al., “Improving automatic source code summarization via deep reinforcement learning,” in Proc. 33rd ACM/IEEE Int. Conf. Automated Softw. Eng., New York, NY, USA: ACM, Sep. 2018, pp. 397–407.
[42] X. He, L. Liao, H. Zhang, L. Nie, X. Hu, and T.-S. Chua, “Neural collaborative filtering,” in Proc. 26th Int. Conf. World Wide Web. Int. World Wide Web Conf. Steering Committee, Apr. 2017, pp. 173–182.
[43] H. Zhang, Y. Yang, H. Luan, S. Yang, and T.-S. Chua, “Start from scratch,” in Proc. 22nd ACM Int. Conf. Multimedia, New York, NY, USA: ACM, Nov. 2014, pp. 187–196.
[44] R. Zhu, X. Tu, and J. X. Huang, “Deep learning on information retrieval and its applications,” in Deep Learning for Data Analytics, Washington, USA: Elsevier, 2020, pp. 125–153.
[45] O. Khattab and M. Zaharia, “ColBERT,” in Proc. 43rd Int. ACM SIGIR Conf. Res. Develop. Inf. Retrieval, New York, NY, USA: ACM, Jul. 2020, pp. 39–48.
[46] S. Hochreiter and J. Schmidhuber, “Long short-term memory,” Neural Comput., vol. 9, no. 8, pp. 1735–1780, Nov. 1997.
[47] L. Pang, Y. Lan, J. Guo, J. Xu, J. Xu, and X. Cheng, “DeepRank,” in Proc. ACM Conf. Inf. Knowl. Manage., New York, NY, USA: ACM, Nov. 2017, pp. 257–266.
[48] X. Huo, F. Thung, M. Li, D. Lo, and S.-T. Shi, “Deep transfer bug localization,” IEEE Trans. Softw. Eng., vol. 47, no. 7, pp. 1368–1380, Jul. 2021.
[49] X. Huo, M. Li, and Z.-H. Zhou, “Learning unified features from natural and programming languages for locating buggy source code,” in Proc. Int. Joint Conf. Artif. Intell. (IJCAI), 2016, pp. 166–1612.
[50] M. J. Hausknecht and P. Stone, “Deep recurrent q-learning for partially observable MDPs,” 2015, arXiv:1507.06527.
[51] I. Bello, H. Pham, Q. V. Le, M. Norouzi, and S. Bengio, “Neural combinatorial optimization with reinforcement learning,” 2016.
[52] T. Haarnoja, A. Zhou, P. Abbeel, and S. Levine, “Soft actor-critic: Off-policy maximum entropy deep reinforcement learning with a stochastic actor,” in Proc. 35th Int. Conf. Mach. Learn., J. Dy and A. Krause, Eds., PMLR, vol. 8, Jul. 2018, pp. 1861–1870.
[53] C. Wang, C. Xu, X. Yao, and D. Tao, “Evolutionary generative adversarial networks,” IEEE Trans. Evol. Comput., vol. 23, no. 6, pp. 921–934, Dec. 2019.
[54] E. Rabinovich, M. Vetzler, S. Ackerman, and A. Anaby Tavor, “Reliable and interpretable drift detection in streams of short texts,” in Proc. 61st Annu. Meeting Assoc. Comput. Linguistics (Volume 5: Industry Track), Assoc. Comput. Linguistics, 2023.
[55] M. R. Islam and M. F. Zibran, “What changes in where?: An empirical study of bug-fixing change patterns,” ACM SIGAPP Appl. Comput. Rev., vol. 20, no. 4, pp. 18–34, Jan. 2021.
[56] W. Aljedaani and Y. Javed, Bug Reports Evolution in Open Source Systems. Springer Int. Publishing, 2018, pp. 63–73.
[57] X. Ye, R. Bunescu, and C. Liu, “Learning to rank relevant files for bug reports using domain knowledge,” in Proc. 22nd ACM SIGSOFT Int. Symp. Found. Softw. Eng. (FSE), New York, NY, USA: ACM, 2014.
[58] J. Lee, D. Kim, T. F. Bissyandé, W. Jung, and Y. L. Traon, “Bench4bl: Reproducibility study on the performance of IR-based bug localization,” in Proc. 27th ACM SIGSOFT Int. Symp. Softw. Testing Anal., New York, NY, USA: ACM, Jul. 2018, pp. 61–72.
[59] B. Dit, M. Revelle, M. Gethers, and D. Poshyvanyk, “Feature location in source code: A taxonomy and survey,” J. Softw., Evol. Process, vol. 25, no. 1, pp. 53–95, Nov. 2011.
[60] L. Moreno et al., “Query-based configuration of text retrieval solutions for software engineering tasks,” in Proc. 2015 10th Joint Meeting Found. Softw. Eng., New York, NY, USA: ACM, Aug. 2015.
[61] B. Sisman and A. C. Kak, “Assisting code search with automatic query reformulation for bug localization,” in Proc. 10th Work. Conf. Mining Softw. Repositories (MSR), Piscataway, NJ, USA: IEEE Press, May 2013, pp. 309–318.
[62] Y. Xiao, J. Keung, K. E. Bennin, and Q. Mi, “Improving bug localization with word embedding and enhanced convolutional neural networks,” Inf. Softw. Technol., vol. 105, pp. 17–29, Jan. 2019.
[63] M. N. Schwarz and A. Flammer, “Text structure and title - Effects on comprehension and recall,” J. Verbal Learn. Verbal Behav., vol. 20, no. 1, pp. 61–66, Feb. 1981.
[64] J. Han, C. Huang, S. Sun, Z. Liu, and J. Liu, “bjXnet: An improved bug localization model based on code property graph and attention mechanism,” Automated Softw. Eng., vol. 30, no. 1, Mar. 2023.
[65] S. Cheng, X. Yan, and A. A. Khan, “A similarity integration method based information retrieval and word embedding in bug localization,” in Proc. 20th IEEE Int. Conf. Softw. Qual., Rel. Secur. (QRS), Piscataway, NJ, USA: IEEE Press, Dec. 2020, pp. 180–187.
[66] M. Soltani, F. Hermans, and T. Bäck, “The significance of bug report elements,” Empirical Softw. Eng., vol. 25, no. 6, pp. 5255–5294, Sep. 2020.
[67] Z. Ahmed, N. Le Roux, M. Norouzi, and D. Schuurmans, “Understanding the impact of entropy on policy optimization,” in Proc. 36th Int. Conf. Mach. Learn., K. Chaudhuri and R. Salakhutdinov, Eds., vol. 97, PMLR, Jun. 2019, pp. 151–160.
[68] S. Jang and H.-I. Kim, “Entropy-aware model initialization for effective exploration in deep reinforcement learning,” Sensors, vol. 22, no. 15, Aug. 2022, Art. no. 5845.
[69] V. Mnih et al., “Asynchronous methods for deep reinforcement learning,” in Proc. 33rd Int. Conf. Mach. Learn., M. F. Balcan and K. Q. Weinberger, Eds., vol. 48, New York, NY, USA: PMLR, Jun. 2016, pp. 1928–1937.
[70] M. Böhme, E. O. Soremekun, S. Chattopadhyay, E. Ugherughe, and A. Zeller, “Where is the bug and how is it fixed? An experiment with practitioners,” in Proc. 11th Joint Meeting Found. Softw. Eng., New York, NY, USA: ACM, Aug. 2017.
[71] T. D. Sasso, A. Mocci, and M. Lanza, “What makes a satisficing bug report?” in Proc. IEEE Int. Conf. Softw. Qual., Rel. Secur. (QRS), Piscataway, NJ, USA: IEEE Press, Aug. 2016, pp. 164–174.
[72] N. Bettenburg, S. Just, A. Schröter, C. Weiss, R. Premraj, and T. Zimmermann, “What makes a good bug report?” in Proc. 16th ACM SIGSOFT Int. Symp. Found. Softw. Eng., New York, NY, USA: ACM, Nov. 2008.
[73] T. Zimmermann, R. Premraj, N. Bettenburg, S. Just, A. Schröter, and C. Weiss, “What makes a good bug report?” IEEE Trans. Softw. Eng., vol. 36, no. 5, pp. 618–643, Sep. 2010.
[74] B. Vancsics, F. Horváth, A. Szatmári, and Á. Beszédes, “Fault localization using function call frequencies,” J. Syst. Softw., vol. 193, Nov. 2022, Art. no. 111429.
[75] Y. Lou et al., “Boosting coverage-based fault localization via graph-based representation learning,” in Proc. 29th ACM Joint Meeting Eur. Softw. Eng. Conf. Symp. Found. Softw. Eng., New York, NY, USA: ACM, Aug. 2021.
[76] Y. Kim, S. Mun, S. Yoo, and M. Kim, “Precise learn-to-rank fault localization using dynamic and static features of target programs,” ACM Trans. Softw. Eng. Methodol., vol. 28, no. 4, pp. 1–34, Oct. 2019.
[77] M. M. Rahman, F. Khomh, S. Yeasmin, and C. K. Roy, “The forgotten role of search queries in IR-based bug localization: An empirical study,” Empirical Softw. Eng., vol. 26, no. 6, Aug. 2021.
[78] J. M. Florez, O. Chaparro, C. Treude, and A. Marcus, “Combining query reduction and expansion for text-retrieval-based bug localization,” in Proc. IEEE Int. Conf. Softw. Anal., Evol. Reeng. (SANER), Piscataway, NJ, USA: IEEE Press, Mar. 2021, pp. 166–176.
[79] Y. Li, S. Wang, T. N. Nguyen, and S. V. Nguyen, “Improving bug detection via context-based code representation learning and attention-based neural networks,” Proc. ACM Program. Lang., vol. 3, no. OOPSLA, pp. 1–30, Oct. 2019.
[80] Z. Zhu, Y. Li, Y. Wang, Y. Wang, and H. Tong, “A deep multimodal model for bug localization,” Data Mining Knowl. Discovery, Apr. 2021.
[81] M. E. Peters et al., “Deep contextualized word representations,” CoRR, 2018, arXiv:1802.05365.
[82] R. S. Sutton, D. Precup, and S. Singh, “Between MDPs and semi-MDPs: A framework for temporal abstraction in reinforcement learning,” Artif. Intell., vol. 112, nos. 1–2, pp. 181–211, Aug. 1999.
[83] H. Xie, Y. Lei, M. Yan, Y. Yu, X. Xia, and X. Mao, “A universal data augmentation approach for fault localization,” in Proc. 44th Int. Conf. Softw. Eng., New York, NY, USA: ACM, May 2022, pp. 48–60.
[84] T. White and B. Pagurek, “Distributed fault location in networks using learning mobile agents,” in Approaches to Intelligence Agents. Berlin Heidelberg: Springer Berlin Heidelberg, 1999, pp. 182–196.
[85] H. Rezapour, S. Jamali, and A. Bahmanyar, “Review on artificial intelligence-based fault location methods in power distribution networks,” Energies, vol. 16, no. 12, Jun. 2023, Art. no. 4636.
[86] C. Ni, W. Wang, K. Yang, X. Xia, K. Liu, and D. Lo, “The best of both worlds: Integrating semantic features with expert features for defect prediction and localization,” in Proc. 30th ACM Joint Eur. Softw. Eng. Conf. Symp. Found. Softw. Eng., New York, NY, USA: ACM, Nov. 2022.
[87] B. Wang, L. Xu, M. Yan, C. Liu, and L. Liu, “Multi-dimension convolutional neural network for bug localization,” IEEE Trans. Services Comput., vol. 15, no. 3, pp. 1649–1663, May/Jun. 2022.

Partha Chakraborty (Student Member, IEEE) is currently working toward the Ph.D. degree with the David R. Cheriton School of Computer Science, University of Waterloo, Canada. His research interests include bug localization, vulnerability detection, and the use of machine learning techniques in software engineering. For more information, see https://parthac.me/.

Mahmoud Alfadel (Member, IEEE) is an Assistant Professor with the Department of Computer Science, University of Calgary. His research interests include mining software repositories, software ecosystems, open-source security, and release engineering.

Meiyappan Nagappan is an Associate Professor with the David R. Cheriton School of Computer Science, University of Waterloo. He has worked on empirical software engineering to address software development concerns and currently researches the impact of large language models on software development.