
2013 12th International Conference on Machine Learning and Applications

Bug Reports Prioritization: Which Features and Classifier to Use?

Mamdouh Alenezi
College of Computer & Information Sciences
Prince Sultan University
Riyadh 11586, Saudi Arabia
[email protected]

Shadi Banitaan
Department of Mathematics, Computer Science and Software Engineering
University of Detroit Mercy
Detroit, MI 48221, USA
[email protected]

Abstract—Large open source bug tracking systems receive a large number of bug reports daily. Managing these huge numbers of incoming bug reports is a challenging task. Dealing with these reports manually consumes time and resources, which delays the resolution of important bugs that are crucial and need to be identified and resolved early. Bug triaging is an important process in software maintenance. Some bugs are important and need to be fixed right away, whereas others are minor and their fixes could be postponed until resources are available. Most automatic bug assignment approaches do not take the priority of bug reports into consideration. Assigning bug reports based on their priority may play an important role in enhancing the bug triaging process. In this paper, we present an approach to predict the priority of a reported bug using different machine learning algorithms, namely Naive Bayes, Decision Trees, and Random Forests. We also investigate the effect of using two feature sets on the classification accuracy. We conduct an experimental evaluation using open-source projects, namely Eclipse and Firefox. The experimental evaluation shows that the proposed approach is feasible in predicting the priority of bug reports. It also shows that feature-set-2 outperforms feature-set-1. Moreover, both Random Forests and Decision Trees outperform Naive Bayes.

Keywords—bug triaging, text classification, predictive model, bug priority

I. INTRODUCTION

Most open source software projects use a bug tracking system (BTS) to collect and manage bug reports. A BTS allows users from different geographical areas to report their error findings in a unified environment. A BTS helps developers to track and communicate about bug reports and development issues, which results in fixing bug reports in reasonable time. These bug reports are usually used to guide several software maintenance activities in order to produce more reliable software systems. One of the important software maintenance activities is bug triaging. The triager examines each newly filed bug report to determine whether it is valid and assigns a potential developer to fix it.

When a bug tracking system receives a newly filed bug report, the triager makes decisions about several characteristics of the report, such as its priority and severity levels. The bug priority level indicates the importance of that bug from a business perspective. It gives an indication of the order in which bug reports should be fixed. Developers usually use the value of this field to prioritize their work by fixing the highly important bugs first. The values of this field range from P1 to P5, where P1 represents the highest priority and P5 represents the lowest priority. The bug prioritization process is usually performed manually, which makes it error-prone and labor intensive. It relies heavily on the triager's judgment and experience. Many bug reports have been assigned incorrect priority levels, and many are left blank, since bug prioritization needs a deep knowledge of bug reports. Wrong assignments of priority levels may lead to utilizing resources ineffectively (e.g., wasting time and effort by fixing unimportant bugs first). In this work, we present an approach to address these problems by automatically prioritizing bug reports.

Machine learning techniques such as Naive Bayes and Support Vector Machines are used to build predictive models that categorize instances into different class labels based on historical data. These techniques have previously been used to automate the bug triaging process [1], [2], [3]. Even so, little work has been done to predict other characteristics of bug reports, such as severity and priority. In this paper, we investigate whether we can accurately predict the priority of a reported bug using several features, such as the textual description of bug reports or other meta-data features such as the severity level.

The contributions of this paper include the following:

• We investigate the effectiveness of applying several machine learning techniques, namely Naive Bayes, Decision Trees, and Random Forests, on the classification performance.
• We evaluate the impact of using different feature sets to build the predictive model. The first feature set is based on the textual contents of bug reports, while the second feature set is based on meta-data information of bug reports.
• We conduct an experimental evaluation using two bug report datasets, namely Eclipse and Firefox, obtained from open source projects.

The rest of the paper is organized as follows: Section II presents some background information about bug reports. Section III describes the proposed approach. The experimental evaluation and discussion are presented in Section IV. Section V discusses related work. Section VI discusses some threats to validity, and Section VII concludes the paper.

978-0-7695-5144-9/13 $31.00 © 2013 IEEE
DOI 10.1109/ICMLA.2013.114
II. BACKGROUND

We provide some necessary background information about bug reports in Section II-A. The life-cycle of a bug report is presented briefly in Section II-B.

A. Bug Report

Bug reports in Bugzilla consist of predefined fields, a text description, attachments and dependencies. Predefined fields represent attributes of a bug. Some attributes are unchangeable, such as the creation date and the reporter who filed the bug. Other attributes may be changed over the bug's lifetime, such as the product, component, priority and severity. Some attributes may be modified frequently, such as the assignee, the current state and the final resolution. The text description of a bug report refers to the natural language contents, including the title of the bug report and a full description of the bug. Figure 1 shows an example of a bug report.

Fig. 1. An example of a bug report.

B. Bug Life-cycle

There are different states that a bug report can pass through in its life-cycle. Figure 2 depicts the life-cycle of bugs in Bugzilla-based projects. When a new bug report is filed, it is assigned the NEW state. Once it has been triaged and assigned to a developer, its state is changed to ASSIGNED. After closing this bug, its state is set to RESOLVED, VERIFIED or CLOSED. The resolution of the bug is marked in several ways; the resolution status in the report is used to record how the report was resolved. If the resolution results in changing the code base, the bug is marked as FIXED. When a bug is considered a duplicate of another bug, it is set to DUPLICATE. If a bug will not be fixed, or it is not an actual bug, it is set to WONTFIX or INVALID respectively. If a bug was resolved but has been reopened, it is marked as REOPENED [4].

Fig. 2. Bug Report Life-cycle (states shown: NEW, ASSIGNED, RESOLVED, VERIFIED, REOPENED, CLOSED).
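To make the structures above concrete, the following is a minimal sketch of a bug-report record and its life-cycle states. The field and state names follow the Bugzilla conventions described in this section, but the class itself and its defaults are only illustrative; they are not part of the original paper or of Bugzilla's API.

```python
from dataclasses import dataclass
from enum import Enum

class State(Enum):
    # Life-cycle states shown in Fig. 2
    NEW = "NEW"
    ASSIGNED = "ASSIGNED"
    RESOLVED = "RESOLVED"
    VERIFIED = "VERIFIED"
    REOPENED = "REOPENED"
    CLOSED = "CLOSED"

@dataclass
class BugReport:
    # Unchangeable attributes
    bug_id: int
    reporter: str
    creation_date: str
    # Attributes that may change over the bug's lifetime
    product: str
    component: str
    priority: str           # P1 (highest) .. P5 (lowest)
    severity: str
    # Frequently modified attributes
    assignee: str = ""
    state: State = State.NEW
    resolution: str = ""    # e.g. FIXED, DUPLICATE, WONTFIX, INVALID
    # Free-text contents
    summary: str = ""       # the title, used by feature-set-1
    description: str = ""
```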

III. THE APPROACH

In this section, we present an approach for predicting the priority of each newly arriving bug report using the bug report history obtained from the BTS. We formulate the problem as a classification task. Three class labels are used to categorize bug reports, namely High, Medium, and Low. High represents both P1 and P2, Medium represents P3, and Low represents P4 and P5. This representation aims at helping developers fix high priority bug reports first. We start by describing the feature sets under investigation. Then, we present the machine learning algorithms used in this study. Finally, we present the evaluation metrics used to evaluate the approach.
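A minimal sketch of this label grouping follows, assuming the bug reports are held in a pandas DataFrame with a priority column; the paper does not state which tooling was used, so the helper below is illustrative only.

```python
import pandas as pd

# Group Bugzilla priorities into the three class labels used in the paper
PRIORITY_TO_LABEL = {
    "P1": "High", "P2": "High",
    "P3": "Medium",
    "P4": "Low", "P5": "Low",
}

def add_class_label(bugs: pd.DataFrame) -> pd.DataFrame:
    """Attach the High/Medium/Low class label derived from the priority field."""
    bugs = bugs.copy()
    bugs["label"] = bugs["priority"].map(PRIORITY_TO_LABEL)
    # Reports with a blank or unknown priority cannot be used for training
    return bugs.dropna(subset=["label"])
```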
A. Feature-Set-1

In this set, we use only the textual description of bug reports. The textual description of a bug report is available in two fields, namely the summary (title) and the description. We consider only the summary as the textual data, since the description of bug reports holds many terms that are unrelated to the functionality of bug reports [5].

The summary of bug reports is unstructured data that needs a pre-processing step in order to convert it into structured data. Therefore, we apply the traditional text processing approach to transform the text data into a meaningful representation as follows (a sketch of this pipeline is given after the list):

• Tokenization and filtering: the first step is to split the summary of each bug report into tokens (terms). Then, unnecessary terms are filtered out, including stop-words, punctuation, white-space and numbers.
• Vector space representation: each bug report is represented as a vector where each word in the bug report represents a feature. We use the Term Frequency (TF) of the word as the value of each word feature.
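A minimal sketch of this pipeline, assuming scikit-learn (the paper does not name its tooling): CountVectorizer performs the tokenization, lower-casing and stop-word filtering, and produces raw term-frequency counts; the token pattern and example summaries below are illustrative.

```python
from sklearn.feature_extraction.text import CountVectorizer

# Example summaries standing in for real Bugzilla data
summaries = [
    "NullPointerException when opening the editor",
    "Crash on startup after update",
]

# Feature-set-1: term-frequency (TF) vectors built from the summary field only
vectorizer = CountVectorizer(
    lowercase=True,
    stop_words="english",           # filter common stop-words
    token_pattern=r"[A-Za-z]{2,}",  # keep alphabetic tokens; numbers and punctuation are dropped
)
X_text = vectorizer.fit_transform(summaries)  # sparse matrix: one TF row per bug report

print(vectorizer.get_feature_names_out())     # the vocabulary, i.e. the word features
```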
B. Feature-Set-2

We would like to investigate the effect of using features other than the textual content of bug reports on the classification accuracy. In this set, we use the following meta-data features:

• Component: the component to which the bug belongs.
• Operating system: the operating system on which the bug was observed.
• Severity: the degree of severity of the bug.

We use these fields as features because they contain useful information that may help in discriminating between priority levels. Other available fields in the BTS, such as Bug id and Changed (when the bug was last updated), do not contain useful information. Moreover, some fields like Keywords cannot be used because they are optional (i.e., most bug reports have empty values for these fields).
values for these fields). TABLE II. P RIORITY D ISTRIBUTION

C. Machine Learning Techniques Project High Medium Low


Eclipse 5888 67742 553
In this section, we briefly present the machine learning Firefox 1142 6009 133

techniques used in this work.


1) Naive Bayes Classifier: Naive Bayes is a probabilistic 1) Imbalanced Data: A training dataset is considered im-
classifier which assumes that all features are independent. balanced if one or more of the class labels are represented by
It finds the class with maximum probability given a set of significantly less number of instances compared to other class
features values using the Bayes theorem. labels. This problem leads to skewed data distribution between
2) Decision Trees Classifier: Decision Tree is a classifier classes which is known to hinder the learning performance of
in the form of a tree structure. It is a predictive model that classification. Weiss and Provost [6] indicated in their study
decides the dependent value of a new sample based on diverse that balancing the dataset usually achieves better classification
attribute values of the existing data. Each internal node in the results. On the same hand, it is generally more important to
tree represents a single attribute while leaf nodes represent correctly classify the smaller class instances. Therefore, we
class labels. Decision trees classify each instances by starting re-balanced the distribution of the class labels by randomly
at the root of the tree and moving through it until a leaf node. selecting equally representative instances of each class label.
3) Random Forests Classifier: Random Forests is an en-
B. Results and Discussion
semble learning method that generates several decision trees
at training time. Each tree gives a class label. The Random Since all class labels are important, we present the classi-
Forests classifier selects the class label that has the mode of fication results for each class label to investigate whether we
the classes output by individual trees. can predict each one of them accurately using the two feature
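A minimal sketch of training and comparing the three classifiers, assuming scikit-learn. The specific implementations, hyper-parameters and evaluation protocol used in the paper are not reported, so the Multinomial Naive Bayes variant, the defaults and the 10-fold cross-validation below are only assumptions; the placeholder data stands in for the feature matrices built earlier.

```python
import numpy as np
from sklearn.naive_bayes import MultinomialNB
from sklearn.tree import DecisionTreeClassifier
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_predict

# Placeholder data standing in for feature-set-1 or feature-set-2 and the class labels
rng = np.random.default_rng(0)
X = rng.integers(0, 3, size=(300, 40))           # non-negative counts, as in a TF matrix
y = rng.choice(["High", "Medium", "Low"], 300)   # High/Medium/Low labels

classifiers = {
    "Naive Bayes":   MultinomialNB(),                       # suits TF count features
    "Decision Tree": DecisionTreeClassifier(random_state=0),
    "Random Forest": RandomForestClassifier(n_estimators=100, random_state=0),
}

# Out-of-fold predictions for each classifier, used later for evaluation
predictions = {
    name: cross_val_predict(clf, X, y, cv=10)
    for name, clf in classifiers.items()
}
```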
D. Evaluation Metrics

We evaluate the approach using metrics widely used in classification and information retrieval, namely Precision, Recall, and F-measure, defined as follows:

Precision = Number of correctly classified / Number of classifications made

Recall = Number of correctly classified / Number of possible relevant classifications

F-measure = 2 × (Precision × Recall) / (Precision + Recall)
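A minimal sketch of computing these metrics per class label and on average, assuming scikit-learn and continuing from the out-of-fold predictions in the previous sketch; the macro average matches the arithmetic of the "avg" rows in Fig. 3 but is otherwise an assumption.

```python
from sklearn.metrics import precision_recall_fscore_support

labels = ["High", "Medium", "Low"]
for name, y_pred in predictions.items():
    # Per-class Precision, Recall and F-measure, as reported in Fig. 3
    p, r, f, _ = precision_recall_fscore_support(y, y_pred, labels=labels, zero_division=0)
    for lab, pi, ri, fi in zip(labels, p, r, f):
        print(f"{name:13s} {lab:6s} P={pi:.3f} R={ri:.3f} F={fi:.3f}")
    # Average over the three class labels
    p, r, f, _ = precision_recall_fscore_support(y, y_pred, labels=labels,
                                                 average="macro", zero_division=0)
    print(f"{name:13s} avg    P={p:.3f} R={r:.3f} F={f:.3f}")
```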
IV. EXPERIMENTAL EVALUATION

In this section, details of the datasets used in our experimental evaluation are given in Section IV-A. The results and discussion are presented in Section IV-B.

A. Datasets

We choose bug reports from two different projects, namely Eclipse and Firefox. We choose only bug reports that are marked as RESOLVED, CLOSED or VERIFIED, because the priorities of such reports have been confirmed. We extract bug reports dated between January 1st, 2010 and December 31st, 2012. Table I shows a summary of the datasets.

TABLE I. SUMMARY OF THE DATASETS

Project   # of Bugs   From           To
Eclipse   74183       Jan 01, 2010   Dec 31, 2012
Firefox   7284        Jan 01, 2010   Dec 31, 2012

As mentioned before, there are three class labels, namely High, Medium, and Low. Table II shows the distribution of these priority levels in the bug reports. It is clear from Table II that the data is imbalanced (e.g., 67742 bug reports are labeled as Medium while only 553 bug reports are labeled as Low in Eclipse).

TABLE II. PRIORITY DISTRIBUTION

Project   High   Medium   Low
Eclipse   5888   67742    553
Firefox   1142   6009     133

1) Imbalanced Data: A training dataset is considered imbalanced if one or more of the class labels are represented by a significantly smaller number of instances than the other class labels. This problem leads to a skewed data distribution between classes, which is known to hinder the learning performance of classifiers. Weiss and Provost [6] indicated in their study that balancing the dataset usually achieves better classification results. At the same time, it is generally more important to correctly classify the smaller class instances. Therefore, we re-balanced the distribution of the class labels by randomly selecting equally representative instances of each class label.
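A minimal sketch of this re-balancing step by random undersampling, assuming pandas. The paper only describes the step as random selection of equally many instances per class, so the grouping, seed and toy data below are illustrative.

```python
import pandas as pd

def balance_by_undersampling(bugs: pd.DataFrame, label_col: str = "label",
                             seed: int = 0) -> pd.DataFrame:
    """Randomly keep the same number of reports for each class label."""
    n_smallest = bugs[label_col].value_counts().min()   # e.g. 553 'Low' reports in Eclipse
    balanced = (
        bugs.groupby(label_col, group_keys=False)
            .apply(lambda grp: grp.sample(n=n_smallest, random_state=seed))
    )
    return balanced.sample(frac=1, random_state=seed)    # shuffle the balanced set

# Example: a tiny imbalanced dataset
toy = pd.DataFrame({"label": ["Medium"] * 8 + ["High"] * 3 + ["Low"] * 2})
print(balance_by_undersampling(toy)["label"].value_counts())  # two reports of each label
```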
B. Results and Discussion

Since all class labels are important, we present the classification results for each class label to investigate whether we can predict each one of them accurately using the two feature sets. Table III shows the number of features in each feature set. It is clear from Table III that feature-set-2 has significantly fewer dimensions.

TABLE III. NUMBER OF FEATURES

Project   Feature-set-1   Feature-set-2
Eclipse   3259            287
Firefox   1097            45

Figure 3 shows the classification results for Eclipse and Firefox. For Eclipse, feature-set-2 outperforms feature-set-1 dramatically for all priority levels using the three classifiers in terms of Precision, Recall, and F-measure. For instance, the F-measure of the Low class label is 0.639 and 0.282 for feature-set-2 and feature-set-1 respectively using Decision Tree (i.e., feature-set-2 improves the F-measure by 35.7% over feature-set-1). The average F-measure of feature-set-1 is 0.421, 0.364, and 0.434 for Naive Bayes, Decision Trees, and Random Forest respectively. For feature-set-2, the average F-measure is 0.593, 0.603, and 0.611 for Naive Bayes, Decision Trees, and Random Forest respectively. We can conclude that Random Forest outperforms both Naive Bayes and Decision Trees in both feature sets.

For Firefox, feature-set-2 outperforms feature-set-1 significantly in terms of Precision, Recall, and F-measure. For instance, the F-measure of the High class label is 0.476 and 0.298 for feature-set-2 and feature-set-1 respectively using Random Forest (i.e., feature-set-2 improves the F-measure by 17.8% over feature-set-1). The average F-measure of feature-set-1 is 0.340, 0.355, and 0.356 for Naive Bayes, Decision Trees, and Random Forest respectively. For feature-set-2, the average F-measure is 0.460, 0.491, and 0.476 for Naive Bayes, Decision Trees, and Random Forest respectively. We can conclude that both Random Forest and Decision Trees outperform Naive Bayes in both feature sets.

(a) Eclipse
Classifier      Class    Feature set 1                    Feature set 2
                         Precision  Recall  F-measure     Precision  Recall  F-measure
Naive Bayes     High     0.413      0.391   0.402         0.653      0.501   0.567
                Medium   0.394      0.467   0.427         0.507      0.685   0.583
                Low      0.455      0.396   0.423         0.645      0.570   0.605
                avg      0.421      0.418   0.419         0.602      0.585   0.593
Decision Tree   High     0.389      0.199   0.263         0.659      0.535   0.591
                Medium   0.337      0.627   0.438         0.519      0.629   0.569
                Low      0.366      0.230   0.282         0.647      0.631   0.639
                avg      0.364      0.352   0.358         0.608      0.599   0.603
Random Forest   High     0.408      0.519   0.457         0.648      0.624   0.636
                Medium   0.408      0.382   0.395         0.564      0.566   0.565
                Low      0.485      0.385   0.429         0.622      0.644   0.633
                avg      0.434      0.429   0.431         0.612      0.611   0.611

(b) Firefox
Classifier      Class    Feature set 1                    Feature set 2
                         Precision  Recall  F-measure     Precision  Recall  F-measure
Naive Bayes     High     0.353      0.406   0.378         0.465      0.444   0.454
                Medium   0.303      0.271   0.286         0.366      0.256   0.301
                Low      0.362      0.346   0.354         0.525      0.707   0.603
                avg      0.339      0.341   0.340         0.452      0.469   0.460
Decision Tree   High     0.374      0.368   0.371         0.544      0.466   0.502
                Medium   0.291      0.188   0.228         0.439      0.218   0.291
                Low      0.385      0.526   0.445         0.484      0.797   0.602
                avg      0.350      0.361   0.355         0.489      0.494   0.491
Random Forest   High     0.343      0.263   0.298         0.574      0.406   0.476
                Medium   0.282      0.233   0.255         0.369      0.286   0.322
                Low      0.422      0.594   0.493         0.485      0.737   0.585
                avg      0.349      0.363   0.356         0.476      0.476   0.476

Fig. 3. Classification results of Eclipse and Firefox.

To sum up, feature-set-2 outperforms feature-set-1 dramatically for both datasets.

The Wilcoxon test is a non-parametric statistical hypothesis test [7]. We use the Wilcoxon test to compare the F-measure values of the two feature sets to find out whether feature-set-2 is significantly better than feature-set-1. The p-value of our test is < 0.001, which means that the classification results of feature-set-2 are significantly better than those of feature-set-1. Besides the low results of feature-set-1 compared to feature-set-2, the vocabulary used to describe bugs may change over time [8] (e.g., reporters and developers change over time). Therefore, we recommend using feature-set-2 since it does not depend on the textual description of bugs, it gives better classification results, and it contains a much smaller number of features (see Table III). Regarding the classification techniques, we recommend using Random Forest or Decision Trees since they both achieve comparable results and outperform Naive Bayes.
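A minimal sketch of this significance test, assuming SciPy. The exact pairing of F-measure values used in the paper is not spelled out, so pairing the values per (project, classifier, class label) setting, taken from Fig. 3, is an assumption.

```python
from scipy.stats import wilcoxon

# Paired F-measure values for the same (project, classifier, class label) settings,
# read from Fig. 3: feature-set-1 first, feature-set-2 second.
f_measure_fs1 = [0.402, 0.427, 0.423, 0.263, 0.438, 0.282, 0.457, 0.395, 0.429,
                 0.378, 0.286, 0.354, 0.371, 0.228, 0.445, 0.298, 0.255, 0.493]
f_measure_fs2 = [0.567, 0.583, 0.605, 0.591, 0.569, 0.639, 0.636, 0.565, 0.633,
                 0.454, 0.301, 0.603, 0.502, 0.291, 0.602, 0.476, 0.322, 0.585]

# One-sided test: is feature-set-2 significantly better than feature-set-1?
stat, p_value = wilcoxon(f_measure_fs2, f_measure_fs1, alternative="greater")
print(f"Wilcoxon statistic = {stat}, p-value = {p_value:.6f}")
```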
V. RELATED WORK

Text mining has been successfully used to predict different meta-data about bug reports, such as severity and security. Gegick et al. [9] proposed an approach to predict whether a bug is security related or not using a classification technique. They applied their approach on a dataset obtained from a Cisco software project. Their model correctly classified 78% of the security bug reports, as validated by security engineers. Lamkanfi et al. [10] investigated the effectiveness of using several classification algorithms in predicting the severity level of each bug report. They applied their approach on the Eclipse, Mozilla, and GNOME projects. Their results varied between 0.65 and 0.85. Wang et al. [11] presented a text classification model using three classifiers to predict the component label of a new bug report. They used TF-IDF and Chi-square to construct the vector space. Their experiments on Eclipse showed that the accuracy of the SVM classifier reached up to 81.21%.

Yu et al. [12] proposed an approach to predict the priority of defects using Neural Networks. They used several features extracted from the software testing process, such as milestone, workflow, and module. They defined four levels of priority and evaluated their approach on five international health care products. Their experimental results showed that their approach is feasible and effective.

Different machine learning techniques have been applied on open bug repository data to identify duplicate bug reports [13], assign the most experienced developer to resolve a new bug automatically [14], [3], estimate the required effort to fix bug reports [15], and predict the files that have the most post-release defects [16]. In this work, we apply three machine learning techniques on different feature sets.

VI. THREATS TO VALIDITY

In this section, we discuss the threats that affect the validity of our proposed approach. First, we select only two open source projects. Other projects may give different conclusions. Therefore, the approach should be applied to more projects in order to generalize the results. Second, we only consider projects that use Bugzilla as their bug tracking system. Other bug tracking systems, such as Gnats, model bug reports in a different way. Therefore, the proposed approach should be applied to bug tracking systems other than Bugzilla.

VII. CONCLUSION

In this paper, we presented an approach to prioritize bug reports using classification. Three classification techniques were employed, namely Naive Bayes, Decision Trees, and Random Forests. In addition, two feature sets were investigated. The first feature set is based on the textual description of bug reports. The second feature set is based on pre-defined meta-data of bug reports. The experimental results showed that both Random Forest and Decision Tree gave better classification results than Naive Bayes. Moreover, the results showed that feature-set-2 improved the results significantly compared to feature-set-1 in terms of Precision, Recall, and F-measure. Future directions include applying the proposed approach on other projects, including closed-source projects. We are also planning to investigate the effect of using different combinations of features on the classification accuracy.

REFERENCES

[1] J. Anvik, L. Hiew, and G. Murphy, "Who should fix this bug?" in Proceedings of the 28th International Conference on Software Engineering. ACM, 2006, pp. 361–370.
[2] J. Anvik and G. C. Murphy, "Reducing the effort of bug report triage: Recommenders for development-oriented decisions," ACM Transactions on Software Engineering and Methodology (TOSEM), vol. 20, no. 3, p. 10, 2011.
[3] S. Banitaan and M. Alenezi, "Tram: An approach for assigning bug reports using their metadata," in Communications and Information Technology (ICCIT), 2013 Third International Conference on. IEEE, 2013, pp. 215–219.
[4] D. Čubranić and G. C. Murphy, "Automatic bug triage using text categorization," in SEKE 2004: Proceedings of the Sixteenth International Conference on Software Engineering. Citeseer, 2004, pp. 92–97.
[5] A. J. Ko, B. A. Myers, and D. H. Chau, "A linguistic analysis of how people describe software problems," in Proceedings of the Visual Languages and Human-Centric Computing, ser. VLHCC '06. Washington, DC, USA: IEEE Computer Society, 2006, pp. 127–134.
[6] G. M. Weiss and F. J. Provost, "Learning when training data are costly: The effect of class distribution on tree induction," J. Artif. Intell. Res. (JAIR), vol. 19, pp. 315–354, 2003.
[7] J. Demšar, "Statistical comparisons of classifiers over multiple data sets," The Journal of Machine Learning Research, vol. 7, pp. 1–30, 2006.
[8] G. Chrupala, "Learning from evolving data streams: online triage of bug reports," in Proceedings of the 13th Conference of the European Chapter of the Association for Computational Linguistics, ser. EACL '12. Stroudsburg, PA, USA: Association for Computational Linguistics, 2012, pp. 613–622.
[9] M. Gegick, P. Rotella, and T. Xie, "Identifying security bug reports via text mining: An industrial case study," in Mining Software Repositories (MSR), 2010 7th IEEE Working Conference on. IEEE, 2010, pp. 11–20.
[10] A. Lamkanfi, S. Demeyer, Q. D. Soetens, and T. Verdonck, "Comparing mining algorithms for predicting the severity of a reported bug," in Software Maintenance and Reengineering (CSMR), 2011 15th European Conference on. IEEE, 2011, pp. 249–258.
[11] D. Wang, H. Zhang, R. Liu, M. Lin, and W. Wu, "Predicting bugs components via mining bug reports," Journal of Software, vol. 7, no. 5, pp. 1149–1154, 2012.
[12] L. Yu, W.-T. Tsai, W. Zhao, and F. Wu, "Predicting defect priority based on neural networks," in Proceedings of the 6th International Conference on Advanced Data Mining and Applications - Volume Part II, ser. ADMA'10. Berlin, Heidelberg: Springer-Verlag, 2010, pp. 356–367.
[13] X. Wang, L. Zhang, T. Xie, J. Anvik, and J. Sun, "An approach to detecting duplicate bug reports using natural language and execution information," in Proceedings of the 30th International Conference on Software Engineering. ACM, 2008, pp. 461–470.
[14] M. Alenezi, K. Magel, and S. Banitaan, "Efficient bug triaging using text mining," Journal of Software, vol. 8, no. 9, pp. 2185–2190, 2013.
[15] C. Weiss, R. Premraj, T. Zimmermann, and A. Zeller, "Predicting effort to fix software bugs," Softwaretechnik-Trends, vol. 27, no. 2, 2007.
[16] T. Zimmermann, R. Premraj, and A. Zeller, "Predicting defects for Eclipse," in Proceedings of the Third International Workshop on Predictor Models in Software Engineering, ser. PROMISE '07. Washington, DC, USA: IEEE Computer Society, 2007.

