Using Data Reduction Techniques For Effective Bug Triage: Shanthipriya. D, Deepa.K
Abstract- Automatic bug triage is an essential step in fixing software bugs. To reduce the manual effort and time cost, text classification techniques are applied to perform bug triage automatically. The main goal of bug triage is to assign suitably experienced developers to newly arriving bug reports. Existing bug triage approaches suffer from large-scale and low-quality bug data. The proposed system combines a feature selection algorithm (FS) and an instance selection algorithm (IS) for bug triage. These data reduction techniques shrink the bug data and also improve accuracy. The performance of the proposed system is evaluated using the Mozilla bug data set. The results show that reducing the scale of the bug data lowers the manual and time cost and improves the accuracy of bug triage with standard bug data in software maintenance.

Keywords- Bug Triage, Feature selection, Instance selection, Mining Software repository.

I. INTRODUCTION

A software bug is an issue that causes a program to crash or produce unacceptable output. The problem is caused by inadequate or invalid logic. A bug can be an error, mistake, flaw or fault, which may cause a crash or a deviation from the expected results. Most bugs are due to human errors in the source code or its design. A program is said to be buggy when it contains a large number of bugs, which affect program functionality and cause erroneous results [5]. The details of bugs are stored in a large database called a bug repository or bug tracking system [2]. An open source bug repository [2] is employed by many large software companies for open source projects, e.g., Mozilla [9]. To address real-world engineering problems, data mining methods [16] are applied to extract useful information accumulated in bug repositories. Bug triage is the process of assigning a relevant developer to each bug report in order to fix it [19].

Because of the large number of bugs reported daily and the lack of any one person with expertise in all of them, manual triage is expensive in time and labor and low in precision. To overcome these limitations, an automatic bug triage approach has been proposed [17]. This approach applies text classification techniques to predict the appropriate developer for a bug report without tossing.

In this paper, we propose data reduction techniques that combine an instance selection algorithm (IS) [4] and a feature selection algorithm (FS) [11, 12]. These approaches are used to shrink the data scale and also improve the accuracy of the bug data set. The order in which the reduction techniques are applied may affect the results of the bug triage approach. To determine the order of the bug data reduction techniques, i.e., FS to IS or IS to FS, we propose a predictive model [17]. The text classification technique, i.e., Naive Bayes, is used to predict the correct developer to solve and fix the bug reports [21] in order to reduce the cost of manual triage. The performance of the proposed system is verified using the Mozilla bug data set [9], on which it obtains 78% accuracy after training set reduction. The outcome shows that experiments on the reduced training sets can obtain better accuracy than those on the original training set.

The remainder of this paper is organized as follows: Section 2 explains the proposed methodology. Section 3 explains the experimental results and discussion. Section 4 lists the related work. In Section 5 we briefly conclude this paper and present our future work.

II. RELATED WORKS

To our knowledge, no existing work in the literature combines data reduction methods to decrease the data scale and improve the accuracy of the bug triage approach.

Jeong, Kim, and Zimmermann introduced a tossing graph model, based on the Markov property, that captures how bug reports are reassigned to other developers [6]. Shivaji and colleagues [12] proposed feature selection techniques to predict software bugs. Anvik, Hiew, and Murphy [1] extend the machine learning approaches. They describe bug triage as a semi-supervised approach that is updated with a weighted recommendation list; based on a probabilistic view, the relevant developers are suggested to the human triager [4,
IJSART - Volume 2 Issue 2 FEBRUARY 2016 ISSN [ONLINE]: 2395-1052
13]. Cubranic and Murphy [3] proposed a supervised learning method (a Naive Bayes classifier) to assist in bug triage by using text categorization to predict the relevant developers. A classification model should be designed to investigate the relationships among the data in a bug data set and to check its quality [20, 17].

Fu, Zhu, and Li [4] investigated how to obtain an accurate prediction model at minimum cost by labelling the most informative instances. In contrast to these papers, our paper employs the information gain algorithm to improve the value of bug data for prediction. In this paper, we focus on the issues of bug data reduction and the low precision of the bug data set. Further, the combination of feature selection and instance selection algorithms is intended to shrink the bug data set and improve the performance of bug triage with high-quality bug data in software maintenance and improvement.

The bug repository contains all information about the software bugs. Each bug has a bug statement and the details of the developer who worked on that particular bug. The bug details may be divided into two parts: summary and description. The proposed system uses a bug data reduction technique, which reduces labor cost and time cost. Here, the bug data reduction method is used to prepare the content for bug triage. The proposed system has two main goals: first, to reduce the data scale and, second, to improve the accuracy of the bug data.

Instance selection and feature selection are pre-processing techniques used for bug data reduction.
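As a rough illustration of how feature selection and instance selection might be combined before training a Naive Bayes classifier, the following Python sketch ranks words by information gain (FS), then drops empty and duplicate reports (IS), and finally trains a small multinomial Naive Bayes model. It is a minimal sketch under stated assumptions, not the system evaluated in this paper: the report texts, the developer names `dev_ui` and `dev_net`, all function names, and the duplicate-removal rule used as the instance selection step are hypothetical stand-ins chosen for brevity.

```python
import math
from collections import Counter, defaultdict

def tokenize(text):
    """Lower-case a bug report and split it into alphabetic words."""
    return "".join(c.lower() if c.isalpha() else " " for c in text).split()

def entropy(counts):
    """Shannon entropy (in bits) of a collection of class counts."""
    total = sum(counts)
    return -sum(c / total * math.log2(c / total) for c in counts if c)

def feature_selection(reports, k):
    """FS: keep only the k words with the highest information gain."""
    base = entropy(Counter(dev for _, dev in reports).values())
    vocab = {w for words, _ in reports for w in words}
    gains = {}
    for w in vocab:
        with_w = [dev for words, dev in reports if w in words]
        without_w = [dev for words, dev in reports if w not in words]
        cond = sum(len(part) / len(reports) * entropy(Counter(part).values())
                   for part in (with_w, without_w) if part)
        gains[w] = base - cond
    # Ties are broken alphabetically so the result is deterministic.
    keep = set(sorted(vocab, key=lambda w: (-gains[w], w))[:k])
    return [([w for w in words if w in keep], dev) for words, dev in reports]

def instance_selection(reports):
    """IS stand-in (assumption): drop empty reports and exact duplicates."""
    seen, kept = set(), []
    for words, dev in reports:
        key = (tuple(sorted(set(words))), dev)
        if words and key not in seen:
            seen.add(key)
            kept.append((words, dev))
    return kept

def reduce_training_set(reports, k, order="FS->IS"):
    """Apply the two reduction steps in the requested order."""
    if order == "FS->IS":
        return instance_selection(feature_selection(reports, k))
    return feature_selection(instance_selection(reports), k)

def train_naive_bayes(reports):
    """Collect the counts needed by a multinomial Naive Bayes classifier."""
    prior = Counter(dev for _, dev in reports)
    word_counts = defaultdict(Counter)
    for words, dev in reports:
        word_counts[dev].update(words)
    vocab = {w for words, _ in reports for w in words}
    return prior, word_counts, vocab

def predict(model, words):
    """Return the developer with the highest Laplace-smoothed log score."""
    prior, word_counts, vocab = model
    total = sum(prior.values())
    def score(dev):
        denom = sum(word_counts[dev].values()) + len(vocab)
        s = math.log(prior[dev] / total)
        for w in words:
            if w in vocab:  # words unseen in training are ignored
                s += math.log((word_counts[dev][w] + 1) / denom)
        return s
    return max(sorted(prior), key=score)

# Hypothetical bug reports (summary text, developer who fixed the bug).
RAW = [
    ("Crash when opening bookmarks menu", "dev_ui"),
    ("Bookmarks menu crash on startup", "dev_ui"),
    ("Crash when opening bookmarks menu", "dev_ui"),
    ("SSL handshake fails on proxy", "dev_net"),
    ("Proxy timeout during SSL handshake", "dev_net"),
]
TRAIN = [(tokenize(text), dev) for text, dev in RAW]
REDUCED = reduce_training_set(TRAIN, k=6, order="FS->IS")
MODEL = train_naive_bayes(REDUCED)
print(len(TRAIN), len(REDUCED), predict(MODEL, tokenize("bookmarks menu crash")))
```

On this toy data the two orders give training sets of different sizes (FS to IS keeps fewer reports than IS to FS), which is one way to see why the order of the reduction steps can matter. A real system would replace the duplicate-removal step with a proper instance selection algorithm.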
By applying these techniques, the bug data scale is reduced and the performance of the bug triage approach improves. The predictive model is proposed in order to predict the correct order in which to shrink the bug data set. By employing this model, the order FS to IS or IS to FS can be predicted without any complication. The text classification approach, i.e., Naive Bayes, is used to predict the correct developer for a given bug. C4.5 AdaBoost is used to calculate the precision and recall, and the F-measure is calculated to balance the two. The accuracy on the Mozilla bug data set is 78%, obtained while reducing the data scale and improving the performance of the bug triage approach.

V. RESULTS AND DISCUSSION

The performance is measured using both the training and the test bug data sets. The attributes of each training and test bug data set are calculated. The attributes are named B1 to B10 for the bug data set details and D1 to D8 for the developer details. The pre-processing techniques for data reduction, i.e., feature selection and instance selection, are applied to the training bug data set.

The classifier, i.e., Naive Bayes, is trained on the training data set together with its data reduction order. Then, the classifier is used to predict the correct order for the test data set, which reduces the labor cost. In this way, the performance of the bug triage approach is improved. Fig. 4 shows that the precision, recall, and F-measure values for the Mozilla bug data set are 0.667, 0.737, and 0.70, respectively. The accuracy is measured as 78% using the Naive Bayes classifier for the training data set.

VI. CONCLUSION

Bug triage is an important step of software maintenance that is costly in both labor and time. The proposed method combines the feature selection algorithm (FS) with the instance selection algorithm (IS) in order to reduce the scale of bug data sets as well as improve the data quality. A predictive model is used to determine the order of applying the reduction techniques, i.e., FS to IS or IS to FS. The performance of the proposed system is verified using the Mozilla bug data set. The results show that the data reduction techniques condense the data set, which diminishes the time and labor cost and improves the precision of bug triage with high-quality bug data in software development and maintenance.
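The F-measure reported for the Mozilla bug data set can be cross-checked from the reported precision and recall. The short Python sketch below assumes the F-measure is the usual balanced harmonic mean of precision and recall (the paper does not state the formula); the function name `f_measure` is hypothetical.

```python
def f_measure(precision, recall):
    """Balanced F-measure: harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0  # avoid division by zero when both values are zero
    return 2 * precision * recall / (precision + recall)

# Precision and recall reported for the Mozilla bug data set (Fig. 4).
REPORTED_F = f_measure(0.667, 0.737)
print(round(REPORTED_F, 2))
```

Under that assumption the computed value agrees with the reported F-measure of 0.70 to two decimal places.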
Fig. 4 Comparison Graph for Precision, Recall and F-measure

[5] https://www.techopedia.com/definition/24864/software-bug

[6] G. Jeong, S. Kim, and T. Zimmermann, Improving bug triage with tossing graphs, in Proc. Joint Meeting 12th Eur. Softw. Eng. Conf. 17th ACM SIGSOFT Symp. Found. Softw. Eng., Aug. 2009, pp. 111-120.

[7] S. Kim, H. Zhang, R. Wu, and L. Gong, Dealing with noise in defect prediction, in Proc. 32nd ACM/IEEE Int. Conf. Softw. Eng., May 2010, pp. 481-490.

[8] D. Matter, A. Kuhn, and O. Nierstrasz, Assigning bug reports using a vocabulary-based expertise model of developers, in Proc. 6th Int. Working Conf. Mining Softw. Repositories, May 2009, pp. 131-140.

[9] Mozilla. (2015). [Online]. Available: http://mozilla.org/

[10] E. Murphy-Hill, T. Zimmermann, C. Bird, and N. Nagappan, The design of bug fixes, in Proc. Int. Conf. Softw. Eng., 2013, pp. 332-341.

[19] J. Xuan, H. Jiang, Z. Ren, J. Yan, and Z. Luo, Automatic bug triage using semi-supervised text classification, in Proc. 22nd Int. Conf. Softw. Eng. Knowl. Eng., Jul. 2010, pp. 209-214.

[20] T. Zimmermann, N. Nagappan, P. J. Guo, and B. Murphy, Characterizing and predicting which bugs get reopened, in Proc. 34th Int. Conf. Softw. Eng., Jun. 2012, pp. 1074-1083.

[21] W. Zou, Y. Hu, J. Xuan, and H. Jiang, Towards training set reduction for bug triage, in Proc. 35th Annu. IEEE Int. Comput. Soft. Appl. Conf., Jul. 2011, pp. 576-581.