IJSART - Volume 2 Issue 2 FEBRUARY 2016 ISSN [ONLINE]: 2395-1052

Using Data Reduction Techniques for Effective Bug Triage

Shanthipriya. D¹, Deepa. K²
¹Department of Computer Science & Engineering
²Department of Information Technology
¹,²Sri Ramakrishna Engineering College, Coimbatore 641 022, Tamil Nadu, India

Abstract- An automatic bug triage process is an indispensable step in fixing software bugs. To decrease the manual effort and time cost, text classification techniques are applied to perform automatic bug triage. The main goal of bug triage software is to assign potentially experienced developers to newly arriving bug reports. Existing bug triage approaches suffer from large-scale and low-quality bug data. The proposed system employs a combination of a feature selection algorithm (FS) and an instance selection algorithm (IS) for bug triage. These data reduction techniques are used to shrink the bug data and to enhance accuracy. The performance of the proposed system is evaluated using the Mozilla bug data set. To show its effectiveness, the scale of the bug data is reduced to cut the manual and time cost, and the accuracy of bug triage is improved with standard bug data in software maintenance.

Keywords- Bug triage, Feature selection, Instance selection, Mining software repositories.

I. INTRODUCTION

A software bug is an issue that causes a program to crash or produce unacceptable output. The problem is caused by inadequate or invalid logic. A bug can be an error, mistake, flaw or fault, which may cause a crash or a deviation from the expected results. Most bugs are due to human errors in source code or its design. A program is said to be buggy when it includes a large number of bugs, which affect program functionality and cause erroneous results [5]. The details of bugs are stored in a large database called a bug repository or bug tracking system [2]. An open source bug repository such as Bugzilla [2] is employed by many large software companies for open source projects, e.g., Mozilla [9]. To solve real-world engineering issues, data mining methods [16] are used to extract useful information accumulated in bug repositories. Bug triage is the process of assigning a relevant developer to each bug report in order to fix it [19].

Due to the large number of daily bugs and the lack of a person with expertise in all of them, manual triage is expensive in time cost and labor cost, and low in precision. To overcome the limitations of existing work, an automatic bug triage approach is proposed [17]. This approach applies text classification techniques to predict the appropriate developer for bug reports without tossing (reassignment).

In this paper, we propose data reduction techniques that combine the instance selection algorithm (IS) [4] with the feature selection algorithm (FS) [11, 12]. These approaches are used to shrink the data scale and also to improve the accuracy of the bug data set. The order of applying the reduction techniques may affect the outcome of the bug triage approach. To determine the order of the bug data reduction techniques, i.e., FS to IS or IS to FS, we propose a predictive model [17]. The text classification technique, i.e., Naive Bayes, is used to predict the correct developer to solve and fix the bug reports [21] in order to reduce the manual triage cost. The performance of the proposed system is verified using the Mozilla bug data set [9], on which it obtains 78% accuracy after training set reduction. The outcome shows that experiments on the reduced training sets can obtain better accuracy than those on the original training set.

The remainder of this paper is organized as follows: Section II lists the related work. Section III describes the need for the bug triage process. Section IV explains the proposed methodology. Section V explains the experimental results and discussion. In Section VI we briefly conclude this paper and present our future work.

II. RELATED WORKS

To our knowledge, no existing work in the literature combines data reduction methods in order to decrease the data scale and improve the accuracy of the bug triage approach.

Jeong, Kim, and Zimmermann introduced a tossing graph model based on the Markov property, derived from the notion of reassigning bug reports to other developers [6]. Shivaji and colleagues [12] proposed feature selection techniques to predict software bugs. Anvik, Hiew, and Murphy [1] extend the machine learning approaches. They describe bug triage as a semi-supervised approach that is updated with a weighted recommendation list; based on a probabilistic view, the relevant developers are recommended to the human triager [4,
13]. Cubranic and Murphy [3] proposed a supervised learning method (a Naive Bayes classifier) to assist in bug triage by using text categorization to predict the relevant developers. A classification model should be designed to investigate the relationships among the data in the bug data set and to check its quality [20, 17].

Fu, Zhu, and Li [4] investigated how to obtain an accurate prediction model at minimum cost by labelling the most informative instances. In contrast to these papers, our paper aims to employ the information gain algorithm to improve the value of the bug data used for prediction. In this paper, we focus on the issues of bug data reduction and the low precision of the bug data set. Further, the combination of feature selection and instance selection algorithms is intended to shrink the bug data set and improve the performance of bug triage with high-quality bug data in software maintenance and improvement.

III. NEED FOR BUG TRIAGE PROCESS

Bug triage is an important part of the bug fixing process, in which relevant developers are assigned to newly arriving bugs. Fig.1 represents the bug triage process [19]. Some of the steps involved in the bug triage process are
1. Find bugs to triage
2. Pre-filter bug reports
3. Search for duplicates of bugs
4. Check information provided in bug report
5. Attempt to reproduce bug
6. Set bug status
7. Prioritize bug
8. Notify developers; needed only in very specific cases, if the bug seems to be a blocker / critical.

Fig.1 Bug Triage Process

IV. PROPOSED METHODOLOGY

Fig.2 illustrates the system architecture of the proposed system. The Mozilla bug data are taken from an open source bug repository, i.e., Bugzilla. This open source bug repository contains all information about the software bugs. Each bug has the bug statement and the details of the developer who worked on that particular bug. The bug details may be divided into two parts: summary and description. The proposed system uses a bug data reduction technique, which reduces labor cost and time cost. Here, the bug data reduction method is used to prepare the content for bug triage. The proposed system pursues two main goals: first, reducing the data scale, and second, improving the accuracy of the bug data.

Instance selection and feature selection are pre-processing techniques which are used for bug data reduction.

Fig.2 The Text Categorization approach for Bug triage

For a given bug data set, instance selection is applied to find a significant subset of instances (i.e., bug reports in the bug data set), and, after or before it, feature selection is applied to find a subset of appropriate features (i.e., words in the bug data set). In the proposed system, the combination of these techniques is used.

Algorithm: Data reduction based on FSIS

Input:
training set T with n words and m bug reports,
reduction order FSIS,
final number nF of words,
final number mI of bug reports
1. apply FS to the n words of T
2. calculate objective values for all the words
3. select the top nF words of T
4. generate a training set TF
5. apply IS to the bug reports of TF, with target size mI
6. terminate IS when the number of bug reports is equal to or less than mI
7. generate the final training set TFI
Output:
reduced data set TFI for bug triage
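
The steps of the FSIS algorithm can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: bug reports are modelled as sets of words, feature selection scores each word by information gain (the criterion named in Section II), and instance selection is replaced by a placeholder heuristic that repeatedly drops the report with the fewest retained words, because the paper does not state which IS algorithm is used. The function names entropy, information_gain and reduce_fs_then_is, and the toy data, are hypothetical.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of developer labels."""
    total = len(labels)
    if total == 0:
        return 0.0
    return -sum((c / total) * math.log2(c / total)
                for c in Counter(labels).values())


def information_gain(reports, labels, word):
    """Information gain of a word's presence for predicting the developer."""
    with_word = [lab for rep, lab in zip(reports, labels) if word in rep]
    without_word = [lab for rep, lab in zip(reports, labels) if word not in rep]
    n = len(labels)
    conditional = ((len(with_word) / n) * entropy(with_word)
                   + (len(without_word) / n) * entropy(without_word))
    return entropy(labels) - conditional


def reduce_fs_then_is(reports, labels, n_f, m_i):
    """FS -> IS reduction: keep n_f words, then at most m_i bug reports."""
    # Steps 1-2: apply FS, i.e. compute an objective value for every word.
    vocabulary = sorted({word for rep in reports for word in rep})
    scores = {word: information_gain(reports, labels, word)
              for word in vocabulary}
    # Step 3: select the top n_f words (ties broken alphabetically).
    ranked = sorted(vocabulary, key=lambda word: (-scores[word], word))
    kept_words = set(ranked[:n_f])
    # Step 4: generate the training set TF restricted to the kept words.
    t_f = [(rep & kept_words, lab) for rep, lab in zip(reports, labels)]
    # Steps 5-6: placeholder IS (an assumption, not the paper's method):
    # drop the report with the fewest retained words until size <= m_i.
    t_fi = list(t_f)
    while len(t_fi) > m_i:
        sparsest = min(range(len(t_fi)), key=lambda i: len(t_fi[i][0]))
        del t_fi[sparsest]
    # Step 7: the final reduced training set TFI.
    return kept_words, t_fi


# Toy training set: each report is a set of words, each label a developer.
reports = [
    {"crash", "render", "page"},
    {"crash", "render", "font"},
    {"mail", "sync", "folder"},
    {"mail", "sync", "page"},
]
labels = ["alice", "alice", "bob", "bob"]

kept_words, reduced = reduce_fs_then_is(reports, labels, n_f=4, m_i=3)
print(sorted(kept_words), len(reduced))
```

Running the two stages in the opposite order (IS to FS) would only require swapping the two halves of reduce_fs_then_is, which is why the choice of order can be treated as a separate prediction problem.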

By applying these techniques, the bug data scale is reduced and the performance of the bug triage approach is improved. A predictive model is proposed in order to predict the correct order in which to shrink the bug data set. By employing this model, the FS to IS or IS to FS order can be predicted without any complication. The text classification approach, i.e., Naive Bayes, is used to predict the correct developer for the bug. C4.5 AdaBoost is used to calculate the precision and recall, and to balance these the F-measure values are calculated. The accuracy on the Mozilla bug data set is calculated as 78%, obtained while reducing the data scale and improving the performance of the bug triage approach.

V. RESULTS AND DISCUSSION

The performance on the bug data set is measured using both training and test bug data sets. For this, the attributes of each training and test bug data set are calculated. The attributes are named B1 to B10 for the bug data set details and D1 to D8 for the developer details. The pre-processing techniques for data reduction, i.e., feature selection and instance selection, are applied to the training bug data set.

Fig.3 Comparison result between original and reduced bug data set (Training bug data Set)

The training data set contains 40 bug records, which give complete information about the bug data stored in a large database, i.e., Bugzilla. Fig.3 illustrates the comparison between the original bug data set and the reduced bug data set.

Fig.4 Comparison Graph for Precision, Recall and F-measure

The classifier, i.e., Naive Bayes, is trained on the training data set with its data reduction order. Then, the classifier is used to predict the correct order for the test data set and reduce the labor cost. In this way, the performance of the bug triage approach is improved. Fig.4 shows that the precision, recall, and F-measure values for the Mozilla bug data set are 0.667, 0.737 and 0.70. The accuracy is measured as 78% by using the Naive Bayes classifier for the training data set.

VI. CONCLUSION

Bug triage is an important and significant step of software maintenance in terms of both labor cost and time cost. The proposed method combines the feature selection algorithm (FS) with the instance selection algorithm (IS) in order to reduce the scale of bug data sets as well as improve the data quality. A predictive model is used to determine the order of applying the reduction techniques, i.e., FS to IS or IS to FS. The performance of the proposed system is verified using the Mozilla bug data set. To demonstrate its value, the scale of the data set is reduced by using data reduction techniques in order to reduce the time and labor cost, and the precision of bug triage is improved with high-quality bug data in software development and maintenance.

Future work on the proposed system is to improve the results of data reduction in bug triage, to investigate how to prepare a high-quality bug data set, and to deal with domain-specific software tasks. For predicting reduction orders, we aim to explore the possible relationship between the attributes of bug data sets and the reduction orders.

REFERENCES

[1] J. Anvik, L. Hiew, and G. C. Murphy, "Who should fix this bug?" in Proc. 28th Int. Conf. Softw. Eng., May 2006, pp. 361–370.

[2] Bugzilla, (2015). [Online]. Available: http://bugzilla.org/

[3] D. Cubranic and G. C. Murphy, "Automatic bug triage using text categorization," in Proc. 16th Int. Conf. Softw. Eng. Knowl. Eng., Jun. 2004, pp. 92–97.

[4] Y. Fu, X. Zhu, and B. Li, "A survey on instance selection for active learning," Knowl. Inform. Syst., vol. 35, no. 2, pp. 249–283, 2013.

[5] https://www.techopedia.com/definition/24864/software-bug

[6] G. Jeong, S. Kim, and T. Zimmermann, "Improving bug triage with tossing graphs," in Proc. Joint Meeting 12th
Eur. Softw. Eng. Conf. 17th ACM SIGSOFT Symp. Found. Softw. Eng., Aug. 2009, pp. 111–120.

[7] S. Kim, H. Zhang, R. Wu, and L. Gong, "Dealing with noise in defect prediction," in Proc. 32nd ACM/IEEE Int. Conf. Softw. Eng., May 2010, pp. 481–490.

[8] D. Matter, A. Kuhn, and O. Nierstrasz, "Assigning bug reports using a vocabulary-based expertise model of developers," in Proc. 6th Int. Working Conf. Mining Softw. Repositories, May 2009, pp. 131–140.

[9] Mozilla. (2015). [Online]. Available: http://mozilla.org/

[10] E. Murphy-Hill, T. Zimmermann, C. Bird, and N. Nagappan, "The design of bug fixes," in Proc. Int. Conf. Softw. Eng., 2013, pp. 332–341.

[11] M. Rogati and Y. Yang, "High-performing feature selection for text classification," in Proc. 11th Int. Conf. Inform. Knowl. Manag., Nov. 2002, pp. 659–661.

[12] S. Shivaji, E. J. Whitehead, Jr., R. Akella, and S. Kim, "Reducing features to improve code change based bug prediction," IEEE Trans. Soft. Eng., vol. 39, no. 4, pp. 552–569, Apr. 2013.

[13] Sun, D. Lo, S. C. Khoo, and J. Jiang, "Towards more accurate retrieval of duplicate bug reports," in Proc. 26th IEEE/ACM Int. Conf. Automated Softw. Eng., 2011, pp. 253–262.

[14] X. Wang, L. Zhang, T. Xie, J. Anvik, and J. Sun, "An approach to detecting duplicate bug reports using natural language and execution information," in Proc. 30th Int. Conf. Softw. Eng., May 2008, pp. 461–470.

[15] D. R. Wilson and T. R. Martínez, "Reduction techniques for instance-based learning algorithms," Mach. Learn., vol. 38, pp. 257–286, 2000.

[16] T. Xie, S. Thummalapenta, D. Lo, and C. Liu, "Data mining for software engineering," Comput., vol. 42, no. 8, pp. 55–62, Aug. 2009.

[17] J. Xuan, H. Jiang, Y. Hu, Z. Ren, Z. Luo, W. Zou, and X. Wu, "Towards effective bug triage with software data reduction techniques," IEEE Trans. on Knowl. and Data Eng., vol. 27, no. 1, Jan. 2015.

[18] J. Xuan, H. Jiang, Z. Ren, and W. Zou, "Developer prioritization in bug repositories," in Proc. 34th Int. Conf. Softw. Eng., 2012, pp. 25–35.

[19] J. Xuan, H. Jiang, Z. Ren, J. Yan, and Z. Luo, "Automatic bug triage using semi-supervised text classification," in Proc. 22nd Int. Conf. Softw. Eng. Knowl. Eng., Jul. 2010, pp. 209–214.

[20] T. Zimmermann, N. Nagappan, P. J. Guo, and B. Murphy, "Characterizing and predicting which bugs get reopened," in Proc. 34th Int. Conf. Softw. Eng., Jun. 2012, pp. 1074–1083.

[21] W. Zou, Y. Hu, J. Xuan, and H. Jiang, "Towards training set reduction for bug triage," in Proc. 35th Annu. IEEE Int. Comput. Soft. Appl. Conf., Jul. 2011, pp. 576–581.
