Software Bug Detection Using Data Mining
1 author:
Saurabh Pal
Veer Bahadur Singh Purvanchal University
All content following this page was uploaded by Saurabh Pal on 22 April 2015.
General Terms
Data Mining, Classification Algorithms, Software Bug, Weka Tool.
1.2 Classification
Tiwari and Chaudhary [2] introduced classification, which divides
data samples into target classes. Classification relies on a training
set in which samples of the same class share a common class label.
Software project development involves several types of bugs: SW-bug,
document bug, duplicate bug, and mistaken bug. In the training set,
these bug classes are grouped under a common class of data objects
known as software defects.
International Journal of Computer Applications (0975 – 8887)
Volume 115 – No. 15, April 2015
1. A decision tree is a flowchart-like tree structure in which each
   internal node denotes a test on an attribute.
2. Each branch represents an outcome of the test.
3. Decision tree generation consists of two phases.

1.5 Bayes Rule
Tiwari and Chaudhary [2] introduced the Bayes rule, which involves an
event and its supporting evidence. Two cases arise:
1. The event is considered before any supporting evidence is
   observed, with probability P(H).
2. The event is considered together with supporting evidence, with
   probability P(H|E).

Let H be the event of SW-bug and E be the evidence of software defect;
then we have

P(SW-bug | software defect) = P(software defect | SW-bug) * P(SW-bug) / P(software defect)

The Naïve Bayes classification algorithm is mainly used for
high-dimensional input. As in the above example, we can predict the
outcome of an event by observing some evidence. Generally, it is
better to have more than one piece of evidence to support the
prediction of an event.

2. RELATED WORK
Shepperd, Schofield and Kitchenham [4] discussed the need for cost
estimation in management and software development organizations,
presented the idea of prediction, and discussed methods for
estimation.

Alsmadi and Magel [5] discussed how data mining helps assess the
quality, cost, and complexity of a new software project, and how it
builds a channel between data mining and software engineering.

Boehm, Clark, Horowitz, Madachy, Shelby and Westland [6] discussed
that some software companies suffer from accuracy problems that depend
on their data sets; after prediction, a software company gains new
ideas for specifying the project cost and schedule and determining the
staff timetable.

Pal and Pal [7] conducted a study on student performance by selecting
200 students from a BCA course. Using ID3, C4.5, and Bagging, they
found that SSG, HSG, Focc, Fqual, and FAIn were highly correlated with
student academic performance.

K. Ribu [8] discussed the need to analyze open-source code projects by
prediction and to estimate object-oriented software projects with a
case model.

Nagwani and Verma [9] discussed the prediction of software defects
(bugs), the duration of similar bugs, and the bug average across all
software summaries using data mining, and also discussed software
bugs.

Hassan [10] discussed that complex data sources (audio, video, text,
etc.) need more buffer space for processing, which general buffer
sizes and lengths do not support.

Chaurasia and Pal [11, 12] conducted a study on the prediction of
heart attack risk levels from a heart disease database with data
mining techniques such as Naïve Bayes, J48 decision tree, and Bagging,
as well as CART, ID3, and Decision Table. The outcome shows that the
bagging technique is more accurate than Bayesian classification and
J48.

Li and Reformate [13] discussed that software configuration management
is a system that includes documents, software code, status accounting,
design models, and defect tracking, and also includes revision data.

Elcan [14] discussed accurate cost estimation with the COCOMO model.
Cost estimation has many aspects because project development involves
many variables, so COCOMO measures in terms of effort and metrics.

Chang and Chu [15] discussed using the association rules of data
mining to discover patterns in large databases and their variables, as
well as the relations between them.

Kotsiantis and Kanellopoulos [16] discussed high-severity defects in
software project development, and also discussed how patterns
facilitate prediction and how association rules reduce the number of
passes over the database.

Pannurat, N. Kerdprasop and K. Kerdprasop [17] discussed that
association rules reveal the relationships within large datasets, such
as the huge amount of software project terms and cost records, and are
helpful in the process of project development.

Fayyad, Piatetsky-Shapiro, Smyth and Uthurusamy [18] discussed that
classification creates a relationship, or map, between data items and
predefined classes.

Pal [19] conducted a study on the student dropout rate by selecting
1650 students from different branches of an engineering college. The
study found that the student’s dropout rate in the engineering exam,
high school grade, senior secondary exam grade, family annual income,
and mother’s occupation were highly correlated with student academic
performance.

Shtern and Vassillios [20] discussed that in clustering analysis
similar objects are placed in the same cluster, and attributes are
sorted into groups so that the variation between clusters is maximized
relative to the variation within clusters.

Runeson and Nyholm [21] discussed that duplication is a
language-independent problem: the same problem appears again and again
in other problem reports during software development, and duplication
is addressed using natural language processing with data mining.

Vishal and Gurpreet [22] discussed that data mining analyzes
information and uncovers hidden information from text in software
project development.

Lovedeep and Arti [23] discussed that data mining provides a specific
platform for software engineering in which many tasks run easily with
the best quality while reducing cost and high-profile problems.

Yadav and Pal [24] conducted a study using classification trees to
predict student academic performance from students’ gender, admission
type, previous school marks, medium of teaching, location of living,
accommodation type, father’s qualification, mother’s qualification,
father’s occupation, mother’s occupation, family annual income, and so
on. They achieved around 62.22%, 62.22%, and 67.77% overall prediction
accuracy using the ID3, CART, and C4.5 decision tree algorithms,
respectively.

The present study proposes classification to obtain better rules and
to decrease the error rate as much as possible. Several approaches are
used for SW-bug detection from a software defect origin, using
different data mining techniques (ID3, J48, Naïve Bayes) and the Weka
tool. The aim is to obtain more accurate results by classifying all
observations of BUG.
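As a hedged illustration of the Bayes rule in Section 1.5 and of the Naïve Bayes technique named above, the following Python sketch first computes P(SW-bug | software defect) from a single piece of evidence and then combines several pieces of evidence under the naive independence assumption. All probability values and the evidence names (crash, code_change) are hypothetical assumptions chosen for illustration; they are not figures from this study, and the sketch is not the Weka implementation.

```python
def posterior(prior_h, likelihood_e_given_h, evidence_e):
    """Bayes rule: P(H|E) = P(E|H) * P(H) / P(E)."""
    if evidence_e <= 0:
        raise ValueError("P(E) must be positive")
    return likelihood_e_given_h * prior_h / evidence_e


def naive_bayes(priors, likelihoods, observed):
    """Posterior per class, assuming evidence items are independent given the class."""
    scores = {}
    for h, prior in priors.items():
        score = prior
        for e in observed:
            score *= likelihoods[h][e]
        scores[h] = score
    total = sum(scores.values())  # normalizing constant, plays the role of P(E)
    return {h: s / total for h, s in scores.items()}


# Hypothetical probabilities (assumptions, not measured values).
p_sw_bug = 0.3               # P(SW-bug)
p_defect_given_sw_bug = 0.8  # P(software defect | SW-bug)
p_defect = 0.4               # P(software defect)

p_sw_bug_given_defect = posterior(p_sw_bug, p_defect_given_sw_bug, p_defect)
print(round(p_sw_bug_given_defect, 2))

# Several pieces of evidence; class and evidence names are hypothetical.
priors = {"SW-bug": 0.3, "other": 0.7}
likelihoods = {
    "SW-bug": {"crash": 0.8, "code_change": 0.9},
    "other": {"crash": 0.2, "code_change": 0.3},
}
posteriors = naive_bayes(priors, likelihoods, ["crash", "code_change"])
print({h: round(p, 3) for h, p in posteriors.items()})
```

With these assumed numbers, adding a second piece of evidence raises the posterior for SW-bug above the single-evidence value, which matches the remark that more than one piece of evidence usually supports a prediction better.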
Dependent Variables
Class | Category of BUG classes | {0=SW-BUG, 1=DOC-BUG, 2=Change request, 3=Support, 4=Mistake, 5=Duplicate}
Fig 4: Instances classified by J48 algorithm
A software error is raised in a problem report, and all problem
reports are grouped into two categories: recoverable and
unrecoverable. In the recoverable group, an error is easily recovered
automatically by the software. A software bug tracking system, GNATS
(a tracking system by GNU), is set up on the MASC intranet to collect
and maintain all problem reports from every department of MASC. SW-bug
is an input value for the class field; an SW-bug originates from code
implementation. To classify SW-bugs, several standard data mining
tasks need to be performed: data preprocessing, clustering,
classification, and association. The database is designed using MS
Excel, MS Word 2010, and a database management system to store the
collected data. The data is arranged according to the required format
and structure and then converted to .csv (comma-delimited) format for
processing in Weka, as a file that describes a list of instances
sharing a set of attributes.
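The idea of a .csv file as a list of instances sharing a set of attributes can be sketched in Python as follows. This is a minimal sketch, not the study's actual data pipeline: the sample rows and the ReportId and Department columns are hypothetical, and only the Class codes (0 to 5) are taken from the dependent-variable table above.

```python
import csv
import io
from collections import Counter

# Class codes as listed in the dependent-variable table.
CLASS_LABELS = {
    0: "SW-BUG",
    1: "DOC-BUG",
    2: "Change request",
    3: "Support",
    4: "Mistake",
    5: "Duplicate",
}

# Hypothetical sample data; ReportId and Department are invented columns.
SAMPLE_CSV = """ReportId,Department,Class
1,A,0
2,B,1
3,A,0
4,C,5
"""


def load_instances(text):
    """Parse CSV text into a list of instances (dicts) sharing one attribute set."""
    return list(csv.DictReader(io.StringIO(text)))


def class_distribution(instances):
    """Count how many instances fall into each BUG class."""
    return Counter(CLASS_LABELS[int(row["Class"])] for row in instances)


instances = load_instances(SAMPLE_CSV)
distribution = class_distribution(instances)
print(distribution)
```

Checking the class distribution before training is a cheap way to spot a heavily skewed class field in the exported file.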
Fig 5: Instances classified by Naïve Bayes algorithm