Software Defect Prediction Based on Classification Rule Mining
Dissertation submitted in
May 2013
to the department of
Computer Science and Engineering
of
National Institute of Technology Rourkela
in partial fulfillment of the requirements
for the degree of
Master of Technology
by
Dulal Chandra Sahana
(Roll 211CS3299)
under the supervision of
Prof. Korra Sathya Babu
Certificate
This is to certify that the work in the thesis entitled Software Defect Prediction
Based on Classification Rule Mining by Dulal Chandra Sahana, bearing roll number
211CS3299, is a record of an original research work carried out by him under my
supervision and guidance in partial fulfillment of the requirements for the award of
the degree of Master of Technology in Computer Science and Engineering. Neither
this thesis nor any part of it has been submitted for any degree or academic award
elsewhere.
Acknowledgement
I am grateful to the numerous local and global peers who have contributed towards
shaping this thesis. At the outset, I would like to express my sincere thanks to Prof.
K. Sathya Babu for his advice during my thesis work. As my supervisor, he has
constantly encouraged me to remain focused on achieving my goal. His observations
and comments helped me to establish the overall direction of the research and to move
forward with the investigation in depth. He has helped me greatly and been a source of
knowledge.
I am very much indebted to Prof. Ashok Kumar Turuk, Head-CSE, for his
continuous encouragement and support. He is always ready to help with a smile. I
am also thankful to all the professors of the department for their support.
I must acknowledge the academic resources that I have received from NIT Rourkela.
I would like to thank the administrative and technical staff members of the department,
who have been kind enough to advise and help in their respective roles.
Last, but not the least, I would like to dedicate this thesis to my family, for
their love, patience, and understanding.
Certificate ii
Acknowledgement iii
Abstract iv
List of Tables ix
1 Introduction 1
1.1 Introduction to Software Defect Prediction . . . . . . . . . . . . . . . 1
1.2 Motivation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2
1.3 Objective . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2
1.4 Structure of This Thesis . . . . . . . . . . . . . . . . . . . . . . . . . 3
2.5.3 Logistic Regression . . . . . . . . . . . . . . . . . . . . . . . . 9
2.5.4 Decision Tree classification . . . . . . . . . . . . . . . . . . . . 10
2.6 Related Works . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 10
2.6.1 Regression via classification . . . . . . . . . . . . . . . . . . . 10
2.6.2 Static Code Attribute . . . . . . . . . . . . . . . . . . . . . . 11
2.6.3 ANN . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 11
2.6.4 Embedded software defect prediction . . . . . . . . . . . . . . . 11
2.6.5 Association rule classification . . . . . . . . . . . . . . . . . . 11
2.6.6 Defect-proneness Prediction framework . . . . . . . . . . . . . 12
3 Proposed Scheme 13
3.1 Overview Of the Framework . . . . . . . . . . . . . . . . . . . . . . . 13
3.2 Scheme Evaluation . . . . . . . . . . . . . . . . . . . . . . . . . . . . 15
3.3 Scheme Evaluation Algorithm . . . . . . . . . . . . . . . . . . . . . 17
3.4 Defect prediction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 17
3.5 Difference between Our Framework and Others . . . . . . . . . . . . 18
3.6 Data Set . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 18
3.7 Performance Measurement . . . . . . . . . . . . . . . . . . . . . . . . 19
4 Result Discussion 22
4.1 Accuracy . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 23
4.2 Sensitivity . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 24
4.3 Specificity . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 25
4.4 Balance . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 26
4.5 ROC Area . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 27
4.6 Comparison with others' results . . . . . . . . . . . . . . . . . . . . 28
5 Conclusion 32
5.1 Concluding Remarks . . . . . . . . . . . . . . . . . . . . . . . . . . . 32
5.2 Scope for Further Research . . . . . . . . . . . . . . . . . . . . . . . . 33
Bibliography 34
List of Figures
4.1 Balance . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 27
4.2 ROC Area . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 28
4.3 Accuracy . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 29
4.4 Sensitivity . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 30
4.5 Specificity . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 30
4.6 Balance . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 31
List of Tables
4.1 Accuracy . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 23
4.2 Sensitivity . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 24
4.3 Specificity . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 25
4.4 Balance . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 26
4.5 Comparative Performance (ROC Area) of software defect prediction . . 28
Chapter 1
Introduction
1.2 Motivation
Different data mining methods have been proposed for defect analysis in the past,
but few of them manage to deal successfully with all of the above issues. Estimates
from regression models are difficult to interpret, and predicting the exact number of
faults is risky, especially at the beginning of a project when too little information
is available. Classification models that predict possible faultiness, on the other hand,
can be specific but give little indication of the actual number of faults. Many
researchers have applied various techniques to different data sets to predict faultiness,
yet there are many classification rule algorithms that could still prove effective for this
task. All these issues motivate our research in the field of software fault/defect
prediction.
1.3 Objective
Keeping these research indications in view, it is clear that there is ample scope
to improve software defect prediction. In this research, the objectives are
confined to the following:
i. To utilize a novel data set filtering mechanism for effective noise removal.
Chapter 2
Background & Literature Survey
The purpose of this chapter is to establish a theoretical background for the project.
The focus of this study is on software defects and the effort spent correcting software
defects. However, it is necessary to explore research areas which influence or touch
upon software defects. Poor software quality may be manifested through severe software
defects, or software maintenance may be costly because many defects require
extensive effort to correct. Last, we explore relevant research methods for this study.
The following digital sources were consulted: ACM Digital Library, IEEE Xplore, and
Science Direct.
The second and third columns of Table 2.1 list several example data mining algorithms
and the SE tasks to which engineers apply them [1].
The main challenge is the testing phase, and practitioners seek predictors that indicate
where the defects might exist before they start testing. This allows them to allocate
their scarce resources efficiently. Defect predictors are used to produce an ordering of
modules to be inspected by verification and validation teams:
• In the case where there are insufficient resources to inspect all code (which is
a very common situation in industrial developments), defect predictors can be
used to increase the chances that the inspected code will have defects.
• In the case where all the code is to be inspected, but that inspection process will
take weeks to months to complete, defect predictors can be used to increase the
chances that defective modules will be inspected earlier. This is useful since it
gives the development team earlier notification of what modules require rework,
hence giving them more time to complete that rework prior to delivery.
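To make this ordering concrete, the sketch below is a hypothetical illustration with invented metric values and module names, not part of the thesis experiments: it ranks new modules by a learner's estimated defect probability so that verification and validation teams can inspect the most defect-prone code first.

```python
# Hypothetical sketch: order new modules for inspection by predicted defect-proneness.
# The metric values, module names, and model choice are illustrative assumptions.
from sklearn.naive_bayes import GaussianNB

# Static code attributes (here: lines of code, cyclomatic complexity) for historical
# modules, with label 1 = defective and 0 = defect-free.
X_train = [[120, 14], [30, 2], [560, 41], [75, 6], [210, 19]]
y_train = [1, 0, 1, 0, 1]

# New, not-yet-inspected modules.
new_modules = {"mod_a": [300, 25], "mod_b": [40, 3], "mod_c": [150, 12]}

model = GaussianNB().fit(X_train, y_train)

# Rank modules by estimated probability of being defective (highest first),
# so scarce inspection effort goes to the most defect-prone code.
scores = {name: model.predict_proba([feats])[0][1] for name, feats in new_modules.items()}
for name, p in sorted(scores.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{name}: estimated defect probability {p:.2f}")
```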
Example:
The posterior probability can be calculated by first constructing a frequency table
for each attribute against the target, then transforming the frequency tables
into likelihood tables, and finally using the Naive Bayes equation to calculate the
posterior probability for each class. The class with the highest posterior probability
is the outcome of the prediction.
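A minimal sketch of this frequency-table procedure is given below; the attribute values and class labels are made up purely for illustration, and add-one smoothing is an extra assumption added to avoid zero likelihoods.

```python
# Minimal Naive Bayes sketch using frequency/likelihood tables; the attribute
# values and class labels are made up, and add-one smoothing is an extra assumption.
from collections import Counter, defaultdict

# Each record: ({attribute: value, ...}, class label)
data = [
    ({"complexity": "high", "size": "large"}, "defective"),
    ({"complexity": "low",  "size": "small"}, "clean"),
    ({"complexity": "high", "size": "small"}, "defective"),
    ({"complexity": "low",  "size": "large"}, "clean"),
    ({"complexity": "low",  "size": "small"}, "clean"),
]

# Frequency tables: class counts and per-(attribute, class) value counts.
class_counts = Counter(label for _, label in data)
freq = defaultdict(Counter)                  # freq[(attr, label)][value] = count
for attrs, label in data:
    for attr, value in attrs.items():
        freq[(attr, label)][value] += 1

def posterior_scores(attrs):
    """Score each class by prior * product of per-attribute likelihoods."""
    scores = {}
    for label, n in class_counts.items():
        score = n / len(data)                # prior P(class)
        for attr, value in attrs.items():
            # Likelihood P(value | class) from the frequency table, add-one smoothed.
            score *= (freq[(attr, label)][value] + 1) / (n + 2)
        scores[label] = score
    return scores

# The class with the highest score is the predicted outcome.
print(posterior_scores({"complexity": "high", "size": "large"}))
```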
$f(t) = \frac{1}{1 + e^{-t}}$
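For illustration, the short sketch below evaluates this logistic function on a weighted sum of hypothetical module metrics; the weights and metric values are arbitrary assumptions, not fitted coefficients from any data set in this thesis.

```python
# Illustrative evaluation of the logistic function; weights and metric values are arbitrary.
import math

def logistic(t):
    """f(t) = 1 / (1 + e^(-t)) maps any real t into the interval (0, 1)."""
    return 1.0 / (1.0 + math.exp(-t))

# A linear combination of hypothetical module metrics: intercept + weighted features.
weights = [-2.0, 0.004, 0.15]   # [intercept, weight for LOC, weight for complexity]
module  = [1.0, 450, 12]        # [bias term, lines of code, cyclomatic complexity]
t = sum(w * x for w, x in zip(weights, module))

print(f"t = {t:.2f}, estimated defect probability = {logistic(t):.2f}")
```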
Many decision tree algorithms have been developed for classification, differing in certain
details. Some of them, such as BFTree, C4.8/J48, J48Graft, and SimpleCart,
are very popular.
2.6.3 ANN
In 2007, Iker Gondra [6] used a machine learning method for defect prediction,
employing an artificial neural network as the machine learner.
Chapter 3
Proposed Scheme
for evaluating the performance of the learners. It is very important that the test
data are not used in any way to build the learners. This is a necessary condition
to assess the generalization ability of a learner that is built according to a learning
scheme, and to further determine whether to apply the learning scheme or
to select the best scheme from among the given schemes.
At the defect prediction stage, according to the performance report of the first
stage, a learning scheme is selected and used to build a prediction model and predict
software defects. From Fig. 3.1, we observe that all of the historical data are used to
build the predictor here. This is very different from the first stage and is very useful
for improving the generalization ability of the predictor. After the predictor is built,
it can be used to predict the defect-proneness of new software components.
MGF [5] proposed a baseline experiment and reported the performance of the
Naive Bayes data miner with log-filtering as well as attribute selection, which
performed the scheme evaluation but with inappropriate data. This is because
they used both the training (which can be viewed as historical data) and test (which
can be viewed as new data) data to rank attributes, while the labels of the new data
are unavailable when choosing attributes in practice.
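This point can be illustrated with a small sketch, a scikit-learn analogue on randomly generated placeholder data rather than the WEKA setup used in this work: by placing the attribute selector inside a pipeline, it is re-fitted on each training fold during cross-validation and never sees the labels of the held-out data.

```python
# Sketch: attribute selection placed inside the cross-validation training fold only.
# The data are random placeholders, not the NASA MDP metrics used in the thesis.
import numpy as np
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.model_selection import cross_val_score
from sklearn.naive_bayes import GaussianNB
from sklearn.pipeline import Pipeline

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 20))        # 200 modules, 20 static code attributes
y = rng.integers(0, 2, size=200)      # 1 = defective, 0 = defect-free

# Because the selector is part of the pipeline, every CV split ranks attributes
# using only that split's training data; the held-out labels are never consulted.
scheme = Pipeline([
    ("select", SelectKBest(score_func=f_classif, k=5)),
    ("learn", GaussianNB()),
])

scores = cross_val_score(scheme, X, y, cv=10, scoring="roc_auc")
print("mean AUC over 10 folds:", scores.mean())
```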
1. A data preprocessor
• The training data are preprocessed, such as removing outliers, handling missing
values, and discretizing or transforming numeric attributes.
2. An attribute selector
• Here we have considered all the attributes provided by the NASA MDP Data
Set.
3. Learning Algorithms
– Logistic classification
– DecisionTable
– OneR
– JRip
– PART
– J48
– J48Graft
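The thesis evaluates these learners in WEKA; purely as an illustrative analogue, the sketch below compares a few stand-in classifiers (scikit-learn has no direct OneR, JRip, PART, or DecisionTable equivalents) under a single cross-validation protocol on placeholder data.

```python
# Rough analogue (not the WEKA setup used in the thesis) of comparing several
# learning schemes under a single cross-validation protocol on placeholder data.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.naive_bayes import GaussianNB
from sklearn.tree import DecisionTreeClassifier

rng = np.random.default_rng(1)
X = rng.normal(size=(300, 21))        # placeholder for NASA MDP static code attributes
y = rng.integers(0, 2, size=300)      # placeholder defect labels

schemes = {
    "Logistic": LogisticRegression(max_iter=1000),
    "NaiveBayes": GaussianNB(),
    "DecisionTree (J48-like)": DecisionTreeClassifier(random_state=0),
}

for name, clf in schemes.items():
    auc = cross_val_score(clf, X, y, cv=10, scoring="roc_auc").mean()
    print(f"{name}: mean AUC = {auc:.3f}")
```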
• Accuracy = $\frac{TP + TN}{TP + FP + TN + FN} = \frac{TruePositive + TrueNegative}{TruePositive + FalsePositive + TrueNegative + FalseNegative}$
• Specificity = $\frac{TN}{FP + TN}$
Formal definitions for pd and pf are given in the formulas. Obviously, higher
pd and lower pf are desired. The point (pd = 1, pf = 0) is the ideal position,
where we recognize all defective modules and never make mistakes.
MGF introduced a performance measure called balance, which is used to choose
the optimal (pd, pf) pairs. The definition is shown below, from which we can
see that it is one minus the normalized Euclidean distance from the desired
point (pf = 0, pd = 1) to the actual (pf, pd) point in the ROC curve.
• Balance = $1 - \frac{\sqrt{(1 - pd)^2 + (0 - pf)^2}}{\sqrt{2}}$
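As a worked example of these measures (the confusion-matrix counts below are invented for illustration only), the following sketch computes accuracy, pd, pf, specificity, and balance:

```python
# Worked example of the measures defined above; the confusion-matrix counts are invented.
import math

TP, FN = 40, 10     # defective modules: correctly flagged / missed
TN, FP = 120, 30    # defect-free modules: correctly passed / falsely flagged

accuracy    = (TP + TN) / (TP + FP + TN + FN)
pd          = TP / (TP + FN)              # probability of detection (sensitivity)
pf          = FP / (FP + TN)              # probability of false alarm
specificity = TN / (FP + TN)              # equals 1 - pf
balance     = 1 - math.sqrt((1 - pd) ** 2 + (0 - pf) ** 2) / math.sqrt(2)

print(f"accuracy={accuracy:.3f} pd={pd:.3f} pf={pf:.3f} "
      f"specificity={specificity:.3f} balance={balance:.3f}")
```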
The Area Under the ROC Curve (AUC) is often calculated to compare different ROC
curves. Higher AUC values indicate that the classifier lies, on average, closer to the
upper-left region of the graph. AUC is among the most informative and commonly used
measures, and is thus used as another performance measure in this thesis.
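A minimal sketch of computing AUC from a classifier's predicted defect probabilities follows; the labels and scores are invented for illustration.

```python
# Minimal AUC computation from predicted defect probabilities; values are invented.
from sklearn.metrics import roc_auc_score

y_true  = [1, 0, 1, 1, 0, 0, 1, 0]                   # 1 = defective module
y_score = [0.9, 0.2, 0.7, 0.4, 0.3, 0.6, 0.8, 0.1]   # predicted defect probabilities
print("AUC:", roc_auc_score(y_true, y_score))
```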
Chapter 4
Result Discussion
4.1 Accuracy
From the accuracy table (Table 4.1) we can see that different algorithms give different
accuracy on different data sets, but the average performance is nearly the same.
For the storage management software (KC1-KC3), LOG and J48G give better accuracy values.
For the database software written in the C programming language (MW1), only PART gives
a better accuracy value.
The performance graph is given in Figure 4.3.
4.2 Sensitivity
From the sensitivity table (Table 4.2) we see that the NB algorithm gives better
performance on most data sets.
DecisionTable sometimes gives zero sensitivity, which means it classifies every module
as negative (non-defective); it cannot be considered for defect prediction.
The LOG, OneR, PART, J48, and J48G algorithms give average performance.
4.3 Specificity
From the specificity table (Table 4.3) we can see that some of the algorithms give
100 percent specificity; this cannot be taken as good performance, because their
corresponding sensitivity is zero, and such algorithms can give wrong predictions.
So, according to the sensitivity and specificity results, the DecisionTable algorithm
should not be considered for software defect prediction, since it gives 100% specificity
but 0% sensitivity.
4.4 Balance
Looking at the accuracy, sensitivity, and specificity performance tables, we consider
NB, LOG, JRip, OneR, PART, J48, and J48G, as their performance is average.
From the graph in Figure 4.1 we see that, in most cases, the OneR algorithm gives a
lower balance value than the others, so it need not be used for defect prediction.
• NaiveBayesSimple
• Logistic
• JRip
• PART
Methods  CM1    JM1    KC1    KC3    MC1    MC2    MW1    PC1    PC2    PC3    PC4    PC5
NB       0.685  0.681  0.801  0.745  0.861  0.745  0.666  0.736  0.846  0.793  0.84   0.804
Log      0.668  0.709  0.808  0.604  0.893  0.686  0.592  0.821  0.7    0.802  0.911  0.958
JRip     0.572  0.562  0.633  0.527  0.58   0.5    0.561  0.561  0.499  0.589  0.735  0.755
PART     0.492  0.713  0.709  0.612  0.773  0.639  0.611  0.566  0.481  0.728  0.821  0.942
J48      0.537  0.67   0.698  0.572  0.819  0.259  0.5    0.646  0.39   0.727  0.784  0.775
J48G     0.543  0.666  0.698  0.587  0.819  0.274  0.5    0.651  0.39   0.738  0.778  0.775
• In 2007, MGF considered only 10 data sets, whereas in our research we used
12 data sets with more modules in every data set. In our results, the balance
values are also greater than theirs.
• In other works, different machine learning algorithms are used. In our research
Chapter 5
Conclusion
J48G)
Bibliography
[1] Tao Xie, Suresh Thummalapenta, David Lo, and Chao Liu. Data mining for software
engineering. Computer, 42(8):55–62, 2009.
[2] Qinbao Song, Zihan Jia, Martin Shepperd, Shi Ying, and Jin Liu. A general software
defect-proneness prediction framework. Software Engineering, IEEE Transactions on,
37(3):356–370, 2011.
[3] Ma Baojun, Karel Dejaeger, Jan Vanthienen, and Bart Baesens. Software defect prediction
based on association rule classification. Available at SSRN 1785381, 2011.
[4] S Bibi, G Tsoumakas, I Stamelos, and I Vlahavas. Software defect prediction using regression
via classification. In IEEE International Conference on Computer Systems and Applications (AICCSA), pages 330–336, 2006.
[5] Tim Menzies, Jeremy Greenwald, and Art Frank. Data mining static code attributes to learn
defect predictors. Software Engineering, IEEE Transactions on, 33(1):2–13, 2007.
[6] Iker Gondra. Applying machine learning to software fault-proneness prediction. Journal of
Systems and Software, 81(2):186–195, 2008.
[7] Ataç Deniz Oral and Ayşe Başar Bener. Defect prediction for embedded software. In Computer
and information sciences, 2007. iscis 2007. 22nd international symposium on, pages 1–6.
IEEE, 2007.
[8] Yuan Chen, Xiang-heng Shen, Peng Du, and Bing Ge. Research on software defect prediction
based on data mining. In Computer and Automation Engineering (ICCAE), 2010 The 2nd
International Conference on, volume 1, pages 563–567. IEEE, 2010.
[9] Martin Shepperd, Qinbao Song, Zhongbin Sun, and Carolyn Mair. Data quality: Some
comments on the nasa software defect data sets. 2013.
[10] Stefan Lessmann, Bart Baesens, Christophe Mues, and Swantje Pietsch. Benchmarking
classification models for software defect prediction: A proposed framework and novel findings.
Software Engineering, IEEE Transactions on, 34(4):485–496, 2008.
[11] Yue Jiang, Bojan Cukic, and Tim Menzies. Fault prediction using early lifecycle data. In
Software Reliability, 2007. ISSRE’07. The 18th IEEE International Symposium on, pages
237–246. IEEE, 2007.
[12] Yue Jiang, Bojan Cuki, Tim Menzies, and Nick Bartlow. Comparing design and code metrics
for software quality prediction. In Proceedings of the 4th international workshop on Predictor
models in software engineering, pages 11–18. ACM, 2008.
[13] Hongyu Zhang, Xiuzhen Zhang, and Ming Gu. Predicting defective software components from
code complexity measures. In Dependable Computing, 2007. PRDC 2007. 13th Pacific Rim
International Symposium on, pages 93–96. IEEE, 2007.
[14] Gustavo EAPA Batista, Ronaldo C Prati, and Maria Carolina Monard. A study of the
behavior of several methods for balancing machine learning training data. ACM SIGKDD
Explorations Newsletter, 6(1):20–29, 2004.
[15] Charles E Metz, Benjamin A Herman, and Jong-Her Shen. Maximum likelihood estimation
of receiver operating characteristic (roc) curves from continuously-distributed data. Statistics
in medicine, 17(9):1033–1053, 1998.
[16] Qinbao Song, Martin Shepperd, Michelle Cartwright, and Carolyn Mair. Software defect
association mining and defect correction effort prediction. Software Engineering, IEEE
Transactions on, 32(2):69–82, 2006.
[17] Norman E. Fenton and Martin Neil. A critique of software defect prediction models. Software
Engineering, IEEE Transactions on, 25(5):675–689, 1999.
[18] Naeem Seliya and Taghi M Khoshgoftaar. Software quality estimation with limited fault data:
a semi-supervised learning perspective. Software Quality Journal, 15(3):327–344, 2007.
[19] Frank Padberg, Thomas Ragg, and Ralf Schoknecht. Using machine learning for estimating the
defect content after an inspection. Software Engineering, IEEE Transactions on, 30(1):17–28,
2004.
[20] Venkata UB Challagulla, Farokh B Bastani, I-Ling Yen, and Raymond A Paul. Empirical
assessment of machine learning based software defect prediction techniques. In Object-Oriented
Real-Time Dependable Systems, 2005. WORDS 2005. 10th IEEE International Workshop on,
pages 263–270. IEEE, 2005.
[21] Norman Fenton, Paul Krause, and Martin Neil. A probabilistic model for software defect
prediction. IEEE Trans Software Eng, 2001.
[22] Raimund Moser, Witold Pedrycz, and Giancarlo Succi. A comparative analysis of the efficiency
of change metrics and static code attributes for defect prediction. In Software Engineering,
2008. ICSE’08. ACM/IEEE 30th International Conference on, pages 181–190. IEEE, 2008.
[23] Ganesh J Pai and Joanne Bechta Dugan. Empirical analysis of software fault content
and fault proneness using bayesian methods. Software Engineering, IEEE Transactions on,
33(10):675–686, 2007.
[24] Giovanni Denaro, Sandro Morasca, and Mauro Pezzè. Deriving models of software
fault-proneness. In Proceedings of the 14th international conference on Software engineering
and knowledge engineering, pages 361–368. ACM, 2002.
[25] Ling-Feng Zhang and Zhao-Wei Shang. Classifying feature description for software defect
prediction. In Wavelet Analysis and Pattern Recognition (ICWAPR), 2011 International
Conference on, pages 138–143. IEEE, 2011.
[26] Mark Hall, Eibe Frank, Geoffrey Holmes, Bernhard Pfahringer, Peter Reutemann, and Ian H
Witten. The weka data mining software: an update. ACM SIGKDD Explorations Newsletter,
11(1):10–18, 2009.
[27] DMW Powers. Evaluation: From precision, recall and f-measure to roc, informedness,
markedness & correlation. Journal of Machine Learning Technologies, 2(1):37–63, 2011.
[28] Mark H Zweig and Gregory Campbell. Receiver-operating characteristic (roc) plots: a
fundamental evaluation tool in clinical medicine. Clinical chemistry, 39(4):561–577, 1993.