Since then, several attempts have been made with both procedural and OO metrics to rapidly advance research in software metrics. Li & Henry (1993) in their work showed how various OO metrics can be used to measure the amount of change in the code. The amount of change made in the code is a direct reflection of the maintenance effort, and it can be predicted with software metrics. Other researchers in this area [6, 19] used further complexity measures to demonstrate the prediction of maintainability. This forms the basis of our work to predict the maintainability of a system from its metrics. A complementary study by Basili et al. (1996), conducted at the University of Maryland, investigated the capability of OO metrics as predictors of fault-prone classes. The main finding of this study is that the Chidamber and Kemerer (CK) metrics predicted defects in the early stages of the development life cycle more efficiently than traditional code metrics. Another case study, by Yu & Systä (2002) on the client side of a large industrial network service management system, also showed the usefulness of OO metrics in predicting the fault proneness of a system. There is further empirical evidence [15, 16, 17] that software metrics are effective predictors of fault-prone systems.

Machine learning has received a great deal of attention in recent years, with applications extending to any field that has a considerable amount of data available for analysis. Software engineering is an actively researched example: many software engineering (SE) activities can be formulated as learning problems and approached with machine learners [25]. Several activities of software development, ranging from requirements analysis to software testing, have been actively researched with machine learners. Software maintenance is another active area of research in SE, where researchers have proposed machine learning techniques for detecting code smells and predicting errors in programs. According to the literature, there exists a direct relationship between the metrics and the quality of a system, and several predictive models have been proposed for classifying systems into two categories: fault prone and non-fault prone.

Zimmermann et al. (2007) described one of the initial works on the usage of classifiers for predicting defects [20]. The Eclipse bug dataset was used for predicting faulty and non-faulty classes in [20]. Additionally, they annotated the data with common complexity metrics, and the prediction results were concluded to be far from perfect [20]. Elish et al. (2008) evaluated the capability of support vector machines (SVM) in predicting defect-prone software modules and compared the performance of SVM against eight machine learning techniques on NASA datasets [21]. Okutan & Yıldız (2014) investigated the capability of Bayesian networks, one of the simplest effective classifiers, to predict software quality; their study identified the most effective metrics for quality prediction and inspected them on several open source systems [22]. In addition to SVM and Bayesian networks, researchers have used ensemble learning and other alternative machine learning classifiers for identifying defects [23, 24]. It is noteworthy, however, that the majority of the literature treats defect identification as quality prediction. Although defects reflect the reliability of a system, maintainability is another important factor to be considered while assessing the quality of a software system. To the best of our knowledge, very few studies have considered both the reliability and the maintainability aspects to predict software quality. For example, Quah & Thwin presented the application of neural networks in estimating the quality of software using OO metrics [26]. That paper described software quality as a measure of reliability and maintainability, where reliability is measured in terms of defects and maintainability in terms of the changes made to the code [26].

3. Software Metrics

A software metric is a measure of some characteristic of a software system. Metrics can be used to determine the quality and performance of software. In this paper, our research objective is to analyse how different machine learning techniques perform in predicting software quality [12, 13, 14, 18]. Several papers have been published on how software metrics can be used to estimate the maintainability and reliability of a system [3, 6, 7, 8]. An empirical evaluation of several major OO metrics was performed by Gyimothy et al. (2005) to investigate their effectiveness in fault detection. This study showed the coupling between objects (CBO) measure to be the best metric for predicting fault-proneness, followed by lines of code (LOC) and lack of cohesion of methods (LCOM) [16]. Another empirical investigation, conducted by Yu et al. (2002), identified the appropriate metrics through an industrial case study. This study also described the significance of CBO, LOC and LCOM in fault detection, and further identified number of children (NOC) as a significant factor in determining defects [15]. In addition to these OO metrics, Quah et al. introduced additional metrics such as inheritance coupling (IC), weighted methods per class (WMC) and coupling between methods (CBM) [26]. Li & Henry identified metrics such as depth in inheritance tree (DIT), weighted method complexity (WMC), number of methods (NOM), response for class (RFC), message passing coupling (MPC), data abstraction coupling (DAC) and a few others to effectively estimate the maintenance effort [3]. The details of the various metrics discussed thus far are listed below:
Weighted Methods Per Class (WMC) – WMC is an OO metric that measures the complexity of a class. It is the sum of the complexities of all local methods of the class. The complexity of a method is proportional to the number of control flows it has; the greater the value of WMC, the harder it is to maintain the class [11, 26].

Lines of Code (LOC) – LOC measures the size of a program as the number of non-commented lines of source code.

Coupling Between Objects (CBO) – CBO counts the classes to which a given class is coupled; two classes are coupled if methods of one use methods or instance variables defined by the other [11]. Excessive coupling reduces the chances of class reuse and makes the system more complicated. The theory behind this metric is that the higher the number of couplings in a system, the more sensitive it is to changes, making the task of maintenance more difficult [11].

Depth of Inheritance Tree (DIT) – DIT measures the depth of inheritance of a class. Considering the classes as a directed acyclic graph (DAG), DIT is the longest path from the node to the root [11]. The complexity of a class is determined by the number of ancestors it inherits from: more ancestors means more inherited methods, making the behaviour of the class harder to predict [11].

Number of Children (NOC) – NOC is the number of direct subclasses of a class. A larger number of children makes it more challenging to modify the parent class and increases the likelihood of improper abstraction of the parent class [11]. The number of reuse instances of a class has a direct impact on the magnitude of ripple effects and might require more testing [26].

Lack of Cohesion in Methods (LCOM) – LCOM is the difference between the number of pairs of methods that share no instance variable and the number of pairs of methods that share at least one. If the difference is negative, LCOM is set to zero [11, 26]. A low rate of cohesion indicates complexity, as it reveals the encapsulation of unrelated methods in one class, thereby increasing the likelihood of errors [10].

Response for Class (RFC) – RFC is the set of methods of a class that can be executed in response to a message received by an object of the class [11]. The larger the number of methods that can be invoked in response to a message, the greater the complexity.

Inheritance Coupling (IC) – IC gives the number of parent classes on which a given class depends. A class is coupled to its parent class if one of its methods is functionally dependent on the parent class's methods because of inheritance [26].

Coupling Between Methods (CBM) – CBM gives the total number of new or redefined methods to which the inherited methods are coupled, capturing the functional dependencies between inherited methods and new/redefined methods [26].

Message Passing Coupling (MPC) – MPC measures the number of messages an object of a class sends to other objects [3]. MPC indicates how dependent a local method is on the methods of other classes [26].

Data Abstraction Coupling (DAC) – DAC measures the complexity caused by abstract data types (ADTs) [3]; it is the number of ADTs defined in the class. This coupling may cause a violation of encapsulation if direct access to private attributes of the ADT is granted [3].

Number of Local Methods (NOM) – NOM is an interface metric measuring the number of local methods within a class. The complexity of a class's interface depends on the number of methods the class contains [3].

SIZE1 – the number of executable statements in a class (calculated as the number of semicolons).

SIZE2 – the number of properties in a class, calculated as the sum of the number of attributes and the number of local methods in the class.
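To make these definitions concrete, the sketch below computes WMC and LCOM for a simplified class model. It is a minimal illustration, not a full metrics extractor: the MethodInfo record and its fields are hypothetical stand-ins for data that a real tool would mine from source code or bytecode.

import java.util.*;

// Minimal sketch: WMC and LCOM over a simplified class model.
public class CkMetrics {

    // Hypothetical record; real tools extract this from code.
    record MethodInfo(String name, int complexity, Set<String> usedFields) {}

    // WMC: sum of the complexities of all local methods [11, 26].
    static int wmc(List<MethodInfo> methods) {
        return methods.stream().mapToInt(MethodInfo::complexity).sum();
    }

    // LCOM: (#method pairs sharing no instance variable) minus
    // (#method pairs sharing at least one); negative results are
    // clamped to zero [11, 26].
    static int lcom(List<MethodInfo> methods) {
        int disjoint = 0, sharing = 0;
        for (int i = 0; i < methods.size(); i++) {
            for (int j = i + 1; j < methods.size(); j++) {
                Set<String> common = new HashSet<>(methods.get(i).usedFields());
                common.retainAll(methods.get(j).usedFields());
                if (common.isEmpty()) disjoint++; else sharing++;
            }
        }
        return Math.max(0, disjoint - sharing);
    }

    public static void main(String[] args) {
        List<MethodInfo> methods = List.of(
            new MethodInfo("open",  3, Set.of("handle")),
            new MethodInfo("close", 2, Set.of("handle")),
            new MethodInfo("log",   1, Set.of("buffer")));
        System.out.println("WMC  = " + wmc(methods));   // 3 + 2 + 1 = 6
        System.out.println("LCOM = " + lcom(methods));  // max(0, 2 - 1) = 1
    }
}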
4. Machine Learning for Predicting Software Quality

4.1 Decision Trees
A decision tree is constructed in top-down recursive order; each node represents a decision over an attribute, with edges indicating the possible paths from one node to the next. Classifying an instance amounts to following a path from the root of the tree to one of its leaves. The attributes used for decision making are selected so that the information gain from each split is high. In our study, we use C4.5, a classification algorithm that produces a decision tree based on information theory; it is implemented as J48 in Weka [37]. Another method used in our experiments is Random Forest (RF), an ensemble learning technique in which a forest of multiple decision trees is constructed [27]. RF is known to be effective for prediction, and its performance depends on the strength of the individual predictors [27].
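As an illustrative sketch of this set-up, the following code builds J48 and Random Forest models through Weka's Java API [37] and evaluates them with tenfold cross-validation. The dataset path is a placeholder, and the class attribute is assumed to be the last one; setNumIterations is the Weka 3.8 name for the forest size (older releases call it setNumTrees).

import java.util.Random;
import weka.classifiers.Classifier;
import weka.classifiers.Evaluation;
import weka.classifiers.trees.J48;
import weka.classifiers.trees.RandomForest;
import weka.core.Instances;
import weka.core.converters.ConverterUtils.DataSource;

public class TreeExperiment {
    public static void main(String[] args) throws Exception {
        // Placeholder path; any ARFF/CSV file with a nominal class works.
        Instances data = DataSource.read("defects.arff");
        data.setClassIndex(data.numAttributes() - 1);

        J48 j48 = new J48();                  // Weka's C4.5 implementation
        RandomForest rf = new RandomForest();
        rf.setNumIterations(150);             // 150 trees, as in section 5.1.2

        for (Classifier cls : new Classifier[] {j48, rf}) {
            Evaluation eval = new Evaluation(data);
            eval.crossValidateModel(cls, data, 10, new Random(1));
            System.out.printf("%s: accuracy %.1f%%, AUC %.3f%n",
                    cls.getClass().getSimpleName(),
                    eval.pctCorrect(), eval.weightedAreaUnderROC());
        }
    }
}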
4.2 Bayesian Classification
A Bayesian classifier is a probabilistic classifier that assigns a class label based on class probabilities. A simple example is the naïve Bayes classifier, which is based on Bayes' theorem [33, 34]. Under this theorem, the classifier forms the hypothesis that an instance belongs to a class, and the training data increases or decreases the probability that the hypothesis is correct. In our study, we used Weka's implementation of the naïve Bayes classifier [32, 37]. A Bayesian belief network forms a directed acyclic graph representing the conditional dependences between the attributes; the absence of an edge between two nodes indicates conditional independence between them. Each node takes multiple values as inputs, depending on its parent variables, and determines the probability of the variable's occurrence.
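A minimal sketch of the two Bayesian variants through the Weka API [37], under the same assumptions as the decision-tree example; the K2 structure search shown here is Weka's default and an assumption on our part, while the SimpleEstimator matches the set-up described in section 5.1.2.

import java.util.Random;
import weka.classifiers.Classifier;
import weka.classifiers.Evaluation;
import weka.classifiers.bayes.BayesNet;
import weka.classifiers.bayes.NaiveBayes;
import weka.classifiers.bayes.net.estimate.SimpleEstimator;
import weka.classifiers.bayes.net.search.local.K2;
import weka.core.Instances;
import weka.core.converters.ConverterUtils.DataSource;

public class BayesExperiment {
    public static void main(String[] args) throws Exception {
        Instances data = DataSource.read("defects.arff"); // placeholder path
        data.setClassIndex(data.numAttributes() - 1);

        NaiveBayes nb = new NaiveBayes();       // assumes attribute independence

        BayesNet bn = new BayesNet();
        bn.setEstimator(new SimpleEstimator()); // conditional probability tables
        bn.setSearchAlgorithm(new K2());        // network structure search

        for (Classifier cls : new Classifier[] {nb, bn}) {
            Evaluation eval = new Evaluation(data);
            eval.crossValidateModel(cls, data, 10, new Random(1));
            System.out.printf("%s: AUC %.3f%n",
                    cls.getClass().getSimpleName(), eval.weightedAreaUnderROC());
        }
    }
}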
4.3. Rule-based Classification
Rule-based classifiers use a set of IF-THEN rules to classify instances. They iteratively generate rules from the training set until every instance in the training set is covered by some rule. The IF part forms the rule condition and the THEN part the consequence. Many rule-based classifiers extract their rules from decision trees. In this study we used the PART rule-based classifier available in Weka [37]. This algorithm produces an ordered set of rules called a decision list; during classification, a data instance is assigned to the class of the first rule in the decision list that matches it [29]. PART extracts its rules from the C4.5 decision tree mentioned in section 4.1.
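A short sketch of PART through the Weka API [37], again under the assumptions of the earlier examples; printing the trained classifier shows the ordered decision list that instances are matched against.

import java.util.Random;
import weka.classifiers.Evaluation;
import weka.classifiers.rules.PART;
import weka.core.Instances;
import weka.core.converters.ConverterUtils.DataSource;

public class RuleExperiment {
    public static void main(String[] args) throws Exception {
        Instances data = DataSource.read("defects.arff"); // placeholder path
        data.setClassIndex(data.numAttributes() - 1);

        PART part = new PART();
        part.buildClassifier(data);
        System.out.println(part); // prints the ordered IF-THEN decision list

        Evaluation eval = new Evaluation(data);
        eval.crossValidateModel(new PART(), data, 10, new Random(1));
        System.out.println(eval.toSummaryString());
    }
}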
4.5. Support Vector Machines (SVM)
For training support vector machines, we used the Sequential Minimal Optimization (SMO) algorithm available in Weka for experimentation [37].

4.6. Artificial Neural Networks (ANN)
An artificial neural network is a learning algorithm inspired by the neural networks of the human brain. An ANN comprises a set of connected nodes, or artificial neurons, forming the input, output and hidden layers [28]. The nodes and their connections carry weights that are used in computing the output. During the learning phase the ANN may not produce the desired output, but it learns from the output via back propagation and gradually adjusts the weights with the goal of minimizing its error rate and making correct class predictions [28]. Depending on the complexity of the problem at hand, one or more hidden layers can be added to increase the accuracy of the ANN. This is also referred to as connectionist learning; it requires a lot of training time but is known to be very successful in classification.
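A minimal sketch of a multilayer perceptron through the Weka API [37], with illustrative (not tuned) parameter values; the hidden-layer specification 'a' is Weka's shorthand for one hidden layer sized at the average of the attribute and class counts.

import java.util.Random;
import weka.classifiers.Evaluation;
import weka.classifiers.functions.MultilayerPerceptron;
import weka.core.Instances;
import weka.core.converters.ConverterUtils.DataSource;

public class AnnExperiment {
    public static void main(String[] args) throws Exception {
        Instances data = DataSource.read("defects.arff"); // placeholder path
        data.setClassIndex(data.numAttributes() - 1);

        MultilayerPerceptron mlp = new MultilayerPerceptron();
        mlp.setHiddenLayers("a");      // 'a' = (attributes + classes) / 2 nodes
        mlp.setLearningRate(0.3);      // step size for weight updates
        mlp.setMomentum(0.2);          // smooths gradient descent
        mlp.setTrainingTime(500);      // epochs of back propagation

        Evaluation eval = new Evaluation(data);
        eval.crossValidateModel(mlp, data, 10, new Random(1));
        System.out.println(eval.toSummaryString());
    }
}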
5. Experiments and Results

We manually examined the dataset to identify any missing or unknown values. Since the goal is to evaluate each classifier's ability to determine whether a given instance is defective, we labelled all instances as defective or not defective and used that label as the classifying class. Not all the datasets used were balanced in terms of the number of defective instances; for example, Poi had 63% defective instances while Jedit had only 2%. This was chosen deliberately to train the classifiers on variedly balanced data. Owing to this imbalance in the number of defects, the entire dataset was used during the training phase and evaluated with the tenfold cross-validation technique. We also removed the name of the project from the dataset, as it could otherwise become a factor for the classifiers in determining the class label, since some projects had very few defects. The fact that the proportions of defective and non-defective classes are unequal was taken into account during the evaluation of the classifiers.
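A sketch of this preprocessing, assuming the raw data carries a numeric bug-count attribute and a project-name attribute; the attribute names 'bug' and 'name' are placeholders and may differ in the actual datasets.

import java.util.List;
import weka.core.Attribute;
import weka.core.Instance;
import weka.core.Instances;
import weka.core.converters.ConverterUtils.DataSource;
import weka.filters.Filter;
import weka.filters.unsupervised.attribute.Remove;

public class Preprocess {
    public static void main(String[] args) throws Exception {
        Instances data = DataSource.read("metrics.csv"); // placeholder path

        // Drop the project name so it cannot leak into the class label.
        Remove remove = new Remove();
        remove.setAttributeIndices(
                String.valueOf(data.attribute("name").index() + 1)); // 1-based
        remove.setInputFormat(data);
        data = Filter.useFilter(data, remove);

        // Label each instance defective/not defective from its bug count.
        data.insertAttributeAt(
                new Attribute("defective", List.of("yes", "no")),
                data.numAttributes());
        int bugIdx = data.attribute("bug").index();
        int labelIdx = data.numAttributes() - 1;
        for (int i = 0; i < data.numInstances(); i++) {
            Instance inst = data.instance(i);
            inst.setValue(labelIdx, inst.value(bugIdx) > 0 ? "yes" : "no");
        }
        data.setClassIndex(labelIdx);
    }
}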
5.1.2. Experimentation
For experimenting with decision trees, we chose J48, which is Weka's implementation of C4.5, and the Random Forest (RF) ensemble technique, which is also available in Weka [37]. We did not use any additional filters for these algorithms, as we had manually labelled the defective instances based on the number of defects reported for each instance; J48 therefore operates on nominal classes and requires no additional filters. The pruned tree constructed by J48 had lack of cohesion in methods (LCOM) at the root, implying that it was one of the major deciding attributes, followed by RFC, DIT and CBM. For the RF technique, we specified 150 iterations for model construction. The trees constructed by RF also showed LCOM and WMC to be the primary contributors to decision making. For Bayesian classification, we tested both naïve Bayes and the Bayesian belief network. Naïve Bayes is one of the simplest models; it makes the naïve assumption that all attributes of the training set are independent of each other, which might not hold in many cases, yet it performs well. For the Bayesian network, we used the SimpleEstimator (in Weka) [37] to construct the network. Fig. 1 shows the Bayesian network we obtained for the defect dataset; it can be seen that the metrics WMC, DIT and NOC are the primary estimators, followed by LCOM, RFC, IC and CBO. For testing rule-based classifiers we used the PART classifier from Weka [37]. We used IBk, Weka's implementation of nearest neighbours, to test the nearest-neighbour class of classifiers [37]. If the value of 'k' is too small, the decision is susceptible to noise; if it is too large, a larger area of the instance space must be covered, leading to wrong classifications. For the parameter 'k', the general rule of thumb is that the ideal value equals the square root of the number of instances. Having a total of about 3189 instances, we started experimenting with 57, 58, 59 and 60 as values of 'k'. The SMO algorithm was used for training the SVM [37]. The training data was normalized to reduce the time taken to build the model. Since we were not sure whether the dataset was linearly separable, we set the exponent to two in the kernel option; this ensures that Weka uses support vectors for classification. In addition, we used a logistic regression model as the calibrator. The SMO model built shows that about 1523 support vectors were used to train the SVM. The ANN is implemented as a multilayer perceptron in Weka [31, 37], as shown in Figure 2.

Fig. 2: The graphical user interface (GUI) of the neural network formed from software metrics for defect prediction.
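The sketch below mirrors the nearest-neighbour and SVM set-up just described, under the same assumptions as the earlier examples: 'k' is seeded from the square-root rule of thumb, SMO is given a degree-two polynomial kernel, and the -M option asks SMO to fit logistic calibration models to its outputs.

import java.util.Random;
import weka.classifiers.Classifier;
import weka.classifiers.Evaluation;
import weka.classifiers.functions.SMO;
import weka.classifiers.functions.supportVector.PolyKernel;
import weka.classifiers.lazy.IBk;
import weka.core.Instances;
import weka.core.Utils;
import weka.core.converters.ConverterUtils.DataSource;

public class KnnSvmExperiment {
    public static void main(String[] args) throws Exception {
        Instances data = DataSource.read("defects.arff"); // placeholder path
        data.setClassIndex(data.numAttributes() - 1);

        // Rule of thumb: k ~ sqrt(#instances); sqrt(3189) is about 56.5,
        // hence the probed values of 57-60 reported above.
        int k = (int) Math.round(Math.sqrt(data.numInstances()));
        IBk knn = new IBk(k);

        SMO smo = new SMO();
        smo.setOptions(Utils.splitOptions("-M")); // fit logistic calibration models
        PolyKernel kernel = new PolyKernel();
        kernel.setExponent(2.0);                  // degree-2 polynomial kernel
        smo.setKernel(kernel);                    // SMO normalizes data by default

        for (Classifier cls : new Classifier[] {knn, smo}) {
            Evaluation eval = new Evaluation(data);
            eval.crossValidateModel(cls, data, 10, new Random(1));
            System.out.printf("%s: %.1f%% correct%n",
                    cls.getClass().getSimpleName(), eval.pctCorrect());
        }
    }
}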
5.2. Prediction of Maintainability
Like defect prediction, for predicting the required maintenance effort we experimented with all the machine learning techniques described in section 4. The metrics used here are DIT, NOC, MPC, RFC, LCOM, DAC, WMC, NOM, SIZE1 and SIZE2 (cf. section 3). The details of the data collection process and the experimental set-up for maintenance prediction are described below.

A change could be an addition or a deletion; a change of content is counted as a deletion followed by an addition [3]. Both systems were developed in an object-oriented language, Classic-Ada. Table 2 provides more details of the datasets.

TABLE 2: DATASETS USED FOR MAINTENANCE ANALYSIS

Project | No. of Instances | Total No. of Changes | Change Percent (w.r.t. total executables)
UIMS    | 39               | 1826                 | 43%
QUES    | 71               | 4560                 | 23%

The complexities directly correlate with the overall complexity of the system. For experimentation with nearest neighbours we used the IBk algorithm, as mentioned in section 5.1.2, and iteratively experimented with different values of 'k' to identify the ideal value. Since we used the binary SMO and the dataset contains three labels, the SVM was trained with the typical one-versus-all strategy [35]. The ANN is implemented as a multilayer perceptron in Weka [37], as shown in Figure 3.

Accuracy – Accuracy is the proportion of correct classifications, calculated as

Accuracy = (TP + TN) / (TP + TN + FP + FN)
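A worked example of the formula on a hypothetical confusion matrix (the counts below are illustrative only); in our experiments the same quantity is read off Weka's Evaluation object via pctCorrect().

public class AccuracyExample {
    public static void main(String[] args) {
        // Hypothetical confusion matrix, for illustration only.
        int tp = 40, tn = 130, fp = 10, fn = 20;
        double accuracy = (double) (tp + tn) / (tp + tn + fp + fn);
        System.out.printf("Accuracy = %.2f%n", accuracy); // 170/200 = 0.85
    }
}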
TABLE 4: ACCURACY AND ERROR RATE

Technique | Accuracy | Mean absolute error | Root mean squared error
PART      | 68%      | 0.239               | 0.426
KNN       | 66%      | 0.258               | 0.400
SVM       | 66%      | 0.317               | 0.411
ANN       | 68%      | 0.281               | 0.409

Predictions of non-defective instances were fairly accurate, with most classifiers identifying them with 90% accuracy. However, most of the classifiers were below par in the classification of defective instances.

We have primarily considered AUC as the evaluation parameter, as it accounts for the variability in the class distribution. Our evaluation results show that Random Forest is a reliable classifier for both defect and maintainability prediction, with a good AUC value. Other classifiers, such as PART, J48 and KNN, also showed good overall performance. Overall, decision tree-based prediction techniques (i.e., Random Forest) seemed to perform well in quality prediction. The Bayesian network had very good results in defect prediction but very low accuracy and TP rate for the high-maintenance class. Our work sheds light on predicting software quality using machine learning techniques.
References

[1] The economic impacts of inadequate infrastructure for software testing. http://www.nist.gov/director/planning/upload/report02-3.pdf.
[2] J. D. Musa, "A theory of software reliability and its application," IEEE Trans. Software Eng., vol. SE-1, pp. 312-327, 1971.
[3] W. Li and S. Henry, "Object-oriented metrics that predict maintainability," Journal of Systems and Software, vol. 23, no. 2, pp. 111-122, 1993.
[4] F. P. Brooks, The Mythical Man-Month: Essays on Software Engineering, Addison-Wesley, Reading, Mass., 1998.
[5] R. B. Grady, "Successfully applying software metrics," IEEE Computer, vol. 27, no. 9, pp. 18-25, Sept. 1994.
[6] D. M. Coleman, D. Ash, B. Lowther, and P. W. Oman, "Using metrics to evaluate software system maintainability," IEEE Computer, vol. 27, no. 8, pp. 44-49, Aug. 1994.
[7] T. M. Khoshgoftaar and J. C. Munson, "Predicting software development errors using complexity metrics," IEEE J. Selected Areas in Comm., vol. 8, no. 2, pp. 253-261, 1990.
[8] L. Rosenberg, T. Hammer, and J. Shaw, "Software metrics and reliability," in Proc. 9th International Symposium on Software Reliability Engineering, 1998.
[9] G. Stark, R. C. Durst, and C. W. Vowell, "Using metrics in management decision making," IEEE Computer, vol. 27, no. 9, pp. 42-48, Sept. 1994.
[10] V. Basili, L. Briand, and W. L. Melo, "A validation of object-oriented design metrics as quality indicators," IEEE Trans. Software Eng., 1996.
[11] S. R. Chidamber and C. F. Kemerer, "A metrics suite for object oriented design," IEEE Transactions on Software Engineering, vol. 20, no. 6, pp. 476-493, 1994.
[12] M. Lorenz and J. Kidd, Object-Oriented Software Metrics, Prentice-Hall Object-Oriented Series, Englewood Cliffs, NJ, 1994.
[13] F. Brito e Abreu and W. Melo, "Evaluating the impact of object-oriented design on software quality," in Proc. Third International Software Metrics Symposium, pp. 90-99, 1996.
[14] S. Jamali, Object Oriented Metrics: A Survey Approach, Tehran, Iran: Sharif University of Technology, January 2006.
[15] P. Yu, T. Systä, and H. Müller, "Predicting fault-proneness using OO metrics: An industrial case study," in Proc. Sixth European Conf. Software Maintenance and Reeng. (CSMR 2002), pp. 99-107, Mar. 2002.
[16] T. Gyimothy, R. Ferenc, and I. Siket, "Empirical validation of object-oriented metrics on open source software for fault prediction," IEEE Transactions on Software Engineering, vol. 31, no. 10, pp. 897-910, Oct. 2005.
[17] Y. Zhou and H. Leung, "Empirical analysis of object-oriented design metrics for predicting high and low severity faults," IEEE Transactions on Software Engineering, vol. 32, no. 10, pp. 771-789, Oct. 2006.
[18] R. Shatnawi and W. Li, "The effectiveness of software metrics in identifying error-prone classes in post-release software evolution process," Journal of Systems and Software, vol. 81, no. 11, pp. 1868-1882, 2008.
[19] R. K. Bandi, V. K. Vaishnavi, and D. E. Turk, "Predicting maintenance performance using object-oriented design complexity metrics," IEEE Trans. Software Eng., vol. 29, no. 1, pp. 77-87, Jan. 2003.
[20] T. Zimmermann, R. Premraj, and A. Zeller, "Predicting defects for Eclipse," in Predictor Models in Software Engineering (PROMISE'07: ICSE Workshops 2007), May 2007, p. 9.
[21] K. O. Elish and M. O. Elish, "Predicting defect-prone software modules using support vector machines," J. Syst. Softw., vol. 81, no. 5, pp. 649-660, 2008.
[22] A. Okutan and O. Yıldız, "Software defect prediction using Bayesian networks," Empirical Software Engineering, pp. 1-28, 2012.
[23] I. H. Laradji, M. Alshayeb, and L. Ghouti, "Software defect prediction using ensemble learning on selected features," Information and Software Technology, vol. 58, pp. 388-402, 2015.
[24] I. Gondra, "Applying machine learning to software fault proneness prediction," Journal of Systems and Software, vol. 81, no. 2, pp. 186-195, 2008.
[25] D. Zhang and J. J. P. Tsai, "Machine learning and software engineering," Software Quality Journal, vol. 11, no. 2, pp. 87-119, 2003.
[26] M. M. T. Thwin and T.-S. Quah, "Application of neural networks for software quality prediction using object-oriented metrics," in Proc. IEEE Int'l Conf. Software Maintenance (ICSM), 2003.
[27] L. Breiman, "Random forests," Machine Learning, vol. 45, no. 1, pp. 5-32, 2001.
[28] A. K. Jain, J. Mao, and K. M. Mohiuddin, "Artificial neural networks: A tutorial," IEEE Computer, pp. 31-44, Mar. 1996.
[29] B. R. Gaines and P. Compton, "Induction of ripple-down rules applied to modeling large databases."
[30] G. Boetticher, T. Menzies, and T. J. Ostrand, PROMISE repository of empirical software engineering data, 2007. [Online]. Available: http://promisedata.org/repository.
[31] G. Holmes, A. Donkin, and I. Witten, "Weka: A machine learning workbench," in Proc. 2nd Aust. New Zealand Conf. Intell. Inf. Syst., 1994, pp. 1269-1277.
[32] P. Domingos and M. Pazzani, "On the optimality of the simple Bayesian classifier under zero-one loss," Machine Learning, vol. 29, pp. 103-130, 1997.
[33] N. Friedman, D. Geiger, and M. Goldszmidt, "Bayesian network classifiers," Machine Learning, vol. 29, pp. 131-163, 1997.
[34] A. McCallum and K. Nigam, "A comparison of event models for naive Bayes text classification," in AAAI-98 Workshop on Learning for Text Categorization, 1998.
[35] T. Pornpon, L. Preechaveerakul, and W. Wettayaprasit, "A novel voting algorithm of multi-class SVM for web page classification," in Proc. 2nd IEEE International Conference on Computer Science and Information Technology (ICCSIT 2009), 2009.
[36] A. P. Bradley, "The use of the area under the ROC curve in the evaluation of machine learning algorithms," Pattern Recognition, vol. 30, no. 7, pp. 1145-1159, 1997.
[37] M. Hall et al., "The WEKA data mining software: An update," SIGKDD Explorations, vol. 11, no. 1, pp. 10-18, 2009.
[38] T. M. Cover and P. E. Hart, "Nearest neighbour pattern classification," IEEE Trans. Inf. Theory, vol. IT-13, no. 1, pp. 21-27, Jan. 1967.