
Volume 8, Issue 12, December 2023
International Journal of Innovative Science and Research Technology
ISSN No: 2456-2165

A Comprehensive Analysis of Ensemble-Based Fault Prediction Models Using Product, Process, and Object-Oriented Metrics in Software Engineering
Atul Pandey¹, Srujana Maddula², Gaddam Prathik Kumar³, Sarthak Kumar Shailendra⁴, Karan Mudaliar⁵

¹ B.Tech Graduate, Electrical & Electronics Engineering, Vellore Institute of Technology, Vellore, Tamil Nadu, India
² Data Scientist, Target, Bengaluru, Karnataka, India
³ B.Tech Graduate, Computer Science & Engineering, Sharda University, Uttar Pradesh, India
⁴ B.Tech Graduate, Computer Science & Engineering, Vellore Institute of Technology, Vellore, Tamil Nadu, India
⁵ B.Tech Graduate, Computer Science & Engineering, Vellore Institute of Technology, Vellore, Tamil Nadu, India

Abstract:- In the expansive domain of software engineering, the persistent challenge of fault prediction has garnered scholarly interest in machine learning methodologies, aiming to refine decision-making and enhance software quality. This study pioneers advanced fault prediction models, intertwining product and process metrics through machine learning classifiers and ensemble design. The methodological framework involves metric identification, experimentation with machine learning classifiers, and evaluation, considering cost dynamics. Empirically, 42 diverse projects from the PROMISE, BUG, and JIRA repositories are examined, revealing advanced models with ensemble methods manifesting an accuracy of 91.7%, showcasing heightened predictive capabilities and nuanced cost sensitivity. Non-parametric tests affirm statistical significance, portraying innovation beyond conventional paradigms. Conclusively, these advanced models navigate inter-project fault prediction with finesse, signifying a convergence of novelty and performance. Simultaneously, anticipating fault proneness in software components is a pivotal focus in software testing. Software coupling and complexity metrics are critical for evaluating software quality. Object-oriented metrics, including inheritance, polymorphism, and encapsulation, influence software quality and offer avenues for estimating fault proneness. This study contributes a comprehensive taxonomy to the discourse, offering a holistic perspective on the multifaceted landscape of object-oriented metrics in fault prediction within the broader context of advancing software quality.

Keywords:- Software Fault Prediction; Object-Oriented Testing; Object-Oriented Coupling; Machine Learning; Ensemble Design; Product and Process Metrics.

I. INTRODUCTION

Software fault prediction has been a focal point in the software engineering domain for over three decades, garnering escalating attention from researchers [1]. The term "fault" denotes an erroneous step, process, or data definition in a computer program, commonly referred to as a "BUG." Scholars have approached the software fault prediction (SFP) challenge from two perspectives. Firstly, novel methodologies or combinations of existing methods have been introduced by researchers to enhance fault prediction performance. Secondly, the exploration of new parameters to identify the most influential metrics for fault prediction has been undertaken. Despite numerous approaches proposed in the literature, the classification of software modules as faulty or non-faulty remains a largely unresolved issue [2]. To address this challenge, scholars have increasingly turned to sophisticated techniques, including machine learning, deep learning, and unsupervised methods, indicating a shift towards novel and more compelling directions in fault prediction [3]. Machine learning algorithms have witnessed a surge in popularity over the last decade and continue to be one of the preferred methods for defect prediction [4]. As noted by Lessmann et al. [5], "There is a need to develop more reliable research procedures before having confidence in the conclusion of comparative studies of software prediction models."

In this study, the aim is to evaluate the performance of various classifier models without bias towards any specific classifier. Additionally, the reported efficacy of ensemble techniques by previous researchers [6] for enhancing fault prediction accuracy is recognized. Furthermore, the investigation into the diversity of classifiers within ensemble models has been identified as crucial for improving the effectiveness of ensemble designs [7]. This motivation propels the exploration into the design of ensembles to enhance the predictive capability of classifiers. In the context of the second viewpoint, a substantial body of research has been dedicated to investigating the utilization of software metrics derived from code to discern the fault proneness of software components. While fault estimation models predominantly rely on product metrics in the existing literature [8], those constructed through a synergy of product and process metrics remain relatively scarce [9]. Although some scholars have underscored the importance of integrating both product and process metrics in their studies, the broader incorporation of such models has been limited. Madeyski and Jureczko [10], in their research, ascertained that process metrics contribute valuable information to fault proneness determination.

The utilization of process metrics in fault ascertainment demonstrates the potential for superior outcomes when compared to reliance solely on product metrics. The imperative for further investigations to substantiate and refine these advanced models was underscored by their findings.

Radjenovic et al. [11], in a Systematic Literature Review (SLR), emphasized the necessity of identifying methodologies to measure and evaluate process-related information for fault proneness. Similarly, Wan et al. [12], in their study on perceptions, expectations, and challenges in defect prediction, concluded that software practitioners exhibit a preference for rational, interpretable, and actionable metrics in defect prediction. Additionally, the literature indicates not only the comparative superiority of process metrics over product metrics but also the proposition of alternative features based on developer-related factors, code smells, etc. [13]. This discernment necessitates further studies to meticulously examine the intricate association between metrics and fault proneness, thereby furnishing meaningful insights for informed decision-making. Consequently, the present study embarks on the development of advanced software fault prediction models that leverage a combination of metrics. Following the identification of a judicious set of product metrics, the research crafts advanced fault prediction models employing a systematic incorporation of process metrics, one at a time.

Motivated by the imperative to advance fault prediction models, this study establishes a comprehensive research framework characterized by meticulous pre-processing and feature extraction activities on datasets to identify pertinent metrics. Subsequently, diverse machine learning classifiers such as Naive Bayes (NB), Decision Tree (DT), Multilayer Perceptron (MLP), Random Tree (RT), and Support Vector Machine (SVM) are employed for training and testing experiments to evaluate the advanced models. The assessment involves a set of performance metrics encompassing accuracy, root mean square error (RMSE), F-score, and the area under the curve, AUC (ROC).

Object-Oriented (OO) metrics have been the subject of numerous proposals by researchers, resulting in metric suites designed for diverse perspectives within the context of object-oriented software. These suites find application in various contexts, serving as quality indicators, complexity measures, fault proneness predictors, and reliability measures. Table 1 below provides a comprehensive overview of the most frequently employed OO metrics documented in the literature.

Table 1. Object-oriented metrics

S.no. | Chidamber & Kemerer metrics (CK) [25] | Li and Henry metrics [26] | MOOD metrics [27]
1. | Weighted Methods per Class (WMC) | N/A | Attribute Inheritance Factor (AIF)
2. | Depth of Inheritance Tree (DIT) | Number of Methods (NOM) | Method Hiding Factor (MHF)
3. | Number of Children (NOC) | Message Passing Coupling (MPC) | Data Abstraction Coupling (DAC) | Method Inheritance Factor (MIF)
4. | Coupling Between Objects (CBO) | Data Abstraction Coupling (DAC) | Attribute Hiding Factor (AHF)
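The conclusion of this article calls for tooling that extracts such metrics from code automatically. As a hedged illustration of what a minimal extractor could look like (Python classes stand in here for the object-oriented systems typically studied, and the class names are invented for the example), the sketch below computes unweighted WMC, DIT, and NOC by reflection:

```python
# Minimal, illustrative sketch (not the authors' tooling): three CK-style
# metrics for Python classes. WMC is taken in its common unweighted form
# (every method has complexity 1), so it reduces to a method count.
import inspect

class Vehicle:                      # hypothetical example hierarchy
    def start(self): pass
    def stop(self): pass

class Car(Vehicle):
    def open_trunk(self): pass

class SportsCar(Car):
    def engage_launch_control(self): pass

def wmc(cls):
    """Unweighted WMC: number of methods defined directly on the class."""
    return sum(1 for _, m in cls.__dict__.items() if inspect.isfunction(m))

def dit(cls):
    """Depth of Inheritance Tree: longest path from cls up to the root."""
    return max((dit(base) + 1 for base in cls.__bases__), default=0)

def noc(cls):
    """Number of Children: direct subclasses only."""
    return len(cls.__subclasses__())

for c in (Vehicle, Car, SportsCar):
    print(f"{c.__name__}: WMC={wmc(c)} DIT={dit(c)} NOC={noc(c)}")
```

Conventions differ on whether the root class has DIT 0 or 1; the sketch counts edges up to Python's `object`, which is one reasonable choice among several used in the literature.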

Multiple classifiers are harmoniously combined to enhance overall performance, with a specific focus on improving fault-detection capabilities. Additionally, an examination of the cost sensitivity of the proposed ensemble-based classifier is undertaken. The outcomes of this analysis serve to validate the predictive efficacy of the proposed classifiers for the development of advanced fault prediction models.

The noteworthy contributions of this work can be delineated as follows:
• Establishment of a learning scheme comprising both base and ensemble learning classifiers.
• Construction and scrutiny of the predictive capability of advanced fault prediction models.
• Evaluation of the cost sensitivity of the proposed ensemble-based classifier through a comprehensive cost evaluation framework using object-oriented metrics and process metrics.

II. RELATED WORK

Noteworthy contributions to the field of fault prediction have been documented through comprehensive surveys conducted by Catal and Diri [14], Li Zhiqiang et al. [1], Matloob et al. [7], and Radjenovic et al. [11]. These surveys encompass various aspects, including prediction models, modeling techniques, and the metrics employed. Radjenovic et al. [11] delineate that within the literature on fault prediction studies, process metrics constitute 24%, source code metrics contribute 27%, and object-oriented metrics constitute 49% of the total. Prospective studies are urged to incorporate methodologies for measuring and evaluating process-related information for fault proneness in conjunction with product metrics.

In an empirical study conducted by Madeyski and Jureczko [10], utilizing both industrial and open-source software datasets, the significance of process metrics in enhancing results was notably observed. Emphasizing the need for replication using machine learning approaches, they underscore the uncertainty of features performing optimally in one method being equally effective in alternative approaches. Therefore, experimentation is warranted to explore the utility of both product and process-related metrics.

Khoshgoftaar et al. [15] extend the research landscape by constructing software quality models employing majority voting with multiple training datasets. This work presents an opportunity for further extension by incorporating data from diverse software project repositories. An analysis of the predictive capability of ensembles, in comparison to base classifiers, can offer insights into the efficacy of advanced fault prediction models.

Chen et al. [16] investigated whether distinct cross-project defect prediction methods yield consistent identifications of defective modules. The outcomes of this study suggest the potential for extension through the application of learning approaches founded on ensemble design, thereby further enhancing the performance of cross-project defect prediction. A related exploration by Zhang et al. [17] delved into the utilization of various algorithms integrating machine learning (ML) predictors for cross-project defect prediction. However, to comprehensively examine the predictive capabilities of advanced algorithms, additional experimentation is deemed necessary.

An analysis of the aforementioned studies underscores the pivotal role of pre-processing techniques in significantly influencing the performance of learning algorithms. However, a notable gap in the literature pertains to the scarcity of investigations on larger datasets, essential for the development of generalized models. Moreover, the prevalent issue of class imbalance demands attention to augment the efficacy of fault prediction [7]. The exploration of parameter combinations remains a relatively underexplored aspect in existing literature studies. Consequently, there is an opportunity to replicate this work by incorporating more datasets, with a dedicated focus on product and process software metrics, and experimenting with diverse scenarios or combinations of models, encompassing both simple and advanced models, to attain heightened reliability and robustness.

Table 2: Existing Review

Chen et al. [16]
Metrics employed: Process and product
Outcomes and proposed benchmark solutions: The researchers in this study explored the alignment of distinct cross-project defect prediction methods in identifying common defective modules. The outcomes suggest the potential for extension through the implementation of learning approaches founded on ensemble design, to enhance the overall performance of cross-project defect prediction methodologies.

Khoshgoftaar et al. [15]
Metrics employed: Product and process
Outcomes and proposed benchmark solutions: The authors constructed software quality models employing a majority voting approach with multiple training datasets. This work could be extended by incorporating data from diverse software project repositories. Such an extension would facilitate an in-depth analysis of the predictive capabilities of ensembles in comparison to base classifiers, particularly in the context of advanced models.

Erturk and Sezer [18]
Metrics employed: CK product metrics
Outcomes and proposed benchmark solutions: In their study, the authors concluded that the Adaptive Neuro-Fuzzy Inference System (ANFIS) outperforms the Neural Network (NN) and Support Vector Machine (SVM) approaches in predicting faults. Future research endeavors may consider incorporating process metrics into the analysis or developing advanced defect prediction models to further enhance the predictive capabilities.

Li et al. [19]
Metrics employed: Code metrics
Outcomes and proposed benchmark solutions: The authors provided a summary of defect prediction studies with a focus on emerging topics, including machine learning-based algorithms, data manipulation techniques, and effort-aware prediction strategies. They emphasized the importance of addressing the class imbalance problem and the need for developing models in the field of defect prediction.

In software engineering, a well-established principle emphasizes that high-quality software should exhibit low coupling and high cohesiveness. Noteworthy contributions to the study of cohesion metrics for fault prediction include the work of Marcus, Poshyvanyk, and Ferenc [28], who introduced the Conceptual Cohesion of Classes (C3) as a novel measure based on the textual coherence of methods. Utilizing an information retrieval approach supported by Latent Semantic Indexing, the study performed experiments on three open-source subject programs. The findings advocate the integration of structural metrics and cohesion metrics for enhanced prediction accuracy.

Similarly, Zhou, Xu, and Leung [29] conducted empirical evaluations of the effectiveness of complexity metrics in predicting software faults, employing CK metrics and McCabe metrics. Using data from three versions of the Eclipse IDE, the authors compared the performance of LR, Naive Bayes, ADTree, KStar, and neural networks. Results indicated that several metrics exhibit a moderate ability to differentiate fault-prone and non-fault-prone classes, with lines of code and weighted McCabe method complexity identified as robust indicators of fault proneness. The study underscores the significance of not only metric selection but also the size of datasets and feature extraction techniques in fault prediction endeavors.

Recent trends in software fault prediction underscore the increasing popularity of machine learning algorithms. Catal and Diri [30] empirically examined the impact of metric sets, dataset size, and feature selection techniques on fault prediction models, employing random forest (RF) and AIRS algorithms. The study concluded that RF algorithms performed better for large datasets, while Naive Bayes algorithms demonstrated efficacy for smaller datasets. Additionally, Alan [31] employed an RF machine-learning algorithm for outlier detection, selecting six metrics from the CK suite. The study highlighted the promising nature of threshold-based outlier detection, advocating its application before the development of fault prediction models.

III. RESEARCH METHODOLOGY

In Phase I, the identification of a metrics suite is undertaken from metric datasets available in the PROMISE, BUG, and JIRA dataset repositories. Various pre-processing methods, including feature ranking methods, feature subset selection methods, and normalization, are employed to derive a reduced subset of features from the original dataset. This reduction is guided by a specific evaluation criterion, aiming to diminish feature space dimensionality, eliminate redundant and irrelevant information, and enhance data quality to improve the algorithm's performance. The experimental design incorporates N-fold cross-validation for training, testing, and replicating the experiment across diverse datasets.

Phase II involves the evaluation of a simplified dataset under distinct scenarios: scenario 1 features a simple model based on product metrics; scenarios 2 through 5 explore advanced models incorporating additional process metrics (NR, NDC, NML, NDPV). These models are assessed using various base machine learning classifiers, with accuracy as the key performance index. To enhance base classifier performance, classifier ensembles are designed through Bagging, AdaBoostM1 (a prominent boosting technique), and Voting algorithms.

In Phase III, the focus shifts to examining the cost sensitivity of the proposed ensemble classifiers. This involves the development of a comprehensive cost analysis framework, facilitating a comparison between the best ensemble's cost and the best base classifier's cost through the determination of normalized fault removal cost.

Fig 1. A framework of the proposed ensemble model with cost analysis
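To make the three-phase workflow concrete, the following minimal sketch chains Phase I pre-processing (normalization plus feature subset selection) and Phase II base-classifier evaluation under 10-fold cross-validation using scikit-learn. It is an illustration rather than the authors' released code; the file name and label column are assumptions.

```python
# Illustrative sketch of Phases I-II (assumed file name and label column;
# the paper's actual tooling is not published with the article).
import pandas as pd
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.model_selection import cross_val_score, StratifiedKFold
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import MinMaxScaler
from sklearn.tree import DecisionTreeClassifier

df = pd.read_csv("promise_ant_1.7.csv")           # hypothetical PROMISE export
X, y = df.drop(columns=["bug"]), (df["bug"] > 0)  # assumed label column

pipe = Pipeline([
    ("scale", MinMaxScaler()),                    # normalization (Phase I)
    ("select", SelectKBest(f_classif, k=10)),     # feature ranking/subset (Phase I)
    ("clf", DecisionTreeClassifier(random_state=0)),  # one base learner (Phase II)
])

cv = StratifiedKFold(n_splits=10, shuffle=True, random_state=0)  # N = 10
scores = cross_val_score(pipe, X, y, cv=cv, scoring="accuracy")
print(f"10-fold accuracy: {scores.mean():.3f} +/- {scores.std():.3f}")
```

Putting selection inside the pipeline ensures that feature ranking is re-fit on each training fold, so no information from the test folds leaks into the reduced feature set.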

In alignment with the insights derived from a comprehensive review of the existing literature and the identification of potential research gaps, the following research questions have been formulated:

RQ1: How do the advanced defect prediction models posited within the study demonstrate performance variations across diverse machine learning classifiers?
RQ2: To what extent does the ensemble design contribute to enhancing classification performance compared to the individual machine learning classifiers?
RQ3: Is there a discernible and statistically significant difference in performance among the base classifiers and ensemble classifiers?
RQ4: Within the context of a given software system, do the proposed ensembles exhibit a sensitivity to cost considerations?

The formulation of RQ1 and RQ2 is grounded in the intent to assess the efficacy of advanced models embodying distinct scenarios, characterized by a fusion of software product and process metrics. These models undergo training utilizing both base learning and ensemble-based classifiers, with their performances subjected to evaluation through metrics such as accuracy, RMSE, ROC (AUC), and F-score. The application of statistical tests is motivated by the aspiration to empirically substantiate the performance of predictors, thereby addressing RQ3. To address RQ4 and ascertain the cost sensitivity of the proposed predictors, a comprehensive cost-based evaluation framework has been adopted.

For the experimental investigations, five distinct scenarios were devised in accordance with the outlined research questions. In Scenario 1, an assemblage of all product metrics was curated post data processing and normalization, forming what is denoted as the "Simple model." The detailed selection of metrics is provided in Table 3. Subsequently, Scenario 2 introduced the "Advanced model-1," incorporating product metrics alongside a singular process metric (Product + NR). Similarly, Scenarios 3, 4, and 5 engendered the "Advanced model-2" (Product + NDC), "Advanced model-3" (Product + NML), and "Advanced model-4" (Product + NDPV), respectively. These designed models underwent testing across diverse project datasets from repositories such as PROMISE, BUG, and JIRA, employing various classifiers, including DT, MLP, SVM, RT, NB, and classifier ensembles.
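The five scenarios can be pictured as five feature sets drawn from one metrics table. The sketch below is a hypothetical reconstruction: the product-metric list and the process-metric column names NR, NDC, NML, and NDPV (commonly expanded in the defect-prediction literature as number of revisions, number of distinct committers, number of modified lines, and number of defects in the previous version) are assumptions based on the text, not the authors' published code.

```python
# Hypothetical scenario builder: all product metrics, plus one process
# metric at a time, mirroring Scenario 1 (Simple model) through
# Scenario 5 (Advanced model-4).
import pandas as pd

df = pd.read_csv("promise_ant_1.7.csv")       # hypothetical dataset export
product = ["wmc", "dit", "noc", "cbo", "rfc", "lcom", "loc"]  # assumed subset
process = {"Advanced model-1": "NR",          # number of revisions (assumed)
           "Advanced model-2": "NDC",         # number of distinct committers
           "Advanced model-3": "NML",         # number of modified lines
           "Advanced model-4": "NDPV"}        # defects in previous version

scenarios = {"Simple model": df[product]}
for name, metric in process.items():
    scenarios[name] = df[product + [metric]]  # product metrics + one process metric

for name, X in scenarios.items():
    print(f"{name}: {X.shape[1]} features")
```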

The performance assessment of the models, namely the "Simple model," "Advanced model-1," "Advanced model-2," "Advanced model-3," and "Advanced model-4," was based on accuracy. The metrics adopted in the base classifiers were obtained after feature selection and ranking. N-fold cross-validation, with N set to 10 in this instance, facilitated the evaluation of base classifier performance, employing both training and testing phases. This methodology was consistently applied across various dataset versions for distinct base classifiers.

To address RQ2, which seeks to assess and compare the performance of diverse ensemble methods, the relevant algorithm libraries were installed using the pip Python installer. Algorithms such as Bagging, AdaBoostM1, and Voting were deployed. Heterogeneous classifier ensembles adopted the majority voting method, while homogeneous ones employed both bagging and boosting methods. Boosting and bootstrap aggregating incorporated Decision Stump and REPTree as weak learners. AdaBoosting, involving repeated iterations with weight adjustments, and bootstrap aggregating, employing sampling with replacement, were integral components of this phase.

In light of RQ3, exploring potential statistically significant differences between base classifier and ensemble classifier performance, the authors employed Friedman's tests and Wilcoxon signed-rank tests. RQ4 delves into the cost sensitivity of the proposed ensembles, incorporating a normalized fault removal cost approach. To further scrutinize the cost sensitivity of the premier ensemble classifier, VOT-E2, concerning fault misclassification, a comparative analysis was conducted against the best-performing base classifier, MLP.
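As a hedged illustration of the ensemble designs named above (the paper reports pip-installed libraries but does not publish its code), the sketch below builds a bagging ensemble, an AdaBoost ensemble with a decision-stump weak learner, and a heterogeneous majority-voting ensemble in the spirit of VOT-E2 (DT + MLP + SVM):

```python
# Illustrative ensemble designs (our reconstruction, not the authors' code).
from sklearn.ensemble import AdaBoostClassifier, BaggingClassifier, VotingClassifier
from sklearn.neural_network import MLPClassifier
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

# Decision Stump and REPTree are Weka learners; a depth-1 tree is a close
# stand-in for a stump in scikit-learn.
stump = DecisionTreeClassifier(max_depth=1)

bagging = BaggingClassifier(estimator=DecisionTreeClassifier(),
                            n_estimators=50, random_state=0)   # sampling with replacement
adaboost = AdaBoostClassifier(estimator=stump, n_estimators=50,
                              random_state=0)                  # AdaBoostM1-style reweighting
vot_e2 = VotingClassifier(                                     # heterogeneous majority voting
    estimators=[("dt", DecisionTreeClassifier()),
                ("mlp", MLPClassifier(max_iter=1000)),
                ("svm", SVC())],
    voting="hard")
```

Each of these estimators can be dropped into the same 10-fold cross-validation loop used for the base classifiers, which keeps the comparison between base and ensemble learners like for like.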

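The article names a normalized fault removal cost but does not reproduce its formula. Purely as an illustrative assumption, the sketch below charges a unit cost for every module flagged for inspection and a larger field cost for every missed faulty module, then normalizes by the cost of inspecting everything; the unit costs are invented for the example.

```python
# Hypothetical cost-sensitivity check (the paper's exact cost model is not
# given; the cost constants below are illustrative assumptions).
from sklearn.metrics import confusion_matrix

def normalized_fault_removal_cost(y_true, y_pred, c_inspect=1.0, c_field=10.0):
    """Cost of acting on predictions, normalized by inspecting every module."""
    tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()
    cost = c_inspect * (tp + fp) + c_field * fn   # inspect flagged, pay for misses
    baseline = c_inspect * (tn + fp + fn + tp)    # inspect everything
    return cost / baseline

# e.g., compare the best ensemble against the best base classifier:
# ncost = normalized_fault_removal_cost(y_test, vot_e2.fit(X_tr, y_tr).predict(X_test))
```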
IV. RESULTS AND DISCUSSION

Table 3. Summary of research questions

Research Question 1 (RQ1): What is the performance of the advanced defect prediction models proposed in the study when subjected to various machine learning classifiers?
Discussion: Each model underwent testing across diverse project datasets sourced from PROMISE, BUG, and JIRA repositories. Employing distinct classifiers such as DT, MLP, SVM, RT, and NB, the performance of the models was assessed.

Research Question 2 (RQ2): To what extent does ensemble design enhance classification performance compared to individual machine learning classifiers?
Discussion: In a comprehensive evaluation, ensemble methods demonstrated an overall median F-score ranging between 76.50% and 87.34%, and ROC (AUC) values between 77.09% and 84.05%. In contrast, base classifiers achieved an average F-score ranging between 73% (Simple model) and 83% (Advanced model-2) for the PROMISE dataset, and ROC (AUC) values between 60% (Advanced model-4) and 79% (Advanced model-2). This observation underscores the efficacy of ensemble design in leveraging the strengths of multiple predictors, contributing to the advancement of fault prediction methodologies.

Research Question 3 (RQ3): Does there exist any statistically significant performance difference among the base classifiers and ensemble classifiers?
Discussion: For pairwise comparisons, the Wilcoxon signed-rank test was employed. The outcomes from both Friedman's tests and Wilcoxon signed-rank tests provide statistical evidence supporting the existence of significant performance differences, particularly highlighting the unique standing of the ensemble method.
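For readers who want to reproduce the non-parametric comparison, a minimal sketch with SciPy follows. The accuracy arrays are synthetic placeholders, since the paper reports only summary statistics rather than per-project scores.

```python
# Placeholder sketch of the statistical tests named in the paper.
import numpy as np
from scipy.stats import friedmanchisquare, wilcoxon

rng = np.random.default_rng(0)
# One accuracy value per dataset (42 projects) for three classifiers; the
# values below are synthetic stand-ins for the paper's per-project results.
dt   = rng.uniform(0.70, 0.85, size=42)
mlp  = rng.uniform(0.75, 0.92, size=42)
vote = rng.uniform(0.80, 0.93, size=42)

stat, p = friedmanchisquare(dt, mlp, vote)   # omnibus test across classifiers
print(f"Friedman: chi2={stat:.2f}, p={p:.4f}")

stat, p = wilcoxon(vote, mlp)                # pairwise follow-up comparison
print(f"Wilcoxon (VOT-E2 vs MLP): W={stat:.1f}, p={p:.4f}")
```

Friedman's test plays the role of the omnibus check across all classifiers and datasets; the Wilcoxon signed-rank test then probes individual classifier pairs, matching the paper's two-stage procedure.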

Fig 2. Box plots of ensemble results for average accuracy

For the PROMISE dataset, the average accuracy for MLP in the simple model is 91.7%, in advanced model-1 is 80%, in advanced model-2 is 87%, in advanced model-3 is 85%, and in advanced model-4 is 79%. Notably, the bar graph in Figure 2 illustrates that the average accuracy for MLP is higher in advanced model-2 than in advanced model-3, advanced model-1, and the simple model. Similarly, the average accuracy for DT in the simple model is 74%, in advanced model-1 is 81%, in advanced model-2 is 87%, in advanced model-3 is 83%, and in advanced model-4 is 77%. The corresponding bar graph (Figure 2a) indicates that the average accuracy for DT is higher in advanced model-2 compared to advanced model-3, advanced model-1, and the simple model.
The integration of machine learning techniques into software fault prediction has emerged as a recent and dynamic area of research. Researchers employ diverse machine learning methodologies across multiple dimensions within this domain, encompassing feature selection, classification, outlier detection, and model building. This study provides a comparative analysis of recent literature, shedding light on the various applications of machine learning in software fault prediction.

Researchers exhibit significant diversity in their approaches, encompassing variations in the data and metrics under consideration, the machine learning algorithms employed, and the tools utilized for experimentation. The following table (Table 4) presents a comprehensive overview of recent studies, offering a qualitative analysis of the machine learning algorithms utilized for fault prediction. This comparative analysis aims to provide insights into the nuanced differences and trends prevalent in the application of machine learning techniques in software fault prediction studies.

Table 4. Comparative analysis

Existing research | Algorithm employed | Metrics evaluated | Programming language
[32] | Logistic Regression | C-K | C++
[28] | Logistic Regression and PCA | Conceptual Cohesion of Classes | C++
[33] | Naive Bayes network, Random Forest | C-K | Java
[30] | Random Forest, J48 | McCabe [36], Halstead [37] | C++
[34] | Decision tree and Neural network | C-K | Java

The presented table indicates the widespread popularity of logistic regression, random forest, and neural networks within the domain of fault prediction studies. Machine learning algorithms offer versatile applications for conducting diverse statistical and predictive analyses of Object-Oriented (OO) metrics. The WEKA platform serves as a comprehensive tool for executing and analyzing these algorithms. A succinct overview of the regression model and other pertinent techniques is provided below.

Logistic regression (LR), a statistical classification technique rooted in maximum likelihood estimation, is applicable in two modes: univariate regression and multivariate regression. Univariate LR is employed for the isolated analysis of a single metric's effect on fault proneness. In contrast, multivariate LR proves useful when multiple metrics need assessment for their impact on fault proneness. LR is employed when there are one or more independent variables. The objective of LR is to construct the best-fitting model that elucidates the relationship between the dependent and independent variables. The result of LR is expressed through a fitted logistic regression equation.

Learning can be categorized into supervised or unsupervised forms. In supervised learning, a dependent variable can be predicted from a given set of independent variables. A map function is generated using these variables to produce the desired outcome. Numerous research studies have leveraged various machine learning algorithms to predict the impact of Object-Oriented (OO) metrics on software fault proneness. For instance, in a study [37], a decision tree was employed, and validation was conducted using the receiver operating characteristic (ROC) curve. The primary advantage of decision trees lies in their ability to implicitly identify the most influential features of the dataset, and their performance is not influenced by the type of relationship between attributes. Machine learning algorithms like random forests are suitable for handling multiclass data, while Bayes networks rely on rules of probability for prediction. In certain studies [35-37], a set of learning algorithms, including Bayes networks, random forests, and NNge (nearest neighbor with generalization), were applied for a comparative analysis of prediction models. The Artificial Immune Recognition System (AIRS), inspired by the vertebrate immune system, is a machine-learning algorithm capable of working with both nominal and continuous data. The study that applied AIRS found its performance to be superior to J48. Principal Component Analysis (PCA) serves as a feature selection technique, emphasizing variation and producing strong patterns in the dataset. Some studies have utilized PCA for fault prediction and feature selection in the context of OO metrics. Additionally, Neuro-fuzzy and Latent Semantic Indexing are other competitive algorithms explored for prediction purposes.
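To ground the description of the two LR modes, here is a small hedged example on synthetic data (not drawn from the paper's experiments) that fits univariate and multivariate logistic regression to OO metrics and reads off the fitted equation p = 1 / (1 + e^-(b0 + b1*x1 + ...)):

```python
# Univariate vs. multivariate logistic regression on synthetic OO-metric
# data (illustrative only; not the paper's dataset).
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
wmc = rng.poisson(10, 200)                # synthetic Weighted Methods per Class
cbo = rng.poisson(5, 200)                 # synthetic Coupling Between Objects
logit = 0.25 * wmc + 0.40 * cbo - 4.5     # assumed true relationship
faulty = rng.random(200) < 1 / (1 + np.exp(-logit))

# Univariate: isolated effect of one metric on fault proneness.
uni = LogisticRegression().fit(wmc.reshape(-1, 1), faulty)
print(f"univariate: b0={uni.intercept_[0]:.2f}, b_wmc={uni.coef_[0][0]:.2f}")

# Multivariate: joint effect of several metrics.
X = np.column_stack([wmc, cbo])
multi = LogisticRegression().fit(X, faulty)
b0, (b1, b2) = multi.intercept_[0], multi.coef_[0]
print(f"fitted equation: p = 1 / (1 + exp(-({b0:.2f} + {b1:.2f}*WMC + {b2:.2f}*CBO)))")
```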

V. CONCLUSION

This study introduces advanced models for software fault prediction, leveraging information related to both product and process metrics. The investigation involved forty-two open-source code projects extracted from the PROMISE, JIRA, and BUG repositories. Results indicate that the MLP-based base classifier exhibits superior performance, as reflected in a high average accuracy of 91.7%. Ensemble methods, incorporating bagging, boosting, and voting, further enhance classification performance, with VOT-E2 (DT + MLP + SVM) producing the best results. Statistical tests confirm significant performance differences between base classifiers and ensemble classifiers, validating the predictive capability of the proposed models. The study emphasizes the potential utility of the combination models for developing advanced defect prediction models, providing valuable insights for software engineers in new projects. While the experiments utilized datasets from the PROMISE, JIRA, and BUG repositories, extending the investigation to more open-source and cross-project datasets would enhance the generalization of results. This research article also presents a comprehensive taxonomy of object-oriented (OO) metrics usage for fault proneness prediction, emphasizing their significance in determining software quality. Various machine learning algorithms have been applied in fault prediction, with opportunities for further exploration in sub-domains such as Support Vector Machines, dimensionality reduction, gradient boosting, and deep learning. While existing research primarily focuses on fault prediction, the application of OO metrics can extend to other testing-phase activities, including test case selection, generation, prioritization, and clone detection. Future research directions may involve developing tools for extracting OO metrics from software, contributing to the efficiency of code analysis. The provided set of extensively used datasets and software tools, discussed in the paper, facilitates the evaluation of techniques and methodologies. The integration of predictive measures based on OO metrics into testing processes can optimize fault localization, refactoring, debugging, and test case minimization, potentially minimizing software maintenance costs. Ongoing research in object-oriented software testing aims to explore the impact of OO metrics on software maintenance for a more accurate examination of the problem.

REFERENCES

[1]. Z. Li, X.Y. Jing, and X. Zhu, "Progress on approaches to software defect prediction," IET Software, Vol. 12, No. 3, 2018, pp. 161–175.
[2]. Q. Song, Z. Jia, M. Shepperd, S. Ying, and J. Liu, "A general software defect-proneness prediction framework," IEEE Transactions on Software Engineering, Vol. 37, No. 3, 2010, pp. 356–370.
[3]. X. Yang, D. Lo, X. Xia, and J. Sun, "TLEL: A two-layer ensemble learning approach for just-in-time defect prediction," Information and Software Technology, Vol. 87, 2017, pp. 206–220.
[4]. L. Pascarella, F. Palomba, and A. Bacchelli, "Fine-grained just-in-time defect prediction," Journal of Systems and Software, Vol. 150, 2019, pp. 22–36.
[5]. S. Lessmann, B. Baesens, C. Mues, and S. Pietsch, "Benchmarking classification models for software defect prediction: A proposed framework and novel findings," IEEE Transactions on Software Engineering, Vol. 34, No. 4, 2008, pp. 485–496.
[6]. S.S. Rathore and S. Kumar, "An empirical study of ensemble techniques for software fault prediction," Applied Intelligence, Vol. 51, No. 6, 2021, pp. 3615–3644.
[7]. F. Matloob, T.M. Ghazal, N. Taleb, S. Aftab, M. Ahmad et al., "Software defect prediction using ensemble learning: A systematic literature review," IEEE Access, 2021.
[8]. R. Jabangwe, J. Börstler, D. Šmite, and C. Wohlin, "Empirical evidence on the link between object-oriented measures and external quality attributes: A systematic literature review," Empirical Software Engineering, Vol. 20, No. 3, 2015, pp. 640–693.
[9]. I. Kiris, S. Kapan, A. Kılbas, N. Yılmaz, I. Altuntaş et al., "The protective effect of erythropoietin on renal injury induced by abdominal aortic-ischemia-reperfusion in rats," Journal of Surgical Research, Vol. 149, No. 2, 2008, pp. 206–213.
[10]. L. Madeyski and M. Jureczko, "Which process metrics can significantly improve defect prediction models? An empirical study," Software Quality Journal, Vol. 23, No. 3, 2015, pp. 393–422.
[11]. D. Radjenović, M. Heričko, R. Torkar, and A. Živkovič, "Software fault prediction metrics: A systematic literature review," Information and Software Technology, Vol. 55, No. 8, 2013, pp. 1397–1418.
[12]. Y. Wu, Y. Yang, Y. Zhao, H. Lu, Y. Zhou et al., "The influence of developer quality on software fault-proneness prediction," in Eighth International Conference on Software Security and Reliability (SERE), IEEE, 2014, pp. 11–19.
[13]. C. Bird, N. Nagappan, B. Murphy, H. Gall, and P. Devanbu, "Don't touch my code! Examining the effects of ownership on software quality," in Proceedings of the 19th ACM SIGSOFT Symposium and the 13th European Conference on Foundations of Software Engineering, 2011, pp. 4–14.
[14]. C. Catal and B. Diri, "Investigating the effect of dataset size, metrics sets, and feature selection techniques on software fault prediction problem," Information Sciences, Vol. 179, No. 8, 2009, pp. 1040–1058.
[15]. T.M. Khoshgoftaar, K. Gao, and N. Seliya, "Attribute selection and imbalanced data: Problems in software defect prediction," in 22nd IEEE International Conference on Tools with Artificial Intelligence, Vol. 1, IEEE, 2010, pp. 137–144.
[16]. X. Chen, Y. Mu, Y. Qu, C. Ni, M. Liu et al., "Do different cross-project defect prediction methods identify the same defective modules?" Journal of Software: Evolution and Process, Vol. 32, No. 5, 2020, p. e2234.
[17]. Y. Zhang, D. Lo, X. Xia, and J. Sun, "Combined classifier for cross-project defect prediction: An extended empirical study," Frontiers of Computer Science, Vol. 12, No. 2, 2018, p. 280.
[18]. E. Erturk and E.A. Sezer, "A comparison of some soft computing methods for software fault prediction," Expert Systems with Applications, Vol. 42, No. 4, 2015, pp. 1872–1879.
[19]. Z. Li, X.Y. Jing, and X. Zhu, "Heterogeneous fault prediction with cost-sensitive domain adaptation," Software Testing, Verification and Reliability, Vol. 28, No. 2, 2018, p. e1658.
[20]. T. Wang, W. Li, H. Shi, and Z. Liu, "Software defect prediction based on classifiers ensemble," Journal of Information and Computational Science, Vol. 8, No. 16, 2011, pp. 4241–4254.

[21]. K. Bańczyk, O. Kempa, T. Lasota, and B. Trawiński, "Empirical comparison of bagging ensembles created using weak learners for a regression problem," in Asian Conference on Intelligent Information and Database Systems, Springer, 2011, pp. 312–322.
[22]. G. Catolino and F. Ferrucci, "An extensive evaluation of ensemble techniques for software change prediction," Journal of Software: Evolution and Process, Vol. 31, No. 9, 2019, p. e2156.
[23]. L. Reyzin and R.E. Schapire, "How boosting the margin can also boost classifier complexity," in Proceedings of the 23rd International Conference on Machine Learning, 2006, pp. 753–760.
[24]. J. Petrić, D. Bowes, T. Hall, B. Christianson, and N. Baddoo, "Building an ensemble for software defect prediction based on diversity selection," in Proceedings of the 10th ACM/IEEE International Symposium on Empirical Software Engineering and Measurement, 2016, pp. 1–10.
[25]. S.R. Chidamber and C.F. Kemerer, "Towards a metrics suite for object oriented design," ACM, 1991.
[26]. W. Li and S. Henry, "Object-oriented metrics that predict maintainability," Journal of Systems and Software, Vol. 23, No. 2, 1993, pp. 111–122.
[27]. F.B. Abreu and R. Carapuça, "Object-oriented software engineering: Measuring and controlling the development process," in Proceedings of the 4th International Conference on Software Quality, Vol. 186, 1994, pp. 1–8.
[28]. A. Marcus, D. Poshyvanyk, and R. Ferenc, "Using the conceptual cohesion of classes for fault prediction in object-oriented systems," IEEE Transactions on Software Engineering, Vol. 34, No. 2, 2008, pp. 287–300.
[29]. Y. Zhou, B. Xu, and H. Leung, "On the ability of complexity metrics to predict fault-prone classes in object-oriented systems," Journal of Systems and Software, Vol. 83, No. 4, 2010, pp. 660–674.
[30]. C. Catal and B. Diri, "Investigating the effect of dataset size, metrics sets, and feature selection techniques on software fault prediction problem," Information Sciences, Vol. 179, No. 8, 2009, pp. 1040–1058.
[31]. O. Alan and C. Catal, "An outlier detection algorithm based on object-oriented metrics thresholds," in 24th International Symposium on Computer and Information Sciences (ISCIS 2009), IEEE, 2009, pp. 567–570.
[32]. V.R. Basili, L.C. Briand, and W.L. Melo, "A validation of object-oriented design metrics as quality indicators," IEEE Transactions on Software Engineering, Vol. 22, No. 10, 1996, pp. 751–761.
[33]. C. Catal and B. Diri, "Software fault prediction with object-oriented metrics based artificial immune recognition system," in International Conference on Product Focused Software Process Improvement, Springer, 2007, pp. 300–314.
[34]. T. Gyimothy, R. Ferenc, and I. Siket, "Empirical validation of object-oriented metrics on open source software for fault prediction," IEEE Transactions on Software Engineering, Vol. 31, No. 10, 2005, pp. 897–910.
[35]. M.M. Thwin and T.S. Quah, "Application of neural networks for software quality prediction using object-oriented metrics," Journal of Systems and Software, Vol. 76, No. 2, 2005, pp. 147–156.
[36]. Y. Zhou and H. Leung, "Empirical analysis of object-oriented design metrics for predicting high and low severity faults," IEEE Transactions on Software Engineering, Vol. 32, No. 10, 2006, pp. 771–789.
[37]. Y. Singh, A. Kaur, and R. Malhotra, "Empirical validation of object-oriented metrics for predicting fault proneness models," Software Quality Journal, Vol. 18, No. 1, 2010, p. 3.
