A Comprehensive Analysis of Ensemble-Based Fault Prediction Models Using Product, Process, and Object-Oriented Metrics in Software Engineering
ISSN No:-2456-2165
Abstract:- In the expansive domain of software engineering, the persistent challenge of fault prediction has garnered scholarly interest in machine learning methodologies, aiming to refine decision-making and enhance software quality. This study pioneers advanced fault prediction models, intertwining product and process metrics through machine learning classifiers and ensemble design. The methodological framework involves metric identification, experimentation with machine learning classifiers, and evaluation, considering cost dynamics. Empirically, 42 diverse projects from the PROMISE, BUG, and JIRA repositories are examined, revealing advanced models with ensemble methods attaining an accuracy of 91.7%, showcasing heightened predictive capabilities and nuanced cost sensitivity. Non-parametric tests affirm statistical significance, portraying innovation beyond conventional paradigms. Conclusively, these advanced models navigate inter-project fault prediction with finesse, signifying a convergence of novelty and performance. Simultaneously, anticipating fault proneness in software components is a pivotal focus in software testing. Software coupling and complexity metrics are critical for evaluating software quality. Object-oriented metrics, including inheritance, polymorphism, and encapsulation, influence software quality and offer avenues for estimating fault proneness. This study contributes a comprehensive taxonomy to the discourse, offering a holistic perspective on the multifaceted landscape of object-oriented metrics in fault prediction within the broader context of advancing software quality.

Keywords:- Software Fault Prediction; Object-Oriented Testing; Object-Oriented Coupling; Machine Learning; Ensemble Design; Product and Process Metrics.

I. INTRODUCTION

Software fault prediction has been a focal point in the software engineering domain for over three decades, garnering escalating attention from researchers [1]. The term "fault" denotes an erroneous step, process, or data definition in a computer program, commonly referred to as a "bug." Scholars have approached the software fault prediction (SFP) challenge from two perspectives. Firstly, novel methodologies or combinations of existing methods have been introduced by researchers to enhance fault prediction performance. Secondly, the exploration of new parameters to identify the most influential metrics for fault prediction has been undertaken. Despite numerous approaches proposed in the literature, the classification of software modules as faulty or non-faulty remains a largely unresolved issue [2]. To address this challenge, scholars have increasingly turned to sophisticated techniques, including machine learning, deep learning, and unsupervised methods, indicating a shift towards novel and more compelling directions in fault prediction [3]. Machine learning algorithms have witnessed a surge in popularity over the last decade and continue to be one of the preferred methods for defect prediction [4]. As noted by Lessmann et al. [5], "There is a need to develop more reliable research procedures before having confidence in the conclusion of comparative studies of software prediction models."

In this study, the aim is to evaluate the performance of various classifier models without bias towards any specific classifier. Additionally, the efficacy of ensemble techniques for enhancing fault prediction accuracy, as reported by previous researchers [6], is recognized. Furthermore, the diversity of classifiers within ensemble models has been identified as crucial for improving the effectiveness of ensemble designs [7]. This motivation propels the exploration into the design of ensembles to enhance the predictive capability of classifiers. In the context of the second viewpoint, a substantial body of research has been dedicated to investigating the utilization of software metrics derived from code to discern the fault proneness of software components. While fault estimation models predominantly rely on product metrics in the existing literature [8], those constructed through a synergy of product and process metrics remain relatively scarce [9]. Although some scholars have underscored the importance of integrating both product and process metrics in their studies, the broader incorporation of such models has been limited. Madeyski and Jureczko [10], in their research, ascertained that process metrics contribute valuable information to fault proneness determination. The utilization of process metrics in
Multiple classifiers are combined to enhance overall performance, with a specific focus on improving fault-detection capabilities. Additionally, an examination of the cost sensitivity of the proposed ensemble-based classifier is undertaken. The outcomes of this analysis serve to validate the predictive efficacy of the proposed classifiers for the development of advanced fault prediction models.

The noteworthy contributions of this work can be delineated as follows:
Establishment of a learning scheme comprising both base and ensemble learning classifiers.
Construction and scrutiny of the predictive capability of advanced fault prediction models.
Evaluation of the cost sensitivity of the proposed ensemble-based classifier through a comprehensive cost evaluation framework using object-oriented metrics and process metrics.

II. RELATED WORK

Noteworthy contributions to the field of fault prediction have been documented through comprehensive surveys conducted by Catal and Diri [14], Li Zhiqiang et al. [1], Matloob et al. [7], and Radjenovic et al. [11]. These surveys encompass various aspects, including prediction models, modeling techniques, and the metrics employed. Radjenovic et al. [11] delineate that within the literature on fault prediction studies, process metrics constitute 24%, source code metrics constitute 27%, and object-oriented metrics constitute 49% of the total. Prospective studies are urged to incorporate methodologies for measuring and evaluating process-related information for fault proneness in conjunction with product metrics.

In an empirical study conducted by Madeyski and Jureczko [9], utilizing both industrial and open-source software datasets, the significance of process metrics in enhancing results was notably observed. Emphasizing the need for replication using machine learning approaches, they underscore that features performing optimally in one method are not guaranteed to be equally effective in alternative approaches. Therefore, experimentation is warranted to explore the utility of both product and process-related metrics.

Khoshgoftaar et al. [15] extend the research landscape by constructing software quality models employing majority voting with multiple training datasets. This work presents an opportunity for further extension by incorporating data from diverse software project repositories. An analysis of the predictive capability of ensembles, in comparison to base classifiers, can offer insights into the efficacy of advanced fault prediction models.
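The majority-voting combination referenced above can be sketched in a few lines. The prediction vectors below are hypothetical stand-ins for the outputs of already-trained base classifiers (e.g., DT, NB, MLP); this illustrates the voting rule itself, not the implementation used in the cited studies.

```python
import numpy as np

def majority_vote(predictions):
    """Combine binary fault predictions (one row per base classifier) by majority vote."""
    votes = np.asarray(predictions)
    # A module is flagged faulty when more than half of the classifiers vote 1.
    return (2 * votes.sum(axis=0) > votes.shape[0]).astype(int)

# Hypothetical per-module predictions from three base classifiers
dt  = [1, 0, 1, 1, 0]
nb  = [1, 0, 0, 1, 0]
mlp = [0, 0, 1, 1, 1]
print(majority_vote([dt, nb, mlp]).tolist())  # [1, 0, 1, 1, 0]
```

With an odd number of voters, as here, ties cannot occur; an even-sized ensemble would need an explicit tie-breaking rule.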
In software engineering, a well-established principle holds that high-quality software should exhibit low coupling and high cohesion. Noteworthy contributions to the study of cohesion metrics for fault prediction include the work of Marcus, Poshyvanyk, and Ferenc [28], who introduced the Conceptual Cohesion of Classes (C3) as a novel measure based on the textual coherence of methods. Utilizing an information retrieval approach supported by Latent Semantic Indexing, the study performed experiments on three open-source subject programs. The findings advocate the integration of structural metrics and cohesion metrics for enhanced prediction accuracy.

Similarly, Zhou, Xu, and Leung [29] conducted empirical evaluations of the effectiveness of complexity metrics in predicting software faults, employing CK metrics and McCabe metrics. Using data from three versions of the Eclipse IDE, the authors compared the performance of LR, Naive Bayes, ADTree, KStar, and neural networks. Results indicated that several metrics exhibit a moderate ability to differentiate fault-prone from non-fault-prone classes, with lines of code and McCabe's weighted method complexity identified as robust indicators of fault proneness. The study underscores the significance of not only metric selection but also the size of datasets and feature extraction techniques in fault prediction endeavors.

Recent trends in software fault prediction underscore the increasing popularity of machine learning algorithms. Catal and Diri [30] empirically examined the impact of metric sets, dataset size, and feature selection techniques on fault prediction models, employing random forest (RF) and AIRS algorithms. The study concluded that RF algorithms performed better for large datasets, while Naive Bayes algorithms demonstrated efficacy for smaller datasets. Additionally, Alan [31] employed an RF machine-learning algorithm for outlier detection, selecting six metrics from the CK suite. The study highlighted the promising nature of threshold-based outlier detection, advocating its application before the development of fault prediction models.
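Threshold-based screening of metric values, in the spirit of the outlier detection Alan [31] advocates, can be sketched as follows. The mean-plus-two-standard-deviations rule and the WMC values are assumptions chosen for illustration, not the thresholds or data used in that study.

```python
import numpy as np

def metric_outliers(values, k=2.0):
    """Return indices of modules whose metric value exceeds mean + k * std.
    The mean + 2*std cutoff is an illustrative threshold choice."""
    v = np.asarray(values, dtype=float)
    return np.flatnonzero(v > v.mean() + k * v.std()).tolist()

# Hypothetical WMC (weighted methods per class) values for ten classes
wmc = [5, 7, 6, 8, 5, 6, 7, 6, 5, 40]
print(metric_outliers(wmc))  # [9] -- only the last class is flagged
```

Modules flagged this way would then be inspected or removed before a fault prediction model is trained.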
In alignment with the insights derived from a comprehensive review of the existing literature and the identification of potential research gaps, the following research questions have been formulated:

RQ1: How do the advanced defect prediction models posited within the study demonstrate performance variations across diverse machine learning classifiers?
RQ2: To what extent does the ensemble design contribute to enhancing classification performance compared to the individual machine learning classifiers?
RQ3: Is there a discernible and statistically significant difference in performance among the base classifiers and ensemble classifiers?
RQ4: Within the context of a given software system, do the proposed ensembles exhibit a sensitivity to cost considerations?

The formulation of RQ1 and RQ2 is grounded in the intent to assess the efficacy of advanced models embodying distinct scenarios, characterized by a fusion of software product and process metrics. These models undergo training utilizing both base learning and ensemble-based classifiers, with their performances evaluated through metrics such as accuracy, RMSE, ROC (AUC), and F-score. The application of statistical tests is motivated by the aspiration to empirically substantiate the performance of predictors, thereby addressing RQ3. To address RQ4 and ascertain the cost-sensitivity of the proposed predictors, a comprehensive cost-based evaluation framework has been adopted.

For the experimental investigations, five distinct scenarios were devised in line with the outlined research questions. In Scenario 1, an assemblage of all product metrics was curated after data processing and normalization, forming what is denoted as the "Simple model." The detailed selection of metrics is provided in Table 3. Subsequently, Scenario 2 introduced the "Advanced model-1," incorporating product metrics alongside a single process metric (Product + NR). Similarly, Scenarios 3, 4, and 5 engendered the "Advanced model-2" (Product + NDC), "Advanced model-3" (Product + NML), and "Advanced model-4" (Product + NDPV), respectively. These models underwent testing across diverse project datasets from repositories such as PROMISE, Bug, and Jira, employing various classifiers, including DT, MLP, SVM, RT, NB, and classifier ensembles.
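The evaluation measures named above (accuracy, RMSE, F-score) and a cost-based view of errors can both be computed directly from binary predictions. The sketch below uses hand-rolled formulas and made-up label vectors, not the study's pipeline; the 10:1 false-negative-to-false-positive cost ratio is an illustrative assumption, and ROC (AUC) is omitted for brevity.

```python
import numpy as np

def evaluate(y_true, y_pred):
    """Accuracy, RMSE, and F-score for binary fault labels (1 = faulty)."""
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    tp = np.sum((y_true == 1) & (y_pred == 1))
    fp = np.sum((y_true == 0) & (y_pred == 1))
    fn = np.sum((y_true == 1) & (y_pred == 0))
    accuracy = float(np.mean(y_true == y_pred))
    rmse = float(np.sqrt(np.mean((y_true - y_pred) ** 2)))
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f_score = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return accuracy, rmse, f_score

def misclassification_cost(y_true, y_pred, c_fn=10.0, c_fp=1.0):
    """Cost-sensitive view: a missed fault (FN) costs c_fn, a false alarm (FP) costs c_fp.
    The default 10:1 ratio is an assumption for illustration."""
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    fn = np.sum((y_true == 1) & (y_pred == 0))
    fp = np.sum((y_true == 0) & (y_pred == 1))
    return float(c_fn * fn + c_fp * fp)

# Made-up fault labels and predictions for eight modules
actual = [1, 0, 1, 1, 0, 0, 1, 0]
pred   = [1, 0, 0, 1, 0, 1, 1, 0]
acc, rmse, f1 = evaluate(actual, pred)
print(acc, rmse, f1)                         # 0.75 0.5 0.75
print(misclassification_cost(actual, pred))  # one FN + one FP -> 11.0
```

Under such a cost framework, two classifiers with identical accuracy can differ sharply in total cost, which is what the cost-sensitivity analysis for RQ4 probes.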
For the Promise dataset, the average accuracy for MLP in the simple model is 91.7%, advanced model-1 is 80%, advanced model-2 is 87%, advanced model-3 is 85%, and advanced model-4 is 79%. Notably, the bar graph in Figure 2 illustrates that the average accuracy for MLP is higher in advanced model-2 than in advanced model-3, advanced model-1, and the simple model. Similarly, the average accuracy for DT in the simple model is 74%, advanced