Hyperparameter Optimization for Software Bug Prediction Using Ensemble Learning
ABSTRACT Software Bug Prediction (SBP) is a process integral to a software project's success that involves predicting software bugs before they occur. Detecting software bugs early in the development process enhances software quality and performance and reduces software costs. The integration of Machine Learning (ML) algorithms has significantly improved software bug prediction accuracy while reducing costs and resource utilization. Numerous studies have explored the impact of Hyperparameter Optimization on single classifiers, enhancing these models' overall performance in SBP analysis. Ensemble Learning (EL) approaches have also demonstrated increased model accuracy and performance on SBP datasets. This study proposes a novel learning model for predicting software bugs that combines EL with hyperparameter tuning. The results are compared with single hypothesis learning models using the WEKA software. The dataset, collected by the National Aeronautics and Space Administration (NASA), U.S.A., comprises 10,885 instances with 20 attributes, including a defect label, for one of their coding projects. The findings indicate that EL models outperform single hypothesis learning models and that the accuracy of the proposed model improves further after hyperparameter optimization. These results underscore the efficacy of ensemble learning, coupled with hyperparameter optimization, as a viable approach for enhancing the predictive capabilities of software bug prediction models.
INDEX TERMS Software bug prediction, machine learning, hyperparameter optimization, ensemble
learning.
to multiple SDLC models, such as the Waterfall Model, Iterative Model, Agile Model, V-Shaped Model, Spiral Model, and Big Bang Model. ML algorithms prove effective in determining the presence of bugs in a given piece of code. Enhanced programming methods can significantly elevate the overall quality of the program. ML represents one of the most rapidly expanding fields of computer science, with far-reaching applications. It involves the automatic identification of meaningful patterns in records. The primary goal of ML tools is to provide algorithms with the opportunity to learn and adapt [3], [4].

With the rise of ML algorithms in the past few years, many companies have utilized ML to find and predict software bugs before they occur. ML algorithms help organizations make predictions in many fields by combining mathematical and statistical techniques. There are two types of ML algorithms: supervised learning and unsupervised learning. In supervised learning, the algorithm's inputs and outputs are based on historical data related to the study; this data is used to train the algorithms and find patterns, and these patterns are then used to predict future outputs. In unsupervised learning, the algorithms process data to find patterns in unlabeled data. Early bug discovery and prediction are critical exercises in the software development life cycle [5]. They contribute to a higher satisfaction rate among users and enhance the overall performance and quality of the software project.

Defect prediction models can guide testing efforts towards code that is susceptible to bugs. Before software delivery to the customer, latent bugs in the code may surface; once identified, these flaws can be rectified prior to delivery at a fraction of the cost of post-delivery repairs [6]. Code defects incur significant expenses for companies each year, with billions of dollars spent on identification and correction. Models that accurately pinpoint the locations of bugs in code therefore have the potential to save businesses substantial amounts of money. Given the high costs involved, even minor improvements in detecting and repairing flaws can have a significant impact on total costs.

Many studies have been conducted on bug detection using ML algorithms. Some of these studies have utilized EL, which enables researchers and practitioners to employ more than one machine learning algorithm. By combining ML algorithms with diverse methodologies, EL aggregates the votes of multiple algorithms, enhancing the ability to make more accurate predictions. Previous research has also examined different types of algorithms for predicting software bugs. Although some of these proposed algorithms showed higher accuracy rates, most of the cited bug prediction studies for software projects left their settings at default values [7].

Hyperparameter optimization can enhance accuracy and improve the performance of ML models. The optimization of hyperparameters in Bayesian, SVM, and KNN models has demonstrated an increase in the accuracy of bug prediction models, as observed by Osman et al. [8] and Levesque et al. [9]. Hyperparameter optimizers are utilized to fine-tune the control parameters of data mining algorithms. It is widely recognized that such tuning enhances classification tasks, including software bug prediction and text classification [10].

To enhance the effectiveness of software bug prediction, this research addresses two key questions. Firstly, it explores whether Ensemble Learning (EL) models outperform single learning models in predicting software bugs. Secondly, it investigates the impact of adding hyperparameter optimization approaches on EL model performance, examining whether this enhancement leads to improved accuracy. These questions serve as the foundation of our exploration, guiding us to understand the nuances of predictive modeling in software bug detection.

The remainder of the paper is organized as follows: Section II presents related work; Section III covers the background of the study, including hyperparameter optimization and ensemble learning; Section IV outlines the methodology, encompassing the dataset, tools used, research questions, and evaluation measures; Section V presents the results and discussion; Section VI provides the conclusion and outlines future work.

II. RELATED WORK
This section navigates through recent contributions in the dynamic field of software bug prediction by exploring various methodologies and approaches used by previous researchers. These studies enrich our understanding of effective bug prediction models and offer diverse insights and methodologies to address the challenges posed by software defects.

In the study of Hammouri et al. [5], the researchers conducted a thorough examination of three supervised ML models (Naive Bayes, Decision Tree, and Artificial Neural Networks) specifically tailored for predicting software bugs. Their research involved a meticulous comparison with two previously proposed models, revealing that the three models exhibited a significantly higher accuracy rate than the two earlier models.

Di Nucci et al. [11] undertook an innovative approach, quantifying the dispersal of modifications made by developers working on a component and utilizing this data to construct a bug prediction model. This study, building on prior research that examined the human factor in bug generation, demonstrates that non-focused developers are more prone to introducing defects than their focused counterparts. Notably, the model exhibited superiority when compared to four competitive techniques.

Khan et al. [12] introduced a model for predicting software bugs, integrating ML classifiers with Artificial Immune Networks (AIN) to optimize hyperparameters for enhanced bug prediction accuracy. Employing seven ML algorithms on a bug prediction dataset, the study revealed that hyperparameter tuning of ML classifiers using AIN surpassed the performance of classifiers with default hyperparameters in the context of software bug prediction.
An ensemble consists of a collection of learners referred to as base learners. The generalization ability of an ensemble typically surpasses that of individual base learners. Ensemble Learning (EL) is particularly appealing because it can elevate weak learners, which are only marginally better than random guesses, to strong learners capable of making highly accurate predictions [22]. Consequently, the term "weak learners" is often synonymous with "base learners." It is noteworthy, however, that while theoretical studies primarily focus on weak learners, the base learners employed in practical applications are not exclusively weak; using stronger base learners can also contribute to superior outcomes.

Among the commonly utilized EL methods are Bagging, AdaBoost, Stacking, and Voting. Bagging, short for bootstrap aggregation, stands out as one of the oldest, most fundamental, and perhaps simplest ensemble-based algorithms, demonstrating remarkably high performance [23]. In bagging, diversity of classifiers is achieved by creating bootstrapped replicas of the training data [24]. This involves randomly selecting separate subsets of training data from the entire training dataset, with replacement. Each subset of training data is then employed to train a specific classifier. The classifiers are subsequently aggregated using a majority voting mechanism, where the ensemble decision is determined by the class selected by the majority of classifiers for a given case.
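To make the procedure concrete, the sketch below assembles such a bagged ensemble with Weka's Java API, the suite used in this study. The file name JM1.arff, the J48 base learner, and the parameter values are illustrative assumptions, not the study's exact configuration.

```java
import weka.classifiers.meta.Bagging;
import weka.classifiers.trees.J48;
import weka.core.Instances;
import weka.core.converters.ConverterUtils.DataSource;

public class BaggingSketch {
    public static void main(String[] args) throws Exception {
        // Load the dataset; "JM1.arff" is an assumed file name for the NASA data.
        Instances data = DataSource.read("JM1.arff");
        data.setClassIndex(data.numAttributes() - 1); // defect label as last attribute

        Bagging bagger = new Bagging();
        bagger.setClassifier(new J48());  // base learner: a C4.5 decision tree
        bagger.setNumIterations(10);      // 10 bootstrapped replicas, one classifier each
        bagger.setBagSizePercent(100);    // each replica matches the training-set size
        bagger.buildClassifier(data);     // train; member outputs are then aggregated
        System.out.println(bagger);
    }
}
```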
AdaBoost operates by training models in successive
rounds, with a new model trained in each iteration. At the
end of each round, misclassified examples are identified, and
their significance is amplified in a new training sequence.
This updated training data is then fed back into the begin-
ning of the subsequent round, where a new model is pre-
pared [25]. The underlying theory is that subsequent iter-
ations should compensate for mistakes made by previous
models, leading to an overall improvement in the ensemble’s
performance.
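A minimal sketch of this boosting loop, using Weka's AdaBoostM1 meta-classifier, might look as follows; the decision-stump base learner and the iteration count are assumed for illustration and are not the paper's reported settings.

```java
import weka.classifiers.meta.AdaBoostM1;
import weka.classifiers.trees.DecisionStump;
import weka.core.Instances;
import weka.core.converters.ConverterUtils.DataSource;

public class AdaBoostSketch {
    public static void main(String[] args) throws Exception {
        Instances data = DataSource.read("JM1.arff"); // assumed file name
        data.setClassIndex(data.numAttributes() - 1);

        AdaBoostM1 booster = new AdaBoostM1();
        booster.setClassifier(new DecisionStump()); // weak learner re-trained each round
        booster.setNumIterations(50);               // number of boosting rounds
        booster.setUseResampling(false);            // reweight examples rather than resample
        booster.setWeightThreshold(100);            // percentage of weight mass to base training on
        booster.buildClassifier(data);              // each round up-weights misclassified cases
        System.out.println(booster);
    }
}
```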
Bagging is particularly effective for unstable models that exhibit varying generalization behavior with slight changes in the training data [25]. These models, often termed high variance models, include Decision Trees and Neural Networks. However, bagging may not be suitable for very simple, stable models: because such models barely change from one bootstrapped replica to another, the resulting ensemble consists of nearly identical classifiers whose predictions are approximately the same (low diversity), so little is gained by combining them.
Voting serves as a technique for consolidating the outputs of multiple classifiers. Three types of voting methods exist: unanimous voting, majority voting, and plurality voting [26]. Unanimous voting takes place when all classifiers agree on the final judgment. Majority voting requires more than half of the votes to agree on the prediction, while plurality voting selects the class that receives the most votes, even when that falls short of an absolute majority. Each voting method provides a distinct approach to aggregating the decisions of individual classifiers in an ensemble.
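As an illustration, the following sketch combines three heterogeneous classifiers under Weka's Vote meta-classifier with a majority-voting combination rule; the choice of base learners here is an assumption for demonstration purposes.

```java
import weka.classifiers.Classifier;
import weka.classifiers.bayes.NaiveBayes;
import weka.classifiers.functions.Logistic;
import weka.classifiers.meta.Vote;
import weka.classifiers.trees.J48;
import weka.core.Instances;
import weka.core.SelectedTag;
import weka.core.converters.ConverterUtils.DataSource;

public class VoteSketch {
    public static void main(String[] args) throws Exception {
        Instances data = DataSource.read("JM1.arff"); // assumed file name
        data.setClassIndex(data.numAttributes() - 1);

        Vote vote = new Vote();
        vote.setClassifiers(new Classifier[] {
            new NaiveBayes(), new J48(), new Logistic() // heterogeneous base learners
        });
        // The class predicted by the most base learners decides the final label.
        vote.setCombinationRule(new SelectedTag(Vote.MAJORITY_VOTING_RULE, Vote.TAGS_RULES));
        vote.buildClassifier(data);
        System.out.println(vote);
    }
}
```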
Random Forests employ a similar bagging technique to generate random samples of training sets for each random tree, known as bootstrap samples. Each current training batch is derived from the original dataset with modifications [27]. Consequently, each tree is constructed using the current set and a randomly generated collection of attributes, and each node is split using the optimal split on the randomly chosen attributes. Importantly, the trees in a Random Forest are not pruned, preserving their original structure.
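A Random Forest with these characteristics (bootstrap samples, random attribute subsets per split, unpruned trees) can be configured in Weka 3.8 roughly as follows; the tree count and seed are illustrative values.

```java
import weka.classifiers.trees.RandomForest;
import weka.core.Instances;
import weka.core.converters.ConverterUtils.DataSource;

public class RandomForestSketch {
    public static void main(String[] args) throws Exception {
        Instances data = DataSource.read("JM1.arff"); // assumed file name
        data.setClassIndex(data.numAttributes() - 1);

        RandomForest forest = new RandomForest();
        forest.setNumIterations(100); // number of unpruned random trees
        forest.setNumFeatures(0);     // 0 = let Weka pick the attribute-subset size per split
        forest.setSeed(1);            // random number seed, for reproducibility
        forest.buildClassifier(data);
        System.out.println(forest);
    }
}
```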
The upcoming methodology section illustrates the practical implementation of hyperparameter optimization and ensemble learning in the context of software bug prediction.

IV. METHODOLOGY
The classification problem is a common formulation of the supervised learning task, in which the learner is expected to learn (approximate the action of) a function that maps a vector into one of several classes by inspecting several input-output examples of the function [4]. Supervised ML is the process of constructing a classification algorithm that can successfully learn from fresh data.
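Stated formally (standard notation assumed here, not taken from the paper), the learner approximates a mapping from feature vectors to class labels:

```latex
% Supervised classification as function approximation.
% x_i: software-metric vector of module i; y_i: its class label (assumed notation).
\[
  f : \mathbb{R}^{d} \rightarrow \{c_{1}, \ldots, c_{k}\},
  \qquad \text{learned from examples } \{(x_{i}, y_{i})\}_{i=1}^{n}.
\]
```

In software bug prediction the task is binary, so k = 2 (defective or non-defective).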
Figure 1 demonstrates the steps followed in the application of supervised machine learning to address our optimization problem.

FIGURE 1. Research methodological steps.

The following sub-sections provide a description of the dataset utilized in this study, detail the software and tools employed, and present the evaluation metrics used to assess the performance of the implemented approach.
B. HYPERPARAMETER OPTIMIZATION
In this study, the researchers opted for Cross-Validation Parameter Selection (CVParameterSelection), employing a predefined set of hyperparameter values for multiple learning algorithms. CVParameterSelection systematically tries out different hyperparameter values for each learning algorithm through cross-validation, exploring the model's performance under different configurations and ultimately selecting the hyperparameter values that yield the best model performance. Each algorithm has distinct hyperparameters, chosen to maximize accuracy and performance on the software bug prediction dataset. The set of hyperparameters applied across the various algorithms and their tuned values are shown in Table 2. The hyperparameters included:
• Percentage of weight mass to base training on.
• Use resampling for boosting.
• Random number seed.
• The number of iterations.

C. TOOL
In this study, we employed Weka (Waikato Environment for Knowledge Analysis), a renowned Java-based machine learning software suite developed at the University of Waikato in New Zealand. Weka is open-source software distributed for download under the GNU General Public License (GPL), making it freely accessible to users. The Weka workbench encompasses a diverse array of algorithms for data processing, predictive modeling, and visualization, complemented by user-friendly graphical interfaces for efficient access to these functionalities. Weka empowers practitioners to train machine learning models and make predictions without the need for coding, offering a user-friendly graphical user interface with a rich selection of state-of-the-art algorithms. Moreover, it provides practitioners with the flexibility to utilize various hyperparameter optimization approaches, including GridSearch and CVParameterSelection.

Weka offers a diverse array of ML models for data exploration, including pre-installed models such as Support Vector Machine, Naive Bayes, Neural Networks, and Decision Trees. Additionally, the tool provides pre-installed EL models such as Voting, Stacking, and AdaBoost. One notable feature is the built-in tool for testing models while optimizing hyperparameters.
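As a sketch of how that built-in optimizer is driven programmatically, the following code wraps AdaBoostM1 in CVParameterSelection so that one hyperparameter, the number of iterations, is swept over a range and the best value is chosen by internal cross-validation. The parameter range and fold count are illustrative assumptions, not the values from Table 2.

```java
import weka.classifiers.meta.AdaBoostM1;
import weka.classifiers.meta.CVParameterSelection;
import weka.core.Instances;
import weka.core.converters.ConverterUtils.DataSource;

public class CvParameterSketch {
    public static void main(String[] args) throws Exception {
        Instances data = DataSource.read("JM1.arff"); // assumed file name
        data.setClassIndex(data.numAttributes() - 1);

        CVParameterSelection tuner = new CVParameterSelection();
        tuner.setClassifier(new AdaBoostM1());
        // Sweep AdaBoostM1's -I option (iterations) from 10 to 100 in 10 steps.
        tuner.addCVParameter("I 10 100 10");
        tuner.setNumFolds(10);       // internal cross-validation folds
        tuner.buildClassifier(data); // evaluates each candidate setting, keeps the best

        System.out.println("Chosen options: "
                + String.join(" ", tuner.getBestClassifierOptions()));
    }
}
```

Because the wrapped model is re-evaluated for every step and every combination of swept parameters, testing time grows quickly with the number of tuned hyperparameters, as noted in the results section.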
TABLE 4. Results of ensemble learning models before and after hyperparameter optimization.
TABLE 5. A comparison of results achieved in our work with those in previous work.
accuracy rates and superior performance in software bug prediction. The top-performing models among the tested ones are EL models, namely Vote, Bagging, Random Forest, and AdaBoostM1, outperforming the single hypothesis learning models. The single hypothesis models, including Logistic Regression, SVM, Neural Networks, Naive Bayes, and Decision Trees, scored lower in terms of accuracy and ROC measures. Notably, the Vote EL model achieved the highest accuracy (0.7189). The ROC measure achieved for the same model was 0.763, which indicated good discrimination between positive and negative instances. Furthermore, it is noteworthy that the ROC curve was consistently higher for the majority of EL models, approaching closer to one compared to single hypothesis learning models like Neural Networks, Naive Bayes, and Decision Trees.

The Kappa statistics also indicated higher values for the majority of EL models, with the exception of the Vote classifier model, which scored the highest in accuracy. In terms of sensitivity, EL models demonstrated an advantage, while the specificity was lower for EL models, with the Vote classifier registering the lowest value in this regard. Additionally, the F-measure scores for EL models surpassed those of single hypothesis learning models, signifying that EL models exhibited higher precision and recall compared to their single hypothesis learning counterparts.
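For reference, the measures discussed in this section (accuracy, ROC area, Kappa, and F-measure) can be computed for any of the tested classifiers with Weka's Evaluation class; the sketch below assumes the JM1 ARFF file and 10-fold cross-validation.

```java
import java.util.Random;
import weka.classifiers.Evaluation;
import weka.classifiers.meta.Bagging;
import weka.core.Instances;
import weka.core.converters.ConverterUtils.DataSource;

public class MetricsSketch {
    public static void main(String[] args) throws Exception {
        Instances data = DataSource.read("JM1.arff"); // assumed file name
        data.setClassIndex(data.numAttributes() - 1);

        // 10-fold cross-validation of a bagged ensemble (default base learner).
        Evaluation eval = new Evaluation(data);
        eval.crossValidateModel(new Bagging(), data, 10, new Random(1));

        System.out.printf("Accuracy:  %.4f%n", eval.pctCorrect() / 100.0);
        System.out.printf("ROC area:  %.3f%n", eval.weightedAreaUnderROC());
        System.out.printf("Kappa:     %.3f%n", eval.kappa());
        System.out.printf("F-measure: %.3f%n", eval.weightedFMeasure());
    }
}
```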
B. RQ2: DOES ADDING HYPERPARAMETER OPTIMIZATION APPROACHES IMPROVE THE PERFORMANCE AND ACCURACY OF THE EL?
To address the second research question, hyperparameter optimization was conducted within the WEKA software, encompassing manual optimization for some hyperparameters and automated optimization using the CVParameterSelection classifier, incorporating steps for each hyperparameter to facilitate automated testing. As a result, testing time increased significantly because the model is tested on each step and combination of the hyperparameters, and the best-performing combination is chosen to calculate the accuracy of the model. Table 3 shows the results of the impact of optimizing hyperparameters for EL models for software bug prediction.

The results presented in Table 4 clearly indicate that tuning hyperparameters for Ensemble Learning (EL) models significantly influences the accuracy and performance of the models in software bug prediction. Across all cases, there was an observable increase in all model measures, demonstrating the positive impact of hyperparameter optimization. Many hyperparameters were adjusted away from their default settings to enhance accuracy and performance. The evaluation of the proposed model highlights that employing EL models with hyperparameter optimization for software bug prediction yields superior performance compared to single hypothesis learning models. Furthermore, an enhancement in the ROC curve is evident in three out of the four EL algorithms following hyperparameter optimization.
C. COMPARISON WITH PREVIOUS WORK
In this section, we conducted a comparative analysis of our findings with previous studies in the field of software bug prediction and hyperparameter optimization. We specifically focused on several EL classifiers that employed the same dataset as ours (JM1). The objective is to check whether our study represents an enhancement over existing research in this domain. The details of the results obtained are listed in Table 5. For AdaBoostM1, our work outperforms previous studies in terms of accuracy (0.8176), ROC (0.797), and F-measure (0.787), demonstrating improvements over the reported values in [17], [31], [32], [33], and [34]. Similar trends are observed for the Bagging algorithm, where our study exhibits higher accuracy (0.8183), ROC (0.769), and F-measure (0.783) compared to the referenced works. Moreover, the Vote algorithm in our work achieves a noteworthy accuracy of 0.8195, ROC of 0.763, and F-measure of 0.774, highlighting significant advancements over the provided references. The Random Forest algorithm also displays improved performance in our study, with an accuracy of 0.8182, ROC of 0.757, and F-measure of 0.796.

Overall, the results suggest that our proposed approach, which incorporates hyperparameter optimization, enhances the predictive capabilities of various classifiers across different metrics, surpassing prior studies in the same domain and dataset.

D. THREATS TO VALIDITY
Several potential threats to the validity of this study have to be acknowledged. Sampling bias may restrict the generalizability of results to datasets with different characteristics. The imbalanced distribution of defective instances (19.32%) could introduce bias, impacting relevance to diverse class distributions. Duplicated instances in the JM1 dataset (24.16%) may affect the model's learning process, potentially overestimating predictive capabilities. The effectiveness of hyperparameter optimization may be algorithm-dependent, limiting generalizability. Additionally, exploring different evaluation metrics to assess and compare classifiers could provide a more comprehensive understanding of model performance. Acknowledging and addressing these challenges is crucial for enhancing the transparency of our results.

VI. THEORETICAL AND PRACTICAL IMPLICATIONS
This study contributes theoretically by advancing our understanding of predictive modeling in software bug detection. By comparing ensemble learning (EL) models against single learning models, the research sheds light on the superiority of EL models in predicting software bugs. This analysis underscores the potential of EL models as a more effective approach in such tasks. Furthermore, the investigation into the impact of hyperparameter optimization on EL model performance offers insights into the importance of fine-tuning model parameters, enriching our theoretical understanding of machine learning model optimization.

On a practical level, the findings hold significance for software developers and quality assurance professionals. Demonstrating the superiority of EL models, particularly when optimized with hyperparameters, in predicting software bugs offers tangible benefits for software development processes. Implementing EL models with optimized hyperparameters can lead to more accurate bug prediction, ultimately improving software quality and reducing development costs. Additionally, the study provides practical guidance for selecting appropriate machine learning models for similar predictive tasks, aiding practitioners in making informed decisions. Moreover, the introduction of various model evaluation measures such as accuracy, ROC curve, Kappa statistics, sensitivity, specificity, and F-measure gives practitioners a more complete picture of model performance.
[28] A. E. Mohamed, "Comparative study of four supervised ML techniques for classification," Int. J. Appl. Sci. Technol., vol. 7, no. 2, pp. 1–15, 2017.
[29] M. G. Al-Obeidallah, D. G. Al-Fraihat, A. M. Khasawneh, A. M. Saleh, and H. Addous, "Empirical investigation of the impact of the adapter design pattern on software maintainability," in Proc. Int. Conf. Inf. Technol. (ICIT), Jul. 2021, pp. 206–211.
[30] A.-R. Al-Ghuwairi, D. Al-Fraihat, Y. Sharrab, H. Alrashidi, N. Almujally, A. Kittaneh, and A. Ali, "Visualizing software refactoring using radar charts," Sci. Rep., vol. 13, no. 1, p. 19530, Nov. 2023, doi: 10.1038/s41598-023-44281-6.
[31] H. Alsawalqah, H. Faris, I. Aljarah, L. Alnemer, and N. Alhindawi, "Hybrid SMOTE-ensemble approach for software defect prediction," in Software Engineering Trends and Techniques in Intelligent Systems. Cham, Switzerland: Springer, 2017, pp. 355–366.
[32] S. A. El-Shorbagy, W. M. El-Gammal, and W. M. Abdelmoez, "Using SMOTE and heterogeneous stacking in ensemble learning for software defect prediction," in Proc. 7th Int. Conf. Softw. Inf. Eng., May 2018, pp. 44–47.
[33] D. Al-Fraihat, Y. Sharrab, F. Alzyoud, A. Qahmash, M. Tarawneh, and A. Maaita, "Speech recognition utilizing deep learning: A systematic review of the latest developments," Hum.-Centric Comput. Inf. Sci., vol. 14, pp. 1–34, Mar. 2024.
[34] R. Li, L. Zhou, S. Zhang, H. Liu, X. Huang, and Z. Sun, "Software defect prediction based on ensemble learning," in Proc. 2nd Int. Conf. Data Sci. Inf. Technol., Jul. 2019, pp. 1–6.

ABDEL-RAHMAN AL-GHUWAIRI received the Ph.D. degree in software engineering from New Mexico State University, Las Cruces, NM, USA, in 2013. He is currently an Associate Professor with the Department of Software Engineering, Hashemite University, Jordan. His research interests include software engineering, cloud computing, requirements engineering, information retrieval, big data, and database systems.

HAMZEH ALSHISHANI received the M.Sc. degree in software engineering from Hashemite University, Jordan. He is currently a Lecturer and a Researcher with the Department of Software Engineering, Hashemite University. His research interests include software engineering, cloud computing, and requirements engineering.

DIMAH AL-FRAIHAT received the Ph.D. degree in software engineering from the University of Warwick, U.K. She is currently an Assistant Professor with the Faculty of Information Technology, Isra University, Jordan. Her research interests include software engineering, refactoring, design patterns, software testing, requirements engineering, documentation, computer-based applications, technology enhanced learning, and deep learning.