Kasu Journal of Computer Science Vol. 1 No. 3 [September, 2024], pp. 527-543
Print ISSN: 1597-2178 | Online ISSN: 1597-2216
https://doi.org/10.47514/kjcs/2024.1.3.0010 | www.kjcs.edu.ng
Abstract
Software defect prediction (SDP) has become a critical component in modern software development, aiming to
identify potential bugs early in the development process. Despite advancements in the field, existing SDP models
often struggle with accuracy and class imbalance issues, limiting their practical application in real-world software
development environments. The increasing complexity of software systems necessitates more robust and accurate
defect prediction techniques to enhance software quality and reduce development costs. This study aims to enhance
software defect prediction through machine learning and deep learning techniques, focusing on improving accuracy
and addressing class imbalance. The research seeks to develop and compare various machine learning and deep
learning models to identify the most effective approach for early defect detection in software development processes.
The study employs a comprehensive methodology utilizing Random Forest, Support Vector Machines (SVM), Naive
Bayes, Artificial Neural Networks (ANN), and Convolutional Neural Networks (CNN) algorithms. The JM1 NASA
software defect dataset was used, consisting of 10,885 instances with 22 code metric attributes. To address class
imbalance, a hybrid sampling technique (SMOTE-Tomek) was implemented. Models were evaluated using multiple
performance metrics including accuracy, precision, recall, F1-score, and AUC-ROC. This approach allows for a
thorough comparison of traditional machine learning and deep learning techniques in the context of software defect
prediction. The Random Forest model achieved the best overall performance with an accuracy of 82.3%, recall of
96.8%, F1-score of 0.898, and precision of 83.7%, significantly outperforming previous benchmarks. The CNN
model also showed promise, achieving 81.95% accuracy and 95.90% recall. These results demonstrated substantial
improvements in handling class imbalance and overall predictive performance compared to existing studies. The
high recall rates suggest that these models, particularly Random Forest, are effective at identifying a large
proportion of defects, which is crucial in software development where missing defects can be costly. However, the
study also highlighted the ongoing challenge of balancing precision and recall in software defect prediction. The
findings contribute to the refinement of software defect prediction methodologies, offering improved models for
early defect detection in software development processes and opening avenues for further research in applying
machine learning to software engineering challenges.
Keywords: Software Defect, Machine Learning, Software Engineering, Deep Learning, NASA JM1 dataset
1. Introduction
Software defects, also known as bugs or software faults, are errors or flaws in computer programs that
lead to incorrect behavior or unintended results (Elentukh 2023; Shafiq et al. 2023). These defects can
have severe consequences, ranging from system crashes and data corruption to security vulnerabilities and
financial losses. As software systems continue to grow in complexity, the task of detecting and eliminating
defects has become an increasingly critical challenge in software development and maintenance
(Mahmoud et al. 2024; Mehmood et al. 2023).
The increasing intricacy of modern software systems has rendered manual defect detection not only
extremely challenging but also highly time-consuming. This has necessitated the development and
implementation of automated techniques for efficient and effective defect detection (Vogel-Heuser et al.
2015). In recent years, Machine Learning (ML) and Deep Learning (DL) techniques have emerged as
powerful tools in addressing this challenge (Adam, Abatcha, and Aboaba 2022; Esteves et al. 2020; Li et
al. 2017).
Machine Learning enables computers to learn from data and make predictions without explicit
programming (LeCun, Bengio, and Hinton 2015). In the context of software defect detection, ML
techniques analyze historical data such as source code, execution traces, and bug reports to identify
patterns and predict potential defects. Deep Learning, a subset of machine learning, utilizes artificial
neural networks with multiple layers to learn hierarchical data representations. When applied to software
defect detection, DL models can be trained on large datasets to automatically learn complex patterns
associated with defects (Zhao et al. 2024).
Numerous studies have explored the application of ML and DL techniques for software defect detection,
encompassing supervised learning, unsupervised learning, and hybrid approaches (Hasanpour et al. 2020;
Pandey, Mishra, and Tripathi 2021; Sanusi et al. 2019; Thomas and Kaliraj 2024). These techniques
leverage historical data and patterns to build predictive models capable of identifying potentially defective
code or components. Despite showing promising results, the adoption of these techniques in industry
remains limited due to several challenges. These include issues related to data quality, model
interpretability, and integration into existing software development processes (Ali et al. 2021; Nevendra
and Singh 2021; Pandey et al. 2021; Shen and Chen 2020).
Software Defect Prediction (SDP) has become increasingly important in modern software development.
However, existing models often struggle with accuracy and class imbalance issues. This research aims to
enhance software defect prediction through advanced machine learning techniques, focusing on
improving accuracy and addressing class imbalance. The study employs a comprehensive methodology,
utilizing Random Forest, Support Vector Machines (SVM), Naive Bayes, Artificial Neural Networks
(ANN), and Convolutional Neural Networks (CNN) algorithms.
By taking into account these challenges and exploring the potential of both traditional machine learning
and deep learning approaches, this study seeks to contribute to the refinement of software defect prediction
methodologies. The goal is to offer improved models for early defect detection in software development
processes, thereby enhancing software quality assurance practices and opening avenues for further
research in applying machine learning to software engineering challenges.
2. Related Works
This section presents a review of recent work in software defect prediction. SDP is the technique of predicting defective modules in a software system. Over the past twenty years, numerous methods have been presented in the literature that use historical defect datasets to determine the relationship between a collection of attributes or features of a particular software module and the presence of defects in that module. Software metrics, such as McCabe's cyclomatic complexity (McCabe 1976), the Halstead metrics (Halstead 1977), and the Chidamber and Kemerer object-oriented metrics suite (Chidamber and Kemerer 1994), are the features most often utilised to predict defects.
Sanusi et al. (2019) developed a software defect prediction system using machine learning algorithms to
identify software bugs promptly. Their study employed Random Forest, Decision Tree, and Artificial
Neural Network (ANN) algorithms, with Random Forest outperforming others in accuracy (83.40%),
precision (53.18%), and F-score (52.04%). While the study demonstrated robust methodology through
feature selection and cross-validation, it lacked detailed information about the datasets used and did not
address the impact of class imbalance on algorithm performance.
Olorunshola et al. (2020) evaluated twelve machine learning classification algorithms using the PROMISE
dataset. Their comprehensive study found that the Random Forest algorithm outperformed most others,
while the Bayes Net classifier excelled in terms of the false positive rate. However, the study’s focus on a
single dataset raised concerns about result generalizability.
Hasanpour et al. (2020) addressed high dimensionality and class imbalance issues using Deep Belief
Networks (DBN) and Stack Sparse Auto-Encoders (SSAE). Their findings indicated that deep learning
models, particularly SSAE, provided more accurate predictions for most NASA datasets than traditional
methods. However, the study identified weaknesses in handling severely imbalanced classes and datasets
with insufficient examples.
Nevendra and Singh (2021) developed an enhanced Convolutional Neural Network (CNN) model for
software defect prediction. Their two-stage approach, involving feature selection and data transformation,
outperformed state-of-the-art techniques such as KNN, SVM, and AdaBoost across 19 open-source
datasets. However, the study lacked detailed explanations for architectural choices and did not address
cross-project defect prediction scenarios.
Ali et al. (2021) focused on early software defect prediction in NASA datasets, employing various machine
learning classification algorithms. Their novel tuned XGBoost model achieved the highest accuracy of
95.98% on the MW1 dataset. While the study effectively handled class imbalance, it lacked detailed
information on parameter tuning and model interpretability.
Bennin et al. (2022) investigated six re-sampling techniques: the synthetic minority over-sampling technique (SMOTE), borderline SMOTE, safe-level SMOTE, adaptive synthetic oversampling (ADASYN), random over-sampling, and random under-sampling. The six approaches were applied to 40 releases of 20 open-source projects with imbalance ratios ranging from 3.8% to 17.46%, and two types of metrics were used. The study used several machine learning algorithms and concluded that: first, resampling methods significantly improved the performance of the prediction model in terms of
all model metrics except the Area Under the ROC Curve; second, the performance of a resampling method depends on the imbalance ratio of the dataset (the ratio of defective to clean instances); third, random under-sampling and borderline SMOTE provided more stable results across several performance measures and prediction models.
Mehmood et al. (2023) integrated feature selection with various machine learning classifiers to improve
software defect prediction accuracy. Using five NASA datasets, the study achieved significant
improvements, including an average 8% accuracy improvement for the Bayesian Net algorithm and over
93% accuracy for Logistic Regression with feature selection. While the study contributed to bridging the
gap between software engineering and data mining, it lacked explanations for the chosen algorithms and
comparisons with state-of-the-art techniques.
Chinenye, Anyachebelu, and Abdullahi (2023) proposed improving defect prediction accuracy through
decision tree algorithms, addressing issues such as feature selection and hyperparameter tuning. Their
structured framework included genetic algorithms for feature selection and comprehensive evaluation
metrics. However, the study lacked analysis of important features and their impact on predictions.
Elshamy, AbouElenen, and Elmougy (2023) addressed challenges in software defect detection including
class imbalance and hyperparameter optimization. Their approach, using SMOTE-SVM for dataset
balancing and NSGA-II with Hyperband for hyperparameter optimization, achieved high accuracy with
the Random Forest classifier. However, the study lacked an analysis of computational complexity and time
requirements.
Alkaberi and Assiri (2024) focused on predicting the number of software faults using CNN and multilayer
perceptron (MLP) models. Their approach, which included oversampling techniques and log
transformation, demonstrated improved performance over traditional regression models. However, the
study’s reliance on Java datasets with specific metrics limited its external validity.
Thomas and Kaliraj (2024) proposed an enhanced Random Forest-based approach for the NASA JM1
dataset. Their methodology, which included SMOTE for class imbalance and a two-fold optimization
process, achieved an accuracy of 82.96% and an F1 score of 89.53%, outperforming standard models.
However, the focus on a single dataset limited the generalizability of their approach.
Jude and Uddin (2024) introduced a hybrid machine learning algorithm combining multiple models
through stacking. Their approach, which integrated Explainable AI (XAI) techniques for interpretability,
showed improved performance across various metrics using NASA-MD datasets. The study acknowledged
limitations in generalizability and recommended further research on computational requirements.
Ali et al. (2024) developed the Voting Ensemble-Based Software Defect Prediction model (VESDP),
integrating four heterogeneous supervised machine learning classifiers. The VESDP model demonstrated
remarkable accuracy across seven historical defect datasets, outperforming twenty state-of-the-art
techniques. However, the study’s focus on NASA MDP datasets and emphasis on accuracy metrics
potentially limited its broader applicability.
The review of related works in software defect prediction reveals several significant research gaps. These
include a reliance on limited datasets, particularly NASA datasets, which constrains the generalizability of
findings across diverse software projects. The persistent challenge of class imbalance in defect prediction
datasets remains inadequately addressed. There is a notable lack of focus on model interpretability, which
is crucial for practical adoption in software engineering contexts. While some studies have begun to explore
deep learning techniques, there is insufficient investigation into optimizing various deep learning
architectures specifically for defect prediction. The field also lacks studies on real-time prediction models
that can adapt to evolving software projects, as well as research on effectively integrating prediction models
into actual development workflows. Furthermore, there is a scarcity of comprehensive comparative studies
benchmarking a wide range of traditional and deep learning techniques. Lastly, most studies treat all defects
equally, neglecting the potential value in predicting specific types of defects such as security vulnerabilities
or performance issues. Finding solutions to these gaps could significantly advance the field, leading to more
accurate, interpretable, and practically applicable defect prediction models in real-world software
development scenarios.
3. Methodology
The research design of this study is depicted in Figure 1. The input JM1 dataset is first pre-processed. The preprocessing step includes data cleaning, data normalization, feature engineering, data scaling, and data splitting. After preprocessing, a hybrid sampling technique was applied to the training dataset, and evaluation was done on the test data for each of the algorithms: Decision Trees (DT), Random Forests (RF), Support Vector Machines (SVM), Naive Bayes (NB), Artificial Neural Networks (ANN), and Convolutional Neural Networks (CNN). Finally, the results of each model were reported.
Figure 1 illustrates the research design, which encompasses the following key steps:
a. Dataset Description
The study uses a secondary data source from the NASA repository (Sayyad Shirabad and Menzies
2005; Thomas and Kaliraj 2024). The dataset contains 10,885 instances and 22 attribute columns,
including various code metrics. The class distribution for the ‘defects’ field is 80.65% false (8,779 instances) and 19.35% true (2,106 instances). Table 1 provides a detailed description of the dataset features.
The dependent variable ‘defects’ in this study is binary (i.e., defects ∈ {0,1}), where 0 means that the instance does not have a defect and 1 means the instance has a defect.
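A minimal Python sketch of loading the dataset and checking this class distribution is shown below; the file name 'jm1.csv' and the use of pandas are assumptions, as the paper does not describe its data-loading code.

import pandas as pd

# Hypothetical file name; the paper only states that the JM1 data were
# obtained from the NASA/PROMISE repository.
df = pd.read_csv("jm1.csv")

print(df.shape)  # the paper reports 10,885 instances and 22 columns
print(df["defects"].value_counts(normalize=True))
# expected: roughly 0.8065 non-defective and 0.1935 defective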
c. Data Preprocessing
The data preprocessing stage involved several critical steps to prepare the software metrics for
defect prediction modelling. Feature engineering was performed on the ‘defects’ column,
transforming it into a binary classification. Values greater than 0 were labelled as ‘Defect’, while
0 values were labelled as ‘No Defect’. This transformation simplified the target variable for
subsequent classification tasks. Data scaling was applied using MinMaxScaler to standardize
selected features. This step normalized software metrics such as lines of code (loc), McCabe's
cyclomatic complexity (v(g)), and Halstead effort (e), ensuring that all features contribute equally
to the model training process. The standardization improved model performance, accelerated
convergence, and enhanced generalization capabilities. A correlation analysis was conducted, and
visualized through a heatmap (refer to Figure 3). This heatmap revealed important relationships
between various code metrics and defect occurrence. Key metrics like lines of code (loc), cyclomatic complexity (v(g)), and essential complexity (ev(g)) showed stronger correlations with
defects, indicating their potential as significant predictors. The heatmap also highlighted possible
multicollinearity between certain metrics, an important consideration for model development. The
dataset was split into training and testing sets using the train_test_split function with a 70:30 ratio.
This stratified split maintained the distribution of the target variable across both sets, ensuring a
representative sample for model training and evaluation. This step was crucial for assessing the
model’s predictive capabilities on unseen data and preventing overfitting. These preprocessing
steps collectively prepared the data for effective defect prediction modelling by transforming
features, normalizing scales, analysing correlations, and creating appropriate training and testing
sets.
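A minimal sketch of these preprocessing steps, assuming the standard scikit-learn implementations; the random seed and variable names are illustrative, not the authors' code.

from sklearn.model_selection import train_test_split
from sklearn.preprocessing import MinMaxScaler

# Binarise the target: values greater than 0 become 'Defect' (1).
X = df.drop(columns=["defects"])
y = (df["defects"] > 0).astype(int)

# Stratified 70:30 split preserves the class distribution in both sets.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.30, stratify=y, random_state=42
)

# MinMax scaling of the code metrics (loc, v(g), e, ...); the scaler is
# fitted on the training split only, to avoid test-set leakage.
scaler = MinMaxScaler()
X_train_scaled = scaler.fit_transform(X_train)
X_test_scaled = scaler.transform(X_test)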
d. Sampling
In software defect prediction, where non-defective instances often outnumber defective ones, the hybrid SMOTE-Tomek technique was essential for addressing class imbalance in this research. SMOTE generates synthetic samples of defective instances, while Tomek Links refines decision boundaries by removing overlapping instances between the classes.
e. Model Training
Post-SMOTE-Tomek, the model is trained using supervised learning and deep learning algorithms
on the rebalanced dataset, employing methods such as the ‘fit’ function on the resampled and
standardized training data. The subsequent evaluation phase is instrumental in assessing the
model’s efficacy and generalizability. This phase encompasses making predictions on both the
resampled training data and previously unseen test data, followed by the computation of various performance metrics. These metrics include, but are not limited to, accuracy, precision, recall, F1 score, confusion matrix, and the Area Under the Receiver Operating Characteristic curve (ROC AUC). This comprehensive set of metrics provides a multifaceted view of the model’s predictive
capabilities, particularly its ability to discriminate between defective and non-defective software
modules. Through meticulous interpretation of these results, researchers can identify the model’s
strengths and limitations, ensure its robustness against overfitting and underfitting, and validate its
readiness for deployment in real-world software defect prediction scenarios. This rigorous
evaluation process is paramount in establishing the model’s reliability and effectiveness in the
context of software quality assurance and defect management.
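The resampling and training step can be sketched as follows, assuming the SMOTETomek implementation from the imbalanced-learn library; Random Forest is shown as the representative learner, and hyperparameters are left at defaults because the paper does not report them.

from imblearn.combine import SMOTETomek
from sklearn.ensemble import RandomForestClassifier

# Resample only the training split; the test split is left untouched so
# that evaluation reflects the original class distribution.
smt = SMOTETomek(random_state=42)
X_resampled, y_resampled = smt.fit_resample(X_train_scaled, y_train)

# Representative learner; the paper's tuned settings are not reported.
model = RandomForestClassifier(random_state=42)
model.fit(X_resampled, y_resampled)

y_pred = model.predict(X_test_scaled)
y_prob = model.predict_proba(X_test_scaled)[:, 1]  # scores for ROC AUC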
f. Evaluation Metrics
The study used five performance metrics to evaluate the developed models’ performance: accuracy, precision, recall, F1 score, and AUC-ROC. These metrics are widely used in the related literature to evaluate software defect prediction models. They are defined in terms of true positives (TP), false negatives (FN), false positives (FP), and true negatives (TN); the AUC-ROC is derived from the true positive rate (TPR) and false positive rate (FPR).
\[
\text{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN} \tag{1}
\]
\[
\text{Recall} = \frac{TP}{TP + FN} \tag{2}
\]
\[
\text{Precision} = \frac{TP}{TP + FP} \tag{3}
\]
\[
\text{F1 Score} = 2 \times \frac{\text{Precision} \times \text{Recall}}{\text{Precision} + \text{Recall}} \tag{4}
\]
where,
True Positive (TP): This occurs when the model correctly predicts a software defect, and the defect
is confirmed (i.e. when a defective instance is accurately identified as defective).
True Negative (TN): This occurs when the model correctly predicts that a software component is
not defective, and it is confirmed as non-defective (i.e. when a non-defective instance is accurately
identified as non-defective).
False Positive (FP): This occurs when the model incorrectly predicts a defect in a software
component that is non-defective (i.e. when a non-defective instance is mistakenly identified as
defective).
False Negative (FN): This occurs when the model fails to detect a defect in a software component
that is defective (i.e. when a defective instance is mistakenly identified as non-defective).
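Continuing the sketch above, the five metrics (Equations 1-4 plus ROC AUC) and the confusion matrix can be computed with scikit-learn as follows.

from sklearn.metrics import (accuracy_score, confusion_matrix, f1_score,
                             precision_score, recall_score, roc_auc_score)

print("Accuracy:", accuracy_score(y_test, y_pred))    # Eq. (1)
print("Recall:", recall_score(y_test, y_pred))        # Eq. (2)
print("Precision:", precision_score(y_test, y_pred))  # Eq. (3)
print("F1 Score:", f1_score(y_test, y_pred))          # Eq. (4)
print("ROC AUC Score:", roc_auc_score(y_test, y_prob))
print(confusion_matrix(y_test, y_pred))  # rows = actual, cols = predicted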
4. Results and Discussion
This section presents the results of experiments conducted on various machine learning and deep learning algorithms for software defect prediction. The algorithms evaluated include Decision Tree (DT), Random Forest (RF), Support Vector Machine (SVM), Naive Bayes (NB), Artificial Neural Network (ANN), and Convolutional Neural Network (CNN).
a. Decision Tree Algorithm
The Decision Tree algorithm achieved an accuracy of 79.56% on the test data, with a precision of 84.35%,
a recall of 91.68%, and an F1 score of 0.8786. The model showed some signs of overfitting, as indicated
by a slight performance drop from training (85.17%). A notable class imbalance was observed, with better
performance in the majority “No Defect” class. The ROC AUC score of 0.7022 suggested moderate
discriminative ability.
Accuracy: 0.7956495098039216
Precision: 0.843466107617051
Recall: 0.9168249145461451
F1 Score: 0.8786169244767971
ROC AUC Score: 0.7022419937607701
b. Random Forest Algorithm
Random Forest demonstrated improved performance over the Decision Tree, achieving 82.32% accuracy
on test data with good generalization. It exhibited exceptional recall (96.89%) and good precision (83.75%),
resulting in a strong F1 score of 0.8984. The ROC AUC score of 0.7529 indicated improved discriminative
ability compared to the Decision Tree.
Accuracy: 0.8232230392156863
Precision: 0.8374917925147735
Recall: 0.9688568173186479
F1 Score: 0.8983976052121853
ROC AUC Score: 0.7529280622695124
c. Support Vector Machine (SVM) Algorithm
The SVM algorithm showed good overall performance with 81.19% accuracy on test data and consistent
generalization. It achieved exceptionally high recall (99.62%) and solid precision (81.28%), resulting in an
F1 score of 0.8952. However, the model struggled significantly with the minority class, showing extremely
low recall (4%) for the “Defect” class.
Accuracy: 0.8118872549019608
Precision: 0.8128292531763247
Recall: 0.996202050892518
F1 Score: 0.895221843003413
ROC AUC Score: 0.6761830069765496
d. Naive Bayes Algorithm
Naive Bayes demonstrated solid performance with 80.94% accuracy on test data and good generalization.
It achieved high recall (94.49%) and good precision (83.91%), resulting in an F1 score of 0.8889. The ROC
AUC score of 0.7020 indicated moderate discriminative ability.
Accuracy: 0.8094362745098039
Precision: 0.839123102866779
Recall: 0.9449297379415116
F1 Score: 0.888888888888889
ROC AUC Score: 0.7020424660065498
e. Artificial Neural Network (ANN) Algorithm
The ANN showed strong performance with 81.95% accuracy and good generalization. It achieved high
recall (96.39%) and good precision (83.71%), resulting in an F1 score of 0.8960. The ROC AUC score of
0.7295 indicated moderate discriminative ability, better than most models except Random Forest.
Accuracy: 0.819546568627451
Precision: 0.8370712401055409
Recall: 0.9639194834789214
F1 Score: 0.8960282436010591
ROC AUC Score: 0.7294969432829569
f. Convolutional Neural Network (CNN) Algorithm
The CNN demonstrated strong performance with 81.95% accuracy, high recall (95.90%), and good
precision (84.00%), resulting in an F1 score of 0.8955. The ROC AUC score of 0.7323 indicated moderate
discriminative ability, slightly better than most models but lower than Random Forest.
Accuracy: 0.819546568627451
Precision: 0.8399866932801064
Recall: 0.9589821496391948
F1 Score: 0.8955488561801737
ROC AUC Score: 0.7322963507788204
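The paper does not describe the CNN architecture, so the following Keras sketch is purely illustrative of how a one-dimensional CNN can be applied to the scaled tabular metrics; the layer sizes, optimiser, and training epochs are assumptions, not the authors' configuration. It continues from the resampling sketch in Section 3.

from tensorflow import keras
from tensorflow.keras import layers

n_features = X_train_scaled.shape[1]

# Treat each metric vector as a 1D sequence with a single channel.
X_tr = X_resampled.reshape(-1, n_features, 1)
X_te = X_test_scaled.reshape(-1, n_features, 1)

cnn = keras.Sequential([
    layers.Input(shape=(n_features, 1)),
    layers.Conv1D(32, kernel_size=3, activation="relu"),
    layers.MaxPooling1D(pool_size=2),
    layers.Flatten(),
    layers.Dense(64, activation="relu"),
    layers.Dense(1, activation="sigmoid"),  # binary defect probability
])
cnn.compile(optimizer="adam", loss="binary_crossentropy",
            metrics=["accuracy"])
cnn.fit(X_tr, y_resampled, epochs=20, batch_size=64, verbose=0)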
g. Comparative Analysis of Models
Figure 4 presents a comparison of the performance metrics across all evaluated algorithms. Random Forest
achieved the highest accuracy (82.3%), closely followed by ANN and CNN (both 82.0%). Decision Tree
showed the highest precision (84.3%), while SVM demonstrated the highest recall (99.6%). The F1 score,
balancing precision and recall, was highest for Random Forest (89.8%), with ANN and CNN close behind
(both 89.6%).
Table 2 presents a comparison of the Random Forest algorithm’s performance in this study with two
previous studies.
The current study’s implementation of Random Forest demonstrated the strongest overall performance,
achieving the highest accuracy (82.3%), recall (96.8%), F1-score (0.898), and precision (83.7%) compared
to the studies by Olorunshola et al. (2020) and Thomas and Kaliraj (2024).
The results of this study demonstrate significant advancements in software defect prediction using machine
learning techniques, particularly with the Random Forest algorithm. The performance improvements can
be attributed to several factors, including advanced feature engineering, improved handling of class
imbalance, and optimized hyperparameter tuning. The high recall (96.8%) of our model is particularly
noteworthy in the context of software defect prediction. As D’Ambros et al. (2012) point out, in software
engineering, the cost of missing a defect (false negative) is generally higher than the cost of a false alarm
(false positive). The model’s ability to identify a high proportion of defects aligns well with this priority.
However, it is important to acknowledge the persistent challenge of class imbalance in software defect
prediction, as highlighted by Mahmoud et al. (2024). While the model shows improved handling of this
issue compared to previous studies, there is still room for enhancement, particularly in precision for the
minority class. The performance of the Random Forest model aligns with the findings of Esteves et al. (2020), who identified Random Forest as one of the top-performing algorithms for software defect
prediction. The study emphasized the importance of proper tuning and validation techniques, which were
incorporated into our methodology. While this study demonstrates significant progress in software defect
prediction, it also highlights areas for future work. These include further addressing the class imbalance,
exploring advanced deep learning architectures, and investigating the interpretability of model decisions,
an aspect emphasized by Mahmoud et al. (2024) and Mehmood et al. (2023) as crucial for practical adoption in
software engineering.
5. Conclusion
This paper has made significant strides in advancing the field of software defect prediction through the
application of machine learning and deep learning techniques. The research findings offer several important
contributions and implications for both academic research and practical software development. The
Random Forest algorithm emerged as the top performer, achieving an accuracy of 82.3%, recall of 96.8%,
F1-score of 0.898, and precision of 83.7%. This performance significantly surpasses previous benchmarks,
demonstrating the potential of ensemble methods in software defect prediction. The high recall rate is
particularly noteworthy, as it indicates the model’s strong capability in identifying a large proportion of
defects, which is crucial in software development contexts where missing defects can lead to significant
costs and risks. While traditional machine learning methods showed strong performance, the study also
highlighted the promise of deep learning approaches, particularly Convolutional Neural Networks (CNN).
The CNN model achieved comparable results to traditional methods, suggesting that with further
optimization, deep learning could offer powerful tools for software defect prediction. The use of hybrid
sampling techniques (SMOTE-Tomek) proved effective in addressing the persistent challenge of class
imbalance in software defect datasets. This approach could be valuable for future research dealing with
imbalanced datasets in various domains of software engineering. Also, by evaluating multiple algorithms
on the same dataset, this study provides a comprehensive comparison that can serve as a benchmark for
future research in software defect prediction. The improved accuracy and high recall rates of the models
developed in this study have significant implications for practical software development. These models
could be integrated into development workflows to identify potential defects early, thereby reducing
development costs and improving software quality.
However, the study also revealed areas that require further investigation:
- While the models performed well on the NASA JM1 dataset, future work should focus on validating these approaches across diverse software projects and development environments to ensure generalizability.
- As model complexity increases, so does the need for interpretable results. Future research should explore ways to make the predictions of these models more explainable to developers and project managers.
- Developing models that can adapt and provide predictions in real-time as software projects evolve remains an important area for future work.
- Advanced feature engineering techniques specifically tailored for software metrics could potentially improve model performance further.
- Future studies could explore the prediction of specific types of defects (e.g., security vulnerabilities, performance issues), which could be more valuable in certain development contexts.
References
Adam, Hassan, Muhammad Abatcha, and Abdulfattah Aboaba. 2022. ‘Design of a Hybrid Machine Learning
Base-Classifiers for Software Defect Prediction’. International Journal of Innovative Research and
Development. doi: 10.24940/ijird/2022/v11/i10/OCT22020.
Ali, Misbah, Tehseen Mazhar, Yasir Arif, Shaha Al-Otaibi, Yazeed Yasin Ghadi, Tariq Shahzad, Muhammad
Amir Khan, and Habib Hamam. 2024. ‘Software Defect Prediction Using an Intelligent Ensemble-Based
Model’. IEEE Access.
Ali, Sikandar, Muhammad Adeel, Sumaira Johar, Muhammad Zeeshan, Samad Baseer, and Azeem Irshad. 2021. ‘Classification and Prediction of Software Incidents Using Machine Learning Techniques’. Security and Communication Networks 2021(1):9609823.
Alkaberi, Wahaj, and Fatmah Assiri. 2024. ‘Predicting the Number of Software Faults Using Deep Learning’.
Engineering, Technology & Applied Science Research 14(2):13222–31. doi: 10.48084/etasr.6798.
Alzahrani, Musaad. 2022. ‘Using Machine Learning Techniques to Predict Bugs in Classes: An Empirical Study’.
International Journal of Advanced Computer Science and Applications 13(5).
Bennin, Kwabena E., Amjed Tahir, Stephen G. MacDonell, and Jürgen Börstler. 2022. ‘An Empirical Study on the Effectiveness of Data Resampling Approaches for Cross-Project Software Defect Prediction’. IET Software 16(2):185–99.
Chidamber, Shyam R., and Chris F. Kemerer. 1994. ‘A Metrics Suite for Object-Oriented Design’. IEEE
Transactions on Software Engineering 20(6):476–93.
Chinenye, Obidike, Kene Tochukwu Anyachebelu, and Muhammad Umar Abdullahi. 2023. ‘Software Defect
Prediction System Based on Decision Tree Algorithm’. Asian Journal of Research in Computer Science
16(4):32–48.
D’Ambros, M., M. Lanza, and R. Robbes. 2012. ‘Evaluating Defect Prediction Approaches: A Benchmark and
an Extensive Comparison’. Empirical Software Engineering 17(4–5):531–77.
Elentukh, Alex. 2023. ‘People Make Mistakes–A Survey of Common Causes of Software Defects’. Pp. 117–33
in International Conference on Computer Science and Education in Computer Science.
Elshamy, Nawal, Amal AbouElenen, and Samir Elmougy. 2023. ‘Automatic Detection of Software Defects
Based on Machine Learning’. International Journal of Advanced Computer Science and Applications 14(3).
Esteves, Geanderson, Eduardo Figueiredo, Adriano Veloso, Markos Viggiato, and Nivio Ziviani. 2020.
‘Understanding Machine Learning Software Defect Predictions’. Automated Software Engineering
27(3):369–92.
Goyal, Somya. 2022. ‘Handling Class-Imbalance with KNN (Neighbourhood) under-Sampling for Software
Defect Prediction’. Artificial Intelligence Review 55(3):2023–64.
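Halstead, Maurice H. 1977. Elements of Software Science. New York: Elsevier North-Holland.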
Hasanpour, Ahmad, Pourya Farzi, Ali Tehrani, and Reza Akbari. 2020. ‘Software Defect Prediction Based on
Deep Learning Models: Performance Study’. ArXiv Preprint ArXiv:2004.02589.
Jude, Agboeze, and Jia Uddin. 2024. ‘Explainable Software Defects Classification Using SMOTE and Machine
Learning’. Annals of Emerging Technologies in Computing (AETiC) 8(1).
LeCun, Yann, Yoshua Bengio, and Geoffrey Hinton. 2015. ‘Deep Learning’. Nature 521(7553):436–44.
Li, Jian, Pinjia He, Jieming Zhu, and Michael R. Lyu. 2017. ‘Software Defect Prediction via Convolutional
Neural Network’. Pp. 318–28 in 2017 IEEE international conference on software quality, reliability and
security (QRS).
Mahmoud, Alia Nabil, Ahmed Abdelaziz, Vitor Santos, and Mario M. Freire. 2024. ‘A Proposed Model for
Detecting Defects in Software Projects’. Indonesian Journal of Electrical Engineering and Computer
Science 33(1):290–302.
McCabe, Thomas J. 1976. ‘A Complexity Measure’. IEEE Transactions on Software Engineering SE-2(4):308–
20.
Mehmood, Iqra, Sidra Shahid, Hameed Hussain, Inayat Khan, Shafiq Ahmad, Shahid Rahman, Najeeb Ullah,
and Shamsul Huda. 2023. ‘A Novel Approach to Improve Software Defect Prediction Accuracy Using
Machine Learning’. IEEE Access.
Nevendra, Meetesh, and Pradeep Singh. 2021. ‘Software Defect Prediction Using Deep Learning’. Acta
Polytechnica Hungarica 18(10):173–89.
Olorunshola, Oluwaseyi Ezekiel, Martins E. Irhebhude, Abraham E. Evwiekpaefe, and Francisca Nonyelum
Ogwueleka. 2020. ‘Evaluation of Machine Learning Classification Techniques in Predicting Software
Defects’. Trans. Mach. Learn. Artif. Intel 8:1–15.
Pandey, Sushant Kumar, Ravi Bhushan Mishra, and Anil Kumar Tripathi. 2021. ‘Machine Learning Based
Methods for Software Fault Prediction: A Survey’. Expert Systems with Applications 172:114595.
Sanusi, B. A., S. O. Olabiyisi, A. O. Olowoye, and B. L. Olatunji. 2019. ‘Software Defect Prediction System
Using Machine Learning Based Algorithms’. Journal of Advances in Computational Intelligence Theory
1(3):1–9.
Sayyad Shirabad, J., and T. J. Menzies. 2005. ‘The PROMISE Repository of Software Engineering Databases.’
Shafiq, Muhammad, Fatemah H. Alghamedy, Nasir Jamal, Tahir Kamal, Yousef Ibrahim Daradkeh, and
Mohammad Shabaz. 2023. ‘Retracted: Scientific Programming Using Optimized Machine Learning
Techniques for Software Fault Prediction to Improve Software Quality’. IET Software 17(4):694–704.
Shen, Zhidong, and Si Chen. 2020. ‘A Survey of Automatic Software Vulnerability Detection, Program Repair,
and Defect Prediction Techniques’. Security and Communication Networks 2020(1):8858010.
Thomas, Nikhil Saji, and S. Kaliraj. 2024. ‘An Improved and Optimized Random Forest Based Approach to
Predict the Software Faults’. SN Computer Science 5(5):530.
Vogel-Heuser, Birgit, Alexander Fay, Ina Schaefer, and Matthias Tichy. 2015. ‘Evolution of Software in
Automated Production Systems: Challenges and Research Directions’. Journal of Systems and Software
110:54–84.
Zhao, Guoliang, Stefanos Georgiou, Safwat Hassan, Ying Zou, Derek Truong, and Toby Corbin. 2024.
‘Enhancing Performance Bug Prediction Using Performance Code Metrics’. Pp. 50–62 in Proceedings of
the 21st International Conference on Mining Software Repositories.