Comprehensive Study On Machine Learning
Abstract—Software bugs are defects or faults in computer programs or systems that cause incorrect or unexpected operations. They negatively affect software quality, reliability, and maintenance cost; therefore, many researchers have built and developed models for software bug prediction. To date, relatively few works have applied machine learning techniques to software bug prediction. The aim of this paper is to present a comprehensive study on machine learning techniques that have been used successfully to predict software bugs. The paper also presents a software bug prediction model based on four supervised machine learning algorithms, namely Decision Tree (DT), Naïve Bayes (NB), Random Forest (RF), and Logistic Regression (LR), applied to four datasets. We compared the results of our proposed models with those of other studies. The results of this study demonstrate that our proposed models performed better than other models that used the same datasets. The evaluation process and the results of the study show that machine learning algorithms can be used effectively for bug prediction.

Keywords—Static code analysis; software bug prediction; software metrics; machine learning techniques

I. INTRODUCTION

Due to the increasing size and complexity of software products and inadequate software testing, no system or software can claim to be bug free. There are many activities related to software testing, such as implementing processes, procedures, and standards that must be carried out in a specific sequence to ensure that quality objectives are achieved, and testing a product for issues such as software bugs. Bugs are classified in software testing by severity: a major defect causes an observable product failure or deviation from functional requirements; a minor defect does not cause a failure in execution of the product; a fatal defect causes the application or system to crash or close abruptly. Bugs can also be classified into functional defects, performance defects, usability defects, compatibility defects, security defects, etc.

The use of analytical methods to check and review source code is standard development practice. This process can be accomplished manually or automatically using static code analysis tools, dynamic code analysis tools, etc. Recently, many tools have evolved for static code analysis to provide a truly practical, value-added solution to many of the problems that software development organizations face. However, they produce numerous false positive and false negative results, which make these tools hard to use in practice. Another methodology or approach is therefore needed for static code analysis, such as Machine Learning (ML) algorithms [1], [9], [12].

Software bugs usually appear during the software development process. They are often difficult to detect or identify, and developers spend a large amount of time locating and fixing them; moreover, some bugs cannot be detected at an early phase of development. To relieve the burden of bug fixing, researchers have conducted many extensive studies on bug prediction, and many machine-learning-driven prediction models have been built and tested on various bases. Software bug reporting is an important part of software maintenance, but assigning bug reports can be very expensive in large software development projects, so many studies suggest automating bug assignment using machine learning in open-source software.

Software Bug Prediction (SBP) plays a vital role in improving software product quality. SBP is the process of generating machine learning models (classifiers) that predict software (code) defects based on historical data. The most recent methodologies used to predict software bugs are supervised (classification) machine learning models, and with recent advances in machine learning techniques, new models have emerged with enhanced performance and capabilities for predicting software bugs [2]. Classification is a major task of data analysis using machine learning algorithms that allow the machine to learn associations between instances and decision labels, from which an algorithm builds a model to predict the labels of new instances in a given sample. In machine learning, classification can be categorized into three types: binary (yes or no), multi-class, and multi-label classification [5], [25].

To build a dataset containing useful buggy code element characterization information, we chose the PROMISE Repository, which stores software metrics along with bug information for many projects; these datasets were collected from real software projects by NASA [26]. The objective of this study is to investigate the previous studies that used the most effective machine learning techniques for software bug prediction. In this paper, four supervised machine learning models are identified and applied to four different datasets to evaluate the capabilities of machine learning algorithms in software bug prediction. The paper compares the proposed models using various performance measures, such as accuracy, precision, recall, F1-score, and ROC curves.
726 | P a g e
www.ijacsa.thesai.org
(IJACSA) International Journal of Advanced Computer Science and Applications,
Vol. 12, No. 8, 2021
The structure of this study is organized as follows. Section 2 presents a discussion on software bug prediction by analyzing static code. An overview of machine learning techniques is presented in Section 3, followed by the literature review in Section 4. Section 5 presents our research methodology, and Section 6 presents the software metrics and datasets. An overview of the selected machine learning classifiers and their evaluation is presented in Sections 7 and 8. Section 9 presents the experimental results and discussion, followed by conclusions and future work in Section 10.

II. SOFTWARE BUG PREDICTION BY ANALYZING STATIC CODE

Static code analysis is a method of analyzing software code without executing it, in order to find potential problems such as defects or bugs that might arise at runtime, to check the quality of the source code, and to address weaknesses in the program by evaluating and correcting the source code with respect to factors such as structure, content, and documentation. Many commercial and open-source tools have been developed for static code analysis [3], [24]. These tools remove the unnecessary fuzz from source code and perform automated checks to improve and ensure a certain level of quality. This can be performed very early in the development process, during which the code must pass many formal tests to be considered bug free. There are several ways of analyzing static code, for example by exploiting the natural language found within a program's text or by checking compliance with different coding standards. These types of analysis may be manual, which is usually very time consuming (e.g., code inspections), or automated using one or more tools.

Software Bug Prediction (SBP) is considered a vital activity during software development and maintenance. SBP is a methodology for locating bugs in a software module by considering software metrics as parameters [4]. Numerous studies have confirmed that machine learning techniques are suitable for predicting software bugs and identifying defective code [5], [6], [9]. Bug reports are basic software development tools that describe software bugs, especially in open-source software [7], [30]. To guarantee software quality, many projects use bug reports to gather and record the bugs reported [8]. Bugs are classified into two classes: intrinsic bugs, which were introduced by one or more specific changes to the source code, and extrinsic bugs, which were introduced by changes not recorded in the version control system [5], [18]. Several techniques have been developed over the years to automatically detect bugs in source code; often, these techniques depend on formal methods and program analysis. Many studies in the literature use code features as input for machine learning algorithms to perform bug prediction, and the machine learning algorithms most used to detect software bugs are classification techniques [10].

III. MACHINE LEARNING TECHNIQUES

Machine learning is an area of research in which computer programs learn and get better at performing specific tasks by training on historical data [2]. Machine learning algorithms can be applied to analyze data from different perspectives and allow developers to obtain useful information [10], [38]. High quantities of data are needed to develop machine-learning-based prediction models [11], [31], [33]. Machine learning algorithms build models from training examples, which are then used to make predictions when faced with new examples. Supervised learning is a type of machine learning algorithm that builds a prediction model by training on labeled data to execute the prediction task. The goal of supervised machine learning algorithms is to develop an inference function by learning relationships between the independent variables (inputs) and dependent variables (outputs) of the training datasets [5], [27]. Classification is a method that uses a data mining or machine learning approach to classify data. Classification techniques deal with a software component, called a classifier, that is invoked with inputs (features). Features are extracted from the training data examples as text, numbers, or nominal values. Bug prediction is one application of machine learning that aims to identify critical pieces of source code that potentially contain defects. This process can be used in software projects to gain insight into how and where bugs happen, in order to enhance software quality.

IV. LITERATURE REVIEW

Software bug prediction is one of the most popular research areas in software engineering. The major aim of software bug prediction is to detect bugs in software modules by considering software metrics as input (parameters). The research described in this paper presents a comprehensive study on machine learning techniques for software bug prediction. The following covers the recent literature related to bug prediction; considerable research has been performed on software bug prediction using machine learning techniques. For example, Wang et al. in [1] proposed an approach combining code contexts and neural networks to detect bugs. The results show that the tool can achieve a relative improvement of up to 160% on F-score, and that it can detect 48 true bugs in the list of the top 100 reported bugs. Jonsson et al. in [2] evaluated automated bug assignment techniques based on machine learning classification; their results show prediction accuracies between 50% and 90% when large training sets are used. Chappell et al. in [3] reported on using machine learning techniques for finding bugs in C programs. Hammouri et al. in [5] presented a machine learning model for software bug prediction; the experiment was conducted with three supervised machine learning algorithms, Naïve Bayes, Decision Tree, and Artificial Neural Networks, to predict future software bugs based on historical data. The results show that the use of machine learning algorithms is effective and leads to a high rate of accuracy, and the comparison showed that the Decision Tree (DT) classifier has the best results over the others. Kumar Pandey et al. in [6] compared various Bayesian network classifiers with random forest and examined how useful they are for bug prediction; the experimental results revealed that the Bayesian network is better than random forest. Meenakshi et al. in [7] proposed various ML models for software bug prediction; the experimental results demonstrated that machine learning techniques are efficient and suitable approaches to predict future software bugs, and the comparison again showed that the DT classifier has the best results over the others.
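Returning to the supervised workflow described in Section III, the train-then-predict loop can be illustrated with a deliberately simple one-feature model. This is an illustrative sketch on synthetic data, not code or data from the paper; real studies use the full metric sets of Section VI and far richer models:

```python
import random

# Synthetic (feature, label) pairs: one code metric (lines of code) per module,
# label 1 = defective, 0 = clean.  Illustrative only, not data from the paper.
random.seed(0)
data = [(random.randint(10, 200), 0) for _ in range(40)] + \
       [(random.randint(150, 400), 1) for _ in range(40)]
random.shuffle(data)

train, test = data[:60], data[60:]

def fit_threshold(examples):
    """'Training': pick the LOC threshold that best separates the labeled examples."""
    best_t, best_acc = None, -1.0
    for t in sorted({loc for loc, _ in examples}):
        acc = sum((loc > t) == bool(y) for loc, y in examples) / len(examples)
        if acc > best_acc:
            best_t, best_acc = t, acc
    return best_t

threshold = fit_threshold(train)

# "Prediction": apply the learned rule to unseen modules.
predictions = [int(loc > threshold) for loc, _ in test]
accuracy = sum(p == y for p, (_, y) in zip(predictions, test)) / len(test)
print(f"learned threshold={threshold}, test accuracy={accuracy:.2f}")
```

The point of the sketch is only the structure: a model is fitted on labeled training data, then evaluated on examples it has never seen.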
1) RQ1: Which ML models have been used for software bug prediction?: To answer this research question, this study identified the machine learning models commonly used for software bug prediction in previous studies, as shown in Fig. 2. These models are:

• Decision Tree (DT): a popular learning method used in data mining and machine learning for both regression and classification. It is a hierarchical model, a tree with decision nodes that have more than one branch and leaf nodes that represent the decision. Each node in a decision tree represents a feature of an instance to be classified, and each branch represents a value threshold the node can assume. Instances are categorized beginning at the root node and sorted based on their attribute values [5], [29].

• Naïve Bayes (NB): a simple and efficient supervised probabilistic classifier based on Bayes' theorem with an independence assumption between the features; the Naive Bayes classifier estimates the probabilities of the unobserved node from the observed probabilities [5], [22].

• Artificial Neural Networks (ANNs): machine learning models, or nonlinear classifiers, used to model complex relationships between inputs and outputs for classification purposes. An ANN model contains multiple information-processing units known as neurons, arranged in layers typically named the input layer, hidden layer, and output layer [5]. When implementing a neural network, a set of consistent training values must be available to set up the expected operation of the network, along with a set of validation values to validate the training process [14].

• Random Forest (RF): one of the most widely used models, owing to its simplicity and the fact that it can be used for both classification and regression tasks. It is a flexible, easy-to-use machine learning algorithm that performs well even without hyper-parameter tuning [23].

• Support Vector Machine (SVM): a supervised machine learning model and a comparatively novel learning approach used for binary classification. Its primary role is to discover a hyperplane that completely divides the dimensional data into two categories [15], [32].

• Deep Learning (DL): an artificial intelligence approach that mimics the workings of the human brain. It allows and helps to solve complex problems using data that are very diverse, unstructured, and interconnected [40].

• K-Nearest Neighbor (KNN): a simple supervised classification algorithm in which an object is classified by looking at the K nearest objects and choosing the most frequently occurring class among them [28].

• Logistic Regression (LR): a statistical classification technique based on maximum likelihood estimation, meant for predicting the likelihood that an entity belongs to one class or another [16], [28], [37], [39].

Fig. 2. Number of Studies across ML Techniques based on Classifications.

2) RQ2: How have these models been trained and what languages have been used? To answer this research question: an essential issue of software bug prediction with machine learning techniques is how to train and test the model [17]. A large and representative dataset is the basis for training and testing machine learning models. Accordingly, in the literature review and in our experimental study, different large datasets and different programming languages, such as C, C++, and Java, have been used to train machine learning models.

3) RQ3: Which performance measures are used for software bug prediction? To answer this research question: several measures are used for gauging the performance of different machine learning models. These performance measures are used for comparing and evaluating models developed using various machine learning techniques. The number of studies using each performance measure is depicted in Fig. 3. The most used performance metric is accuracy, closely followed by recall, precision, and F1-score; less common metrics are H-measure, Area Under the Curve (AUC), and the Receiver Operating Characteristics (ROC) curve.
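The four classifiers proposed in this paper (DT, NB, RF, LR) can be instantiated in a few lines with scikit-learn, one common toolkit (the surveyed studies used various tools). This is a minimal sketch on a synthetic dataset standing in for the NASA metric data, not the paper's actual experiment:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in for a metrics dataset: 200 modules, 8 numeric features.
X, y = make_classification(n_samples=200, n_features=8, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=42)

models = {
    "DT": DecisionTreeClassifier(random_state=42),
    "NB": GaussianNB(),
    "RF": RandomForestClassifier(random_state=42),
    "LR": LogisticRegression(max_iter=1000),
}

scores = {}
for name, model in models.items():
    model.fit(X_train, y_train)                 # train on labeled examples
    scores[name] = model.score(X_test, y_test)  # accuracy on held-out examples
    print(f"{name}: accuracy = {scores[name]:.2f}")
```

The same loop extends naturally to the other surveyed models (SVM, KNN, etc.) by adding entries to the `models` dictionary.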
4) RQ4: What conclusions can we draw about the efficiency of machine learning techniques for software bug prediction from the results presented in the selected studies?: To answer this research question, this study evaluates the best machine learning techniques for developing an effective software bug prediction model by assessing the prediction models presented in previous studies. Different machine learning techniques have different characteristics, such as speed, accuracy, interpretability, and simplicity. This study focused on the studies that applied the most-used machine learning algorithms and performance measures. Looking at the results achieved in the literature review and the results achieved in our study, machine learning techniques are well applicable to static code analysis for software bug prediction.

VI. SOFTWARE METRICS (FEATURES) AND DATASETS

Software metrics are a quantitative, standard measure of some property of software that assigns numbers or symbols to attributes of the measured entity. Software metrics can be used to collect information regarding structural properties of a software design, which can be further statistically analyzed, interpreted, and linked to quality. Metrics related to complexity, cohesion, and coupling can be measured during development phases such as design or coding and are also used to assess the quality of software [19], [34], [36]. Software metrics can be classified into static code metrics and process metrics. Static code metrics can be directly extracted from source code, such as Lines of Code (LOC) and the Cyclomatic Complexity Number (CCN). Object-oriented metrics are a subcategory of static code metrics, such as Depth of Inheritance Tree (DIT), Coupling Between Objects (CBO), Number of Children (NOC), and Response for Class (RFC). Process metrics can be extracted from a source code management system based on historic changes to the source code over time. Metrics can also be classified by the development phase of the software life cycle into source code level metrics, detailed design level metrics, or test level metrics. Object-oriented metrics are often used to assess the testability, maintainability, or reusability of source code [20], [35].

A commonly used dataset for the software bug prediction domain is the PROMISE repository. To perform this experiment, the data were obtained from publicly available, published defect prediction datasets that store software metrics along with defect information for several projects; these datasets were collected from real software projects by NASA. These public-domain datasets are used in this experiment because doing so is a benchmarking procedure of defect prediction research [17], [21]. To perform machine learning on the available source code, it is necessary to establish a set of extractable features that contain the needed information. Many studies [4], [6], [7], [14] use software metrics as independent variables to measure the quality of software modules and build software bug prediction models. It is intuitive to think that the bug proneness of a module is correlated with its complexity; therefore, bug prediction studies usually employ product metrics to improve prediction accuracy. The projects used in this study were developed using different programming languages and include heterogeneous code metrics such as Object-Oriented (OO) metrics, Halstead metrics, Lines of Code (LoC), and McCabe complexity. Various defect detection methods (black box probing, automatic formal methods, etc.) and different machine learning models (linear regression, the M5' model tree learner, and the J48 decision tree learner) have been applied to these projects [10]. Table III and Table IV show the information about the datasets and software metrics (features).

TABLE III. DESCRIPTIONS OF DATASETS (PROJECTS) USED IN THIS STUDY

Project | # Modules | % Defects | Language | Description
JM1 | 10885 | 19% | C | Real-time predictive ground system: uses simulations to generate predictions.
PC1 | 1107 | 6.8% | C | Flight software for an earth-orbiting satellite.
KC1 | 2107 | 15.4% | C++ | Storage management for receiving and processing ground data.
KC2 | 523 | 20% | C++ | Software for science data processing.

TABLE IV. DESCRIPTIONS OF SOFTWARE METRICS (FEATURES) USED IN THIS STUDY

Metric | Type | Description
loc | McCabe | Count of the lines of code in the software module.
v(g) | McCabe | McCabe cyclomatic complexity.
ev(g) | McCabe | McCabe essential complexity.
iv(g) | McCabe | McCabe design complexity.
N | Derived Halstead | Total number of operators and operands.
V | Derived Halstead | Volume.
L | Derived Halstead | Program length.
D | Derived Halstead | Difficulty measure.
I | Derived Halstead | Intelligence measure.
E | Derived Halstead | Effort measure.
B | Derived Halstead | Effort estimate.
T | Derived Halstead | Time estimator.
LOCode | Line Count | Number of lines in the software module.
LOComment | Line Count | Number of comments.
LOBlank | Line Count | Number of blank lines.
LOCodeAndComment | Line Count | Number of lines of code and comments.
uniq_op | Basic Halstead | Unique operators.
uniq_opnd | Basic Halstead | Unique operands.
total_op | Basic Halstead | Total operators.
total_opnd | Basic Halstead | Total operands.
branchCount | Branch | Total number of branches.
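Datasets of this kind are typically distributed as CSV files whose columns carry the metric names of Table IV plus a defect label. The following stdlib-only sketch shows one way to split such rows into features (independent variables) and labels (dependent variable); the three data rows are made up for illustration, not real JM1/PC1 measurements:

```python
import csv
import io

# A tiny inline stand-in for a PROMISE-style metrics file.  Column names follow
# Table IV; the values are hypothetical.
raw = """loc,v(g),ev(g),iv(g),uniq_op,uniq_opnd,branchCount,defects
42,3,1,2,10,14,5,false
310,24,9,14,31,55,47,true
17,1,1,1,6,7,1,false
"""

features, labels = [], []
for row in csv.DictReader(io.StringIO(raw)):
    # The defect flag is the dependent variable (label)...
    labels.append(row.pop("defects") == "true")
    # ...and the remaining metric columns are the independent variables.
    features.append({k: float(v) for k, v in row.items()})

print(labels)              # one boolean per module
print(features[1]["loc"])  # metric value of the second module
```

For a real file, `io.StringIO(raw)` would simply be replaced by `open(...)` on the dataset path.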
Performance measures of the proposed models:

Proposed model | Accuracy | Precision | Recall | F1-score
DT | 0.99 | 0.99 | 1.00 | 0.99
NB | 0.80 | 0.81 | 0.97 | 0.89
RF | 0.99 | 0.99 | 1.00 | 0.99
LR | 0.81 | 0.82 | 0.99 | 0.89

Fig. 5. Average of Accuracy Measure of Models across the JM1 and PC1 Dataset.

Proposed model | Accuracy | Precision | Recall | F1-score
DT | 0.99 | 0.99 | 1.00 | 1.00
NB | 0.91 | 0.94 | 0.96 | 0.95
RF | 0.99 | 0.99 | 1.00 | 1.00
LR | 0.93 | 0.94 | 0.99 | 0.96

Proposed model | Accuracy | Precision | Recall | F1-score
DT | 0.99 | 0.99 | 1.00 | 0.99
NB | 0.85 | 0.88 | 0.96 | 0.92
RF | 0.99 | 0.99 | 1.00 | 0.99
LR | 0.85 | 0.87 | 0.96 | 0.92

Fig. 6. Average of Accuracy Measure of Models across the KC1 and KC2 Dataset.
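The measures reported above all derive from a model's confusion matrix. A plain-Python sketch of how they are computed (the counts below are illustrative, not those of the paper's models):

```python
def classification_report(tp, fp, fn, tn):
    """Compute the four measures from true/false positive/negative counts."""
    accuracy = (tp + tn) / (tp + fp + fn + tn)     # fraction predicted correctly
    precision = tp / (tp + fp)                     # correctness of "buggy" calls
    recall = tp / (tp + fn)                        # fraction of real bugs found
    f1 = 2 * precision * recall / (precision + recall)  # harmonic mean
    return accuracy, precision, recall, f1

# Hypothetical counts for a classifier on 200 modules.
acc, prec, rec, f1 = classification_report(tp=90, fp=10, fn=5, tn=95)
print(f"accuracy={acc:.2f} precision={prec:.2f} recall={rec:.2f} F1={f1:.2f}")
```

Note how accuracy alone can mislead on imbalanced defect data (e.g., PC1 at 6.8% defects), which is why precision, recall, and F1 are reported alongside it.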
Fig. 7. Comparison of ROC Curves for Models across the JM1 Dataset.
Fig. 8. Comparison of ROC Curves for Models across the PC1 Dataset.
Fig. 9. Comparison of ROC Curves for Models across the KC1 Dataset.
Fig. 10. Comparison of ROC Curves for Models across the KC2 Dataset.
TABLE X. COMPARING THE RESULTS OF OUR STUDY WITH THE RESULTS OF STUDIES WHICH USE THE SAME DATASETS AND ALGORITHMS ACROSS THE JM1 AND PC1 DATASETS

JM1 dataset:
Performance measure | ML model | First study | Second study | Third study | Our study
Accuracy | DT | - | - | 0.81 | 0.99
Accuracy | NB | - | - | 0.81 | 0.80
Accuracy | RF | - | - | 0.82 | 0.99
F1-score | DT | - | - | 0.90 | 0.99
F1-score | NB | 0.75 | - | 0.89 | 0.89
F1-score | RF | 0.76 | - | 0.90 | 0.99
F1-score | LR | 0.74 | - | - | 0.89

PC1 dataset:
Performance measure | ML model | First study | Second study | Third study | Our study
Accuracy | DT | - | - | 0.93 | 0.99
Accuracy | NB | - | - | 0.88 | 0.91
Accuracy | RF | - | - | 0.93 | 0.99
F1-score | DT | - | - | 0.97 | 1.00
F1-score | NB | 0.89 | - | 0.94 | 0.95
F1-score | RF | 0.91 | - | 0.97 | 1.00
F1-score | LR | 0.91 | - | - | 0.96

TABLE XI. COMPARING THE RESULTS OF OUR STUDY WITH THE RESULTS OF STUDIES WHICH USE THE SAME DATASETS AND ALGORITHMS ACROSS THE KC1 AND KC2 DATASETS

KC1 dataset:
Performance measure | ML model | First study | Second study | Third study | Our study
Accuracy | DT | - | - | 0.84 | 0.99
Accuracy | NB | - | 0.82 | 0.82 | 0.85
Accuracy | RF | - | - | 0.85 | 0.99
Precision | NB | - | 0.80 | - | 0.88
Recall | NB | - | 0.83 | - | 0.96
F1-score | DT | - | - | 0.92 | 0.99
F1-score | NB | 0.82 | 0.81 | 0.90 | 0.92
F1-score | RF | 0.82 | - | 0.92 | 0.99
F1-score | LR | 0.81 | - | - | 0.92

KC2 dataset:
Performance measure | ML model | First study | Second study | Third study | Our study
Accuracy | DT | - | - | 0.82 | 0.98
Accuracy | NB | - | - | 0.84 | 0.83
Accuracy | RF | - | - | 0.82 | 0.98
F1-score | DT | - | - | 0.89 | 0.99
F1-score | NB | 0.80 | - | 0.90 | 0.90
F1-score | RF | 0.76 | - | 0.89 | 0.99
F1-score | LR | 0.79 | - | - | 0.91

X. CONCLUSION

Software bug prediction is a very important field in static code analysis for improving software quality and reliability. It is an approach in which a prediction model is constructed for the purpose of predicting future software defects from historical data using software metrics. Many approaches have been presented using various datasets, various metrics, and various performance measures. The aims of this study were successfully achieved: to evaluate and present a comprehensive study of the machine learning techniques used for software bug prediction in recent years, and to apply the best of these techniques for software bug prediction in this study. To compare and evaluate the performance of the proposed models, we used different performance measures. The results show that ML techniques are gaining interest in software bug prediction as a way to improve the efficiency of bug detection. Four NASA public datasets were chosen for this experiment to analyze the performance of the models, and the experimental results revealed that the DT and RF classifiers perform better than the other classifiers. Static code analysis requires further research on identifying and detecting software bugs, and several machine learning techniques can be used to improve the results. As future work, we plan to introduce other machine learning techniques with data balancing techniques to improve the accuracy of software bug prediction.

ACKNOWLEDGMENT

The authors gratefully acknowledge the financial assistance from the Institute of Information Science, Faculty of Mechanical Engineering and Informatics, University of Miskolc.

REFERENCES
[1] Y. Li, S. Wang, T. N. Nguyen, and S. V. Nguyen, "Improving bug detection via context-based code representation learning and attention-based neural networks", Proceedings of the ACM on Programming Languages, vol. 3, OOPSLA, paper no. 162, pp. 1-30, 2019.
[2] L. Jonsson, M. Borg, D. Broman, K. Sandahl, S. Eldh, and P. Runeson, "Automated bug assignment: Ensemble-based machine learning in large scale industrial contexts", Empirical Software Engineering, vol. 21, pp. 1533-1578, 2016.
[3] T. Chappelly, C. Cifuentes, P. Krishnan, and S. Gevay, "Machine learning for finding bugs: An initial report", in IEEE Workshop on Machine Learning Techniques for Software Quality Evaluation, Klagenfurt, Austria, 21 February 2017, pp. 21-26.
[4] S. K. Pandey, R. B. Mishra, and A. K. Tripathi, "BPDET: An effective software bug prediction model using deep representation and ensemble learning techniques", Expert Systems with Applications, vol. 144, paper no. 113085, 2020.
[5] A. Hammouri, M. Hammad, M. Alnabhan, and F. Alsarayrah, "Software bug prediction using machine learning approach", International Journal of Advanced Computer Science and Applications, vol. 9, no. 2, pp. 78-83, 2018.
[6] S. K. Pandey, R. B. Mishra, and A. K. Triphathi, "Software bug prediction prototype using Bayesian network classifier: A comprehensive model", Procedia Computer Science, vol. 132, pp. 1412-1421, 2018.
[7] S. S. Meenakshi, "Software bug prediction using machine learning approach", International Research Journal of Engineering and Technology, vol. 6, no. 4, pp. 4968-4971, 2019.
[8] I. U. N. Uqaili and S. N. Ahsan, "Machine learning based prediction of complex bugs in source code", The International Arab Journal of Information Technology, vol. 17, no. 1, pp. 26-37, 2020.
[9] Károly, Nehéz, and Khleel Nasraldeen Alnor Adam. "Tools, processes and factors influencing of code review." Multidiszciplináris Tudományok 10.3 (2020): 277-284.
[10] Aleem, Saiqa, Luiz Fernando Capretz, and Faheem Ahmed. "Comparative performance analysis of machine learning techniques for software bug detection." ITCS, CST, JSE, SIP, ARIA, DMS (2015): 71-79.
[11] M. J. Islam, P. Pan, G. Nguyen, and H. Rajan, "A comprehensive study on deep learning bug characteristics", in Proceedings of the 2019 27th ACM Joint Meeting on European Software Engineering Conference and Symposium on the Foundations of Software Engineering, Tallinn, Estonia, 26-30 August 2019, pp. 1-11, 2019.
[12] S. Gitika, S. Sharma, and S. Gujral, "A novel way of assessing software bug severity using dictionary of critical terms", Procedia Computer Science, vol. 70, pp. 632-639, 2015.
[13] P. Maltare and V. Sharma, "Implementation advance technique for prediction bug using machine learning", International Journal of Computer Science and Information Technologies, vol. 8, no. 1, pp. 16-19, 2017.
[14] S. D. Immaculate, M. F. Begam, and M. Floramary, "Software bug prediction using supervised machine learning algorithms", in International Conference on Data Science and Communication, Bangalore, India, 1-2 March 2019, pp. 1-7, 2019.
[15] G. Rodríguez-Pérez, A. Serebrenik, A. Zaidman, D. M. Germán, and J. M. Gonzalez-Barahona, "How bugs are born: a model to identify how bugs are introduced in software components", Empirical Software Engineering, vol. 25, pp. 1294-1340, 2020.
[16] M. Sharma, P. Bedi, K. K. Chaturvedi, and V. B. Singh, "Predicting the priority of a reported bug using machine learning techniques and cross project validation", in 12th International Conference on Intelligent Systems Design and Applications, Kochi, India, 27-29 November 2012, pp. 539-545, 2012.
[17] Shirabad, J. Sayyad, and Tim J. Menzies. "The PROMISE repository of software engineering databases." School of Information Technology and Engineering, University of Ottawa, Canada 24 (2005).
[18] M. Efendioglu, A. Sen, and Y. Koroglu, "Bug prediction of SystemC models using machine learning", IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems, vol. 38, no. 3, pp. 419-429, 2019.
[19] Rajkumar, V., and V. Venkatesh. "Hybrid Approach for Fault Prediction in Object-Oriented Systems." (2017).
[20] Meiliana, Syaeful Karim, et al. "Software Metrics for Fault Prediction Using Machine Learning Approaches." IEEE (2017).
[21] Iqbal, Ahmed, et al. "Performance analysis of machine learning techniques on software defect prediction using NASA datasets." Int. J. Adv. Comput. Sci. Appl. 10.5 (2019): 300-308.
[22] Baarah, Aladdin, et al. "Machine learning approaches for predicting the severity level of software bug reports in closed source projects." Mach Learn (2019).
[23] Kukkar, Ashima, et al. "A novel deep-learning-based bug severity classification technique using convolutional neural networks and random forest with boosting." Sensors 19.13 (2019): 2964.
[24] Moustafa, Sammar, et al. "Software bug prediction using weighted majority voting techniques." Alexandria Engineering Journal 57.4 (2018): 2763-2774.
[25] Öztürk, Elife, Kökten Ulaş Birant, and Derya Birant. "An Ordinal Classification Approach for Software Bug Prediction." Dokuz Eylül Üniversitesi Mühendislik Fakültesi Fen ve Mühendislik Dergisi 21.62 (2019): 533-544.
[26] Ferenc, Rudolf, et al. "An automatically created novel bug dataset and its validation in bug prediction." Journal of Systems and Software 169 (2020): 110691.
[27] Pecorelli, Fabiano, and Dario Di Nucci. "Adaptive selection of classifiers for bug prediction: A large-scale empirical analysis of its performances and a benchmark study." Science of Computer Programming 205 (2021): 102611.
[28] Sharma, Shubham, and Sandeep Kumar. "Analysis of Ensemble Models for Aging Related Bug Prediction in Software Systems." ICSOFT, 2018.
[29] Kumar, Raj. "Multiclass Software Bug Severity Classification using Decision Tree, Naive Bayes and Bagging." Turkish Journal of Computer and Mathematics Education (TURCOMAT) 12.2 (2021): 1859-1865.
[30] Ferenc, Rudolf, et al. "Deep learning in static, metric-based bug prediction." Array 6 (2020): 100021.
[31] Ye, Xin, et al. "Bug Report Classification using LSTM architecture for more accurate software defect locating." 2018 17th IEEE International Conference on Machine Learning and Applications (ICMLA), IEEE, 2018.
[32] Bani-Salameh, Hani, and Mohammed Sallam. "A Deep-Learning-Based Bug Priority Prediction Using RNN-LSTM Neural Networks." e-Informatica Software Engineering Journal 15.1 (2021).
[33] Pascarella, Luca, Fabio Palomba, and Alberto Bacchelli. "Re-evaluating method-level bug prediction." 2018 IEEE 25th International Conference on Software Analysis, Evolution and Reengineering (SANER), IEEE, 2018.
[34] Puranik, Shruthi, Pranav Deshpande, and K. Chandrasekaran. "A novel machine learning approach for bug prediction." Procedia Computer Science 93 (2016): 924-930.
[35] Saharudin, S. N., K. T. Wei, and K. S. Na. "Machine Learning Techniques for Software Bug Prediction: A Systematic Review." Journal of Computer Science 16.11 (2020): 1558-1569.
[36] Gupta, Varuna, N. Ganeshan, and Tarun K. Singhal. "Developing software bug prediction models using various software metrics as the bug indicators." International Journal of Advanced Computer Science and Applications (IJACSA) 6.2 (2015).
[37] Baarah, Aladdin, et al. "Machine learning approaches for predicting the severity level of software bug reports in closed source projects." International Journal of Advanced Computer Science and Applications 10.10 (2019).
[38] Qin, Fangyun, Xiaohui Wan, and Beibei Yin. "An empirical study of factors affecting cross-project aging-related bug prediction with TLAP." Software Quality Journal 28.1 (2020): 107-134.
[39] Qin, Fangyun, et al. "Studying aging-related bug prediction using cross-project models." IEEE Transactions on Reliability 68.3 (2018): 1134-1153.
[40] Gupta, Som, and Sanjai Kumar Gupta. "A Systematic Study of Duplicate Bug Report Detection." International Journal of Advanced Computer Science and Applications (IJACSA) 12.1 (2021).