A Systematic Survey of Just-In-time Software Defect Prediction
This is a supplement that accompanies the survey article of the same title. For convenience,
we use “this survey” or “the survey” to refer to the survey article, and “this supplement” or “the
supplement” to refer to this online supplement.
In the supplement, Section 1 details the methodology of this systematic survey, Section 2 il-
lustrates the number of Just-in-time Software Defect Prediction (JIT-SDP) studies over time,
Section 3 lists the software projects used for evaluating JIT-SDP models in the surveyed JIT-SDP
studies, Section 4 gives additional and detailed information about software metrics (features or
independent variables), Section 5 lists machine learning models that the JIT-SDP studies are based
on, Section 6 examines the availability of replication packages in the JIT-SDP studies, and finally,
Section 7 lists the surveyed JIT-SDP studies and provides a one-sentence summary describing the
primary topic of each study.
1 REVIEW METHODOLOGY
Kitchenham et al. [39, 40] advocate a systematic literature review method in software engineering
aiming at providing scientific value to the research community. According to Kitchenham
et al. [38, 40], a systematic literature review process consists of the stages of planning the review
(including identifying the need for the review, specifying the research questions, and developing
a review protocol), conducting the review, and reporting the review.
In addition to the digital library keyword search, we employ a method called backward (or reverse)
snowballing, where we examine the references of an identified article. Empirical evidence suggests
that the snowball method is effective at locating “high-quality sources in obscure locations” [20].
1.2 Planning
Kitchenham et al. published guidelines in 2007 [38] and refined them in 2013 [37]. Applying the
method of Kitchenham et al., we begin this research with an exploratory phase, an informal search
and examination of the literature on defect prediction. This phase belongs to the planning stage of
Kitchenham et al.’s systematic method.
Software defect prediction (SDP) has been a research subject since the 1970s. Not only has
research in this area evolved and taken different directions, but relevant systematic surveys have
also been developed over time. Following the exploration phase, we proceed to the second phase
of the planning stage, i.e., we carry out a meta-survey whose process we describe in Section 1.2.1
of this supplement. Kitchenham et al. term this type of survey a tertiary survey, a systematic
survey of systematic surveys [38], and argue that conducting a tertiary survey is potentially less
resource intensive than conducting a new systematic review of primary studies to answer wider
research questions [38]. In this meta-survey phase, we investigate existing surveys on SDP. As the
result of this phase, we articulate the need for this literature review on JIT-SDP in Section 1 of
this survey and define the research questions in Section 1.3.1 of this supplement.

Following the planning stage, we turn our focus to a systematic literature review on JIT-SDP
and describe the process in Section 1.3 of this supplement. With this focused survey, we answer
the research questions in Sections 3, 4, and 5 of the survey.
1.2.1 Meta-survey. The goal of the meta-survey is to define the scope of SDP, to learn its
relationship with related areas, and to understand the topics surveyed in prior literature surveys
or reviews on SDP.
Researchers and practitioners have used a range of terms to refer to scenarios in which software
exhibits undesired behavior or output. These terms include “defect,” “fault,” “bug,” “error,” “failure,”
and “exception.” These occur in either a piece of “software” or in a “program.” Based on these terms,
we construct Query 1. The digital libraries in Table 1 vary in their user interfaces, and we adapt
the query to each digital library so as to convey the same semantics.
Survey Article Publication Venues. Kitchenham et al. point out that the quality of tertiary sur-
veys depends on the quantity and quality of systematic reviews [38]. To control the quality of
the meta-survey, we choose only survey papers from the most significant software engineer-
ing journals and conferences by consulting Google Scholar,1 the Computing Research and
Education Association (CORE) rankings,2 and Microsoft Academic.3
1 See https://scholar.google.com/citations?view_op=top_venues&vq=eng_softwaresystems.
2 See http://portal.core.edu.au/jnl-ranks/?search=software&by=all&source=CORE2020&sort=arank&page=1.
3 See https://academic.microsoft.com/.
Conferences:
ACM/IEEE International Conference on Software Engineering (ICSE)
ACM SIGSOFT International Symposium on Foundations of Software Engineering (FSE)
ACM/IEEE International Conference on Automated Software Engineering (ASE)
ACM/IEEE International Conference on Mining Software Repositories (MSR)
ACM/IEEE International Symposium on Empirical Software Engineering and Measurement (ESEM)
IEEE International Conference on Software Maintenance and Evolution (ICSME)
IEEE International Conference on Software Quality, Reliability and Security (QRS)

Journals:
ACM Transactions on Software Engineering and Methodology (TSEM)
IEEE Transactions on Software Engineering (TSE)
IEEE Transactions on Reliability (TR)
(Elsevier) Journal of Systems and Software (JSS)
(Elsevier) Information and Software Technology (IST)
(Elsevier) Applied Soft Computing
(Springer) Empirical Software Engineering (ESE)
(Wiley) Software Testing, Verification & Reliability (STVR)
(Wiley) Software: Practice and Experience

This table is for filtering the digital library search results to identify SDP surveys. It is not for identifying JIT-SDP studies.
No. | Authors | Duration | Articles Surveyed | Survey Topic
SV1 | Li et al. [42] | 2000–2018 | 49 | Unsupervised SDP
SV2 | Li et al. [44] | 2014–2017 | 70 | Comprehensive
SV3 | Hosseini et al. [26] | 2002–2017 | 46 | Cross-project SDP
SV4 | Kamei and Shihab [32] | 1992–2015 | 65 | Comprehensive
SV5 | Malhotra [47] | 1995–2013 | 64 | Within-project and cross-project SDP
SV6 | Radjenović et al. [56] | 1991–2011 | 106 | Software metrics for SDP
SV7 | Hall et al. [22] | 2000–2010 | 208 | Within-project and cross-project SDP
SV8 | Catal [9] (see note 1) | 1990–2009 | 68 | Datasets, metrics, and models
SV9 | Fenton and Neil [16] (see note 2) | 1971–1999 | 55 | Defect, failure, quality, complexity, metrics, and models

Note 1: Catal [9] investigates 90 software defect/fault prediction papers in the survey but cites only 68. We use
68 as the number of papers studied in the survey.
Note 2: Fenton and Neil [16] do not explicitly list the papers surveyed, and we only count the papers relevant to
software defect prediction.
1.3.1 Research Questions.
RQ.1 What is the scope of the SDP research? The literature uses several related terms, such as
software fault prediction and software defect prediction. This begs the questions of how we
define SDP, what the scope of SDP is, and how we differentiate it from related areas.
RQ.2 What is the scope of the JIT-SDP research? JIT-SDP is an area within SDP. To comprehend the
studies in JIT-SDP and present our understanding in a way that complements rather than
repeats prior surveys in SDP, we need to identify the scope of JIT-SDP, differentiate JIT-SDP
from the SDP work investigated in prior surveys, and present our understanding in the
context of SDP, a larger area than JIT-SDP.
RQ.3 What are the input data and the features (or independent variables) in JIT-SDP? A necessary
type of input data for JIT-SDP is software changesets. Are there any other types of data that
can help improve JIT-SDP? What features can we extract from the input data? How do these
features impact JIT-SDP performance? Answers to these questions not only help build JIT-SDP
models but also potentially aid our understanding of the relationship between factors in
the software development life cycle and defect occurrences, which helps produce
explainable and actionable models and insights.
RQ.4 On what target do we make predictions, and what are the dependent variables in JIT-SDP?
JIT-SDP predicts defects in software changes. Are software changes the only target on
which we predict defects? What are we really predicting? In other words, is defect proneness
the only dependent variable? Understanding these questions is important for understanding
the limitations and the potential of existing modeling techniques.
RQ.5 What are the modeling techniques in JIT-SDP? Statistical analysis and machine learning are
important model-building techniques for JIT-SDP. What machine learning techniques are
used in JIT-SDP, and how do they compare, for instance, in terms of predictive performance?
The answers help address several issues. First, what machine learning techniques should we
explore to continue to improve JIT-SDP? Second, which machine learning technique should
we choose as a baseline to compare against when building a new model? Third, if a user
wishes to use JIT-SDP to help QA, which machine learning model should the user choose?
Last, is machine learning the only way to build JIT-SDP models? If not, how do the alternative
approaches perform when compared with machine learning?
RQ.6 What are the evaluation strategies and criteria used in the existing JIT-SDP models? First,
to understand the strengths and limitations of a JIT-SDP model, we need to know how
it is evaluated. Second, to assess and compare existing models, we need to understand
the evaluation criteria and strategies.
RQ.7 How do JIT-SDP models perform with respect to the evaluation criteria? Which JIT-SDP
model performs the best? The answer helps researchers develop new models and compare
them with existing ones, and it helps users select existing models to build applications of JIT-SDP.
RQ.8 How do JIT-SDP studies address the reproducibility (or replication) problem? Reproducibility
is an important problem that has garnered increased scrutiny from the research community
and the public in empirical research. JIT-SDP is empirical research, so reproducibility is an
important concern. What practices do the prior JIT-SDP studies follow to facilitate replication,
so that one can examine whether a JIT-SDP study is reproducible?
We focus our survey on JIT-SDP. The answer to RQ.1 is thus out of the scope of this survey. The
answer to RQ.2 is in Section 3.1 of the survey, where we define Release SDP and JIT-SDP. Sections 3.2
and 3.3 of the survey answer RQ.3. In Sections 3.6.1 and 3.6.2 of the survey, we divide JIT-SDP
models into two categories based on dependent variables, defect prediction and effort-aware
prediction, which answers RQ.4. Section 3.6 of the survey documents modeling techniques and thus
answers RQ.5. For RQ.6, we report JIT-SDP evaluation strategies in Section 3.7 of the survey. Through
a synthesis of the prior JIT-SDP studies, we provide an answer to RQ.7. In Section 6 of this online
supplement, we collect and discuss replication packages and data, which answers RQ.8.
1.3.2 Digital Library Keyword Search Query for JIT-SDP. Kamei et al. coined the term “Just-in-
time” Quality Assurance in their 2012 article [33]. JIT-SDP is change-level SDP, i.e., it predicts the
existence of defects in software changes. Mockus and Weiss appear to be the first to examine
change-level defect prediction, in 2000 [49]. Using Query 2, we search the digital libraries
in Table 1.
1.3.3 Literature Selection via 2-Pass Review. We combine all of the search results from the digital
libraries, remove duplicates, and divide the set of articles among the authors of this survey to
evaluate whether to include or discard each article. The division ensures that each article is assigned
to two of the three authors and goes through two reviews by the two assigned authors
(thus, the 2-pass review). Each author follows this process. First, we remove any article
whose title clearly indicates that it is not relevant. Second, for the remaining articles, we decide
whether to include them by reading the abstract. Finally, we convene a meeting and resolve
any differences via discussion.
1.3.4 Exclusion and Inclusion Criteria. We include only articles written in English that study
predictive modeling for JIT-SDP, where the prediction is at the level or sub-level of software changes.
For instance, we exclude Amasaki et al. [2] because they make predictions at the level of software
components, albeit claiming that they study JIT-SDP. We also exclude non-peer-reviewed articles,
posters, and abstract-only articles.
1.3.5 Results of Literature Search for JIT-SDP. We search for JIT-SDP studies published from 2000
onward; we completed the literature search in November 2021. Table 4 summarizes the literature
search process and its results. The digital library keyword search yields 881 entries. After we remove
duplicates and complete the two-pass review, we identify 55 JIT-SDP articles. We then begin the
snowballing process on these 55 JIT-SDP articles. As shown in Table 4, these 55 JIT-SDP papers list
in total 2,563 entries in their reference sections. After removing duplicates and another two-pass
review, we find 12 additional JIT-SDP articles. Table 15 in Section 7 lists these 67 articles and provides
a one-sentence summary describing the primary topic of each study.
2 PUBLICATIONS TREND
Figure 1 plots the number of selected JIT-SDP papers versus publication year.4 It shows that there
has been an elevated interest in JIT-SDP in recent years.
3 EVALUATION DATA
Table 5 summarizes the software projects used for evaluating JIT-SDP models. Most studies use
open source projects. As shown in Table 5, 11 of the 67 surveyed papers (listed in Table 15) include
proprietary/commercial projects.
4 The publication year is from the online publication date if available. The online publication date may be different from
the bibliographic or the final publication date.
Table 5. Software Projects Used for Evaluating JIT-SDP Models

Total (Proprietary) | Projects | Study
11 (5) | Bugzilla, Eclipse JDT, Eclipse Platform, Mozilla, Columba, PostgreSQL; 5 commercial projects | Kamei et al. [33]
6 (0) | Bugzilla, Columba, Eclipse JDT, Eclipse Platform, Mozilla, PostgreSQL | Yang et al. [73], Yang et al. [77], Fu and Menzies [17], Huang et al. [27], Liu et al. [46], Yang et al. [72], Young et al. [78], Albahli [1], Qiao and Wang [54], Yang et al. [76], Yang et al. [74], Zhang et al. [81], Huang et al. [28], Chen et al. [11], Li et al. [43], Yang et al. [75], Zhu et al. [85]
5 (0) | Bugzilla, Columba, Eclipse Platform, Mozilla, PostgreSQL | Bennin et al. [6]
4 (0) | Eclipse Platform, Eclipse JDT, Mozilla, PostgreSQL | Jahanshahi et al. [29]
4 (0) | Bugzilla, Eclipse Platform, Eclipse JDT, Mozilla | Tessema et al. [64]
11 (0) | Bugzilla, Columba, Eclipse JDT, Eclipse Platform, Mozilla, PostgreSQL; Gimp, Maven-2, Perl, Ruby on Rails, Rhino | Kamei et al. [31], Fukushima et al. [18]
2 (0) | QT, OpenStack | McIntosh and Kamei [48], Hoang et al. [24], Rodriguez-Perez et al. [58], Hoang et al. [25], Gesi et al. [19], Pornprasit et al. [53], Zeng et al. [80]
15 (0) | Android Firewall, Alfresco, Android Sync, Android Wallpaper, AnySoft Keyboard, Apg, Applozic Android SDK, Chat Secure Android, Delta Chat, Android Universal Image Loader, Kiwix, Observable Scroll View, Own Cloud Android, Page Turner, Notify Reddit | Catolino et al. [10], Zhao et al. [83], Zhao et al. [82], Zhao et al. [84]
(Continued)
4 SOFTWARE METRICS
Kondo et al. argue that the context lines of a software change, i.e., the lines of code surrounding
the changed lines, have an impact on the defect proneness of the software change [41]. They
propose and evaluate a suite of metrics called context metrics, i.e., metrics computed from
the context lines [41]. Additionally, they adapt the indentation metric from Hindle et al. [23]
and propose two change complexity metrics [41]. Table 6 lists the metrics by Kamei et al., Liu et al.,
and Kondo et al. [33, 41, 46], organized according to different categories.
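To make the notion of context lines concrete, the sketch below counts the context lines of a change given in unified diff format. This is a minimal illustration we added, not Kondo et al.'s implementation; the function name and the diff handling are our own.

```python
# A minimal sketch: count the context lines (unchanged lines inside
# hunks) of a change in unified diff format. Illustrative only; not
# Kondo et al.'s implementation.

def count_context_lines(diff_text: str) -> int:
    count = 0
    in_hunk = False
    for line in diff_text.splitlines():
        if line.startswith("@@"):                 # hunk header starts a hunk
            in_hunk = True
        elif line.startswith(("+++ ", "--- ")):   # file headers end a hunk
            in_hunk = False
        elif in_hunk and line.startswith(" "):    # context (unchanged) line
            count += 1
    return count

diff = """@@ -1,4 +1,4 @@
 def add(a, b):
-    return a+b
+    return a + b
 # end of module
"""
print(count_context_lines(diff))  # prints 2
```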
Pascarella et al. investigate defect prediction at a finer granularity than software changes, i.e., they
predict whether a specific file in a software change is defect prone [52]. They adapt the process
metrics of Rahman and Devanbu [57] and evaluate a suite of file-level process metrics for
changesets [52]. Table 7 summarizes the file-level software change metrics; each metric is computed
on a file in the changeset.
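As a minimal illustration of computing a metric on each file of a changeset, the sketch below derives lines added and deleted per file from a unified diff; it is our own simplification, not the metric suite of Pascarella et al. [52].

```python
# A minimal sketch: lines added/deleted per file in one changeset,
# parsed from a unified diff. Our own simplification; Pascarella
# et al. [52] evaluate a much richer suite of file-level metrics.
from collections import defaultdict

def per_file_churn(diff_text: str) -> dict:
    stats = defaultdict(lambda: {"added": 0, "deleted": 0})
    current_file = None
    for line in diff_text.splitlines():
        if line.startswith("+++ b/"):             # target file of the next hunks
            current_file = line[len("+++ b/"):]
        elif current_file and line.startswith("+") and not line.startswith("+++"):
            stats[current_file]["added"] += 1
        elif current_file and line.startswith("-") and not line.startswith("---"):
            stats[current_file]["deleted"] += 1
    return dict(stats)
```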
One treatment of commit messages in the surveyed studies is to tokenize the messages, to build a
dictionary, and finally to form a vector recording the occurrences of the dictionary words in a commit
message. The TF vector of a commit message is likely a sparse vector with most elements being 0.
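For illustration, the sketch below builds such TF vectors with scikit-learn's CountVectorizer; this is an example we added, and the surveyed studies do not necessarily use this library.

```python
# A minimal sketch: term-frequency (TF) vectors for commit messages.
# The dictionary is built from the corpus; each message becomes a
# mostly-zero vector of word counts.
from sklearn.feature_extraction.text import CountVectorizer

commit_messages = [
    "fix null pointer dereference in parser",
    "add unit tests for the parser module",
    "refactor parser and fix typo",
]

vectorizer = CountVectorizer()
tf = vectorizer.fit_transform(commit_messages)   # sparse TF matrix

print(vectorizer.get_feature_names_out())        # the dictionary words
print(tf.toarray())                              # one TF vector per message
```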
Barnett et al. [5] hypothesize that the level of detail in commit messages is useful for JIT-SDP and
confirm the hypothesis via an investigation of more than 300 repositories. For this, they propose two
commit message metrics, commit volume and commit content. The former is the number of words in a
commit message after stop words are removed. The latter is a score computed via a spam filter;
it is in effect a feature representation of the commit message and a surrogate for
the content of the commit message. Table 8 lists these metrics.
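A possible reading of the commit volume metric is sketched below; the tokenizer and the stop-word list are our assumptions, not Barnett et al.'s exact procedure.

```python
# A minimal sketch of the commit volume metric: the number of words
# in a commit message after stop-word removal. The stop-word list
# here is a small assumption for illustration.
import re

STOP_WORDS = {"a", "an", "the", "and", "or", "of", "in", "to", "for", "on"}

def commit_volume(message: str) -> int:
    words = re.findall(r"[a-z']+", message.lower())
    return sum(1 for w in words if w not in STOP_WORDS)

print(commit_volume("Fix a race condition in the cache eviction logic"))  # 6
```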
Table 10. Change Request Metrics

Metric | Description
CR-TTM | Time span between submission of the change request and its resolution (a new change)
CR-NDA | Number of developers involved in the change request
CR-PRIORITY | Priority assigned to the change request
CR-SEVERITY | Severity assigned to the change request
CR-NC | Number of comments about the change request
CR-DD | Depth of discussion, computed as the number of words used during the discussion of the change request
Tessema and Abebe [64] propose six metrics for change requests in an issue tracking system (ITS).
They augment these metrics with Kamei et al.'s [33] software change metrics and show that the
JIT-SDP models with the augmented metrics outperform those with the change metrics alone. Table 10
summarizes the six change request metrics, which are computed from the metadata of the change requests.
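To illustrate how such metrics could be derived, the sketch below computes the six values from a hypothetical issue-tracker record; the field names are our assumptions, and real ITS schemas differ.

```python
# A hypothetical sketch: computing the six change request metrics of
# Table 10 from issue-tracker metadata. The record layout is an
# assumption for illustration; real ITS schemas differ.
from datetime import datetime

change_request = {
    "submitted":  datetime(2021, 3, 1, 9, 0),
    "resolved":   datetime(2021, 3, 4, 17, 30),
    "developers": {"alice", "bob"},
    "priority":   2,
    "severity":   3,
    "comments":   ["Can reproduce on trunk.", "Fixed by guarding the null case."],
}

metrics = {
    "CR-TTM":      (change_request["resolved"] - change_request["submitted"]).total_seconds(),
    "CR-NDA":      len(change_request["developers"]),
    "CR-PRIORITY": change_request["priority"],
    "CR-SEVERITY": change_request["severity"],
    "CR-NC":       len(change_request["comments"]),
    "CR-DD":       sum(len(c.split()) for c in change_request["comments"]),
}
print(metrics)
```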
5 JIT-SDP MODELS
The prior JIT-SDP studies have examined a broad range of machine learning algorithms. As such,
one may argue that JIT-SDP is a microcosm of recent developments in machine learning. It is
important to note that there have also been several investigations of non-machine-learning algorithms
for JIT-SDP. We refer to this type of JIT-SDP model as searching-based models. Searching-based
models like those in Yang et al. [77] and Liu et al. [46] are unsupervised. Several studies extend
these unsupervised searching-based models by adding a supervised component to improve their
predictive performance. Table 12 lists the modeling techniques in the prior JIT-SDP studies. It
shows that Logistic Regression, tree-based models (including Random Forest, C4.5 Decision Tree,
and ADTree), and ensemble models (including Random Forest, XGBoost, and others) are the more
popular modeling techniques, and that the use of neural network-based models (including deep
neural networks) is on the rise.
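As a concrete illustration of the most common setup, the sketch below trains a Logistic Regression classifier on synthetic change-level features in the spirit of Kamei et al.'s change metrics [33]; it is a minimal example we added, not the pipeline of any surveyed study.

```python
# A minimal sketch of change-level defect prediction with Logistic
# Regression on synthetic data. Feature columns loosely mimic common
# change metrics; no surveyed study is reproduced here.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
# columns: lines added, lines deleted, files touched, developer experience
X = rng.poisson(lam=[20, 8, 3, 50], size=(200, 4)).astype(float)
# synthetic label: larger, more scattered changes are defect prone
y = (X[:, 0] + X[:, 1] + 5 * X[:, 2] > 50).astype(int)

model = make_pipeline(StandardScaler(), LogisticRegression())
model.fit(X, y)

new_change = [[35.0, 12.0, 4.0, 10.0]]
print(model.predict_proba(new_change)[0, 1])  # predicted defect proneness
```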
Table 12. Modeling Techniques in the Prior JIT-SDP Studies

Algorithm | Studies
K-Nearest Neighbor | Bennin et al. [6], Kang et al. [34], Tian et al. [65], Aversano et al. [4]
Linear Regression | Kamei et al. [33], Tian et al. [65], Yan et al. [70]
Non-linear Regression | Rodriguez-Perez et al. [58], McIntosh and Kamei [48]
Logistic Regression | Duan et al. [12], Lin et al. [45], Zeng et al. [80], Yang et al. [75], Kang et al. [34], Li et al. [43], Trautsch et al. [67], Yan et al. [69], Yan et al. [70], Catolino et al. [10], Fan et al. [15], Huang et al. [28], Kondo et al. [41], Yang et al. [74], Chen et al. [11], Huang et al. [27], Tourani and Adams [66], Rosen et al. [59], Jiang et al. [30], Tarvo et al. [63], Kamei et al. [33], Aversano et al. [4], Mockus and Weiss [49], Tessema et al. [64]
Naive Bayes | Duan et al. [12], Eken et al. [14], Bennin et al. [6], Kang et al. [34], Tian et al. [65], Catolino et al. [10], Fan et al. [15], Zhu et al. [86], Barnett et al. [5], Jiang et al. [30], Shivaji et al. [60]
Decision Table | Catolino et al. [10]
C4.5 Decision Tree | Zhu et al. [86], Tarvo et al. [63], Aversano et al. [4]
Alternating Decision Tree (ADTree) | Tan et al. [62], Jiang et al. [30]
Random Forest | Fukushima et al. [18], Kamei et al. [31], Yang et al. [71], Nayrolles et al. [51], Zhu et al. [86], Borg et al. [7], Catolino et al. [10], Fan et al. [15], Jahanshahi et al. [29], Kondo et al. [41], Pascarella et al. [52], Yang et al. [76], Bennin et al. [6], Kang et al. [34], Khanan et al. [35], Li et al. [43], Trautsch et al. [67], Tian et al. [65], Duan et al. [12], Lin et al. [45], Pornprasit et al. [53], Quach et al. [55], Tessema et al. [64]
Support Vector Machine | Kang et al. [34], Li et al. [43], Catolino et al. [10], Zhu et al. [86], Shivaji et al. [60], Kim et al. [36], Aversano et al. [4]
Neural Network & Deep Neural Network | Yang et al. [73], Hoang et al. [24], Qiao and Wang [54], Bennin et al. [6], Hoang et al. [25], Kang et al. [34], Tian et al. [65], Zhu et al. [85], Tessema et al. [64], Ardimento et al. [3], Gesi et al. [19], Xu et al. [68], Zeng et al. [80], Zhao et al. [83], Zhao et al. [82]
Deep Forest | Zhao et al. [84]
Ensemble (XGBoost) | Bennin et al. [6], Eken et al. [13], Tessema et al. [64]
Ensemble (others) | Aversano et al. [4], Yang et al. [72], Young et al. [78], Albahli [1], Cabral et al. [8], Catolino et al. [10], Zhang et al. [81], Li et al. [43], Tabassum et al. [61], Tian et al. [65], Tessema et al. [64]
Spam Filter (Text Classifier) | Mori et al. [50]
Searching-based Algorithm | Liu et al. [46], Yang et al. [77]
Supervised Learning + Searching-based Algorithm | Yan et al. [70], Huang et al. [28], Huang et al. [27], Fu and Menzies [17]
6 REPLICATION PACKAGES

Replication Package Name | Original Study | Dependent Study
Kamei-2012 | Kamei et al. [33] | Fukushima et al. [18], Yang et al. [73], Kamei et al. [31], Yang et al. [77], Huang et al. [27], Liu et al. [46], Guo et al. [21], Young et al. [78], Chen et al. [11], Jahanshahi et al. [29], Huang et al. [28], Albahli [1], Bennin et al. [6], Li et al. [43], Yang et al. [75], Tessema et al. [64]
Yang-2016 | Yang et al. [77] | Fu and Menzies [17]
McIntosh-2017 | McIntosh and Kamei [48] | Hoang et al. [24], Hoang et al. [25], Rodriguez-Perez et al. [58]
Catolino-2019 | Catolino et al. [10] | Xu et al. [68], Zhao et al. [83], Zhao et al. [82], Zhao et al. [84]
Hoang-2019, Hoang-2020 | Hoang et al. [25] and Hoang et al. [24] | Gesi et al. [19], Pornprasit et al. [53], Zeng et al. [80]
REFERENCES
[1] Saleh Albahli. 2019. A deep ensemble learning method for effort-aware just-in-time defect prediction. Future Internet
11, 12 (2019), 246.
[2] Sousuke Amasaki, Hirohisa Aman, and Tomoyuki Yokogawa. 2021. A preliminary evaluation of CPDP approaches on
just-in-time software defect prediction. In Proceedings of the 47th Euromicro Conference on Software Engineering and
Advanced Applications (SEAA’21). IEEE, 279–286.
[3] Pasquale Ardimento, Lerina Aversano, Mario Luca Bernardi, Marta Cimitile, and Martina Iammarino. 2021. Just-in-
time software defect prediction using deep temporal convolutional networks. Neural Comput. Appl. (2021), 1–21.
[4] Lerina Aversano, Luigi Cerulo, and Concettina Del Grosso. 2007. Learning from bug-introducing changes to prevent
fault prone code. In Proceedings of the 9th International Workshop on Principles of Software Evolution: In Conjunction
with the 6th ESEC/FSE Joint Meeting. 19–26.
[5] Jacob G. Barnett, Charles K. Gathuru, Luke S. Soldano, and Shane McIntosh. 2016. The relationship between com-
mit message detail and defect proneness in Java projects on GitHub. In Proceedings of the IEEE/ACM 13th Working
Conference on Mining Software Repositories (MSR’16). IEEE, 496–499.
[6] Kwabena E. Bennin, Nauman bin Ali, Jürgen Börstler, and Xiao Yu. 2020. Revisiting the impact of concept drift on
just-in-time quality assurance. In Proceedings of the IEEE 20th International Conference on Software Quality, Reliability
and Security (QRS’20). IEEE, 53–59.
[7] Markus Borg, Oscar Svensson, Kristian Berg, and Daniel Hansson. 2019. SZZ unleashed: An open implementation
of the SZZ algorithm-featuring example usage in a study of just-in-time bug prediction for the Jenkins project. In
Proceedings of the 3rd ACM SIGSOFT International Workshop on Machine Learning Techniques for Software Quality
Evaluation. 7–12.
[8] George G. Cabral, Leandro L. Minku, Emad Shihab, and Suhaib Mujahid. 2019. Class imbalance evolution and verifi-
cation latency in just-in-time software defect prediction. In Proceedings of the IEEE/ACM 41st International Conference
on Software Engineering (ICSE’19). IEEE, 666–676.
[9] Cagatay Catal. 2011. Software fault prediction: A literature review and current trends. Expert Syst. Appl. 38, 4 (2011),
4626–4636.
[10] Gemma Catolino, Dario Di Nucci, and Filomena Ferrucci. 2019. Cross-project just-in-time bug prediction for mobile
apps: An empirical assessment. In Proceedings of the IEEE/ACM 6th International Conference on Mobile Software Engi-
neering and Systems (MOBILESoft’19). IEEE, 99–110.
[11] Xiang Chen, Yingquan Zhao, Qiuping Wang, and Zhidan Yuan. 2018. MULTI: Multi-objective effort-aware just-in-time
software defect prediction. Info. Softw. Technol. 93 (2018), 1–13.
[12] Ruifeng Duan, Haitao Xu, Yuanrui Fan, and Meng Yan. 2022. The impact of duplicate changes on just-in-time defect
prediction. IEEE Trans. Reliabil. 71, 3 (2022), 1294–1308. DOI:10.1109/TR.2021.3061618
[13] Beyza Eken, Rifat Atar, Sahra Sertalp, and Ayşe Tosun. 2019. Predicting defects with latent and semantic features
from commit logs in an industrial setting. In Proceedings of the 34th IEEE/ACM International Conference on Automated
Software Engineering Workshop (ASEW’19). IEEE, 98–105.
[14] Beyza Eken, Selda Tufan, Alper Tunaboylu, Tevfik Guler, Rifat Atar, and Ayse Tosun. 2021. Deployment of a change-
level software defect prediction solution into an industrial setting. J. Softw.: Evol. Process 33, 11 (2021), e2381.
[15] Yuanrui Fan, Xin Xia, Daniel Alencar da Costa, David Lo, Ahmed E. Hassan, and Shanping Li. 2019. The impact of
changes mislabeled by SZZ on just-in-time defect prediction. IEEE Trans. Softw. Eng. (2019).
[16] Norman E. Fenton and Martin Neil. 1999. A critique of software defect prediction models. IEEE Trans. Softw. Eng. 25,
5 (1999), 675–689.
[17] Wei Fu and Tim Menzies. 2017. Revisiting unsupervised learning for defect prediction. In Proceedings of the 11th Joint
Meeting on Foundations of Software Engineering. 72–83.
[18] Takafumi Fukushima, Yasutaka Kamei, Shane McIntosh, Kazuhiro Yamashita, and Naoyasu Ubayashi. 2014. An empir-
ical study of just-in-time defect prediction using cross-project models. In Proceedings of the 11th Working Conference
on Mining Software Repositories. 172–181.
[19] Jiri Gesi, Jiawei Li, and Iftekhar Ahmed. 2021. An empirical examination of the impact of bias on just-in-time de-
fect prediction. In Proceedings of the 15th ACM/IEEE International Symposium on Empirical Software Engineering and
Measurement (ESEM’21). 1–12.
[20] Trisha Greenhalgh and Richard Peacock. 2005. Effectiveness and efficiency of search methods in systematic reviews
of complex evidence: Audit of primary sources. BMJ 331, 7524 (2005), 1064–1065.
[21] Yuchen Guo, Martin Shepperd, and Ning Li. 2018. Bridging effort-aware prediction and strong classification: A just-
in-time software defect prediction study. In Proceedings of the 40th International Conference on Software Engineering:
Companion Proceedings. 325–326.
[22] Tracy Hall, Sarah Beecham, David Bowes, David Gray, and Steve Counsell. 2011. A systematic literature review on
fault prediction performance in software engineering. IEEE Trans. Softw. Eng. 38, 6 (2011), 1276–1304.
[23] Abram Hindle, Michael W. Godfrey, and Richard C. Holt. 2008. Reading beside the lines: Indentation as a proxy for
complexity metric. In Proceedings of the 16th IEEE International Conference on Program Comprehension. IEEE, 133–142.
[24] Thong Hoang, Hoa Khanh Dam, Yasutaka Kamei, David Lo, and Naoyasu Ubayashi. 2019. DeepJIT: An end-to-end
deep learning framework for just-in-time defect prediction. In Proceedings of the IEEE/ACM 16th International Confer-
ence on Mining Software Repositories (MSR’19). IEEE, 34–45.
[25] Thong Hoang, Hong Jin Kang, David Lo, and Julia Lawall. 2020. CC2Vec: Distributed representations of code changes.
In Proceedings of the ACM/IEEE 42nd International Conference on Software Engineering. 518–529.
[26] Seyedrebvar Hosseini, Burak Turhan, and Dimuthu Gunarathna. 2017. A systematic literature review and meta-
analysis on cross project defect prediction. IEEE Trans. Softw. Eng. 45, 2 (2017), 111–147.
[27] Qiao Huang, Xin Xia, and David Lo. 2017. Supervised vs unsupervised models: A holistic look at effort-aware just-
in-time defect prediction. In Proceedings of the IEEE International Conference on Software Maintenance and Evolution
(ICSME’17). IEEE, 159–170.
[28] Qiao Huang, Xin Xia, and David Lo. 2019. Revisiting supervised and unsupervised models for effort-aware just-in-time
defect prediction. Empir. Softw. Eng. 24, 5 (2019), 2823–2862.
[29] Hadi Jahanshahi, Dhanya Jothimani, Ayşe Başar, and Mucahit Cevik. 2019. Does chronology matter in JIT defect
prediction? A partial replication study. In Proceedings of the 15th International Conference on Predictive Models and
Data Analytics in Software Engineering. 90–99.
[30] Tian Jiang, Lin Tan, and Sunghun Kim. 2013. Personalized defect prediction. In Proceedings of the 28th IEEE/ACM
International Conference on Automated Software Engineering (ASE’13). IEEE, 279–289.
[31] Yasutaka Kamei, Takafumi Fukushima, Shane McIntosh, Kazuhiro Yamashita, Naoyasu Ubayashi, and Ahmed E.
Hassan. 2016. Studying just-in-time defect prediction using cross-project models. Empir. Softw. Eng. 21, 5 (2016),
2072–2106.
[32] Yasutaka Kamei and Emad Shihab. 2016. Defect prediction: Accomplishments and future challenges. In Proceedings
of the IEEE 23rd International Conference on Software Analysis, Evolution, and Reengineering (SANER’16), Vol. 5. IEEE,
33–45.
[33] Yasutaka Kamei, Emad Shihab, Bram Adams, Ahmed E. Hassan, Audris Mockus, Anand Sinha, and Naoyasu Ubayashi.
2012. A large-scale empirical study of just-in-time quality assurance. IEEE Trans. Softw. Eng. 39, 6 (2012), 757–773.
[34] Jonggu Kang, Duksan Ryu, and Jongmoon Baik. 2021. Predicting just-in-time software defects to reduce post-release
quality costs in the maritime industry. Softw.: Pract. Exper. 51, 4 (2021), 748–771. https://onlinelibrary.wiley.com/doi/
abs/10.1002/spe.2927.
[35] Chaiyakarn Khanan, Worawit Luewichana, Krissakorn Pruktharathikoon, Jirayus Jiarpakdee, Chakkrit Tantithamtha-
vorn, Morakot Choetkiertikul, Chaiyong Ragkhitwetsagul, and Thanwadee Sunetnanta. 2020. JITBot: An explainable
just-in-time defect prediction bot. In Proceedings of the 35th IEEE/ACM International Conference on Automated Software
Engineering (ASE’20). IEEE, 1336–1339.
[36] Sunghun Kim, E. James Whitehead, and Yi Zhang. 2008. Classifying software changes: Clean or buggy? IEEE Trans.
Softw. Eng. 34, 2 (2008), 181–196.
[37] Barbara Kitchenham and Pearl Brereton. 2013. A systematic review of systematic review process research in software
engineering. Info. Softw. Technol. 55, 12 (2013), 2049–2075.
[38] Barbara Kitchenham and Stuart Charters. 2007. Guidelines for Performing Systematic Literature Reviews in Software
Engineering. Technical Report EBSE-2007-01. School of Computer Science and Mathematics, Keele University.
[39] Barbara Kitchenham, O. Pearl Brereton, David Budgen, Mark Turner, John Bailey, and Stephen Linkman. 2009. Sys-
tematic literature reviews in software engineering—A systematic literature review. Info. Softw. Technol. 51, 1 (2009),
7–15. https://doi.org/10.1016/j.infsof.2008.09.009
[40] Barbara Ann Kitchenham, David Budgen, and Pearl Brereton. 2015. Evidence-based Software Engineering and System-
atic Reviews. Vol. 4. CRC Press.
[41] Masanari Kondo, Daniel M. German, Osamu Mizuno, and Eun-Hye Choi. 2020. The impact of context metrics on
just-in-time defect prediction. Empir. Softw. Eng. 25, 1 (2020), 890–939.
[42] Ning Li, Martin Shepperd, and Yuchen Guo. 2020. A systematic review of unsupervised learning techniques for soft-
ware defect prediction. Info. Softw. Technol. (2020), 106287.
[43] Weiwei Li, Wenzhou Zhang, Xiuyi Jia, and Zhiqiu Huang. 2020. Effort-aware semi-supervised just-in-time defect
prediction. Info. Softw. Technol. 126 (2020), 106364.
[44] Zhiqiang Li, Xiao-Yuan Jing, and Xiaoke Zhu. 2018. Progress on approaches to software defect prediction. IET Softw.
12, 3 (2018), 161–175.
[45] Dayi Lin, Chakkrit Tantithamthavorn, and Ahmed E. Hassan. 2022. The impact of data merging on the
interpretation of cross-project just-in-time defect models. IEEE Trans. Softw. Eng. 48, 8 (2022), 2969–2986.
DOI:10.1109/TSE.2021.3073920
[46] Jinping Liu, Yuming Zhou, Yibiao Yang, Hongmin Lu, and Baowen Xu. 2017. Code churn: A neglected metric in effort-
aware just-in-time defect prediction. In Proceedings of the ACM/IEEE International Symposium on Empirical Software
Engineering and Measurement (ESEM’17). IEEE, 11–19.
[47] Ruchika Malhotra. 2015. A systematic review of machine learning techniques for software fault prediction. Appl. Soft
Comput. 27 (2015), 504–518.
[48] Shane McIntosh and Yasutaka Kamei. 2017. Are fix-inducing changes a moving target? A longitudinal case study of
just-in-time defect prediction. IEEE Trans. Softw. Eng. 44, 5 (2017), 412–428.
[49] Audris Mockus and David M. Weiss. 2000. Predicting risk of software changes. Bell Labs Tech. J. 5, 2 (2000), 169–180.
[50] Keita Mori and Osamu Mizuno. 2015. An implementation of just-in-time fault-prone prediction technique using text
classifier. In Proceedings of the IEEE 39th Annual Computer Software and Applications Conference, Vol. 3. IEEE, 609–612.
[51] Mathieu Nayrolles and Abdelwahab Hamou-Lhadj. 2018. CLEVER: Combining code metrics with clone detection
for just-in-time fault prevention and resolution in large industrial projects. In Proceedings of the 15th International
Conference on Mining Software Repositories. 153–164.
[52] Luca Pascarella, Fabio Palomba, and Alberto Bacchelli. 2019. Fine-grained just-in-time defect prediction. J. Syst. Softw.
150 (2019), 22–36.
[53] Chanathip Pornprasit and Chakkrit Tantithamthavorn. 2021. JITLine: A simpler, better, faster, finer-grained just-in-
time defect prediction. arXiv:2103.07068. Retrieved from https://arxiv.org/abs/2103.07068.
[54] Lei Qiao and Yan Wang. 2019. Effort-aware and just-in-time defect prediction with neural network. PloS One 14, 2
(2019), e0211359.
[55] Sophia Quach, Maxime Lamothe, Bram Adams, Yasutaka Kamei, and Weiyi Shang. 2021. Evaluating the impact of
falsely detected performance bug-inducing changes in JIT models. Empir. Softw. Eng. 26, 5 (2021), 1–32.
[56] Danijel Radjenović, Marjan Heričko, Richard Torkar, and Aleš Živkovič. 2013. Software fault prediction metrics: A
systematic literature review. Info. Softw. Technol. 55, 8 (2013), 1397–1418.
[57] Foyzur Rahman and Premkumar Devanbu. 2013. How, and why, process metrics are better. In Proceedings of the 35th
International Conference on Software Engineering (ICSE’13). IEEE, 432–441.
[58] Gema Rodriguez-Perez, Meiyappan Nagappan, and Gregorio Robles. 2022. Watch out for extrinsic bugs! A case study
of their impact in just-in-time bug prediction models on the OpenStack project. IEEE Trans. Softw. Eng. 48, 4 (2022),
1400–1416. DOI:10.1109/TSE.2020.3021380
[59] Christoffer Rosen, Ben Grawi, and Emad Shihab. 2015. Commit Guru: Analytics and risk prediction of software com-
mits. In Proceedings of the 10th Joint Meeting on Foundations of Software Engineering. 966–969.
[60] Shivkumar Shivaji, E. James Whitehead, Ram Akella, and Sunghun Kim. 2012. Reducing features to improve code
change-based bug prediction. IEEE Trans. Softw. Eng. 39, 4 (2012), 552–569.
[61] Sadia Tabassum, Leandro L. Minku, Danyi Feng, George G. Cabral, and Liyan Song. 2020. An investigation of cross-
project learning in online just-in-time software defect prediction. In Proceedings of the IEEE/ACM 42nd International
Conference on Software Engineering (ICSE’20). IEEE, 554–565.
[62] Ming Tan, Lin Tan, Sashank Dara, and Caleb Mayeux. 2015. Online defect prediction for imbalanced data. In Proceed-
ings of the IEEE/ACM 37th IEEE International Conference on Software Engineering, Vol. 2. IEEE, 99–108.
[63] Alexander Tarvo, Nachiappan Nagappan, Thomas Zimmermann, Thirumalesh Bhat, and Jacek Czerwonka. 2013. Pre-
dicting risk of pre-release code changes with checkinmentor. In Proceedings of the IEEE 24th International Symposium
on Software Reliability Engineering (ISSRE’13). IEEE, 128–137.
[64] Hailemelekot Demtse Tessema and Surafel Lemma Abebe. 2021. Enhancing just-in-time defect prediction using
change request-based metrics. In Proceedings of the IEEE International Conference on Software Analysis, Evolution and
Reengineering (SANER’21). IEEE, 511–515.
[65] Yuli Tian, Ning Li, Jeff Tian, and Wei Zheng. 2020. How well just-in-time defect prediction techniques enhance soft-
ware reliability? In Proceedings of the IEEE 20th International Conference on Software Quality, Reliability and Security
(QRS’20). IEEE, 212–221.
[66] Parastou Tourani and Bram Adams. 2016. The impact of human discussions on just-in-time quality assurance: An
empirical study on OpenStack and Eclipse. In Proceedings of the IEEE 23rd International Conference on Software Analysis,
Evolution, and Reengineering (SANER’16), Vol. 1. IEEE, 189–200.
[67] Alexander Trautsch, Steffen Herbold, and Jens Grabowski. 2020. Static source code metrics and static analysis warn-
ings for fine-grained just-in-time defect prediction. In Proceedings of the IEEE International Conference on Software
Maintenance and Evolution (ICSME’20). IEEE, 127–138.
[68] Zhou Xu, Kunsong Zhao, Tao Zhang, Chunlei Fu, Meng Yan, Zhiwen Xie, Xiaohong Zhang, and Gemma Catolino.
2022. Effort-aware just-in-time bug prediction for mobile apps via cross-triplet deep feature embedding. IEEE Trans.
Reliabil. 71, 1 (2022), 204–220. DOI:10.1109/TR.2021.3066170
[69] Meng Yan, Xin Xia, Yuanrui Fan, Ahmed E. Hassan, David Lo, and Shanping Li. 2020. Just-in-time defect identification
and localization: A two-phase framework. IEEE Trans. Softw. Eng. (2020).
[70] Meng Yan, Xin Xia, Yuanrui Fan, David Lo, Ahmed E. Hassan, and Xindong Zhang. 2020. Effort-aware just-in-time
defect identification in practice: A case study at Alibaba. In Proceedings of the 28th ACM Joint Meeting on European
Software Engineering Conference and Symposium on the Foundations of Software Engineering. 1308–1319.
[71] Limin Yang, Xiangxue Li, and Yu Yu. 2017. Vuldigger: A just-in-time and cost-aware tool for digging vulnerability-
contributing changes. In Proceedings of the IEEE Global Communications Conference (GLOBECOM’17). IEEE, 1–7.
[72] Xinli Yang, David Lo, Xin Xia, and Jianling Sun. 2017. TLEL: A two-layer ensemble learning approach for just-in-time
defect prediction. Info. Softw. Technol. 87 (2017), 206–220.
[73] Xinli Yang, David Lo, Xin Xia, Yun Zhang, and Jianling Sun. 2015. Deep learning for just-in-time defect prediction. In
Proceedings of the IEEE International Conference on Software Quality, Reliability and Security. IEEE, 17–26.
[74] Xingguang Yang, Huiqun Yu, Guisheng Fan, Kai Shi, and Liqiong Chen. 2019. Local versus global models for just-in-
time software defect prediction. Sci. Program. 2019 (2019).
[75] Xingguang Yang, Huiqun Yu, Guisheng Fan, and Kang Yang. 2020. A differential evolution-based approach for effort-
aware just-in-time software defect prediction. In Proceedings of the 1st ACM SIGSOFT International Workshop on Rep-
resentation Learning for Software Engineering and Program Languages. 13–16.
[76] Xingguang Yang, Huiqun Yu, Guisheng Fan, Kang Yang, and Kai Shi. 2019. An empirical study on progressive sampling
for just-in-time software defect prediction. In Proceedings of the International Workshop on Quantitative Approaches to
Software Quality in conjunction with the Asia-Pacific Software Engineering Conference (QuASoQ@APSEC’19). 12–18.
[77] Yibiao Yang, Yuming Zhou, Jinping Liu, Yangyang Zhao, Hongmin Lu, Lei Xu, Baowen Xu, and Hareton Leung. 2016.
Effort-aware just-in-time defect prediction: Simple unsupervised models could be better than supervised models. In
Proceedings of the 24th ACM SIGSOFT International Symposium on Foundations of Software Engineering. 157–168.
[78] Steven Young, Tamer Abdou, and Ayse Bener. 2018. A replication study: Just-in-time defect prediction with ensemble
learning. In Proceedings of the 6th International Workshop on Realizing Artificial Intelligence Synergies in Software
Engineering. 42–47.
[79] Abubakar Zakari, Sai Peck Lee, Rui Abreu, Babiker Hussien Ahmed, and Rasheed Abubakar Rasheed. 2020. Multiple
fault localization of software programs: A systematic literature review. Info. Softw. Technol. 124, 106312 (2020), 1–20.
https://www.sciencedirect.com/science/article/pii/S0950584920300641.
[80] Zhengran Zeng, Yuqun Zhang, Haotian Zhang, and Lingming Zhang. 2021. Deep just-in-time defect prediction:
How far are we? In Proceedings of the 30th ACM SIGSOFT International Symposium on Software Testing and Analysis.
427–438.
[81] Wenzhou Zhang, Weiwei Li, and Xiuyi Jia. 2019. Effort-aware tri-training for semi-supervised just-in-time defect
prediction. In Proceedings of the Pacific-Asia Conference on Knowledge Discovery and Data Mining. Springer, 293–304.
[82] Kunsong Zhao, Zhou Xu, Meng Yan, Yutian Tang, Ming Fan, and Gemma Catolino. 2021. Just-in-time defect prediction
for Android apps via imbalanced deep learning model. In Proceedings of the 36th Annual ACM Symposium on Applied
Computing. 1447–1454.
[83] Kunsong Zhao, Zhou Xu, Meng Yan, Lei Xue, Wei Li, and Gemma Catolino. 2021. A compositional model for effort-
aware just-in-time defect prediction on android apps. IET Softw. (2021).
[84] Kunsong Zhao, Zhou Xu, Tao Zhang, Yutian Tang, and Meng Yan. 2021. Simplified deep forest model-based just-in-time defect prediction
for Android mobile apps. IEEE Trans. Reliabil. (2021), 1–12. https://doi.org/10.1109/TR.2021.3060937
[85] Kun Zhu, Nana Zhang, Shi Ying, and Dandan Zhu. 2020. Within-project and cross-project just-in-time defect predic-
tion based on denoising autoencoder and convolutional neural network. IET Softw. 14, 3 (2020), 185–195.
[86] Xiaoyan Zhu, Binbin Niu, E. James Whitehead Jr., and Zhongbin Sun. 2018. An empirical study of software change
classification with imbalance data-handling methods. Softw.: Pract. Exper. 48, 11 (2018), 1968–1999.