
Supplementary Material for:

A Systematic Survey of Just-in-time Software Defect Prediction
YUNHUA ZHAO, CUNY Graduate Center, USA
KOSTADIN DAMEVSKI, Virginia Commonwealth University, USA
HUI CHEN, CUNY Brooklyn College, USA

This is a supplement that accompanies the survey article with the same title. For convenience,
we use “this survey” or “the survey” to refer to the survey article, and “this supplement” or “the
supplement” to refer to this online supplement.
In the supplement, Section 1 details the methodology of this systematic survey, Section 2 il-
lustrates the number of Just-in-time Software Defect Prediction (JIT-SDP) studies over time,
Section 3 lists the software projects used for evaluating JIT-SDP models in the surveyed JIT-SDP
studies, Section 4 gives additional and detailed information about software metrics (features or
independent variables), Section 5 lists machine learning models that the JIT-SDP studies are based
on, Section 6 examines the availability of replication packages in the JIT-SDP studies, and finally,
Section 7 lists the surveyed JIT-SDP studies and provides a one-sentence summary describing the
primary topic of each study.

1 REVIEW METHODOLOGY
Kitchenham et al. [39, 40] advocate a systematic literature review method in software engineering
aimed at providing scientific value to the research community. According to Kitchenham
et al. [38, 40], a systematic literature review process consists of the stages of planning the review
(including identifying the need for the review, specifying the research questions, and developing
a review protocol), conducting the review, and reporting the review.

1.1 Literature Search


Through the systematic literature review process, we use two methods to identify relevant studies:
digital library keyword search and literature snowballing.
1.1.1 Digital Library Keyword Search. To locate existing surveys and papers, we use the digital
libraries listed in Table 1. These digital libraries archive and index leading journals and conference
proceedings in Software Engineering and related fields. For instance, they index and archive the
conference proceedings and journals in Table 2. Not surprisingly, existing software engineering
research surveys also reference these digital libraries. For instance, Li et al. [42] cite digital
libraries 1–4 as the digital libraries used to carry out their survey, while Zakari et al. [79] cite 1–5.
1.1.2 Literature Snowballing. The digital library keyword search may not identify all of the
relevant studies. To alleviate this problem, we use the snowball method to discover new studies
starting with the selected articles from the previous step. We consider a variant of the snowball


Table 1. Digital Libraries

Digital Libraries      URL to Query User Interface

1. ACM                 https://dl.acm.org/
2. IEEE Xplore         https://ieeexplore.ieee.org/
3. ScienceDirect       https://www.sciencedirect.com/search/
4. SpringerLink        https://link.springer.com/
5. Wiley               https://onlinelibrary.wiley.com/

method called the backward (or reverse) snowball, where we examine the references of each identified
article. Empirical evidence suggests that the snowball method should be effective at locating “high-
quality sources in obscure locations” [20].
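For illustration only, the backward snowballing step can be viewed as a worklist procedure over article references. The sketch below is ours, not from Kitchenham et al. or Wohlin; the reference-extraction and relevance functions are placeholders (in this survey, references were examined manually and relevance was decided by the 2-pass review described in Section 1.3.3), and whether newly found studies are themselves snowballed further is a design choice.

    def backward_snowball(seed_articles, get_references, is_relevant):
        # Articles already selected, e.g., from the digital library keyword search.
        selected = set(seed_articles)
        # Worklist of articles whose reference lists still need to be examined.
        frontier = list(seed_articles)
        while frontier:
            article = frontier.pop()
            for cited in get_references(article):    # placeholder: reference list of the article
                if cited not in selected and is_relevant(cited):   # placeholder: inclusion criteria
                    selected.add(cited)
                    frontier.append(cited)           # optionally snowball the new study in turn
        return selected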

1.2 Planning
Kitchenham et al. [37, 38] published a guideline in 2007 and refined it in 2013. Applying the method
by Kitchenham et al., we begin this research with an exploratory phase, an informal search and ex-
amination of literature about defect prediction. This belongs to the planning stage of Kitchenham
et al.’s systematic method.
Software defect prediction (SDP) has been a research subject since the 1970s. Not only has
research in this area evolved and taken different directions, but relevant systematic surveys have
also been developed over time. Following the exploratory phase, we proceed to the
second phase of the planning stage, i.e., we carry out a meta-survey whose process we describe in
Section 1.2.1 of this supplement. Kitchenham et al. term this type of survey a tertiary survey, a
systematic survey of systematic surveys [38], and argue that it is potentially less resource intensive
to conduct a tertiary survey than to conduct a new systematic review of primary studies to answer
wider research questions [38]. In this meta-survey phase, we investigate existing surveys on SDP.
As a result of this phase, we articulate the need to conduct this literature review on JIT-SDP in
Section 1 of this survey and define the research questions in Section 1.3.1 of this supplement.
Following the planning stage, we turn our focus to a systematic literature review on JIT-SDP
and describe the process in Section 1.3 of this supplement. With this focused survey, we answer
the research questions in Sections 3, 4, and 5 of the survey.

1.2.1 Meta-survey. The goal of the meta-survey is to define the scope of SDP, to learn its
relationship with related areas, and to understand the topics surveyed in prior literature surveys or
reviews on SDP.
Researchers and practitioners have used a range of terms to refer to scenarios in which software
exhibits undesired behavior or outputs. These terms include “defect,” “fault,” “bug,” “error,” “failure,”
and “exception.” These occur either in a piece of “software” or in a “program.” Based on these terms,
we construct Query 1. The digital libraries in Table 1 vary in their user interfaces, and Query 1
conveys the semantics of the searches we carry out in each of them.

Survey Article Publication Venues. Kitchenham et al. point out that the quality of tertiary sur-
veys depends on the quantity and quality of systematic reviews [38]. To control the quality of
the meta-survey, we choose only survey papers from the most significant software engineer-
ing journals and conferences, identified by consulting Google Scholar,1 the Computing Research and
1 See https://scholar.google.com/citations?view_op=top_venues&vq=eng_softwaresystems.


Query 1. Semantics of digital library keyword search query for meta-survey


(
  (
    ( fault OR defect OR bug OR exception OR failure OR error )
    AND
    ( prediction OR model )
  )
  OR
  (
    ( fault OR defect OR bug OR exception OR failure OR error )
    AND
    risk
    AND
    ( assessment OR prediction OR model )
  )
)
AND
(
  review OR survey OR mapping OR progress OR accomplishment OR critique
)

Education Association of Australasia (CORE),2 and Microsoft Academic (where we select the
topic “Computer science” and its subtopic “Software engineering”),3 as well as the journals and
conferences cited by prior surveys. These journals and conferences are listed in Table 2.
1.2.2 Results of Literature Search for Meta-survey. We list in Table 3 the prior surveys on SDP
that we identify. These SDP surveys have focused on a variety of aspects of the SDP problem,
including the specific definition of the problem (e.g., predicting a probability or binary value),
selected features, data granularity, training and test datasets, model design and evaluation metrics.
While some of the surveys mention JIT-SDP, they focus only on the difference in data type (i.e.,
JIT-SDP uses software changes) but do not cover the more nuanced aspects of the problem. For
instance, JIT-SDP introduces a label identification latency: because it takes time for developers
to identify defects, the labels of certain past software changesets change from clean to
defect-inducing.
As a result of the meta-survey, we are able to (1) justify the need for a focused survey on
JIT-SDP, (2) provide background information for JIT-SDP, such as clear definitions of defect and
SDP, and (3) determine the distinct aspects of JIT-SDP to focus our survey on.

1.3 Focused Survey on JIT-SDP


Upon the completion of the meta-survey, we commence the focused survey on JIT-SDP.
1.3.1 Research Questions. Our end goal is to provide a comprehensive understanding of the
state of the art of JIT-SDP. For this, we define and answer the following research questions.
RQ.1 What is the scope of the SDP research? Our literature search surfaces several related terms
or areas concerning predictive modeling for quality assurance in software engineering. These
terms include software reliability prediction, software failure prediction, software fault

2 See http://portal.core.edu.au/jnl-ranks/?search=software&by=all&source=CORE2020&sort=arank&page=1.
3 See https://academic.microsoft.com/.


Table 2. Selected Software Engineering Conferences and Journals

Publication Type   Publication

Conferences        ACM/IEEE International Conference on Software Engineering (ICSE)
                   ACM SIGSOFT International Symposium on Foundations of Software Engineering (FSE)
                   ACM/IEEE International Conference on Automated Software Engineering (ASE)
                   ACM/IEEE International Conference on Mining Software Repositories (MSR)
                   ACM/IEEE International Symposium on Empirical Software Engineering and Measurement (ESEM)
                   IEEE International Conference on Software Maintenance and Evolution (ICSME)
                   IEEE International Conference on Software Quality, Reliability and Security (QRS)
Journals           ACM Transactions on Software Engineering and Methodology (TSEM)
                   IEEE Transactions on Software Engineering (TSE)
                   IEEE Transactions on Reliability (TR)
                   (Elsevier) Journal of Systems and Software (JSS)
                   (Elsevier) Information and Software Technology (IST)
                   (Elsevier) Applied Soft Computing
                   (Springer) Empirical Software Engineering (ESE)
                   (Wiley) Software Testing, Verification & Reliability (STVR)
                   (Wiley) Software: Practice and Experience
This table is for filtering the digital library search results to identify SDP surveys. It is not for identifying JIT-SDP studies.

Table 3. Summary of Software-defect Prediction Surveys

No.   Authors                   Coverage Duration   # of Articles Surveyed   Survey Topic

SV1   Li et al. [42]            2000–2018           49                       Unsupervised SDP
SV2   Li et al. [44]            2014–2017           70                       Comprehensive
SV3   Hosseini et al. [26]      2002–2017           46                       Cross-project SDP
SV4   Kamei and Shihab [32]     1992–2015           65                       Comprehensive
SV5   Malhotra [47]             1995–2013           64                       Within-project and cross-project SDP
SV6   Radjenović et al. [56]    1991–2011           106                      Software metrics for SDP
SV7   Hall et al. [22]          2000–2010           208                      Within-project and cross-project SDP
SV8   Catal et al. [9] 1        1990–2009           68                       Datasets, metrics, and models
SV9   Fenton and Neil [16] 2    1971–1999           55                       Defect, failure, quality, complexity, metrics, and models
1 Catal et al. [9] investigate 90 software defect/fault prediction papers in their survey but only cite 68. We use
this as the number of papers studied in their survey.
2 Fenton and Neil [16] do not explicitly list the papers surveyed, and we only count the papers relevant to software
metrics, defects, faults, quality, and failures.


prediction, and software defect prediction. This raises the questions of how we define SDP, what
the scope of SDP is, and how we differentiate it from related areas.
RQ.2 What is the scope of the JIT-SDP research? JIT-SDP is an area within SDP. To comprehend the
studies in JIT-SDP and present our understanding in a way that complements rather than
repeats prior surveys on SDP, we need to identify the scope of JIT-SDP, to differentiate
JIT-SDP from the SDP work investigated in prior surveys, and to present our understanding
in the context of SDP, a larger area than JIT-SDP.
RQ.3 What are the input data and the features (or independent variables) in JIT-SDP? A necessary
type of data for JIT-SDP is software changesets. Are there any other types of data that can
help improve JIT-SDP? What features can we extract from the input data? How
do these features impact JIT-SDP performance? Answers to these questions not only help build
JIT-SDP models but also potentially aid our understanding of the relationship between factors in
the software development life cycle and defect occurrences, which in turn helps produce
explainable and actionable models and insights.
RQ.4 On what target do we make predictions and what are the dependent variables in JIT-SDP?
JIT-SDP predicts defects in software changes. Are software changes the only target on
which we predict defects? What are we really predicting? In other words, is defect proneness
the only dependent variable? Understanding these questions is important to understand the
limitations and the potential of existing modeling techniques.
RQ.5 What are the modeling techniques in JIT-SDP? Statistical analysis and machine learning are
important model building techniques for JIT-SDP. What are the machine learning techniques
used in JIT-SDP and how do they compare, such as in terms of predictive performance? This
helps address several issues. First, what machine learning techniques should we explore to
continue to improve JIT-SDP? Second, which machine learning technique should we choose
as a baseline to compare with if we are to build a new model? Third, if a user wishes to use
JIT-SDP to help QA, then which machine learning model should the user choose? Last, is
machine learning the only way to build JIT-SDP models? If not, then how do the alternative
approaches perform when compared with machine learning?
RQ.6 What are the evaluation strategies and criteria used in the existing JIT-SDP models? First,
to understand the strengths and limitations of a JIT-SDP model, we need to know how
we evaluate it. Second, to be able to assemble and compare existing models, we need to
understand the evaluation criteria and strategies.
RQ.7 How do JIT-SDP models perform with respect to the evaluation criteria? Which JIT-
SDP model performs best? Answering this helps researchers develop new models and compare them
with existing ones, and helps users select existing models to build applications of JIT-SDP.
RQ.8 How do JIT-SDP studies address the reproducibility (or replication) problem? Repro-
ducibility is an issue that has garnered increased scrutiny from the research community and the
public in empirical research, and JIT-SDP research is empirical, so reproducibility is an important
concern. What practices do prior JIT-SDP studies follow to facilitate replication, so that others
can examine whether their results are reproducible?
We focus our survey on JIT-SDP. The answer to RQ.1 is thus out of the scope of this survey. The
answer to RQ.2 is in Section 3.1 of the survey, where we define Release SDP and JIT-SDP. Sections 3.2
and 3.3 of the survey answer RQ.3. In Sections 3.6.1 and 3.6.2 of the survey, we divide JIT-SDP
models into two categories based on their dependent variables, defect prediction and effort-aware
prediction, which answers RQ.4. Section 3.6 of the survey documents modeling techniques and thus
answers RQ.5. For RQ.6, we report JIT-SDP evaluation strategies in Section 3.7 of the survey. Through
a synthesis of the prior JIT-SDP studies, we provide an answer to RQ.7. In Section 6 of this online
supplement, we collect and discuss replication packages and data, which answers RQ.8.

Query 2. Semantics of digital library keyword search query for JIT-SDP


(
  (
    ( fault OR defect OR bug OR exception OR failure OR error )
    AND
    ( prediction OR model )
  )
  OR
  (
    ( fault OR defect OR bug OR exception OR failure OR error )
    AND
    risk
    AND
    ( assessment OR prediction OR model )
  )
)
AND
( just-in-time OR change )
AND
( year >= 2000 )

1.3.2 Digital Library Keyword Search Query for JIT-SDP. Kamei et al. coined the term “Just-in-
time” Quality Assurance in their 2012 article [33]. JIT-SDP is change-level SDP, i.e., it predicts the
existence of defects in software changes. Mockus and Weiss appear to be the first to examine
change-level defect prediction [49], in 2000. Using Query 2, we search the digital libraries
in Table 1.

1.3.3 Literature Selection via 2-Pass Review. We combine all of the search results from the digital
libraries, remove duplicates, and divide the set of articles among the authors of this survey to
evaluate whether to include or discard each article. The division ensures that each article is
assigned to two of the three authors and goes through two reviews by the assigned authors
(thus, the 2-pass review). Each author follows the same process. First, we remove any article
whose title clearly indicates that it is not relevant. Second, for the remaining articles, we evaluate
whether or not to include them by reading the abstract. Finally, we convene a meeting and resolve
any differences via discussion.

1.3.4 Exclusion and Inclusion Criteria. We include only articles written in English that study
predictive modeling for JIT-SDP, where the prediction is at the level of software changes or below.
For instance, we exclude Amasaki et al. [2] because they make predictions at the level of software
components despite claiming that they study JIT-SDP. We also exclude non-peer-reviewed articles,
posters, and abstract-only articles.

1.3.5 Results of Literature Search for JIT-SDP. We search for JIT-SDP studies published from 2000
onward and completed our literature search in November 2021. Table 4 summarizes the literature
search process and the results. The digital library keyword search yields 881 entries. After we
remove duplicates and complete a two-pass review, we identify 55 JIT-SDP articles. We then begin
the snowballing process on these 55 articles. As shown in Table 4, the 55 JIT-SDP papers list
in total 2,563 entries in their reference sections. After removing duplicates and performing another
two-pass review, we find 12 additional JIT-SDP articles. Table 15 in Section 7 lists these 67 articles
and provides a one-sentence summary describing the primary topic of each study.


Table 4. JIT-SDP Literature Search Results

Sources          Additional Constraint     # of Articles (Library     # of JIT-SDP Articles
                                           Search or Snowballing)     after 2-Pass Review

ACM              —                         196
IEEE Xplore      —                         55
ScienceDirect    Research articles         269                        55 (combined across the
SpringerLink     —                         334                        five digital libraries)
Wiley            Computer Science          27
Snowballing      On 55 JIT-SDP papers      2,563                      12

Fig. 1. The number of selected JIT-SDP papers over publication year.

2 PUBLICATIONS TREND
Figure 1 plots the number of selected JIT-SDP papers versus publication year.4 It shows that there
has been an elevated interest in JIT-SDP in recent years.

3 EVALUATION DATA
Table 5 is a summary of the software projects used for evaluating JIT-SDP models. Most studies use
open source projects. As shown in Table 5, 11 of the 67 papers surveyed (listed in Table 15) include
proprietary/commercial projects.

4 SOFTWARE METRICS AND FEATURES FOR JIT-SDP


In Section 3.3 of this survey, we provide a table of categories of software metrics that prior studies
find useful for JIT-SDP. Here we provide a more detailed discussion of these categories of metrics.

4.1 Software Change Metrics


Kamei et al. summarize past studies on the relationship between the characteristics of software
changes and defects, and list 14 software change metrics that have been useful for JIT-SDP [33]. Liu
et al. build an unsupervised effort-aware JIT-SDP model called CCUM and argue that the code churn
metric, i.e., the size of a code change, is particularly useful for unsupervised models [46].
4 The publication year is from the online publication date if available. The online publication date may be different from
the bibliographic or the final publication date.


Table 5. Software Projects Used for Evaluation in JIT-SDP Studies

Count Total(Proprietary) | Projects | Study
11(5) | Bugzilla, Eclipse JDT, Eclipse Platform, Mozilla, Columba, PostgreSQL; 5 commercial projects | Kamei et al. [33]
6(0) | Bugzilla, Columba, Eclipse JDT, Eclipse Platform, Mozilla, PostgreSQL | Yang et al. [73], Yang et al. [77], Fu and Menzies [17], Huang et al. [27], Liu et al. [46], Yang et al. [72], Young et al. [78], Albahli [1], Qiao and Wang [54], Yang et al. [76], Yang et al. [74], Zhang et al. [81], Huang et al. [28], Chen et al. [11], Li et al. [43], Yang et al. [75], Zhu et al. [85]
5(0) | Bugzilla, Columba, Eclipse Platform, Mozilla, PostgreSQL | Bennin et al. [6]
4(0) | Eclipse Platform, Eclipse JDT, Mozilla, PostgreSQL | Jahanshahi et al. [29]
4(0) | Bugzilla, Eclipse Platform, Eclipse JDT, Mozilla | Tessema et al. [64]
11(0) | Bugzilla, Columba, Eclipse JDT, Eclipse Platform, Mozilla, PostgreSQL; Gimp, Maven-2, Perl, Ruby on Rails, Rhino | Kamei et al. [31], Fukushima et al. [18]
2(0) | QT, OpenStack | McIntosh and Kamei [48], Hoang et al. [24], Rodriguezperez et al. [58], Hoang et al. [25], Gesi et al. [19], Pornprasit et al. [53], Zeng et al. [80]
15(0) | Android Firewall, Alfresco, Android Sync, Android Wallpaper, AnySoft Keyboard, Apg, Applozic Android SDK, Chat Secure Android, Delta Chat, Android Universal Image Loader, Kiwix, Observable Scroll View, Own Cloud Android, Page Turner, Notify Reddit | Catolino et al. [10], Zhao et al. [83], Zhao et al. [82], Zhao et al. [84]
19(0) | Android Firewall, Alfresco, Android Sync, Android Wallpaper, AnySoft Keyboard, Apg, Applozic Android SDK, Chat Secure Android, Delta Chat, Android Universal Image Loader, Kiwix, Observable Scroll View, Own Cloud Android, Page Turner, Notify Reddit, Facebook Android SDK, Lottie, Atmosphere, Telegram | Xu et al. [68]
10(0) | Apache ActiveMQ, Camel, Derby, Geronimo, Hadoop Common, HBase, Mahout, OpenJPA, Pig, Tuscany | Fan et al. [15]
8(0) | Apache ActiveMQ, Camel, Derby, Geronimo, Hadoop Common, HBase, OpenJPA, Pig | Duan et al. [12]
18(0) | Apache ActiveMQ, Ant, Camel, Derby, Geronimo, Hadoop, HBase, IVY, JCR, JMeter, LOG4J2, LUCENE, Mahout, OpenJPA, Pig, POI, VELOCITY, Xerces-C++ | Tian et al. [65]
13(3) | Apache Fabric8, Camel, Tomcat; JGroups; Brackets; OpenStack Neutron, Nova; Spring-integration; Broadleaf Commerce; NPM; and 3 proprietary projects | Tabassum et al. [61], Cabral et al. [8]
2(0) | Apache Cassandra, Hadoop | Quach et al. [55]
39(0) | Apache Ant-Ivy, Archiva, Calcite, Cayenne, Commons BCEL, Commons BeanUtils, Commons Codec, Commons Collections, Commons Compress, Commons Configuration, Commons DBCP, Commons Digester, Commons IO, Commons Jcs, Commons JEXL, Commons Lang, Commons Math, Commons Net, Commons SCXML, Commons Validator, Commons VFS, DeltaSpike, Eagle, Giraph, Gora, JSPWiki, Knox, Kylin, Lens, Mahout, ManifoldCF, Nutch, OpenNLP, Parquet-MR, Santuario-java, SystemML, Tika, Wss4j | Trautsch et al. [67]
10(0) | Apache Lucene, Tomcat, jEdit, Ant, Synapse, Flink, Hadoop; Voldemort; iTextpdf; Facebook Buck | Zhu et al. [86]
20(0) | Apache Accumulo, Camel, Cinder, Kylin, Log4j, Tomcat; Eclipse Jetty; OpenStack Nova; Angular-js, Brackets, Bugzilla, Django, Fastjson, Gephi, Hibernate-ORM, Hibernate-Search, ImgLib2, osquery, PostgreSQL, Wordpress | Lin et al. [45]
10(0) | Apache Accumulo, Hadoop, OpenJPA; Angular-js; Bugzilla; Eclipse Jetty; Gerrit; Gimp; JDeodorant; JRuby | Pascarella et al. [52]
6(0) | ZooKeeper, Xerces-Java, JFreeChart, Jackson Data Format, Jackson Core, Commons Imaging | Ardimento et al. [3]
14(0) | Deeplearning4j, JMeter, H2O, LibGDX, Jetty, Robolectric, Storm, Jitsi, Jenkins, Graylog2-server, Flink, Druid, Closure-compiler, Activemq | Yan et al. [69]
1(0) | Jenkins | Borg et al. [7]
5(0) | Apache Hadoop, Camel; Gerrit; OsmAnd; Bitcoin; Gimp | Kondo et al. [41]
1(0) | Mozilla Firefox | Yang et al. [71]
15(0) | OpenStack Cinder, Devstack, Glance, Heat, Keystone, Neutron, Nova, OpenStack-Manuals, Swift, Tempest; Eclipse CDT, EGit, JGit, LinuxTools, Scout.rt | Tourani and Adams [66]
3(0) | Apache OpenJPA, James; Eclipse Birt | Mori et al. [50]
6(0) | Linux Kernel, PostgreSQL, Xorg Xserver, Eclipse JDT, Lucene, Jackrabbit | Jiang et al. [30]
7(1) | Linux Kernel, PostgreSQL, Xorg Xserver, Eclipse JDT, Lucene, Jackrabbit; 1 Cisco project (proprietary) | Tan et al. [62]
12(0) | Apache HTTP 1.3 Server, Bugzilla, Columba, Gaim, GForge, jEdit, Mozilla, Eclipse JDT, Plone, PostgreSQL, Scarab, Subversion | Kim et al. [36]
11(1) | Apache HTTP 1.3 Server, Columba, Gaim, GForge, jEdit, Mozilla, Eclipse JDT, Plone, PostgreSQL, Subversion; and a commercial project (proprietary, in Java) | Shivaji et al. [60]
2(0) | JHotDraw and DNS-Java | Aversano et al. [4]
324(0) | 324 unspecified repositories | Barnett et al. [5]
21(0) | 21 unspecified OSS projects | Khanan et al. [35]
2(2) | 2 maritime projects (proprietary) | Kang et al. [34]
1(1) | 1 unspecified project (proprietary) | Eken et al. [14]
14(14) | 14 unspecified Alibaba projects (proprietary, mainly in Java) | Yan et al. [70]
0(1) | 1 telecommunication project (proprietary) | Eken et al. [13]
12(12) | 12 Ubisoft projects (proprietary) | Nayrolles et al. [51]
1(1) | Windows Phone (proprietary) | Tarvo et al. [63]
1(1) | 5ESS® switching system software (proprietary) | Mockus and Weiss [49]

Kondo et al. argue that the context lines of a software change, i.e., the lines of code surrounding the
changed lines, have an impact on the defect proneness of the software change [41]. They
propose and evaluate a suite of metrics called the context metrics, i.e., metrics computed from
the context lines [41]. Additionally, they adapt the indentation metric from Hindle et al. [23]
and propose two change complexity metrics [41]. Table 6 lists the metrics by Kamei et al., Liu et al.,
and Kondo et al. [33, 41, 46], organized into categories.


Table 6. Software Change Metrics [31, 33, 46]

Category                 Metric    Description

Diffusion                NS        Number of modified subsystems
                         ND        Number of modified directories
                         NF        Number of modified files
                         Entropy   Distribution of modified code across each file
Size                     LA        Lines of code added
                         LD        Lines of code deleted
                         LT        Lines of code in a file before the change
Purpose                  FIX       Whether or not the change is a defect fix
History                  NDEV      The number of developers that changed the modified files
                         AGE       The average time interval between the last and current change
                         NUC       The number of unique changes to the modified files
Experience               EXP       Developer experience
                         REXP      Recent developer experience
                         SEXP      Developer experience on a sub-system
Size                     Churn     Size of the change, i.e., LA + LD
                         RChurn    Relative churn, i.e., (LA + LD) / LT
                         RLA       Relative LA, i.e., LA / LT
                         RLD       Relative LD, i.e., LD / LT
                         RLT       Relative LT, i.e., LT / NF
Change Complexity        AS        Number of white spaces on all the “+” (added) lines in a commit
(Indentation)            AB        Sum of the difference of left-braces and right-braces on all the “+”
                                   (added) lines in each function in a commit
Change Context           NCW       Number of words in the context
                         NCKW      Number of programming language keywords in the context
                         NCCW      Number of words in the context and the changed lines
                         NCCKW     Number of programming language keywords in the context and
                                   the changed lines
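To make the size-related metrics in Table 6 concrete, the following is a minimal sketch of our own (not taken from any surveyed study) that computes LA, LD, NF, Churn, and RChurn for a single commit by parsing the output of git show --numstat. The repository path and commit hash are placeholders, binary files are skipped, and LT is summed over the touched files, whereas individual studies may aggregate it differently (e.g., averaging per file).

    import subprocess

    def change_size_metrics(repo_path, commit_sha):
        # One "added<TAB>deleted<TAB>path" line per modified file; --format= suppresses the header.
        numstat = subprocess.run(
            ["git", "-C", repo_path, "show", "--numstat", "--format=", commit_sha],
            capture_output=True, text=True, check=True,
        ).stdout.splitlines()

        la = ld = nf = lt = 0
        for line in filter(None, numstat):
            added, deleted, path = line.split("\t", 2)
            if added == "-" or deleted == "-":          # binary file: git reports no line counts
                continue
            la, ld, nf = la + int(added), ld + int(deleted), nf + 1
            # Lines of the file before the change (parent revision); 0 if the file is newly added.
            parent = subprocess.run(
                ["git", "-C", repo_path, "show", f"{commit_sha}^:{path}"],
                capture_output=True, text=True,
            )
            lt += parent.stdout.count("\n") if parent.returncode == 0 else 0

        churn = la + ld                                  # Churn = LA + LD
        return {"LA": la, "LD": ld, "NF": nf, "LT": lt, "Churn": churn,
                "RChurn": churn / lt if lt else 0.0}     # RChurn = (LA + LD) / LT

    print(change_size_metrics(".", "HEAD"))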

Pascarella et al. investigate defect prediction at a finer granularity than software changes, i.e., they
predict whether a specific file in a software change is defect prone [52]. They adapt the process
metrics in Rahman and Devanbu [57] and evaluate a suite of file-level process metrics for change-
sets [52]. Table 7 summarizes the file-level software change metrics; each of the metrics is computed
on a file in the changeset.

4.2 Commit Message Features


Since commit messages are in natural language text, features to encode them are usually borrowed
from the natural language processing literature. An example is term frequency (TF), which counts
the occurrences of a specific word in the commit message, e.g., used by Tan et al. [62]. A typical
workflow to compute the TF feature is to assemble the corpus of the commit messages, i.e., the
collection of all commit messages, to use a stemmer to obtain the root of each word, to remove
stop words or rare words, to obtain a word dictionary, to assign an index to each word in the


Table 7. Software File Change Metrics [41, 52]

Category          Metric    Description

Change Process    COMM      Number of changes to the file up to the considered commit
                  ADEV      Number of developers who modified the file up to the considered commit
                  DDEV      Cumulative number of distinct developers who contributed to the file up
                            to the considered commit
                  ADD       Number of lines added to the file in the considered commit
                  DEL       Number of lines removed from the file in the considered commit
                  OWN       Whether the commit is done by the owner of the file
                  MINOR     Number of contributors who contributed less than 5% of the file up to
                            the considered commit
                  SCTR      Number of packages modified by the committer in the commit
                  NADEV     Number of developers who changed the files in the commits where the
                            file has been modified
                  NDDEV     Cumulative number of distinct developers who changed the files in
                            commits where the file has been modified
                  NCOMM     Number of commits made to files in commits where the file has been
                            modified
                  NSCTR     Number of different packages touched by the developer in commits
                            where the file has been modified
                  OEXP      Percentage of lines authored in the project
                  AEXP      Mean of the experiences of all the developers who touched the file

Table 8. Commit Message Metrics [5, 62]

Feature       Description                                                   Study

CM-TF         Term frequency of commit message                              Tan et al. [62]
CM-VOLUME     Number of words in commit message excluding stop words        Barnett et al. [5]
CM-SPAM       Commit message content represented by a SPAM score            Barnett et al. [5]
              computed via a SPAM filter

dictionary, and finally to form a vector recording occurrences of the dictionary words in a commit
message. The TF vector of a commit message is typically sparse, with most elements equal to 0.
Barnett et al. [5] hypothesize that the level of detail in commit messages is useful for JIT-SDP and
confirm it via their investigation of more than 300 repositories. For this, they propose two commit
message metrics, commit volume and commit content. The former is the number of words in a
commit message after the stop words are removed. The latter is a score computed via a SPAM filter,
which is in effect a feature representation of the commit message and a surrogate for its content.
Table 8 lists these metrics.
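As an illustration of the term-frequency workflow described above and of the CM-VOLUME metric in Table 8, the following is a minimal sketch of our own using scikit-learn's CountVectorizer; it omits stemming for brevity, and the stop-word setting, minimum document frequency, and toy commit messages are placeholder choices rather than the setup of any surveyed study.

    from sklearn.feature_extraction.text import CountVectorizer

    # Toy corpus of commit messages; in practice these come from the project's SCM history.
    commit_messages = [
        "fix null pointer exception in parser",
        "add unit tests for the parser module",
        "refactor parser to simplify error handling",
    ]

    # Build the word dictionary over the corpus, drop English stop words and rare terms,
    # and count occurrences of each dictionary word per message (CM-TF-like vectors).
    vectorizer = CountVectorizer(stop_words="english", min_df=1)
    tf_matrix = vectorizer.fit_transform(commit_messages)    # sparse: most entries are 0

    # CM-VOLUME: number of words in each message after stop-word removal.
    analyzer = vectorizer.build_analyzer()
    cm_volume = [len(analyzer(msg)) for msg in commit_messages]

    print(vectorizer.get_feature_names_out())
    print(tf_matrix.toarray())
    print(cm_volume)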

4.3 ITS Data Features


ITS data, such as issue reports, issue discussions, change requests, and code reviews, can be useful for
predicting defects in the future changes that result from these data. Tourani and Adams [66] propose a suite
of issue discussion and code review discussion metrics that attempt to capture characteristics
of these data sources beyond their actual textual content; Table 9 lists these metrics.


Table 9. Issue Report, Issue Discussion, and Code Review Metrics [66]

Category        Metric      Description

Thread Focus    COMMEXP     Commenter experience
                RPTEXP      Reporter experience
                RVWEXP      Reviewer experience
                PATCHNUM    Number of patch revisions
                NINLCMMT    Number of inline comments
Thread Length   NUMCMMT     Number of comments
                LENCMMT     Length of comments
Thread Time     RVWTIME     Review time
                FIXTIME     Fix time
                DISCLAG     Average discussion lag
Sentiment       CMMTSENT    Comment sentiment

Table 10. Change Request Metrics [64]

Metric        Description

CR-TTM        Time span between submission of the change request and its resolution (a new change)
CR-NDA        Number of developers involved in the change request
CR-PRIORITY   Priority assigned to the change request
CR-SEVERITY   Severity assigned to the change request
CR-NC         Number of comments about the change request
CR-DD         Depth of discussion, computed as the number of words used during the discussion of
              the change request

Tessema and Abebe [64] propose six metrics for change requests in ITS. They augment these
metrics with Kamei et al.’s [33] software change metrics and show that the JIT-SDP models with
augmented metrics outperform those with the change metrics alone. Table 10 summarizes the six
change request metrics. These metrics are from the meta-data of the change requests.

4.4 Static Program Analysis Metrics


Trautsch et al. [67] collect static program analysis warning messages from two popular tools, PMD
and OpenStaticAnalyzer. From these warning messages, they derive, with JIT-SDP in mind, the
warning density metrics listed in Table 11.
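As an illustration only (the definitions used by Trautsch et al. [67] are more involved than this), a warning density can be computed as the number of static analysis warnings per line of code, and the file-level and author-level metrics in Table 11 accumulate differences or changes of such densities over commits. A minimal sketch with hypothetical numbers:

    def warning_density(num_warnings, lines_of_code):
        # Warnings per line of code; 0 for an empty code base.
        return num_warnings / lines_of_code if lines_of_code else 0.0

    def file_vs_project_delta(file_warnings, file_loc, project_warnings, project_loc):
        # Difference between a file's warning density and the project's; our reading of
        # Table 11 is that FSysWD accumulates this quantity over the considered commits.
        return warning_density(file_warnings, file_loc) - warning_density(project_warnings, project_loc)

    # Hypothetical numbers for one commit.
    print(warning_density(120, 48000))                 # project-level density (SysWD-like)
    print(file_vs_project_delta(9, 600, 120, 48000))   # per-commit contribution to FSysWD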

5 JIT-SDP MODELS
The prior JIT-SDP studies have examined a broad range of machine learning algorithms. As such,
one may argue that JIT-SDP is a microcosm of recent developments in machine learning. It is im-
portant to note that there have been several investigations of non-machine-learning algorithms
for JIT-SDP. We refer to this type of JIT-SDP model as searching-based models. Searching-based

Table 11. Static Program Analysis Metrics [67]

Category            Metric    Description

Program Analysis    SysWD     Warning density of the project
                    FSysWD    Cumulative difference between warning density of the file and the
                              project
                    AuDWD     Cumulative sum of the changes in warning density by the author

models like those in Yang et al. [77] and Liu et al. [46] are unsupervised. Several studies extend
these unsupervised searching-based models by adding a supervised component to improve their
predictive performance. Table 12 lists the modeling techniques in the prior JIT-SDP studies. It
shows that Logistic Regression, tree-based models (including Random Forest, C4.5 Decision Tree,
and ADTree), and ensemble models (including Random Forest, XGBoost, and others) are the more
popular modeling techniques, and that the use of neural network-based models (including deep
neural networks) is on the rise.
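For concreteness, the following is a minimal sketch of our own of the most common setup in Table 12: a logistic regression classifier trained on change-metric features (such as those in Table 6) to predict defect proneness. The synthetic data and feature count are placeholders; the surveyed studies train on labeled changesets, e.g., the Kamei et al. [33] data.

    import numpy as np
    from sklearn.linear_model import LogisticRegression
    from sklearn.metrics import roc_auc_score
    from sklearn.model_selection import train_test_split
    from sklearn.pipeline import make_pipeline
    from sklearn.preprocessing import StandardScaler

    rng = np.random.default_rng(0)

    # Synthetic stand-in for change metrics (e.g., LA, LD, NF, Entropy, EXP) and
    # defect-inducing labels; a real study would load labeled changesets instead.
    X = rng.lognormal(mean=1.0, sigma=1.0, size=(2000, 5))
    y = (X[:, 0] + X[:, 1] + rng.normal(scale=5.0, size=2000) > 8.0).astype(int)

    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.3, random_state=0, stratify=y)

    # Standardize the skewed metrics, then fit logistic regression;
    # class_weight="balanced" is one common way to handle class imbalance.
    model = make_pipeline(StandardScaler(),
                          LogisticRegression(class_weight="balanced", max_iter=1000))
    model.fit(X_train, y_train)

    # Predicted probability that a change is defect-inducing, evaluated with AUC.
    proba = model.predict_proba(X_test)[:, 1]
    print("AUC:", roc_auc_score(y_test, proba))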

6 REPLICATION PACKAGES AND DATA


Reproducibility is an important issue in empirical studies [17]. Table 13 lists the studies that
indicate the availability of replication packages among the 67 studies in Table 15. Among these
studies, two replication packages appear to be no longer accessible and two are identical, which
results in 26 accessible replication packages.
Some of the replication packages have an impact not only on the reproducibility of the studies
that provide the packages but also on generating new models, new insights, or both. For
instance, Kamei et al. [33] make available a replication package including both the source code and
the data sets. More than 10 studies take advantage of the source code, the data set, or both.
Yang et al. [77] include the source code in their replication package, and the code allowed
Fu and Menzies [17] to quickly replicate Yang et al.'s results and reach new discoveries. Table 14
provides several examples of these impactful studies and their replication packages. The table also
lists the studies that use the code, the data, or both the code and the data in the replication packages.
The research community has curated additional tools that facilitate JIT-SDP research. Two
prime examples are Commit Guru and several SZZ implementations. Commit Guru [59] is a publicly
available tool that computes software change metrics for software projects whose source code is in
a Git repository. The studies that use data extracted from Commit Guru include Tabassum
et al. [61], Cabral et al. [8], Khanan et al. [35], and Kondo et al. [41]. Publicly available SZZ tools
have made labeling large collections of software changes feasible.


Table 12. JIT-SDP Modeling Techniques

Algorithm                       Studies

K-Nearest Neighbor              Bennin et al. [6], Kang et al. [34], Tian et al. [65], Aversano et al. [4]
Linear Regression               Kamei et al. [33], Tian et al. [65], Yan et al. [70]
Non-linear Regression           Rodriguezperez et al. [58], McIntosh and Kamei [48]
Logistic Regression             Duan et al. [12], Lin et al. [45], Zeng et al. [80], Yang et al. [75], Kang et al. [34],
                                Li et al. [43], Trautsch et al. [67], Yan et al. [69], Yan et al. [70], Catolino et al. [10],
                                Fan et al. [15], Huang et al. [28], Kondo et al. [41], Yang et al. [74], Chen et al. [11],
                                Huang et al. [27], Tourani and Adams [66], Rosen et al. [59], Jiang et al. [30],
                                Tarvo et al. [63], Kamei et al. [33], Aversano et al. [4], Mockus and Weiss [49],
                                Tessema et al. [64]
Naive Bayes                     Duan et al. [12], Eken et al. [14], Bennin et al. [6], Kang et al. [34], Tian et al. [65],
                                Catolino et al. [10], Fan et al. [15], Zhu et al. [86], Barnett et al. [5], Jiang et al. [30],
                                Shivaji et al. [60]
Decision Table                  Catolino et al. [10]
C4.5 Decision Tree              Zhu et al. [86], Tarvo et al. [63], Aversano et al. [4]
Alternating Decision Tree       Tan et al. [62], Jiang et al. [30]
(ADTree)
Random Forest                   Fukushima et al. [18], Kamei et al. [31], Yang et al. [71], Nayrolles et al. [51],
                                Zhu et al. [86], Borg et al. [7], Catolino et al. [10], Fan et al. [15], Jahanshahi et al. [29],
                                Kondo et al. [41], Pascarella et al. [52], Yang et al. [76], Bennin et al. [6], Kang et al. [34],
                                Khanan et al. [35], Li et al. [43], Trautsch et al. [67], Tian et al. [65], Duan et al. [12],
                                Lin et al. [45], Pornprasit et al. [53], Quach et al. [55], Tessema et al. [64]
Support Vector Machine          Kang et al. [34], Li et al. [43], Catolino et al. [10], Zhu et al. [86], Shivaji et al. [60],
                                Kim et al. [36], Aversano et al. [4]
Neural Network & Deep           Yang et al. [73], Hoang et al. [24], Qiao and Wang [54], Bennin et al. [6], Hoang et al. [25],
Neural Network                  Kang et al. [34], Tian et al. [65], Zhu et al. [85], Tessema et al. [64], Ardimento et al. [3],
                                Gesi et al. [19], Xu et al. [68], Zeng et al. [80], Zhao et al. [83], Zhao et al. [82]
Deep Forest                     Zhao et al. [84]
Ensemble (XGBoost)              Bennin et al. [6], Eken et al. [13], Tessema et al. [64]
Ensemble (others)               Aversano et al. [4], Yang et al. [72], Young et al. [78], Albahli [1], Cabral et al. [8],
                                Catolino et al. [10], Zhang et al. [81], Li et al. [43], Tabassum et al. [61], Tian et al. [65],
                                Tessema et al. [64]
Spam Filter (Text Classifier)   Mori et al. [50]
Searching-based Algorithm       Liu et al. [46], Yang et al. [77]
Supervised Learning +           Yan et al. [70], Huang et al. [28], Huang et al. [27], Fu and Menzies [17]
Searching-based Algorithm

Table 13. Replication Packages of JIT-SDP Studies

No.  Study                       Year  Replication Package                                                           Remark

1    Lin et al. [45]             2021  https://github.com/SAILResearch/suppmaterial-19-dayi-risk_data_merging_jit
2    Pornprasit et al. [53]      2021  http://doi.org/10.5281/zenodo.4433498
3    Quach et al. [55]           2021  https://github.com/senseconcordia/Perf-JIT-Models
4    Xu et al. [68]              2021  https://figshare.com/search?q=10.6084/m9.figshare.13635347
5    Zeng et al. [80]            2021  https://github.com/ZZR0/ISSTA21-JIT-DP
6    Zhao et al. [83]            2021  https://github.com/sepine/IET-2021
7    Duan et al. [12]            2021  https://github.com/deref007/Duplicate-change-TR
8    Rodriguezperez et al. [58]  2020  https://gemarodri.github.io/2019-Study-of-Extrinsic-Bugs/
9    Yan et al. [69]             2020  https://github.com/MengYan1989/JIT-DIL
10   Hoang et al. [25]           2020  https://github.com/CC2Vec/CC2Vec
11   Tian et al. [65]            2020  https://github.com/lining-nwpu/JiTReliability
12   Trautsch et al. [67]        2020  https://doi.org/10.5281/zenodo.3974204
13   Li et al. [43]              2020  https://github.com/NJUST-IDAM/EATT
14   Borg et al. [7]             2019  https://github.com/wogscpar/SZZUnleashed
15   Qiao and Wang [54]          2019  https://github.com/donaldjoe/Effort-Aware-and-Just-in-Time-Defect-Prediction-with-Neural-Network
16   Yang et al. [74]            2019  https://github.com/yangxingguang/LocalJIT
17   Fan et al. [15]             2019  https://github.com/YuanruiZJU/SZZ-TSE
18   Hoang et al. [24]           2019  https://github.com/AnonymousAccountConf/
19   Pascarella et al. [52]      2019  not found
20   Huang et al. [28]           2018  https://doi.org/10.5281/zenodo.1432582
21   Cabral et al. [8]           2019  https://doi.org/10.5281/zenodo.2555695
22   Zhang et al. [81]           2019  https://github.com/NJUST-IDAM/EATT                                            identical to Li et al. [43]
23   Guo et al. [21]             2018  https://github.com/yuchen1990/EAposter
24   Chen et al. [11]            2017  https://github.com/Hecoz/Multi-Project-Learning
25   Fu and Menzies [17]         2017  https://github.com/WeiFoo/RevisitUnsupervised
26   McIntosh and Kamei [48]     2017  https://github.com/software-rebels/JITMovingTarget
27   Huang et al. [27]           2017  https://doi.org/10.5281/zenodo.836352
28   Yang et al. [77]            2016  http://ise.nju.edu.cn/yangyibiao/jit.html                                      inaccessible
29   Kamei et al. [33]           2012  http://research.cs.queensu.ca/~kamei/jittse/jit.zip


Table 14. Use of Replication Packages

Replication Package Name   Original Study            Dependent Study

Kamei-2012                 Kamei et al. [33]         Fukushima et al. [18], Yang et al. [73], Kamei et al. [31],
                                                     Yang et al. [77], Huang et al. [27], Liu et al. [46], Guo et al. [21],
                                                     Young et al. [78], Chen et al. [11], Jahanshahi et al. [29],
                                                     Huang et al. [28], Albahli [1], Bennin et al. [6], Li et al. [43],
                                                     Yang et al. [75], Tessema et al. [64]
Yang-2016                  Yang et al. [77]          Fu and Menzies [17]
McIntosh-2017              McIntosh et al. [48]      Hoang et al. [24], Hoang et al. [25], Rodriguezperez et al. [58]
Catolino-2019              Catolino et al. [10]      Xu et al. [68], Zhao et al. [83], Zhao et al. [82], Zhao et al. [84]
Hoang-2019, Hoang-2020     Hoang et al. [25] and     Gesi et al. [19], Pornprasit et al. [53], Zeng et al. [80]
                           Hoang et al. [24]


7 SELECTED JIT-SDP STUDIES

Table 15. JIT-SDP Studies and Primary Topics

ID   Study   Year a   Primary Topics


P1 Ardimento et al. [3] 2021 applying temporal convolutional neural networks with hierarchical
attention layers to a set of 40+ production and process software
metrics data to predict defect proneness of SCM commits
P2 Duan et al. [12] 2021 modeling impact of duplicate changes, i.e., identical changes
applied to multiple SCM branches on prediction performance
P3 Eken et al. [14] 2021 deploying a JIT-SDP model to an industrial project (presumably
closed source and proprietary), comparing online and offline
prediction settings, presenting lessons learned
P4 Gesi et al. [19] 2021 addressing data imbalance beyond class label imbalance, i.e., data
bias along dimensions such as File Count, Edit Count, and Multiline Comments, and its impact
on JIT-SDP predictive performance, and proposing a few-shot learning JIT-SDP model (SifterJIT)
combining Siamese networks and DeepJIT [24]
P5 Lin et al. [45] 2021 investigating the impact of data merging on interpretation of
cross-project JIT-SDP models (e.g., most important independent
variables), and advocating mixed-effect models for sound
interpretation.
P6 Pornprasit et al. [53] 2021 replicating CC2Vec [25] to contrast feature representation learning
when including and excluding test data set, which leads to a
Random Forest-based JIT-SDP model (JITLine) to rank lines added
in software changes based on defect-inducing risk via a Local
Interpretable Model-Agnostic Explanations model (LIME)
P7 Quach et al. [55] 2021 observing SZZ's weaker ability to identify performance defects
than non-performance ones and studying the impact of non-performance defects on the
predictive performance of JIT-SDP models
P8 Tessema and Abebe [64] 2021 augmenting a publicly available change metrics dataset with six
change request-based metrics collected from issue tracking systems and examining their impact
on JIT-SDP predictive performance
P9 Xu et al. [68] 2021 designing a deep neural network triplet loss function (called CDFE)
for cross-project JIT-SDP and learning high-level feature
representations from software metrics data for improving defect
prediction performance for mobile apps
P10 Zeng et al. [80] 2021 replicating CC2Vec [25] and DeepJIT [24] to discover the role of
high-level features about the lines added in a software change, which leads to a performant
logistic regression-based JIT-SDP model (LAPredict)
P11 Zhao et al. [83] 2021 proposing a deep neural network JIT-SDP model (KPIDL) that
employs kernel-based PCA to learn high-level features from
software metrics data and addressing class imbalance problem with
a custom cross-entropy loss function and evaluating the model on
Android apps
P12 Zhao et al. [82] 2021 proposing to address class imbalance problem using class weights,
realizing it with a cross-entropy loss function in a deep neural
network JIT-SDP model (IDL), and evaluating the model using
software metrics data from Android apps
P13 Zhao et al. [84] 2021 applying a custom deep forest model for high-level feature
representation learning from software metrics data and evaluating
the model on Android apps
P14 Bennin et al. [6] 2020 investigating impact of concept drift in software change data on the
performance of JIT-SDP

P15 Hoang et al. [25] 2020 considering the hierarchical structure of diffs of software changes
and designing a feature representation learning framework
(CC2Vec) using a convolutional network with hierarchical attention
layers, and evaluating the framework using DeepJIT [24]
P16 Kang et al. [34] 2020 studying within and cross-project JIT-SDP for post-release changes
of maritime software and integrating a cost-benefit analysis in the
JIT-SDP models
P17 Khanan et al. [35] 2020 designing explainable JIT-SDP bot that uses a model-agnostic
technique (LIME) to “explain” a defect proneness change prediction
with the “contribution” of software metrics
P18 Li et al. [43] 2020 investigating semi-supervised effort-aware JIT-SDP using a
tri-training method (also see Zhang et al. [81])
P19 Rodriguez-Perez et al. [58] 2020 studying the impact of extrinsic bugs in JIT-SDP and concluding
that extrinsic bugs negatively impact predictive performance of JIT-SDP models
P20 Tabassum et al. [61] 2020 designing online cross-project JIT-SDP models and concluding that
combining incoming cross-project and within-project data can improve G-mean and reduce
performance drops due to concept drift
P21 Tian et al. [65] 2020 evaluating long-term JIT-SDP for reliability improvement and
short-term JIT-SDP for early defect prediction while considering the relationship between
software usage and defects
P22 Trautsch et al. [67] 2020 designing static analysis warning message metrics, comparing two
software change labeling strategies (ad hoc SZZ and ITS SZZ), and
investigating predictive performance of sub-change-level (i.e., file
in a changeset) JIT-SDP
P23 Yan et al. [69] 2020 proposing a two-phase model, which in the first phase predicts the
defectiveness of a software change and in the second phase ranks the defect-inducing risks of
lines added in predicted defect-prone software changes via a probabilistic model
P24 Yan et al. [70] 2020 investigating the effectiveness of supervised (CBS+, OneWay, and
EALR) and unsupervised (LT and Code Churn) effort-aware JIT-SDP
models in an industry setting (on Alibaba projects)
P25 Yang et al. [75] 2020 proposing an effort-aware JIT-SDP model (DEJIT) that uses a
differential evolution algorithm to optimize
density-percentile-average (DPA) objective function, purposely
designed for effort-aware prediction
P26 Zhu et al. [85] 2020 proposing a deep neural network model (DAECNN-JDP) based on
denoising autoencoder and convolutional neural network and
investigating the predictive performance of the model using
software metrics data
P27 Albahli [1] 2019 devising an ensemble JIT-SDP model whose base classifiers are
Random forest, XGBoost, and Multi-layer perceptron
P28 Borg et al. [7] 2019 presenting an open-source implementation of SZZ (SZZ
Unleashed) and illustrating the use of it by applying a Random
Forest classifier to predict defective commits in the Jenkins project
P29 Cabral et al. [8] 2019 investigating the problems of class imbalance evolution and
verification latency in software change data and proposing an
online JIT-SDP model based on Oversampling Online Bagging
(ORB) to tackle these problems
P30 Catolino et al. [10] 2019 investigating cross-project JIT-SDP models for mobile apps and
comparing model performance of four classifiers and four ensemble
techniques
P31 Eken et al. [13] 2019 applying JIT-SDP to an industrial project of a telecommunication
company in Turkey, for which, extracting features from multiple
sources (software changes and commit messages)
P32 Fan et al. [15] 2019 investigating labeling errors of SZZ variants and their impacts on
predictive performance of JIT-SDP

P33 Hoang et al. [24] 2019 proposing an “end-to-end” JIT-SDP model (DeepJIT) that learns
feature representations from tokenized software changes (diffs)
and commit messages and evaluating the predictive performance in
the cross-validation, short-term, and long-term prediction settings
P34 Jahanshahi et al. [29] 2019 investigating concept drift by replicating the study by McIntosh
and Kamei [48] using the data sets in Kamei et al. [33]
P35 Kondo et al. [41] 2019 designing and investigating “context metrics,” metrics that measure
the complexity or the number of the surrounding lines of a change
for JIT-SDP
P36 Pascarella et al. [52] 2019 designing JIT-SDP models to predict the defectiveness of files in a
commit and investigating the models using the product and the process metrics
P37 Qiao and Wang [54] 2019 applying a fully-connected neural network for effort-aware JIT-SDP
P38 Yang et al. [76] 2019 addressing limited training data problem by applying progressive
sampling to identify a small but sufficient set of data for training
JIT-SDP models
P39 Yang et al. [74] 2019 comparing local and global JIT-SDP models where local models are
those trained using a subset of homogeneous data and global
models trained using all of the training data
P40 Zhang et al. [81] 2019 investigating a semi-supervised effort-aware JIT-SDP model (EATT)
using a tri-training method (also see Li et al. [43])
P41 Huang et al. [28] 2018 investigating a supervised effort-aware model (called CBS)
combining Kamei et al.’s supervised EALR model [33] and Yang
et al.’s unsupervised LT [77]
P42 Nayrolles et al. [51] 2018 designing a two-phase approach (called CLEVER) that in the first
phase predicts defect risks of SCM commits and in the second phase
suggests a possible fix by comparing with known fix-commits, also
presenting lessons learned by deploying it to software company
Ubisoft
P43 Young et al. [78] 2018 comparing the prediction of defect-prone changes using traditional
machine learning techniques and ensemble learning algorithms in a replication study
P44 Zhu et al. [86] 2018 experimenting with class imbalance handling methods (resampling and
ensemble learning methods) across learning algorithms for JIT-SDP, and examining effort-aware
and defect proneness predictive performance and model interpretation (effects of the contribution
of different groups of change features on dependent variables)
P45 Yang et al. [71] 2017 proposing a model (VulDigger) that predicts vulnerability
defect-inducing changes with a Random Forest classifier using
software change metrics derived from both software defect
prediction and vulnerability prediction
P46 Chen et al. [11] 2017 formulating JIT-SDP as a dual-objective optimization problem
based on logistic regression and NSGA-II to balance the benefit, i.e.,
the number of predicted defective changes and the cost (the efforts
of reviewing the software changes for quality assurance)
P47 Fu and Menzies [17] 2017 investigating Yang et al.’s [77] unsupervised models (e.g., LT) and
proposing an effort-aware JIT-SDP model (OneWay) that uses the
supervised models to prune unsupervised models before employing
Yang et al.’s approach
P48 Huang et al. [27] 2017 investigating a supervised effort-aware model (called CBS)
combining Kamei et al.’s supervised EALR model [33] and Yang
et al.’s unsupervised models, such as LT [77]
P49 Liu et al. [46] 2017 investigating the effectiveness of the code churn metric-based
unsupervised defect prediction model (CCUM) for effort-aware
JIT-SDP

P50 McIntosh and Kamei [48] 2017 investigating the evolving nature of software projects leading to
fluctuations of software metrics data (or concept drift) and presenting insights, such as that JIT
models should be retrained using recently recorded data
P51 Yang et al. [72] 2017 proposing and investigating a two-layer ensemble model (TLEL) for
effort-aware JIT-SDP
P52 Barnett et al. [5] 2016 investigating the usefulness of SCM commit message volume and
commit message content for JIT-SDP and showing benefits by
adding commit message features to software change defect
prediction
P53 Kamei et al. [31] 2016 examining cross-project JIT-SDP and providing insights and
guidelines to improve predictive performance (also see Fukushima
et al. [18])
P54 Tourani and Adams [66] 2016 investigating the usefulness of ITS data, such as issue reports,
issue discussions, and code reviews, and designing ITS data metrics for JIT-SDP
P55 Yang et al. [77] 2016 investigating the predictive power of simple unsupervised models,
such as LT and AGE in effort-aware JIT defect prediction and
comparing these simple models with supervised models
P56 Mori et al. [50] 2015 applying text classifiers (i.e., spam filter) to software changes to
assess the probability of files in changesets to be defect-inducing
P57 Rosen et al. [59] 2015 describing a publicly available defect prediction tool called Commit
Guru
P58 Tan et al. [62] 2015 investigating two problems, the class imbalance problem and the problem with cross-validation, and proposing online change classification for JIT-SDP using resampling and updatable classification techniques
P59 Yang et al. [73] 2015 proposing a model called Deeper consisting of a deep belief
network and a logistic regression classifier to predict defect
proneness of software changes
P60 Fukushima et al. [18] 2014 examining cross-project JIT-SDP and showing its feasibility by
demonstrating that models trained using historical data from other
projects can be as accurate as JIT-SDP models that are trained on a
single project (also see Kamei et al. [31])
P61 Jiang et al. [30] 2013 building a (file) change-level defect prediction model for each developer from file modification histories (i.e., personalized defect prediction)
P62 Tarvo et al. [63] 2013 building a classification model to identify pre-release code changes that can cause post-release failures from code metrics, change size, historical code churn, and organization metrics, and also investigating the impacts of changes on trunk and branches
P63 Kamei et al. [33] 2012 predicting the defect-proneness of software changes with logistic regression and, for effort-aware ranking, the defect density relative to quality assurance effort with linear regression (EALR), both from software change metrics (a rough sketch follows the table)
P64 Shivaji et al. [60] 2012 investigating feature selection techniques for change-level defect
prediction
P65 Kim et al. [36] 2008 proposing a JIT-SDP model based on the Support Vector Machine (SVM) and bag-of-words features to classify whether software changes are defect-inducing or clean (an illustrative sketch follows the table)
P66 Aversano et al. [4] 2007 studying defect-inducing change prediction by representing software snapshots as TF-IDF vectors and software changes as vector differences between two snapshots, and by comparing multiple classification and clustering algorithms
P67 Mockus and Weiss [49] 2000 predicting the defect-proneness of Initial Modification Requests (IMRs) in the 5ESS network switch project from software change metrics with logistic regression
a The publication year is from the online publication date if available. The online publication date may be different from the bibliographic or the final publication date.
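To make the simple unsupervised models in P49 and P55 concrete, the following is a minimal sketch of a churn-based ranking in the spirit of CCUM/LT, written in Python with pandas. The DataFrame, its column names (la, ld, defective), and the toy values are assumptions for illustration only, not the exact setup of the original studies.

import pandas as pd

def rank_by_churn(changes):
    # Rank changes in ascending order of churn (lines added + deleted),
    # so that smaller changes are inspected first (effort-aware ranking).
    ranked = changes.copy()
    ranked["churn"] = ranked["la"] + ranked["ld"]
    return ranked.sort_values("churn", ascending=True)

def recall_at_effort(ranked, effort_ratio=0.2):
    # Recall of defect-inducing changes when the inspection budget is
    # effort_ratio of the total churn (a common effort-aware measure).
    budget = effort_ratio * ranked["churn"].sum()
    inspected = ranked[ranked["churn"].cumsum() <= budget]
    defects = ranked["defective"].sum()
    return inspected["defective"].sum() / defects if defects else 0.0

# Toy example; the "defective" labels would come from a labeling method such as SZZ.
changes = pd.DataFrame({"la": [10, 200, 3, 50],
                        "ld": [2, 30, 1, 5],
                        "defective": [1, 1, 0, 0]})
print(recall_at_effort(rank_by_churn(changes)))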

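Similarly, the effort-aware regression idea behind EALR (P63) can be sketched as follows. This is only an approximation under assumed column names and scikit-learn, not Kamei et al.’s exact model or metric set.

import pandas as pd
from sklearn.linear_model import LinearRegression

FEATURES = ["la", "ld", "nf"]   # hypothetical change metrics

def train_ealr(train):
    # EALR-style target: defect label divided by the review effort (churn),
    # so the regression favors defect-dense, low-effort changes.
    effort = (train["la"] + train["ld"]).clip(lower=1)
    return LinearRegression().fit(train[FEATURES], train["defective"] / effort)

def rank_changes(model, test):
    scored = test.copy()
    scored["score"] = model.predict(test[FEATURES])
    return scored.sort_values("score", ascending=False)  # inspect highest-scoring changes first

train = pd.DataFrame({"la": [5, 120, 8, 40], "ld": [1, 20, 2, 6],
                      "nf": [1, 7, 2, 3], "defective": [0, 1, 1, 0]})
print(rank_changes(train_ealr(train), train)[["score", "defective"]])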

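Finally, the change classification formulation of P65 (bag-of-words features with an SVM) can be illustrated with a small scikit-learn pipeline; the change texts and labels below are invented for illustration rather than taken from Kim et al.’s corpus.

from sklearn.feature_extraction.text import CountVectorizer
from sklearn.pipeline import make_pipeline
from sklearn.svm import LinearSVC

# Invented change texts (e.g., tokens drawn from diffs and commit messages).
change_texts = [
    "fix null pointer check in parser",
    "add logging to request handler",
    "refactor cache eviction loop boundary",
    "update copyright header",
]
labels = [1, 0, 1, 0]  # 1 = defect-inducing, 0 = clean

clf = make_pipeline(CountVectorizer(), LinearSVC())
clf.fit(change_texts, labels)
print(clf.predict(["fix off by one in loop boundary"]))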
REFERENCES
[1] Saleh Albahli. 2019. A deep ensemble learning method for effort-aware just-in-time defect prediction. Future Internet
11, 12 (2019), 246.
[2] Sousuke Amasaki, Hirohisa Aman, and Tomoyuki Yokogawa. 2021. A preliminary evaluation of CPDP approaches on
just-in-time software defect prediction. In Proceedings of the 47th Euromicro Conference on Software Engineering and
Advanced Applications (SEAA’21). IEEE, 279–286.
[3] Pasquale Ardimento, Lerina Aversano, Mario Luca Bernardi, Marta Cimitile, and Martina Iammarino. 2021. Just-in-
time software defect prediction using deep temporal convolutional networks. Neural Comput. Appl. (2021), 1–21.
[4] Lerina Aversano, Luigi Cerulo, and Concettina Del Grosso. 2007. Learning from bug-introducing changes to prevent
fault prone code. In Proceedings of the 9th International Workshop on Principles of Software Evolution: In Conjunction
with the 6th ESEC/FSE Joint Meeting. 19–26.
[5] Jacob G. Barnett, Charles K. Gathuru, Luke S. Soldano, and Shane McIntosh. 2016. The relationship between commit message detail and defect proneness in Java projects on GitHub. In Proceedings of the IEEE/ACM 13th Working Conference on Mining Software Repositories (MSR’16). IEEE, 496–499.
[6] Kwabena E. Bennin, Nauman bin Ali, Jürgen Börstler, and Xiao Yu. 2020. Revisiting the impact of concept drift on
just-in-time quality assurance. In Proceedings of the IEEE 20th International Conference on Software Quality, Reliability
and Security (QRS’20). IEEE, 53–59.
[7] Markus Borg, Oscar Svensson, Kristian Berg, and Daniel Hansson. 2019. SZZ unleashed: An open implementation
of the SZZ algorithm-featuring example usage in a study of just-in-time bug prediction for the Jenkins project. In
Proceedings of the 3rd ACM SIGSOFT International Workshop on Machine Learning Techniques for Software Quality
Evaluation. 7–12.
[8] George G. Cabral, Leandro L. Minku, Emad Shihab, and Suhaib Mujahid. 2019. Class imbalance evolution and verifi-
cation latency in just-in-time software defect prediction. In Proceedings of the IEEE/ACM 41st International Conference
on Software Engineering (ICSE’19). IEEE, 666–676.
[9] Cagatay Catal. 2011. Software fault prediction: A literature review and current trends. Expert Syst. Appl. 38, 4 (2011),
4626–4636.
[10] Gemma Catolino, Dario Di Nucci, and Filomena Ferrucci. 2019. Cross-project just-in-time bug prediction for mobile
apps: An empirical assessment. In Proceedings of the IEEE/ACM 6th International Conference on Mobile Software Engi-
neering and Systems (MOBILESoft’19). IEEE, 99–110.
[11] Xiang Chen, Yingquan Zhao, Qiuping Wang, and Zhidan Yuan. 2018. MULTI: Multi-objective effort-aware just-in-time
software defect prediction. Info. Softw. Technol. 93 (2018), 1–13.
[12] Ruifeng Duan, Haitao Xu, Yuanrui Fan, and Meng Yan. 2022. The impact of duplicate changes on just-in-time defect
prediction. IEEE Trans. Reliabil. 71, 3 (2022), 1294–1308. DOI:10.1109/TR.2021.3061618
[13] Beyza Eken, Rifat Atar, Sahra Sertalp, and Ayşe Tosun. 2019. Predicting defects with latent and semantic features
from commit logs in an industrial setting. In Proceedings of the 34th IEEE/ACM International Conference on Automated
Software Engineering Workshop (ASEW’19). IEEE, 98–105.
[14] Beyza Eken, Selda Tufan, Alper Tunaboylu, Tevfik Guler, Rifat Atar, and Ayse Tosun. 2021. Deployment of a change-
level software defect prediction solution into an industrial setting. J. Softw.: Evol. Process 33, 11 (2021), e2381.
[15] Yuanrui Fan, Xin Xia, Daniel Alencar da Costa, David Lo, Ahmed E. Hassan, and Shanping Li. 2019. The impact of
changes mislabeled by SZZ on just-in-time defect prediction. IEEE Trans. Softw. Eng. (2019).
[16] Norman E. Fenton and Martin Neil. 1999. A critique of software defect prediction models. IEEE Trans. Softw. Eng. 25,
5 (1999), 675–689.
[17] Wei Fu and Tim Menzies. 2017. Revisiting unsupervised learning for defect prediction. In Proceedings of the 11th Joint
Meeting on Foundations of Software Engineering. 72–83.
[18] Takafumi Fukushima, Yasutaka Kamei, Shane McIntosh, Kazuhiro Yamashita, and Naoyasu Ubayashi. 2014. An empir-
ical study of just-in-time defect prediction using cross-project models. In Proceedings of the 11th Working Conference
on Mining Software Repositories. 172–181.
[19] Jiri Gesi, Jiawei Li, and Iftekhar Ahmed. 2021. An empirical examination of the impact of bias on just-in-time de-
fect prediction. In Proceedings of the 15th ACM/IEEE International Symposium on Empirical Software Engineering and
Measurement (ESEM’21). 1–12.
[20] Trisha Greenhalgh and Richard Peacock. 2005. Effectiveness and efficiency of search methods in systematic reviews
of complex evidence: Audit of primary sources. BMJ 331, 7524 (2005), 1064–1065.
[21] Yuchen Guo, Martin Shepperd, and Ning Li. 2018. Bridging effort-aware prediction and strong classification: A just-
in-time software defect prediction study. In Proceedings of the 40th International Conference on Software Engineering:
Companion Proceedings. 325–326.
[22] Tracy Hall, Sarah Beecham, David Bowes, David Gray, and Steve Counsell. 2011. A systematic literature review on
fault prediction performance in software engineering. IEEE Trans. Softw. Eng. 38, 6 (2011), 1276–1304.


[23] Abram Hindle, Michael W. Godfrey, and Richard C. Holt. 2008. Reading beside the lines: Indentation as a proxy for
complexity metric. In Proceedings of the 16th IEEE International Conference on Program Comprehension. IEEE, 133–142.
[24] Thong Hoang, Hoa Khanh Dam, Yasutaka Kamei, David Lo, and Naoyasu Ubayashi. 2019. DeepJIT: An end-to-end
deep learning framework for just-in-time defect prediction. In Proceedings of the IEEE/ACM 16th International Confer-
ence on Mining Software Repositories (MSR’19). IEEE, 34–45.
[25] Thong Hoang, Hong Jin Kang, David Lo, and Julia Lawall. 2020. CC2Vec: Distributed representations of code changes.
In Proceedings of the ACM/IEEE 42nd International Conference on Software Engineering. 518–529.
[26] Seyedrebvar Hosseini, Burak Turhan, and Dimuthu Gunarathna. 2017. A systematic literature review and meta-
analysis on cross project defect prediction. IEEE Trans. Softw. Eng. 45, 2 (2017), 111–147.
[27] Qiao Huang, Xin Xia, and David Lo. 2017. Supervised vs unsupervised models: A holistic look at effort-aware just-
in-time defect prediction. In Proceedings of the IEEE International Conference on Software Maintenance and Evolution
(ICSME’17). IEEE, 159–170.
[28] Qiao Huang, Xin Xia, and David Lo. 2019. Revisiting supervised and unsupervised models for effort-aware just-in-time
defect prediction. Empir. Softw. Eng. 24, 5 (2019), 2823–2862.
[29] Hadi Jahanshahi, Dhanya Jothimani, Ayşe Başar, and Mucahit Cevik. 2019. Does chronology matter in JIT defect
prediction? A partial replication study. In Proceedings of the 15th International Conference on Predictive Models and
Data Analytics in Software Engineering. 90–99.
[30] Tian Jiang, Lin Tan, and Sunghun Kim. 2013. Personalized defect prediction. In Proceedings of the 28th IEEE/ACM
International Conference on Automated Software Engineering (ASE’13). IEEE, 279–289.
[31] Yasutaka Kamei, Takafumi Fukushima, Shane McIntosh, Kazuhiro Yamashita, Naoyasu Ubayashi, and Ahmed E.
Hassan. 2016. Studying just-in-time defect prediction using cross-project models. Empir. Softw. Eng. 21, 5 (2016),
2072–2106.
[32] Yasutaka Kamei and Emad Shihab. 2016. Defect prediction: Accomplishments and future challenges. In Proceedings
of the IEEE 23rd International Conference on Software Analysis, Evolution, and Reengineering (SANER’16), Vol. 5. IEEE,
33–45.
[33] Yasutaka Kamei, Emad Shihab, Bram Adams, Ahmed E. Hassan, Audris Mockus, Anand Sinha, and Naoyasu Ubayashi.
2012. A large-scale empirical study of just-in-time quality assurance. IEEE Trans. Softw. Eng. 39, 6 (2012), 757–773.
[34] Jonggu Kang, Duksan Ryu, and Jongmoon Baik. 2021. Predicting just-in-time software defects to reduce post-release
quality costs in the maritime industry. Softw.: Pract. Exper. 51, 4 (2021), 748–771. https://onlinelibrary.wiley.com/doi/abs/10.1002/spe.2927.
[35] Chaiyakarn Khanan, Worawit Luewichana, Krissakorn Pruktharathikoon, Jirayus Jiarpakdee, Chakkrit Tantithamtha-
vorn, Morakot Choetkiertikul, Chaiyong Ragkhitwetsagul, and Thanwadee Sunetnanta. 2020. JITBot: An explainable
just-in-time defect prediction bot. In Proceedings of the 35th IEEE/ACM International Conference on Automated Software
Engineering (ASE’20). IEEE, 1336–1339.
[36] Sunghun Kim, E. James Whitehead, and Yi Zhang. 2008. Classifying software changes: Clean or buggy? IEEE Trans.
Softw. Eng. 34, 2 (2008), 181–196.
[37] Barbara Kitchenham and Pearl Brereton. 2013. A systematic review of systematic review process research in software
engineering. Info. Softw. Technol. 55, 12 (2013), 2049–2075.
[38] Barbara Kitchenham and Stuart Charters. 2007. Guidelines for Performing Systematic Literature Reviews in Software
Engineering. Technical Report EBSE-2007-01. School of Computer Science and Mathematics, Keele University.
[39] Barbara Kitchenham, O. Pearl Brereton, David Budgen, Mark Turner, John Bailey, and Stephen Linkman. 2009. Sys-
tematic literature reviews in software engineering—A systematic literature review. Info. Softw. Technol. 51, 1 (2009),
7–15. https://doi.org/10.1016/j.infsof.2008.09.009
[40] Barbara Ann Kitchenham, David Budgen, and Pearl Brereton. 2015. Evidence-based Software Engineering and System-
atic Reviews. Vol. 4. CRC Press.
[41] Masanari Kondo, Daniel M. German, Osamu Mizuno, and Eun-Hye Choi. 2020. The impact of context metrics on
just-in-time defect prediction. Empir. Softw. Eng. 25, 1 (2020), 890–939.
[42] Ning Li, Martin Shepperd, and Yuchen Guo. 2020. A systematic review of unsupervised learning techniques for soft-
ware defect prediction. Info. Softw. Technol. (2020), 106287.
[43] Weiwei Li, Wenzhou Zhang, Xiuyi Jia, and Zhiqiu Huang. 2020. Effort-aware semi-supervised just-in-time defect
prediction. Info. Softw. Technol. 126 (2020), 106364.
[44] Zhiqiang Li, Xiao-Yuan Jing, and Xiaoke Zhu. 2018. Progress on approaches to software defect prediction. IET Softw.
12, 3 (2018), 161–175.
[45] Dayi Lin, Chakkrit Tantithamthavorn, and Ahmed E. Hassan. 2022. The impact of data merging on the
interpretation of cross-project just-in-time defect models. IEEE Trans. Softw. Eng. 48, 8 (2022), 2969–2986.
DOI:10.1109/TSE.2021.3073920


[46] Jinping Liu, Yuming Zhou, Yibiao Yang, Hongmin Lu, and Baowen Xu. 2017. Code churn: A neglected metric in effort-
aware just-in-time defect prediction. In Proceedings of the ACM/IEEE International Symposium on Empirical Software
Engineering and Measurement (ESEM’17). IEEE, 11–19.
[47] Ruchika Malhotra. 2015. A systematic review of machine learning techniques for software fault prediction. Appl. Soft
Comput. 27 (2015), 504–518.
[48] Shane McIntosh and Yasutaka Kamei. 2017. Are fix-inducing changes a moving target? A longitudinal case study of
just-in-time defect prediction. IEEE Trans. Softw. Eng. 44, 5 (2017), 412–428.
[49] Audris Mockus and David M. Weiss. 2000. Predicting risk of software changes. Bell Labs Tech. J. 5, 2 (2000), 169–180.
[50] Keita Mori and Osamu Mizuno. 2015. An implementation of just-in-time fault-prone prediction technique using text
classifier. In Proceedings of the IEEE 39th Annual Computer Software and Applications Conference, Vol. 3. IEEE, 609–612.
[51] Mathieu Nayrolles and Abdelwahab Hamou-Lhadj. 2018. CLEVER: Combining code metrics with clone detection
for just-in-time fault prevention and resolution in large industrial projects. In Proceedings of the 15th International
Conference on Mining Software Repositories. 153–164.
[52] Luca Pascarella, Fabio Palomba, and Alberto Bacchelli. 2019. Fine-grained just-in-time defect prediction. J. Syst. Softw.
150 (2019), 22–36.
[53] Chanathip Pornprasit and Chakkrit Tantithamthavorn. 2021. JITLine: A simpler, better, faster, finer-grained just-in-
time defect prediction. arXiv preprint arXiv:2103.07068.
[54] Lei Qiao and Yan Wang. 2019. Effort-aware and just-in-time defect prediction with neural network. PloS One 14, 2
(2019), e0211359.
[55] Sophia Quach, Maxime Lamothe, Bram Adams, Yasutaka Kamei, and Weiyi Shang. 2021. Evaluating the impact of
falsely detected performance bug-inducing changes in JIT models. Empir. Softw. Eng. 26, 5 (2021), 1–32.
[56] Danijel Radjenović, Marjan Heričko, Richard Torkar, and Aleš Živkovič. 2013. Software fault prediction metrics: A
systematic literature review. Info. Softw. Technol. 55, 8 (2013), 1397–1418.
[57] Foyzur Rahman and Premkumar Devanbu. 2013. How, and why, process metrics are better. In Proceedings of the 35th
International Conference on Software Engineering (ICSE’13). IEEE, 432–441.
[58] Gema Rodriguez-Perez, Meiyappan Nagappan, and Gregorio Robles. 2022. Watch out for extrinsic bugs! A case study
of their impact in just-in-time bug prediction models on the OpenStack project. IEEE Trans. Softw. Eng. 48, 4 (2022),
1400–1416. DOI:10.1109/TSE.2020.3021380
[59] Christoffer Rosen, Ben Grawi, and Emad Shihab. 2015. Commit Guru: Analytics and risk prediction of software com-
mits. In Proceedings of the 10th Joint Meeting on Foundations of Software Engineering. 966–969.
[60] Shivkumar Shivaji, E. James Whitehead, Ram Akella, and Sunghun Kim. 2012. Reducing features to improve code
change-based bug prediction. IEEE Trans. Softw. Eng. 39, 4 (2012), 552–569.
[61] Sadia Tabassum, Leandro L. Minku, Danyi Feng, George G. Cabral, and Liyan Song. 2020. An investigation of cross-
project learning in online just-in-time software defect prediction. In Proceedings of the IEEE/ACM 42nd International
Conference on Software Engineering (ICSE’20). IEEE, 554–565.
[62] Ming Tan, Lin Tan, Sashank Dara, and Caleb Mayeux. 2015. Online defect prediction for imbalanced data. In Proceed-
ings of the IEEE/ACM 37th IEEE International Conference on Software Engineering, Vol. 2. IEEE, 99–108.
[63] Alexander Tarvo, Nachiappan Nagappan, Thomas Zimmermann, Thirumalesh Bhat, and Jacek Czerwonka. 2013. Pre-
dicting risk of pre-release code changes with checkinmentor. In Proceedings of the IEEE 24th International Symposium
on Software Reliability Engineering (ISSRE’13). IEEE, 128–137.
[64] Hailemelekot Demtse Tessema and Surafel Lemma Abebe. 2021. Enhancing just-in-time defect prediction using
change request-based metrics. In Proceedings of the IEEE International Conference on Software Analysis, Evolution and
Reengineering (SANER’21). IEEE, 511–515.
[65] Yuli Tian, Ning Li, Jeff Tian, and Wei Zheng. 2020. How well just-in-time defect prediction techniques enhance soft-
ware reliability? In Proceedings of the IEEE 20th International Conference on Software Quality, Reliability and Security
(QRS’20). IEEE, 212–221.
[66] Parastou Tourani and Bram Adams. 2016. The impact of human discussions on just-in-time quality assurance: An
empirical study on OpenStack and Eclipse. In Proceedings of the IEEE 23rd International Conference on Software Analysis,
Evolution, and Reengineering (SANER’16), Vol. 1. IEEE, 189–200.
[67] Alexander Trautsch, Steffen Herbold, and Jens Grabowski. 2020. Static source code metrics and static analysis warn-
ings for fine-grained just-in-time defect prediction. In Proceedings of the IEEE International Conference on Software
Maintenance and Evolution (ICSME’20). IEEE, 127–138.
[68] Zhou Xu, Kunsong Zhao, Tao Zhang, Chunlei Fu, Meng Yan, Zhiwen Xie, Xiaohong Zhang, and Gemma Catolino.
2022. Effort-aware just-in-time bug prediction for mobile apps via cross-triplet deep feature embedding. IEEE Trans.
Reliabil. 71, 1 (2022), 204–220. DOI:10.1109/TR.2021.3066170
[69] Meng Yan, Xin Xia, Yuanrui Fan, Ahmed E. Hassan, David Lo, and Shanping Li. 2020. Just-in-time defect identification
and localization: A two-phase framework. IEEE Trans. Softw. Eng. (2020).


[70] Meng Yan, Xin Xia, Yuanrui Fan, David Lo, Ahmed E. Hassan, and Xindong Zhang. 2020. Effort-aware just-in-time
defect identification in practice: A case study at Alibaba. In Proceedings of the 28th ACM Joint Meeting on European
Software Engineering Conference and Symposium on the Foundations of Software Engineering. 1308–1319.
[71] Limin Yang, Xiangxue Li, and Yu Yu. 2017. Vuldigger: A just-in-time and cost-aware tool for digging vulnerability-
contributing changes. In Proceedings of the IEEE Global Communications Conference (GLOBECOM’17). IEEE, 1–7.
[72] Xinli Yang, David Lo, Xin Xia, and Jianling Sun. 2017. TLEL: A two-layer ensemble learning approach for just-in-time
defect prediction. Info. Softw. Technol. 87 (2017), 206–220.
[73] Xinli Yang, David Lo, Xin Xia, Yun Zhang, and Jianling Sun. 2015. Deep learning for just-in-time defect prediction. In
Proceedings of the IEEE International Conference on Software Quality, Reliability and Security. IEEE, 17–26.
[74] Xingguang Yang, Huiqun Yu, Guisheng Fan, Kai Shi, and Liqiong Chen. 2019. Local versus global models for just-in-
time software defect prediction. Sci. Program. 2019 (2019).
[75] Xingguang Yang, Huiqun Yu, Guisheng Fan, and Kang Yang. 2020. A differential evolution-based approach for effort-
aware just-in-time software defect prediction. In Proceedings of the 1st ACM SIGSOFT International Workshop on Rep-
resentation Learning for Software Engineering and Program Languages. 13–16.
[76] Xingguang Yang, Huiqun Yu, Guisheng Fan, Kang Yang, and Kai Shi. 2019. An empirical study on progressive sampling
for just-in-time software defect prediction. In Proceedings of the International Workshop on Quantitative Approaches to
Software Quality in conjunction with the Asia-Pacific Software Engineering Conference (QuASoQ@APSEC’19). 12–18.
[77] Yibiao Yang, Yuming Zhou, Jinping Liu, Yangyang Zhao, Hongmin Lu, Lei Xu, Baowen Xu, and Hareton Leung. 2016.
Effort-aware just-in-time defect prediction: Simple unsupervised models could be better than supervised models. In
Proceedings of the 24th ACM SIGSOFT International Symposium on Foundations of Software Engineering. 157–168.
[78] Steven Young, Tamer Abdou, and Ayse Bener. 2018. A replication study: Just-in-time defect prediction with ensemble
learning. In Proceedings of the 6th International Workshop on Realizing Artificial Intelligence Synergies in Software
Engineering. 42–47.
[79] Abubakar Zakari, Sai Peck Lee, Rui Abreu, Babiker Hussien Ahmed, and Rasheed Abubakar Rasheed. 2020. Multiple
fault localization of software programs: A systematic literature review. Info. Softw. Technol. 124, 106312 (2020), 1–20.
https://www.sciencedirect.com/science/article/pii/S0950584920300641.
[80] Zhengran Zeng, Yuqun Zhang, Haotian Zhang, and Lingming Zhang. 2021. Deep just-in-time defect prediction:
How far are we? In Proceedings of the 30th ACM SIGSOFT International Symposium on Software Testing and Analysis.
427–438.
[81] Wenzhou Zhang, Weiwei Li, and Xiuyi Jia. 2019. Effort-aware tri-training for semi-supervised just-in-time defect
prediction. In Proceedings of the Pacific-Asia Conference on Knowledge Discovery and Data Mining. Springer, 293–304.
[82] Kunsong Zhao, Zhou Xu, Meng Yan, Yutian Tang, Ming Fan, and Gemma Catolino. 2021. Just-in-time defect prediction
for Android apps via imbalanced deep learning model. In Proceedings of the 36th Annual ACM Symposium on Applied
Computing. 1447–1454.
[83] Kunsong Zhao, Zhou Xu, Meng Yan, Lei Xue, Wei Li, and Gemma Catolino. 2021. A compositional model for effort-
aware just-in-time defect prediction on android apps. IET Softw. (2021).
[84] Kunsong Zhao, Zhou Xu, Tao Zhang, Yutian Tang, and Meng Yan. 2021. Simplified deep forest model-based just-in-time defect prediction for Android mobile apps. IEEE Trans. Reliabil. (2021), 1–12. https://doi.org/10.1109/TR.2021.3060937
[85] Kun Zhu, Nana Zhang, Shi Ying, and Dandan Zhu. 2020. Within-project and cross-project just-in-time defect predic-
tion based on denoising autoencoder and convolutional neural network. IET Softw. 14, 3 (2020), 185–195.
[86] Xiaoyan Zhu, Binbin Niu, E. James Whitehead Jr., and Zhongbin Sun. 2018. An empirical study of software change
classification with imbalance data-handling methods. Softw.: Pract. Exper. 48, 11 (2018), 1968–1999.
