A Longitudinal Study On The Effect of Patches On Code Coverage and Software System Maintainability
1 Introduction
Meir Lehman, a prominent figure in the field of software engineering, argued that
a software system’s enduring utility and success hinge on its ability to evolve
continuously; failure to do so leads to a decline in relevance and quality [13,14].
Lehman’s laws of evolution posit that a software system’s functional capabilities
must evolve to maintain user satisfaction, inevitably resulting in an increase in
system size and complexity over time with a concurrent decline in system quality
unless actively monitored and addressed.
Central to the concept of software evolution is the source code, the centre of a
multifaceted process that requires the co-evolution of various artifacts, including
unit tests (also known as developer tests). These tests, proven effective in identi-
fying bugs [23], play a pivotal role in ensuring the continued proper functioning
of a system. Code coverage is one measure of how thorough a test suite is; it
refers to the proportion of executable source code lines that are executed when
the test suite is run.
There is a delicate relationship between the source code and its tests because
as the software evolves even minor code changes and refactoring efforts can dis-
rupt existing tests [20,21] and significantly alter code coverage [6]. This under-
scores the need for the continued maintenance of tests as the source code, itself,
evolves.
Software typically evolves through incremental changes or modifications to
source code repositories by means of commits. In this paper, these modifications
are termed “patches”. A modification refers to the alteration, deletion, or addi-
tion of one or more lines within one or more files. A patch can modify source
code files (production or test) or non-source files such as a README file. Patch
testing specifically evaluates the testing of modified code, focusing on the extent
to which the altered code is tested and covered. For the remainder of the study,
the terms patch and commit are used interchangeably.
Previous research efforts have extensively investigated the co-evolution of test
and source code, however, this has been done through mining multiple, stable
release versions of systems [24]. While such an approach provides information
at stable points in a system’s development, it does not provide insights into
the development process at day-to-day level. On the other hand, studies using
more fine-grained empirical data have predominantly aimed to investigate the
synchronous or sequential alteration of production code and test code [15,17,25,
26]. Limited work has been done to understand how test coverage is affected by
incremental changes. To date, only two studies, by Marinescu et al. [16] and
Hilton et al. [9], have investigated how incremental changes affect test coverage.
As hosting providers have become more sophisticated over time in terms of
the services they offer, new opportunities have arisen to study evolution of source
code and accompanying tests. GitHub [7], for example, now allows development
teams to create flexible build pipelines which can be used to take the source code
in a repository through a number of different stages, including unit testing, and
ultimately deploy it into production. GitHub also affords tight integration with
third-party code coverage services. These services enable the development team
to graphically visualise and track their coverage statistics. GitHub’s built-in
build pipeline functionality, together with the public APIs offered by the third-
party coverage services that it integrates with, has both increased the number of
open-source projects which generate coverage statistics and made these statistics
accessible.
In this paper, the work of Marinescu et al. [16] and Hilton et al. [9] is extended
by considering a different dataset, and specifically making use of code coverage
service data, to investigate the maintainability of incremental changes (patches)
and the relationship between patch testing and patch maintainability.
2 Related Work
Marinescu et al. [16] pioneered the exploration of patch coverage in small incre-
mental changes and subsequently established a formal definition for this concept.
To investigate the co-evolution of test and production code, and patch coverage,
the authors developed the COVRIG tool and conducted a study involving six
open-source software (OSS) projects written in C/C++. The authors selected
250 revisions per project, iteratively checking out each revision, compiling and
collecting coverage information from coverage reports. Their findings revealed
that patches were either fully covered or not covered at all, with engineers sel-
dom opting for partial patch coverage. Additionally, the study observed that
testing appeared to be a phased activity for five out of six projects, occurring
intermittently after extended periods of development. Hilton et al. [9] expanded
upon the research conducted by Marinescu et al., introducing an investigation
into how covered lines transition between patches and the overall impact of patch
coverage. Hilton et al. employed a mixed approach to collecting coverage
information, using the Coveralls [18] coverage service for some projects and
manually compiling code and collecting coverage reports for others. They chose
47 projects spanning different programming languages and utilised 250 revisions
for projects hosted on Coveralls. Hilton et al. report results that partially
contradict those of Marinescu et al., stating that patch coverage is not bimodal;
rather, it varies from patch to patch with no discernible pattern. Notably,
Hilton et al. identified an
intriguing phenomenon termed “flipping”, where some patches led to changes in
the coverage status of lines, transitioning from previously covered to uncovered
in the new modification.
Zaidman et al. [25,26] studied whether production and test code co-evolve
synchronously at a commit level using two Java projects. The authors observed
two patterns of evolution: synchronous, where production and test code are
changed together, and phased, where production and test code are changed
separately. Levin and Yehudai [15] analysed sixty-one projects with a total of
240,000 commits to examine the co-evolution of test and production code
maintenance. Their results showed that, in the majority of cases, production
code changes, in particular bug fixes, happen without accompanying test code
modifications. Marsavina et al. [17] mined five open-source projects and used
association rules to identify co-evolution patterns; their results revealed six
distinct patterns.
3 Research Questions
1. What is the distribution of patch sizes? This inquiry aims to assess the total
number of lines affected per patch (i.e. the magnitude of each patch), potentially
unveiling insights into the incremental changes that occur as the system and
tests evolve.
2. What is the distribution of patch coverage across the revisions of the sys-
tem? Analysing patch testing activities could provide insights into testing
practices within an open-source environment and help understand why the
system coverage is as it appears.
3. How does individual patch coverage affect overall system coverage? A reason-
able hypothesis is that higher patch coverage implies improved system coverage
and vice versa; this question aims to validate that hypothesis.
4. How maintainable is a typical patch, and how does patch maintainability vary
across revisions? Maintainability is measured using the Software Improve-
ment Group’s (SIG) maintainability model [12]. This question is investigated
because along with patch testing, it is important to understand how main-
tainability varies at the level of incremental changes.
Aside from the addition of a novel research question, this study adds value
by making use of an almost entirely different dataset to the studies that it
replicates. Lastly, and importantly, the methodology used here is different in
that this work exclusively makes use of commercial coverage services which offer
free coverage reporting for open-source projects. Using existing coverage services
greatly simplifies the calculation of coverage statistics, enabling a greater breadth
of projects to be covered or a more in-depth analysis of individual projects to
be conducted, by considering a greater number of project builds.
4 Methodology
Patch and system code coverage are determined by compiling and executing a
project’s test suite along with the production code being tested, and recording
which production code statements have been hit or missed. Downloading and
compiling projects can pose significant challenges due to project dependency
mismatches and resource requirements, sometimes leading to compile failures
for different revisions [25]. Marinescu et al. [16] attempted to address this using
virtualisation, but acknowledged its continued resource intensity. Hilton et al.
[9] adopted an approach in which they manually ran the test suites for some
projects but used the Coveralls code coverage service for others.
All the projects that were selected for this study are hosted on GitHub [7].
GitHub serves as a central data hub for a huge number of open-source projects
due to its openness and licensing model, in contrast to proprietary source control
management systems [19].
To identify possible projects, the sampling strategy employed by Pinto et al.
[22] was adopted (a sketch of one such candidate query is given after the list
below). For a repository to be considered, it needed to be:
1. popular - using number of stars, forks, and contributors as measure of popu-
larity,
2. actively maintained, and
3. stewarded by a well-respected open-source organisation, such as the Apache
Foundation, or private company, such as Microsoft.
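As a concrete illustration of this short-listing step, the following Python sketch queries the GitHub search API for popular, recently pushed repositories. The star threshold, cut-off date, and page size are hypothetical placeholders, not the values used in the study.

import requests

GITHUB_SEARCH = "https://api.github.com/search/repositories"

def candidate_repos(min_stars=1000, pushed_after="2023-01-01"):
    # Popularity and activity filters; the thresholds are illustrative.
    query = f"stars:>={min_stars} pushed:>={pushed_after}"
    resp = requests.get(GITHUB_SEARCH,
                        params={"q": query, "sort": "stars", "per_page": 50})
    resp.raise_for_status()
    return [item["full_name"] for item in resp.json()["items"]]

print(candidate_repos()[:5])

Stewardship by a well-respected organisation (criterion 3) was still checked manually, as it is not directly expressible as a search filter.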
Repositories meeting the above criteria were then manually inspected to iden-
tify README files containing coverage badges from either Codecov or Coveralls.
These badges indicate that the project in question was submitting data to the
respective coverage service (with the badge giving the project's overall code
coverage). Projects with a diverse range of coverage, and meeting the additional
criterion of having a minimum of one hundred builds over a two-year period,
were finally selected. This last criterion ensured that each of the chosen projects
had sufficient historical data for longitudinal analysis to be conducted.
It is noted that none of the projects from the Marinescu et al. study utilised
a coverage service; hence, they were excluded. Five projects appearing in the
Hilton et al. study were selected; however, a larger number of revisions were
available for analysis in this investigation.
Sudden Coverage Drops - The most common data problem observed was
either build failures or multi-triggered builds. These issues often lead to the
coverage services' APIs failing to retrieve data from a build, resulting in coverage
spikes and/or incorrectly reported coverage. In an exchange with the Codecov
team [10], it was revealed that build failures can sometimes affect how the API,
and subsequently the user interface, retrieves and displays data from a build.
For example, the apache/libcloud build details retrieved from the Coveralls API
in Listing 1 show two consecutive builds: the first returns correct data while
the second returns 0% coverage. Such a response is highly unlikely and is flagged
as a corrupted build in order to maintain the data integrity of the study. Given
these improbable readings, a maximum drop of 30% in coverage is chosen as the
threshold for filtering out noise in the data, and an exponentially-weighted
moving average (EWMA) is used to smooth out such outliers. Take the example
of the Apache Gobblin project in Fig. 1 (top): there are numerous points where
the coverage is picked up as zero. Upon further investigation, it was discovered
that these commits contained minimal changes, indicating that they could not
have been the sole cause of the significant decrease in coverage from 45% to 0%.
A 30-point EWMA is applied selectively in order to preserve genuine coverage
points: it is calculated only when a coverage value has deviated by 30% or more
from the previous value. The noise reduction that is achieved by adopting this
approach can be seen in the bottom graph of Fig. 1.
1 [ {"repo_name": "apache/libcloud",
2 "branch": "trunk",
3 "commit_sha": "af264fecea1adc8ded707094854a3c64790c3285",
4 "coverage_change": 36.8,
5 "covered_percent": 36.791,
6 "covered_lines": 18742,
7 "missed_lines": 32199,
8 "relevant_lines": 50941,
9 "covered_branches": 0,
10 },
11 {"repo_name": "apache/libcloud",
12 "branch": null,
13 "commit_sha": null,
14 "coverage_change": 0.0,
15 "covered_percent": 0.0,
16 "covered_lines": 0,
17 "missed_lines": 0,
18 "relevant_lines": 0,
19 "covered_branches": 0,
20 },
21 ... # excluded for brevity
22 ]
Fig. 1. Apache/Gobblin before data cleaning (top graph) and after data cleaning (bot-
tom graph) with an EWMA
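The cleaning procedure described above can be sketched as follows, assuming a project's coverage history has already been retrieved as a pandas series. Interpreting the 30% threshold as percentage points, and the function name itself, are assumptions made for illustration.

import pandas as pd

def clean_coverage(coverage, max_drop=30.0, span=30):
    # Smooth the whole series with a 30-point EWMA, but substitute the
    # smoothed value only where a build deviates from its predecessor
    # by the maximum allowed drop or more.
    smoothed = coverage.ewm(span=span).mean()
    deviated = (coverage - coverage.shift(1)).abs() >= max_drop
    cleaned = coverage.copy()
    cleaned[deviated] = smoothed[deviated]
    return cleaned

# A history in which one build is incorrectly reported as 0% coverage:
history = pd.Series([45.0, 45.2, 0.0, 45.1, 45.3])
print(clean_coverage(history).round(1).tolist())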
Patch coverage is the proportion of executable lines added or modified by a
patch that are executed by the test suite of the revision that adds the patch. In
other words, unlike system coverage, patch coverage is a far more granular
measure focused on newly added code changes [1,11].
Executed lines are lines that are invoked during a test execution whilst executable
lines are all “source lines” that have the potential to be invoked, yet may not
necessarily be invoked. This definition excludes statements like comments as
these are ignored by both interpreters and compilers.
To compute patch coverage for a given commit, the modified files, and the
corresponding modified lines, are first identified. Then for each modified line,
the line’s coverage status is extracted from the code coverage service. The line’s
coverage status defines whether the line was executed or not. Coveralls represents
line coverage status of executable lines as zero (the line was not executed during
the test run) or a positive integer N, where N is the number of times the line was
invoked during the test run. Codecov represents the line coverage status with a
zero or one. Zero denotes lines that are not executed while one represents lines
that are run by the tests. Codecov also has the concept of half-covered lines,
termed partial coverage. Partially covered lines are branches with one or more
omitted execution paths, and these are represented with a line coverage status of
two. In this context, partially covered lines are not counted as executed (covered)
lines and are added to the count of executable lines.
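A minimal sketch of the resulting per-patch computation is given below, using the Codecov status encoding described above (0 = missed, 1 = hit, 2 = partial); the function name is hypothetical, and Coveralls hit counts N >= 1 would first be mapped to 1.

def patch_coverage(line_statuses):
    # Statuses of the executable lines modified by the patch:
    # 0 = missed, 1 = hit, 2 = partially covered. Partially covered
    # lines count as executable but not as executed.
    executable = len(line_statuses)
    if executable == 0:
        return None  # the patch touched no executable lines
    executed = sum(1 for status in line_statuses if status == 1)
    return 100.0 * executed / executable

# Three hit lines, one missed line, and one partially covered branch:
print(patch_coverage([1, 1, 1, 0, 2]))  # 60.0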
Fig. 2. Relation between the SIG Maintainability Model source code measures (cyclo-
matic complexity per unit, number of parameters per unit interface) and the ISO/IEC
25010 quality characteristics (stability, testability)
Collecting and computing the above metrics required multiple RESTful API calls
to the two coverage services. In addition to the API calls, the computation of
the maintainability metric and the patch coverage required cloning each project
repository in order to iterate over the commit history. Performing all these steps
on a local machine can easily become cumbersome and inefficient due to the
processing power required. Thus, a virtual machine and containers were utilised
to collect and compute all the metrics. A virtual machine hosted on the Microsoft
Azure cloud and running Ubuntu 20.04 served as the host environment. The
Docker containerisation engine [5] was chosen for its popularity and efficiency.
To ensure data collection from the coverage services was conducted in parallel,
two containers were deployed with one dedicated to collecting and computing
metrics using Coveralls while the other used Codecov.
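As an indication of what this collection step looks like, the sketch below pages through a project's build history on Coveralls. The endpoint and the shape of the response (a builds list containing the fields shown in Listing 1) are assumptions based on the service's public JSON API at the time of writing.

import requests

def fetch_builds(owner, repo, pages=5):
    # Collect successive pages of build records for one repository.
    builds = []
    for page in range(1, pages + 1):
        url = f"https://coveralls.io/github/{owner}/{repo}.json"
        resp = requests.get(url, params={"page": page})
        resp.raise_for_status()
        builds.extend(resp.json().get("builds", []))
    return builds

history = fetch_builds("apache", "libcloud")
print(len(history), history[0]["covered_percent"])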
The study data consists of projects from both Codecov (Table 1) and Coveralls
(Table 2). In total, 50,666 revisions across 46 projects in 10 programming lan-
guages were analysed. Tables 1 and 2 present the projects, the time in months
between the first and last revision (build) dates, the number of revisions during
this period, and the system coverage as measured at the first and last revisions.
5 Results
Hilton et al. [9] argue that smaller patches are easier for engineers to understand,
while larger patches may necessitate external strategies for comprehension due to
their potential complexity. To investigate patch size, patches touching only non-
source files are excluded from the analysis. This exclusion removed projects
CL{09,18} entirely, as their whole build history consists of non-source-file
patches. The distribution of patch sizes across revisions, illustrated by the box-
and-whisker plot in Fig. 3, indicates that the typical patch size is around 10
lines. It can also be observed from the upper quartile of the distribution that
most patches contain fewer than 1000 lines of code, suggesting that commits
rarely exceed this threshold. These findings align with the work of Hilton et al.
but differ slightly from Marinescu et al., who reported a lower number of lines
changed, ranging from 4 to 7.
A number of outliers are observed in this distribution, especially for project
CV10, which has a patch of close to 100,000 lines of code. Upon further exami-
nation, this patch was revealed to be a merge patch that added just over 62,000
lines of code across 3,000 files. Further examination of the other outliers
revealed that most of the commits of over 1,000 LoC are actually merge commits.
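Although the study's own tooling is not reproduced here, the patch-size measurement can be approximated with PyDriller, which exposes the number of changed lines and a merge flag per commit; the repository path below is a placeholder.

from pydriller import Repository

sizes, merge_sizes = [], []
for commit in Repository("path/to/repo").traverse_commits():
    # commit.lines is the total of added plus deleted lines in the commit
    if commit.merge:
        merge_sizes.append(commit.lines)
    else:
        sizes.append(commit.lines)

large = [s for s in sizes if s > 1000]
print(f"non-merge patches over 1000 LoC: {len(large)} of {len(sizes)}")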
Fig. 3. Distribution of patch size: whisker/box plots (one per project) representing the
minimum, median and maximum patch size of patches modifying source code. The
(log scale) x-axis represents the size of a patch in lines of code.
Fig. 4. Distribution of patch coverage for patches touching source files (one bar per
project, x-axis from 0% to 100%). Each colour represents a coverage range and the
size of the bar is the percentage of the patches whose coverage lies within the range.
If the difference in overall coverage from b1 (where b1 is the earlier build) to
b2 (where b2 is the later build) is less than zero, the patch is said to have
decreased (or negatively impacted) coverage; if the difference is greater than
zero, the patch is said to have increased (or positively impacted) coverage; and
if the difference is zero, the patch is classified as having no impact on overall
coverage.
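This classification reduces to a sign test on the coverage delta between consecutive builds, as the hypothetical helper below illustrates.

def patch_impact(coverage_b1, coverage_b2):
    # b1 is the earlier build, b2 the later build that includes the patch.
    delta = coverage_b2 - coverage_b1
    if delta < 0:
        return "decreased"
    if delta > 0:
        return "increased"
    return "no impact"

print(patch_impact(45.2, 45.0))  # decreased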
The majority of patches depicted in Fig. 5, particularly those altering non-
source files, show significant positive and negative impacts. These findings align
with those of Hilton et al. [9], who asserted that patches involving non-source
files can influence patch coverage. However, it is noted that such magnitudes
could also be influenced by the absence of a one-to-one correspondence between
commits and builds. Intermediate commits preceding a build-triggering com-
mit may not be fully considered, potentially leading to fluctuations in coverage
that are not accurately attributed. Conversely, projects CV{15,16,20,23,24} and
CL19 demonstrate behaviour deemed “plausible”, wherein patches modifying
non-source files exhibit less influence on system coverage. Notable in both modi-
fication types is the prevalence of patches with no impact/change, which aligns
with the distribution of patch coverage. Interestingly, projects CL{09,18}, hav-
ing build histories in which only non-source code files are modified, have some
patches which increase and decrease system coverage. It is suspected that this
phenomenon can be attributed to intermediate commits that did not initiate a
build, or to the modification of build scripts, such as Makefiles, which could
cause coverage changes.
Fig. 5. Distribution of patch impact (one bar per project, x-axis from 0% to 100%).
Colour represents the impact range and size represents the percentage of patches whose
impact lies within the range.
To further assess the relationship between patch coverage and overall coverage,
Kendall's coefficient is computed and Cohen's interpretation of correlation
strength is employed, revealing a correlation of 0.239, classified by Cohen as
weak [2].
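This correlation step can be reproduced with SciPy; the per-patch values below are fabricated stand-ins purely to show the call, not data from the study.

from scipy.stats import kendalltau

# Hypothetical per-patch pairs: patch coverage (%) and the overall
# system coverage (%) of the build that includes the patch.
patch_cov = [100.0, 0.0, 75.0, 50.0, 100.0, 0.0]
system_cov = [45.4, 45.2, 45.3, 45.3, 45.6, 45.5]

tau, p_value = kendalltau(patch_cov, system_cov)
print(f"Kendall's tau = {tau:.3f} (p = {p_value:.3f})")
# Under Cohen's guidelines [2], a tau of around 0.2 is a weak correlation.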
According to Di Biase et al. [4], the Delta Maintainability Model (DMM) mea-
sures changes to source code as the “ratio of good changes over the sum of good
and bad changes”. The resulting value of this ratio therefore lies between 0 and
1, where zero indicates a “bad” (unmaintainable) change and one indicates a
“good” (maintainable) change.
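Stated directly in code, the ratio is as follows. This is a minimal sketch: in practice the DMM also defines which changed lines count as low-risk, and ready-made implementations such as PyDriller's per-commit DMM properties can be used instead.

def dmm_score(good_changes, bad_changes):
    # Proportion of low-risk ("good") changed lines among all changed
    # lines of a patch [4]; 0 is least maintainable, 1 most maintainable.
    total = good_changes + bad_changes
    return good_changes / total if total else 0.0

# A patch where 30 changed lines are low-risk and 10 are high-risk:
print(dmm_score(30, 10))  # 0.75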
Figure 6 illustrates the distribution of DMM scores per patch. The DMM bins
are split into 0, (0–0.25], (0.25–0.50], (0.50–0.75], (0.75–1.00) and 1.00. As in the
case of patch coverage, the bins 0 and 1.00 are specifically chosen to separate
out patches with DMM scores at the extremes, indicating a poor patch and a
good patch (from a maintainability perspective), respectively. As the DMM
score is specifically designed for source code, any patches modifying non-source
files would inherently have a DMM score of zero, and these patches are therefore
excluded. Note that the majority of patches have DMM scores either in the
(0.50–0.75] bin or at 0, which means patches are either “somewhat” maintainable
or not maintainable at all. Kendall's coefficient is also computed to assess the
influence of testing practices on maintainability. The relationship between patch
coverage and DMM is found to be very weak, with a τ value of 0.160.
Fig. 6. Distribution of DMM scores (one bar per project, x-axis from 0% to 100%).
Each colour represents a DMM range and the size of the bar is the percentage of the
patches whose DMM score lies within the range.
6 Conclusion
A longitudinal study examining testing and maintainability dynamics for small
incremental changes is presented. By mining over 50,000 build histories from
two popular coverage services, Codecov and Coveralls, it is found that patches
are generally either fully covered or entirely uncovered, with rare instances of
partial coverage. Patch coverage shows a weak correlation with system coverage,
indicating that this metric alone cannot be used to infer the trajectory of system
coverage. For development teams, this implies that patch metrics must be used
in conjunction with system-level metrics to gain a thorough understanding of
whether system coverage is improving. Patch coverage
is also found to have a weak correlation with patch maintainability, suggesting
that patch testing may not significantly enhance maintainability. This study con-
tributes to research on fine-grained changes to software systems by replicating
two existing studies on an expanded dataset with new projects and by analysing
these projects far more comprehensively through processing many more builds.
Additionally, the value of leveraging coverage service data for research purposes
is emphasised, albeit with a need for data scrutiny and the introduction of data
cleaning processes.
References
1. Ben, S.: Patch coverage. https://seriousben.com/posts/2022-02-patch-coverage/.
Accessed Feb 2022
2. Cohen, J.: Statistical Power Analysis for the Behavioral Sciences. Lawrence Erl-
baum Associates (LEA), 2nd edn. (1988)
3. Coveralls: API returning same commit hash yet different coverage percentage and
date (2023). https://github.com/lemurheavy/coveralls-public/issues/1648. (Per-
sonal Communication)
4. Di Biase, M., Rastogi, A., Bruntink, M., van Deursen, A.: The delta maintain-
ability model: Measuring maintainability of fine-grained code changes. In: 2019
IEEE/ACM International Conference on Technical Debt (TechDebt), pp. 113–122
(2019). https://doi.org/10.1109/TechDebt.2019.00030
5. Docker: Make better, secure software from the start. https://www.docker.com/.
Accessed Aug 2023
6. Elbaum, S., Gable, D., Rothermel, G.: The impact of software evolution on code
coverage information. In: Proceedings IEEE International Conference on Software
Maintenance. ICSM 2001, pp. 170–179 (2001). https://doi.org/10.1109/ICSM.
2001.972727
7. GitHub: Let’s build from here: The world’s leading AI-powered developer platform.
https://github.com/. Accessed Nov 2023
8. Heitlager, I., Kuipers, T., Visser, J.: A practical model for measuring maintainabil-
ity. In: 6th International Conference on the Quality of Information and Commu-
nications Technology (QUATIC 2007), pp. 30–39 (2007). https://doi.org/10.1109/
QUATIC.2007.8
9. Hilton, M., Bell, J., Marinov, D.: A large-scale study of test coverage evolution.
In: 2018 33rd IEEE/ACM International Conference on Automated Software Engi-
neering (ASE), pp. 53–63 (2018). https://doi.org/10.1145/3238147.3238183
10. Hu, T.: Github issue: Patch coverage formula vs overall coverage ratio. https://
github.com/codecov/feedback/issues/55. Accessed Aug 2023
11. Hu, T.: Why patch coverage is more important than project cover-
age. https://about.codecov.io/blog/why-patch-coverage-is-more-important-than-
project-coverage/. Accessed Jan 2024
12. Kuipers, T., Visser, J.: Maintainability index revisited: position paper. In: Special
Session on System Quality and Maintainability (SQM 2007) of the 11th European
conference on software maintenance and reengineering (CSMR 2007) (2007)
13. Lehman, M.M., Belady, L.A.: Program evolution: processes of software change.
Academic Press Professional, USA (1985)
14. Lehman, M.: Programs, life cycles, and laws of software evolution. Proc. IEEE
68(9), 1060–1076 (1980). https://doi.org/10.1109/PROC.1980.11805
15. Levin, S., Yehudai, A.: The co-evolution of test maintenance and code maintenance
through the lens of fine-grained semantic changes. In: 2017 IEEE International
Conference on Software Maintenance and Evolution (ICSME), pp. 35–46 (2017).
https://doi.org/10.1109/ICSME.2017.9
16. Marinescu, P., Hosek, P., Cadar, C.: Covrig: a framework for the analysis of code,
test, and coverage evolution in real software. In: Proceedings of the 2014 Interna-
tional Symposium on Software Testing and Analysis, pp. 93–104. ACM (2014).
https://doi.org/10.1145/2610384.2610419
17. Marsavina, C., Romano, D., Zaidman, A.: Studying fine-grained co-evolution pat-
terns of production and test code. In: 2014 IEEE 14th International Working Con-
ference on Source Code Analysis and Manipulation, pp. 195–204 (2014). https://
doi.org/10.1109/SCAM.2014.28
18. Merwin, N., Donahoe, L.: Coveralls: deliver better code. https://coveralls.io/.
Accessed Nov 2023
19. Midha, V., Palvia, P.: Factors affecting the success of open source software. J. Syst.
Softw. 85(4), 895–905 (2012). https://doi.org/10.1016/j.jss.2011.11.010
20. Moonen, L., van Deursen, A., Zaidman, A., Bruntink, M.: On the interplay between
software testing and evolution and its effect on program comprehension. In: Soft-
ware Evolution, pp. 173–202. Springer, Heidelberg (2008). https://doi.org/10.
1007/978-3-540-76440-3_8
21. Nierstrasz, O., Demeyer, S.: Object-oriented reengineering patterns. In: Proceed-
ings of the 26th International Conference on Software Engineering, pp. 734–735.
ICSE 2004, IEEE Computer Society, USA (2004)
22. Pinto, L.S., Sinha, S., Orso, A.: Understanding myths and realities of test-suite
evolution. In: SIGSOFT FSE (2012). https://api.semanticscholar.org/CorpusID:
9072512
23. Runeson, P.: A survey of unit testing practices. IEEE Softw. 23(4), 22–29 (2006).
https://doi.org/10.1109/MS.2006.91
24. Yu, L., Mishra, A.: An empirical study of Lehman’s law on software quality evo-
lution. Int. J. Softw. Inform. 7, 469–481 (2013)
25. Zaidman, A., Rompaey, B., Deursen, A., Demeyer, S.: Studying the co-evolution
of production and test code in open source and industrial developer test processes
through repository mining. Empir. Softw. Eng. 16(3), 325–364 (2011)
26. Zaidman, A., Van Rompaey, B., Demeyer, S., van Deursen, A.: Mining software
repositories to study co-evolution of production and test code. In: 2008 1st Inter-
national Conference on Software Testing, Verification, and Validation, pp. 220–229
(2008). https://doi.org/10.1109/ICST.2008.47