Evolution of the Practice of Software Testing in Java Projects
Anisha Islam∗ , Nipuni Tharushika Hewage† , Abdul Ali Bangash‡ , and Abram Hindle§
Department of Computing Science, University of Alberta
Edmonton, AB, Canada
Abstract—Software testing helps developers minimize bugs and errors in their code, improving the overall software quality. In 2013, Kochhar et al. analyzed 20,817 software projects in order to study how prevalent the practice of software testing is in open-source projects. They found that projects with more lines of code (LOC) and projects with more developers tend to have more test cases. Additionally, they found a weak positive correlation between the number of test cases and the number of bugs. Since the conclusions of a study might become irrelevant over time because of the latest practices in the relevant fields, in this paper, we investigate if these conclusions remain valid if we re-evaluate Kochhar et al.’s findings on the Java projects that were developed from 2012 to 2021. For evaluation, we use a random sample of 20,000 open-source Java projects each year. Our results show that Kochhar et al.’s conclusions regarding the projects with test cases having more LOC, the weak positive correlation between the number of test cases and authors, and the weak positive correlation between the number of test cases and bugs remain stable until 2021. Our study corroborates Kochhar et al.’s conclusions and helps developers refocus in light of the latest findings regarding the practice of software testing.

Index Terms—Software Testing, Replication Study, WoC, Java Projects

I. INTRODUCTION

Software testing is essential because it helps ensure the functionality [1], reliability [2], and security of a software system [3], leading to higher quality and user satisfaction. Lack of software testing can leave unhandled bugs in the system that could lead to severe financial issues and wasted resources [4]. Since software testing is a resource-consuming task [5], [6], researchers are continuously looking for ways to help developers test their code effectively. Besides, developers also need the latest information on software testing trends to recognize potential sources of vulnerability in the system and concentrate on relevant details while developing tools.

To understand the practice of software testing in open-source projects, in one of the previous studies, Kochhar et al. [4] investigated how software testing is adopted on GitHub projects using software project data from 2012 [4]. The goal of their study was to determine the popularity of software testing in open-source projects and to examine various aspects of software development related to testing [4]. Among the findings, Kochhar et al. showed that the project size, the count of authors, and the bug count positively correlate with the number of test cases [4].

However, over the years, the software community has undergone changes, for which the results of a study might become irrelevant as per the new practices. To ensure that the findings of different studies remain consistent, they should be re-evaluated over time to uncover any changes in conclusions due to the changing environment since the original studies were conducted [7], [8]. By validating the stability of the conclusions of a study, developers can determine which areas require further improvement and whether their focus should shift in accordance with the updated results.

In this study, we aim to reproduce some of the research questions of Kochhar et al. by using open-source Java projects on GitHub from 2012 to 2021. To observe the trend in software testing practices and how they have evolved in the past decade, we validate the stability of Kochhar et al.’s findings over time. We investigate the recent relationship between different project entities and test cases. Our study can help developers re-adjust their understanding of how Java projects on open-source platforms adopt software testing. We will refer to Kochhar et al. [4] as KOCHHAR STUDY [4] from here onwards.

We re-investigate three research questions in this paper:
RQ1: “How many projects have test cases?”
RQ2: “Does the number of developers affect the number of test cases present in a project?”
RQ3: “Does the presence of test cases correlate with the number of bugs?”
Quoted from KOCHHAR STUDY [4].

To find out if the conclusions of the first three research questions of KOCHHAR STUDY [4] have been stable over the years, we used the World of Code (WoC) [9] infrastructure provided by the MSR 2023 [10] organizers to extract the names of the Java projects on GitHub from 2012 to 2021. Then we used Git [11] commands to clone the repositories and to extract relevant project information required for the research questions mentioned above. We took a random sample of 20,000 Java projects per year, discarded projects with no lines of code, and only considered projects hosted on GitHub.

Although KOCHHAR STUDY [4] considered projects from multiple programming languages, our study is limited to the Java projects on GitHub only as Java is one of the most popular and frequently used programming languages [12]. Furthermore, Java offers a wide range of popular testing libraries such as JUnit, TestNG, PowerMock, Mockito, and Hamcrest. Therefore, we decided to observe the latest trends in Java’s software testing practices.
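The sampling and exclusion step described in the introduction (a random sample per year, followed by discarding projects with no lines of code) can be sketched in a few lines of Python. This is only a minimal illustration under stated assumptions, not the pipeline used in the study: the function name `sample_projects`, the input shape (a mapping from year to `(project_name, lines_of_code)` pairs), and the fixed seed are all hypothetical.

```python
import random


def sample_projects(projects_by_year, sample_size=20_000, seed=0):
    """Draw up to `sample_size` projects per year, then drop empty projects.

    `projects_by_year` maps a year to a list of (project_name, lines_of_code)
    pairs. This input shape is an assumption made for the sketch.
    """
    rng = random.Random(seed)  # fixed seed so the draw is repeatable
    kept = {}
    for year, projects in sorted(projects_by_year.items()):
        # A year with fewer candidates than the sample size is taken whole.
        k = min(sample_size, len(projects))
        drawn = rng.sample(projects, k)
        # Exclusion criterion: discard projects with zero lines of code.
        kept[year] = [name for name, loc in drawn if loc > 0]
    return kept
```

Because the exclusion is applied after the draw, the number of retained projects per year is at most the sample size, which is consistent with the per-year counts in Table I all being below 20,000.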
Authorized licensed use limited to: Universidad Federal de Pernambuco. Downloaded on January 22,2025 at 17:42:55 UTC from IEEE Xplore. Restrictions apply.
TABLE I: Number of projects in our sample after applying the exclusion criteria
Year 2012 2013 2014 2015 2016 2017 2018 2019 2020 2021
Projects 14,665 14,353 14,922 15,317 16,208 15,785 14,964 15,476 16,924 17,862
TABLE II: Differences in methodology between KOCHHAR STUDY [4] and our study
                        | Our study                                    | KOCHHAR STUDY [4]
Time range              | 2012 to 2021                                 | 2012
Number of projects      | 20,000 per year                              | 20,817
Project collection      | Used WoC                                     | Used GitHub API
Calculating LOC         | Git commands                                 | SLOCCount [17] utility
Counting bugs           | Used bug referencing commits                 | Used GitHub issue tracking system
Definition of test case | Calculated test cases using @Test annotation | Considered test files same as test cases
Programming language    | Java                                         | Multiple programming languages
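Two of the measurement choices listed for our study in Table II, counting test cases through the @Test annotation and counting bugs through bug-referencing commits, can be illustrated with a minimal Python sketch. Everything named here is an assumption made for illustration: the helper names, the rule that a test file is a `.java` file whose name contains "test", and in particular the `BUG_REFERENCE` pattern, which is a hypothetical stand-in and not the regex used in the study.

```python
import os
import re

# Matches the @Test annotation, with or without arguments, but not longer
# annotation names such as @TestFactory.
TEST_ANNOTATION = re.compile(r"@Test\b")

# HYPOTHETICAL pattern: a bug-related keyword followed by an issue number
# such as "#42" on the same line. The study's actual regex is not shown here.
BUG_REFERENCE = re.compile(
    r"\b(?:fix(?:e[sd])?|bug|issue)\b[^\n]*?#\d+", re.IGNORECASE
)


def is_test_file(path):
    """Assumed rule: a Java file whose file name contains the keyword 'test'."""
    name = os.path.basename(path).lower()
    return name.endswith(".java") and "test" in name


def count_test_cases(java_source):
    """Count @Test occurrences in the text of one Java source file."""
    return len(TEST_ANNOTATION.findall(java_source))


def count_bug_commits(commit_messages):
    """Count commit messages that match the hypothetical bug pattern."""
    return sum(1 for message in commit_messages if BUG_REFERENCE.search(message))
```

A purely textual match like this also counts annotations that appear inside comments or string literals, and it misses test frameworks that do not use @Test, so the counts it produces are approximations.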
TABLE III: The trend (in %) of projects having test cases and test files over the past decade (2012-2021)
Year 2012 2013 2014 2015 2016 2017 2018 2019 2020 2021
% of projects containing test files 48.37 45.99 44.65 55.38 57.87 59.97 62.87 61.17 60.43 58.64
% of projects containing test cases 29.06 27.58 24.81 28.24 42.65 47.60 50.92 50.70 50.31 48.96
Fig. 1: Comparison of Lines of Code between Java Projects with and without Test Cases from 2012-2021
Fig. 2: Comparison of Number of Authors between Java Projects, with and without Test Cases, from 2012-2021
We also manually investigated our data and identified some
projects with zero developers. This could have been a technical
issue, so we discarded such data points to ensure a valid
number of authors for every project in our sample.
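The cleaning step above and the two statistics used in this section, the Mann-Whitney-Wilcoxon (MWW) test [18] and Spearman's rho [19], can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the analysis code of the study: the function names and the `(authors, test_cases)` row shape are hypothetical, and the sketch computes only the rho coefficient and the U statistic, not the p-values reported in the text.

```python
def drop_zero_author_projects(rows):
    """Keep only (authors, test_cases) rows with at least one author."""
    return [(authors, tests) for authors, tests in rows if authors > 0]


def average_ranks(values):
    """Rank values from 1..n, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # positions i..j are 0-based
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def spearman_rho(xs, ys):
    """Spearman's rho: Pearson correlation of the two rank vectors.

    Assumes neither input is constant (otherwise the denominator is zero).
    """
    rx, ry = average_ranks(xs), average_ranks(ys)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    return cov / (var_x * var_y) ** 0.5


def mann_whitney_u(sample_a, sample_b):
    """U statistic for sample_a: its rank sum minus n_a * (n_a + 1) / 2."""
    ranks = average_ranks(list(sample_a) + list(sample_b))
    n_a = len(sample_a)
    return sum(ranks[:n_a]) - n_a * (n_a + 1) / 2
```

In this sketch, `sample_a` and `sample_b` would be the author counts of projects with and without test cases, while `xs` and `ys` would be the per-project author counts and test case counts. Turning the U statistic into a p-value requires its null distribution or a normal approximation, which a statistics library would normally provide.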
To confirm the relationship between the number of authors in projects that contain test cases and projects that do not, KOCHHAR STUDY [4] conducted the MWW test and observed p-value < 2.2e−16 [4]. In our study, the p-values were between 2.2e−16 and 0.01 for all the years except for 2017, when the p-value was 0.065 (> α). As a result, we can say that the conclusion of KOCHHAR STUDY [4] for this particular scenario, “The results signify that the difference between these two sets is statistically significant” [4], is stable for all years from 2012 to 2021 except 2017.

KOCHHAR STUDY [4] also used Spearman’s rho to measure the correlation between the number of authors and the number of test cases, giving ρ = 0.207. In our study, we observed the values between 0.216 and 0.338 (p-value = 2.2e−16) for all years, including 2017. Our result validates the finding that “there is a weak positive correlation between the number of developers and test cases” [4].

Furthermore, KOCHHAR STUDY [4] examined the correlation between the number of test cases per developer and the number of developers and observed a negative correlation (ρ = −0.444) [4]. Similarly, we observed Spearman’s rho values between −0.11 and −0.269 with p-values = 2.2e−16 for all years except 2015, when the p-value = 9.473e−14, denoting a negative correlation. So, we can say that in our study, the observation of KOCHHAR STUDY [4] that “the number of test cases per developer decreases for the projects with more developers” [4], remains stable from 2012 to 2021.

The MWW test result is not statistically significant for the Java projects in the year 2017. Possible reasons for this difference in results could be the differences in our experimental settings, methodology, and data sample. KOCHHAR STUDY [4] considered projects of various programming languages, whereas we only consider Java projects for our study. Also, our definition of test cases differs from KOCHHAR STUDY [4]. These reasons could have contributed to the differences in our observed results.

For the Java projects on GitHub from 2012 to 2021, the conclusions from KOCHHAR STUDY [4] that “there is a weak positive correlation between the number of developers and test cases” [4] and “the number of test cases per developer decreases for the projects with more developers” [4], remain stable through 2012-2021.

C. Bug Count Correlation with Test Cases (RQ3)

In their third research question, KOCHHAR STUDY [4] investigated, “Does the presence of test cases correlate with the number of bugs?” [4]. To reproduce this research question, we calculated the number of bugs in a project using the procedure mentioned in Section II. Figure 3 shows the bug count in projects that contain test cases and those that do not.

Fig. 3: Comparison of Number of Bugs between Java Projects, with and without Test Cases, from 2012-2021

KOCHHAR STUDY [4] initially assumed “with increase in the number of test cases, bug count increases” [4]. To confirm this assumption, they calculated Spearman’s rho, which signified a weak positive correlation with a value of 0.181 [4]. In our study, we found the ρ values for all years to be between 0.25 and 0.417 (p-values < 2.2e−16). For this reason, we can say that the finding of KOCHHAR STUDY [4], “Projects having higher numbers of test cases observe an increase in the number of bugs, although the correlation is weak between them.” [4], remains stable for all the years from 2012-2021.

We also manually investigated some cases where the bug count is high but the test case count is low and vice versa. Three of the projects we explored had 6,040, 9,841, and 12,390 bugs respectively, but our calculation did not show any test cases for these projects. We found that the tests of these projects were written in JavaScript, LiveCode, and Lua programming languages. As we only considered test cases of Java test files, our script ignored these test cases.

On the other hand, another project had 14,500 test cases and one bug because most of the commit messages were in the format apply <alphanumeric digits>. The alphanumeric digits could mean a bug fix id or issue id internal to the project. As this pattern did not match our regex pattern for counting bugs from commit messages, our script could not identify these commits as references to bugs. This finding shows that the outlier data that defies the positive correlation between bugs and test cases could be caused by how we calculated bugs and test cases, which is different from KOCHHAR STUDY [4].

For the Java projects on GitHub from 2012 to 2021, the conclusion from KOCHHAR STUDY [4] that “Projects having higher numbers of test cases observe an increase in the number of bugs, although the correlation is weak between them.” [4] remains stable through 2012-2021.

IV. RELATED WORK

Previous studies have focused on the types of available software testing practices and tried to find a relationship between testing and various entities of a software project. In 2013, KOCHHAR STUDY [4] examined the correlation between the number of test cases and different project entities like size, number of authors, number of bugs, bug reporters, and programming languages in GitHub projects. Additionally, due to the significance of software testing in the software development lifecycle, Jangra et al. [5] explored different existing software testing strategies and illustrated the connection between these strategies using a diagram. Bangash et al. [21] explored how stable the conclusions of defect prediction models are and found that the performance of defect prediction models remains inconclusive over time. Shepperd et al. [7] examined how likely it is for replication studies to agree with the original studies and found that replication studies with at least one common author from the original study are more likely to confirm the original findings. Our study is motivated by KOCHHAR STUDY [4]. We replicated a part of their work to re-investigate if their conclusions are still valid on the Java projects that were developed between 2012 and 2021.

V. THREATS TO VALIDITY

KOCHHAR STUDY [4] mentioned some limitations of their study, which are also valid in our case. For example, we consider only a sample of the GitHub projects, which may not reflect the general behavior of all real-world projects, and we use the "test" keyword to identify the test files, which may not identify all test files. Moreover, we used the @Test annotation to identify the test cases. However, there might be some testing frameworks that do not use the same annotation. In those cases, we might not have identified some test cases.

We also calculated the number of bugs from the commit messages using a regex pattern to map the commits to bugs. Nevertheless, not all commits use the regex pattern we used to reference bugs. For this reason, we might have missed some commits that mention bugs or issues differently. Finally, as collecting information about thousands of projects using git was a long-running query, some technical issues might have evaded our inspection.

VI. CONCLUSION

Software testing is paramount to the success and maintainability of software systems. KOCHHAR STUDY [4] explored the correlation of software testing with various software development entities like project size, number of authors, and number of bugs in an effort to understand the importance of software testing from a broader perspective. However, results or conclusions drawn in a study can become obsolete as a system or development culture changes over time, requiring that previous studies be re-evaluated to ensure that their conclusions remain accurate. For this reason, in this paper, we have reproduced the results of KOCHHAR STUDY [4]’s three research questions to validate if their conclusions still hold on the latest Java projects that were developed from 2012 to 2021. Our study confirms the validity of KOCHHAR STUDY [4]’s conclusions that there are more lines of code in projects with test cases, and a weak positive correlation exists between i) the number of test cases and the author count and ii) the number of test cases and the bug count in Java projects on GitHub through 2021. Additionally, our study allows developers to identify the latest trends in software testing and to re-adjust their knowledge accordingly to address issues that require more attention. In the future, researchers may improve their understanding of software testing across different programming languages and version control systems by leveraging the vast amount of software repository information and the advanced mappings available in the World of Code.

VII. ACKNOWLEDGEMENTS

This research was supported by the Natural Sciences and Engineering Research Council of Canada (NSERC) through their Discovery Grant. We are grateful for NSERC’s commitment to supporting scientific innovation in Canada.

REFERENCES

[1] V. R. Basili and R. W. Selby, “Comparing the effectiveness of software testing strategies,” IEEE transactions on software engineering, no. 12, pp. 1278–1296, 1987.
[2] R. H. Rosero, O. S. Gómez, and G. Rodríguez, “15 years of software regression testing techniques—a survey,” International Journal of Software Engineering and Knowledge Engineering, vol. 26, no. 05, pp. 675–689, 2016.
[3] B. Potter and G. McGraw, “Software security testing,” IEEE Security & Privacy, vol. 2, no. 5, pp. 81–85, 2004.
[4] P. S. Kochhar, T. F. Bissyandé, D. Lo, and L. Jiang, “An empirical study of adoption of software testing in open source projects,” in 2013 13th International Conference on Quality Software, pp. 103–112, IEEE, 2013.
[5] A. Jangra, G. Singh, J. Singh, and R. Verma, “Exploring testing strategies,” International Journal of Information Technology and Knowledge Management, vol. 4, pp. 297–299, 2011.
[6] M. A. Jamil, M. Arif, N. S. A. Abubakar, and A. Ahmad, “Software testing techniques: A literature review,” in 2016 6th international conference on information and communication technology for the Muslim world (ICT4M), pp. 177–182, IEEE, 2016.
[7] M. Shepperd, N. Ajienka, and S. Counsell, “The role and value of replication in empirical software engineering results,” Information and Software Technology, vol. 99, pp. 120–132, 2018.
[8] M. Cruz, B. Bernárdez, A. Durán, J. A. Galindo, and A. Ruiz-Cortés, “Replication of studies in empirical software engineering: A systematic mapping study, from 2013 to 2018,” IEEE Access, vol. 8, pp. 26773–26791, 2019.
[9] Y. Ma, T. Dey, C. Bogart, S. Amreen, M. Valiev, A. Tutko, D. Kennard, R. Zaretzki, and A. Mockus, “World of code: enabling a research workflow for mining and analyzing the universe of open source VCS data,” Empirical Software Engineering, vol. 26, pp. 1–42, 2021.
[10] A. Mockus, A. Nolte, and J. Herbsleb, “MSR Mining Challenge: World of Code,” 2023.
[11] S. Chacon, “Git SCM.” https://git-scm.com. Accessed: 2023-01-26.
[12] D. Qiu, B. Li, E. T. Barr, and Z. Su, “Understanding the syntactic rule usage in java,” Journal of Systems and Software, vol. 123, pp. 160–172, 2017.
[13] A. Mockus, “README.md.” https://bitbucket.org/swsc/lookup/src/master/README.md. Accessed: 2023-03-12.
[14] JUnit, “Annotation Type Test.” https://junit.org/junit4/javadoc/4.12/org/junit/Test.html. Accessed: 2023-01-26.
[15] C. Beust, “TestNG.” https://testng.org/doc/documentation-main.html. Accessed: 2023-01-26.
[16] N. C. Borle, M. Feghhi, E. Stroulia, R. Greiner, and A. Hindle, “Analyzing the effects of test driven development in Github,” in Proceedings of the 40th International Conference on Software Engineering, pp. 1062–1062, 2018.
[17] D. A. Wheeler, “SLOCCount.” http://dwheeler.com/sloccount/. Accessed: 2023-03-11.
[18] H. B. Mann and D. R. Whitney, “On a test of whether one of two random variables is stochastically larger than the other,” The annals of mathematical statistics, pp. 50–60, 1947.
[19] J. H. Zar, “Spearman rank correlation,” Encyclopedia of biostatistics, vol. 7, 2005.
[20] “Data for Evolution of the Practice of Software Testing in Java Projects.” https://doi.org/10.6084/m9.figshare.22258444.v1. Accessed: 2023-03-11.
[21] A. A. Bangash, H. Sahar, A. Hindle, and K. Ali, “On the time-based conclusion stability of cross-project defect prediction models,” Empirical Software Engineering, vol. 25, no. 6, pp. 5047–5083, 2020.