Taming The Beast 2024
Yeoneo Kim
CODEMIND Corporation
Seoul, South Korea
[email protected]
Abstract—In this paper, we present Coyote C++, a fully automated white-box unit testing tool for C and C++. Whereas existing tools have struggled to realize unit test generation for C++, Coyote C++ is able to produce high coverage results from unit test generation at a testing speed of over 10,000 statements per hour. This impressive feat is made possible by the combination of a powerful concolic execution engine with sophisticated automated test harness generation. Additionally, the GUI of Coyote C++ displays detailed code coverage visualizations and provides various configuration features for users seeking to manually optimize their coverage results. Combining potent one-click automated testing with rich support for manual tweaking, Coyote C++ is the first automated testing tool that is practical enough to make automated testing of C++ code truly viable in industrial applications.

Index Terms—Software Testing, Test case generation, Automated unit test generation, Symbolic execution, C++

I. INTRODUCTION

Test case generation has been researched for a long time as a means to automate white-box dynamic testing, which involves meticulously examining and validating the source code of a given piece of software. If this technology becomes a reality, it will not only significantly reduce the cost and effort of software testing by automating its most rigorous form, but also greatly contribute to ensuring software reliability.

The success of such research can be evaluated by how much code coverage is achieved through automatically generated test cases. Various techniques for test case generation have been studied, including symbolic-execution-based approaches [1], [2], search-based approaches [3], and LLM-based approaches [4], [5], which all compete for higher coverage and performance and each have distinct advantages and disadvantages. There has been a lot of promising research into unit test generation for Java and C, and there is even a tool competition for unit test generation for Java. For C++, however, due to its well-known complexity, there has been comparatively little research into unit test generation, and the existing research has not been able to report big successes yet.

This paper presents Coyote C++, a concolic-execution-based fully automated unit testing tool for C/C++ that can detect various runtime errors through the automatic injection of assertions and achieves high coverage results even for complex C++ projects. Coyote C++ is an industrial-strength tool that is already actively being used in the validation of real-world projects such as automotive software. The tool also includes additional features that enable requirement-based test case generation and allow users to manually improve coverage, but the focus of this paper is on its one-click automated test case generation capabilities.

The remainder of this paper is structured as follows. First, we briefly refer to previous research on unit test generation tools for C++. We then describe the overall implementation of Coyote C++ and the challenging problems that were encountered when dealing with C++. After demonstrating how Coyote C++ can be used in practice, we present test results comparing it to the latest research for C++ projects. Finally, we conclude by discussing future work.

II. RELATED WORKS

While a number of automated white-box unit testing tools exist for Java [3], [6], C [2], [7], [8], and some other languages [9]–[11], there has not been much success yet when targeting C++, as the language’s many intricacies, such as templates or namespaces, greatly complicate automated testing. To make matters worse, out of the already few tools that do claim to be able to handle C++ code, some are not available for public use and others ultimately support only a small subset of C++ and can thus only deal with very simple programs.

Recently, the two most promising tools for automated unit testing of C++ programs have been CITRUS [12] and UTBot [13]. CITRUS combines concolic execution with fuzzing and employs mutation techniques for test case generation. Typically, CITRUS is executed with a fixed time budget per project for generating parametric test cases and then a fixed time budget for applying libFuzzer to each test case, with the whole testing process reported to take about 10-20 hours to achieve good coverage for projects between 1,000 and 20,000 lines of code. This large time consumption makes it hard to use CITRUS in many practical applications, especially in the context of continuous testing. UTBot also uses concolic execution for automated test case generation and has good support for C programs, since it was developed using the well-established KLEE [1] symbolic execution engine as its foundation. However, it only supports a very limited subset of the C++ syntax, making it unsuitable for testing most real-world projects.

Point Point::bound(Point min, Point max) {...}

void __COYOTE_DRIVER_Point_bound() {
    ...
    ::Point x1(__CYC__);
    ::Point x2(__CYC__);
    ::Point cls(__CYC__);
    __COYOTE_SYM_Point(2, (void*)(::Point*)&(cls));
    __COYOTE_SYM_Point(4, (void*)(::Point*)&(x1));
    __COYOTE_SYM_Point(6, (void*)(::Point*)&(x2));
    cls.bound(x1, x2);
    ...
}

Fig. Example of a test driver generated by Coyote C++ for the member function Point::bound.
lines and branches were covered by the generated test cases. These test cases can in turn be inspected in the bottom right part of the page, which shows the concrete values that were assigned to the various symbols (i.e., program inputs) in each test case.

To deal with cases where Coyote C++ is unable to reach satisfactory coverage results through automated testing, our tool also supports the manual addition of test cases as well as manual modifications of the code of driver and stub functions. We do not go into more detail about these features here, as this paper focuses on the automated testing capabilities of Coyote C++, but more information about the manual improvement of test results can be found in the user manual provided alongside the Coyote C++ demo.

V. EVALUATION

In order to demonstrate the improvement in testing performance achieved by Coyote C++ over the current state of the art in automated unit testing for C++, we perform an evaluation in which we compare the testing performance, in terms of achieved coverage² and time consumption, of Coyote C++ and the CITRUS [12] tool. We would have liked to include more tools in the comparison, but unfortunately other existing tools were either not publicly available or, in the case of UTBot [13], were found to be unable to test any non-trivial code. Nonetheless, CITRUS is a rather recent tool reporting quite decent coverage results, so we still consider this comparison to be a meaningful evaluation. As CITRUS is no longer publicly available, we settled for using its previously published performance data as the baseline for our comparison and executed Coyote C++ on the same test set. In the course of this evaluation, Coyote C++ was executed on an Ubuntu 20.04 system equipped with a 24-core Intel i7-13700 and 64 GB of RAM.

²As mentioned previously, the achieved code coverage is the main indicator for the quality of automated test case generation.

Table I shows information about the test projects and the achieved testing performance of Coyote C++ and CITRUS, both in terms of yielded statement and branch coverage and in terms of time consumption. The statement counts are generally much lower than the physical lines of code because we only consider executable statements; header files as well as files containing only test code were excluded from automated testing. It should also be noted at this point that CITRUS only considers public functions as test targets, whereas Coyote C++ conducts automated testing for all functions regardless of access specifiers. Furthermore, while CITRUS originally reported multiple coverage results for different configurations, we only included the best results for each project.

For CITRUS, the testing process took a total of about 155 hours to execute, and the reported test coverage totals up to 81.41% statement and 63.06% branch coverage. Coyote C++, on the other hand, was able to yield 92.34% statement and 88.85% branch coverage in only around 56 minutes. Coyote C++ thus produces 10.93 and 25.79 percentage points higher statement and branch coverage, respectively, with a testing time that is two orders of magnitude shorter than that of CITRUS. Despite the different test environments used for the two tools, a difference this large clearly indicates that Coyote C++ produces significantly better coverage results in drastically less time than CITRUS. Additionally, the coverage results produced by Coyote C++, as well as the achieved testing speed of over 20,000 statements per hour, once again fulfill the criteria for the practicality of automated unit testing that we recently proposed: yielding at least 90% statement coverage and 80% branch coverage at a testing speed of more than 10,000 statements per hour.

VI. CONCLUSION

We have presented Coyote C++, a fully automated unit testing tool that features one-click automation of test case generation and achieves coverage high enough to be practical for industrial use. Coyote C++ handles the well-known complexity of C++ syntax by generating sophisticated test drivers and function stubs, and then employs LLVM-based concolic execution to produce input data for concrete test cases. Coyote C++ also allows users to add supplementary test inputs and to write specialized custom drivers and stubs for cases where automated testing is unable to reach 100% coverage. In order to minimize the need for such user involvement, we plan to devise advanced methods to generate specially tailored drivers and stubs for functions with intricately structured inputs, such as using static analysis to infer the range of accessed indexes for arrays with unspecified size or to find the minimum number of loop iterations necessary to cover all statements and branches in a function.
REFERENCES
[1] C. Cadar, D. Dunbar, D. R. Engler et al., “KLEE: Unassisted and auto-
matic generation of high-coverage tests for complex systems programs.”
in OSDI, vol. 8, 2008, pp. 209–224.
[2] K. Sen, D. Marinov, and G. Agha, “CUTE: A concolic unit testing
engine for C,” ACM SIGSOFT Software Engineering Notes, vol. 30,
no. 5, pp. 263–272, 2005.
[3] G. Fraser and A. Arcuri, “A large-scale evaluation of automated unit test
generation using EvoSuite,” ACM Trans. Softw. Eng. Methodol., vol. 24,
no. 2, Dec. 2014. [Online]. Available: https://doi.org/10.1145/2685612
[4] M. Tufano, D. Drain, A. Svyatkovskiy, S. K. Deng, and
N. Sundaresan, “Unit test case generation with transform-
ers,” ArXiv, vol. abs/2009.05617, 2020. [Online]. Available:
https://api.semanticscholar.org/CorpusID:221655653
[5] M. Schäfer, S. Nadi, A. Eghbali, and F. Tip, “An empirical evaluation of
using large language models for automated unit test generation,” arXiv
preprint arXiv:2302.06527, 2023.
[6] K. Sen and G. Agha, “CUTE and jCUTE: Concolic unit testing and explicit
path model-checking tools,” in Computer Aided Verification, T. Ball and
R. B. Jones, Eds. Berlin, Heidelberg: Springer Berlin Heidelberg, 2006,
pp. 419–423.
[7] Y. Kim, D. Lee, J. Baek, and M. Kim, “Concolic testing for high test
coverage and reduced human effort in automotive industry,” in 2019
IEEE/ACM 41st International Conference on Software Engineering:
Software Engineering in Practice (ICSE-SEIP). IEEE, 2019, pp. 151–
160.
[8] P. Godefroid, N. Klarlund, and K. Sen, “DART: Directed automated
random testing,” in Proceedings of the 2005 ACM SIGPLAN conference
on Programming language design and implementation, 2005, pp. 213–
223.
[9] N. Tillmann and J. de Halleux, “Pex–white box test generation for .net,”
in Tests and Proofs, B. Beckert and R. Hähnle, Eds. Berlin, Heidelberg:
Springer Berlin Heidelberg, 2008, pp. 134–153.
[10] A. Giantsios, N. Papaspyrou, and K. Sagonas, “Concolic testing
for functional languages,” Science of Computer Programming, vol.
147, pp. 109–134, 2017, selected and Extended papers from the
International Symposium on Principles and Practice of Declarative
Programming 2015. [Online]. Available: https://www.sciencedirect.com/
science/article/pii/S0167642317300837
[11] K. Sen, S. Kalasapur, T. Brutch, and S. Gibbs, “Jalangi: A selective
record-replay and dynamic analysis framework for JavaScript,” in
Proceedings of the 2013 9th Joint Meeting on Foundations of Software
Engineering, ser. ESEC/FSE 2013. New York, NY, USA: Association
for Computing Machinery, 2013, pp. 488–498. [Online]. Available:
https://doi.org/10.1145/2491411.2491447
[12] R. S. Herlim, Y. Kim, and M. Kim, “CITRUS: Automated unit testing
tool for real-world C++ programs,” in 2022 IEEE Conference on
Software Testing, Verification and Validation (ICST), 2022, pp. 400–410.
[13] D. Ivanov, A. Babushkin, S. Grigoryev, P. Iatchenii, V. Kalugin,
E. Kichin, E. Kulikov, A. Misonizhnik, D. Mordvinov, S. Morozov et al.,
“UnitTestBot: Automated unit test generation for C code in integrated
development environments,” in 2023 IEEE/ACM 45th International
Conference on Software Engineering: Companion Proceedings (ICSE-
Companion). IEEE, 2023, pp. 380–384.
[14] S. Rho, P. Martens, S. Shin, Y. Kim, H. Heo, and S. Oh, “Coyote C++:
An industrial-strength fully automated unit testing tool,” 2023.
[15] R. Baldoni, E. Coppa, D. C. D’elia, C. Demetrescu, and I. Finocchi, “A
survey of symbolic execution techniques,” ACM Comput. Surv., vol. 51,
no. 3, May 2018. [Online]. Available: https://doi.org/10.1145/3182657