Taming The Beast 2024
Yeoneo Kim
CODEMIND Corporation
Seoul, South Korea
[email protected]
Abstract—In this paper, we present Coyote C++, a fully automated white-box unit testing tool for C and C++. Whereas existing tools have struggled to realize unit test generation for C++, Coyote C++ is able to produce high coverage results from unit test generation at a testing speed of over 10,000 statements per hour. This impressive feat is made possible by the combination of a powerful concolic execution engine with sophisticated automated test harness generation. Additionally, the GUI of Coyote C++ displays detailed code coverage visualizations and provides various configuration features for users seeking to manually optimize their coverage results. Combining potent one-click automated testing with rich support for manual tweaking, Coyote C++ is the first automated testing tool that is practical enough to make automated testing of C++ code truly viable in industrial applications.

Index Terms—Software Testing, Test case generation, Automated unit test generation, Symbolic execution, C++

I. INTRODUCTION

Test case generation has been researched for a long time as a means to automate white-box dynamic testing, which involves meticulously examining and validating the source code of a given piece of software. If this technology becomes a reality, it will not only significantly reduce the cost and effort of software testing by automating its most rigorous form, but also greatly contribute to ensuring software reliability.

The success of such research can be evaluated by how much code coverage is achieved through automatically generated test cases. Various techniques for test case generation have been studied, including symbolic-execution-based approaches [1], [2], search-based approaches [3], and LLM-based approaches [4], [5], which all compete for higher coverage and performance and each have distinct advantages and disadvantages. There has been a lot of promising research into unit test generation for Java and C, and there is even a tool competition for unit test generation for Java. For C++, however, due to its well-known complexity, there has been comparatively little research into unit test generation, and the existing research has not been able to report big successes yet.

This paper presents Coyote C++, a concolic-execution-based fully automated unit testing tool for C/C++ that can detect various runtime errors through the automatic injection of assertions and achieves high coverage results even for complex C++ projects. Coyote C++ is an industrial-strength tool that is already actively being used in the validation of real-world projects such as automotive software. The tool also includes additional features that enable requirement-based test case generation and allow users to manually improve coverage, but the focus of this paper is on its one-click automated test case generation capabilities.

The remainder of this paper is structured as follows. First, we briefly refer to previous research on unit test generation tools for C++. We then describe the overall implementation of Coyote C++ and the challenging problems that were encountered when dealing with C++. After demonstrating how Coyote C++ can be used in practice, we present test results comparing it to the latest research for C++ projects. Finally, we conclude by discussing future work.

II. RELATED WORKS

While a number of automated white-box unit testing tools exist for Java [3], [6], C [2], [7], [8], and some other languages [9]–[11], there has not been much success yet when targeting C++, as the language’s many intricacies, such as templates or namespaces, greatly complicate automated testing. To make matters worse, out of the already few tools that do claim to be able to handle C++ code, some are not available for public use and others ultimately support only a small subset of C++ and can thus only deal with very simple programs.

Recently, the two most promising tools for automated unit testing of C++ programs have been CITRUS [12] and UTBot [13]. CITRUS combines concolic execution with fuzzing and employs mutation techniques for test case generation. Typically, CITRUS is executed with a fixed time budget per project for generating parametric test cases and then a fixed time budget for applying libFuzzer to each test case, with the whole testing process reported to take about 10-20 hours to achieve good coverage for projects between 1,000 and 20,000 lines of code. This large time consumption makes it hard to use CITRUS in many practical applications, especially in the context of continuous testing. UTBot also uses concolic execution for automated test case generation and has good support for C programs, since it was developed using the well-established KLEE [1] symbolic execution engine as its foundation. However, it only supports a very limited subset of the C++ syntax, making it unsuitable for testing most real-world projects.

Point Point::bound(Point min, Point max) {...}

void __COYOTE_DRIVER_Point_bound() {
    ...
    ::Point x1(__CYC__);
    ::Point x2(__CYC__);
    ::Point cls(__CYC__);
    __COYOTE_SYM_Point(2, (void*)(::Point*)&(cls));
    __COYOTE_SYM_Point(4, (void*)(::Point*)&(x1));
    __COYOTE_SYM_Point(6, (void*)(::Point*)&(x2));
    cls.bound(x1, x2);
    ...
}

Fig. Example of a test driver generated by Coyote C++ for the member function Point::bound.
lines and branches were covered by the generated test cases. These test cases can in turn be inspected in the bottom right part of the page, which shows the concrete values that were assigned to the various symbols (i.e., program inputs) in each test case.

To deal with cases where Coyote C++ is unable to reach satisfactory coverage results through automated testing, our tool also supports the manual addition of test cases as well as manual modifications of the code of driver and stub functions. We do not go into more detail about these features here, as this paper focuses on the automated testing capabilities of Coyote C++, but more information about the manual improvement of test results can be found in the user manual provided alongside the Coyote C++ demo.

V. EVALUATION

In order to demonstrate the improvement in testing performance achieved by Coyote C++ over the current state of the art in automated unit testing for C++, we perform an evaluation in which we compare the testing performance, in terms of achieved coverage² and time consumption, of Coyote C++ and the CITRUS [12] tool. We would have liked to include more tools in the comparison, but unfortunately other existing tools were either not publicly available or, in the case of UTBot [13], were found to be unable to test any non-trivial code. Nonetheless, CITRUS is a rather recent tool reporting quite decent coverage results, so we still consider this comparison to be a meaningful evaluation. As CITRUS is no longer publicly available, we settled for using its previously published performance data as the baseline for our comparison and executed Coyote C++ on the same test set. In the course of this evaluation, Coyote C++ was executed on an Ubuntu 20.04 system equipped with a 24-core Intel i7-13700 and 64 GB of RAM.

²As mentioned previously, the achieved code coverage is the main indicator for the quality of automated test case generation.

Table I shows information about the test projects and the achieved testing performance of Coyote C++ and CITRUS, both in terms of yielded statement and branch coverage and in terms of time consumption. The statement counts are generally much lower than the physical lines of code because we only consider executable statements; header files as well as files containing only test code were excluded from automated testing. It should also be noted at this point that CITRUS only considers public functions as test targets, whereas Coyote C++ conducts automated testing for all functions regardless of access specifiers. Furthermore, while CITRUS originally reported multiple coverage results for different configurations, we only included the best results for each project.

For CITRUS, the testing process took a total of about 155 hours to execute, and the reported test coverage totals up to 81.41% statement and 63.06% branch coverage. Coyote C++, on the other hand, was able to yield 92.34% statement and 88.85% branch coverage in only around 56 minutes. Coyote C++ thus produces 10.93 and 25.79 percentage points higher statement and branch coverage, respectively, with a testing time that is two orders of magnitude shorter than that of CITRUS. Despite the different test environments used for the two tools, a difference this large clearly indicates that Coyote C++ produces significantly better coverage results in drastically less time than CITRUS. Additionally, the coverage results produced by Coyote C++, as well as the achieved testing speed of over 20,000 statements per hour, once again fulfill the criteria for the practicality of automated unit testing that we recently proposed: yielding at least 90% statement coverage and 80% branch coverage at a testing speed of more than 10,000 statements per hour.

VI. CONCLUSION

We have presented Coyote C++, a fully automated unit testing tool that features one-click automation of test case generation and achieves coverage high enough to be practical for industrial use. Coyote C++ handles the well-known complexity of C++ syntax by generating sophisticated test drivers and function stubs, and then employs LLVM-based concolic execution to produce input data for concrete test cases. Coyote C++ also allows users to add supplementary test inputs and to write specialized custom drivers and stubs for cases where automated testing is unable to reach 100% coverage. In order to minimize the need for such user involvement, we plan to devise advanced methods to generate specially tailored drivers and stubs for functions with intricately structured inputs, such as using static analysis to infer the range of accessed indexes for arrays with unspecified size or to find the minimum number of loop iterations necessary to cover all statements and branches in a function.
REFERENCES
[1] C. Cadar, D. Dunbar, D. R. Engler et al., “KLEE: Unassisted and auto-
matic generation of high-coverage tests for complex systems programs.”
in OSDI, vol. 8, 2008, pp. 209–224.
[2] K. Sen, D. Marinov, and G. Agha, “CUTE: A concolic unit testing
engine for C,” ACM SIGSOFT Software Engineering Notes, vol. 30,
no. 5, pp. 263–272, 2005.
[3] G. Fraser and A. Arcuri, “A large-scale evaluation of automated unit test
generation using EvoSuite,” ACM Trans. Softw. Eng. Methodol., vol. 24,
no. 2, Dec. 2014. [Online]. Available: https://doi.org/10.1145/2685612
[4] M. Tufano, D. Drain, A. Svyatkovskiy, S. K. Deng, and
N. Sundaresan, “Unit test case generation with transform-
ers,” ArXiv, vol. abs/2009.05617, 2020. [Online]. Available:
https://api.semanticscholar.org/CorpusID:221655653
[5] M. Schäfer, S. Nadi, A. Eghbali, and F. Tip, “An empirical evaluation of
using large language models for automated unit test generation,” arXiv
preprint arXiv:2302.06527, 2023.
[6] K. Sen and G. Agha, “CUTE and jCUTE: Concolic unit testing and explicit
path model-checking tools,” in Computer Aided Verification, T. Ball and
R. B. Jones, Eds. Berlin, Heidelberg: Springer Berlin Heidelberg, 2006,
pp. 419–423.
[7] Y. Kim, D. Lee, J. Baek, and M. Kim, “Concolic testing for high test
coverage and reduced human effort in automotive industry,” in 2019
IEEE/ACM 41st International Conference on Software Engineering:
Software Engineering in Practice (ICSE-SEIP). IEEE, 2019, pp. 151–
160.
[8] P. Godefroid, N. Klarlund, and K. Sen, “DART: Directed automated
random testing,” in Proceedings of the 2005 ACM SIGPLAN conference
on Programming language design and implementation, 2005, pp. 213–
223.
[9] N. Tillmann and J. de Halleux, “Pex–white box test generation for .net,”
in Tests and Proofs, B. Beckert and R. Hähnle, Eds. Berlin, Heidelberg:
Springer Berlin Heidelberg, 2008, pp. 134–153.
[10] A. Giantsios, N. Papaspyrou, and K. Sagonas, “Concolic testing
for functional languages,” Science of Computer Programming, vol.
147, pp. 109–134, 2017, selected and Extended papers from the
International Symposium on Principles and Practice of Declarative
Programming 2015. [Online]. Available: https://www.sciencedirect.com/
science/article/pii/S0167642317300837
[11] K. Sen, S. Kalasapur, T. Brutch, and S. Gibbs, “Jalangi: A selective
record-replay and dynamic analysis framework for JavaScript,” in
Proceedings of the 2013 9th Joint Meeting on Foundations of Software
Engineering, ser. ESEC/FSE 2013. New York, NY, USA: Association
for Computing Machinery, 2013, pp. 488–498. [Online]. Available:
https://doi.org/10.1145/2491411.2491447
[12] R. S. Herlim, Y. Kim, and M. Kim, “CITRUS: Automated unit testing
tool for real-world C++ programs,” in 2022 IEEE Conference on
Software Testing, Verification and Validation (ICST), 2022, pp. 400–410.
[13] D. Ivanov, A. Babushkin, S. Grigoryev, P. Iatchenii, V. Kalugin,
E. Kichin, E. Kulikov, A. Misonizhnik, D. Mordvinov, S. Morozov et al.,
“UnitTestBot: Automated unit test generation for C code in integrated
development environments,” in 2023 IEEE/ACM 45th International
Conference on Software Engineering: Companion Proceedings (ICSE-
Companion). IEEE, 2023, pp. 380–384.
[14] S. Rho, P. Martens, S. Shin, Y. Kim, H. Heo, and S. Oh, “Coyote C++:
An industrial-strength fully automated unit testing tool,” 2023.
[15] R. Baldoni, E. Coppa, D. C. D’elia, C. Demetrescu, and I. Finocchi, “A
survey of symbolic execution techniques,” ACM Comput. Surv., vol. 51,
no. 3, May 2018. [Online]. Available: https://doi.org/10.1145/3182657