JST An Automatic Test Generation Tool
Abstract—In this paper we present JST, a tool that automatically generates a high coverage test suite for industrial strength Java applications. This tool uses a numeric-string hybrid symbolic execution engine at its core which is based on the Symbolic Java PathFinder platform. However, in order to make the tool applicable to industrial applications the existing generic platform had to be enhanced in numerous ways that we describe in this paper. The JST tool consists of newly supported essential Java library components and widely used data structures; novel solving techniques for string constraints, regular expressions, and their interactions with integer and floating point numbers; and key optimizations that make the tool more efficient. We present a methodology to seamlessly integrate the features mentioned above to make the tool scalable to industrial applications that are beyond the reach of the original platform in terms of both applicability and performance. We also present extensive experimental data to illustrate the effectiveness of our tool.

I. INTRODUCTION

With the ubiquitous presence of software programs permeating almost all aspects of daily life, it has become a necessity to provide robust and reliable software. Traditionally, software quality has been assured through manual testing which is tedious, difficult, and often gives poor coverage of the source code especially when availing of random testing approaches. This has led to much recent work in the arena of formal validation and testing. One such formal technique is symbolic execution [1], [2], [3], [4] which can be used to automatically generate test inputs with high structural coverage for the program under test.

Symbolic execution is a model checking technique that treats input variables to a program as unknown quantities or symbols [5]. It then creates complex equations by executing all possible finite paths in the program with the symbolic variables. These equations are then solved through an SMT (Satisfiability Modulo Theories, [6]) solver to obtain test cases and error scenarios, if any. Thus this technique can reason about all possible values at a symbolic input in an application. Though it is very powerful, and can lead to detection of interesting input values that uncover corner case bugs, it is computationally intensive. Hence some techniques and methodology are needed to make symbolic execution scale to industrial size applications.

Even though traditional symbolic execution has mainly dealt with numeric input variables, industrial Java applications are different as almost all inputs to these applications are strings. While some string inputs are used and manipulated as strings inside these applications, other inputs are converted to integers or floating point numbers after extensive format checking in the string domain. Such back-and-forth conversions pose unique challenges to the symbolic execution tool which has traditionally handled only numeric constraints well. Even in the numeric domain traditional SMT solvers cannot handle non-linear equations as solving such equations is undecidable in general. However, the industrial examples that we deal with routinely have non-linear operations which we have to tackle during symbolic execution. Instead of giving up on such situations, we have devised some techniques that are able to circumvent the problem in certain situations which we describe in this paper. Finally, symbolic execution suffers from the path explosion problem which is even more acute in large industrial examples. In this paper we describe effective steps to eliminate large portions of the symbolic search tree thus making the analysis engine scale to large examples.

The tool that we present in this paper, JST (Java String Testing), is a comprehensive Java testing tool that addresses the above described issues in traditional symbolic execution engines. It has extensive support for string operations and complex interactions between strings and numbers. For example, the JST symbolic executor can handle virtually all Java string operations, regular expressions, as well as string-number conversions. In addition, we have also added support for symbolic container classes like Maps, Arrays, etc., and other numeric data structures like BigDecimal and BigInteger which are widely used in financial Java applications.

** Contributed to this work during an internship program at Fujitsu Laboratories of America, Sunnyvale, CA, USA.
¹ http://babelfish.arc.nasa.gov/trac/jpf/wiki/projects/jpf-core

JST is based on the Java PathFinder¹ (JPF) model checker and its symbolic execution extension, Symbolic PathFinder [2], [7]. JPF consists of a highly configurable and easy to extend toolkit which is the main reason for using it as our underlying platform. JPF implements its own Java virtual machine (JVM) to execute Java bytecode which we have extended to handle all Java primitive types and Strings. We have addressed many bottlenecks in this process through a variety of innovations, each of which is essential to achieve
scalability of the tool and the desired quality of results. We summarize these experiences in the following sections.

We organize the paper as follows. We first give the background and a motivating example. This is followed by the details of the symbolic execution framework and the string-numeric hybrid solver. Finally, after presenting experimental results, we end with the discussion and conclusion.

II. BACKGROUND AND MOTIVATING EXAMPLE

Symbolic execution is used to achieve high code coverage by reasoning on all possible input values. It characterizes each program path it explores with a path condition encoded as a conjunction of Boolean clauses. A path condition denotes a set of branching decisions. When the execution is finished, multiple path conditions may be generated, each corresponding to a feasible execution path with respect to the symbolic inputs. The solutions to path conditions are the test inputs that will assure that the program under test runs along a particular concrete path during concrete execution. Exhaustive testing is achieved by exploring all true paths.

In addition to comprehensiveness, symbolic execution has other benefits. First, it actually "executes" the code in a real environment, hence eliminating the need to build models, or apply abstract analysis. Second, it is highly automated, producing test cases without requiring the intervention from users [1], [8]. This makes it a good choice to apply symbolic execution to realistic programs such as web applications.

On the other hand, there exist various problems that preclude its widespread use. The main issue is poor scalability due to the problem of path explosion. This is because the engine creates a new path for every comparison or branching instruction and may create thousands of paths. What is worse, each branching or assertion check (e.g. on memory out-of-bound accesses) in the program will invoke the solver for a satisfiability check. The vast number of invocations to the solver makes constraint solving the main bottleneck of a symbolic execution tool. This necessitates good solving techniques, which is one focus of this work. Specifically, we have developed an in-house string-numeric solver to cater to the special needs of Java web applications.

Now, consider an example with its corresponding branching tree shown in Figure 1. As the tree shows, at each branching point (if statement) the path condition splits into two, and each branch extends with new constraints. Notation L(q) represents the length of string q. This example first checks whether the symbolic input s starts with '-' such that s may represent a negative number. If so, it checks whether s is of a popular format represented by a regular expression (e.g. a string starting with '-', followed by at least one digit, and a comma, and then 3 digits). Then it checks whether ',' appears in s. If not then s is converted into a BigDecimal number. Otherwise, the substring after character ',' is taken and converted into an integer x. Then the computation continues by checking whether x is not less than 100. This is a typical computation sequence in web applications. The application first performs format checking on input strings. Then it converts the strings to numbers and finally, branches over arithmetic operations on those numbers. For example, a valid test case for path 4 is "-1,000,200". It is not trivial to automatically cover all the branches through constraint solving, especially when the string formats and the numeric computation are very complicated.

The satisfiability of string+numeric constraints is an undecidable problem in general ([9] shows that a small subset of string operations will result in undecidability of a hybrid set of constraints). Hence practical solutions are important to tackle string-intensive programs. Existing string solvers (see [10] for a comparison of the automaton-based approaches) cannot fulfill our needs completely. These solvers provide no support or only very limited support for hybrid constraints, i.e. non-trivial combinations of numeric constraints and string constraints. In contrast, our solver supports almost all Java string operations and, more importantly, hybrid constraints. In addition, we use various techniques to control and optimize the solving during symbolic execution rather than use the solver as a black box.

The motivating example in Figure 1 involves the following techniques:

1) s.matches(...) checks whether s is of a specific format, and f=Integer.parseInt(s) checks whether s is an integer. For these cases, we need to intersect the automaton representing s and the ones modeling the regular expression and the syntax of valid integers. This may incur complicated automaton refinements.
2) s.lastIndexOf(...) may return a symbolic integer whose value is used to break s into multiple parts. Some parts may be converted into numbers (e.g. through parseInt) and then used in intensive computations (e.g., x>=100). Hence the path condition may contain hybrid constraints manipulating strings and numbers in tricky ways. Our solver handles these constraints by connecting the string and numeric domains through guided iterations, relational graphs, refinement rules, and other advanced techniques.
3) f is further converted into a specific data structure BigDecimal, which stores the numeric values in ad-hoc format. Then some rounding operations are applied. We model many widely-used data structures.
4) In order to make the tool scale, we use relaxed solving, a two-tier execution and solving technique, to handle hybrid constraints at run-time. Specifically, we separate the two domains and avoid iterations between them when checking the satisfiability of an intermediate branch, and only apply the iterations at the end of a path. This key optimization enables us to achieve a good balance between accuracy and performance.

Among these, Item 4 has not been used in any prior work, and Item 2 has been studied to a very limited extent. The automaton refinement scheme in Item 1 is quite different from existing automaton-based ones. One of the first attempts at solving hybrid constraints using automata in the string domain is described in our earlier work [11]. This paper
 1) String s; // symbolic input
 2) if (s.charAt(0)=='-') {
 3)   if (s.matches("-\\d+,\\d{3}"))
 4)     ...; // path 1
 5)   else {
 6)     int i=s.lastIndexOf(',');
 7)     if (i==-1) {
 8)       long f=Long.parseLong(s);
 9)       BigDecimal f1=BigDecimal.valueOf(f,2);
10)       BigDecimal d1=new BigDecimal(-1);
11)       if (f1.divide(new BigDecimal(3),BigDecimal.ROUND_HALF_UP).equals(d1))
12)         ...; // path 2
13)       else
14)         ...; // path 3
15)     } else {
16)       String s1=s.substring(i+1);
17)       int x=Integer.parseInt(s1);
18)       if (x>=100)
19)         ...; // path 4
20)       else
21)         ...; // path 5
22)     }
23)   }
24) } else
25)   ...; // path 6

Branching tree (each branch is labeled with the source line it comes from):

PC0: true
Line 2:
  PC1: s.charAt(0)='-' ∧ s.subString(0,1)="-"
  PC2: s.charAt(0)≠'-' ∧ s.subString(0,1)≠"-"  (path 6)
Line 3:
  PC3: PC1 ∧ s.matches("-\d+,\d{3}")  (path 1)
  PC4: PC1 ∧ s.notMatches("-\d+,\d{3}")
Line 7:
  PC5: PC4 ∧ s.notContains(",") ∧ i=-1
  PC6: PC4 ∧ s.subString(i,i+1)=","
Line 11:
  PC7: PC5 ∧ f=VOF(s) ∧ f1=(f/10^2) ∧ Round(f1/3)=d1  (path 2)
  PC8: PC5 ∧ f=VOF(s) ∧ f1=(f/10^2) ∧ Round(f1/3)≠d1  (path 3)
Line 18:
  PC9: PC6 ∧ s1=s.subString(i+1) ∧ VOF(s1)≥100  (path 4)
  PC10: PC6 ∧ s1=s.subString(i+1) ∧ VOF(s1)<100  (path 5)

Fig. 1. Our motivating example and its corresponding branching tree with path conditions.
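To make the six paths of Fig. 1 concrete, the following is a minimal runnable sketch of the example that returns the number of the path taken for a given concrete input. The class name MotivatingExample and the method path are hypothetical names introduced here for illustration. One deviation from the figure is an assumption of this sketch: the BigDecimal comparison uses compareTo rather than equals, because equals also compares scale and would never report -1.00 as equal to -1.

```java
import java.math.BigDecimal;
import java.math.RoundingMode;

/** Concrete rendering of the Fig. 1 example (hypothetical helper, not part of JST). */
class MotivatingExample {

    /** Returns the path number (1 to 6) that the input s exercises. */
    static int path(String s) {
        if (s.charAt(0) == '-') {
            if (s.matches("-\\d+,\\d{3}")) {
                return 1;
            }
            int i = s.lastIndexOf(',');
            if (i == -1) {
                long f = Long.parseLong(s);
                BigDecimal f1 = BigDecimal.valueOf(f, 2);   // f / 10^2, scale 2
                BigDecimal d1 = new BigDecimal(-1);
                BigDecimal q = f1.divide(new BigDecimal(3), RoundingMode.HALF_UP);
                // Assumption: compareTo ignores scale, unlike equals in the figure.
                return q.compareTo(d1) == 0 ? 2 : 3;
            }
            String s1 = s.substring(i + 1);
            int x = Integer.parseInt(s1);
            return x >= 100 ? 4 : 5;
        }
        return 6;
    }

    public static void main(String[] args) {
        String[] inputs = {"-1,000", "-300", "-100", "-1,000,200", "-1,000,050", "5"};
        for (String in : inputs) {
            System.out.println(in + " -> path " + path(in));
        }
    }
}
```

Each input in main is one hand-picked test case per path; a symbolic executor would instead derive such inputs by solving the path conditions PC3, PC7, PC8, PC9, PC10, and PC2.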
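[Figures: a relational graph over string variables s1, s2, and s3 (edge labels include ≠ "xy", <, ≥, startsWith, and contains), and an automaton diagram with states marked "Accept" and "New Accept".]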
Fig. 4. Model operation substring(2, 4) using automaton.

String constraints depict the relation between strings (and numbers). The dk.brics.automaton package does not address string constraints directly. Hence we enhance it to refine string values according to given string constraints. This procedure includes (1) automaton refinement, (2) fix-point calculation, and (3) optimizations to speed up the convergence. In the motivating example, for constraint s.charAt(0)=='-', we intersect s with an automaton accepting any string starting with character '-'. Later on, since a portion of s is converted into an integer using parseInt, we intersect this portion with the automaton modeling all valid integers; we also need to refine s based on the updated portion, i.e. intersect s's automaton with the one that accepts strings ending with this portion. Basically, if a constraint enforces a relation over strings s1 and s2, e.g. s1.beginsWith(s2), then we refine (1) s1 by enforcing that it starts with s2, and (2) s2 by enforcing that it is the beginning part of s1. This process is repeated until no more refinement is possible and a fix-point (on the possible string values) is reached. We will skip the details here, but the basic algorithm is similar to abstract interpretation [15], e.g. for abstract domains such as integer intervals.

In addition to automaton refinement, we apply special handling to some cases where the pure automaton model is inadequate. Take constraint s1 < s2 ∧ s2 < s3 ∧ s3 < s1 for example. It is difficult to model s1 < s2 using automata, as the automaton capturing the < relation may have a huge size. Furthermore, this constraint should be proven to be false immediately without involving any automaton computation. In our implementation, we introduce two extra models for such constraints. In the first one, each string variable is associated with a "number" representative, e.g. s1's representative is integer i_s1, such that this constraint can be falsified immediately by the numeric solver through checking i_s1 < i_s2 ∧ i_s2 < i_s3 ∧ i_s3 < i_s1. This model allows us to prove unsat cases quickly.

In the second model, we maintain a relational graph with the string variables and their relations as the nodes and edges

C. Numeric and String Solving Interactions

String constraints and numeric constraints must agree on the same value of every shared symbolic variable. Roughly, we have the numeric solver N give some candidate values, and ask the string solver S whether these values violate the string constraints. If no violation is found, then these values are valid for both domains. Otherwise, S gives some feedback, like the conflicts it learns, to N, and asks for other candidates. Next, N adds the conflicts and the negation of the current assignment, and then searches for another valid assignment using some heuristics. Note that only the concrete values of shared variables will be passed from N to S, and N can learn some string conflicts and pass them to S too. For example, numeric constraint s.length() > 5 enforces that the corresponding automaton should be intersected with one accepting strings of length > 5.

Since the two domains interact with each other mainly through concrete values, it is important to avoid using fruitless values. During the iteration we exchange the constraints learned from each domain (called interactive constraints) so as to speed up the convergence. For example, consider string constraint s1 = s2.trim(). A numeric constraint L(s1) ≤ L(s2) is added into N, where L(s1) and L(s2) are (symbolic) integer variables representing the lengths of s1 and s2. In our implementation, interactive constraints are modeled in a RULE library which we describe next.

Interactive Constraint Propagation. The RULE library includes commonly occurring patterns that we observed when applying JST to a wide range of Java applications. These patterns are particularly useful in the web and financial application domain, especially those from S to N. Table I shows an excerpt of these patterns. The "a" rules have been used by other solvers [9], [16], [17], while the "b" rules are new in JST. One example is (s.startsWith('-') ∧ n=Integer.valueOf(s)) where s is a string variable and n is an integer. This pattern leads to numeric constraint n<0. For another example, if the numeric solution found for VOF(s) is 5, then string constraint s.equals("5") is enforced.
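The candidate-and-feedback loop between N and S can be illustrated with a toy sketch. Everything here is an assumption made for illustration, not JST's implementation: the class HybridIterationSketch, its methods, the bounded search space, and the fixed enumeration order that stands in for the numeric solver's heuristics. The shared variables are len = L(s1) and x, under the hybrid constraint x = parseInt(s1) ∧ x ≥ 100. The optional rule is an S-to-N style length constraint: x = parseInt(s1) ∧ x ≥ 100 implies L(s1) ≥ 3.

```java
import java.util.HashSet;
import java.util.Set;

/** Toy model of the N/S iteration; all names are hypothetical. */
class HybridIterationSketch {

    /** Toy numeric solver N: first unblocked (len, x) with x >= 100 and len >= minLen. */
    static int[] nextCandidate(int minLen, Set<String> blocked) {
        for (int x = 100; x <= 999; x++) {
            for (int len = minLen; len <= 5; len++) {
                if (!blocked.contains(len + ":" + x)) {
                    return new int[] {len, x};
                }
            }
        }
        return null; // bounded search space exhausted
    }

    /** Toy string solver S: a digit string of length len whose value is x, or null on conflict. */
    static String solveString(int len, int x) {
        String digits = Integer.toString(x);
        if (digits.length() > len) {
            return null; // x cannot be written with only len characters
        }
        StringBuilder sb = new StringBuilder();
        for (int k = digits.length(); k < len; k++) {
            sb.append('0'); // leading zeros keep parseInt(s1) == x
        }
        return sb.append(digits).toString();
    }

    /** Number of N-to-S round trips until both domains agree, or -1 if none is found. */
    static int iterations(boolean useLengthRule) {
        int minLen = useLengthRule ? 3 : 1; // rule: x >= 100 implies L(s1) >= 3
        Set<String> blocked = new HashSet<>();
        for (int iter = 1; iter <= 2000; iter++) {
            int[] c = nextCandidate(minLen, blocked);
            if (c == null) {
                return -1;
            }
            if (solveString(c[0], c[1]) != null) {
                return iter;
            }
            blocked.add(c[0] + ":" + c[1]); // negation of the failed assignment
        }
        return -1;
    }
}
```

In this toy setting, iterations(false) needs three round trips (lengths 1 and 2 are rejected before length 3 succeeds), while iterations(true) succeeds on the first candidate, which is the kind of saving the interactive constraints are meant to provide.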
TABLE I
SOME RULES IN THE RULE LIBRARY. HERE s1, s2, AND s3 ARE STRING VARIABLES; i AND j ARE INTEGER VARIABLES; n IS A NUMERIC VARIABLE; c IS A POSITIVE INTEGER CONSTANT.
For illustration, consider path 4 in the motivating example. We give below its path condition pc where some additional "S to N" constraints (#1-#4) have been added into the numeric domain. To solve this pc, the numeric solver N first obtains a valid assignment to numeric variables, e.g. L(s)=2, L(s1)=1, i=0, and x=100, then passes these values to the string solver S. Unfortunately S cannot find a solution since constraint x = parseInt(s1) is unsat. It can ask N to perform new iterations until s1's length is at least 3. Or, the solver uses a rule to derive from "x=parseInt(s1) ∧ x≥100" an additional constraint L(s1)≥3. Moreover, the automata enforce that s's length is at least 2 more than that of s1. With these two new constraints the solver can quickly find a valid solution, e.g. s="-,100", s1="100", i=1, and x=100.

Numeric                          String
(#1: L(s) > 0)                   s.charAt(0) = '-'
(#2: i = −1 ∨ 0 ≤ i < L(s))      i = s.lastIndexOf(',')
(#3: i + 1 + L(s1) = L(s))       s1 = s.substring(i + 1)
(#4: L(s1) > 0)                  x = parseInt(s1)
i ≠ −1 ∧ x ≥ 100                 ¬(s.matches("-\d+,\d{3}"))

When S cannot find a valid solution, additional constraints are passed to N and then a new iteration starts by throwing away previous candidate values. For example, suppose that numeric shared variables a, b, and c with the solutions 5, -7, 0 are found unsat in S, then we will need to add the numeric constraint ¬(a = 5 ∧ b = −7 ∧ c = 0), or equivalently a ≠ 5 ∨ b ≠ −7 ∨ c ≠ 0. Clearly this may lead to an exponential number of case splittings. To mitigate the blow-up, we apply an optimization through utilizing a feature provided by the Yices solver: Yices returns satisfiable values closest to zero for numeric variables. Hence we can safely try three narrower cases: a > 5; b < −7; and c ≠ 0, which searches the same space but can converge faster. We also consider another case, (a > 5 ∧ b < −7 ∧ c ≠ 0), in hope of finding a valid solution rapidly. Our experience demonstrates that such simple improvements and heuristics can have considerable effects in practice.

D. Relaxed Solving and Execution

One of the main bottlenecks of our hybrid solver is that it may need many iterations to find a solution or derive unsat. The situation becomes worse when the solver is invoked multiple times. Suppose a program contains n satisfiable branches, then $O(2^n)$ paths can be generated and $O(2^n)$ queries are invoked on the solver. If m iterations are needed for each query, then $O(2^n m)$ iterations happen in total. Fortunately, we have observed in real applications that many path constraints will turn into unsat in subsequent computations, quickly leading to false paths. We also know that only the solved values at the end of a symbolic path will contribute to valid test cases.

Based on these observations, we apply a two-phase solving technique during symbolic execution. For the intermediate branching nodes, we use relaxed solving which checks only whether the string and the numeric constraints are sat without multiple iterations. At the end of each path, we use the usual regular solving with full iterations and constraint propagation. This method dramatically improves the performance since we not only avoid expensive solving for intermediate nodes but are able to quickly cut out many false paths using the over-approximation techniques of relaxed solving.

The relaxed solving process is similar to the regular one but without multiple iterations. It starts with solving the numeric constraints. If a solution is found, it passes additional constraints (e.g. those in Table I) but not the values of shared variables to the string solver. Note that these additional constraints represent a strict over-approximation of the set of solutions possible in the numeric domain. If the string solver cannot find a solution, the path condition pc is unsatisfiable because no matter what other solution is passed from the numeric domain it will satisfy the over-approximation constraint and hence be unsat in the string domain. Otherwise, pc is regarded to be sat. Thus in the relaxed mode the two domains do not communicate through concrete values.

The main disadvantage of relaxed solving is that it may explore infeasible paths whose path conditions are sat in the relaxed mode while unsat in the regular mode. This seems to increase the number of intermediate paths. However, this happens rarely in practice since (1) we still use the string solver S and the numeric solver N to rule out most infeasible paths, and (2) an infeasible path is often falsified by subsequent relaxed solving at intermediate nodes. Thus relaxed solving can be very effective in pruning infeasible paths early.
[Figure 5: two branching trees rooted at a node with path condition pc. In the "Regular" tree every branch Si ↔n Ni or ¬Si ↔n ¬Ni (i = 1, ..., k) is solved with n iterations. In the "Relaxed" tree the intermediate branches are solved with a single iteration (Si ↔1 Ni, ¬Si ↔1 ¬Ni). Leaves are marked • (invalid) or ◦ (valid).]

Fig. 5. Comparing regular solving and relaxed solving.

Consider Figure 5 which contains two branching trees starting from a node with path condition pc. Each tree branches over a sequence of conditions S1 ↔m N1, ..., Sk ↔m Nk, where Si ↔m Ni denotes the ith condition such that its numeric constraint Ni and string constraint Si iterate m times. When m = 1 the relaxed mode is used. The regular mode assumes m = n where n is the average iteration number. The "regular" tree is of height k, incurring $(2^{k+1} - 2)n$ iterations in total. Suppose all the leaf nodes except the rightmost one are invalid (marked by •) and the rightmost path is valid (marked by ◦). In this case exploring the others is fruitless. This can be avoided through the relaxed mode. Assume that in all the paths except for the rightmost one, Si and Ni may conflict with Sj and Nj for i ≠ j, i.e. the paths become unsat in the relaxed solving phase. Hence there is no need to perform expensive iterative solving on those paths.

The number of iterations now is reduced to $2^{k+1} - 2 + kn$, an improvement of $\frac{(2^{k+1}-2)n}{2^{k+1}-2+kn} \approx n$ times (for large k) over the regular mode. That is, if the average iteration number is 1000, then relaxed solving can produce a speed-up of 1000x. This is validated in the experimental results section.

E. Handling Some Nonlinear Operations

Some applications that we tested generate path conditions involving nonlinear numeric constraints. Solving them is undecidable in general and beyond the capabilities of SMT solvers. For these constraints, we tried the Coral solver [12] which is a randomized solver that uses machine-learning algorithms to search solutions for complex non-linear constraints.

Unfortunately, using such a random solver considerably slows down the solving process (it has been reported that Coral can be > 100x slower than SMT solvers for linear constraints [12]). To mitigate this problem, we transform some nonlinear constraints to equivalent linear ones, which can then be solved using faster SMT solvers. One example is the round method in the Java class java.lang.Math which returns the closest integer to the given floating point number. If the nearest integers to the given number are equidistant, this operation returns the greater integer (e.g. round(2.5) becomes 3). Specifically, for the operation round(e), we add the following linear constraint to the numeric set, where e is a real variable, and x1, x2, and result are introduced integer variables. result represents the value of round(e), and it replaces all its occurrences in numeric constraints. When e is positive, the constraint enforces e − 0.5 < x1 ≤ e + 0.5, e.g. when e = 1.6, constraint 1.1 < x1 ≤ 2.1 implies that result = 2. Variable x2 is for the case when e is negative.

((e − 0.5 < x1) ∧ (e + 0.5 ≥ x1)) ∧
((e − 0.5 ≤ x2) ∧ (e + 0.5 > x2)) ∧
(result = (if (e > 0) x1 x2))

Similarly, we replace other operations in the Math class, such as ceil and floor, with their equivalent linear constraints. Many rounding methods in the BigDecimal class are also handled in this manner. While most conversions are not complicated, this represents one important enhancement we use to scale JST to handle industrial applications. It is obvious that this transformation can only work for non-linear functions that are piecewise linear. Otherwise we still need to use Coral and abort in case of a time-out.

V. EVALUATION

We evaluate JST on three string-intensive benchmark examples whose characteristics are described in Table II. They represent many other similar applications we have tested. For each benchmark we create a driver and some stubs to produce a closed system on which JST can be run. Since the first two benchmarks are very large and rely on multiple external libraries, packages and jars, and a lot of source code is missing, we focus on the parts where the core logic is implemented. Despite their small sizes (1-4k lines), they represent the most complicated string and numeric computations in the applications. Note that, in order to test the core logic, we have to symbolically execute the whole application, leading to huge symbolic paths through the complete application. We only measure the coverage of the core logic although many other parts of the applications are also covered (many of which are not sensitive to the symbolic inputs). We run the experiments on an Ubuntu Linux machine with a quad-core 3.4 GHz Intel Core i7 processor and 8 GB of RAM.

We could not compare our results with any other freely available tool because we found all of them to be completely inadequate in handling all the String operations that existed in our examples and the specific interactions between the numeric and string domain. There is also currently no standard format for expressing String constraints. Thus it is not possible to translate all the complex constraints to operations that other String-Numeric constraint solving tools can understand.

TABLE II
BENCHMARK STATISTICS

Table III shows the result of running JST on Example A while gradually increasing the number of symbolic inputs to 4. We also present some results for different combinations of
symbolic variables. All these inputs are string variables though some are converted to integers and double values within the application. The number of true paths in the table corresponds to the actual test cases generated at valid end states in the program. However, there are also some unhandled exception paths that are encountered during the symbolic execution like NumberFormatException, ArithmeticException, etc. which are shown in Column 3. These exception paths usually indicate that the application is missing the error handling code. Manual investigation indicates that a small portion of them are "real" issues (wrong assumptions or bugs). In Column 4 the maximum number of iterations required between the numeric and string domains to arrive at a valid solution is shown. As expected this number does increase as we have more symbolic inputs interacting with each other but due to the optimizations mentioned in subsection IV-B, JST is able to keep the number of iterations to a reasonable limit. There are no unsolved constraints in this case and with 4 symbolic inputs we can achieve a greater than 80% code coverage on the core logic part of the application. We will discuss in Section VII the reasons for the coverage not attaining 100%.

TABLE III
EXPERIMENTAL RESULTS FOR EXAMPLE A

#Sym.Vars      #Paths (True)   #Paths (Exception)   Time     #Iterations
1              6               1                    8s       1
2 (subset 1)   27              5                    16s      1
2 (subset 2)   35              9                    55s      2
3 (subset 1)   595             16                   3:24m    5
3 (subset 2)   493             25                   4:01m    5
4              1,971           95                   11:08m   24

Table IV shows the effectiveness of the relaxed solving described in subsection IV-D. In this experiment we run multiple experiments on Example B first by using the relaxed mode (Columns 2-4) and subsequently turning this feature off (Columns 5-7). The hybrid solver is allowed 2,000 iterations before giving up. The complete time out of a run is set to 48 hours. Again we gradually increase the number of symbolic variables from 1 to 5 and again we report results for multiple combinations of the same number of variables. In the relaxed mode run for the 5 variable case, JST finishes after 1 day with a total of about 28,000 test cases. Again, we can achieve about 80% code coverage on the core logic code. All valid path conditions can be solved or proved unsat and the maximum number of iterations needed among the solvers is 268. However, when we turn off that feature the effect is a dramatic reduction in effectiveness of JST. JST times out with as few as 3 symbolic variables and even for the cases it finishes, it is unable to find many solutions within the stipulated 2,000 iterations. The reasons for this are twofold (see Section IV-D too). First, the relaxed mode is able to prove some intermediate path conditions unsat in the middle of the symbolic execution tree thus pruning off large portions of the symbolic search tree. This speeds up the exploration of the complete symbolic execution tree. Second, even at a true leaf node, the rule based constraint exchange between the numeric and string domains is able to quickly converge to a solution whereas without that feature the hybrid solver often hits the 2,000 iteration limit without finding a solution. Thus a lot of time is wasted without producing any new results. Note that the relaxed mode does not lead to over-approximations since full solving is applied on the leaf nodes to eliminate false "tentative" paths.

TABLE IV
EXPERIMENTAL RESULTS FOR EXAMPLE B. #T.P. AND #E.P. DENOTE (#TRUE PATHS) AND (#EXCEPTION PATHS) RESPECTIVELY. IN RELAXED MODE, #T.P. CONTAINS "#FINAL PATHS/#TENTATIVE PATHS"

               With Relaxed Mode           Without Relaxed Mode
#Sym.Vars      #T.P.    #E.P.    Time      #T.P.   #E.P.   Time
1              3/5      9        7s        3       9       35s
2 (subset 1)   15/26    15       20s       15      15      2:37h
2 (subset 2)   20/36    18       22s       20      18      3:05h
3 (subset 1)   178      183      7:47m     -       -       TO
3 (subset 2)   253      301      9:42m     -       -       TO
4 (subset 1)   890      2,535    4:31h     -       -       TO
4 (subset 2)   1,430    3,242    5:54h     -       -       TO
5              7,156    20,530   28:50h    -       -       TO

Finally, we present results to demonstrate the effectiveness of non-linear constraint modeling as described in subsection IV-E. In this case we run experiments on Example C which is carved out of a second financial application for unit testing. These classes consist of a lot of different types of rounding operations on the Java BigDecimal data type. We make multiple runs with the non-linear modeling turned on (Table V, Columns 2-4) and then off (Table V, Columns 5-7). When the non-linear modeling is turned on all the resulting path conditions can be solved by the Yices [13] solver but when it is turned off we have to invoke the non-linear solver Coral [12]. Again it is evident from the table that we can get orders of magnitude improvement in runtimes and dramatic improvements in the quality of results. Since this is unit testing, with all input variables symbolic, we can obtain 100% code coverage in the whole example. However, when we turn off this feature we run into frequent time-outs inside Coral. This is to be expected as the Coral algorithm is by definition incomplete and weaker than SMT solving which relies on efficient algorithms for solving Boolean and numeric constraints. Of course there will be cases when it is impossible to model non-linear constraints with piecewise linear operations. In such cases we do invoke Coral as a last resort and give up in case Coral times out without finding a solution.

TABLE VII
DEVELOPMENT SIZE OF JST. WE COUNT ONLY OUR EXTENSIONS.

Main Component      Extend Over     l.o.c
Execution Engine    Sym. JPF [2]    23,418
Automaton Package   JSA [14]        4,546
Hybrid Solver       –               8,784
Nonlinear Solver    Coral [12]      1,180
Others              –               1,500
Total                               39,428

TABLE V
EXPERIMENTAL RESULTS FOR EXAMPLE C. #T.P. AND #E.P. DENOTE THE NUMBERS OF TRUE PATHS AND EXCEPTION PATHS RESPECTIVELY.