PyCT: A Python Concolic Tester
1 Introduction
Python has been widely adopted for developing modern applications such
as web applications, data analytics, machine learning, and robotics, owing to its
high-level interactive nature and its maturing ecosystem of scientific libraries. As
a general-purpose language, it is increasingly used not only in academic settings
but also in industry. While it is an appealing choice for algorithmic development
and exploratory data analysis, a systematic approach to analyzing the behaviors of
Python programs is essential for software security. Systematic input generation
that covers all (or the most critical) program behaviors is critical for
software testing and debugging. While a concrete execution can only explore a
specific path, randomly generated inputs are unlikely to hit a honeypot and in most
© Springer Nature Switzerland AG 2021
H. Oh (Ed.): APLAS 2021, LNCS 13008, pp. 38–46, 2021.
https://doi.org/10.1007/978-3-030-89051-3_3
of data types including integer, string, and range (Sect. 3). We also propose a
new method that upcasts constant values to prevent unnecessary downcasting
(Sect. 4). We evaluate PyCT on a well-known GitHub project (see footnote 1) of
algorithm implementations, and the experiment shows that these two optimizations
lead to significant improvements in code coverage. With more member functions
supported, the coverage rate rises from 71.55% to 80.20%. It rises further to
85.68% once constant upcasting is also implemented. (There are different ways to
define code coverage; in this paper, we consider the line coverage of programs.)
1 We use the TheAlgorithms/Python project (https://github.com/TheAlgorithms/Python),
the 4th top-starred Python project on GitHub, which provides plenty of common
algorithm implementations for learning purposes.
1st Iteration: The integer variable x is the input argument of function isPalindrome.
Initially, we randomly pick an integer value for x, say 0, and create the
concolic integer cx = (0, x) as the input. At this point T is an empty tree and Q is
an empty queue. After executing lines 2 and 3, we have cy = (0, 0) and cz = (0, x). In
line 4, the Boolean expression z > 0 is encountered. Since cz.exp = x and the tree
T is empty, the node n1 with label ψ1 := (x > 0) is inserted into T as the root (see
Fig. 2). Because cz.val = 0 does not satisfy the condition, the current execution
takes the false branch. We push the formula ψ1, whose models correspond
to input values that take the true branch, into the queue Q (the result is Q1 in the
figure). Then line 8 is executed. From the condition y == x, we create the
constraint ψ2 := (cy.exp == cx.exp) = (0 == x). Since we arrive from the
left (false) branch of n1, we add the node n2 with label ψ2 into T as the left child of
n1. Because cy.val = cx.val = 0 satisfies ψ2, the execution takes the true branch, and
we push the formula ¬ψ1 ∧ ¬ψ2, which corresponds to taking n1's left branch
followed by n2's left branch, into Q (obtaining Q2 in the figure). Notice that the
models of both formulae in Q2 lead to unexplored program lines, as they take a
different branch direction than the current execution did. Finally, line 9 is
executed and the first iteration finishes.
 1 def isPalindrome(x: int):
 2     y = 0
 3     z = x
 4     while z > 0:
 5         y = (lshift(y)
 6              + z % 10)
 7         z = rshift(z)
 8     if y == x:
 9         return True
10     else:
11         return False
12 def lshift(x):
13     return x * 10
14 def rshift(x):
15     return x // 10

Fig. 2 (tree T; F = false branch, T = true branch):

n1: ψ1 := (x > 0)
  F: n2: ψ2 := (0 == x)
       F: x := −1
       T: x := 0
  T: n3: ψ3 := (x//10 > 0)
       F: n4: ψ4 := (x%10 == x)
            F: No solutions.
            T: x := 1
       T: . . .

Fig. 2 (queue Q after each step):

Q0: (empty)
Q1: ψ1
Q2: ψ1, ¬ψ1 ∧ ¬ψ2
Q3: ¬ψ1 ∧ ¬ψ2, ψ1 ∧ ψ3
Q4: . . .
2nd Iteration: The process halts when either the queue Q is empty or all lines of
P are covered after an iteration. Since neither halting condition is satisfied,
the process continues (until a given timeout is reached). At the beginning of the
2nd iteration, the formula ψ1 is removed from the front of Q, and the new initial
value of cx is set to (1, x) since x = 1 is a solution of ψ1. (In the
implementation, the formula is sent to an SMT solver to obtain a solution.) We then
repeat the procedure as before. In this example, the entire procedure stops after
three iterations, when all program lines are covered. We refer the reader to our
appendix for further details.
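The bookkeeping in the walkthrough above can be sketched in a few lines of Python. This is only an illustrative toy, not PyCT's implementation: the classes ConcolicInt and ConcolicBool and the helper run are hypothetical names, symbolic expressions are kept as plain strings, and no SMT solver is involved. Each concolic value is a pair (val, exp), and every branch decision is appended to a trace as (expression, outcome).

```python
import operator


class ConcolicBool:
    """A branch condition: concrete truth value plus symbolic expression."""

    def __init__(self, val, exp, trace):
        self.val, self.exp, self.trace = val, exp, trace

    def __bool__(self):
        # Called by `if` / `while`: record which way the branch went.
        self.trace.append((self.exp, self.val))
        return self.val


class ConcolicInt:
    """Hypothetical concolic integer (val, exp); exp is kept as a string."""

    def __init__(self, val, exp, trace):
        self.val, self.exp, self.trace = val, str(exp), trace

    def _lift(self, other):
        # Wrap plain ints so that both operands carry an expression.
        if isinstance(other, ConcolicInt):
            return other
        return ConcolicInt(other, other, self.trace)

    def _binop(self, other, sym, fn, cls, swap=False):
        lhs, rhs = self, self._lift(other)
        if swap:  # reflected operator: the plain value was on the left
            lhs, rhs = rhs, lhs
        return cls(fn(lhs.val, rhs.val), f"({lhs.exp} {sym} {rhs.exp})", self.trace)

    def __add__(self, o): return self._binop(o, "+", operator.add, ConcolicInt)
    def __radd__(self, o): return self._binop(o, "+", operator.add, ConcolicInt, swap=True)
    def __mul__(self, o): return self._binop(o, "*", operator.mul, ConcolicInt)
    def __mod__(self, o): return self._binop(o, "%", operator.mod, ConcolicInt)
    def __floordiv__(self, o): return self._binop(o, "//", operator.floordiv, ConcolicInt)
    def __gt__(self, o): return self._binop(o, ">", operator.gt, ConcolicBool)
    def __eq__(self, o): return self._binop(o, "==", operator.eq, ConcolicBool)


def lshift(x):
    return x * 10


def rshift(x):
    return x // 10


def is_palindrome(x):
    y = 0
    z = x
    while z > 0:
        y = lshift(y) + z % 10
        z = rshift(z)
    if y == x:
        return True
    return False


def run(value):
    """Execute is_palindrome on the concolic input (value, x); return result and trace."""
    trace = []
    result = is_palindrome(ConcolicInt(value, "x", trace))
    return result, trace


result, trace = run(0)
print(result, trace)
```

With input 0 the trace holds two entries, one for z > 0 (false) and one for the comparison in line 8 (true), mirroring ψ1 and ψ2. Because y is a plain Python int here, Python falls back to the reflected comparison and the second expression is recorded as (x == 0) rather than (0 == x). Negating an entry of the trace yields the kind of formula that the walkthrough pushes into Q.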
Table 1. Member functions of integer and string types and their PyCT implementation.

Integer
  Complete:    __abs__, __add__, __bool__, __ceil__, __eq__, __floor__, __ge__,
               __gt__, __le__, __lt__, __mul__, __ne__, __neg__, __pos__, __radd__,
               __rmul__, __round__, __rsub__, __sub__, __trunc__, conjugate,
               denominator, imag, numerator, real
  Partial:     __floordiv__, __mod__, __rfloordiv__, __rmod__, __rtruediv__,
               __truediv__
  Unsupported: __and__, __divmod__, __format__, __hash__, __index__, __invert__,
               __lshift__, __rshift__, __or__, __pow__, __rand__, __rdivmod__,
               __rlshift__, __ror__, __rpow__, __rrshift__, __rxor__, __xor__,
               as_integer_ratio, bit_length, to_bytes

String
  Complete:    __add__, __contains__, __eq__, __iter__, __len__, __mul__, __ne__,
               __rmul__, count, find, index, isalpha, isdigit, islower, isupper,
               lower, replace, upper
  Partial:     __ge__, __getitem__, __gt__, __le__, __lt__, __mod__, endswith,
               isalnum, isnumeric, lstrip, rstrip, split, splitlines, startswith,
               strip
  Unsupported: __format__, __hash__, __rmod__, capitalize, casefold, center,
               encode, expandtabs, format, format_map, isascii, isdecimal,
               isidentifier, isprintable, isspace, istitle, join, ljust, partition,
               rfind, rindex, rjust, rpartition, rsplit, swapcase, title,
               translate, zfill
because they are not expressible in T_SLIA. One major class of this type is the
bitwise operations, which usually require the SMT bit-vector theory to be
modeled precisely and efficiently. Another class consists of functions whose
return types are not expressible in T_SLIA. For example, “as_integer_ratio” returns
a tuple and “to_bytes” returns a bytes object. Our preliminary study of the top 5
starred GitHub projects suggests that, in a total of 421,214 lines of code, the
unsupported functions (not counting the magic functions, which cannot be found by
simple pattern matching) occur only 381 times (less than 0.1%).
The Range Class. In Python, “range()” is often used in the “for” statement and is
one of the most frequently used functions. However, different from many other
programming languages, in Python, “range” is a class like string and integer and
therefore refers to a set of member functions, listed in Table 2. To increase the
coverage, some member functions of range type are supported in PyCT.

Table 2. Member functions of range type.

  Complete:    __init__, __contains__, __iter__, __len__, count, index
  Partial:     __getitem__
  Unsupported: __bool__, __eq__, __ge__, __gt__, __hash__, __le__, __lt__, __ne__,
               __reversed__

Consider the statement “for i in range(a, b, c): S”, where S is a Python statement.
When c > 0, the expression range(a, b, c) corresponds to the sequence
π : a, a + c, . . . , a + c ∗ n, where n is the greatest non-negative integer s.t.
a + c ∗ n < b. So the statement S is covered when π is a non-empty sequence. The
case of c < 0 is symmetric (Python rejects c = 0 with a ValueError).
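As a quick check of the definition of n, the following sketch computes the last index in closed form for a positive step and compares the resulting sequence with Python's built-in range. The helper names last_index and sequence are ours, chosen only for illustration.

```python
def last_index(a, b, c):
    """Greatest n >= 0 with a + c*n < b for a positive step c, or None if none exists."""
    if c <= 0:
        raise ValueError("this sketch only covers c > 0")
    if a >= b:
        return None  # empty sequence: the loop body S is never reached
    return (b - 1 - a) // c


def sequence(a, b, c):
    """The sequence a, a + c, ..., a + c*n described in the text."""
    n = last_index(a, b, c)
    return [] if n is None else [a + c * k for k in range(n + 1)]


print(last_index(0, 9, 4), sequence(0, 9, 4))
```

For range(0, 9, 4) this gives n = 2 and the sequence 0, 4, 8, since 0 + 4 ∗ 2 < 9 but 0 + 4 ∗ 3 ≥ 9.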
To support member functions of range type, we introduce the concolic range.
For the concolic object of range(a, b, c), the concrete value is the sequence π
and the symbolic expression is a quadruple (start, stop, step, current), where
start, stop, and step refer to ca.exp, cb.exp, and cc.exp, respectively, and
“current” is the element of π in the current loop iteration, i.e., i.val. Each
i.val ∈ π corresponds to a branch ca.exp + cc.exp ∗ j < cb.exp, where
j = (i.val − ca.val)/cc.val is the index of the loop iteration corresponding to
i.val. The constraint ca.exp + cc.exp ∗ n < cb.exp is then pushed into the queue
as S is not covered. We use the example given in Fig. 3 to explain why this
setting can help increase coverage.
At the beginning, cb is set as (0, b). Obviously, lines 3 and 4 are not covered;
after the 1st iteration is completed, the constraint 0 + 4 ∗ 0 < b is pushed into
the queue, and b = 1 is a solution to the constraint. Similarly, the constraints
0 + 4 ∗ 1 < b and 0 + 4 ∗ 2 < b are pushed into the queue at the ends of the 2nd
and 3rd iterations, and b = 5 and b = 9 are solutions, respectively. Accordingly,
program P is fully covered with the input value b = 9. In this example, a and c in
range(a, b, c) are constants; the other cases, where a and c are variables, can be
derived similarly.

1 def range_example(b):
2     for e in range(0, b, 4):
3         if e == 8:
4             return

Fig. 3. “range” in the “for” statement.
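The sequence of inputs in this example can be reproduced with a small sketch. It is a simplification under stated assumptions: the program is instrumented by hand to record covered lines, the function solve is a brute-force stand-in for the SMT solver, and only the range constraint 0 + 4 ∗ j < b is used (a real concolic tester would also collect the constraint from e == 8). All helper names are hypothetical.

```python
def range_example(b, covered):
    """The Fig. 3 program, instrumented by hand to record covered line numbers."""
    for e in range(0, b, 4):
        covered.add(3)
        if e == 8:
            covered.add(4)
            return


def solve(start, step, j, bound=1000):
    """Brute-force stand-in for an SMT solver: smallest b with start + step*j < b."""
    for b in range(bound):
        if start + step * j < b:
            return b
    return None


def explore():
    """Rerun the program, each time asking for one more loop iteration than before."""
    covered, inputs, b = set(), [], 0
    while True:
        inputs.append(b)
        range_example(b, covered)
        if covered >= {3, 4}:
            return inputs
        j = len(range(0, b, 4))  # index of the first iteration not yet reached
        b = solve(0, 4, j)


inputs = explore()
print(inputs)
```

Starting from b = 0, the sketch tries b = 1, b = 5, and b = 9 in turn, and lines 3 and 4 are both covered once b = 9 is reached.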
44 Y.-F. Chen et al.
4 Constant Upcasting
In PyExZ3, besides unsupported member function calls, constant values acting as
callers may also downcast concolic objects to primitive ones, i.e., integers or
strings. For instance, for the constant string 'abcd', the function call
'abcd'.__contains__(x) with input argument x causes the concolic object cx
corresponding to x to be downcast. This reduces the total coverage rate because,
in the subsequent program execution, we lose the symbolic information of x and
hence cannot switch to an unexplored path when encountering branch statements
involving x. Our experiments (Sect. 5) show that such downcasting has a
significant negative impact on code coverage.
A naive solution is to make the member functions of Python's constant/primitive
strings (such as “__contains__”) also accept concolic objects as input arguments.
To achieve this, one has to modify, for example, Python's source code, which
is cumbersome and requires great effort. Besides, one also has to update the
modification frequently to stay compatible as Python's official version
advances.
In PyCT, a more feasible and reliable solution is implemented instead. The
idea is to upcast constant values to their corresponding concolic objects. More
specifically, each constant value s is upcast to the concolic object (s, s). So
the primitive string “abcd” is upcast to the concolic string (“abcd”, “abcd”),
and each occurrence of the function call “abcd”.__contains__(x) is replaced with
the concolic string member function call introduced in the last section, i.e., the
function __contains__ in Table 1. Accordingly, cx is not downcast, and we
can use its symbolic value afterward. Upcasting is realized by replacing each
occurrence of a constant in the AST of the source code under test with its
corresponding concolic object. In PyCT, the same technique also addresses a
similar problem in PyExZ3, where common Python built-in constructors such as
int(x), str(x), and range(x, y, z) only output concrete values.
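The AST rewriting step can be illustrated with Python's standard ast module. The sketch below is not PyCT's code: ConcolicStr, upcast, LOG, and UpcastConstants are hypothetical names, symbolic expressions are plain strings, and only string constants are handled (docstrings and f-strings would need special treatment, which is omitted here). It wraps every string constant in a call to upcast, so that the constant's __contains__ becomes the concolic one and the symbolic expression of x is retained.

```python
import ast

LOG = []  # symbolic constraints observed during execution


class ConcolicStr:
    """Hypothetical concolic string (val, exp); exp is kept as a string."""

    def __init__(self, val, exp):
        self.val, self.exp = val, exp

    def __contains__(self, item):
        # Keep the symbolic expression of the argument instead of dropping it.
        if not isinstance(item, ConcolicStr):
            item = upcast(item)
        LOG.append(f"contains({self.exp}, {item.exp})")
        return item.val in self.val


def upcast(const):
    """Upcast a constant s to the concolic object (s, s)."""
    return ConcolicStr(const, repr(const))


class UpcastConstants(ast.NodeTransformer):
    """Replace each string constant c in the AST by the call upcast(c)."""

    def visit_Constant(self, node):
        if isinstance(node.value, str):
            call = ast.Call(
                func=ast.Name(id="upcast", ctx=ast.Load()),
                args=[node],
                keywords=[],
            )
            return ast.copy_location(call, node)
        return node


source = "def has_it(x):\n    return x in 'abcd'\n"
tree = ast.fix_missing_locations(UpcastConstants().visit(ast.parse(source)))
env = {"upcast": upcast}
exec(compile(tree, "<upcast>", "exec"), env)

result = env["has_it"](ConcolicStr("bc", "x"))
print(result, LOG)
```

After the rewrite, the test x in 'abcd' is evaluated by ConcolicStr.__contains__, so the recorded constraint still mentions the symbolic input x. Rewriting the program under test, rather than the interpreter, is what keeps this approach independent of Python's own source code.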
5 Experiment Results
To evaluate PyCT, we compare it with PyExZ3 on the following five benchmarks:
(1) UnitTest(PyExZ3), provided by PyExZ3; (2) UnitTest(PyCT), which
we composed for testing; (3) LeetCode, collected from the LeetCode platform;
(4) PythonLib, the Python core libraries, where both LeetCode and PythonLib
involve diverse usages of string-number conversion in Python, such as
parsing date-times and verifying and restoring IP addresses from strings; (5)
TheAlgorithms/Python (see footnote 2), the 4th top-starred Python project on
GitHub, which provides plenty of common algorithm implementations in Python for
learning purposes and therefore includes many functions with integer and string
type hints. The experiments are run in a Docker container on a PC with an Intel
Core i7-10700 (2.90 GHz) processor with 8 cores and 16 threads, 48 GB of
RAM, and a 1.8 TB, 7200 rpm hard disk drive, running the Ubuntu 20.04.1 LTS
operating system. The versions of Python and the SMT solver CVC4 are 3.8.5
and 1.7 (see footnote 3), respectively.
To evaluate the constant upcasting technique, PyCT is run in two modes,
without and with constant upcasting, denoted PyCT (Sect. 3 only) and PyCT+Up
(Sects. 3 and 4), respectively. For each concolic tester, each function in the
benchmarks is tested for at most 15 min. The timeout of one concolic testing
iteration is set to 15 s, and the timeout of one SMT constraint-solving call is
set to 10 s. We use the package “coverage” (see footnote 4) to compute
line coverage (the number of executed lines ÷ the number of lines in the source
code).
2 https://github.com/TheAlgorithms/Python.
3 https://github.com/cvc5/cvc5/tree/d1f3225e26b9d64f065048885053392b10994e71.
4 https://pypi.org/project/coverage/4.5.4/.
Table 3 shows the results. Both PyCT and PyCT+Up outperform PyExZ3
on all benchmarks in terms of coverage. The results also show that the constant
upcasting technique significantly improves PyCT's line coverage. Compared with
PyExZ3, PyCT and PyCT+Up on average take more time to test a function. However,
the median times of the three testers on all benchmarks are almost equal, which
suggests that PyCT and PyCT+Up spend the extra time on solving difficult cases
to increase the coverage rate.