Countering The Path Explosion Problem in The Symbo
arXiv:2304.05445v1 [cs.CR] 11 Apr 2023

Abstract: Symbolic execution is a powerful verification tool for hardware designs, but suffers from the path explosion problem. We introduce a new approach, piecewise composition, which leverages the modular structure of hardware to transfer the work of path exploration to SMT solvers. We present a symbolic execution engine implementing the technique. The engine operates directly over register transfer level (RTL) Verilog designs without requiring translation to a netlist or software simulation. In our evaluation, piecewise composition reduces the number of paths explored by an order of magnitude and reduces the runtime by 97%. Using 84 properties from the literature we find assertion violations in 5 open-source designs including an SoC and CPU.

Index Terms: verification, formal methods, hardware, security

I. INTRODUCTION

The verification of hardware designs is a key activity for ensuring the correctness and security of a design early in the hardware lifecycle. Current best practice includes assertion-based verification (ABV) [1], which has simulation-based testing as the underlying means of verification, and formal verification techniques, an umbrella term encompassing many techniques with the goal of proving a given property of a design. One technique in particular that has gained recent attention, especially in security verification applications, is symbolic execution [2] [3] [4].

Symbolic execution generalizes testing by replacing input values with symbols, where each symbol represents the set of possible values of the input parameter. A symbolic execution engine drives symbolic execution using the semantics of the program’s language, but updated to include symbols. As execution proceeds the symbols are used in place of literal values. The result of symbolically executing a design for one clock cycle is a tree of paths, each one associated with a unique path condition that describes the conditions satisfied by branches taken along the path. If any path is found to violate a given assertion, then the associated path condition acts as a precise description of the inputs that will drive (concrete) execution along the same path; concrete values that satisfy the path condition are a counter-example to the assertion.

Unfortunately, symbolic execution suffers from the path explosion problem – each path through a design is explored separately and the number of paths grows exponentially with the number of branch points, or control flow statements, in the design. Prior work has only sought to avoid the path explosion problem by combining symbolic execution with model checking [5], concrete execution traces [4], or by limiting the use to small designs [6].

We introduce piecewise composition, a technique that leverages the structure of hardware designs to transfer the problem to the domain of satisfiability modulo theories (SMT) solving so that the number of paths to symbolically explore grows exponentially with only the number of branch points in any one always block, and linear in the number of always blocks in the design. In this way we reap the benefits of recent advances in SMT, while maintaining the usability of having individual path information at the register-transfer level (RTL).

Symbolic execution is closely related to symbolic simulation [7] [5] [8]. In both, concrete input values are replaced with symbolic values, representing any possible value, and how the symbolic values propagate through the design is tracked. However, there is a key difference. In symbolic simulation, the analysis is centered around dataflow. At the end of a simulation run, each signal may hold the value true, false, or a boolean expression characterizing the entire circuit that drives that particular signal. Where there are control points in the circuit, they are expressed as ITE statements in the boolean expression. In symbolic execution, the analysis is centered around control flow. At the end of one iteration, each signal is characterized by an expression in first-order logic that characterizes the particular path taken through the Verilog RTL. In addition there is a path condition that represents the conditions under which execution would follow the particular path through the design.

There is a trade-off to be made between the complexity of queries sent to the SMT solver (symbolic simulation) and the number of paths to explore (symbolic execution). With piecewise composition, we examine a new point in the design space, reducing the number of paths to explore to a tractable amount, while still keeping SMT queries simple enough for modern solvers. The result is a symbolic execution engine that can handle large designs and operate directly over Verilog at the register-transfer level.

Piecewise composition works by recognizing that independent parts of a design do not need to be re-explored, once per root-to-leaf path. The algorithm symbolically explores each independent block of Verilog once, without consideration of the remaining blocks, producing a set of symbolic execution trees. To reconstruct full root-to-leaf paths, whether for finding assertion failures, describing how information flows through a design, or to generate testcases, we can use SMT queries to combine the independently explored path fragments.

Perhaps surprisingly, we show that for a design with N always blocks, each with at most b binary branch points, symbolic execution of the design for a single clock cycle requires symbolically executing $O(2^{b} N)$ paths, instead of the $O(2^{bN})$ paths typical of symbolic execution. The number of paths to explore grows exponentially with the number of branch points in any one independent block, and linearly with the number of blocks.

We apply piecewise composition to symbolically explore five open-source designs, including SoC and CPU designs, to find assertion violations in the design. Using 84 assertions from the literature, we find that on average, piecewise composition reduces runtime by 97% compared to conventional symbolic execution techniques without loss of efficacy.

This paper presents the following contributions: (1) Introduction and definition of piecewise composition, a technique that leverages the modular nature of hardware designs to counter the path explosion problem in symbolic execution. (2) Design and implementation of a symbolic execution engine for Verilog RTL using piecewise composition. (3) Evaluation of piecewise composition and our implementation on five open-source SoC and CPU designs.

II. BACKGROUND

We provide a review of the general techniques of symbolic execution and SMT solving, and describe key aspects of the Verilog hardware description language.

A. Symbolic Execution

In symbolic execution [9], concrete literals are replaced with symbolic values: input values are made symbolic and a symbolic execution engine “executes” the design, keeping track of the current execution state at each line of code. The execution state has two main components: the symbolic store σ and the path condition π:

1) σ, the symbolic store maintains mappings between program variables and symbolic expressions.
2) π, the path condition is a boolean formula over symbolic expressions describing the conditions satisfied by branches taken along the current path. The path condition is always initialized to True.

As the symbolic execution engine executes each line of code, using symbols in place of literal values wherever they appear, the engine updates the symbolic state. When a branching statement with condition b is reached, the path condition π is checked. If π → b, the then branch is taken. If π → ¬b, the else branch, if present, is taken. If neither implication holds, then both branches must be explored in turn, forking the current path into two separate paths to explore. To explore the path following the then branch, the path condition π is updated: π := π ∧ b. To explore the path following the else branch, the path condition is updated: π := π ∧ ¬b. At each branch point, the number of paths to explore doubles. This is the path explosion problem, and typically heuristics or merging strategies [5], [10], [37] are used to guide the exploration to maximize coverage or depth.

The complete exploration of a hardware design corresponds to a single clock cycle of execution. Hardware executes continuously and latent security vulnerabilities may only become clear many clock cycles after the initial state. This requires symbolically executing the design for multiple clock cycles, adding to the path explosion problem.

Importantly, our symbolic execution engine operates directly over the Verilog RTL without translating to C or compiling down to the netlist. This allows for greater human-readability of any found assertion violations – the path taken and the constraints over inputs will be directly traceable through the RTL design. Our symbolic execution engine is cycle accurate. We assume no combinational latches, no asynchronous resets, and always blocks are conditioned on input clocks. These assumptions are in keeping with prior work in this area [5].

B. Constraint Solving

Boolean satisfiability (SAT) solvers are decision procedures that take in a propositional formula and determine if there is an assignment of boolean values that will make the formula evaluate to true. SAT modulo theories (SMT) solvers generalize SAT solving by supporting theories that are more expressive and can capture more functionality than simple atomic propositions. Examples include supporting array operations and supporting linear arithmetic. SMT solvers have become quite powerful and are able to handle expressions with hundreds of variables. However, SMT solving is still an NP-complete problem.

SMT solvers are crucial to symbolic execution both in checking feasibility of paths as the engine progresses, and in generating assignments to symbolic variables for a found path, e.g., to produce a test-case. Some of the most widely used solvers are STP [11] (used in KLEE [13]) and Z3 [12] (used in Mayhem [14] and angr [15]).

C. Verilog

Verilog is a hardware description language and is the industry standard for developing real-world computer systems. A basic unit of design in Verilog is a module. Modules often contain other modules, making the design hierarchical. A module combines multiple sub-modules by making the output signals of one module connect to the input signals of a second module, with connection wires and registers in between. Verilog has several constructs that allow data to flow differently than a sequential-only software program would. For example, multiple modules composed in parallel are truly executing in parallel. Within a module, an always block is used to define a set of events that only happen under certain conditions. For instance, assignment statements that are only to be executed at a clock’s rising or falling edge. The statements within a sequential always block are all executed in parallel, and two different always blocks operate in parallel.

III. PIECEWISE COMPOSITION

Symbolic execution has two main costs: the time required to simulate execution and update the symbolic store for each line of code, and the time required to determine whether both branches are feasible at each branch point in a path. Both parts are costly, but recent advances in SMT solving have cut the cost to deciding branch feasibility, whereas the costs to symbolically execute an instruction have remained relatively stable.

In conventional symbolic execution, each line of code is potentially visited multiple times, once for each path explored. Our approach is to aggressively decompose the design into independent blocks, symbolically explore each block once, then use an SMT solver to compose path conditions and symbolic stores from each block. This strategy is made possible by the inherent modular nature of hardware designs, and lets us leverage the relative speed of modern SMT solvers compared to the cost of symbolically executing lines of code. The result is a complete (logical) tree of paths through the design. While the number of paths in the full tree is exponential in the number of branches in the design, the symbolic execution engine explores a number of paths exponential in only the number of branches in any single independent block and polynomial in the number of blocks.

A. Motivating Example

Figure 1 shows a code snippet with two always blocks and branch points at lines 2 and 9. The corresponding control flow graph with an arbitrary ordering of the always blocks is given in Figure 2a, and the tree of paths through the design is given in Figure 2b. With conventional symbolic execution, each of the four root-to-leaf paths in Figure 2b is symbolically executed. This is the strategy taken by current approaches (e.g., [3], [39]) that translate a design into a C++ representation and then use the KLEE symbolic execution engine. But, the two subtrees rooted at a node labeled 9 represent repeated work. For each subtree, the symbolic execution engine was exploring the paths through the same block of code: the second if-else block starting at line 9.

The branching condition and assignments in lines 2–5 are independent of the branching condition and assignments in lines 9–12. Regardless of which path is taken at the first branch (line 2), the
     1  always @(posedge clk) begin
     2    if (g0)
     3      x <= inpA;  // inpA is an input signal
     4    else
     5      x <= 0;
     6  end
     7
     8  always @(posedge clk) begin
     9    if (g1)
    10      y <= inpB;  // inpB is an input signal
    11    else
    12      y <= 0;
    13  end

Fig. 1: Verilog code example with two branches.

composition results in the two trees produced in Figure 2c. The symbolic execution engine now explores the second if-else block only once. Continuing with our example, piecewise composition will separately explore the two always blocks, producing the following four path fragments (labeled a through d) with associated path conditions and (partial) symbolic stores:

$\langle 2, 3 \rangle_a : \sigma_a = \{x := \alpha\},\ \pi_a = \gamma_0$
$\langle 2, 5 \rangle_b : \sigma_b = \{x := 0\},\ \pi_b = \lnot\gamma_0$
$\langle 9, 10 \rangle_c : \sigma_c = \{y := \beta\},\ \pi_c = \gamma_1$
$\langle 9, 12 \rangle_d : \sigma_d = \{y := 0\},\ \pi_d = \lnot\gamma_1$

To find full paths through the design and to successfully find assertion violations, all realizable combinations of path fragments are composed with the help of an SMT solver. For example, to realize path $\langle 2, 5, 9, 10 \rangle$, the symbolic execution engine queries the SMT solver to find whether the two path fragments, $\langle 2, 5 \rangle$ and $\langle 9, 10 \rangle$, can be joined: $\mathit{isSAT}(x = 0 \land \lnot\gamma_0 \land y = \beta \land \gamma_1)$. In this simple example, all four combinations of path fragments are possible, but in general that will not always be the case. If, for example, the gating signals g0 and g1 both took their values from the same input, and were therefore constrained to always have the same value, the SMT solver would find that paths $\langle 2, 5, 9, 10 \rangle$ and $\langle 2, 3, 9, 12 \rangle$ were unrealizable.
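The composition step can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the engine's implementation: the `Fragment` record, the `compose` function, and the brute-force `is_sat` check (standing in for a real SMT solver) are hypothetical names introduced here. The guards g0 and g1 are modeled as booleans, and only the path conditions are checked, since the store equalities (x = 0, y = β) bind fresh next-state values and cannot conflict in this example.

```python
from dataclasses import dataclass
from itertools import product
from typing import Callable, Dict, List, Optional, Tuple

Assignment = Dict[str, bool]
GUARDS = ("g0", "g1")  # the two gating signals of Fig. 1


@dataclass
class Fragment:
    """One explored path fragment of a single always block (hypothetical record)."""
    lines: Tuple[int, ...]              # line numbers from Fig. 1
    store: Dict[str, str]               # partial symbolic store
    cond: Callable[[Assignment], bool]  # path condition over the guards


def is_sat(formula: Callable[[Assignment], bool]) -> Optional[Assignment]:
    """Brute-force stand-in for an SMT query: try every guard assignment."""
    for values in product([False, True], repeat=len(GUARDS)):
        env = dict(zip(GUARDS, values))
        if formula(env):
            return env
    return None


# Fragments a-d, each always block explored once and independently.
BLOCK_1 = [
    Fragment((2, 3), {"x": "alpha"}, lambda e: e["g0"]),
    Fragment((2, 5), {"x": "0"}, lambda e: not e["g0"]),
]
BLOCK_2 = [
    Fragment((9, 10), {"y": "beta"}, lambda e: e["g1"]),
    Fragment((9, 12), {"y": "0"}, lambda e: not e["g1"]),
]


def compose(blocks: List[List[Fragment]],
            extra: Callable[[Assignment], bool] = lambda e: True):
    """Join one fragment per block; keep combinations whose conjoined
    path conditions (plus any extra design constraint) are satisfiable."""
    paths = []
    for combo in product(*blocks):
        def formula(e, combo=combo):
            return extra(e) and all(f.cond(e) for f in combo)
        model = is_sat(formula)
        if model is not None:
            lines = tuple(n for f in combo for n in f.lines)
            store = {k: v for f in combo for k, v in f.store.items()}
            paths.append((lines, store, model))
    return paths


all_paths = compose([BLOCK_1, BLOCK_2])
# Assumed variant: g0 and g1 are driven by the same input, so they always agree.
tied_paths = compose([BLOCK_1, BLOCK_2], extra=lambda e: e["g0"] == e["g1"])

print([p[0] for p in all_paths])
print([p[0] for p in tied_paths])
```

With independent guards all four combinations survive; with the tied guards only the paths through lines 2, 3, 9, 10 and 2, 5, 9, 12 remain. Here the engine explores 2 + 2 = 4 fragments, the same as the 2 × 2 = 4 full paths, but with N blocks of one branch each the counts diverge as 2N fragments versus 2^N full paths.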
An important result of piecewise composition is that it is sound: all composed paths that are realizable correspond to replayable paths through the full design. The satisfying solutions returned by the SMT solver can be provided as inputs to the design in the starting reset state and will result in execution (either in simulation or running on an FPGA) following the corresponding path.

[Fig. 2: (a) Control flow graph; (b) Full tree of paths]
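The replay property can be illustrated with a small concrete simulation of the Fig. 1 design. This is a hand-written Python model of the two always blocks, not output of the engine; the `step` function, its trace format, and the concrete input values are assumptions made for this sketch.

```python
from typing import Dict, List, Tuple


def step(inputs: Dict[str, int]) -> Tuple[Dict[str, int], List[int]]:
    """Simulate one rising clock edge of the Fig. 1 design.

    Returns the next-state registers and the Fig. 1 line numbers executed.
    Both blocks read only inputs, so their evaluation order does not matter.
    """
    regs: Dict[str, int] = {}
    trace: List[int] = []

    # First always block (Fig. 1, lines 1-6)
    trace.append(2)
    if inputs["g0"]:
        regs["x"] = inputs["inpA"]
        trace.append(3)
    else:
        regs["x"] = 0
        trace.append(5)

    # Second always block (Fig. 1, lines 8-13)
    trace.append(9)
    if inputs["g1"]:
        regs["y"] = inputs["inpB"]
        trace.append(10)
    else:
        regs["y"] = 0
        trace.append(12)

    return regs, trace


# A satisfying assignment for the path through lines 2, 5, 9, 10:
# g0 is false and g1 is true. inpA and inpB are unconstrained by the
# path condition, so 7 and 9 are arbitrary illustrative values.
model = {"g0": 0, "g1": 1, "inpA": 7, "inpB": 9}
regs, trace = step(model)
print(trace)  # [2, 5, 9, 10]
print(regs)   # {'x': 0, 'y': 9}
```

Feeding the solver's assignment back in as concrete inputs drives execution along exactly the composed path, which is the sense in which a realizable composed path is replayable.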
Configuration         | LoC explored | branch points explored | paths completed
----------------------|--------------|------------------------|----------------
Baseline              | 90706        | 7380                   | 1459
Piecewise Composition | 6783         | 323                    | 1459

TABLE II: Full Exploration of MC68HC11 SPI Design

over bounded-length trajectories using a modified form of 3-valued symbolic simulation [28]. SymbiYosys [8] is a model checking engine built on top of Yosys that serves as an extension of Yosys' [29] existing property checking framework.

Symbolic and Concolic Execution. The use of symbolic execution and related techniques for RTL designs is gaining traction. RTL-ConTest [4] and the work by Witharana et al. [30] are examples of concolic testing engines developed for the security verification of hardware designs. Coppelia [3] is a hardware-oriented backward symbolic execution engine built on top of KLEE for RTL designs translated to C++. EISec [39] also uses KLEE, but for netlists translated to C++. All demonstrate the power of symbolic execution as applied to hardware designs, but all still struggle with the path explosion problem. Many of the techniques developed in those papers can be combined with piecewise composition.

Fuzzing. Fuzzing has also been shown to be a useful technique for finding security vulnerabilities in SoCs and CPU designs. RFUZZ is a coverage-directed fuzz tester for circuits that presents a hardware-specific coverage metric called mux control coverage [31]. DifuzzRTL is an RTL fuzzing tool used to find unknown security bugs that measures coverage based on control registers rather than multiplexors’ control signals to improve efficiency and scalability [32]. A recently developed Hardware Fuzzing Pipeline translates the RTL to a software model to improve scalability in bug finding via fuzzing [33].

Information Flow Tracking. In a hardware context, information flow refers to the transfer of information between different signals. Information flow tracking is a verification method that studies how information flows through a hardware design to make statements about security [34] [35]. A hardware design can be instrumented with tracking logic to capture timing [36] or data flow information. This method can provide strong security guarantees, for example, demonstrating leakage of secret key data to undesired output signals.

VIII. CONCLUSION

We have presented piecewise composition, a technique for countering the path explosion problem in symbolic execution. We implemented a symbolic execution engine using the technique and evaluated the engine on five open-source designs. The engine reduces redundant work by 98%–99% compared to conventional symbolic execution, improves overall performance and successfully finds assertion violations.

IX. ACKNOWLEDGMENTS

This material is based upon work supported by the National Science Foundation under Grant No. CNS-1816637, and by a Meta Security Research Award.

REFERENCES

[1] Claudionor Nunes Coelho and Harry D. Foster. Assertion-Based Verification, pages 167–204. Springer US, Boston, MA, 2004.
[2] Lixiang Shen, Dejun Mu, Guo Cao, Maoyuan Qin, Jeremy Blackstone, and Ryan Kastner. Symbolic execution based test-patterns generation algorithm for hardware trojan detection. Comput. Secur., 78:267–280, 2018.
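Design         | Baseline runtime (sec) | Piecewise runtime (sec) | Piecewise % dec | Redund runtime (sec) | Redund % dec | COI runtime (sec) | COI % dec | Overall % dec
---------------|------------------------|-------------------------|-----------------|----------------------|--------------|-------------------|-----------|--------------
OR1200         | timeout (1800)         | 52.47                   | 97.08%          | 37.56                | 21.31%       | 25.22             | 12.56%    | 98.60%
Hack@DAC       | timeout (1800)         | 174.24                  | 90.32%          | 121.94               | 28.34%       | 81.83             | 16.62%    | 95.45%
MC68HC11       | 962                    | 17.53                   | 98.18%          | 14.30                | 19.93%       | 0.07              | 99.19%    | 99.99%
openMSP430     | timeout (1800)         | 37.65                   | 97.91%          | 23.14                | 38.55%       | 0.73              | 96.83%    | 99.96%
CrypTech TRNG  | timeout (1800)         | 14.92                   | 99.17%          | 12.08                | 19.15%       | 0.09              | 99.19%    | 99.99%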
Design   | # Bugs | # Bugs Found | Avg Time (sec) | Max Clock Cycles Taken
---------|--------|--------------|----------------|-----------------------
Hack@DAC | 31     | 24           | 122            | 4
OR1200   | 31     | 27           | 34             | 5

TABLE IV: Known Bugs

[3] Rui Zhang, Calvin Deutschbein, Peng Huang, and Cynthia Sturton. End-to-end automated exploit generation for validating the security of processor designs. In Proceedings of the International Symposium on Microarchitecture (MICRO). IEEE/ACM, 2018.
[4] Xingyu Meng, Shamik Kundu, Arun K. Kanuparthi, and Kanad Basu. RTL-ConTest: Concolic testing on RTL for detecting security vulnerabilities. IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems, 41(3):466–477, 2022.
[5] Anish Athalye, M. Frans Kaashoek, and Nickolai Zeldovich. Verifying hardware security modules with Information-Preserving refinement. In OSDI. USENIX Association, 2022.
[6] R. Mukherjee, D. Kroening, and T. Melham. Hardware verification using software analyzers. In 2015 IEEE Computer Society Annual Symposium on VLSI, pages 7–12, 2015.
[7] Voss ii. https://github.com/TeamVoss/VossII. Accessed: 2022-11-21.
[8] Symbiyosys. https://github.com/YosysHQ/sby. Accessed: 2022-11-21.
[9] James C. King. Symbolic execution and program testing. Commun. ACM, 19(7):385–394, July 1976.
[10] S. Krishnamoorthy, M. S. Hsiao, and L. Lingappan. Tackling the path explosion problem in symbolic execution-driven test generation for programs. In 2010 19th IEEE Asian Test Symposium, pages 59–64, 2010.
[11] Blayne Mayfield and Timothy Baird. STP. In Proceedings ACM, SIGSMALL ’90, page 98–105, New York, NY, USA, 1990. Association for Computing Machinery.
[12] Leonardo De Moura and Nikolaj Bjørner. Z3: An efficient smt solver. In International conference on Tools and Algorithms for the Construction and Analysis of Systems, pages 337–340. Springer, 2008.
[13] Cristian Cadar, Daniel Dunbar, Dawson R Engler, et al. Klee: unassisted and automatic generation of high-coverage tests for complex systems programs. In OSDI, volume 8, pages 209–224, 2008.
[14] Sang Kil Cha, Thanassis Avgerinos, Alexandre Rebert, and David Brumley. Unleashing mayhem on binary code. In Proceedings of the 2012 IEEE Symposium on Security and Privacy, SP ’12, page 380–394, USA, 2012. IEEE Computer Society.
[15] Soomin Kim, Markus Faerevaag, Minkyu Jung, SeungIl Jung, DongYeop Oh, JongHyup Lee, and Sang Kil Cha. Testing intermediate representations for binary analysis. In Proceedings of the 32nd IEEE/ACM International Conference on Automated Software Engineering, ASE 2017, page 353–364. IEEE Press, 2017.
[16] Nusrat Farzana, Fahim Rahman, Mark Tehranipoor, and Farimah Farahmandi. SoC security verification using property checking. In 2019 IEEE International Test Conference (ITC), pages 1–10, 2019.
[17] Nusrat Farzana, Farimah Farahmandi, and Mark Mohammad Tehranipoor. SoC security properties and rules. IACR Cryptol. ePrint Arch., 2021:1014, 2021.
[18] Hack@DAC 2018 SoC. https://github.com/seth-lab-tamu/hackdac-2018-soc. Accessed: 2022-02-15.
[19] Ghada Dessouky, David Gens, Patrick Haney, Garrett Persyn, Arun Kanuparthi, Hareesh Khattri, Jason M. Fung, Ahmad-Reza Sadeghi, and Jeyavijayan Rajendran. HardFails: Insights into Software-Exploitable hardware bugs. In USENIX ’19, pages 213–230, Santa Clara, CA, August 2019. USENIX Association.
[20] Matthew Hicks, Cynthia Sturton, Samuel T. King, and Jonathan M. Smith. Specs: A lightweight runtime mechanism for protecting software from security-critical processor bugs. In ASPLOS, ASPLOS ’15, page 517–529, New York, NY, USA, 2015. ACM.
[21] Rui Zhang, Natalie Stanley, Christopher Griggs, Andrew Chi, and Cynthia Sturton. Identifying security critical properties for the dynamic verification of a processor. In ASPLOS, ASPLOS ’17, page 541–554, New York, NY, USA, 2017. ACM.
[22] M. Bilzor, T. Huffmire, C. Irvine, and T. Levin. Security checkers: Detecting processor malicious inclusions at runtime. In HOST, 2011.
[23] Rui Zhang and Cynthia Sturton. Transys: Leveraging common security properties across hardware designs. In Proceedings of the Symposium on Security and Privacy (S&P). IEEE, 2020.
[24] Edmund M. Clarke. Model checking. In S. Ramesh and G. Sivakumar, editors, Foundations of Software Technology and Theoretical Computer Science, pages 54–56, Berlin, Heidelberg, 1997. Springer Berlin Heidelberg.
[25] Edmund Clarke, Armin Biere, Richard Raimi, and Yunshan Zhu. Bounded model checking using satisfiability solving. 19(1):7–34, 2001.
[26] Randal E. Bryant, Derek L. Beatty, and Carl-Johan H. Seger. Formal hardware verification by symbolic ternary trajectory evaluation. In ACM/IEEE DAC, 1991.
[27] Supratik Chakraborty, Zurab Khasidashvili, Carl-Johan H. Seger, Rajkumar Gajavelly, Tanmay Haldankar, Dinesh Chhatani, and Rakesh Mistry. Symbolic trajectory evaluation for word-level verification: Theory and implementation. Form. Methods Syst. Des., 50(2–3):317–352, June 2017.
[28] Koen Claessen and Jan-Willem Roorda. An introduction to symbolic trajectory evaluation. In Marco Bernardo and Alessandro Cimatti, editors, Formal Methods for Hardware Verification, pages 56–77, Berlin, Heidelberg, 2006. Springer Berlin Heidelberg.
[29] Yosys. https://github.com/YosysHQ/Yosys. Accessed: 2022-11-21.
[30] Hasini Witharana, Yangdi Lyu, and Prabhat Mishra. Directed test generation for activation of security assertions in RTL models. ACM Trans. Des. Autom. Electron. Syst., 26(4), Jan 2021.
[31] Kevin Laeufer, Jack Koenig, Donggyu Kim, Jonathan Bachrach, and Koushik Sen. Rfuzz: Coverage-directed fuzz testing of rtl on fpgas. In ICAAD, pages 1–8, 2018.
[32] Jaewon Hur, Suhwan Song, Dongup Kwon, Eunjin Baek, Jangwoo Kim, and Byoungyoung Lee. Difuzzrtl: Differential fuzz testing to find CPU bugs. In 42nd IEEE S&P, pages 1286–1303. IEEE, 2021.
[33] Timothy Trippel, Kang G. Shin, Alex Chernyakhovsky, Garret Kelly, Dominic Rizzo, and Matthew Hicks. Fuzzing hardware like software. In USENIX ’22, pages 3237–3254, Boston, MA, August 2022. USENIX Association.
[34] Armaiti Ardeshiricham, Wei Hu, Joshua Marxen, and Ryan Kastner. Register transfer level information flow tracking for provably secure hardware design. In DATE, pages 1691–1696, 2017.
[35] Wei Hu, Armaiti Ardeshiricham, Mustafa S Gobulukoglu, Xinmu Wang, and Ryan Kastner. Property specific information flow analysis for hardware security verification. In ICCAD, ICCAD ’18, New York, NY, USA, 2018. Association for Computing Machinery.
[36] Armaiti Ardeshiricham, Wei Hu, and Ryan Kastner. Clepsydra: Modeling timing flows in hardware designs. In ICCAD, pages 147–154, 2017.
[37] Drew Davidson, Benjamin Moench, Thomas Ristenpart, and Somesh Jha. FIE on Firmware: Finding Vulnerabilities in Embedded Systems Using Symbolic Execution. In USENIX Security, pages 463–478, Washington, DC, August 2013. USENIX Association.
[38] Alfred Kolbl, James Kukula, and Robert Damiano. Symbolic RTL Simulation. In DAC, 2001. IEEE.
[39] Farhaan Fowze, Muhtadi Choudhury, and Domenic Forte. EISec: Exhaustive Information Flow Security of Hardware Intellectual Property Utilizing Symbolic Execution. In AsianHOST, 2022. IEEE.