Doctoral Thesis
Author(s):
Singh, Gagandeep
Publication date:
2020
Permanent link:
https://doi.org/10.3929/ethz-b-000445921
Rights / license:
In Copyright - Non-Commercial Use Permitted
Funding acknowledgement:
163117 - Making Program Analysis Fast (SNF)
Diss. ETH No. 27096

SCALABLE AUTOMATED REASONING FOR PROGRAMS AND DEEP LEARNING

by
Gagandeep Singh

2020
ABSTRACT
a few seconds. Both systems were developed from scratch and are currently the state of the art in their respective domains, producing results not possible with other competing systems.
ZUSAMMENFASSUNG

the Polyhedra, Octagon, and Zone domains, based on our theory of online decomposition, which enables fast and precise analysis of large Linux device drivers with > 500 variables within a few seconds. ERAN contains our custom abstraction and convex relaxation methodology, and a combination of relaxations with solvers, to enable fast and precise analysis of large neural networks with tens of thousands of neurons within a few seconds. Both systems were developed from scratch, set the current state of the art in their respective areas, and achieve results that are beyond the reach of competing systems.
PUBLICATIONS

The following publications were part of my Ph.D. research and contain results that are supplemental to this work or build upon the results of this thesis:

• Mislav Balunovic, Maximilian Baader, Gagandeep Singh, Timon Gehr, Martin Vechev.
Certifying Geometric Robustness of Neural Networks.
Neural Information Processing Systems (NeurIPS), 2019. [15]

The following publications were part of my Ph.D. research and are available on arXiv:
ACKNOWLEDGEMENTS
I would like to use this page to thank all those people who directly and indirectly
supported me throughout my doctoral studies.
My gratitude goes first of all to my advisors Prof. Markus Püschel and Prof.
Martin Vechev. Your valuable advice has shaped my perspective on research and
life. I would also like to express my gratitude to the reviewers: Prof. Patrick Cousot
and Prof. Clark Barrett, for providing constructive feedback on the thesis that I have
incorporated in the final version. I am thankful to the ETH faculty that helped me
at various points of my doctoral studies: Prof. Peter Müller, Prof. Zhendong Su,
Prof. Mohsen Ghaffari, and Prof. Srdjan Capkun.
I would like to acknowledge the co-authors of papers published during my Ph.D.
I really enjoyed working with all of you: Maximilian Baader, Mislav Balunovic,
Pavol Bielik, Jiayu Chen, Andrei Dan, Raphaël Dang Nhu, Dimitar I. Dimitrov,
Rupanshu Ganvir, Timon Gehr, Jingxuan He, Matthew Mirman, Christoph Müller,
and Wonryong Ryou.
I would also like to thank many past and present colleagues in the software
group at ETH Zurich, in particular Victoria Caparrós Cabezas, Makarchuk Gleb,
Georg Ofenbeck, Joao Rivera, Bastian Seifert, Francois Serre, Tyler Smith, Daniele
Spampinato, Alen Stojanov, Chris Wendler, Eliza Wszola, Luca Della Toffola, Afra
Amini, Benjamin Bichsel, Rudiger Birkner, Dana Drachsler Cohen, Dimitar K. Dimitrov, Marc Fischer, Inna Grijnevitch, Viktor Ivanov, Pesho Ivanov, Jonathan Mauer,
Sasa Misailovic, Rumen Paletov, Momchil Peychev, Veselin Raychev, Anian Ruoss,
Samuel Steffen, Petar Tsankov, Vytautas Astrauskas, Lucas Brutschy, Alexandra
Bugariu, Fábio Pakk Selmi Dei, Jérôme Dohrau, Marco Eilers, Uri Juhasz, Gaurav Parthasarathy, Federico Poli, Alexander Summers, Caterina Urban, Arshavir
Ter Gabrielyan, and Manuel Rigger for many insightful discussions about research
and beyond.
I would also like to thank the administrative staff at ETH that helped me navigate
through many of the Swiss rules related to immigration and beyond: Fiorella Meyer,
Mirella Rutz, Sandra Schneider, and Marlies Weissert.
I am grateful to all my friends and flatmates in Zurich. Without you, my time
in Zurich would have been quite boring. Special thanks to Kushagra Alankar and
Jagannath Biswakarma, we started our journey in Zurich together in the same
building. It has been great knowing you all these years and I will cherish our great
memories of cooking, traveling, and of course, watching cricket together.
Finally, I would like to thank my parents and sister, without whom none of this
would have been possible.
CONTENTS

abstract
acknowledgments
1 introduction
1.1 Abstract Interpretation
1.2 Fast and Precise Numerical Program Analysis
1.3 Fast and Precise Neural Network Certification
LIST OF FIGURES

Figure 4.1: Policies for balancing precision and speed in static analysis.
Figure 4.2: Reinforcement learning for static analysis.
LIST OF TABLES

Table 1.1: Polyhedra analysis of Linux device drivers with ELINA vs. PPL.
Table 2.1: Asymptotic complexity of Polyhedra operators with different representations.
Table 2.2: Asymptotic time complexity of Polyhedra operators with decomposition.
Table 2.3: Speedup of Polyhedra domain analysis for ELINA over NewPolka and PPL.
Table 2.4: Partition statistics for Polyhedra analysis with ELINA.
Table 3.1: Instantiation of constraints expressible in various numerical domains.
Table 3.2: Speedup for the Polyhedra analysis with our decomposition vs. PPL and ELINA.
Table 3.3: Partition statistics for the Polyhedra domain analysis.
Table 3.4: Asymptotic time complexity of the Octagon transformers.
Table 3.5: Speedup for the Octagon domain analysis with our decomposition over the non-decomposed and the decomposed versions of ELINA.
Table 3.6: Partition statistics for the Octagon domain analysis.
Table 3.7: Speedup for the Zone domain analysis with our decomposition over the non-decomposed implementation.
Table 3.8: Partition statistics for the Zone domain analysis.
Table 4.1: Mapping of RL concepts to static analysis concepts.
Table 4.2: Features for describing RL state s (m ∈ {1, 2}, 0 ≤ j ≤ 8, 0 ≤ h ≤ 3).
Table 4.3: Instantiation of Q-learning to Polyhedra domain analysis.
Table 4.4: Timings (seconds) and precision of approximations (%) w.r.t. ELINA.
Table 5.1: Neural network architectures used in our experiments.
Table 5.2: Certified robustness by DeepZ and DeepPoly on the large convolutional networks trained with DiffAI.
Table 6.1: Volume of the output bounding box from kPoly on the MNIST FFNNMed network.
Table 6.2: Neural network architectures and parameters used in our experiments.
Table 6.3: Number of certified adversarial regions and runtime of kPoly vs. DeepPoly and RefineZono.
1 INTRODUCTION
2. Fast and precise neural network certification: We designed new automated methods for certifying large real-world neural networks. Our methods enable state-of-the-art certification beyond the reach of other existing methods.
number of variables occurring in a polyhedron during the analysis. In the table, the
entry TO means that the analysis timed out after 4 hours while entry MO represents
memory consumption > 16 GB. The last column of Table 1.1 shows the speedup
of ELINA over PPL. We provide a lower bound on the speedup in the case of a
timeout with PPL. The speedup when the Polyhedra analysis with PPL runs out
of memory is ∞ as it can never finish on the given machine. It can be seen that
ELINA is significantly more memory and time-efficient than PPL. It finishes the
analysis of 12 out of 13 benchmarks in a few seconds and never consumes more
than 1 GB of memory. In contrast, PPL either times out or runs out of memory
on 8 benchmarks. Currently, ELINA is used in several research projects in both
academia and industry. Its GitHub repository at the time of this writing has 77
stars and 31 forks.
Table 1.1: Polyhedra analysis of Linux device drivers with ELINA vs. PPL.

Benchmark          n     PPL               ELINA             Speedup
                         time(s)  mem(GB)  time(s)  mem(GB)
firewire_firedtv   159   331      0.9      0.2      0.2      1527
net_fddi_skfp      589   6142     7.2      4.4      0.3      1386
mtd_ubi            528   MO       MO       1.9      0.3      ∞
usb_core_main0     365   4003     1.4      29       0.7      136
tty_synclinkmp     332   MO       MO       2.5      0.1      ∞
scsi_advansys      282   TO       TO       3.4      0.2      >4183
staging_vt6656     675   TO       TO       0.5      0.1      >28800
net_ppp            218   10530    0.1      891      0.1      11.8
p10_l00            303   121      0.9      5.4      0.2      22.4
p16_l40            874   MO       MO       2.9      0.4      ∞
p12_l57            921   MO       MO       6.5      0.3      ∞
p13_l53            1631  MO       MO       25       0.9      ∞
p19_l59            1272  MO       MO       12       0.6      ∞
[Figure 1.2: Plots (a)-(d) of certification time (sec) over all problem instances for the competing verifiers, including NNV, Venus, PeregriNN, MIPVerify, Oval, VeriNet, nnenum, and ERAN.]
[148] and currently achieve state-of-the-art accuracy and provable robustness for
their respective datasets. The number of problem instances in this category was
337; the time limit was set to 5 minutes per instance. The largest network in this
category had ≈ 50K neurons. Out of the 6, ERAN (in purple) certified the highest number of instances, 266/337.
Fig. 1.2 (c) compares the results on 3 CIFAR10 ReLU-based convolutional networks trained by [135]. The largest network had > 6K neurons. The specifications for these networks are harder; therefore, the time limit was increased to 1 hour per instance. Only 2 certifiers competed in this category, with ERAN (in red) certifying 286/300 instances within the time limit, 9 more than the competition.
Fig. 1.2 (d) compares the number of instances certified on the MNIST Sigmoid
and Tanh based fully-connected networks. The number of instances in this category
was 128; the time limit was set to 15 minutes per instance. The largest network had
3 layers and 350 neurons. Only 3 competing certifiers supported networks with
Sigmoid and Tanh activations. It can be seen that ERAN (in sky blue) certified the
highest number of problem instances.
In the remaining category, ERAN verified the second-highest number of benchmarks. We note that ERAN is the only verifier that competed in all the categories, demonstrating its ability to handle diverse networks and specifications in a precise and scalable manner. Based on its demonstrated performance and flexibility, ERAN is already widely used in several research projects in both academia and industry. Its repository on GitHub at the time of this writing has 145 stars and 49 forks.
Example 1.1.1. Consider a concrete set C containing only the integers that are both
positive and even. An abstraction of C in the Sign domain will represent it finitely
using its sign, i.e., +. A transformation on C that multiplies every element of C by a negative integer can be overapproximated by an abstract transformer that operates on + and returns −, approximating the concrete output, which contains only negative even integers.
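To make the example concrete, the following short Python sketch (our own illustration, not code from any system described in this thesis) models the Sign domain with the abstract values +, −, 0, and TOP, together with an abstract transformer that overapproximates multiplication by a negative integer. The abstract transformer never inspects the possibly infinite concrete set C; it operates only on the finite abstraction.

# Minimal sketch of a Sign abstract domain (illustrative only).
# Abstract values: "+", "-", "0", "TOP" (unknown sign).

def alpha(concrete):
    # Abstraction: map a set of integers to its sign, or TOP if the signs mix.
    signs = {"+" if x > 0 else "-" if x < 0 else "0" for x in concrete}
    return signs.pop() if len(signs) == 1 else "TOP"

def mul_by_negative(sign):
    # Abstract transformer for multiplying by a negative integer.
    return {"+": "-", "-": "+", "0": "0", "TOP": "TOP"}[sign]

# C contains only positive even integers; its abstraction is "+".
C = {2, 4, 6, 8}
assert alpha(C) == "+"
# Multiplying every element of C by, e.g., -3 yields negative even integers;
# the abstract transformer soundly returns "-" without enumerating C.
assert mul_by_negative(alpha(C)) == "-"
assert alpha({-3 * x for x in C}) == "-"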
In this thesis, we focus on numerical abstract interpretation, i.e., both f(φ) and
its approximations g(φ) in Fig. 1.1 computed via an abstract domain are numerical
functions. There is a tradeoff between the expressivity of a numerical domain and
its cost. The Polyhedra domain [57] is the most expressive linear relational domain
as it can capture all linear constraints between variables, i.e., constraints of the form
∑i ai·xi ≤ b, where ai, b ∈ Z and xi is a variable. In an ideal setting, one would
simply use the Polyhedra domain for numerical analysis. However, it has a worst-
case exponential cost in time and space. Thus, an analyzer using Polyhedra can
easily fail to analyze large programs by running out of memory or by timing out.
Because of this, the Polyhedra domain has been often thought to be impractical
[51, 122, 176]. On the other hand, the Interval domain [54] has a linear cost but is
very imprecise as it captures only constraints of the form li ≤ xi ≤ ui where li, ui ∈ Z, thus ignoring relationships between variables. To balance the expressivity-cost
tradeoff, researchers have designed domains that limit a domain’s expressivity in
exchange for better asymptotic complexity. Examples include Octagon [142], Zone
[140], Pentagon [134], SubPolyhedra [122] and Gauge [205].
1.2 Fast and Precise Numerical Program Analysis
The goal of numerical program analysis is computing the set f(φ) of numerical
values that the program variables can take during all program executions starting
from an initial program state φ. Here, φ encodes the set of initial variable values.
The problem of exactly computing f(φ) is in general undecidable for programs
due to Rice’s Theorem [168]. Numerical domains allow obtaining a numerical over-
approximation g(φ) of f(φ). The design of these domains for program analysis
remains an art requiring considerable and rare expertise. The various available
domains tend to work well for the particular applications for which they were de-
signed but may not be as effective on others. For example, the Octagon domain
with variable packing is fast and precise for analyzing avionics software [27], but it
is not as effective for Windows libraries. Similarly, the Pentagon domain [134] used
for effectively analyzing Windows libraries is imprecise for analyzing avionics soft-
ware. The question then is whether there are general methods for improving the speed of
existing domains without sacrificing precision.
The starting point of our work is in identifying the considerable redundancy in
numerical program analysis: unnecessary computations that are performed with-
out affecting the final results. Removing this redundancy reduces the cost of anal-
ysis and improves speed without sacrificing precision. Numerical analysis has two
types of redundancies:
• Single-step redundancy: At each step of the analysis, redundant computa-
tions are performed which do not affect the output of that step.
case, the first two, leaving the third part unchanged. This reduces complexity and
improves performance.
Figure 1.4: Precision and cost of the Zone, Octagon, and Polyhedra domains with and without ELINA.
elina library. We have implemented all of our methods in the publicly available library ELINA, available at https://github.com/eth-sri/ELINA. ELINA is the current state of the art for numerical domains, containing complete end-to-end implementations of the popular and expensive Zone, Octagon, and Polyhedra domains. Besides online decomposition, ELINA also contains domain-specific algorithmic improvements and performance optimizations, including for cache locality, vectorization, and more. Fig. 1.4 shows the cost and precision of the Zone, Octagon, and Polyhedra domain analysis with and without ELINA. Both the Zone and Octagon domains have asymptotic worst-case cubic cost, while that of Polyhedra is exponential.
Figure 1.5: Three Polyhedra analysis traces. The left-most and middle traces obtain a precise result (the polyhedron at the bottom); however, the analysis cost of the middle trace is lower. The right-most trace obtains an imprecise result.
It can be seen in Fig. 1.4 that ELINA reduces the cost of domain analysis
significantly without affecting precision.
ELINA enables the analysis of large real-world software, for example Linux device drivers containing thousands of lines of code and > 500 variables, with the exponentially expensive Polyhedra domain in a few seconds. Before our work, Polyhedra analysis was considered practically infeasible for such benchmarks, as the resulting analysis did not finish even after several hours. For the Octagon analysis, ELINA gave typical speedups of up to 32x over prior state-of-the-art libraries. ELINA
is currently used for numerical analysis in several projects, including the popular
Seahorn verification framework [91].
Online decomposition is effective at removing redundant computations at each
analysis step without losing precision. Next we discuss our approach for removing
redundancies across different analysis steps.
machine learning for program analysis. Our next key observation for
speeding up numerical program analysis is that the results of all abstract trans-
formers applied for obtaining the final result in a sequence need not be precise,
i.e., it is possible to apply imprecise intermediate transformers and still obtain a
precise result. This points to redundancy in the abstract sequences. This redun-
dancy occurs because some of the precise intermediate results are discarded later
in the analysis. For example, an assignment transformer may remove all previous
constraints involving the assigned variable.
Example 1.2.2. Fig. 1.5 shows three Polyhedra analysis traces for an overview of
our approach. The nodes in the traces are the Polyhedra, and the edges represent
Polyhedra transformers. The green and orange nodes respectively denote precise
and imprecise Polyhedra. Similarly, the green and orange edges respectively denote
precise but expensive and imprecise but fast transformers. In the left-most trace,
the precise transformer is applied at each step to obtain a precise final result. In
the middle trace, an approximate transformer is applied at the first node; however, this does not affect the final result, which is now computed faster. The choice of the node for the
approximate transformer is crucial, as the final result in the right-most trace after
applying the approximate transformer at the second node is imprecise.
Our key idea for removing redundancy across sequences is learning policies via
reinforcement learning for selectively losing precision at different analysis steps
such that the performance improves while the precision loss is as little as pos-
sible. We take this approach because, in practice, hand-crafted or fixed policies often yield suboptimal results: the resulting analysis is either too slow or too imprecise. This is because policies maximizing precision and minimizing cost need to
make adaptive decisions based on high-dimensional abstract elements computed
dynamically during the analysis. Further, the sequence of transformers is usually
quite long in practice. Using our approach, we showed for the first time that re-
inforcement learning can benefit static analysis in [192]. We created approximate
transformers for the Polyhedra domain that enforce different degrees of finer parti-
tions by explicitly removing constraints yielding several approximate transformers
with different speeds and precision. Reinforcement learning then obtains a pol-
icy that selects among different transformers based on the abstract elements. Our
overall approach is presented in detail in Chapter 4.
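To illustrate the shape of such a learned policy, the Python sketch below uses a Q-function over pairs of abstract-state features and transformer choices to pick between a precise and an approximate transformer at each analysis step. The features and the two actions shown here are hypothetical placeholders; the actual states, actions, and rewards used for the Polyhedra domain are defined in Chapter 4.

# Sketch of choosing between precise and approximate transformers with a
# learned Q-function (hypothetical interface; see Chapter 4 for the real one).
import random

ACTIONS = ["precise", "approximate"]
Q = {}  # maps (state_features, action) -> estimated long-term reward

def features(element):
    # Hypothetical features of an abstract element, e.g. the number of
    # constraints (bucketed) and the number of blocks in its partition.
    return (len(element["constraints"]) // 10, len(element["blocks"]))

def choose_action(element, epsilon=0.1):
    # Epsilon-greedy choice between the available transformers.
    s = features(element)
    if random.random() < epsilon:
        return random.choice(ACTIONS)
    return max(ACTIONS, key=lambda a: Q.get((s, a), 0.0))

def q_update(s, a, reward, s_next, alpha=0.5, gamma=0.9):
    # Standard Q-learning update; the reward trades off speed and precision.
    best_next = max(Q.get((s_next, b), 0.0) for b in ACTIONS)
    Q[(s, a)] = (1 - alpha) * Q.get((s, a), 0.0) + alpha * (reward + gamma * best_next)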
This approach is helpful when the analysis is inherently non-decomposable or
the partitions with online decomposition become too coarse causing slowdowns.
Our results show that analysis performance improves significantly with up to 550x
speedup without significant precision loss enabling precise Polyhedra analysis of
large programs not possible otherwise. In our follow-up work [96] (not covered in
this thesis), we improve upon this approach by leveraging structured prediction
and also show that our concept of using machine learning for speeding up static
analysis is more general by applying it also to speed up the Octagon domain by
up to 28x over ELINA without losing significant precision.
Neural networks are increasingly deployed for decision making in many real-world
applications such as self-driving cars [30], medical diagnosis [6], and finance [74].
However, recent research [86] has shown that neural networks are susceptible to
undesired behavior in many real-world scenarios posing a threat to their reliability.
Thus, there is a growing interest in ensuring that they behave in a provably reliable manner.
[Figure 1.6: The problem setting for neural network certification: (a) the property is verified for all inputs in φ; (b) a counterexample violating the property is found.]
To address this challenge, we designed scalable and precise methods based on
numerical abstract interpretation for certifying the safety of deep neural networks.
problem statement. Fig. 1.6 shows the problem setting for neural network
certification. We are given a set φ of inputs to an already trained neural network f
and a property ψ over the network outputs. Our goal is to prove whether f(φ) ⊆
ψ holds, as in Fig. 1.6 (a) or produce an input i ∈ φ for which the property is
violated, as in Fig. 1.6 (b). φ and ψ are usually chosen by a domain expert in such
a way that a counter-example represents an undesired behavior. An example of
φ is the popular L∞ -norm based region [40] for images. φ here is constructed by
considering all images that can be obtained by perturbing the intensity of each
pixel in a correctly classified image by an amount of ε ∈ R independently. ψ, in
this case, is usually classification robustness: all images in φ should be classified
correctly.
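As a concrete illustration of this specification, the Python sketch below constructs the L∞ region φ around an image and states the classification-robustness property ψ. The network f and the perturbation budget eps are placeholders, and the random sampling shown here can only search for counterexamples; certification has to reason about all points in φ, which is what the methods developed in this thesis do.

import numpy as np

def linf_region(image, eps):
    # phi: all images whose pixel intensities differ from `image` by at most
    # eps, clipped to the valid intensity range [0, 1].
    lb = np.clip(image - eps, 0.0, 1.0)
    ub = np.clip(image + eps, 0.0, 1.0)
    return lb, ub

def check_robustness_by_sampling(f, image, eps, num_samples=1000):
    # psi: every image in phi is classified like the original image.
    # Sampling can refute psi by finding a counterexample, but never prove it.
    label = int(np.argmax(f(image)))
    lb, ub = linf_region(image, eps)
    for _ in range(num_samples):
        x = np.random.uniform(lb, ub)
        if int(np.argmax(f(x))) != label:
            return False, x   # counterexample: psi is violated
    return True, None          # no violation found among the samples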
For each dimension, the text in green represents cases that we handle in this thesis,
those in blue represent cases handled by our work but not covered in this thesis.
We do not handle the remaining cases currently.
The first dimension that we consider is that of the application domain. The com-
puter vision domain is currently the most popular for neural network certification,
but there is also growing interest in certifying models for natural language process-
ing (NLP), speech, and aviation. To the best of our knowledge, there is no existing
work on certifying neural networks for the remaining domains in Fig. 1.7.
The second dimension is of the specification being certified and has two com-
ponents: the set of inputs φ and the property over network outputs ψ. Our work
focuses on φ defined by changes to pixel or sound intensity, geometric transformations (including rotation, translation, and scaling) on images, and changes to the sensor
values. For a given φ, the property ψ that we consider can be classification ro-
bustness, safety, or stability. We defined robustness above. Safety means that the
network outputs do not satisfy a given error condition while stability signifies that
the outputs are bounded within a given threshold. Both the error condition and
the threshold are provided by a domain expert.
The particular architecture of the considered neural network is the third dimen-
sion. Our work considers fully-connected (FCN), convolutional (CNN), residual, and recurrent (RNN) neural network architectures. For such architectures, the
non-linear functions in the hidden layers that we handle are ReLU, Sigmoid, Tanh,
and Maxpool.
There are a variety of methods used in the literature for neural network certifi-
cation such as SMT solvers [37, 69, 113, 114], mixed-integer linear programming
[8, 32, 36, 49, 66, 135, 197], Lipschitz optimization [170], duality [67, 212], convex
relaxations [7, 31, 68, 78, 127, 163, 175, 186, 188, 199, 211, 221], and combination
of relaxations with solvers [187, 206]. Our work is based on custom convex relax-
ations for neural networks defined under the framework of abstract interpretation
and the combination of such relaxations with MILP solvers.
The final dimension is of the type of formal guarantees. We provide deterministic
guarantees meaning that we prove whether the property ψ holds for the entire set
φ or not. We note that there are models such as those in probabilistic forecasting
[62] for which the outputs are probability distributions and probabilistic guarantees
are a natural fit for these.
Figure 1.7: Different dimensions of the neural network certification problem. The text in green, blue, and black respectively represents cases included in this thesis, cases that we consider in our work but do not cover in this thesis, and cases that are not considered in our work.
key idea: specialized domains for neural networks. The main challenge in precisely certifying large neural networks is the fast and precise handling of the non-linearities employed in the networks. The approximations of these non-linearities in the implementations of the numerical domains commonly used in program analysis are either too imprecise or too expensive. For example, in the setting considered in this work, the input to a ReLU is always bounded, while the ReLU approximations employed in program analysis assume an unbounded input. Therefore, to enable fast and precise certification of large neural networks, we designed relaxations tailored to exploiting the setting of neural network certification.
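To illustrate how bounded inputs are exploited, the sketch below computes a standard convex relaxation of y = ReLU(x) for an input known to lie in [l, u]: the ReLU is exact when l ≥ 0 or u ≤ 0, and otherwise the output is bounded from above by the chord through (l, 0) and (u, u) and from below by a linear function with slope 0 or 1 chosen to minimize the relaxation area. This is a generic sketch of the idea, not the exact relaxations implemented in ERAN.

def relu_relaxation(l, u):
    # Returns (lower_slope, lower_bias, upper_slope, upper_bias) such that
    # lower_slope*x + lower_bias <= ReLU(x) <= upper_slope*x + upper_bias
    # holds for every x in [l, u].
    assert l <= u
    if l >= 0:                      # ReLU is the identity on [l, u]
        return 1.0, 0.0, 1.0, 0.0
    if u <= 0:                      # ReLU is constantly 0 on [l, u]
        return 0.0, 0.0, 0.0, 0.0
    lam = u / (u - l)               # chord through (l, 0) and (u, u)
    upper_slope, upper_bias = lam, -lam * l
    lower_slope = 0.0 if abs(l) >= u else 1.0   # smaller-area lower bound
    return lower_slope, 0.0, upper_slope, upper_bias

print(relu_relaxation(-1.0, 2.0))   # (1.0, 0.0, 0.666..., 0.666...)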
custom zonotope relaxations. In our first work in this direction [186], not
included in this thesis, we designed new parallelizable Zonotope [81] relaxations
tailored for handling the commonly used ReLU, Sigmoid, and Tanh non-linearities
overall. DeepPoly, DeepZ, and GPUPoly are sound with respect to floating-point
arithmetic, which is essential since otherwise the certification results may be wrong
[107]. Both RefineZono and kPoly can be used to refine the results of DeepPoly,
DeepZ, and GPUPoly by incurring an extra cost.
For each certification instance, analysis with ERAN yields one of three outcomes: (a) it proves that the specification holds, (b) it returns a counterexample (cex) when it can determine that a point in the precondition φ violates the postcondition ψ, or (c) the certification status is unknown, which can happen when running incomplete certification or when running complete certification with a time limit.
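Schematically, the three outcomes can be captured by a driver of the following shape; certify and find_counterexample are placeholders for an incomplete analysis and a counterexample search rather than the actual ERAN interface.

from enum import Enum

class Outcome(Enum):
    VERIFIED = "specification holds"
    COUNTEREXAMPLE = "counterexample found"
    UNKNOWN = "certification status unknown"

def analyze(network, phi, psi, certify, find_counterexample):
    # Hypothetical driver mirroring the three outcomes described above.
    if certify(network, phi, psi):                  # sound, possibly incomplete
        return Outcome.VERIFIED, None
    cex = find_counterexample(network, phi, psi)    # concrete violating input
    if cex is not None:
        return Outcome.COUNTEREXAMPLE, cex
    return Outcome.UNKNOWN, None                    # e.g., time limit reached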
ERAN can be easily extended for other certification tasks and is currently the
state-of-the-art tool for both complete and incomplete certification of neural net-
works. As a result, ERAN is widely used in many research projects [90, 133, 180].
Beyond certification, our methods have also been used to train state-of-the-art ro-
bust neural networks [148].
PART I
FAST AND PRECISE NUMERICAL PROGRAM ANALYSIS

2 FAST POLYHEDRA ANALYSIS VIA ONLINE DECOMPOSITION
We start the thesis with our contributions for fast and precise numerical program
analysis. In this chapter, we formally describe our theoretical framework and new
algorithms for making the exponentially expensive Polyhedra domain practical for
analyzing large real-world programs. We generalize the applicability of our meth-
ods to all subpolyhedra domains in Chapter 3 and provide theoretical guarantees
on the analysis precision. Finally, in Chapter 4, we leverage reinforcement learning
to further improve the performance of numerical program analysis. Overall, our
methods yield many orders of magnitude speedup over prior approaches.
For almost 40 years, program analysis with the Polyhedra domain has been con-
sidered impractical for large real-world programs due to its exponential complex-
ity [51, 122, 176]. In this chapter, we challenge this assumption and present a new
approach that enables the application of Polyhedra for analyzing large, realistic
programs, with speedups ranging between two to five orders of magnitude com-
pared to the state-of-the-art. This allows us to analyze large real-world programs
such as Linux device drivers beyond the reach of existing approaches within a few
seconds. We note that our approach does not lose precision, i.e., it computes the
same invariants as the original analysis.
The work in this chapter was published in [190].
key idea: online decomposition Our key insight is based on the observa-
tion that the set of program variables can be partitioned with respect to the Poly-
hedra generated during the analysis into subsets, called blocks, such that linear
constraints only exist between variables in the same subset [93, 94, 189]. We lever-
age this observation to decompose a large polyhedron into a set of smaller polyhe-
dra, thus reducing the asymptotic complexity of the Polyhedra domain. However,
maintaining decomposition online is challenging because over 40 Polyhedra trans-
formers change the partitions dynamically and in non-trivial ways: blocks in the
partitions can merge, split, grow, or shrink during analysis. Note that an exact par-
tition cannot be computed a priori as it depends on the exact Polyhedra generated
during the analysis. Therefore, static approaches for computing the partition lose
significant precision [27].
To ensure our method does not lose precision, we develop a theoretical frame-
work that asserts how partitions are modified during analysis. We then use our
theory to design new abstract transformers for Polyhedra. Our framework guaran-
tees that the original polyhedron can be recovered exactly from the decomposed
polyhedra at each step of the analysis. Thus our decomposed analysis produces
the same fixpoint and has the same convergence rate as the original analysis. In-
terestingly as we will show in the next chapter, with a non-trivial extension our
framework can be used for decomposing other numerical domains without losing
precision, not only Polyhedra.
notation Lower case letters (a, b, . . .) represent column vectors and integers
(g, k, . . .). Upper case letters A, D represent matrices whereas O, P, Q are polyhe-
dra. Greek (α, β, . . .) and calligraphic letters (P, C, . . .) represent scalars and sets
respectively.
Collecting these yields matrices A, D and vectors of rational numbers b, e such that the polyhedron P can be written in the constraint representation as
P = {x ∈ Q^n | A·x ≤ b, D·x = e}.
Equivalently, P can be described by a set V of vertices, a set R of rays, and a set Z of lines:
P = { ∑_{i=1}^{|V|} λi·vi + ∑_i µi·ri + ∑_i νi·zi | vi ∈ V, ri ∈ R, zi ∈ Z },
where λi, µi ≥ 0, νi ∈ Q, and ∑_{i=1}^{|V|} λi = 1. The above vectors are the generators of P and are collected in the set G = GP = {V, R, Z}.
Figure 2.1: Two representations of a polyhedron defined over variables x1 and x2. (a) Bounded polyhedron; (b) unbounded polyhedron.
Example 2.1.1. Fig. 2.1 shows two examples of both representations for polyhedra.
In Fig. 2.1(a) the polyhedron P is bounded and can be represented as either the
intersection of four closed half spaces or as the convex hull of four vertices:
Note that the sets of rays R and lines Z are empty in this case.
In Fig. 2.1(b), the polyhedron P is unbounded and can be represented either as
the intersection of two closed half planes or as the convex hull of two rays starting
at vertex (1, 2):
C = {−x2 ≤ −2, x2 ≤ 2·x1}, or
G = {V = {(1, 2)}, R = {(1, 2), (1, 0)}, Z = ∅}.
To reduce clutter, we abuse notation and often write P = (C, G) since our algo-
rithms, introduced later, maintain both representations. Both C and G represent
minimal sets, i.e., they do not contain redundancy.
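For instance, the unbounded polyhedron of Fig. 2.1(b) can be encoded in both representations as in the Python sketch below: the constraint side checks A·x ≤ b directly, while the generator side asks whether a point equals the single vertex plus a non-negative combination of the rays. This is a small numerical illustration only, not the representation used inside ELINA.

import numpy as np
from scipy.optimize import linprog

# Constraint representation of Fig. 2.1(b): -x2 <= -2 and -2*x1 + x2 <= 0.
A = np.array([[0.0, -1.0], [-2.0, 1.0]])
b = np.array([-2.0, 0.0])

# Generator representation: one vertex, two rays, no lines.
V = np.array([[1.0, 2.0]])
R = np.array([[1.0, 2.0], [1.0, 0.0]])

def in_constraints(x):
    return bool(np.all(A @ x <= b + 1e-9))

def in_generators(x):
    # With a single vertex v, x lies in P iff x = v + mu1*r1 + mu2*r2, mu >= 0.
    res = linprog(c=[0.0, 0.0], A_eq=R.T, b_eq=x - V[0], bounds=[(0, None)] * 2)
    return res.success

for p in [np.array([2.0, 3.0]), np.array([0.0, 0.0])]:
    print(p, in_constraints(p), in_generators(p))   # same answer both ways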
the set of linear inequalities. The restrictions limit the set of assertions that can be
proved using these domains. For example, the assertion in the code in Fig. 2.2 can-
not be expressed using weakly relational domains whereas Polyhedra can express
and prove the property. The expressivity of the Polyhedra domain comes at higher
cost: it has asymptotic worst-case exponential complexity in both time and space.
The Polyhedra abstract domain consists of the polyhedra lattice (P, ⊑, ⊔, ⊓, ⊥, ⊤) and a set of transformers. P is the set of convex closed polyhedra ordered by standard set inclusion: ⊑ = ⊆. The least upper bound (⊔) of two polyhedra P and Q is the convex hull of P and Q, which, in general, is larger than the union P ∪ Q. The greatest lower bound (⊓) of P and Q is simply the intersection P ∩ Q. The top element ⊤ = Q^n in the lattice is encoded by C = ∅ or generated by n linearly independent lines. The bottom element (⊥) is represented by any unsatisfiable set of constraints in C or with G = ∅.
where:
C′P = {c ∈ CP | CQ |= c},
C′Q = {c ∈ CQ | ∃ c′ ∈ CP, CP |= c and ((CP \ {c′}) ∪ {c}) |= c′}.
Figure 2.3: Polyhedra domain analysis (first iteration) on the example program below. The polyhedra are shown in constraint representation.

Program (a polyhedron Pℓ is attached to each program point):
P1
x := 1;
P2
y := 2·x;
P3
while (x ≤ n) {
P4
x := x + 1;
P5
y := y + 2·x;
P6
}
P7

Polyhedra computed in the first iteration:
P1 = ⊤
P2 = {x = 1}
P3 = {x = 1, y = 2·x}; after the join at the loop head: P3′ = {−x ≤ −1, x ≤ 2, y = 4·x − 2}
P4 = {x = 1, y = 2·x, x ≤ n}
P5 = {x = 2, y = 2·x − 2, x ≤ n + 1}
P6 = {x = 2, y = 4·x − 2, x ≤ n + 1}
variable xi is then projected out [102] from the constraint set C ∪ {x′i − δ = 0}. Finally, the variable x′i is renamed back to xi.
Fig. 2.3 shows a simple program that computes the sum of the first n even num-
bers, where a polyhedron Pℓ is associated with each line ℓ in the program. At the fixpoint, the polyhedron Pℓ represents invariants that hold for all executions of the program before executing the statement at line ℓ. Here, we work only with the
constraint representation of polyhedra. The analysis proceeds iteratively by select-
ing the polyhedron at a given line, say P1 , then applying the transformer for the
statement at that program point (x:=1 in this case) on that polyhedron, and produc-
ing a new polyhedron, in this case P2 . The analysis terminates when a fixpoint is
reached, i.e., when further iterations do not add extra points to any polyhedra.
first iteration The initial program state does not restrict the possible values of the program variables x, y, n. Thus, initially, the polyhedron P1 is set to top (⊤). Next, the analysis applies the transformer for the assignment x:=1 to P1, producing P2. The set C1 is empty and the transformer adds the constraint x = 1 to obtain P2. The next statement assigns to y. Since C2 does not contain any constraint involving y, the transformer for the assignment y:=2·x adds y = 2·x to obtain P3. Next, the conditional statement for the loop is processed: that transformer adds the constraint x ≤ n to obtain polyhedron P4. The assignment statement x:=x+1 inside the loop assigns to x, which is already present in the set C4. Thus, a new variable x′ is introduced and the constraint x′ − x − 1 = 0 is added to C4, producing:
C′5 = {x = 1, y = 2·x, x ≤ n, x′ − x − 1 = 0}.
Projecting out x yields
C′′5 = {x′ = 2, y = 2·x′ − 2, x′ ≤ n + 1},
and renaming x′ back to x gives
C5 = {x = 2, y = 2·x − 2, x ≤ n + 1}.
next iterations The analysis then returns to the head of the while loop and propagates the polyhedron P6 to that point. To compute the new program state at the loop head, it now needs to compute the union of P6 with the previous polyhedron P3 at that point. Since the union of convex polyhedra is usually not convex, it is approximated using the join transformer (⊔) to yield the polyhedron P′3.
The analysis then checks if the new polyhedron P′3 at the loop head is included in P3 using inclusion testing (⊑). If yes, then no new information was added and the analysis terminates. However, here P′3 is not included in P3 and so the analysis continues. After several iterations, the widening transformer (∇) may be applied at the loop head along with the join to accelerate convergence.
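To make the iteration-until-fixpoint loop concrete, the sketch below analyzes a loop of the same shape with the much simpler Interval domain: the loop head joins the state from the entry edge with the state propagated around the back edge, an inclusion test detects the fixpoint, and widening forces termination. It tracks only bounds for x and ignores the relational constraints that the Polyhedra analysis of Fig. 2.3 maintains.

import math

def join(a, b):   return (min(a[0], b[0]), max(a[1], b[1]))
def leq(a, b):    return b[0] <= a[0] and a[1] <= b[1]
def widen(a, b):  # keep only the stable bounds
    return (a[0] if a[0] <= b[0] else -math.inf,
            a[1] if a[1] >= b[1] else math.inf)

def analyze_loop():
    x_head = (1.0, 1.0)                           # x = 1 on the entry edge
    while True:
        x_body = (x_head[0] + 1, x_head[1] + 1)   # loop body: x := x + 1
        new_head = join((1.0, 1.0), x_body)       # join entry and back edge
        if leq(new_head, x_head):                 # inclusion test: fixpoint
            return x_head
        x_head = widen(x_head, new_head)          # accelerate convergence

print(analyze_loop())   # (1.0, inf): at the loop head, x is at least 1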
set for the result starting from the input constraint set(s); the column Generator has
similar meaning for the generators. The column Both shows the asymptotic cost of
computing at least one of the representations for the result when both representa-
tions are available for the input(s).
(e.g., at the loop head when join is needed). The eager approach computes the
conversion after every operation. Chernikova’s algorithm is incremental, which
means that for transformers which add constraints or generators such as meet (u),
join (t), conditional and others, the conversion needs to be computed only for
the added constraints or generators. Because of this, in some cases eager can be
faster than lazy. While our transformers and algorithms are compatible with both
approaches, we use the eager approach in this work.
We next present our key insight and show how to leverage it to speed up program
analysis using Polyhedra. Our observation is that the set of program variables can
be partitioned into smaller subsets with respect to the polyhedra arising during
analysis such that no constraints exist between variables in different subsets. This
allows us to decompose a large polyhedron into a set of smaller polyhedra, which
reduces the space complexity of the analysis. For example, the n-dimensional hy-
percube requires 2^n generators, whereas with decomposition only 2n generators are
required. The original polyhedron can be recovered exactly using the decomposed
Polyhedra; thus, analysis precision is not affected. Further, the decomposition al-
lows the expensive polyhedra transformers to operate on smaller polyhedra, thus
reducing their time complexity without losing precision.
We first introduce our notation for partitions. Then, we introduce the theoretical
underpinning of our work: the interaction between the Polyhedra domain trans-
formers and the partitions.
2.2.1 Partitions
X = {x1, x2, x3} and
P = {x1 + 2·x2 ≤ 3}.
Here, X is partitioned into two blocks: X1 = {x1, x2} and X2 = {x3}. Now consider
P = {x1 + 2·x2 ≤ 3, 3·x2 + 4·x3 ≤ 1}.
the variables in Xk. The polyhedron P can be recovered from the factors Pk by computing the union of the constraints CPk and the Cartesian product of the generators GPk. For this, we introduce the ⋈ transformer defined as:
P = P1 ⋈ P2 ⋈ . . . ⋈ Pr = (CP1 ∪ CP2 ∪ . . . ∪ CPr, GP1 × GP2 × . . . × GPr).   (2.4)
Example 2.2.2. The polyhedron P in Fig. 2.1 (a) has no constraints between vari-
ables x1 and x2 . Thus, X = {x1 , x2 } can be partitioned into blocks: πP = {{x1 }, {x2 }}
with corresponding factors P1 = (CP1 , GP1 ) and P2 = (CP2 , GP2 ) where:
The set L consisting of all partitions of X forms a partition lattice (L, ⊑, ⊔, ⊓, ⊥, ⊤). The elements π of the lattice are ordered as follows: π ⊑ π′ if every block of π is included in some block of π′ (π "is finer" than π′). This lattice contains the usual transformers of least upper bound (⊔) and greatest lower bound (⊓). In the partition lattice, ⊤ = {X} and ⊥ = {{x1}, {x2}, . . . , {xn}}.
Now consider
π = {{x1, x2}, {x3, x4}, {x5}} and
π′ = {{x1, x2, x3}, {x4}, {x5}}.
Then,
π ⊔ π′ = {{x1, x2, x3, x4}, {x5}} and
π ⊓ π′ = {{x1, x2}, {x3}, {x4}, {x5}}.
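Both the finest partition induced by a constraint set and the join of two partitions can be computed with a simple union-find over the variables, as in the following sketch of the definitions above (illustrative code, not ELINA's implementation).

def finest_partition(variables, constraints):
    # `constraints` is an iterable of sets of variables that occur together in
    # a constraint; variables related (transitively) end up in the same block.
    parent = {v: v for v in variables}
    def find(v):
        while parent[v] != v:
            parent[v] = parent[parent[v]]
            v = parent[v]
        return v
    for c in constraints:
        c = list(c)
        for v in c[1:]:
            parent[find(c[0])] = find(v)
    blocks = {}
    for v in variables:
        blocks.setdefault(find(v), set()).add(v)
    return sorted(blocks.values(), key=min)

def partition_join(pi1, pi2):
    # Least upper bound in the partition lattice: merge overlapping blocks.
    return finest_partition(set().union(*pi1, *pi2), list(pi1) + list(pi2))

print(finest_partition({"x1", "x2", "x3"}, [{"x1", "x2"}]))
# blocks {x1, x2} and {x3}
print(partition_join([{"x1", "x2"}, {"x3", "x4"}, {"x5"}],
                     [{"x1", "x2", "x3"}, {"x4"}, {"x5"}]))
# blocks {x1, x2, x3, x4} and {x5}, matching the example above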
Definition 2.2.1. We call a partition π permissible for P if there are no variables xi and xj in different blocks of π related by a constraint in P, i.e., if π ⊒ πP.
Note that the finest partition π⊤ for the top (⊤) and the bottom (⊥) polyhedra is the bottom element in the partition lattice, i.e., π⊤ = π⊥ = ⊥. Thus, every partition is permissible for these.
The decomposed transformers require the computation of the output partition be-
fore the output is actually computed. We next show how the output partitions
are computed. The optimality of the computed partitions depends upon the de-
gree to which the polyhedra are observed. The finest partition for the output of a
Polyhedra transformer can always be computed from scratch by taking the output
polyhedron and connecting the variables that occur in the same constraint in that
polyhedron. However, this nullifies any performance gains as standard transformer
needs to be applied for computing the output polyhedron. For efficiency, we com-
pute the output partitions based on limited observation of the inputs. The partition
for the output of transformers such as meet, conditionals, assignment, and widen-
ing is computed from the corresponding partitions of input polyhedra P and Q.
For the join however, to ensure we do not end up with a trivial and imprecise par-
tition, we need to examine P and Q (discussed later in the section). Our approach
to handling the join partition is key to achieving significant analysis speedups.
We also note that the same polyhedron can have multiple constraint representa-
tions with different finest partitions as shown in the example below.
Example 2.2.4. The polyhedron P = {x1 = 0, x2 = 0} has the partition πP = {{x1}, {x2}}. This polyhedron can also be represented as P′ = {x1 = 0, x2 = x1} with the associated partition πP′ = {{x1, x2}}. Here P = P′ but πP ≠ πP′.
We note that the conversion algorithm performs transformations to change the
polyhedron representation. The exact output after such transformations depends
on the polyhedron and cannot be determined statically. In this work, we do not
model the effect of such transformations.
We next provide optimal partitions under our observation model. For polyhedron P, we denote the associated optimal partition in our model as π^obs_P. We will present a refinement of our observation model to obtain finer output partitions at small extra cost in Section 3.4. We will also present conditions under which π^obs_P = πP.
meet The constraint set for the meet P ⊓ Q is the union CP ∪ CQ. Thus, overlapping blocks Xi ∈ πP and Xj ∈ πQ will merge into one block in πP⊓Q. This yields:
Lemma 2.2.1. Let P and Q be two polyhedra with P ⊓ Q ≠ ⊥. Then πP⊓Q ⊑ π^obs_{P⊓Q} = πP ⊔ πQ.
in πO, all blocks Xi ∈ πP that overlap with B will merge into one, whereas non-overlapping blocks remain independent. Thus, we get the following lemma for calculating πO.
Lemma 2.2.2. Let P be the input polyhedron and let B be the block corresponding to the conditional α·xi ⊗ δ. If O ≠ ⊥, then πO ⊑ π^obs_O = πP ↑ B.
πO for the output O of the transformer for the assignment xi := δ can be computed similarly to that of the conditional transformer.
Lemma 2.2.3. Let P be the input polyhedron and let B be the block corresponding to an assignment xi := δ. Then πO ⊑ π^obs_O = πP ↑ B.
widening Like the join, the partition for widening (∇) depends not only on
partitions πP and πQ , but also on the exact form of P and Q. By definition, the
constraint set for P∇Q contains only constraints from Q. Thus, the partition for
P∇Q satisfies
Lemma 2.2.4. For polyhedra P and Q, πP∇Q ⊑ π^obs_{P∇Q} = πQ.
Note that the widening transformer can potentially remove all constraints containing a variable, making the variable unconstrained. Thus, in general, πP∇Q ≠ πQ.
join Let CP = {A1·x ≤ b1} and CQ = {A2·x ≤ b2},³ and let Y = {x′1, x′2, . . . , x′n, λ}. Then the constraint set CP⊔Q for the join of P and Q can be computed by projecting out the variables yi ∈ Y from the following set S of constraints:
S = {A1·x′ ≤ b1·λ, A2·(x − x′) ≤ b2·(1 − λ), −λ ≤ 0, λ ≤ 1}.   (2.5)
The Fourier-Motzkin elimination algorithm [102] is used for this projection. The algorithm starts with S0 = S and projects out variables iteratively one after another so that CP⊔Q = Sn+1. Let Si−1 be the constraint set obtained after projecting out the first i − 1 variables in Y. Then yi ∈ Y is projected out to produce Si as follows:
S+_{yi} = {c | c ∈ Si−1 and ai > 0},
S−_{yi} = {c | c ∈ Si−1 and ai < 0},
S0_{yi} = {c | c ∈ Si−1 and ai = 0},                                   (2.6)
S±_{yi} = {µ·c1 + ν·c2 | (c1, c2) ∈ S+_{yi} × S−_{yi} and µ·a1i + ν·a2i = 0},
Si = S0_{yi} ∪ S±_{yi},
where ai denotes the coefficient of yi in a constraint c, and a1i, a2i denote the coefficients of yi in c1 and c2, respectively.
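A direct transcription of (2.6) for inequality constraints of the form a1·x1 + ... + an·xn ≤ b is sketched below: each constraint is a pair of a coefficient list and a bound, and one call eliminates a single variable. A real implementation additionally has to remove the many redundant constraints that this step generates.

from fractions import Fraction

def fm_eliminate(constraints, i):
    # Eliminate variable i from constraints (a, b) meaning sum_j a[j]*x[j] <= b.
    pos = [c for c in constraints if c[0][i] > 0]    # S+ for this variable
    neg = [c for c in constraints if c[0][i] < 0]    # S-
    zero = [c for c in constraints if c[0][i] == 0]  # S0
    combined = []                                    # S+-
    for (a1, b1) in pos:
        for (a2, b2) in neg:
            mu, nu = -a2[i], a1[i]                   # mu*a1[i] + nu*a2[i] == 0
            a = [mu * x + nu * y for x, y in zip(a1, a2)]
            combined.append((a, mu * b1 + nu * b2))
    return zero + combined

# Eliminate x2 from {x1 + x2 <= 3, -x2 <= 0}:
F = Fraction
cs = [([F(1), F(1)], F(3)), ([F(0), F(-1)], F(0))]
print(fm_eliminate(cs, 1))   # one constraint: coefficients [1, 0], bound 3, i.e. x1 <= 3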
Figure 2.4: Two examples of P ⊔ Q with πP = πQ = {{x1}, {x2}}. (a) P1 ≠ Q1, P2 ≠ Q2; (b) P1 = Q1, P2 ≠ Q2.
However,
πP ⊔ πQ = {{x1, x2}, {x3}} and
πP ⊓ πQ = {{x1}, {x2}, {x3}}.
Thus, neither πP ⊔ πQ nor πP ⊓ πQ is a permissible partition for P ⊔ Q.
Theorem 2.2.5. Let P and Q be two polyhedra with the same permissible partition π = {X1, X2, . . . , Xr} and let π′ be a permissible partition for the join, that is, πP⊔Q ⊑ π′. If for any block Xk ∈ π, Pk = Qk, then Xk ∈ π′.
Proof. Since both P and Q are partitioned according to π, the constraint set in (2.5) can be written for each Xk separately:
{A1k·x′k ≤ b1k·λ, A2k·(xk − x′k) ≤ b2k·(1 − λ), −λ ≤ 0, λ ≤ 1},   (2.7)
where xk is the column vector of the variables in Xk. λ occurs in the constraint set for all blocks. For proving the theorem, we need to show that no variable in Xk will have a constraint with a variable in Xk′ ∈ π after the join. The variables in Xk can have a constraint with the variables in Xk′ only by projecting out λ. Since Pk = Qk, CPk and CQk are equivalent, so we can assume A1k = A2k and b1k = b2k.⁴ Inserting this into (2.7) we get
{A1k·x′k ≤ b1k·λ, A1k·(xk − x′k) ≤ b1k·(1 − λ), −λ ≤ 0, λ ≤ 1}.   (2.8)
The result of the projection is independent of the order in which the variables are projected out. Thus, we can project out λ last. For proving the theorem, we need to show that it is possible to obtain all constraints for CPk⊔Qk before projecting out λ in (2.8). We add A1k·x′k ≤ b1k·λ and A1k·(xk − x′k) ≤ b1k·(1 − λ) in (2.8) to project out all x′k and obtain:
{A1k·xk ≤ b1k, −λ ≤ 0, λ ≤ 1}.   (2.9)
Note that the constraint set in (2.9) does not contain all constraints generated by the Fourier-Motzkin elimination. Since Pk = Pk ⊔ Pk, we have CPk⊔Qk = CPk, and CPk is included in the constraint set of (2.9); thus, the remaining constraints generated by the Fourier-Motzkin elimination are redundant. In (2.9), all constraints among the variables in Xk are free from λ; therefore, projecting out λ does not create new constraints for the variables in Xk. Thus, there cannot be any constraint from a variable in Xk to a variable in Xk′.
The proof of the theorem also yields the following result.
Example 2.2.6. Fig. 2.4 shows two examples of P t Q where both P and Q have the
same partition πP = πQ = {{x1 }, {x2 }}. In Fig. 2.4(a),
4 One can always perform a transformation so that A1k = A2k and b1k = b2k holds.
For a given polyhedron, NewPolka and PPL store both the constraint set C and
the generator set G, each represented as a matrix. We follow a similar approach
adapted to our partitioned scenario. Specifically, assume a polyhedron P with per-
missible partition πP = {X1 , X2 , . . . , Xr }, i.e., associated factors {P1 , P2 , . . . , Pr }, where
Pk = (CPk , GPk ). The blocks of πP are stored as a linked list of variables and the poly-
hedron as a linked list of factors. Alternatively, trees can also be used. Each factor
is stored as two matrices. We do not explicitly store the factors and the blocks for
the unconstrained variables. For example, ⊤ is stored as ∅.
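In Python-like form, the decomposed storage described above could look as follows; this is a schematic of the data layout only, not ELINA's actual C data structures.

from dataclasses import dataclass, field
from typing import List

@dataclass
class Factor:
    # One factor P_k of a decomposed polyhedron: both representations are
    # kept as matrices (lists of rows) over the variables of its block.
    constraints: List[List[float]] = field(default_factory=list)  # C of P_k
    generators:  List[List[float]] = field(default_factory=list)  # G of P_k

@dataclass
class DecomposedPolyhedron:
    # Blocks of the partition stored as lists of variable indices, with one
    # factor per block. Unconstrained variables are simply absent, so the
    # top element is represented by empty lists.
    blocks:  List[List[int]] = field(default_factory=list)
    factors: List[Factor]    = field(default_factory=list)

top = DecomposedPolyhedron()   # the top element is stored as the empty decomposition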
The results in Section 2.2.2 calculated for each input polyhedra P, Q with partitions
πP , πQ either the best (finest) or a permissible partition of the output polyhedron
O of a transformer. Inspection shows that each result can be adapted to the case
where the input partitions are only permissible. In this case, the output partition is
likewise only permissible.
Table 2.2 shows the asymptotic time complexity of the Polyhedra transformers
decomposed with our approach. For simplicity, we assume that for binary trans-
formers both inputs have the same partition. In the table, r is the number of blocks
in the partition, ni is the number of variables in the i-th block, gi and mi are the
number of generators and constraints in the i-th factor respectively. It holds that
n = ∑_{i=1}^r ni, m = ∑_{i=1}^r mi, and g = ∏_{i=1}^r gi. We denote the number of variables
and generators in the largest block by nmax and gmax , respectively. Since we follow
the eager approach for conversion, both representations are available for inputs,
i.e., the second column of Table 2.2 corresponds to column Both in Table 2.1. We do
not show the cost of conversion.
P1: ⊤
x := 5;
P2: {{x = 5}}
u := 3;
P3: {{x = 5}, {u = 3}}
if (x == y) {
P4: {{x = 5, x = y}, {u = 3}}
x := 2·y;
P5: {{y = 5, x = 2·y}, {u = 3}}
}
P6: {{−x ≤ −5, x ≤ 10}, {u = 3}}
if (u == v) {
P7: {{−x ≤ −5, x ≤ 10}, {u = 3, u = v}}
u := 3·v;
P8: {{−x ≤ −5, x ≤ 10}, {v = 3, u = 3·v}}
}
P9: {{−x ≤ −5, x ≤ 10}, {−u ≤ −3, u ≤ 9}}
z := x + u;
P10: {{−x ≤ −5, x ≤ 10, −u ≤ −3, u ≤ 9, z = x + u}}

Figure 2.5: Example of complexity reduction through decomposition for Polyhedra analysis on an example program.
In this section, we describe our algorithms for the main Polyhedra transformers.
For each transformer, we first describe the base algorithm, followed by our adapta-
tion of that algorithm to use partitions. We also discuss useful code optimizations
for our algorithms. We follow an eager approach for the conversion; thus, the in-
puts and the output have both C and G available. Our choice allows us to always
apply the conversion incrementally for the expensive meet, conditional, and join
transformers while with the lazy approach it is possible that this cannot be done.
Join is the most challenging transformer to adapt with partitions as the partition
for the output depends on the exact form of the inputs. Our algorithms rely on two
auxiliary transformers, conversion and refactoring, which we describe first.
X = {x1, x2, x3, x4, x5, x6},
P = {{x1 = x2, x2 = 2}, {x3 ≤ 2}, {x5 = 1}, {x6 = 2}},
Q = {{x1 = 2, x2 = 2}, {x3 ≤ 2}, {x5 = 2}, {x6 = 3}}, with
πP = {{x1, x2}, {x3, x4}, {x5}, {x6}} and
πQ = {{x1, x2, x4}, {x3}, {x5}, {x6}},
where
P1 ⋈ P2 = {x1 = x2, x2 = 2, x3 ≤ 2} and
Q1 ⋈ Q2 = {x1 = 2, x2 = 2, x3 ≤ 2}.
After explaining refactoring, we now present our algorithms for the Polyhedra
transformers with partitions.
For the double representation, the constraint set CO of the output O = P u Q is the
union of the constraints of the inputs P and Q, i.e., CPuQ = CP ∪ CQ . GO is obtained
by incrementally adding the constraints in CQ to the polyhedron defined by GP
through the conversion transformer. If the conversion returns GO = ∅, then CO is
unsatisfiable and thus O = ⊥.
meet with partitions Our algorithm first computes the common partition πP ⊔ πQ. P and Q are then refactored according to this partition using Algorithm 2.1 to obtain P′ and Q′. If P′k = Q′k, then CP′k ∪ CQ′k = CP′k, no conversion is required, and we simply add P′k to O. If P′k ≠ Q′k, we add CP′k ∪ CQ′k to CO. Next, the constraints in CQ′k are incrementally added to the polyhedron defined by GP′k through the conversion transformer, obtaining GP′k⊓Q′k. If the conversion algorithm returns GP′k⊓Q′k = ∅, then we set O = ⊥. We know from Section 2.3 that πO = πP ⊔ πQ if O ≠ ⊥, otherwise πO = ⊥.
For the double representation, P ⊑ Q holds if all generators in GP satisfy all constraints in CQ. A vertex v ∈ VP satisfies the constraint set CQ if A·v ≤ b and D·v = e. A ray r ∈ RP satisfies CQ if A·r ≤ 0 and D·r = 0. A line z ∈ ZP satisfies CQ if A·z = 0 and D·z = 0.
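The inclusion test is then a direct check of the generators of P against the constraint matrices (A, b) and (D, e) of Q, as in this small numerical sketch.

import numpy as np

def includes(V_P, R_P, Z_P, A, b, D, e, tol=1e-9):
    # True if P (given by its generators) is included in Q (given by its
    # constraints A*x <= b and D*x = e), following the checks above.
    for v in V_P:                                    # vertices
        if np.any(A @ v > b + tol) or np.any(np.abs(D @ v - e) > tol):
            return False
    for r in R_P:                                    # rays
        if np.any(A @ r > tol) or np.any(np.abs(D @ r) > tol):
            return False
    for z in Z_P:                                    # lines
        if np.any(np.abs(A @ z) > tol) or np.any(np.abs(D @ z) > tol):
            return False
    return True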
2.4.4 Conditional
For the double representation, the transformer for the conditional statement α · xi ⊗
δ adds the constraint c = (α − ai ) · xi ⊗ δ − ai · xi to the constraint set CP , producing
CO . GO is obtained by incrementally adding the constraint c to the polyhedron
defined by GP through the conversion transformer. The conversion returns GO = ∅,
if CO is unsatisfiable and thus we get O = ⊥.
2.4.5 Assignment

VO = {v′ | v′i = aᵀ·v + c, v ∈ VP},
RO = {r′ | r′i = aᵀ·r, r ∈ RP},   (2.10)
ZO = {z′ | z′i = aᵀ·z, z ∈ ZP},

where c is the constant of the assigned affine expression xi := aᵀ·x + c and all components other than the i-th are left unchanged.
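Applying an affine assignment xi := aᵀ·x + c to the generator representation is a one-pass update over the generators, as sketched below: the i-th component of every vertex is recomputed with the constant added, while rays and lines are transformed without the constant and all other components stay unchanged.

import numpy as np

def assign_generators(V, R, Z, i, a, c):
    # Transformer (2.10) for x_i := a^T x + c on the generator representation.
    def update(points, constant):
        out = points.copy()
        if len(out):
            out[:, i] = points @ a + constant
        return out
    return update(V, c), update(R, 0.0), update(Z, 0.0)

# Example: x2 := x1 + 1 applied to the vertex (3, 7) and the ray (1, 0).
V = np.array([[3.0, 7.0]]); R = np.array([[1.0, 0.0]]); Z = np.zeros((0, 2))
print(assign_generators(V, R, Z, 1, np.array([1.0, 0.0]), 1.0))
# vertex becomes (3, 4), ray becomes (1, 1), no lines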
For the double representation, the widening transformer requires the generators and the constraints of P and the constraints of Q. A given constraint a·x ⊗ b, where ⊗ ∈ {≤, =}, saturates a vertex v ∈ V if a·v = b, a ray r ∈ R if a·r = 0, and a line z ∈ Z if a·z = 0.
For a given constraint c and generator set G, the set Sc,G is defined as the set of generators in G saturated by c. The standard widening transformer computes for each constraint cp ∈ CP the set Scp,GP and for each constraint cq ∈ CQ the set Scq,GP. If Scq,GP = Scp,GP for some cp, then cq is added to the output constraint set CO. The widening transformer removes constraints from CQ, so the conversion is not incremental in the standard implementations. Recent work [184] allows incremental conversion when constraints or generators are removed.
For the double representation, the generators GO of the output O = P t Q of the join
are simply the union of the generators of the input polyhedra, i.e., GO = GP ∪ GQ .
CO is obtained by incrementally adding the generators in GQ to the polyhedron
defined by CP .
computing the generators for the join If P′k = Q′k holds, then P′k can be added to O by Corollary 2.2.1. Since no new generators are added, the conversion transformer is not required for these factors. This results in a large reduction of the operation count for the conversion transformer.
As in Section 2.3, we compute π = πP ⊔ πQ and U = {Xk ∈ π | P′k = Q′k}. The factors in P′ and Q′ corresponding to the blocks A ∈ π \ U are merged using the ⋈ transformer to produce P′N and Q′N, respectively. Next, we compute GO = {GP′U1, GP′U2, . . . , GP′Uu, GP′N ∪ GQ′N}, where u = |U|. The pseudo code for this step is shown in Algorithm 2.7.
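A schematic of this step, in the spirit of Algorithm 2.7 (which is not reproduced here), is sketched below. Factors are represented simply by their vertex sets, equality of factors is checked syntactically, and the ⋈ transformer is modelled as the Cartesian product of vertex sets; all of these are simplifications for illustration.

from itertools import product

def bowtie(factors):
    # ⋈ on generator sets (vertices only): the Cartesian product of the
    # factors' vertex sets, where each vertex is a dict var -> value.
    merged = [{}]
    for vertices in factors:
        merged = [{**p, **v} for p, v in product(merged, vertices)]
    return merged

def join_generators(blocks, P, Q):
    # Generator part of the decomposed join: P and Q map each block index to
    # that factor's vertex set. Equal factors are copied to the output; the
    # remaining factors of P and of Q are each merged with the ⋈ transformer
    # and their generator sets are unioned.
    U = [k for k in blocks if P[k] == Q[k]]
    rest = [k for k in blocks if k not in U]
    out = {k: P[k] for k in U}                       # copied, no conversion
    if rest:
        out["merged"] = bowtie([P[k] for k in rest]) + bowtie([Q[k] for k in rest])
    return out

# Block 0 has equal factors in P and Q; block 1 differs (cf. the example later
# in this section).
P = {0: [{"x1": 2}], 1: [{"x5": 1, "x6": 2}]}
Q = {0: [{"x1": 2}], 1: [{"x5": 2, "x6": 3}]}
print(join_generators([0, 1], P, Q))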
computing the constraints for the join We know the constraint set for all factors corresponding to the blocks in U. CP′N∪Q′N is obtained by incrementally adding the generators in GQ′N to the polyhedron defined by CP′N. Similar to the meet transformer in Section 2.4.2, we apply our code optimization of not computing the constraints for the generators common to both GP′N and GQ′N.
X = {x1, x2, x3, x4, x5, x6},
P = {{x1 = x2, x2 = 2}, {x3 ≤ 2}, {x5 = 1}, {x6 = 2}},
Q = {{x1 = 2, x2 = 2}, {x3 ≤ 2}, {x5 = 2}, {x6 = 3}} with
πP = {{x1, x2}, {x3, x4}, {x5}, {x6}} and
πQ = {{x1, x2, x4}, {x3}, {x5}, {x6}}.
We observe that only P′1 = Q′1; thus, we add P′1 to the join O and {x1, x2, x3, x4} to U. Applying Algorithm 2.7, we get the generator sets for the output; GQ′N contains only one vertex, (2, 3). The conversion transformer incrementally adds this vertex to the polyhedron defined by CP′N. Thus, the factors O1 and O2 of O = {O1, O2} are given by
O1 = {x1 = x2, x2 = 2, x3 ≤ 2} and
O2 = {−x5 ≤ −1, x5 ≤ 2, x6 = x5 + 1}.
even though πP = π^obs_P and πQ = π^obs_Q. This is because the join transformer will not have a constraint involving a variable xi if either P or Q does not contain any constraint involving xi. We illustrate this with the example below:
why theorem 2.2.5 works in practice In the program analysis setting, the join is applied at the loop head or where the branches corresponding to if-else statements merge. In the case of a loop head, P represents the polyhedron before executing the loop body and Q represents the polyhedron after executing the loop. The loop usually modifies only a small number of variables. The factors corresponding to the blocks containing only the unmodified variables are equal, thus |⋃_{A∈π\U} A| is small. Hence, the application of the ⋈ transformer while merging the factors corresponding to A does not create an exponential number of new generators. Similarly, for the if-else, the branches modify only a small number of variables and thus |⋃_{A∈π\U} A| remains small.
x := 0;
y := 0;
if (∗) {
x++;
y++;
}
z := x;
platform All of our experiments were carried out on a 3.5 GHz Intel Quad
Core i7-4771 Haswell CPU. The sizes of the L1, L2, and L3 caches are 256 KB, 1024
KB, and 8192 KB, respectively, and the main memory has 16 GB. Turbo boost was
disabled for consistency of measurements. All libraries were compiled with gcc
5.2.1 using the flags -O3 -m64 -march=native.
analyzer We use the crab-llvm analyzer which is part of the SeaHorn [91] ver-
ification framework. The analyzer is written in C++ and analyzes LLVM bitcode
for C programs. It generates polyhedra invariants which are then checked for sat-
isfiability with an SMT-solver. The analysis is intra-procedural and the time for
analyzing different functions in the analyzed program varies.
We measured the time and memory consumed for the Polyhedra analysis by NewPolka, PPL, and ELINA on more than 1500 benchmarks. We used a time limit of 4 hours and a memory limit of 12 GB for our experiments.
Table 2.3: Speedup of Polyhedra domain analysis for ELINA over NewPolka and PPL.

Benchmark         Category  LOC    NewPolka          PPL               ELINA             Speedup ELINA vs.
                                   time(s)  mem(GB)  time(s)  mem(GB)  time(s)  mem(GB)  NewPolka  PPL
firewire_firedtv  LD        14506  1367     1.7      331      0.9      0.4      0.2      3343      828
net_fddi_skfp     LD        30186  5041     11.2     6142     7.2      9.2      0.9      547       668
mtd_ubi           LD        39334  3633     7        MO       MO       4        0.9      908       ∞
usb_core_main0    LD        52152  11084    2.7      4003     1.4      65       2        170       62
tty_synclinkmp    LD        19288  TO       TO       MO       MO       3.4      0.1      >4235     ∞
scsi_advansys     LD        21538  TO       TO       TO       TO       4        0.4      >3600     >3600
staging_vt6656    LD        25340  TO       TO       TO       TO       2        0.4      >7200     >7200
net_ppp           LD        15744  TO       TO       10530    0.15     924      0.3      >16       11.4
p10_l00           CF        592    841      4.2      121      0.9      11       0.8      76        11
p16_l40           CF        1783   MO       MO       MO       MO       11       3        ∞         ∞
p12_l57           CF        4828   MO       MO       MO       MO       14       0.8      ∞         ∞
p13_l53           CF        5816   MO       MO       MO       MO       54       2.7      ∞         ∞
p19_l59           CF        9794   MO       MO       MO       MO       70       1.7      ∞         ∞
ddv_all           HM        6532   710      1.4      85       0.5      0.05     0.1      12772     1700
• There was no integer overflow during the analysis for the most time consum-
ing function in the analyzed program.
At each step of the analysis, our algorithms obtain mathematically/semantically
the same polyhedra as NewPolka and PPL, just represented differently (decom-
posed). In the actual implementation, since our representation contains different
numbers, ELINA may produce an integer overflow before NewPolka or PPL. How-
ever, on the benchmarks shown in Table 2.3, NewPolka overflowed 296 times
whereas ELINA overflowed 13 times. We also never overflowed on the procedures
in the benchmarks that are most expensive to analyze (neither did NewPolka and
PPL). Thus ELINA does not benefit from faster convergence due to integer overflows, which set the corresponding polyhedra to ⊤.
We show the speedups for ELINA over NewPolka and PPL which range from
one to at least four orders of magnitude. In the case of a time out, we provide
a lower bound on the speedup, which is very conservative. Whenever there is
memory overflow, we show the corresponding speedup as ∞, because the analysis
can never finish on the given machine even if given arbitrary time.
Table 2.3 also shows the number of lines of code for each benchmark. The largest
benchmark is usb_core_main0 with 52K lines of code. ELINA analyzes it in 65 sec-
onds whereas NewPolka takes > 3 hours and PPL requires > 1 hour. PPL performs
Figure 2.7: The join transformer during the analysis of the usb_core_main0 benchmark. The x-axis shows the join number and the y-axis shows the number of variables in N = ⋃_{A∈π\U} A (the subset of variables affected by the join) and in X. The first figure shows these values for all joins whereas the second figure shows it for one of the expensive regions of the analysis.
variables in the set X. The first part of Fig. 2.7 plots the number of variables in N
and in X for all joins during the analysis of the usb_core_main0 benchmark. |X| varies
for all joins at different program points. It can be seen that the number of variables
in N is close to the number of variables in X till join number 5000. Although the
number of variables is large in this region, it is not the bottleneck for NewPolka
and PPL as the number of generators is linear in the number of variables. We get a
speedup of 4x mainly due to our conversion transformer which leverages sparsity.
The most expensive region of the analysis for both NewPolka and PPL is after
join number 5000 where the number of generators grows exponentially. In this
region, N contains 9 variables on average whereas X contains 54. The second part
of Fig. 2.7 zooms in one of these expensive regions. Since the cost of conversion
depends exponentially on the number of generators which in turn depends on the
number of variables, we get a large speedup.
We also measured the effect of optimizations not related to partitioning on the
overall speedup. The maximum difference was on the net_ppp benchmark which
was 2.4x slower without the optimizations.
2.6 discussion
Program analysis with the exponentially expensive Polyhedra domain was believed
to be intractable for analyzing large real-world programs for around 40 years. In
this chapter, we presented a theoretical framework, and its implementation, for
speeding up the Polyhedra domain analysis by orders of magnitude without los-
ing precision. Our key idea is to decompose the analysis and its transformers to
work on sets of smaller polyhedra, thus reducing its asymptotic time and space
complexity. This is possible because the statements in real-world programs affect
only a few variables in the polyhedra. As a result, the variable set can be parti-
tioned into independent subsets. The challenge in maintaining these partitions is
in handling their continuous change during the analysis. These changes cannot
be predicted statically in advance. Our partition computations leverage dynamic
analysis state and the semantics of the Polyhedra transformers. These computa-
tions are fast and produce sufficiently fine partitions, which enables significant
speedups. Precision-wise, our decomposed analysis computes polyhedra seman-
tically equivalent to those produced by the original non-decomposed analysis at
each step. Overall our analysis computes the same invariants as the original one,
but significantly faster.
We provided a complete end-to-end implementation of the Polyhedra domain
analysis within ELINA [1]. Benchmarking against two state-of-the-art libraries for
the Polyhedra analysis, namely, NewPolka and PPL, on real-world programs in-
cluding Linux device drivers and heap manipulating programs showed orders of
magnitude speedup or successful completion where the others time-out or exceed
memory. We believe that our framework presents a significant step forward in mak-
ing Polyhedra domain analysis practical for real-world use.
In the next chapter, we will show that our theoretical framework of online decom-
position is generic and can be extended to any numerical domain that maintains
linear constraints between program variables.
3
GENERALIZING ONLINE DECOMPOSITION
this chapter Our key objective is to bring the power of decomposition to all
sub-polyhedra domains without requiring complex manual effort from the domain
designer. This enables domain designers to achieve speed-ups without requiring
them to rewrite all abstract transformers from scratch each time. More formally, our
goal is to provide a systematic correct-by-construction method that, given a sound
abstract transformer T in a sub-polyhedra domain (e.g., Zone), generates a sound
decomposed version of T that is faster than T and does not require any change to the
internals of T. In this chapter, we present a construction that achieves this objective
under certain conditions. We provide theoretical guarantees on the convergence,
monotonicity, and precision of the decomposed analysis with respect to the non-
decomposed analysis. We also show that the obtained decomposed transformers
are faster than the prior, hand-tuned decomposed domains from [189, 190].
The work in this chapter was published in [191].
In this section, we introduce a generic model for the abstract domains to which
our theory applies. An abstract domain consists of a set of abstract elements and
a set of transformers that model the effect of program statements (assignment,
conditionals, etc.) and control flow (join etc.) on the abstract elements. Let X =
{x1 , x2 , . . . , xn } be a set of program variables. We consider sub-polyhedra domains,
i.e., numerical abstract domains D that encode linear relationships between the
variables in X of the form:
∑_{i=1}^{n} ai · xi ⊗ c, where xi ∈ X, ai ∈ Z, ⊗ ∈ {≤, =}, c ∈ C.    (3.1)
Typical choices for C include Q (rationals) and R (reals). As with any abstraction,
the design of a numerical domain is guided by the cost vs. precision tradeoff. For
instance, the Polyhedra domain [57] is the most precise domain yet it is also the
most expensive. On the other hand, the Interval domain is cheap but also very
imprecise as it does not preserve relational information between variables. Between
these two sit a number of domains with varying degrees of precision and cost:
examples include Two Variables Per Inequality (TVPI) [143], Octagon [142], and
Zone [140].
• The set C containing the allowed values for the constant c in (3.1). Typical
examples include Q and R.
Table 3.1 shows common constraints in the above notation allowed by different numerical domains. The set of constraints LX,D representable by a domain D contains all constraints of the form ∑_{i=1}^{n} ai · xi ⊗ c where: (i) the coefficient list of each expression ∑_{i=1}^{n} ai · xi is a permutation of a tuple in R, (ii) ⊗ ∈ T, and (iii) the constant c ∈ C. For instance, the possible constraints LX,Octagon for the Octagon domain over real numbers are described via the tuple (n, U² × {0}^{n−2}, {≤, =}, R).
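As an illustration of this tuple view, the following hypothetical Python helper (not part of any existing library) checks whether a single constraint is representable in a domain described by (n, R, T, C); the Zone-style instantiation over five variables used in the usage line is the one of Example 3.4.8 below.

    from fractions import Fraction
    from itertools import permutations

    # Hypothetical helper: is  sum_i a_i*x_i  (op)  c  representable in the
    # domain described by the tuple (n, R, T, C) of this section?
    def representable(coeffs, op, const, R, T, const_ok):
        """coeffs: tuple of n integer coefficients; R: set of allowed coefficient
        tuples (up to permutation); T: allowed relations; const_ok: predicate on c."""
        if op not in T or not const_ok(const):
            return False
        return any(tuple(p) == tuple(coeffs) for r in R for p in permutations(r))

    # Zone-style instantiation over five variables: R = {1,0} x {0,-1} x {0}^3
    R_zone = {(a, b, 0, 0, 0) for a in (1, 0) for b in (0, -1)}
    print(representable((1, 0, -1, 0, 0), '<=', Fraction(3), R_zone,
                        {'<=', '='}, lambda c: isinstance(c, Fraction)))  # x1 - x3 <= 3: True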
Example 3.1.1. Consider a program with four variables and a fictive domain that
can relate at most two:
Definition 3.1.1. A given abstract transformer T is sound w.r.t. its concrete transformer T# iff for any element I ∈ D, T#(γ(I)) ⊆ γ(T(I)).
Definition 3.1.2. We say an abstract domain D is closed (also called forward complete in [83, 164]) for a concrete transformer T# (e.g., conditional, meet) iff it can be captured precisely in the domain, i.e., if there exists an abstract transformer T corresponding to that concrete transformer such that for any abstract element I in D, γ(T(I)) = T#(γ(I)).
The Polyhedra domain is closed for conditional, assignment, and meet, but not
for the join. All other domains in Table 3.1 are only closed for the meet. Indeed,
a crucial aspect of abstract interpretation is to permit sound approximations of
transformers for which the domain is not closed.
Example 3.1.3. The Octagon domain is not closed for the conditional transformer. For example, if the condition is x1 − 2x2 ≤ 0 and the abstract element is I = {x1 ≤ 1, x2 ≤ 0}, then the concrete element T#(γ(I)) = {x1 ≤ 1, x2 ≤ 0, x1 − 2x2 ≤ 0} is not representable exactly in the Octagon domain (because the constraint x1 − 2x2 ≤ 0 is not exactly representable).
A useful concept in analysis (and one we refer to throughout the chapter) is that
of a best abstract transformer.
Definition 3.1.3. A (unary) abstract transformer T in D is best iff for any other sound unary abstract transformer T′ (corresponding to the same concrete transformer T#) it holds that for any element I in D, T always produces a more precise result (in the concrete), that is, γ(T(I)) ⊆ γ(T′(I)). The definition is naturally lifted to multiple arguments.
In Example 3.1.3, a possible sound approximation for the output in the Octagon domain is I′′ = I while a best transformer would produce {x1 ≤ 0, x2 ≤ 0, x1 − x2 ≤ 0}. Since there can be multiple abstract elements with the same concretization, there
can be multiple best abstract transformers in D. We require the sub-polyhedra
abstract domains to be equipped with a best transformer and also be closed under
meet. Due to these restrictions, our theory does not apply to the Zonotope [81, 82,
191] and DeepPoly [188] domains.
1 Throughout the chapter we will simply use the term transformer to mean a sound abstract trans-
former.
In this section, we introduce the needed notation and concepts for decomposing
abstract elements and transformers. We extend the terminology of partitions, blocks,
and factors as introduced in Section 2.2.1 for handling the decomposition of the
elements of the abstract domain D. We write πI for referring to the unique finest
partition for an element I in D. Each factor Ik ⊆ I is defined by the constraints that
exist between the variables in the corresponding block Xk ∈ πI . I can be recovered
from the set of factors by taking the union of the constraint sets Ik .
Example 3.2.1. Consider the element I = {x1 − x2 ≤ 1, x3 ≤ 0, x4 ≤ 0} in the TVPI domain
X = {x1, x2, x3, x4} and LX,TVPI : (4, Z² × {0}², {≤, =}, Q).
Here X can be partitioned into three blocks with respect to I resulting in three factors:
πI = {{x1, x2}, {x3}, {x4}}, I1 = {x1 − x2 ≤ 1}, I2 = {x3 ≤ 0}, and I3 = {x4 ≤ 0}.
For a given D, π⊥ = π⊤ = ⊥ = {{x1}, {x2}, . . . , {xn}}. More generally, note that I ⊑ I′ does not imply that πI′ is finer, coarser, or comparable to πI.
X = {x1, x2, x3}, LX,Stripes : (3, {(a, a, −1) | a ∈ Z} ∪ {(0, a, −1) | a ∈ Z}, {≤, =}, Q),
There are many ways to define sound approximations of the best transformers in
D. As a consequence, it is possible to have two transformers T1 , T2 in D on the same
input I such that one produces the > partition for the output while the other does
not. There are two principal ways to obtain a decomposable transformer: (a) white
box: here, one designs the transformer from scratch, maintaining the (changing)
partitions during the analysis, and (b) black box: here, one provides a construction
for decomposing existing transformers without knowing their internals. In the next section, we pursue the second approach and show that it is possible and that, under certain conditions, there is no loss of precision. As a preview, we describe the high-
level steps that one needs to perform dynamically in our black-box decomposition.
4. apply the transformer on one or more factors of the inputs from step 3.
recipe and applied it to Polyhedra, Octagon, and Zone. Using a set of large Linux
device drivers, we show later in Section 3.5 the performance of our generated de-
composed transformers vs. transformers obtained via state-of-the-art hand-tuned
decomposition [189, 190]. Our approach yields up to 6x speed-ups for Polyhedra
and up to 2x speed-ups for Octagon. This speed-up is due to our decomposition
theorems (discussed next) that enable, in certain cases, finer decomposition of ab-
stract elements than previously possible. Speedups compared to the original trans-
formers without decomposition are orders of magnitude larger. Further, we also
decompose the Zone domain using our approach (for which no previous decom-
position exists) without changing the existing domain transformers. We obtain a
speedup of up to 6x over the non-decomposed implementation of the Zone domain.
In summary, our recipe is generic in nature yet leads to state-of-the-art performance
for classic abstract transformers.
In this section we show a construction that takes as input a sound and monotone
transformer in a given domain D and produces a decomposed variant of the same
transformer that operates on part(s) of the input(s). The resulting decomposed
transformer is always sound. We define classes of transformers for which the out-
put produced by the decomposed transformer has the same concretization as the
original non-decomposed transformer, i.e., there is no loss of precision. Although
our results apply to all transformers, we focus on the conditional, assignment, meet,
join, and widening transformers. We also show how to obtain finer partitions than
the manually decomposed transformers for Polyhedra (Chapter 2) and Octagon
considered in prior work [189, 190].
3.4.1 Conditional
We consider conditional statements of the form e ⊗ c where e = ∑_{i=1}^{n} ai · xi with ai ∈ Z, ⊗ ∈ {≤, =}, and c ∈ Q, R, on an abstract element I with an associated permissible partition πI in domain D. The conditional transformer computes the effect of adding the constraint e ⊗ c to I. As discussed in Section 3.1, most existing domains are not closed for the conditional transformer. Moreover, computing the best transformers is expensive in these domains and thus the transformer is usually approximated to strike a balance between precision and cost. The example below illustrates two sound conditional transformers on the same input: the first transformer produces a decomposable output whereas the output of the second results in the ⊤ partition.
Example 3.4.1. Consider
Algorithm 3.1 Decomposed conditional transformer Tcond^d
1: function Conditional((I, πI), stmt, Tcond)
2:    B∗cond := compute_block(stmt, πI)
3:    Icond := I(B∗cond)
4:    πIO := {A ∈ πI | A ∩ B∗cond = ∅} ∪ {B∗cond}
5:    Irest := I(X \ B∗cond)
6:    IO := Tcond(Icond) ∪ Irest
7:    return (IO, πIO)
8: end function
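For concreteness, a minimal Python sketch of Algorithm 3.1 follows; it assumes an abstract element is a set of constraint objects exposing a vars() method, that Tcond is a black-box transformer on such sets, and that a partition is a list of frozensets of variables.

    # Sketch of Algorithm 3.1 under the assumptions stated above.
    def restrict(I, block):
        """I(B): the constraints of I mentioning only variables of the block."""
        return {c for c in I if c.vars() <= block}

    def compute_block(stmt_vars, pi):
        """B*_cond: union of the blocks of pi intersecting the statement's variables."""
        return frozenset().union(*[A for A in pi if A & stmt_vars])

    def decomposed_conditional(I, pi, stmt_vars, Tcond):
        X = frozenset().union(*pi)                 # all program variables
        B = compute_block(stmt_vars, pi)
        I_cond = restrict(I, B)
        pi_out = [A for A in pi if not (A & B)] + [B]
        I_rest = restrict(I, X - B)
        I_out = Tcond(I_cond) | I_rest
        return I_out, pi_out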
focus on also maintaining precision and thus monotonicity. Thus, we define a class Cond(D) of conditional transformers Tcond where γ(Tcond(I)) = γ(Tcond^d(I)) (this is one of the two conditions discussed earlier that ensure fixpoint equivalence).
Definition 3.4.1. A (sound and monotone) transformer Tcond for the conditional
expression e ⊗ c is in Cond(D) iff for any element I and any associated permissible
partition πI , the output Tcond (I) satisfies:
• Tcond(I) = I ∪ I′ ∪ I′′ where I′ contains non-redundant constraints between the variables from B∗cond only and I′′ is a set of redundant constraints between the variables in X.
Note that we can strengthen the condition in Definition 3.4.1 by replacing B∗cond
with Bcond . This makes it independent of permissible partitions but would reduce
the size of the class Cond(D).
In Example 3.4.1, T1 ∈ Cond(D) whereas T2 ∉ Cond(D) since T2 does not keep
the original constraints. Most standard transformers used in practice satisfy the
two conditions and can thus be decomposed with our construction without losing
any precision. The following example illustrates our construction for decomposing
the standard conditional transformer Tcond in the Octagon domain.
Consider the conditional statement x3 ≤ 0 with Bcond = {x3}. Tcond adds the constraint x3 ≤ 0 to I and then applies Octagon closure on the resulting element to produce the output:
Note that best transformers are not necessarily in Cond(D). This is due to con-
straints on the coefficient set R or the constant set C in D. We provide an example
of a domain D which does not have any best transformer in Cond(D).
I ∈ D with the associated partition πI. We note that the result of the concrete conditional transformer is I ∪ {e ⊗ c}. We compute its abstraction in D by initializing Tcond^best(I) = ∅ and computing an upper bound cmax for all representable expressions ∑_{i=1}^{n} a′i · xi in D constrained under the set I ∪ {e ⊗ c} using LP. If cmax ≠ ∞, then ∑_{i=1}^{n} a′i · xi ≤ cmax is added to Tcond^best(I).
We now show that Tcond^best(I) satisfies Definition 3.4.1. We note that our construction will not remove any existing constraint from I by adding a bound of ∞; thus, I ⊆ Tcond^best(I). Since the constraint e ⊗ c only affects the variables in B∗cond in the concrete, any constraint in Tcond^best(I) involving only the variables of X \ B∗cond is either already in I or redundant, whereas the new non-redundant constraints are only between the variables in block B∗cond. Further, our construction ensures that these can be recovered by applying Tcond^best on Icond.
We note that the algorithm for Tcond^best in Theorem 3.4.2 will not terminate if R is infinite. Existing domains including Polyhedra, Octagon, and Zone satisfy the conditions of Theorem 3.4.2 and thus a best conditional transformer in these domains is in Cond(D). The following obvious corollary provides a condition under which the output partition πIO computed by Algorithm 3.1 is the finest, i.e., πIO = π̄IO.
3.4.2 Assignment
For the assignment x5 := −x6 , a best sound assignment transformer T1 may return
the decomposable output:
Algorithm 3.2 Decomposed assignment transformer Tassign^d
1: function Assignment((I, πI), stmt, Tassign)
2:    B∗assign := compute_block(stmt, πI)
3:    Iassign := I(B∗assign)
4:    πIO := {A ∈ πI | A ∩ B∗assign = ∅} ∪ {B∗assign}
5:    Irest := I(X \ B∗assign)
6:    IO := Tassign(Iassign) ∪ Irest
7:    πIO := refine(πIO)
8:    return (IO, πIO)
9: end function
Proof. B∗assign contains xj by definition, thus I \ Ixj = Irest ∪ (Iassign \ Ixj). We have,
IO = Tassign^d(I) = {x1 ≤ 0, x2 − 2 · x3 = 0, x3 ≤ 3, x2 ≤ 6, x2 + x3 ≤ 9} with πIO.
Proof. The result of the concrete assignment transformer is (I \ Ixj) ∪ IB∗assign where IB∗assign contains non-redundant constraints only between the variables in B∗assign added by the concrete transformer. Since I \ Ixj is representable in D, Tassign^best can
in Assign(D).
We note that most existing domains such as Polyhedra, Octagon, Zone, Octa-
hedron satisfy the conditions of Theorem 3.4.4 and thus a best assignment trans-
former in these domains is in Assign(D).
refinement So far we have assumed that line 7 of Algorithm 3.2 is the identity
(that is, refinement does not affect πIO ). We now discuss refinement in more detail.
Definition 3.4.3 (Refinement condition). The output partition πIO of a non-
invertible assignment transformer Tassign satisfying Definition 3.4.2 is a candidate
for refinement if Xt ∩ Bassign = {xj } where Xt is the block of πI containing xj .
Here, I is the abstract element upon which the transformer is applied and πI is a
permissible partition for I.
If the above condition holds during analysis (it can be checked efficiently), then
refinement can split B∗assign from πIO into two blocks Xt \ {xj} and B∗assign \ (Xt \ {xj}), provided no redundant constraint (I′′ in Definition 3.4.2) fuses these two blocks. The result is a finer partition πIO. Note that our observation model for
computing the output partition is more refined compared to Section 2.2.2 where
we do not distinguish between invertible and non-invertible assignments and also
do not check whether the condition Xt ∩ Bassign = {xj } holds.
Example 3.4.8. Consider
X = {x1, x2, x3, x4, x5}, LX,Zone : (5, {1, 0} × {0, −1} × {0}³, {≤, =}, Q),
I = {x1 ≤ x2, x2 ≤ x3, x4 ≤ x5} with πI = π̄I = {{x1, x2, x3}, {x4, x5}}.
Consider the non-invertible assignment x2 := x4 with Bassign = {x2 , x4 } and the
standard Zone assignment transformer Tassign . Without refinement, we obtain the
partition πIO = {X} = ⊤. However, our refinement condition enables us to obtain a finer output partition. We have that Xt = {x1, x2, x3} and Xt ∩ Bassign = {x2} and thus the dynamic refinement condition applies, splitting the block B∗assign = X into two blocks: Xt \ {x2} = {x1, x3} and B∗assign \ (Xt \ {x2}) = {x2, x4, x5}. This produces a finer partition for the output:
IO = {x1 ≤ x3, x2 − x4 = 0, x4 ≤ x5, x2 ≤ x5} with πIO = π̄IO = {{x1, x3}, {x2, x4, x5}}.
As with the conditional, πIO ≠ π̄IO in general even if πI = π̄I. The following corollary provides conditions under which πIO = π̄IO after applying Algorithm 3.2.
Corollary 3.4.2. For the assignment xj := δ, πIO = π̄IO holds if πI = π̄I and, in the invertible case, IO = (I \ Ixj) ∪ Iinv or, in the non-invertible case, IO = (I \ Ixj) ∪ Inon-inv.
As discussed in Section 3.1, all domains we consider are closed under the meet (⊓) transformer and thus it is common to implement a best transformer. In fact, any meet transformer T⊓ that obeys T⊓(I, I′) ⊑ I, I′ is precise. Thus, we assume a given best meet transformer, i.e., γ(T⊓(I, I′)) = γ(I) ∩ γ(I′) for all I, I′. As a consequence, our decomposed construction will always yield an equivalent transformer, without any conditions.
construction for meet (⊓) Algorithm 3.3 shows our construction of a decomposed transformer for a given meet transformer T⊓ on input elements I, I′ with the respective permissible partitions πI, πI′ in domain D. The algorithm computes a common permissible partition πI ⊔ πI′ for the inputs and then applies T⊓ separately on the individual factors of I, I′ corresponding to the blocks in πI ⊔ πI′.
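Since Algorithm 3.3 is not reproduced here, the construction can be pictured with the following short Python sketch, reusing the common_partition and restrict helpers from the earlier sketches; Tmeet stands for the given black-box meet transformer.

    # Sketch of the decomposed meet: apply Tmeet factor-by-factor on a common
    # permissible partition of the two inputs (helpers from earlier sketches).
    def decomposed_meet(I1, pi1, I2, pi2, Tmeet, variables):
        pi = common_partition(pi1, pi2, variables)   # pi_I ⊔ pi_I'
        I_out = set()
        for A in pi:
            I_out |= Tmeet(restrict(I1, A), restrict(I2, A))
        return I_out, pi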
Theorem 3.4.5. γ(T⊓(I, I′)) = γ(T⊓^d(I, I′)) for all inputs I, I′ in D. In particular, T⊓^d is sound and monotone.
Proof.
The following corollary provides conditions under which the output partition is
finest.
Corollary 3.4.3. πIO = π̄IO if πI = π̄I, πI′ = π̄I′, and IO = I ∪ I′.
X = {x1, x2, x3, x4, x5, x6}, LX,Zone : (6, {1, 0} × {0, −1} × {0}⁴, {≤, =}, R),
I = {x1 = 1, x2 = 2, x3 ≤ 3, x4 = 4, x5 = 0, x6 = 0} and
I′ = {x1 = 1, x2 = 2, x3 ≤ 3, x4 = 4, x5 = 1, x6 = 1} with
πI = πI′ = π̄I = ⊥.
Another sound transformer T2 for the join could produce the output I′O with the ⊤ partition:
I′O = {x2 − x1 ≤ 1, x1 − x5 ≤ 1, x3 − x2 ≤ 1, x3 − x4 ≤ −1, x5 − x6 = 0} with πI′O = π̄I′O = ⊤.
Let πcommon := πI ⊔ πI′ and N = ⋃{A ∈ πcommon | I(A) ≠ I′(A)} be the union of all blocks for which the corresponding factors I(A) and I′(A) are not semantically equal. In Example 3.4.10, we have N = {x5, x6}.
Definition 3.4.4. A join transformer T⊔ is in Join(D) iff for all pairs of input elements I, I′ and all associated common permissible partitions πcommon, the output T⊔(I, I′) satisfies the following conditions:
Theorem 3.4.6. If T⊔ ∈ Join(D), then γ(T⊔(I, I′)) = γ(T⊔^d(I, I′)) for all inputs I, I′ in D. In particular, T⊔^d is sound and monotone.
Proof.
γ(T⊔(I, I′)) = γ(Irest ∪ J′ ∪ J′′)               (by Definition 3.4.4)
            = γ(Irest ∪ J′)                      (J′′ is redundant)
            = γ(Irest) ∩ γ(J′)                   (γ is meet-preserving)
            = γ(Irest) ∩ γ(T⊔(I⊔, I′⊔))          (by Definition 3.4.4)
            = γ(Irest ∪ T⊔(I⊔, I′⊔))             (γ is meet-preserving)
            = γ(T⊔^d(I, I′))
refinement We can sometimes refine the output partition πIO after computing the output IO without inspecting or modifying IO. Namely, if a variable xi is unconstrained in either I or I′, then it is also unconstrained in IO. πIO can thus be refined by removing xi from the block containing it and adding the singleton set {xi} to πIO. This refinement can be performed after applying T⊔^d. Note that our observation model in Section 2.2.2 did not check if the variables were unconstrained for polyhedra inputs and thus it produces coarser partitions. The following theorem formalizes this refinement.
Theorem 3.4.7. Let I, I′ be abstract elements in D with the associated permissible partitions πI, πI′ respectively. Let U = {x ∈ X | x is unconstrained in I or I′} = {u1, . . . , ur} and let πIO be as computed in line 7 of Algorithm 3.4. Then the following partition is permissible for the output IO:
πIO ⊓ {X \ U, {u1}, . . . , {ur}}.
The proof of Theorem 3.4.7 is immediate from the discussion above. Unlike other transformers, we do not know of conditions for checking whether πIO = π̄IO.
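The refinement of Theorem 3.4.7 amounts to splitting unconstrained variables off into singleton blocks; a small sketch under the assumption that is_unconstrained(I, x) tests whether x appears in no constraint of I:

    # Sketch of the refinement from Theorem 3.4.7: variables unconstrained in I
    # or I' stay unconstrained in the join output, so they can be split off as
    # singleton blocks (is_unconstrained is an assumed helper).
    def refine_join_partition(pi_out, I1, I2, variables, is_unconstrained):
        U = {x for x in variables
             if is_unconstrained(I1, x) or is_unconstrained(I2, x)}
        refined = []
        for A in pi_out:
            rest = A - U
            if rest:
                refined.append(rest)
            refined.extend(frozenset({x}) for x in A & U)
        return refined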
semantic The semantic widening [13] requires the set of constraints in the input I to be non-redundant. The output satisfies IO ⊆ I ∪ I′. Specifically, IO contains the constraints from I that are satisfied by I′ and the constraints ι′ from I′ that are mutually redundant with a constraint ι in I.
Both these transformers are decomposable in practice. The following example
illustrates the semantic and the syntactic widening on the Octagon domain.
Algorithm 3.5 Decomposed widening transformer T∇^d
1: function Widening((I, πI), (I′, πI′), T∇)
2:    πIO := πI ⊔ πI′
3:    IO := ⋃_{A∈πIO} T∇(I(A), I′(A))
4:    πIO := refine(πIO)
5:    return (IO, πIO)
6: end function
in D. Thus, T∇^d is sound.
Proof.
γ(T∇(I, I′)) = ⋂_A γ(T∇(I(A), I′(A)))      (by Definition 3.4.5)
            = γ(⋃_A T∇(I(A), I′(A)))       (γ is meet-preserving)
            = γ(T∇^d(I, I′))
benchmarks The benchmarks were taken from the popular software verifica-
tion competition [24]. The benchmark suite is divided into categories suited for
different kinds of analyses, e.g., pointer, array, numerical, and others. We chose
two categories suited for numerical analysis: (i) Linux Device Drivers (LD), and (ii)
Control Flow (CF). Each of these categories contains hundreds of benchmarks and
we evaluated the performance of our analysis on each of them. We use the crab-llvm analyzer, part of the SeaHorn verification framework [91], to perform the analysis as in Section 2.5, but with a different version of the analyzer. Therefore our reported numbers for the baselines differ from those in Table 2.3.
3.5.1 Polyhedra
The standard implementation of the Polyhedra domain contains the best condi-
tional, assignment, meet, and join transformers together with a semantic widening
transformer (as described in Chapter 2). All these transformers are in the classes of
decomposable transformers defined in Section 3.4.
We refer the reader to Table 2.2 and Table 2.1 for the asymptotic complexity
of the Polyhedra transformers in the standard implementation with and without
decomposition [190] respectively. We compare the runtime and memory consump-
tion for end-to-end Polyhedra analysis with our generic decomposed transformers
versus the original non-decomposed transformers from the Parma Polyhedra Li-
brary (PPL) [12] and the decomposed transformers from ELINA [190] presented in
Chapter 2. Note that we do not compare against NewPolka [104] as it performed
worse than PPL in our previous evaluation in Section 2.5. PPL, ELINA, and our
implementation store the constraints and the generators using matrices with 64-bit
integers. PPL stores a single matrix for either representation whereas both ELINA
and our implementation use a set of matrices corresponding to the factors, which
requires exponential space in the worst case.
Table 3.2 shows the results on 13 large, representative benchmarks. These bench-
marks were chosen based on the following criteria, which are similar to those in Section 2.5:
• The most time consuming function in the benchmark did not produce any
integer overflow with ELINA or our approach.
Our decomposition maintains semantic equivalence with both ELINA and PPL as
long as there is no integer overflow and thus gets the same semantic invariants.
All three implementations set the abstract element to > when an integer overflow
occurs. The total number of integer overflows on the chosen benchmarks were 58,
23 and 21 for PPL, ELINA, and our decomposition, respectively. We also had fewer
integer overflows than both ELINA and PPL on the remaining benchmarks. Thus,
our decomposition improves in some cases also the precision of the analysis with
respect to both ELINA and PPL.
Table 3.2: Speedup for the Polyhedra analysis with our decomposition vs. PPL and ELINA.
Benchmark PPL ELINA Our Decomposition Speedup vs.
time(s) memory(GB) time(s) memory(GB) time(s) memory(GB) PPL ELINA
firewire_firedtv 331 0.9 0.4 0.2 0.2 0.2 1527 2
net_fddi_skfp 6142 7.2 9.2 0.9 4.4 0.3 1386 2
mtd_ubi MO MO 4 0.9 1.9 0.3 ∞ 2.1
usb_core_main0 4003 1.4 65 2 29 0.7 136 2.2
tty_synclinkmp MO MO 3.4 0.1 2.5 0.1 ∞ 1.4
scsi_advansys TO TO 4 0.4 3.4 0.2 >4183 1.2
staging_vt6656 TO TO 2 0.4 0.5 0.1 >28800 4
net_ppp 10530 0.1 924 0.3 891 0.1 11.8 1
p10_l00 121 0.9 11 0.8 5.4 0.2 22.4 2
p16_l40 MO MO 11 3 2.9 0.4 ∞ 3.8
p12_l57 MO MO 14 0.8 6.5 0.3 ∞ 2.1
p13_l53 MO MO 54 2.7 25 0.9 ∞ 2.2
p19_l59 MO MO 70 1.7 12 0.6 ∞ 5.9
Table 3.2 shows our experimental findings. The entries MO and TO in the table have the same meaning as in Table 2.3. We follow the same convention of reporting
speedups in the case of a memory overflow or a time out as in Table 2.3.
In the table, PPL either ran out of memory or did not finish within four hours
on 8 out of the 13 benchmarks. Both ELINA and our decomposition are able to
analyze all benchmarks. We are faster than ELINA on all benchmarks with a maxi-
mum speedup of 5.9x on the p19_l59 benchmark. We also save significant memory
over ELINA. The speedups on the remaining (not shown) benchmarks over the
decomposed version of ELINA varies from 1.1x to 4x with an average of ≈ 1.4x.
partitions, we also computed (with the needed overhead) the finest partition after
each join and show the largest blocks under nmax^finest (maximum and average). As can
be observed, our partitions are strictly finer than the ones produced by our polyhe-
dra decomposition in Chapter 2 on all benchmarks due to the refinements for the
assignment and join transformers. Moreover, it can be seen that the average size of
our partitions is sometimes close to that of the finest partition but in many cases
there is room for further improvement. We consider this as an interesting item for
future work.
3.5.2 Octagon
The standard implementation of the Octagon domain works only with the con-
straint representation and approximates the best conditional and best assignment
transformers but implements best join and meet transformers. The widening is
defined syntactically. All of these transformers are in the classes of (decompos-
able) transformers from Section 3.4. Since the syntactic widening does not pro-
duce semantically equivalent outputs for semantically equivalent but syntactically
different inputs, our fixpoint can be different than the one computed by non-
decomposed analysis. However, we still get the same semantic invariants at fix-
point on most of our benchmarks. The standard implementation requires a strong
closure operation for the efficiency and precision of transformers such as join, con-
ditional, assignment, and others.
Table 3.4 shows the asymptotic complexity of standard Octagon transformers as
well as the strong closure operation with and without decomposition [189]. In the
table, n, ni, nmax have the same meaning as in Table 2.2. It can be seen that strong
Table 3.5: Speedup for the Octagon domain analysis with our decomposition over the non-
decomposed and the decomposed versions of ELINA.
Benchmark ELINA-ND ELINA-D Our Decomposition Speedup vs.
time(s) time(s) time(s) ELINA-ND ELINA-D
firewire_firedtv 0.4 0.07 0.07 5.7 1
net_fddi_skfp 28 2.6 1.9 15 1.4
mtd_ubi 3411 979 532 6.4 1.8
usb_core_main0 107 6.1 4.9 22 1.2
tty_synclinkmp 8.2 1 0.8 10 1.2
scsi_advansys 9.3 1.5 0.8 12 1.9
staging_vt6656 4.8 0.3 0.2 24 1.5
net_ppp 11 1.1 1.2 9.2 0.9
p10_l00 20 0.5 0.5 40 1
p16_l40 8.8 0.6 0.5 18 1.2
p12_l57 19 1.2 0.7 27 1.7
p13_l53 43 1.7 1.3 33 1.3
p19_l59 41 2.8 1.2 31 2.2
closure is the most expensive operation with cubic complexity. It is applied incre-
mentally with quadratic cost for the conditional and the assignment transformers.
We compare the performance of our approach for the standard Octagon analysis,
using the non-decomposed ELINA (ELINA-ND) and the decomposed (ELINA-D)
transformers from ELINA. All of these implementations store the constraint repre-
sentation using a single matrix with 64-bit doubles. The matrix requires quadratic
space in n. Thus, overall memory consumption is the same for all implementations.
We compare the runtime and report speedups for the end-to-end Octagon anal-
ysis in Table 3.5. We achieve up to 40x speedup for the end-to-end analysis over
the non-decomposed implementation. More importantly, we are either faster or
have the same runtime as the decomposed version of ELINA on all benchmarks
but one. The maximum speedup over the decomposed version of ELINA is 2.2x.
The speedups on the remaining (not shown) benchmarks vary between 1x and 1.6x
with an average of about 1.2x. Notice that on the mtd_ubi benchmark, the Octagon
analysis takes longer than the Polyhedra analysis. This is because the Octagon
widening takes longer to converge compared to the Polyhedra widening.
Table 3.6 shows the partition statistics for the Octagon analysis (as we did for
the Polyhedra analysis). It can be seen that while our refinements often produce
finer partitions than the decomposed version of ELINA, they are coarser on 3 of
the 13 benchmarks. This is because the decomposed transformers in ELINA are
specialized for the standard approximations of the conditional and assignment
transformers. We still achieve comparable performance on these benchmarks. Note
that the average size of our partitions is close to that of the finest in most cases.
3.5.3 Zone
The standard Zone domain uses only the constraint representation. The conditional
and assignment transformers are approximate whereas the meet and join are best
transformers [140]. The widening is defined syntactically. All of these transformers
are in the class of (decomposable) transformers from Section 3.4. As for Octagon,
fixpoint equivalence is not guaranteed due to syntactic widening. However, we still
get the same semantic invariants at fixpoint on most of our benchmarks. As for the
Octagon domain, a cubic closure operation is required. The domain transformers
have the same asymptotic complexity as in the Octagon domain.
We implemented both a non-decomposed version and a version with our decomposition method of the standard transformers. Both implementations store
the constraints using a single matrix with 64-bit doubles that requires quadratic
space in n. We compare the runtime and report speedups for the Zone analy-
Table 3.7: Speedup for the Zone domain analysis with our decomposition over the non-
decomposed implementation.
Benchmark Non-Decomposed Our Decomposition Speedup vs.
time(s) time(s) Non-Decomposed
firewire_firedtv 0.05 0.05 1
net_fddi_skfp 3 1.5 2
mtd_ubi 1.4 0.7 2
usb_core_main0 10.3 4.6 2.2
tty_synclinkmp 1.1 0.7 1.6
scsi_advansys 0.9 0.7 1.3
staging_vt6656 0.5 0.2 2.5
net_ppp 1.1 0.7 1.5
p10_l00 1.9 0.4 4.6
p16_l40 1.7 0.7 2.5
p12_l57 3.5 0.9 3.9
p13_l53 8.7 2.1 4.2
p19_l59 9.8 1.6 6.1
sis in Table 3.7. Our decomposition achieves speedups of up to 6x over the non-
decomposed implementation. The speedups over the remaining benchmarks not
shown in the table vary between 1.1x and 5x with an average of ≈ 1.6x.
Table 3.8 shows the partition statistics for the Zone analysis. It can be seen that
partitioning is the core reason for the speed-ups obtained and that the average size
of our partitions is close to that of the finest in most cases.
3.5.4 Summary
Overall, our results show that the generic decomposition method proposed in
this chapter works well. It speeds up analysis compared to non-decomposed do-
mains significantly, and, importantly, the more expensive the domain, the higher
the speed-ups. Our generic method also compares favorably with the prior man-
ually decomposed domains provided by ELINA due to refined partitioning for
the outputs of the assignment and join transformers presented in Section 3.4. The
refinement is possible because we refine our model for observing the abstract el-
ements. We also show that the partitions computed during analysis are close to
optimal for Octagon and Zone but have further room for improvement for Polyhe-
dra. The challenge is how to obtain those with reasonable cost. Further speed-ups
can also be obtained by different implementations of the transformers that are, for
example, selectively approximate to achieve finer partitions.
We now discuss the work most closely related to Chapters 2 and 3 for improving
the performance of numerical program analysis.
tifying the factors; in contrast, Theorem 2.2.5 relies on semantic equality. Further,
their partition is coarser as it has only two blocks which increases the number of
generators.
The work of [137, 138, 218] focuses on improving the performance of standard
Polyhedra transformers based on constraint representation using parametric linear
programming. We believe that our approach is complementary and their transform-
ers could benefit from our decomposition.
[10, 75] provide conversion algorithms that can be more efficient than the
Chernikova algorithm used in ELINA currently for certain polyhedra. In the fu-
ture, a straightforward way to speedup ELINA would be to run the Chernikova
algorithm and the ones from [10, 75] in parallel and pick the fastest one.
[184] proposed an incremental conversion algorithm when the constraints are re-
moved. This is useful for speeding up conversions for the assignment and widening
transformers. Integrating their algorithms in ELINA would yield further speedups.
[20] introduces a new double representation and efficient conversion algorithm
for the NNC (not necessarily closed) Polyhedra domain which is more expressive
than the closed Polyhedra domain considered in our work. The follow-up work
[21] provides a domain implementation based on the new representation and con-
version algorithm. [219] identifies a number of optimization opportunities to make
NNC Polyhedra domain even faster. We believe that online decomposition can fur-
ther speedup the NNC Polyhedra domain without precision loss.
[33] performs bottom-up interprocedural analysis and computes procedure sum-
maries as a disjunction of convex polyhedra. We note that our framework of online
decomposition presented here can also be extended to disjunctions of polyhedra.
[29] targets the analysis of particular kinds of programs which produce polyhe-
dra that are not decomposable using online decomposition. The authors develop
new data structures for the efficient analysis of the intended programs with the
Polyhedra domain. Similarly, [22] identifies a list of optimization opportunities
when analyzing hybrid systems. We believe that while our approach is agnostic to
the program being analyzed, tailoring the Polyhedra domain implementation for
the class of programs being analyzed can further improve our performance and it
is an interesting direction of future work.
octagon Variable packing [27, 99] has been used for decomposing the Octagon
transformers. It partitions X statically before running the analysis based on certain
criteria. For example, two variables are in the same block of the partition if they
occur together in the same program statement. Although variable packing could
also be generalized to decompose transformers of other domains, it is fundamen-
tally different from our dynamic decomposition; it is not guaranteed to preserve
precision. This is because the permissible partition depends on the Octagon pro-
duced during the analysis. Therefore the enforced static partition would not be
permissible throughout the analysis and thus the analysis loses precision. Further,
the dynamic decomposition often yields even finer partitions than can be detected
ers when the resulting decomposed analysis does not lose precision. We believe
that providing such conditions is interesting future work.
3.7 discussion
Figure 4.1: Policies for balancing precision and speed in static analysis.
We illustrate this connection on the example shown in Fig. 4.1. Here, a static
analyzer starts from an initial abstract state s0 depicted by the root node of the
tree. It transitions to a new abstract state, i.e., one of its children, by applying a
transformer. At each step, the analyzer can select either a precise but expensive
transformer Tprecise or a fast but imprecise one Tfast . If the analyzer follows a fixed
policy that guarantees maximum precision (orange path in Fig. 4.1), it will always
apply Tprecise and obtain a precise fixpoint at the rightmost leaf. However, the com-
putation is slow. Analogously, by following a fixed policy maximizing performance
(yellow path in Fig. 4.1), the analyzer always chooses Tfast and obtains an imprecise
fixpoint at the leftmost leaf. A policy maximizing both speed and precision (green
path in Fig. 4.1) yields a precise fixpoint but is computed faster as the policy ap-
plies both Tfast and Tprecise selectively. A globally optimal sequence of transformer
choices that optimizes both objectives is generally very difficult to achieve. How-
ever, as we show in this chapter, effective policies that work well in practice can be
obtained using principled RL based methods.
Fig. 4.2 explains the connection with RL intuitively. The left-hand side of the
figure shows an RL agent in a state st at timestep t. The state st represents the
agent's knowledge about its environment. It takes an action at and moves to a new state st+1. The agent obtains a numerical reward rt+1 for the action at in the state st. The agent's goal is to maximize long-term rewards. Notice that the obtained reward
depends on both the action and the state. A policy maximizing short-term rewards
at each step does not necessarily yield better long term gains as the agent may reach
an intermediate state from which all further rewards are negative. RL algorithms
typically learn the so-called Q-function, which quantifies the expected long term
gains by taking action at in the state st . This setting also mimics the situation that
arises in an iterative static analysis shown on the right-hand side of Fig. 4.2. Here
the analyzer obtains a representation of the abstract state (its environment) via a
set of features φ. The analyzer selects among a set of transformers T with different
precision and speed. The transformer choice represents the action. The analyzer
obtains a reward in terms of speed and precision. In Fig. 4.1, a learned policy
would determine at each step whether to choose Tprecise or Tfast . To do that, for a
given state and action, the analyzer would compute the value of the Q-function
using the features φ. Querying the Q-function would then return the suggested
action from that state.
While the overall connection between the two areas is conceptually clean, the
details of making it work in practice pose significant challenges. The first is the
design of suitable approximations to be able to gain performance when precision is
lost. The second is the design of the features φ, which should be cheap to compute
yet be expressive enough to capture key properties of abstract states so that the
learned policy generalizes to unseen states. Finally, a suitable reward function is needed that combines both precision and performance.
The work in this chapter was published in [192].
Overall, we believe the recipe outlined in this chapter opens up the possibility
for speeding-up other analyzers with reinforcement learning based concepts.
and depends on the application domain. We collect the feature functions into a
vector φ(s, a) = (φ1(s, a), φ2(s, a), . . . , φℓ(s, a)); doing so, the Q-function has the form:
Q(s, a) = ∑_{j=1}^{ℓ} θj · φj(s, a) = φ(s, a) · θ^T,    (4.1)
where θ = (θ1, θ2, . . . , θℓ) is the parameter vector. The goal of Q-learning with linear function approximation is thus to estimate (learn) θ.
Algorithm 4.1 shows the Q-learning procedure with linear function approxima-
tion. In the algorithm, 0 ≤ γ < 1 is the discount factor which represents the difference in importance between immediate and future rewards. γ = 0 makes the agent only consider immediate rewards while γ ≈ 1 gives more importance to future rewards. The parameter 0 < α ≤ 1 is the learning rate that determines the
extent to which the newly acquired information overrides the old information. The
algorithm first initializes θ randomly. Then at each step t in an episode, the agent
takes an action at , moves to the next state st+1 and receives a reward r(st , at , st+1 ).
Line 12 in the algorithm shows the equation for updating the parameters θ. Notice
that Q-learning is an off-policy learning algorithm as the update in the equation
assumes that the agent follows a greedy policy (from state st+1 ) while the action
(at ) taken by the agent (in st ) need not be greedy.
Once the Q-function is learned, a policy p∗ for maximizing the agent's cumulative reward is obtained as:
p∗(s) = argmax_{a} Q(s, a).    (4.2)
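A compact Python sketch of Q-learning with linear function approximation, in the spirit of Algorithm 4.1; the feature map phi (returning a numpy array), the action set, and the environment are assumptions of the sketch.

    import numpy as np

    # Linear Q-function Q(s,a) = phi(s,a)·theta and the Q-learning update
    # (cf. line 12 of Algorithm 4.1); phi and actions are assumed/hypothetical.
    def q_value(theta, phi, s, a):
        return float(np.dot(phi(s, a), theta))

    def q_learning_update(theta, phi, s, a, r, s_next, actions, alpha=0.1, gamma=0.9):
        """One update of the parameter vector theta."""
        best_next = max(q_value(theta, phi, s_next, a2) for a2 in actions)
        td_error = r + gamma * best_next - q_value(theta, phi, s, a)
        return theta + alpha * td_error * phi(s, a)

    def greedy_policy(theta, phi, actions):
        """Policy (4.2): pick the action maximizing the learned Q-function."""
        return lambda s: max(actions, key=lambda a: q_value(theta, phi, s, a))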
Xt = {x1, x2, x3, x4, x5, x6},
P(Xt) = {x1 − x2 + x3 ≤ 0, x2 + x3 + x4 ≤ 0, x2 + x3 ≤ 0, x3 + x4 ≤ 0, x4 − x5 ≤ 0, x4 − x6 ≤ 0}.
Algorithm 4.2 shows our generic function block_split for splitting a given block
Xt and the associated factor P(Xt ). If the size of the block Xt is below the thresh-
old, no decomposition is performed. Otherwise, one out of several possible con-
straint removal algorithms is chosen (these are explained below) using the func-
tion choose_removal_algorithm learned by RL. Using the removal algorithm, a set
of constraints is removed, obtaining O such that Xt decomposes into blocks of size ≤ threshold. We note that O approximates P(Xt) by construction. The associated
partition of Xt is computed from scratch by connecting the variables that occur in
the same constraint using the function finest_partition.
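The finest_partition helper can be realized as the connected components of the graph that links variables occurring in the same constraint; an illustrative sketch (c.vars() is again an assumed accessor):

    from collections import defaultdict, deque

    # Blocks = connected components of the graph linking variables that appear
    # together in some constraint.
    def finest_partition(constraints, variables):
        adj = defaultdict(set)
        for c in constraints:
            vs = list(c.vars())
            for v in vs:
                adj[v].update(vs)
        seen, blocks = set(), []
        for v in variables:
            if v in seen:
                continue
            comp, queue = set(), deque([v])
            while queue:
                u = queue.popleft()
                if u in seen:
                    continue
                seen.add(u)
                comp.add(u)
                queue.extend(adj[u] - seen)
            blocks.append(comp)
        return blocks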
We discuss next choices for the removal algorithm that we consider. Depending
on the inputs, each may yield different decompositions and different precisions.
Example 4.2.2. Fig. 4.3 shows the graph G for P(Xt ) in Example 4.2.1. Applying the
Stoer-Wagner min-cut on G once will cut off x5 or x6 by removing the constraint
[Figure 4.3: the weighted graph G built from P(Xt) over the variables x1, . . . , x6.]
Once the weights are computed, we remove the constraint with the maximum
weight. The intuition is that variables in this constraint most likely occur in other
constraints in P(Xt ) and thus they do not become unconstrained upon constraint
removal. This reduces the loss of information. The procedure is repeated until we
get the desired partition of Xt .
Example 4.2.3. Applying the first definition of weights in Example 4.2.1, we get
n1 = 1, n2 = 3, n3 = 4, n4 = 4, n5 = 1, n6 = 1. The constraint x2 + x3 + x4 ≤ 0 has the maximum weight of n2 + n3 + n4 = 11 and thus is chosen for removal. Removing this constraint from P(Xt) does not yet yield a decomposition; thus we have to repeat. Doing so, {x3 + x4 ≤ 0} is chosen. Now, P(Xt) \ M = {x1 − x2 + x3 ≤ 0, x2 + x3 ≤ 0, x4 − x5 ≤ 0, x4 − x6 ≤ 0}, which can be decomposed into two factors {x1 − x2 + x3 ≤ 0, x2 + x3 ≤ 0} and {x4 − x5 ≤ 0, x4 − x6 ≤ 0} corresponding to blocks {x1, x2, x3} and {x4, x5, x6}, respectively, each of size ≤ threshold.
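A sketch of this weighted removal loop, under the weight definition used in Example 4.2.3 (ni counts the constraints mentioning xi, and a constraint's weight is the sum of the ni of its variables); it reuses finest_partition from the sketch above.

    # Remove the heaviest constraint until the remaining constraints decompose
    # into blocks of size <= threshold.
    def weighted_removal(constraints, variables, threshold):
        cons = set(constraints)
        removed = set()
        while True:
            blocks = finest_partition(cons, variables)
            if all(len(b) <= threshold for b in blocks):
                return cons, removed, blocks
            counts = {v: sum(1 for c in cons if v in c.vars()) for v in variables}
            heaviest = max(cons, key=lambda c: sum(counts[v] for v in c.vars()))
            cons.remove(heaviest)
            removed.add(heaviest)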
Our basic objective when approximating is to ensure that the maximal block size
remains below a chosen threshold. Besides splitting to ensure this, there can also be
a benefit of merging small blocks, again provided the resulting block size remains
below the threshold. The merging itself does not change precision, but the resulting
transformer may be more precise when working on larger blocks. In particular this
can happen with the inputs of the join transformer as we will explain later.
We consider the following three merging strategies; a small sketch of the second strategy follows the list. To simplify the explanation, we assume that the blocks in πP are ordered by ascending size:
2. Merge smallest first: We start merging the smallest blocks as long as the size
stays below the threshold. These blocks are then removed and the procedure
is repeated on the remaining set.
3. Merge large with small: We start to merge the largest block with the smallest
blocks as long as the size stays below the threshold. These blocks are then
removed and the procedure is repeated on the remaining set.
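As referenced above, here is an illustrative sketch of the merge smallest first strategy, assuming blocks are given as Python sets:

    # Greedily fuse the smallest blocks while the fused size stays <= threshold,
    # then repeat on the remaining blocks.
    def merge_smallest_first(blocks, threshold):
        remaining = sorted(blocks, key=len)          # ascending block size
        merged = []
        while remaining:
            group = remaining.pop(0)
            i = 0
            while i < len(remaining):
                if len(group) + len(remaining[i]) <= threshold:
                    group = group | remaining.pop(i)
                else:
                    i += 1
            merged.append(group)
        return merged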
Using the notation and terminology from Chapter 2, we now show how our ap-
proximation methods discussed above can be instantiated for the Polyhedra analy-
sis. So far the discussion has been rather generic, i.e., approximation could be done
at any time during analysis. We choose to perform approximation only with the
join transformer. This is because as explained in Section 2.4 and Section 2.5, the join
usually coarsens the partitions substantially and is the most expensive transformer
of the Polyhedra analysis.
Let πcommon = πP1 ⊔ πP2 be a common permissible partition for the inputs P1, P2 of the join transformer. Then, from Chapter 2, a permissible partition for the (not approximated) output is obtained by keeping all blocks Xt ∈ πcommon for which P1(Xt) = P2(Xt) in the output partition πO, and fusing all remaining blocks into one. Formally, πO = {N} ∪ U, where
N = ⋃{Xk ∈ πcommon : P1(Xk) ≠ P2(Xk)},   U = {Xk ∈ πcommon : P1(Xk) = P2(Xk)}.
Algorithm 4.3 shows the overall algorithm for approximating the Polyhedra join
transformer and is explained in greater detail next.
merging blocks All blocks in B \ Bt obey the threshold size and we can ap-
ply merging to obtain larger blocks Xm of size ≤ threshold to increase precision of
the subsequent transformers. The merging function choose_merge_algorithm in Al-
gorithm 4.3 is learned by RL. The join is then applied on the factors P1 (Xm ), P2 (Xm )
and the result is added to the output O.
need for rl. Different choices of the threshold, splitting, and merge strategies
in Algorithm 4.3 yield a range of transformers with different performance and
precision depending on the inputs. Determining the suitability of a given choice
on inputs is highly non-trivial and thus we use RL to learn a policy that makes
decisions adapted to the join inputs. We note that all of our approximate trans-
formers are non-monotonic; however, the analysis always converges to a fixpoint
when combined with widening [13].
• Extracting the RL state s from the abstract program state numerically using a
set of features.
• Defining actions a as the choices among the threshold, merge and split meth-
ods defined in the previous section.
• Defining a reward function r favoring both high precision and fast execution.
states. We consider nine features for defining a state s for RL. The features ψi ,
their extraction complexity, and their typical range on our benchmarks are shown
in Table 4.2. The first seven features capture the asymptotic complexity of the join,
as in Table 2.2, on the input polyhedra P1 , P2 . These are the number of blocks,
the distribution (using maximum, minimum and average) of their sizes, and the
distribution (using maximum, minimum and average) of the number of generators
in the different factors. The precision of the inputs is captured by considering the
number of variables xi ∈ X with finite upper and lower bound, and the number of
those with only a finite upper or lower bound in both P1 and P2 .
As shown in Table 4.2, each state feature ψi returns a natural number; however,
its range can be rather large, resulting in a massive state space. To ensure scalability
and generalization of learning, we use bucketing to reduce the state space size by
clustering states with similar precision and expected join cost. The number ni of
buckets for each ψi and their definition are shown in the last two columns of
Table 4.2. Using bucketing, the RL state s is then a 9-tuple consisting of the indices
of buckets where each index indicates the bucket that ψi ’s return value falls into.
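The bucketing can be pictured as mapping each raw feature value to the index of the bucket it falls into; a hedged sketch with made-up boundaries (the actual boundaries are those of Table 4.2, which is not reproduced here):

    import bisect

    # Map raw feature values psi_i to bucket indices; the tuple of indices is
    # the RL state. The boundary lists in the usage line are placeholders only.
    def bucketize(psi_values, bounds):
        """bounds[i]: sorted bucket boundaries for feature psi_i."""
        return tuple(bisect.bisect_right(b, v) for v, b in zip(psi_values, bounds))

    # e.g., two hypothetical features with 4 buckets each:
    print(bucketize((3, 12), [[2, 5, 10], [5, 10, 20]]))   # (1, 2)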
• th ∈ {1, 2, 3, 4} depending on threshold ∈ [5, 9], [10, 14], [15, 19], or [20, ∞).
• r_algo ∈ {1, 2, 3}: the choice of a constraint removal, i.e., splitting method.
All three of these have been discussed in detail in Section 4.2. The threshold values
were chosen based on performance characterization on our benchmarks. With the
above, we have 36 possible actions per state.
first computing the smallest (often unbounded) box covering P1 ⊔ P2, which has
complexity O(ng). We then compute the following quantities from this box:
• nhb : number of variables xi with either finite upper or finite lower bounds,
i.e., xi ∈ (−∞, u] or xi ∈ [l, ∞).
Further, we measure the runtime in CPU cycles cyc for the approximate join transformer. The reward is then defined by
r = 3 · ns + 2 · nb + nhb − (log10(cyc))².    (4.3)
As the order of precision for the different types of intervals is singleton > bounded > half-bounded, the reward function in (4.3) weighs their numbers by 3, 2, 1. The reward function in (4.3) favors both high performance and precision. It also ensures that the precision part (3 · ns + 2 · nb + nhb) has a similar magnitude range as the performance part (log10(cyc))².
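A sketch of this reward computation from the bounding box of the join output and the measured cycle count; the box representation with None for infinite bounds, and the reading of ns and nb as singleton and bounded non-singleton intervals, are assumptions of the sketch.

    import math

    # Count singleton, bounded, and half-bounded intervals in the box covering
    # the join output and penalize runtime, following the parts named above.
    def reward(box, cycles):
        """box: dict mapping each variable to (lo, hi); None encodes an infinite bound."""
        ns = nb = nhb = 0
        for lo, hi in box.values():
            if lo is not None and hi is not None:
                if lo == hi:
                    ns += 1          # singleton interval
                else:
                    nb += 1          # bounded, non-singleton interval
            elif lo is not None or hi is not None:
                nhb += 1             # half-bounded interval
        return 3 * ns + 2 * nb + nhb - math.log10(cycles) ** 2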
Q(s, a) = ∑_{i=1}^{9} ∑_{j=1}^{ni} ∑_{k=1}^{36} θijk · φijk(s, a).    (4.5)
random exploration to learn good Q-estimates that can be exploited later. The num-
ber of episodes required for obtaining such estimates is infeasible for the Polyhedra
analysis as an episode typically contains thousands of join calls. Therefore, we gen-
erate actions for Q-learning by exploiting the optimal policy for precision (which
always selects the precise join) and explore for performance by choosing a ran-
dom approximate join: both with a probability of 0.5. We note that we also tried
the exploitation probabilities of 0.7 and 0.9. However, the resulting policies had a
suboptimal performance during testing due to limited exploration.
Formally, the action at := p(st) selected in state st during learning is given by at = (th, r_algo, m_algo) where
th = rand() % 4 + 1 with probability 0.5, and th = min(4, (∑_{k=1}^{|B|} |Xk|)/5 + 1) with probability 0.5,    (4.6)
r_algo = rand() % 3 + 1, m_algo = rand() % 3 + 1.
obtaining the learned policy. After learning over the dataset D, the
learned approximating join transformer in state st chooses an action according
to (4.2) by selecting the maximal value over all actions. The value of th = 1, 2, 3, 4 is
decoded as threshold = 5, 10, 15, 20 respectively.
• Poly-Init: uses a random approximate join with probability 0.5 based on (4.6).
analyzer For both learning and testing, we used a newer version of the crab-
llvm analyzer that is different from the versions used in Section 2.5 and Section 3.5.
benchmarks We chose benchmarks from the Linux Device Drivers (LD) cate-
gory of the popular software verification competition [24]. Some of the benchmarks
we used for both learning and testing were also used in Section 2.5 and Section 3.5
while others are different.
• No overfitting: the benchmark was not used for learning the policy.
Table 4.4: Timings (seconds) and precision of approximations (%) w.r.t. ELINA.

Benchmark        #Program Points  ELINA time  Poly-RL time  Poly-RL prec.  Poly-Fixed time  Poly-Fixed prec.  Poly-Init time  Poly-Init prec.
wireless_airo    2372             877         6.6           100            6.7              100               5.2             74
net_ppp          680              2220        9.1           87             TO               34                7.7             55
mfd_sm501        369              1596        3.1           97             1421             97                2               64
ideapad_laptop   461              172         2.9           100            157              100               MO              41
pata_legacy      262              41          2.8           41             2.5              41                MO              27
usb_ohci         1520             22          2.9           100            34               100               MO              50
usb_gadget       1843             66          37            60             35               60                TO              40
wireless_b43     3226             19          13            66             TO               28                83              34
lustre_llite     211              5.7         4.9           98             5.4              98                6.1             54
usb_cx231xx      4752             7.3         3.9           ≈100           3.7              ≈100              3.9             94
netfilter_ipvs   5238             20          17            ≈100           9.8              ≈100              11              94
inspecting the learned policy Our learned policy chooses in the majority
of cases threshold=20, the binary weighted constraint removal algorithm for split-
ting, and the merge smallest first algorithm for merging. Poly-Fixed always uses
these values for defining an approximate transformer, i.e., it follows a fixed strategy.
Our experimental results show that following this fixed strategy results in subop-
timal performance compared to our learned policy that makes adaptive, context-
sensitive decisions to improve performance.
poly-rl vs elina In Table 4.4, Poly-RL obtains > 7x speed-up over ELINA
on 6 of the 11 benchmarks with a maximum of 515x speedup for the mfd_sm501
benchmark. It also obtains the same or stronger invariants on > 87% of program
points on 8 benchmarks. Note that Poly-RL obtains both large speedups and the
same invariants at all program points on 3 benchmarks.
3 The benchmarks contain up to 50K LOC but SeaHorn encodes each basic block as one program
point; thus, the number of points in Table 4.4 is significantly reduced.
Many of the constraints produced by the precise join transformer from ELINA
are removed by the subsequent transformers in the analysis which allows Poly-
RL to obtain the same invariants as ELINA despite the loss of precision dur-
ing join in most cases. Due to non-monotonic join transformers, Poly-RL can
produce fixpoints non-comparable to those produced by ELINA. Because of the
non-comparability, the quality of the obtained invariants cannot be established
using our precision metric. We take a conservative approach and mark all non-
comparable invariants as being imprecise. We note that this is the case for the 3
benchmarks in Table 4.4 where Poly-RL obtains low precision.
We also tested Poly-RL on 17 benchmarks from the product lines category.
ELINA did not finish within an hour on any of these benchmarks whereas Poly-RL
finished within 1 second. Poly-RL had 100% precision on the subset of program
points at which ELINA produces invariants. With Poly-RL, SeaHorn successfully
discharged the assertions. We did not include these results in Table 4.4 as the pre-
cision w.r.t. ELINA cannot be completely compared.
poly-rl vs poly-init From (4.6), Poly-Init takes random actions and thus the
quality of its result varies depending on the run. Table 4.4 shows the results on a
sample run. Poly-RL is more precise than Poly-Init on all benchmarks in Table 4.4.
Poly-Init also does not finish on 4 benchmarks.
As our work touches on several topics, we next survey some of the work that is
most closely related to ours.
the Octagon domain analysis on a given program. The abstract transformers are
then applied individually on the relevant factors. The work of [131] provides algo-
rithms to learn minimal values of the tuning parameters for points-to analysis. The
work of [105] proposes a data-driven approach that automatically learns a good
heuristic rule for choosing important context elements for points-to analysis. The
difference between all of these works and ours is that the above approaches fix
the learning parameters for a given program. We believe that better tuning of cost
and precision can be achieved by changing the learning parameters dynamically
based on the abstract states encountered by the analyzer during the analysis. Fur-
thermore, these approaches measure precision by the number of queries proved
whereas we target the stronger notion of fixed point equivalence.
The work of [100] uses reinforcement learning to select a subset of variables to
be tracked in a flow sensitive manner with the weaker Interval domain so that the
memory footprint of the resulting analysis fits within a pre-determined memory
budget and the loss of precision is minimized. The work of [47] uses reinforcement
learning to guide relational proof search for verifying relational properties defined
over multiple programs. The authors in [7] use Bayesian optimization to learn a
verification policy that guides numerical domain analysis during proof search.
In our recent work [96], we presented a new structured learning method based
on graph neural networks for speeding up numerical domains. The method is
more generic than the one presented in this chapter as the approximate transform-
ers there are not derived via custom splitting and merging algorithms but simply
obtained via constraint removal. Further, the features are more precise than ours
and capture structural dependencies between constraints. The results show that the
new method outperforms Poly-RL and also outperforms [191] when instantiated
for the Octagon domain.
In a recent work by [119], the authors observe that many programs share com-
mon pieces of code. Thus analysis results can be reused across programs. To
achieve this, the authors use cross program training. This is a complementary ap-
proach to the one taken in this work and we believe that in the future the two
methods can be combined to further improve the performance of static analysis.
The work of [25] automatically learns abstract transformers from examples. This
is a rather different approach, instead, we build approximations on top of standard
transformers based on online decomposition. We believe that the approach of [25]
can be combined with ours in the future to automate the process of generating
approximate abstract transformers.
online decomposition The works of [189, 190, 191] improve the performance
of numerical domain analysis based on online decomposition without losing
precision. We compare against [191] in this chapter. As our experimental results
suggest, the performance of Polyhedra analysis can be significantly improved with
our approach. Further, some benchmarks are inherently dense, i.e., fully precise
online decomposition cannot decompose the set of variables X efficiently; in
such cases our approach can be used to generate precise invariants efficiently.
testing In recent years, there has been emerging interest in learning to produce
test inputs for finding program bugs or vulnerabilities. AFLFast [28] models pro-
gram branching behavior with a Markov chain, which guides the input generation.
Several works train neural networks to generate new test inputs, where the training
set can be obtained from an existing test corpus [61, 84], inputs generated earlier
in the testing process [178], or inputs generated by symbolic execution [95].
4.6 discussion
5 DEEPPOLY DOMAIN FOR CERTIFYING NEURAL NETWORKS
this work In this work, we propose a new polyhedral domain, called Deep-
Poly, that makes a step forward in addressing the challenge of certifying neural
networks with respect to both scalability and precision. The key technical idea be-
hind our work is a novel abstract interpreter specifically tailored to the setting of
neural networks. Concretely, our abstract domain is a combination of floating-point
polyhedra with intervals, coupled with custom abstract transformers for common
neural network functions such as affine transforms, the rectified linear unit (ReLU),
sigmoid and tanh activations, and the maxpool operator. These abstract transform-
ers are carefully designed to exploit key properties of these functions and balance
analysis scalability and precision. As a result, DeepPoly is more precise than [211],
[78] and [186], yet can handle large convolutional networks and is also sound for
floating-point arithmetic.
Fig. 5.1 illustrates two kinds of robustness properties: L∞-norm based
perturbations (first row) and image rotations (second row).
In the first row, we are given an image of the digit 7 (under “Original”). Then,
we consider an attack where we allow a small perturbation to every pixel in the
original image (visually this may correspond to darkening or lightening the im-
age). That is, instead of a number, each pixel now contains an interval. If each of
these intervals has the same size, we say that we have formed an L∞ ball around
the image (typically with a given epsilon ∈ R). This ball is captured visually by
the Lower image (in which each pixel contains the smallest value allowed by its
interval) and the Upper image (in which each pixel contains the largest value al-
lowed by its interval). We call the modification of the original image to a perturbed
version inside this ball an attack, reflecting an adversary who aims to trick the net-
work. There have been various works which aim to find such an attack, otherwise
called an adversarial example (e.g., [40]), typically using gradient-based methods.
For our setting however, the question is: are all possible images “sitting between”
the Lower and the Upper image classified to the same label as the original? Or, in
other words, is the neural net robust to this kind of attack?
The set of possible images induced by the attack is also called an adversarial region.
Note that enumerating all possible images in this region and simply running the
network on each to check if it is classified correctly, is practically infeasible. For
example, an image from the standard MNIST [124] dataset contains 784 pixels
and a perturbation that allows for even two values for every pixel will lead to 2^784
images that one would need to consider. In contrast, our domain DeepPoly can
automatically prove that all images in the adversarial region classify correctly (that
is, no attack is possible) by soundly propagating the entire input adversarial region
through the abstract transformers of the network.
We also consider a more complex type of perturbation in the second row. Here,
we rotate the image by an angle and our goal is to show that any rotation up to
this angle classifies to the same label. In fact, we consider an even more challenging
problem where we not only rotate an image but first form an adversarial region
around the image and then reason about all possible rotations of any image in that
region. This is challenging, as again, the enumeration of images is infeasible when
using geometric transformations that perform linear interpolation (which is needed
to improve output image quality). Further, unlike the L∞ ball above, the entire set
of possible images represented by a rotation up to a given angle does not have a
• A new abstract domain for the certification of neural nets. The domain com-
bines floating-point polyhedra and intervals with custom abstract transform-
ers for affine transforms, ReLU, sigmoid, tanh, and maxpool functions. These
abstract transformers carefully balance the scalability and precision of the
analysis (Section 5.3).
[Figure 5.2 (diagram): a small fully-connected example network with inputs i1, i2 ∈ [−1, 1], two hidden ReLU layers of two neurons each, and two output neurons; edge weights and biases are shown in the figure.]
5.1 overview
[Figure 5.3 (diagram): each neuron xi is annotated with its symbolic lower and upper bound and its concrete interval bounds, e.g., l3 = −2, u3 = 2 for x3 and l11 = 1, u11 = 5.5 for x11.]
Figure 5.3: The neural network from Fig. 5.2 transformed for analysis with the DeepPoly
abstract domain.
    x1 + x2 ≤ x3 ≤ x1 + x2,
    x1 − x2 ≤ x4 ≤ x1 − x2.    (5.1)
The transformer uses these constraints and the constraints for x1 , x2 to compute
l3 = l4 = −2 and u3 = u4 = 2.
Next, the transformer for the ReLU activation is applied. In general, the out-
put xj of the ReLU activation on variable xi is equivalent to the assignment
xj := max(0, xi). If ui ≤ 0, then our abstract transformer sets the state of the vari-
able xj to 0 ≤ xj ≤ 0, lj = uj = 0. In this case, our abstract transformer is exact. If
li ≥ 0, then our abstract transformer adds xi ≤ xj ≤ xi, lj = li, uj = ui. Again, our
abstract transformer is exact in this case.
However, when li < 0 and ui > 0, the result cannot be captured exactly by
our abstraction and we need to decide how to lose information. Fig. 5.4 shows
several candidate convex approximations of the ReLU assignment in this case. The
approximation of [69], shown in Fig. 5.4 (a), minimizes the area in the xi xj-plane, and would
add the following relational constraints and concrete bounds for xj:
    xi ≤ xj, 0 ≤ xj,
    xj ≤ ui · (xi − li)/(ui − li),    (5.2)
    lj = 0, uj = ui.
However, the approximation in (5.2) contains two lower polyhedra constraints for
xj , which we disallow in our abstract domain. The reason for this is the potential
blowup of the analysis cost as it proceeds. We will explain this effect in more detail
later in this section.
Figure 5.4: Convex approximations for the ReLU function: (a) shows the convex approxi-
mation [69] with the minimum area in the input-output plane, (b) and (c) show
the two convex approximations used in DeepPoly. In the figure, λ = ui /(ui − li )
and µ = −li · ui /(ui − li ).
To avoid this explosion, we further approximate (5.2) by allowing only one lower
bound. There are two ways of accomplishing this, shown in Fig. 5.4 (b) and (c), both of
which can be expressed in our domain. During analysis we always consider both
and dynamically choose the one with the smaller area.
The approximation from Fig. 5.4 (b) adds the following constraints and bounds
for xj :
    0 ≤ xj ≤ ui · (xi − li)/(ui − li),    (5.3)
    lj = 0, uj = ui.
The approximation from Fig. 5.4 (c) adds the following constraints and bounds:
    xi ≤ xj ≤ ui · (xi − li)/(ui − li),    (5.4)
    lj = li, uj = ui.
Note that the approximations in Fig. 5.4 (b) and (c) cannot be captured by the Zonotope
abstraction as used in [78, 191].
In our example, for both x3 and x4 , we have l3 = l4 = −2 and u3 = u4 = 2. The
areas are equal in this case; thus we choose (5.3) and get the following constraints
and bounds for x5 and x6 :
    0 ≤ x5 ≤ 0.5 · x3 + 1, l5 = 0, u5 = 2,    (5.5)
    0 ≤ x6 ≤ 0.5 · x4 + 1, l6 = 0, u6 = 2.
Next, we apply the abstract affine transformer, which first adds the following con-
straints for x7 and x8 :
    x5 + x6 ≤ x7 ≤ x5 + x6,    (5.6)
    x5 − x6 ≤ x8 ≤ x5 − x6.
It is possible to compute bounds for x7 and x8 from the above equations by sub-
stituting the concrete bounds for x5 and x6 . However, the resulting bounds are in
general too imprecise. Instead, we can obtain better bounds by recursively substitut-
ing the polyhedral constraints until the bounds only depend on the input variables
for which we then use their concrete bounds. In our example we substitute the
relational constraints for x5 , x6 from equation (5.5) to obtain:
    0 ≤ x7 ≤ 0.5 · x3 + 0.5 · x4 + 2,    (5.7)
    −0.5 · x4 − 1 ≤ x8 ≤ 0.5 · x3 + 1.
Replacing x3 and x4 with the constraints in (5.1), we get:
    0 ≤ x7 ≤ x1 + 2,    (5.8)
    −0.5 · x1 + 0.5 · x2 − 1 ≤ x8 ≤ 0.5 · x1 + 0.5 · x2 + 1.
Now we use the concrete bounds of ±1 for x1 , x2 to obtain l7 = 0, u7 = 3 and
l8 = −2, u8 = 2. Indeed, this is more precise than if we had directly substituted the
concrete bounds for x5 and x6 in (5.6) because that would have produced concrete
bounds l7 = 0, u7 = 4 (which are not as tight as the ones above).
where x11 , x12 = Nfc (i1 , i2 ) are the concrete values for variables x11 and x12 pro-
duced by our small fully-connected (fc) neural network Nfc for inputs i1 , i2 .
In our simple example, this amounts to proving whether x11 − x12 > 0 or x12 −
x11 > 0 holds given the abstract results computed by our analysis. Note that using
the concrete bounds for x11 and x12 , that is, l11 , l12 , u11 , and u12 leads to the bound
[−1, 5.5] for x11 − x12 and [−5.5, 1] for x12 − x11 and hence we cannot prove that
either constraint holds. To address this imprecision, we first create a new temporary
variable x13 and apply our abstract transformer for the assignment x13 := x11 − x12 .
Our transformer adds the following constraint: x11 − x12 ≤ x13 ≤ x11 − x12.
The transformer then computes bounds for x13 by backsubstitution (to the first
layer), as described so far, which produces l13 = 1 and u13 = 4. As the (concrete)
lower bound of x13 is greater than 0, our analysis concludes that x11 − x12 > 0 holds.
Hence, we have proved our (robustness) specification. Of course, if we had failed
to prove the property, we would have tried the same analysis using the second
constraint (i.e., x12 > x11 ). And if that would fail, then we would declare that we
are unable to prove the property. For our example, this was not needed since we
were able to prove the first constraint.
There are three types of neurons: m input neurons whose activations form the
input to the network, n output neurons whose activations form the output of the
network, and all other neurons, called hidden, as they are not directly observed.
classification For a neural network that classifies its inputs to multiple pos-
sible labels, n is the number of distinct classes, and the neural network classifies a
given input x to a given class k if N(x)k > N(x)j for all j with 1 ≤ j ≤ n and j ≠ k.
In this section, we introduce our abstract domain as well as the abstract trans-
formers needed to analyze the four kinds of assignment statements mentioned in
Section 5.2.
Elements in our abstract domain An consist of a set of polyhedral constraints
of a specific form, over n variables. Each constraint relates one variable to a linear
combination of the variables of a smaller index. Each variable has two associated
polyhedral constraints: one lower bound and one upper bound. In addition, the
abstract element records derived interval bounds for each variable. Formally, an
abstract element a ∈ An over n variables can be written as a tuple a = ⟨a^≤, a^≥, l, u⟩
where

    a_i^≤, a_i^≥ ∈ {x ↦ v + Σ_{j∈[i−1]} wj · xj | v ∈ R ∪ {−∞, +∞}, w ∈ R^{i−1}} for i ∈ [n],

and l, u ∈ (R ∪ {−∞, +∞})^n. Here, we use the notation [n] := {1, 2, . . . , n}. The
concretization function γn : An → P(R^n) is then given by

    γn(a) = {x ∈ R^n | ∀i ∈ [n]. a_i^≤(x) ≤ xi ∧ a_i^≥(x) ≥ xi}.
When lj < 0 < uj, the tightest approximation for xi := max(0, xj) is obtained
from the two ReLU branches, i.e., the convex hull of {lj ≤ xj ≤ uj, xj ≤ 0, xi = 0}
and {lj ≤ xj ≤ uj, xj ≥ 0, xi = xj}:

    0 ≤ xi, xj ≤ xi,
    xi ≤ uj · (xj − lj)/(uj − lj).

As there is only one upper bound for xi, we obtain the rule a_i'^≥(x) = uj · (xj − lj)/(uj − lj).
On the other hand, we have two lower bounds for xi: xj and 0. Any convex com-
bination of those two constraints is still a valid lower bound. Therefore, we can
set

    a_i'^≤(x) = λ · xj,

for any λ ∈ [0, 1]. We select the λ ∈ {0, 1} that minimizes the area of the resulting
shape in the (xi, xj)-plane. Finally, we set l_i' = λ · lj and u_i' = uj.
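The following is a minimal Python sketch of this ReLU transformer (not the ERAN/ELINA implementation); it follows the three cases above and, in the unstable case, chooses λ ∈ {0, 1} by comparing the areas of the two candidate relaxations in the (xj, xi)-plane. The function name and the (coefficient, constant) representation are assumptions made for the example.

```python
# Sketch of the DeepPoly ReLU transformer for x_i := ReLU(x_j).
def relu_transformer(lj, uj):
    """Return (lower_expr, upper_expr, li, ui) where expr(x_j) = coeff*x_j + const."""
    if uj <= 0:                         # ReLU is identically 0 on [lj, uj]
        return (0.0, 0.0), (0.0, 0.0), 0.0, 0.0
    if lj >= 0:                         # ReLU is the identity on [lj, uj]
        return (1.0, 0.0), (1.0, 0.0), lj, uj
    # Unstable case: upper bound is the line through (lj, 0) and (uj, uj).
    slope = uj / (uj - lj)
    upper = (slope, -slope * lj)
    # Choose the lower bound x_i >= lambda * x_j with the smaller relaxation area.
    area_lambda0 = 0.5 * uj * (uj - lj)        # relaxation of Fig. 5.4 (b)
    area_lambda1 = 0.5 * (-lj) * (uj - lj)     # relaxation of Fig. 5.4 (c)
    lam = 0.0 if area_lambda0 <= area_lambda1 else 1.0
    return (lam, 0.0), upper, lam * lj, uj

# For l_j = -2, u_j = 2 both areas are equal and lambda = 0 is chosen,
# reproducing the constraints used for x5 and x6 in the running example.
print(relu_transformer(-2.0, 2.0))
```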
Let f : R^{i−1} → R^i be a function that executes xi ← max_{j∈J} xj for some J ⊆ [i − 1]. The
corresponding abstract maxpool transformer is T_f^#(⟨a^≤, a^≥, l, u⟩) = ⟨a'^≤, a'^≥, l', u'⟩
where a_k'^≤ = a_k^≤, a_k'^≥ = a_k^≥, l_k' = lk and u_k' = uk for k < i. For the new component
i, there are two cases. If there is some k ∈ J with uj < lk for all j ∈ J \ {k}, then
a_i'^≤(x) = a_i'^≥(x) = xk, l_i' = lk and u_i' = uk. Otherwise, we choose k ∈ J such that lk
is maximized and set a_i'^≤(x) = xk, l_i' = lk and a_i'^≥(x) = u_i' = max_{j∈J} uj.
We iterate until we reach b_{s'} with b_{s'}(x) = v'' (i.e., s' is the smallest number with
this property). We then set l_i' = v''.
We compute u_i' in an analogous fashion: to obtain u_i', we start with c1(x) = a_i'^≥(x).
If we have ct(x) = v' + Σ_{j∈[k]} w_j' · xj for some k ∈ [i − 1], v' ∈ R, w' ∈ R^k, then

    c_{t+1}(x) = v' + Σ_{j∈[k]} ( max(0, w_j') · a_j'^≥(x) + min(w_j', 0) · a_j'^≤(x) ).
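The following Python sketch illustrates this back-substitution for a concrete upper bound. It is an illustration under assumptions: the symbolic bounds are stored as (coefficient dictionary, constant) pairs over lower-indexed variables, and it substitutes one variable at a time (largest index first) rather than a whole layer at once, which has the same effect here since every bound only refers to smaller indices.

```python
# Sketch of back-substitution for the concrete upper bound of variable i.
# lower[i] = (coeffs, const) means x_i >= sum_j coeffs[j]*x_j + const; analogously upper[i].
# Indices below num_inputs are network inputs, whose concrete interval [l_j, u_j] is used.
def upper_bound(i, lower, upper, num_inputs, l, u):
    coeffs, const = dict(upper[i][0]), upper[i][1]      # start from a'_i^>=
    while coeffs:
        j = max(coeffs)                                 # eliminate the largest index first
        w = coeffs.pop(j)
        if j < num_inputs:
            const += w * (u[j] if w >= 0 else l[j])     # stop at the inputs
            continue
        sub_coeffs, sub_const = upper[j] if w >= 0 else lower[j]
        const += w * sub_const
        for k, wk in sub_coeffs.items():
            coeffs[k] = coeffs.get(k, 0.0) + w * wk
    return const

# Reproducing u7 = 3 from the running example (variables indexed from 0):
lower = {2: ({0: 1.0, 1: 1.0}, 0.0), 3: ({0: 1.0, 1: -1.0}, 0.0),
         4: ({}, 0.0),               5: ({}, 0.0),
         6: ({4: 1.0, 5: 1.0}, 0.0)}
upper = {2: ({0: 1.0, 1: 1.0}, 0.0), 3: ({0: 1.0, 1: -1.0}, 0.0),
         4: ({2: 0.5}, 1.0),         5: ({3: 0.5}, 1.0),
         6: ({4: 1.0, 5: 1.0}, 0.0)}
print(upper_bound(6, lower, upper, num_inputs=2, l=[-1.0, -1.0], u=[1.0, 1.0]))  # 3.0
```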
We now show how to use our analysis to prove robustness of a neural network
with p inputs, q hidden activations and r output classes, resulting in a total of
p + q + r activations. More explicitly, our goal is to prove that the neural network
classifies all inputs satisfying the given interval constraints (the adversarial region)
to a particular class k.
We first create an abstract element a = ⟨a^≤, a^≥, l, u⟩ over p variables, where
a_i^≤(x) = li and a_i^≥(x) = ui for all i. The bounds li and ui are initialized such
that they describe the adversarial region. For example, for the adversarial region in
Fig. 5.2, we get

    a = ⟨(x ↦ l1, x ↦ l2), (x ↦ u1, x ↦ u2), (−1, −1), (1, 1)⟩.
Then, the analysis proceeds by processing assignments for all q hidden activations
and the r output activations of the neural network, layer by layer, processing nodes
in ascending order of variable indices, using their respective abstract transformers.
Finally, the analysis executes the following r − 1 (affine) assignments in the abstract:
xp+q+r+1 ← xp+q+k − xp+q+1 , . . . , xp+q+r+(k−1) ← xp+q+k − xp+q+(k−1) ,
xp+q+r+k ← xp+q+k − xp+q+(k+1) , . . . , xp+q+r+(r−1) ← xp+q+k − xp+q+r .
As output class k has the highest activation if and only if those differences are all
positive, the neural network is proved robust if for all i ∈ {p + q + r + 1, . . . , p + q +
r + (r − 1)} we have 0 < li . Otherwise, our robustness analysis fails to certify.
For the neural network in Fig. 5.2, if we want to prove that class 1 is most likely,
this means we execute one additional assignment x13 ← x11 − x12 . Abstract inter-
pretation derives the bounds l13 = 1, u13 = 4. The neural network is proved robust,
because l13 is positive.
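As a tiny illustration of this final check (the helper name is hypothetical, not part of ERAN), robustness follows exactly when every difference neuron has a positive concrete lower bound:

```python
# Sketch: class k is certified if each of the r-1 differences has a positive lower bound.
def certified(lower_bounds_of_differences):
    """lower_bounds_of_differences: concrete lower bounds l'_i of the r-1 difference neurons."""
    return all(lb > 0 for lb in lower_bounds_of_differences)

# Running example: the single difference x13 = x11 - x12 has l13 = 1 > 0.
print(certified([1.0]))  # True
```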
The above discussion showed how to use our abstract transformers to prove
robustness. However, a similar procedure could be used to prove standard pre/post
conditions (by performing the analysis starting with the pre-condition).
In this section, we prove that our abstract transformers are sound, and that they
preserve the invariant. Formally, for T_f^#(a) = a' we have T_f(γ_{i−1}(a)) ⊆ γi(a') and
γi(a') ⊆ ×_{k∈[i]} [l_k', u_k'].
Otherwise, we have lj < 0 and 0 < uj, and that (∀k ∈ [i − 1]. a_k^≤(x) ≤ xk ∧ a_k^≥(x) ≥ xk)
implies lj ≤ xj ≤ uj and therefore
Therefore, in all cases, T_f(γ_{i−1}(a)) ⊆ γi(T_f^#(a)). Note that we lose precision only in
the last case.
Theorem 5.3.3. The sigmoid and tanh abstract transformers are sound.
Proof. A function g : R → R with g'(x) > 0 and 0 ≤ g''(x) ⇔ 0 ≤ x is monotoni-
cally increasing, and furthermore, g|_(−∞,0] (the restriction to (−∞, 0]) is convex and
g|_(0,∞) is concave.
Let f : R^{i−1} → R^i execute the assignment xi ← g(xj) for some j < i, and let
a ∈ A_{i−1} be arbitrary. We have
where the inclusion is strict because we have dropped the constraint xi = g(xj).
Therefore, the abstract transformer is sound.
Proof. Let f : R^{i−1} → R^i execute the assignment xi ← max_{j∈J} xj for some J ⊆
[i − 1], and let a ∈ A_{i−1} be arbitrary. We have
There are two cases. If there is some k ∈ J with uj < lk for all j ∈ J \ {k}, then
(∀k ∈ [i − 1]. a_k^≤(x) ≤ xk ∧ a_k^≥(x) ≥ xk) implies that max_{j∈J} xj = xk and therefore
Otherwise, the transformer chooses a k with maximal lk. We also know that (∀k ∈
[i − 1]. a_k^≤(x) ≤ xk ∧ a_k^≥(x) ≥ xk) implies xj ≤ uj for all j ∈ J, and therefore
Proof. If uj ≤ 0, we have a_i'^≤(x) = a_i'^≥(x) = 0 and therefore (∀k ∈ [i]. a_k'^≤(x) ≤
xk ∧ a_k'^≥(x) ≥ xk) implies 0 = l_i' = a_i'^≤(x) ≤ xi ≤ a_i'^≥(x) = u_i' = 0. If 0 ≤ lj, we have
a_i'^≤(x) = a_i'^≥(x) = xj and therefore (∀k ∈ [i]. a_k'^≤(x) ≤ xk ∧ a_k'^≥(x) ≥ xk) implies
l_i' = lj ≤ xj = xi ≤ uj = u_i'. Otherwise, we have lj < 0 and 0 < uj, as well as
a_i'^≤(x) = λ · xj and a_i'^≥(x) = uj · (xj − lj)/(uj − lj), and so (∀k ∈ [i]. a_k'^≤(x) ≤ xk ∧ a_k'^≥(x) ≥ xk)
implies l_i' = λ · lj ≤ xi ≤ uj = u_i'.
Theorem 5.3.7. The sigmoid and tanh abstract transformers preserve the invariant.
Proof. The constraints (∀k ∈ [i]. a_k'^≤(x) ≤ xk ∧ a_k'^≥(x) ≥ xk) imply lj ≤ xj ≤ uj and,
by monotonicity of g, we obtain l_i' = g(lj) ≤ xi ≤ g(uj) = u_i' using xi = g(xj).
Proof. The maxpool transformer either sets a_i'^≤(x) = a_i'^≥(x) = xk, l_i' = lk and
u_i' = uk, in which case (∀k ∈ [i]. a_k'^≤(x) ≤ xk ∧ a_k'^≥(x) ≥ xk) implies l_i' = lk ≤
xk = xi ≤ uk = u_i', or it sets a_i'^≤(x) = xk, l_i' = lk and u_i' = a_i'^≥(x), such that
(∀k ∈ [i]. a_k'^≤(x) ≤ xk ∧ a_k'^≥(x) ≥ xk) implies l_i' ≤ xi ≤ u_i'.
Proof. Note that s' and t' are finite, because in each step, the maximal index of a
variable whose coefficient in, respectively, bs and ct is nonzero decreases by at least
one. Assume ∀k ∈ [i]. a_k'^≤(x) ≤ xk ∧ a_k'^≥(x) ≥ xk. We have to show that b_{s'}(x) ≤ xi
and c_{t'}(x) ≥ xi. It suffices to show that ∀s ∈ [s']. bs(x) ≤ xi and ∀t ∈ [t']. ct(x) ≥ xi.
To show ∀s ∈ [s']. bs(x) ≤ xi, we use induction on s. We have b1(x) = a_i'^≤(x) ≤ xi.
Assuming bs(x) ≤ xi and bs(x) = v' + Σ_{j∈[k]} w_j' · xj for some k ∈ [i − 1], v' ∈ R, w' ∈ R^k,
we have

    xi ≥ bs(x) = v' + Σ_{j∈[k]} w_j' · xj
               = v' + Σ_{j∈[k]} ( max(0, w_j') · xj + min(w_j', 0) · xj )
               ≥ v' + Σ_{j∈[k]} ( max(0, w_j') · a_j'^≤(x) + min(w_j', 0) · a_j'^≥(x) )
               = b_{s+1}(x),

where the last inequality uses max(0, w_j') ≥ 0 and min(w_j', 0) ≤ 0.
To show ∀t ∈ [t']. ct(x) ≥ xi, we use induction on t. We have c1(x) = a_i'^≥(x) ≥ xi.
Assuming ct(x) ≥ xi and ct(x) = v' + Σ_{j∈[k]} w_j' · xj for some k ∈ [i − 1], v' ∈ R, w' ∈ R^k,
we have

    xi ≤ ct(x) = v' + Σ_{j∈[k]} w_j' · xj
               = v' + Σ_{j∈[k]} ( max(0, w_j') · xj + min(w_j', 0) · xj )
               ≤ v' + Σ_{j∈[k]} ( max(0, w_j') · a_j'^≥(x) + min(w_j', 0) · a_j'^≤(x) )
               = c_{t+1}(x).

Therefore, (∀k ∈ [i]. a_k'^≤(x) ≤ xk ∧ a_k'^≥(x) ≥ xk) implies l_i' ≤ xi ≤ u_i'.
Our abstract domain and its transformers above are sound under real arithmetic
but unsound under floating-point arithmetic if one does not take care of the
rounding errors. To obtain soundness, let F be the set of floating-point values
and ⊕f, ⊖f, ⊗f, ⊘f be the floating-point interval addition, subtraction, multiplica-
tion, and division, respectively, as defined in [141] with lower bounds rounded
towards −∞ and upper bounds rounded towards +∞. For a real constant c, we
use c− , c+ ∈ F to denote the floating-point representation of c with rounding to-
wards −∞ and +∞ respectively. We use the standard interval linear form, where
the coefficients in the constraints are intervals instead of scalars, to define an ab-
stract element a ∈ An over n variables in our domain as a tuple a = ⟨a^≤, a^≥, l, u⟩
where for i ∈ [n]:

    a_i^≤, a_i^≥ ∈ {x ↦ [v^−, v^+] ⊕f Σ_{j∈[i−1]} [w_j^−, w_j^+] ⊗f xj | v^−, v^+ ∈ F ∪ {−∞, +∞}, w^−, w^+ ∈ F^{i−1}}

and l, u ∈ (F ∪ {−∞, +∞})^n. For a floating-point interval [li, ui], let inf and sup
be functions that return its lower and upper bound. The concretization function
γn : An → P(F^n) is given by

    [λ, λ] ⊗f xj ≤ [1, 1] ⊗f xi,
    [1, 1] ⊗f xi ≤ [ψ^−, ψ^+] ⊗f xj ⊕f [µ^−, µ^+],
    ui = u_j^+.

Note that ⊕f and ⊗f as defined in [141] add extra error terms that are not shown
above for simplicity so that our results contain all values that can arise by executing
the different additions and multiplications in different orders. Here, θ_l^−, θ_l^+ ∈ F
are the floating-point values of the lower bound of the interval [w_j'^−, w_j'^+] ⊗f [lj, uj]
rounded towards −∞ and +∞ respectively. We iterate until we reach b_{s'} with
b_{s'}(x) = [v''^−, v''^+], i.e., s' is the smallest number with this property. We then set
l_i' = v''^−. We compute u_i' analogously.
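The sketch below gives a conservative, illustration-only flavor of sound floating-point interval arithmetic in Python. It is not the directed-rounding implementation based on [141]: since Python does not expose rounding modes, each elementary result is widened outward by one ulp with math.nextafter (Python 3.9+), which over-approximates the rounding error of a correctly rounded operation in the common (non-overflowing) cases.

```python
# Conservative sketch of outward-rounded interval addition and multiplication.
import math

def _down(x): return math.nextafter(x, -math.inf)
def _up(x):   return math.nextafter(x, math.inf)

def iadd(a, b):
    """[a] (+)_f [b] with outward widening."""
    return (_down(a[0] + b[0]), _up(a[1] + b[1]))

def imul(a, b):
    """[a] (x)_f [b] with outward widening."""
    products = [a[0]*b[0], a[0]*b[1], a[1]*b[0], a[1]*b[1]]
    return (_down(min(products)), _up(max(products)))

# Example: evaluating the interval linear form [0.1, 0.2] (x)_f x (+)_f [1, 1] for x in [-1, 1].
x = (-1.0, 1.0)
print(iadd(imul((0.1, 0.2), x), (1.0, 1.0)))
```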
5.4 refinement of analysis results
we cannot simply enumerate all possible rotations as done for simpler rotation
algorithms and concrete images [159].
between variables that a plain interval analysis would ignore, and thus yields more
precise boxes X_1', . . . , X_n', on which we run the neural network analysis.
Using the approach outlined above, we were able to certify, for the first time,
that the neural network is robust to non-trivial rotations of all images inside an
adversarial region. Interval-based regions for geometric transformations were also
derived in the work of [149]. We note that in our follow-up work [15], we obtain
tighter polyhedral regions based on a combination of sampling and Lipschitz op-
timization. We also handle other geometric transformations such as translation,
scaling, shearing as well as any arbitrary composition of these. The robustness of
the network is then analyzed using DeepPoly for the obtained polyhedral region.
Our results using the polyhedral region are more precise than with the interval
region presented here. Further, the approach of [15], when combined with the GPU im-
plementation GPUPoly [152] of DeepPoly, enables the analysis of an 18-layer neural
network containing more than 0.5M neurons within 30 minutes.
In this section, we evaluate the effectiveness of our approach for certifying the
robustness of a large, challenging, and diverse set of neural networks for ad-
versarial regions generated by both changes in pixel intensity as well as im-
age rotations. We implemented our method in the ERAN analyzer [3]. ERAN
is written in Python and the abstract transformers of DeepPoly domain are im-
plemented on top of the ELINA library [1, 190] for numerical abstractions. We
have implemented both a sequential and a parallel version of our transformers.
All code, networks, datasets, and results used in our evaluation are available at
https://github.com/eth-sri/eran. We compared the precision and performance of
DeepPoly against the three state-of-the-art systems that can scale to larger net-
works:
• AI2 by [78] uses the Zonotope abstract domain [81] implemented in ELINA
for performing abstract interpretation of fully-connected and convolutional
ReLU networks. Their transformers are generic and based on standard nu-
merical domains used for program analysis. Therefore they do not exploit
the structure of ReLU. As a result, AI2 is often slow and imprecise.
All of our experiments for the feedforward networks were run on a 3.3 GHz 10 core
Intel i9-7900X Skylake CPU with a main memory of 64 GB; our experiments for the
convolutional networks were run on a 2.6 GHz 14 core Intel Xeon CPU E5-2690
with 512 GB of main memory. We next describe our experimental setup including
the datasets, neural networks, and adversarial regions.
evaluation datasets. We used the popular MNIST [124] and CIFAR10 [118]
image datasets for our experiments. MNIST contains grayscale images of size
28 × 28 pixels and CIFAR10 consists of RGB images of size 32 × 32 pixels. For
our evaluation, we chose the first 100 images from the test set of each dataset. For
the task of robustness certification, out of these 100 images, we considered only
those that were correctly classified by the neural network.
neural networks. Table 5.1 shows the MNIST and the CIFAR10 neural net-
work architectures used in our experiments. The architectures considered in our
evaluation contain up to 88K hidden units. We use networks trained with adversar-
ial training, i.e., defended against adversarial attacks, as well as undefended net-
works. We used DiffAI by [144] and projected gradient descent (PGD) from [64] for
adversarial training. In our evaluation, when we consider the certified robustness
of the defended and undefended networks with the same architecture together, we
append the suffix Point to the name of a neural network trained without adversarial
training and the name of the training procedure (either DiffAI or PGD) to the name
of a defended network. In the table, the FFNNSigmoid and FFNNTanh networks use sig-
moid and tanh activations, respectively. All other networks use ReLU activations.
The FFNNSmall and FFNNMed network architectures for both MNIST and CIFAR10
datasets were taken from [78] whereas the FFNNBig architectures were taken from
[211]. The ConvSmall, ConvBig, and ConvSuper architectures were taken from [144].
We first compare the precision and performance of DeepPoly vs AI2 , Fast-Lin, and
DeepZ for robustness certification against L∞ -norm based adversarial attacks on
the MNIST FFNNSmall network. We note that it is straightforward to parallelize
Fast-Lin, DeepZ, and DeepPoly. However, the abstract transformers in AI2 cannot
be efficiently parallelized. To ensure fairness, we ran all four analyzers in single
threaded mode. Fig. 5.5 compares the percentage of certified adversarial regions
and the average runtime in seconds per ε-value of all four analyzers. We used
six different values for ε shown on the x-axis. For all analyzers, the number of
certified regions decreases with increasing values of ε. As can be seen, DeepPoly
is the fastest and the most precise analyzer on the FFNNSmall network. DeepZ has
Figure 5.5: Certified robustness and average runtime for L∞ -norm perturbations by Deep-
Poly against AI2 , Fast-Lin, and DeepZ on the MNIST FFNNSmall. DeepZ and
Fast-Lin are equivalent in robustness.
the exact same precision as Fast-Lin but is up to 2.5x faster. AI2 has significantly
worse precision and higher runtime than all other analyzers.
Based on our results in Fig. 5.5, we compare the precision and performance of the
parallelized versions of DeepPoly and DeepZ for all of our remaining experiments.
Figure 5.6: Certified robustness and average runtime for L∞ -norm perturbations by Deep-
Poly and DeepZ on the MNIST FFNNMed and FFNNBig networks.
regions certified by DeepZ starting at ε = 0.02. DeepZ certifies only 23% of the re-
gions when ε = 0.03; in contrast, DeepPoly certifies 80%. Similarly, for the FFNNTanh
network, DeepZ only certifies 1% of the regions when ε = 0.015, whereas Deep-
Poly certifies 94%. We also note that DeepPoly is more than 2x faster than DeepZ
on both these networks (we omit the relevant plots here as timings do not change
with increasing values of ε): DeepZ has an average runtime of at most 35 seconds on
both networks whereas DeepPoly has an average runtime of at most 15 seconds on both.
mnist convolutional networks Fig. 5.9 compares the precision and aver-
age runtime of DeepPoly vs DeepZ on the MNIST ConvSmall networks. We consider
three types of ConvSmall networks based on their training method: (a) undefended
(Point), (b) defended with PGD (PGD), and (c) defended with DiffAI (DiffAI). Note
that our convolutional networks are more robust than the fully-connected networks
Figure 5.7: Average percentage of ReLU inputs that can take both positive and negative val-
ues for DeepPoly and DeepZ on the MNIST FFNNSmall and FFNNMed networks.
Figure 5.8: Certified robustness and average runtime for L∞ -norm perturbations by Deep-
Poly and DeepZ on the MNIST FFNNSigmoid and FFNNTanh networks.
and thus the values of ε considered in our experiments are higher than those for
fully-connected networks.
As expected, both DeepPoly and DeepZ certify more regions on the defended
neural networks than on the undefended one. This is because the adversarially
trained networks produce fewer inputs on which the ReLU transformer loses signif-
icant precision. We notice that ConvSmall trained with DiffAI is the most provably
robust network. Overall, DeepPoly certifies more regions than DeepZ on all neu-
ral networks for all ε values. The precision gap between DeepPoly and DeepZ
increases with increasing ε. For the largest ε = 0.12, the percentage of regions cer-
tified by DeepZ on the Point, PGD, and DiffAI networks are 7%, 38%, and 53%
Figure 5.9: Certified robustness and average runtime for L∞ -norm perturbations by Deep-
Poly and DeepZ on the MNIST ConvSmall networks.
Table 5.2: Certified robustness by DeepZ and DeepPoly on the large convolutional net-
works trained with DiffAI.
Dataset Model % Certified robustness Average runtime
respectively whereas DeepPoly certifies 17%, 67%, and 81% regions respectively.
The runtime of DeepZ increases with ε while that of DeepPoly is not affected sig-
nificantly. DeepPoly runs the fastest on the DiffAI network and is faster than DeepZ
for all ε values. DeepPoly is slower than DeepZ on the PGD and Point networks
for smaller ε values but faster on the largest ε = 0.12.
Table 5.2 shows our experimental results on the larger MNIST convolutional
networks trained using DiffAI. For the ConvBig network, DeepPoly certifies signif-
icantly more regions than DeepZ for ε = 0.2 and 0.3. In particular, the percentage
certified for ε = 0.3 with DeepPoly and DeepZ is 77% and 37%, respectively. For
the ConvSuper network, DeepPoly certifies one more region than DeepZ for ε = 0.1.
In terms of the runtime, DeepZ runs slower with increasing values of ε while Deep-
Poly is unaffected. DeepZ is slightly faster than DeepPoly on the ConvBig network
Figure 5.10: Certified robustness and average runtime for L∞ -norm perturbations by Deep-
Poly and DeepZ on the CIFAR10 fully-connected networks.
for ε = 0.1 and 0.2 but is 2x slower for ε = 0.3. On the ConvSuper network, DeepPoly
is 3.4x faster than DeepZ.
Figure 5.11: Certified robustness and average runtime for L∞ -norm perturbations by Deep-
Poly and DeepZ on the CIFAR10 ConvSmall networks.
Figure 5.12: Results for robustness against rotations with the MNIST FFNNSmall network.
Each row shows a different attempt to prove that the given image of the digit
can be perturbed within an L∞ ball of radius ε = 0.001 and rotated by an
arbitrary angle θ between −45 and 65 degrees without changing its classification.
For the last two attempts, we show 4 representative combined regions (out of
220, one per batch). The running time is split into two components: (i) the time
used for interval analysis on the rotation algorithm and (ii), the time used to
prove the neural network robust with all of the computed bounding boxes
using DeepPoly.
DeepPoly is more precise than DeepZ on the considered ConvSmall networks for all ε values. As was the case on the corresponding
MNIST networks, DeepPoly runs fastest on the DiffAI network.
The last two rows in Table 5.2 compare the precision and performance of Deep-
Poly and DeepZ on the CIFAR10 ConvBig convolutional network trained with Dif-
fAI. It can be seen that DeepPoly certifies more regions than DeepZ for both
= 0.006 and = 0.008 and is also up to 2x faster.
Fig. 5.12 shows example regions and analysis times for several choices of parameters to the
refinement approach. For example, #Batches = 220, Batch Size = 300 means that
we split the interval [α, β] into n = 220 batches. To analyze a batch, we split the cor-
responding interval into m = 300 input intervals for interval analysis, resulting in
300 regions for each batch. We then run DeepPoly on the smallest common bound-
ing boxes of all regions in each batch, 220 times in total. Fig. 5.12 shows a few such
bounding boxes in the Regions column. Note that it is not sufficient for certifica-
tion to compute a single region that captures all rotated images. Fig. 5.12 shows two
such attempts: one where we did not use batching (therefore, our interval analysis
approach was applied to the rotation algorithm using an abstract θ covering the
entire range), and one where we used a batch size of 10, 000 to compute the bound-
ing box of the perturbations rather precisely. However, those perturbations cannot
be captured well using interval constraints, therefore the bounding box contains
many spurious inputs and the certification fails.
We then considered two certification attempts with 220 batches, with each batch
covering a range of θ of length 0.5 degrees. It was not sufficient to use a batch size
of 1, as some input intervals become large. Using a batch size of 300, the neural
network can be proved robust for this perturbation.
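The batching scheme can be sketched in a few lines of Python. The interval analysis of the rotation algorithm itself is abstracted behind `rotate_region_interval`, a placeholder function assumed for the example (it is not an actual ERAN function); each box is assumed to be a pair of per-pixel lower and upper bound lists.

```python
# Sketch: split [alpha, beta] into n_batches angle batches, analyze each batch
# with batch_size small angle intervals, and yield the common bounding box per batch.
def batched_regions(alpha, beta, n_batches, batch_size, rotate_region_interval):
    batch_width = (beta - alpha) / n_batches
    for b in range(n_batches):
        lo = alpha + b * batch_width
        boxes = []
        for j in range(batch_size):
            theta_lo = lo + j * batch_width / batch_size
            theta_hi = lo + (j + 1) * batch_width / batch_size
            # Interval analysis of the rotation for this small angle interval:
            boxes.append(rotate_region_interval(theta_lo, theta_hi))
        # Smallest common bounding box of the batch, later fed to DeepPoly.
        lower = [min(box[0][p] for box in boxes) for p in range(len(boxes[0][0]))]
        upper = [max(box[1][p] for box in boxes) for p in range(len(boxes[0][1]))]
        yield lower, upper

# With n_batches = 220 and batch_size = 300 this corresponds to the last row of
# Fig. 5.12: DeepPoly is then run once per batch, 220 times in total.
```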
5.6 discussion
We introduced a new method for certifying deep neural networks which balances
analysis precision and scalability. The core idea is an abstract domain based on
combining floating-point polyhedra and intervals equipped with abstract trans-
formers specifically designed for common neural network functions such as affine
transforms, ReLU, sigmoid, tanh, and maxpool. These abstract transformers enable
us to soundly handle both fully-connected and convolutional networks.
We implemented our method in the ERAN analyzer, and evaluated it extensively
on a wide range of networks of different sizes including defended and undefended
networks. Our experimental results demonstrate that DeepPoly is more precise and
faster than prior work. We also showed how to use DeepPoly to prove, for the first
time, the robustness of a neural network when the input image is perturbed by
complex transformations such as rotations employing linear interpolation.
In our follow-up work [152], we have extended the DeepPoly domain for han-
dling residual networks as well as designed efficient algorithms for adapting Deep-
Poly on GPUs. The resulting implementation GPUPoly enabled precise and fast
certification of large networks containing up to 1M neurons within a minute. In
[15], we extended our robustness certification against rotation to cover more geo-
metric transformations such as translation, scaling, and shearing as well as their
arbitrary composition. Combined with GPUPoly, we can verify neural networks
with up to 0.5M neurons against rotations within 30 minutes. All above results are
148 deeppoly domain for certifying neural networks
beyond the reach of any other existing certification method [8, 32, 36, 37, 67, 68, 69,
78, 113, 114, 135, 149, 163, 170, 175, 186, 197, 199, 206, 211, 212].
More recently, we designed new DeepPoly transformers for handling the specific
non-linearities in the RNN architectures and audio preprocessing pipeline in [172].
This enables the certification of audio classifiers against intensity perturbations in
the audio signal for the first time. We believe that the DeepPoly domain can similarly
be extended for handling other neural network architectures such as transformers
[179], domains such as natural language processing [108], and specifications such
as robustness against patches [217].
Overall, we believe this work is a promising step towards more effective reason-
ing about deep neural networks and a useful building block for proving interesting
specifications as well as other applications of analysis (for example, training more
robust networks).
6 COMBINING ABSTRACTIONS WITH SOLVERS
Figure 6.1: The input space for the ReLU assignments y1 := ReLU(x1 ), y2 := ReLU(x2 ) is
shown on the left in blue. Shapes of the relaxations projected to 3D are shown
on the right in red.
and is optimal in the x1 y1 -plane. Because of this optimality, recent work [175] refers
to the triangle relaxation as the convex barrier, meaning the best convex approxi-
mation one can obtain when processing each ReLU separately. In our experiments,
using this relaxation does not yield significant precision gains. Our main insight is
that the triangle relaxation is not optimal when one considers multiple neurons at
a time as it ignores all dependencies between x1 and any other neuron x2 in the
same layer, and thus loses precision.
Our second key idea is proposing more precise but scalable convex relaxations
than possible with prior work. We introduce a novel parameterized framework,
called k-ReLU, for generating convex approximations that consider multiple ReLUs
jointly. Here, the parameter k determines how many ReLUs are considered jointly
with large k resulting in more precise output. For example, unlike prior work, our
framework can generate a convex relaxation for y1 :=ReLU(x1 ) and y2 :=ReLU(x2 ) that
is optimal in the x1 x2 y1 y2 -space. Next, we illustrate this point with an example.
Table 6.1: Volume of the output bounding box from kPoly on the MNIST FFNNMed network.
In contrast, 2-ReLU considers the two ReLU’s jointly and captures the relational
constraints between x1 and x2 . 2-ReLU computes the following relaxation:
The result is shown in Fig. 6.1 (c). In this case the shape of y1 y2 is not independent
of x1 + x2 as opposed to the triangle relaxation. At the same time, it is more precise
than Fig. 6.1 (b) for all values of z. We note that the work of [163] computes semidefinite
relaxations that consider multiple ReLUs jointly; however, these are not
optimal and do not scale to the large networks used in our experiments.
The work in this chapter was published in [185, 187].
6.1 overview
We now show, on a simple example, the working of our certifier kPoly combining
the k-ReLU concept with refinement improves the results of state-of-the-art certi-
fiers. In particular, we illustrate how the output kPoly instantiated with 1-ReLU is
refined by instantiating it with 2-ReLU. This is possible as the 2-ReLU relaxation
can capture extra relationships between neurons that 1-ReLU inherently cannot.
Consider the simple fully-connected neural network with ReLU activations
shown in Fig. 6.2. The network has two inputs each taking values independently
in the range [−1, 1], one hidden layer and one output layer each containing two
neurons. For simplicity, we split each layer into two parts: one for the affine trans-
formation and the other for the ReLU (as in Fig. 5.3). The weights of the affine
transformation are shown on the arrows and the biases are above or below the
respective neuron. The goal is to certify that x9 6 4 holds for the output x9 with
respect to all inputs.
We first show that 1-ReLU instantiated with the state-of-the-art DeepPoly [188]
abstract domain fails to certify the property. We refer the reader to Chapter 5 for
more details on the DeepPoly abstract domain. The bounds computed by our certi-
fier using this instantiation are shown as annotations in Fig. 6.2, in the same format
as in Fig. 5.3. We next show how our analysis proceeds layer-by-layer.
first layer The certifier starts by computing the bounds for x1 and x2, which
are simply taken from the input specification. The affine assignments x3 := x1 + x2
and x4 := x1 − x2 then yield:
    x3 ≥ x1 + x2, x3 ≤ x1 + x2, l3 = −2, u3 = 2,
    x4 ≥ x1 − x2, x4 ≤ x1 − x2, l4 = −2, u4 = 2.
[Figure 6.2 (diagram): the example network with inputs x1, x2 ∈ [−1, 1] and the edge weights and biases used in this section.]
Figure 6.2: Certification of property x9 ≤ 2. Refining DeepPoly with 1-ReLU fails to prove
the property whereas 2-ReLU adds extra constraints (in green) that help in
verifying the property.
DeepPoly can precisely handle ReLU assignments when the input neuron takes
only positive or negative values; otherwise, it loses precision. Since x3 and x4 can
take both positive and negative values, the approximation from Fig. 5.4 (b) is ap-
plied, which for x5 yields:
    x5 ≥ 0, x5 ≤ 1 + 0.5 · x3, l5 = 0, u5 = 2.    (6.1)
The lower and upper bounds are set to l5 = 0 and u5 = 2, respectively. Analo-
gously, for x6 we obtain:
    x6 ≥ 0, x6 ≤ 1 + 0.5 · x4, l6 = 0, u6 = 2.    (6.2)
third layer Next, the affine assignments x7 := x5 + 2x6 and x8 := x6 + 1.5 are
handled. DeepPoly adds the constraints:
    x7 ≥ x5 + 2 · x6, x7 ≤ x5 + 2 · x6,    (6.3)
    x8 ≥ x6 + 1.5, x8 ≤ x6 + 1.5.
To compute the upper and lower bounds for x7 and x8 , DeepPoly uses back-
substitution as described in Section 5.1. Doing so yields l7 = 0, u7 = 5 and
l8 = 1.5, u8 = 3.5.
refinement with 1-relu fails Because DeepPoly discards one of the lower
bounds from the triangle relaxations for the ReLU assignments in the previous
layer, it is possible to refine lower and upper bounds for x7 and x8 by encoding
the network up to the final affine transformation using the relatively tighter ReLU
relaxations based on the triangle formulation and then computing bounds for x7
and x8 with respect to this formulation via an LP solver. However, this does not
improve bounds and still yields l7 = 0, u7 = 5, l8 = 1.5, u8 = 3.5.
As the lower bounds for both x7 and x8 are non-negative, the DeepPoly ReLU
approximation simply propagates x7 and x8 to the output layer. Therefore the final
output is:
    x9 ≥ x7, x9 ≤ x7, l9 = 0, u9 = 5,
    x10 ≥ x8, x10 ≤ x8, l10 = 1.5, u10 = 3.5.
Because the upper bound is u9 = 5, the certifier fails to prove the property that
x9 ≤ 4 holds.
We now describe our refinement approach in more formal terms. As in Section 6.1,
we will consider affine transformations and ReLU activations as separate layers.
The key idea will be to combine abstract interpretation [55] with exact MILP or pre-
cise convex relaxation based formulations of the network, which are then solved, in
6.2 refinement with solvers 155
order to compute more precise results for neuron bounds. We begin by describing
the core components of abstract interpretation that our approach requires.
Our approach requires an abstract domain An over n variables (i.e., some set
whose elements can be encoded symbolically) such as Interval, Zonotope, Deep-
Poly, or Polyhedra. The abstract domain has a bottom element ⊥ ∈ An as well as
the following components:
• A (potentially non-computable) concretization function γn : An → P(Rn ) that
associates with each abstract element a ∈ An the set of concrete points from
Rn that it abstracts. We have γn (⊥) = ∅.
for all a ∈ Am .
• A ReLU abstract transformer T^#_{ReLU|∏_i[li,ui]} : An → An, where

    {ReLU(x) | x ∈ γn(a) ∩ ∏_i[li, ui]} ⊆ γn(T^#_{ReLU|∏_i[li,ui]}(a))

for all abstract elements a ∈ An and for all lower and upper bounds l, u ∈ R^n
on the input activations of the ReLU operation.
Given that the neural network can be written as a composition of affine func-
tions and ReLU layers, we can then propagate the abstract element ain through the
corresponding abstract transformers to obtain a symbolic overapproximation aout
of the concrete outputs of the neural network.
For example, if the neural network f(x) = A' · ReLU(Ax + b) + b' has a single
hidden layer with h hidden neurons, we first compute a' = T^#_{x↦Ax+b}(a_in), which
is a symbolic overapproximation of the inputs to the ReLU activation function.
We then compute (l, u) = ι_h(a') to obtain opposite corners of a bounding box
of all possible ReLU input activations, such that we can apply the ReLU abstract
transformer:

    a'' = T^#_{ReLU|∏_i[li,ui]}(a').
    ⟨a_i^≤(x), a_i^≥(x), li, ui⟩ =
        ⟨xj, xj, l_j', u_j'⟩                                        if l_j' ≥ 0,
        ⟨0, 0, 0, 0⟩                                                if u_j' ≤ 0,
        ⟨λ · xj, u_j' · (xj − l_j')/(u_j' − l_j'), λ · l_j', u_j'⟩  otherwise,
Figure 6.3: DeepPoly relaxations for xi :=ReLU(xj ) using the original bounds lj , uj (in blue)
and the refined bounds lj0 , uj0 (in green) for xj . The refined relaxations have
smaller area in the xi xj -plane.
where λ ∈ {0, 1}. The refined ReLU transformer benefits from the improved bounds.
For example, when lj < 0 and uj > 0 holds for the original bounds, then after
refinement:
• If l_j' ≥ 0, then the relational constraints are the same; however, the interval
bounds are more precise.
• Otherwise, as shown in Fig. 6.3, the approximation with the tighter l_j' and
u_j' has a smaller area (in green) in the input-output plane than the original
transformer that uses the imprecise lj and uj (in blue).
milp Let ϕ^(k) denote the conjunction of all constraints up to and including those
from layer k. To obtain the best possible lower and upper bounds for layer k with
p neurons, we need to solve the following 2 · p optimization problems:

    l_i'^(k) = min_{x_1^(0),...,x_p^(k)} x_i^(k)   s.t. ϕ^(k)(x_1^(0), . . . , x_p^(k)),   for i = 1, . . . , p,
    u_i'^(k) = max_{x_1^(0),...,x_p^(k)} x_i^(k)   s.t. ϕ^(k)(x_1^(0), . . . , x_p^(k)),   for i = 1, . . . , p.

As was shown by [8, 197], such optimization problems can be encoded exactly
as MILP instances using the bounds computed by abstract interpretation, and the
instances can then be solved using off-the-shelf MILP solvers [92] to compute l_i'^(k)
and u_i'^(k).
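As an illustration of posing such bound problems to a solver, the sketch below maximizes x7 of the network from Fig. 6.2 over the triangle relaxation of the two unstable ReLUs, using scipy.optimize.linprog. This is an LP over a convex relaxation, not the exact MILP encoding of [8, 197] (which additionally introduces binary variables and requires a MILP solver such as [92]); the toy setup and variable ordering are assumptions made for the example.

```python
# Sketch: upper bound of x7 = x5 + 2*x6 from Fig. 6.2 via an LP over the
# triangle relaxation. Variable order: x1, x2, x3, x4, x5, x6.
from scipy.optimize import linprog

c = [0, 0, 0, 0, -1, -2]          # maximize x5 + 2*x6  ==  minimize -(x5 + 2*x6)

A_eq = [[-1, -1, 1, 0, 0, 0],     # x3 = x1 + x2
        [-1,  1, 0, 1, 0, 0]]     # x4 = x1 - x2
b_eq = [0, 0]

A_ub = [[0, 0,  1.0, 0,  -1, 0],  # x5 >= x3
        [0, 0, -0.5, 0,   1, 0],  # x5 <= 0.5*x3 + 1  (upper edge of the triangle)
        [0, 0, 0,  1.0,  0, -1],  # x6 >= x4
        [0, 0, 0, -0.5,  0,  1]]  # x6 <= 0.5*x4 + 1
b_ub = [0, 1, 0, 1]

bounds = [(-1, 1), (-1, 1), (-2, 2), (-2, 2), (0, None), (0, None)]

res = linprog(c, A_ub=A_ub, b_ub=b_ub, A_eq=A_eq, b_eq=b_eq, bounds=bounds)
print(-res.fun)  # 5.0: the LP does not improve on the u7 = 5 computed by DeepPoly
```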
anytime milp relaxation MILP solvers usually offer the option to set an
explicit timeout T after which the solver must terminate. In return, the
solver may not be able to solve the instance exactly, but it will instead provide
lower and upper bounds on the objective function in a best-effort fashion. This
provides another way to compute sound but inexact bounds l'^(k) and u'^(k).
final property certification Let k be the index of the last layer and p be
the number of output classes. We can encode the final certification problem using
the output abstract element a_out obtained after applying the abstract transformer
for the last layer in the network. If we want to prove that the output satisfies the
property ψ, where ψ is given by a CNF formula ∧_i ∨_j l_{i,j} with all literals l_{i,j} being
linear constraints, it suffices to show that a_out ⊓ (∧_j ¬l_{i,j}) = ⊥ for all i. If this fails,
one can resort to complete verification using MILP: the property is satisfied if and
only if the set of constraints ϕ^(k)(x_1^(0), . . . , x_p^(k)) ∧ (∧_j ¬l_{i,j}) is unsatisfiable for all i.
In this section we formally describe our k-ReLU framework for generating optimal
convex relaxations in the input-output space for k ReLU operations jointly. In the
next section, we discuss the instantiation of our framework with existing certifiers
which enables more precise results.
We consider a ReLU based fully-connected, convolutional, or residual neural
network with h neurons from a set H (that is h = |H|) and a bounded input region
I ⊆ Rm where m < h is the number of neural network inputs. As before, we treat
the affine transformation and the ReLUs as separate layers. We consider a convex
approximation method M that processes network layers in a topologically sorted
sequence from the input to the output layer passing the output of predecessor
layers as input to the successor layers. Let S ⊆ Rh be a convex set computed via M
approximating the set of values that neurons up to layer l-1 can take with respect
to I and B ⊇ S be the smallest bounding box around S. We use Conv(S1 , S2 ) and
S1 ∩ S2 to denote the convex hull and the intersection of convex sets S1 and S2 ,
respectively.
Let X, Y ⊆ H be respectively the sets of input and output neurons in the l-th
layer, consisting of n ReLU assignments of the form yi := ReLU(xi) where xi ∈ X
and yi ∈ Y. We assume that each input neuron xi takes on both positive and
negative values in S. We define the polyhedra induced by the two branches of each
ReLU assignment yi := ReLU(xi) as C_i^+ = {xi ≥ 0, yi = xi} ⊆ R^h and C_i^− = {xi ≤
0, yi = 0} ⊆ R^h. Let Q_J = {⋂_{i∈J} C_i^{s(i)} | s ∈ J → {−, +}} (where J ⊆ [n]) be the set
of polyhedra obtained by fixing one branch per ReLU indexed by J. For example, for
J = {1, 2} we have C_1^+ = {x1 ≥ 0, y1 = x1}, C_1^− = {x1 ≤ 0, y1 = 0},
C_2^+ = {x2 ≥ 0, y2 = x2}, and C_2^− = {x2 ≤ 0, y2 = 0}. Q_J contains 4 polyhedra
{C_1^+ ∩ C_2^+, C_1^+ ∩ C_2^−, C_1^− ∩ C_2^+, C_1^− ∩ C_2^−} where the individual polyhedra are:

    C_1^+ ∩ C_2^+ = {x1 ≥ 0, y1 = x1, x2 ≥ 0, y2 = x2},
    C_1^+ ∩ C_2^− = {x1 ≥ 0, y1 = x1, x2 ≤ 0, y2 = 0},
    C_1^− ∩ C_2^+ = {x1 ≤ 0, y1 = 0, x2 ≥ 0, y2 = x2},
    C_1^− ∩ C_2^− = {x1 ≤ 0, y1 = 0, x2 ≤ 0, y2 = 0}.
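Enumerating these branch polyhedra can be sketched in a few lines of Python. The string-based constraint representation is chosen purely for readability and is not the kPoly implementation.

```python
# Sketch: enumerate the 2^|J| polyhedra in Q_J, one per sign assignment s: J -> {+, -}.
from itertools import product

def branch_polyhedra(J):
    """Yield (sign assignment, constraint strings) for each polyhedron in Q_J."""
    for signs in product("+-", repeat=len(J)):
        constraints = []
        for i, s in zip(J, signs):
            if s == "+":
                constraints += [f"x{i} >= 0", f"y{i} = x{i}"]   # C_i^+
            else:
                constraints += [f"x{i} <= 0", f"y{i} = 0"]      # C_i^-
        yield dict(zip(J, signs)), constraints

# For J = [1, 2] this produces exactly the four polyhedra listed above.
for signs, cons in branch_polyhedra([1, 2]):
    print(signs, cons)
```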
We note that Q_J contains 2^|J| polyhedra. We next formulate the best convex relax-
ation of the output after all n ReLU assignments in a layer.
6.3.2 1-ReLU
We now describe the prior convex relaxation [69] through triangles (here called 1-
ReLU) that handles the n ReLU assignments separately. Here, the input to the i-th
assignment yi := ReLU(xi) is the polyhedron P_{1-ReLU} ⊇ S where, for each xi ∈ X,
P_{1-ReLU,i} contains only an interval constraint [li, ui] that bounds xi, that is, li ≤ xi ≤
ui. Here, the interval bounds are simply obtained from the bounding box B of S.
The output of this method after n assignments is

    S_{1-ReLU} = S ∩ ⋂_{i=1}^{n} Conv(P_{1-ReLU,i} ∩ C_i^+, P_{1-ReLU,i} ∩ C_i^−).    (6.5)
For each i, Conv(P_{1-ReLU,i} ∩ C_i^+, P_{1-ReLU,i} ∩ C_i^−) is the triangle minimizing the area, as shown in Fig. 5.4 (a), and is the optimal convex relaxation in this plane. However, because the input polyhedron P_{1-ReLU} is a hyperrectangle (when projected to X), it does not capture relational constraints between the different x_i's in X (meaning it typically has to substantially over-approximate the set S). Thus, as expected, the computed result S_{1-ReLU} of the 1-ReLU method will incur significant imprecision when compared with the result S_best.
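For reference, the triangle from [69] for a single unstable neuron with l_i < 0 < u_i can be written down in closed form; the sketch below (written independently of the ERAN implementation) returns its three constraints on (x_i, y_i):

    def triangle_relaxation(l, u):
        # Constraints of the triangle relaxation of y = ReLU(x) for l < 0 < u,
        # each returned as (a, b, c) meaning a*x + b*y <= c.
        assert l < 0 < u, "the triangle is only needed for unstable neurons"
        lam = u / (u - l)
        return [
            (0.0, -1.0, 0.0),         # y >= 0
            (1.0, -1.0, 0.0),         # y >= x
            (-lam, 1.0, -lam * l),    # y <= lam * (x - l), the upper edge
        ]

    print(triangle_relaxation(-1.0, 2.0))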
We now describe our k-ReLU framework for computing a convex relaxation of the
output of n ReLUs in one layer by considering groups of k ReLUs jointly with
k > 1. For simplicity, we assume that n > k and k divides n. Let J be a partition
of the set of indices [n] such that each block Ji ∈ J contains exactly k indices. Let
Pk-ReLU,i ⊆ Rh be a polyhedron containing interval and relational constraints over
the neurons from X indexed by Ji . In our framework, Pk-ReLU,i is derived via B and
S and satisfies S ⊆ Pk-ReLU,i .
Our k-ReLU framework produces the following convex relaxation of the output:

S_{k-ReLU} = S ∩ ⋂_{i=1}^{n/k} Conv_{Q∈Q_{J_i}}(P_{k-ReLU,i} ∩ Q).   (6.6)
The result of (6.6) is the optimal convex relaxation for the output of n ReLUs for
the given choice of S, k, J, and Pk-ReLU,i .
Theorem 6.3.1. For k > 1 and a partition J of indices, if there exists a J_i for which P_{k-ReLU,i} ⊊ ⋂_{u∈J_i} P_{1-ReLU,u} holds, then S_{k-ReLU} ⊊ S_{1-ReLU}.

Proof. Since P_{k-ReLU,i} ⊊ ⋂_{u∈J_i} P_{1-ReLU,u}, we obtain by monotonicity of intersection and the convex hull,

Conv_{Q∈Q_{J_i}}(P_{k-ReLU,i} ∩ Q) ⊊ Conv_{Q∈Q_{J_i}}((⋂_{u∈J_i} P_{1-ReLU,u}) ∩ Q).   (6.7)

We can replace all Q on the right hand side of (6.7) with either C_u^+ or C_u^− such that for all u ∈ J_i both C_u^+ and C_u^− are used in at least one substitution, and obtain by monotonicity,

Conv_{Q∈Q_{J_i}}((⋂_{u∈J_i} P_{1-ReLU,u}) ∩ Q)
    ⊆ Conv_{u∈J_i}((⋂_{u∈J_i} P_{1-ReLU,u}) ∩ C_u^+, (⋂_{u∈J_i} P_{1-ReLU,u}) ∩ C_u^−)
    ⊆ Conv_{u∈J_i}(P_{1-ReLU,u} ∩ C_u^+, P_{1-ReLU,u} ∩ C_u^−)    (since ⋂_{u∈J_i} P_{1-ReLU,u} ⊆ P_{1-ReLU,u}).
Note that P_{1-ReLU} only contains interval constraints, whereas P_{k-ReLU} contains both the same interval constraints and additional relational constraints. Thus, the convex relaxation obtained using k-ReLU is typically strictly more precise than the 1-ReLU one.
precise and scalable relaxations for large k   For each J_i, the optimal convex relaxation K_i = Conv_{Q∈Q_{J_i}}(P_{k-ReLU,i} ∩ Q) from (6.6) requires computing the convex hull of 2^k convex sets, each of which has a worst-case cost exponential in k. Thus, computing K_i via (6.6) can become computationally expensive for large values of k. We propose an efficient relaxation K_i′ for each block J_i ∈ J (where |J_i| = k as described earlier) based on computing relaxations for all subsets of J_i that are of size 2 ≤ l < k. Let R_i = {{j_1, . . . , j_l} | j_1, . . . , j_l ∈ J_i} be the set containing all subsets of J_i with l indices. For each R ∈ R_i, let P′_{l-ReLU,R} ⊆ R^h be a polyhedron containing interval and relational constraints between the neurons from X indexed by R, with S ⊆ P′_{l-ReLU,R}. The relaxation K_i′ is computed by applying l-ReLU (k choose l) times as:

K_i′ = ⋂_{R∈R_i} Conv_{Q∈Q_R}(P′_{l-ReLU,R} ∩ Q).   (6.8)
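A sketch of how (6.8) organizes the computation is shown below: the size-l subsets R of a block J_i are enumerated with itertools, while the actual l-ReLU relaxation of a subset is left as a callback (l_relu is a hypothetical hook, not a function of ERAN); intersecting polyhedra in constraint form simply amounts to collecting their constraints:

    from itertools import combinations
    from math import comb

    def approximate_block(Ji, l, l_relu):
        # Approximate K_i' for block Ji as in (6.8): intersect the l-ReLU
        # relaxations of all size-l subsets R of Ji. `l_relu(R)` must return the
        # constraint set of Conv_{Q in Q_R}(P'_{l-ReLU,R} intersected with Q).
        constraints = set()
        for R in combinations(sorted(Ji), l):
            constraints |= set(l_relu(R))
        return constraints

    # Example: a block of k = 5 neurons approximated with l = 3 needs C(5,3) = 10
    # calls to 3-ReLU per block.
    Ji = [0, 1, 2, 3, 4]
    dummy = lambda R: {f"relaxation over {R}"}    # stand-in for a real 3-ReLU call
    print(comb(5, 3), len(approximate_block(Ji, 3, dummy)))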
6.4 instantiating the k-relu framework

Our k-ReLU framework from Section 6.3 can be instantiated to produce different relaxations depending on the parameters S, k, J, and P_{k-ReLU,i}. Fig. 6.4 shows the steps of instantiating our framework. The inputs to the framework are the convex set S computed via a convex relaxation method M and the partition J based on k. These inputs are first used to produce a set containing n/k polyhedra {P_{k-ReLU,i}}. Each polyhedron P_{k-ReLU,i} is then intersected with the polyhedra from the set Q_{J_i}, producing 2^k polyhedra which are then combined via the convex hull (each result is called K_i). The K_i's are then combined with S to produce the final relaxation, which captures
the values which neurons can take after the ReLU assignments. This relaxation is tighter than the one produced by applying M directly on the ReLU layer, enabling precision gains.

Fig. 6.4: Instantiating the k-ReLU framework. A convex set S is computed via a method M, e.g., SDP [68, 163], abstract interpretation [78, 188, 191], linear relaxations [175, 206, 211, 221], or duality [67, 212]; together with the partition J of [n], it yields the polyhedra {P_{k-ReLU,i}}. Each P_{k-ReLU,i} is intersected with every Q ∈ Q_{J_i} and the results are combined via a convex hull, denoted K_i, giving the final relaxation S ∩ ⋂_{i=1}^{n/k} K_i as per (6.6). Example constraints shown in the figure: 2x_1 + x_2 + x_3 − y_1 ≤ 0, y_2 + x_2 − x_3 ≤ −1, y_3 − x_1 + x_3 ≤ 1.
In our instantiation, P_{k-ReLU,i} is obtained by computing, with respect to S, an upper bound on every linear expression ∑_{u∈J_i} a_u x_u with coefficients a_u ∈ {−1, 0, 1} (except in the case where all a_u are zero). Thus P_{k-ReLU,i} ⊇ S contains 3^k − 1 constraints, which include the interval constraints for all x_u.
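Assuming this construction (the bound computation itself is left as a hypothetical oracle upper_bound, e.g., an LP over S), the 3^k − 1 constraints can be enumerated as follows:

    from itertools import product

    def octahedral_constraints(neurons, upper_bound):
        # Enumerate constraints sum_u a_u * x_u <= c over a block of k neurons,
        # one for every coefficient vector a in {-1, 0, 1}^k except all zeros,
        # i.e. 3^k - 1 constraints. `upper_bound(coeffs)` is a hypothetical oracle
        # returning a sound upper bound of the linear expression with respect to S.
        constraints = []
        for coeffs in product((-1, 0, 1), repeat=len(neurons)):
            if all(a == 0 for a in coeffs):
                continue
            a = dict(zip(neurons, coeffs))
            constraints.append((a, upper_bound(a)))
        return constraints

    # With k = 3 neurons this yields 3**3 - 1 = 26 constraints.
    print(len(octahedral_constraints(["x1", "x2", "x3"], upper_bound=lambda a: 1.0)))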
The constraints generated by our framework for encoding the ReLU layers can be added to the formula ϕ_LP^(k) defined in Section 6.2, where they can be used either for refining the neuron bounds or for proving the property ψ.
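As an illustration of the bound-refinement use (a sketch only, assuming an LP solved with Gurobi and constraints given in a simple coefficient-dictionary form; the actual ϕ_LP^(k) from Section 6.2 also contains the affine-layer constraints), a neuron's interval can be tightened by optimizing it subject to the generated constraints:

    import gurobipy as gp
    from gurobipy import GRB

    def refine_bound(constraints, bounds, neuron):
        # Tighten the interval of `neuron` by minimizing/maximizing it over an LP.
        # `constraints` is a list of (coeffs, rhs) encoding sum_v coeffs[v]*x_v <= rhs
        # (e.g. the k-ReLU constraints produced above); `bounds` maps every variable
        # name to its current interval (l, u).
        m = gp.Model()
        m.Params.OutputFlag = 0
        x = {v: m.addVar(lb=l, ub=u, name=v) for v, (l, u) in bounds.items()}
        for coeffs, rhs in constraints:
            m.addConstr(gp.quicksum(a * x[v] for v, a in coeffs.items()) <= rhs)

        m.setObjective(x[neuron], GRB.MINIMIZE)
        m.optimize()
        new_l = m.ObjVal
        m.setObjective(x[neuron], GRB.MAXIMIZE)
        m.optimize()
        new_u = m.ObjVal
        return new_l, new_u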
6.5 evaluation
Table 6.2: Neural network architectures and parameters used in our experiments (columns: Dataset, Model, Type, #Neurons, #Layers, Defense, Refine ReLU, k).
The m × n fully-connected networks (FNNs) have m + 1 layers, where the first m layers have n neurons each and the last layer has 10 neurons. We note that the MNIST 5 × 100 and 8 × 200 networks are named FFNNSmall and FFNNMed in Section 5.5, respectively. The largest network in our experiments contains > 100K neurons and has 13 layers.
machines   The runtimes of all experiments for the MNIST and ACAS Xu FNNs were measured on a 3.3 GHz 10-core Intel i9-7900X Skylake CPU with 64 GB of main memory, whereas the experiments for the rest were run on a 2.6 GHz 14-core Intel Xeon CPU E5-2690 with 512 GB of main memory.
benchmarks   For each MNIST and CIFAR10 network, we selected the first 1000 images from the respective test set and filtered out incorrectly classified images. The number of correctly classified images for each network is shown in Table 6.3. We chose challenging values of ε for defining the adversarial region for each network. For the ACAS Xu network, we consider the property φ9 as defined in [113]. We note that our benchmarks (e.g., the 8 × 200 network with ε = 0.015) are quite challenging to handle for state-of-the-art certifiers (as we will see below).
Table 6.3: Number of certified adversarial regions and runtime of kPoly vs. DeepPoly and RefineZono (columns: Dataset, Model, #correct, DeepPoly [188], RefineZono [187], kPoly).
We next describe our results for the complete certification of the ACAS Xu 6 × 50 and the MNIST 2 × 50 networks.

acas xu 6 × 50 network   As this network has only 5 inputs, we split the pre-condition defined by φ9 into smaller input regions by splitting each input dimension independently. Our splitting heuristic is similar to the one used in Neurify [206], which is state-of-the-art for certifying ACAS Xu networks. We certify with the DeepPoly domain analysis that the post-condition defined by φ9 holds for each region. kPoly certifies that φ9 holds for the network in 14 seconds. RefineZono uses the same splits with the DeepZ domain and verifies in 10 seconds. We note that both of these timings are faster than Neurify, which takes > 100 seconds.
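The splitting itself can be pictured with the simplified sketch below, which cuts every input dimension into a fixed number of equal pieces; the actual heuristic, like Neurify's [206], chooses the splits adaptively, so this is only an illustration:

    from itertools import product

    def split_box(box, splits_per_dim):
        # Split an input box (a list of (lo, hi) intervals) into a grid of smaller
        # boxes by cutting each dimension into `splits_per_dim` equal pieces.
        pieces = []
        for lo, hi in box:
            step = (hi - lo) / splits_per_dim
            pieces.append([(lo + i * step, lo + (i + 1) * step)
                           for i in range(splits_per_dim)])
        return [list(combo) for combo in product(*pieces)]

    # A 5-dimensional ACAS Xu style box cut twice per dimension gives 2**5 = 32
    # regions; the property holds if the analysis certifies every region.
    print(len(split_box([(0.0, 1.0)] * 5, splits_per_dim=2)))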
mnist 2 × 50 network   For this network, the average runtime per region of kPoly and RefineZono is 2.9 and 3.3 seconds, respectively. DeepPoly is faster than both kPoly and RefineZono but is also quite imprecise and certifies only 411 regions.
Both of our works, RefineZono and kPoly, refine abstract interpretation results with precise solvers, but with different domains. Further, RefineZono uses only the 1-ReLU approximation in ϕ_LP^(k), while kPoly uses k > 1. Next, we list the parameter values for kPoly used in our experiments.
kpoly parameters We refine both the DeepPoly ReLU relaxation and the cer-
tification results for the MNIST FNNs. All neurons that are input to a ReLU oper-
ation and can take positive values based on the abstract interpretation results are
selected for refinement. As an optimization, we use the MILP ReLU encoding from
[197] when refining the ReLU relaxation for the second ReLU layer. Thus kMILP = 2
for these networks and kLP = m − kMILP , kAI = 0 where m is the number of layers.
Only the certification results are refined for the rest, thus kLP = kMILP = 0, kAI = m.
The last column of Table 6.2 shows the value of k for all networks. We use the
entry N/A for the ACAS Xu 6 × 50 and the MNIST 2 × 50 network as the k-ReLU
framework was not used for refinement on these. The entry Adapt means that k was
not fixed for all layers but computed dynamically. For the MNIST 5 × 100 network,
we use k = 3 for encoding all ReLU layers and use k = 2 for refining the remaining
FNNs. For the MNIST and CIFAR10 ConvBig networks, we encode the first 3 ReLU
layers with 1-ReLU while the remaining are encoded with 5-ReLU. We use l = 3
in (6.8) for encoding 5-ReLU. For the remaining 3 CNNs, we encode the first ReLU
layer with 1-ReLU while the remaining layers are encoded adaptively. Here, we
choose a value of k for which the total number of calls to 3-ReLU is ≤ 500. Next, we discuss our experimental results shown in Table 6.3.
kpoly vs deeppoly and refinezono Table 6.3 compares the precision in the
number of adversarial regions certified and the average runtime per region in sec-
onds for kPoly, DeepPoly, and RefineZono. We refine the certification results with kPoly and RefineZono only when DeepPoly and DeepZ, respectively, fail to certify.
It can be seen in the table that kPoly is more precise than both DeepPoly and Refine-
Zono on all networks. RefineZono is more precise than DeepPoly on the networks
trained without adversarial training. On the 8 × 200 and MNIST ConvSmall networks,
kPoly certifies 506 and 347 regions respectively whereas RefineZono certifies 316
and 179 regions respectively. The precision gain with kPoly over RefineZono and
DeepPoly is less on networks trained with adversarial training. kPoly certifies 25,
40, 38, and 2 regions more than DeepPoly on the last 4 CNNs in Table 6.3. kPoly is
faster than RefineZono on all networks and has an average runtime of < 8 minutes.
We note that the runtime of kPoly is not necessarily determined by the number of neurons but rather by the complexity of the refinement instances. In the table, the larger runtimes of kPoly are on the MNIST 8 × 200 and ConvSmall networks, which are quite small compared to the CIFAR10 ResNet network, on which kPoly has an average runtime of only 91 seconds.
1-relu vs k-relu We consider the first 100 regions for the MNIST ConvSmall
network and compare the number of regions certified by kPoly when run with
k-ReLU and 1-ReLU. We note that kPoly run with 1-ReLU is equivalent to [175].
kPoly with 1-ReLU certifies 20 regions whereas with k-ReLU it certifies 35. kPoly
with 1-ReLU has an average runtime of 9 seconds.
effect of heuristic for J   We ran kPoly based on k-ReLU with a random partition J_r using the same setup as for 1-ReLU. We observed that kPoly produced worse bounds and certified 34 regions (compared to 35 with our partitioning heuristic).
There is a plethora of work on neural network certification, mostly for input regions that can be encoded as boxes, such as L∞-norm-based regions. The approaches can be broadly classified into two types: complete and incomplete.
complete certifiers Complete certifiers are based on MILP solvers [8, 32, 36,
49, 66, 135, 197], SMT solving [37, 69, 113, 114], Lipschitz optimization [170], and
input and neuron refinement [206, 207]. In our experience, MILP solvers [8, 197] scale best for complete certification with high-dimensional inputs such as MNIST or CIFAR10 networks, while input refinement [206, 207] works best for lower-dimensional inputs such as those of ACAS Xu. In our approach for complete certification in ERAN, we use both. Our results in Section 6.5 indicate that ERAN achieves state-of-the-art complete certification results. We believe that our performance
can be further improved by designing new algorithms for the MILP solvers that
take advantage of the particular structure of the problem instances [32, 36, 135].
incomplete certifiers   The work of [211, 212], although based on different principles, obtains the same precision and similar speed as DeepZ. Similarly, the work of [31, 221] obtains the same precision and similar speed as DeepPoly. We note that the refinement approach presented in this chapter allows us to be more precise than all competing incomplete certifiers, and our GPU implementation of DeepPoly in [152] allows scaling to larger benchmarks with the precision of DeepPoly.
A complementary approach to the above is to modify the neural network to make it easier to certify [85, 90, 154, 180, 215]. We believe that a combination of this approach with ERAN can further improve certification results.
There is growing interest in adversarial training, where neural networks are trained against a model of adversarial attacks. Here, a robustness loss is added to the normal training loss. The robustness loss is calculated as the worst-case loss in an adversarial region. It is not possible to compute this loss exactly; thus, it is estimated with either a lower bound or an upper bound. Using the lower bound
leads to better empirical robustness [39, 64, 86, 89, 136] whereas the upper bound
[59, 60, 87, 130, 144, 147, 148, 162, 212, 213, 220] leads to models that are relatively
easier to certify. In both cases, there is a loss in the standard accuracy of the trained
model. The main challenge is then to produce models that are both robust and
accurate.
Interestingly, the work of [144, 147, 148] trains neural networks against adver-
sarial attacks using abstract interpretation. The work of [148] currently produces
state-of-the-art models for CIFAR10 and MNIST using our DeepZ abstraction and a certification method similar to RefineZono. We believe that the results of
[148] can be further improved by using the DeepPoly domain and the more precise
kPoly certification method. We note that recent work by [11] provides theoretical
results on the existence of a neural network that can be certified with abstract inter-
pretation to the same degree as a "normally" trained network with exact methods.
Beyond robustness, the work of [73, 132] trains the network so that it satisfies a
logical property.
6.7 discussion
conclusion and future work
Our work opens up several problems in numerical program analysis. We list some of these below:
machine learning for systems   We believe that our approach of using machine learning to speed up numerical analysis, presented in Chapter 4, is more general and can be used to learn adaptive policies for balancing different tradeoffs
in system design. Examples include tuning the degree of compartmentalization in
operating systems [202] for improved performance without sacrificing system se-
curity and balancing the accuracy vs. performance tradeoff in IoT applications [26].
Another direction is to automate the learning process presented in Chapter 4 via
generative models for approximations and dataset generation.
We next discuss several directions and open problems for future research in neural
network certification:
[1] ELINA: ETH Library for Numerical Analysis. http://elina.ethz.ch. pages 50,
54, 56, 79, 103, 137
[2] Extended convex hull. Computational Geometry, 20(1):13 – 23, 2001. pages 28,
164
[3] ERAN: ETH Robustness Analyzer for Neural Networks, 2018. pages 137, 164
[20] A. Becchi and E. Zaffanella. A direct encoding for nnc polyhedra. In Proc.
Computer Aided Verification (CAV), pages 230–248, 2018. pages 86
[21] A. Becchi and E. Zaffanella. An efficient abstract domain for not necessar-
ily closed polyhedra. In A. Podelski, editor, Proc. Static Analysis Symposium
(SAS), pages 146–165, 2018. pages 86
[22] A. Becchi and E. Zaffanella. Revisiting polyhedral analysis for hybrid sys-
tems. In Proc. Static Analysis Symposium (SAS), pages 183–202, 2019. pages
86
[24] D. Beyer. Reliable and reproducible competition results with benchexec and
witnesses (report on sv-comp 2016). In Proc. Tools and Algorithms for the Con-
struction and Analysis of Systems (TACAS), pages 887–904, 2016. pages 50, 79,
104
[25] P. Bielik, V. Raychev, and M. Vechev. Learning a static analyzer from data. In Proc. Computer Aided Verification (CAV), pages 233–253, 2017. pages 107
[31] A. Boopathy, T.-W. Weng, P.-Y. Chen, S. Liu, and L. Daniel. Cnn-cert: An effi-
cient framework for certifying robustness of convolutional neural networks.
In Proc. AAAI Conference on Artificial Intelligence (AAAI), pages 3240–3247,
2019. pages 13, 149, 152, 164, 168, 169
[42] K. Chae, H. Oh, K. Heo, and H. Yang. Automatically generating features for
learning program analysis heuristics for c-like languages. Proc. ACM Program.
Lang., 1(OOPSLA):101:1–101:25, 2017. pages 89, 105, 106
[45] A. Chawdhary, E. Robbins, and A. King. Simple and efficient algorithms for
octagons. In Proc. Asian Symposium on Programming Languages and Systems
(APLAS), volume 8858 of Lecture Notes in Computer Science, pages 296–313.
Springer, 2014. pages 87
[47] J. Chen, J. Wei, Y. Feng, O. Bastani, and I. Dillig. Relational verification using
reinforcement learning. Proc. ACM Program. Lang., 3(OOPSLA):141:1–141:30,
2019. pages 107
[50] N. Chernikova. Algorithm for discovering the set of all the solutions of a
linear programming problem. USSR Computational Mathematics and Mathe-
matical Physics, 8(6):282 – 293, 1968. pages 28
[55] P. Cousot and R. Cousot. Abstract interpretation: A unified lattice model for
static analysis of programs by construction or approximation of fixpoints. In
Proc. Symposium on Principles of Programming Languages (POPL), page 238–252,
1977. pages 1, 154
[60] F. Croce and M. Hein. Provable robustness against all adversarial l_p-perturbations for p ≥ 1. In Proc. International Conference on Learning Representations (ICLR), 2020. pages 170
[64] Y. Dong, F. Liao, T. Pang, H. Su, J. Zhu, X. Hu, and J. Li. Boosting adversarial
attacks with momentum. In Proc. Computer Vision and Pattern Recognition
(CVPR), pages 9185–9193, 2018. pages 138, 170
[71] P. Ferrara, F. Logozzo, and M. Fähndrich. Safer unsafe code for .net. SIG-
PLAN Not., 43:329–346, 2008. pages 58
[74] T. Fischer and C. Krauss. Deep learning with long short-term memory net-
works for financial market predictions. European Journal of Operational Re-
search, 270(2):654–669, 2018. pages 11
[81] K. Ghorbal, E. Goubault, and S. Putot. The zonotope abstract domain tay-
lor1+. In Proc. Computer Aided Verification (CAV), pages 627–633, 2009. pages
14, 59, 114, 137
[92] Gurobi Optimization, LLC. Gurobi optimizer reference manual, 2018. pages
158, 164
[93] N. Halbwachs, D. Merchat, and L. Gonnord. Some ways to reduce the space
dimension in polyhedra computations. Formal Methods in System Design
(FMSD), 29(1):79–95, 2006. pages 8, 21, 85
[96] J. He, G. Singh, M. Püschel, and M. Vechev. Learning fast and precise numer-
ical analysis. In Proc. Programming Language Design and Implementation (PLDI),
page 1112–1127. Association for Computing Machinery, 2020. pages viii, 11,
106, 107
[100] K. Heo, H. Oh, and H. Yang. Resource-aware program analysis via online ab-
straction coarsening. In Proc. International Conference on Software Engineering
(ICSE), 2019. pages 106, 107
[105] M. Jeon, S. Jeong, and H. Oh. Precise and scalable points-to analysis via
data-driven context tunneling. Proc. ACM Program. Lang., 2(OOPSLA):140:1–
140:29, 2018. pages 106, 107
[106] K. Jia and M. Rinard. Efficient exact verification of binarized neural networks.
CoRR, abs/2005.03597, 2020. pages 169
[107] K. Jia and M. Rinard. Exploiting verified neural networks via floating point
numerical error. CoRR, abs/2003.03021, 2020. pages 17, 137
[109] J.-H. Jourdan. Sparsity preserving algorithms for octagons. Electronic Notes
in Theoretical Computer Science, 331:57 – 70, 2017. Workshop on Numerical
and Symbolic Abstract Domains (NSAD). pages 87
[117] C. Ko, Z. Lyu, L. Weng, L. Daniel, N. Wong, and D. Lin. POPQORN: quantify-
ing robustness of recurrent neural networks. In Proc. International Conference
on Machine Learning (ICML), volume 97 of Proceedings of Machine Learning Re-
search, pages 3468–3477, 2019. pages 169
[118] A. Krizhevsky. Learning multiple layers of features from tiny images. Tech-
nical report, 2009. pages 138, 164
[127] J. Li, J. Liu, P. Yang, L. Chen, X. Huang, and L. Zhang. Analyzing deep
neural networks with symbolic propagation: Towards higher precision and
faster verification. In Proc. Static Analysis Symposium (SAS), volume 11822 of
Lecture Notes in Computer Science, pages 296–319, 2019. pages 13, 168
[128] J. Li, S. Qu, X. Li, J. Szurley, J. Z. Kolter, and F. Metze. Adversarial music:
Real world audio adversary against wake-word detection system. In Proc. Ad-
vances in Neural Information Processing Systems (NeurIPS), pages 11908–11918,
2019. pages 170
[129] L. Li, M. Weber, X. Xu, L. Rimanic, T. Xie, C. Zhang, and B. Li. Provable robust
learning based on transformation-specific smoothing. CoRR, abs/2002.12398,
2020. pages 170
[130] L. Li, Z. Zhong, B. Li, and T. Xie. Robustra: Training provable robust neural
networks over reference adversarial space. In Proc. International Joint Confer-
ence on Artificial Intelligence (IJCAI), pages 4711–4717, 2019. pages 170
[133] J. Liu, L. Chen, A. Miné, and J. Wang. Input validation for neural networks
via runtime local robustness verification. CoRR, abs/2002.03339, 2020. pages
17, 169
[135] J. Lu and M. P. Kumar. Neural network branching for neural network ver-
ification. In Proc. International Conference on Learning Representations (ICLR),
2020. pages 5, 13, 108, 114, 148, 149, 168, 175
[141] A. Miné. Relational abstract domains for the detection of floating-point run-
time errors. In Proc. European Symposium on Programming (ESOP), pages 3–17,
2004. pages 26, 133, 134
[142] A. Miné. The octagon abstract domain. Higher Order and Symbolic Computa-
tion, 19(1):31–100, 2006. pages 6, 24, 55, 57, 58, 76
[146] M. Mirman, G. Singh, and M. Vechev. A provable defense for deep residual
networks, 2019. pages viii
[147] M. Mirman, G. Singh, and M. T. Vechev. A provable defense for deep residual
networks. CoRR, abs/1903.12519, 2019. pages 170, 171
[155] A. M. Nguyen, J. Yosinski, and J. Clune. Deep neural networks are easily
fooled: High confidence predictions for unrecognizable images. In Proc. IEEE
Computer Vision and Pattern Recognition (CVPR), pages 427–436, 2015. pages
170
[157] H. Oh, H. Yang, and K. Yi. Learning a strategy for adapting a program anal-
ysis via bayesian optimisation. In Proc. Object-Oriented Programming, Systems,
Languages, and Applications (OOPSLA), pages 572–588, 2015. pages 89, 105,
106
[158] K. Pei, Y. Cao, J. Yang, and S. Jana. Deepxplore: Automated whitebox testing
of deep learning systems. In Proc. Symposium on Operating Systems Principles
(SOSP), pages 1–18, 2017. pages 169, 170
[159] K. Pei, Y. Cao, J. Yang, and S. Jana. Towards practical verification of machine
learning: The case of computer vision systems. CoRR, abs/1712.01785, 2017.
pages 136
[167] M. T. Ribeiro, S. Singh, and C. Guestrin. "why should I trust you?": Explain-
ing the predictions of any classifier. In Proc. Knowledge Discovery and Data
Mining (KDD), pages 1135–1144, 2016. pages 176
[168] H. G. Rice. Classes of recursively enumerable sets and their decision prob-
lems. Transactions of the American Mathematical Society, 74(2):358–366, 1953.
pages 7
[169] X. Rival and L. Mauborgne. The trace partitioning abstract domain. ACM
Trans. Program. Lang. Syst., 29(5), 2007. pages 116, 135, 136
[177] S. A. Seshia, S. Jha, and T. Dreossi. Semantic adversarial deep learning. IEEE
Des. Test, 37(2):8–18, 2020. pages 176
[178] D. She, K. Pei, D. Epstein, J. Yang, B. Ray, and S. Jana. NEUZZ: efficient
fuzzing with neural program smoothing. In Proc. IEEE Symposium on Security
and Privacy (S&P), pages 803–817, 2019. pages 108
[181] X. Si, H. Dai, M. Raghothaman, M. Naik, and L. Song. Learning loop invari-
ants for program verification. In Proc. Neural Information Processing Systems
(NeurIPS), pages 7751–7762, 2018. pages 108
[183] A. Simon and A. King. The two variable per inequality abstract domain.
Higher Order Symbolic Computation (HOSC), 23:87–143, 2010. pages 55, 58
[185] G. Singh, R. Ganvir, M. Püschel, and M. Vechev. Beyond the single neuron
convex barrier for neural network certification. In Advances in Neural Infor-
mation Processing Systems (NeurIPS), pages 15098–15109. 2019. pages vii, 15,
151
[186] G. Singh, T. Gehr, M. Mirman, M. Püschel, and M. Vechev. Fast and effective
robustness certification. In Proc. Advances in Neural Information Processing
Systems (NeurIPS), pages 10825–10836. 2018. pages vii, 13, 14, 15, 114, 122,
148, 149, 152, 166, 168
[188] G. Singh, T. Gehr, M. Püschel, and M. Vechev. An abstract domain for certi-
fying neural networks. Proc. ACM Program. Lang., 3(POPL):41:1–41:30, 2019.
pages vii, 13, 15, 16, 59, 116, 151, 152, 154, 163, 164, 166, 168, 169, 171
[192] G. Singh, M. Püschel, and M. Vechev. Fast numerical program analysis with
reinforcement learning. In Proc. Computer Aided Verification CAV, pages 211–
229, 2018. pages vii, 11, 91
[196] P. Tabacof and E. Valle. Exploring the space of adversarial images. In Proc.
International Joint Conference on Neural Networks (IJCNN), pages 426–433, 2016.
pages 170
[201] C. Urban and A. Miné. A decision tree abstract domain for proving condi-
tional termination. In Proc. Static Analysis Symposium (SAS), pages 302–318,
2014. pages 2, 24
[204] A. Venet and G. Brat. Precise and efficient static array bound checking for
large embedded C programs. In Proc. Programming Language Design and Im-
plementation (PLDI), pages 231–242, 2004. pages 2, 87
[205] A. J. Venet. The Gauge domain: Scalable analysis of linear inequality invari-
ants. In Proc. Computer Aided Verification (CAV), pages 139–154, 2012. pages
6
[206] S. Wang, K. Pei, J. Whitehouse, J. Yang, and S. Jana. Efficient formal safety
analysis of neural networks. In Proc. Advances in Neural Information Processing
Systems (NeurIPS), pages 6369–6379. 2018. pages 13, 114, 148, 151, 152, 154,
163, 164, 166, 168
[207] S. Wang, K. Pei, J. Whitehouse, J. Yang, and S. Jana. Formal security anal-
ysis of neural networks using symbolic intervals. In Proc. USENIX Security
Symposium (USENIX Security 18), pages 1599–1614, 2018. pages 168
[210] Z. Wei, J. Chen, X. Wei, L. Jiang, T. Chua, F. Zhou, and Y. Jiang. Heuristic
black-box adversarial attacks on video recognition models. In Proc. AAAI
Conference on Artificial Intelligence (AAAI), pages 12338–12345, 2020. pages
170
[221] H. Zhang, T.-W. Weng, P.-Y. Chen, C.-J. Hsieh, and L. Daniel. Efficient neural
network robustness certification with general activation functions. In Proc.
Advances in Neural Information Processing Systems (NeurIPS). 2018. pages 13,
149, 152, 154, 163, 164, 168, 169, 171