
ETH Library

Scalable Automated Reasoning for Programs and Deep Learning

Doctoral Thesis

Author(s):
Singh, Gagandeep

Publication date:
2020

Permanent link:
https://doi.org/10.3929/ethz-b-000445921

Rights / license:
In Copyright - Non-Commercial Use Permitted

Funding acknowledgement:
163117 - Making Program Analysis Fast (SNF)

This page was generated automatically upon download from the ETH Zurich Research Collection.
For more information, please consult the Terms of use.
diss. eth no. 27096

SCALABLE AUTOMATED REASONING FOR PROGRAMS AND DEEP LEARNING

A dissertation submitted to attain the degree of


doctor of sciences of eth zurich
(Dr. sc. ETH Zurich)

by

gagandeep singh

Bachelor in Computer Science and Engineering, IIT Patna


Master in Computer Science, ETH Zurich
born on 22.09.1989
citizen of India

accepted on the recommendation of


Prof. Martin Vechev (advisor)
Prof. Markus Püschel (co-examiner)
Prof. Patrick Cousot (co-examiner)
Prof. Clark Barrett (co-examiner)

2020
ABSTRACT

With the widespread adoption of modern computing systems in different real-world
applications such as autonomous vehicles, medical diagnosis, and aviation,
it is critical to establish formal guarantees on their correctness before they are em-
ployed in the real-world. Automated formal reasoning about modern systems has
been one of the core problems in computer science and has therefore attracted con-
siderable interest from the research community. However, the problem has turned
out to be quite challenging because of the ever-increasing scale, complexity, and
diversity of these systems, which has so far limited the applicability of formal
methods for their automated analysis.
The central problem addressed in this dissertation is: are there generic methods
for designing fast and precise automated reasoning for modern systems? We fo-
cus on two practically important problem domains: numerical software and deep
learning models. We design new concepts, representations, and algorithms for pro-
viding the desired formal guarantees. We build our methodology on the elegant
abstract interpretation framework for static analysis, which enables automated rea-
soning about infinite concrete behaviors with finite computable representations.
Our methods are generic for the particular problem domain and allow precise
analysis beyond the reach of prior work.
For programs, we present a new theory of online decomposition that dynami-
cally decomposes expensive computations, thereby reducing their complexity with-
out any precision loss. Our theory is generic and can be used for decomposing all
existing subpolyhedra domains without sacrificing precision. We leverage data-
driven machine learning to further improve the performance of numerical pro-
gram analysis without significant precision loss. For neural networks, we design
a new abstraction equipped with custom approximations of the non-linearities com-
monly used in neural networks for fast and scalable analysis. We also create a
new convex relaxation framework that produces more precise relaxations than is
possible with prior work. We provide a novel combination of our abstractions and
relaxation framework with precise solvers, which enables state-of-the-art certifica-
tion results.
This thesis presents two new publicly available software systems: ELINA and
ERAN. ELINA provides optimized implementations of the popular Polyhedra, Oc-
tagon, and Zone domains, based on our theory of online decomposition, enabling
fast and precise analysis of large Linux device drivers containing > 500 variables
in a few seconds. ERAN contains our custom abstraction, convex relaxation frame-
work, and combination of relaxations with solvers for enabling fast and precise
analysis of large neural networks, containing tens of thousands of neurons, within

a few seconds. Both systems were developed from scratch and are currently
state-of-the-art for their respective domains, producing results not possible with
other competing systems.

ZUSAMMENFASSUNG

Due to the widespread use of modern computing systems in various applications,
such as autonomous vehicles, medical diagnostics, and aviation, it has become
crucial to prove the formal correctness of these computing systems before they are
deployed in the real world. The automated application of formal methods to modern
systems is one of the core problems of computer science and has therefore attracted
great interest in the research community. Due to the ever-increasing scale, complexity,
and diversity of these systems, the applicability of formal methods has so far been
limited; this problem has therefore proven to be difficult.

The central problem addressed in this dissertation is: are there generic methods
for designing fast and precise automated correctness proofs for modern systems? We
focus on two practically relevant problem domains: numerical software and artificial
neural networks. We design new concepts, representations, and algorithms to obtain
the desired formal guarantees. We build our methodology on the elegant theory of
abstract interpretation for static analysis, which enables automated reasoning about
infinite concrete behaviors with finite, computable representations. Our methods are
generic for the respective problem domains and enable precise analysis far beyond
existing work.

For numerical software, we present a new theory of online decomposition that
dynamically decomposes expensive computations and thereby reduces their complexity
without loss of precision. Our theory is generic and can be used to decompose all
existing subpolyhedra domains without losing precision. We employ data-driven
machine learning to further improve the performance of numerical program analysis
without significant loss of precision. For neural networks, we design a new abstraction
for fast and scalable analysis, equipped with application-specific approximations of
the non-linearities commonly used in neural networks. We also design a new convex
relaxation framework that produces relaxations which are more precise than those of
earlier approaches. We provide a novel combination of our abstraction and relaxation
framework with precise solvers, enabling state-of-the-art certification.

In this thesis, we present two new, publicly available software systems: ELINA
and ERAN. ELINA provides optimized implementations of the popular Polyhedra,
Octagon, and Zone domains, based on our theory of online decomposition, which
enables fast and precise analysis of large Linux device drivers with > 500 variables
in a few seconds. ERAN contains our custom abstraction and convex relaxation
framework, and a combination of relaxations with solvers, to enable fast and precise
analysis of large neural networks with tens of thousands of neurons within a few
seconds. Both systems were developed from scratch, set the current state of the art
for their respective domains, and achieve results that are beyond the reach of
competing systems.

PUBLICATIONS

This thesis is based on the following publications:

• Gagandeep Singh, Markus Püschel, Martin Vechev.
  Fast Polyhedra Abstract Domain.
  ACM Principles of Programming Languages (POPL), 2017. [190]

• Gagandeep Singh, Markus Püschel, Martin Vechev.
  A Practical Construction for Decomposing Numerical Abstract Domains.
  ACM Principles of Programming Languages (POPL), 2018. [191]

• Gagandeep Singh, Markus Püschel, Martin Vechev.
  Fast Numerical Program Analysis with Reinforcement Learning.
  Computer Aided Verification (CAV), 2018. [192]

• Gagandeep Singh, Timon Gehr, Markus Püschel, Martin Vechev.
  An Abstract Domain for Certifying Neural Networks.
  ACM Principles of Programming Languages (POPL), 2019. [188]

• Gagandeep Singh, Timon Gehr, Markus Püschel, Martin Vechev.
  Boosting Robustness Certification of Neural Networks.
  International Conference on Learning Representations (ICLR), 2019. [187]

• Gagandeep Singh, Rupanshu Ganvir, Markus Püschel, Martin Vechev.
  Beyond the Single Neuron Convex Barrier for Neural Network Certification.
  Neural Information Processing Systems (NeurIPS), 2019. [185]

The following publications were part of my Ph.D. research and contain results
that are supplemental to this work or build upon the results of this thesis:

• Gagandeep Singh, Markus Püschel, Martin Vechev.
  Making Numerical Program Analysis Fast.
  ACM Programming Language Design and Implementation (PLDI), 2015. [189]

• Gagandeep Singh, Timon Gehr, Matthew Mirman, Markus Püschel, Martin Vechev.
  Fast and Effective Robustness Certification.
  Neural Information Processing Systems (NeurIPS), 2018. [186]

• Mislav Balunovic, Maximilian Baader, Gagandeep Singh, Timon Gehr, Martin Vechev.
  Certifying Geometric Robustness of Neural Networks.
  Neural Information Processing Systems (NeurIPS), 2019. [15]

• Jingxuan He, Gagandeep Singh, Markus Püschel, Martin Vechev.
  Learning Fast and Precise Numerical Analysis.
  ACM Programming Language Design and Implementation (PLDI), 2020. [96]

• Raphaël Dang Nhu, Gagandeep Singh, Pavol Bielik, Martin Vechev.
  Adversarial Attacks on Probabilistic Autoregressive Forecasting Models.
  International Conference on Machine Learning (ICML), 2020. [62]

The following publications were part of my Ph.D. research and are available on
arXiv:

• Matthew Mirman, Gagandeep Singh, Martin Vechev.
  A Provable Defense for Deep Residual Networks.
  arXiv, 2019. [146]

• Wonryong Ryou, Jiayu Chen, Mislav Balunovic, Gagandeep Singh, Andrei Dan, Martin Vechev.
  Fast and Effective Robustness Certification for Recurrent Neural Networks.
  arXiv, 2020. [172]

• Christoph Müller, Gagandeep Singh, Markus Püschel, Martin Vechev.
  Neural Network Robustness Verification on GPUs.
  arXiv, 2020. [152]

• Dimitar I. Dimitrov, Gagandeep Singh, Timon Gehr, Martin Vechev.
  Scalable Inference of Symbolic Adversarial Examples.
  arXiv, 2020. [63]
ACKNOWLEDGEMENTS

I would like to use this page to thank all those people that directly and indirectly
supported me throughout my doctoral studies.
My gratitude goes first of all to my advisors Prof. Markus Püschel and Prof.
Martin Vechev. Your valuable advice has shaped my perspective on research and
life. I would also like to express my gratitude to the reviewers: Prof. Patrick Cousot
and Prof. Clark Barrett, for providing constructive feedback on the thesis that I have
incorporated in the final version. I am thankful to the ETH faculty members who
helped me at various points of my doctoral studies: Prof. Peter Müller, Prof. Zhendong Su,
Prof. Mohsen Ghaffari, and Prof. Srdjan Capkun.
I would like to acknowledge the co-authors of papers published during my Ph.D.
I really enjoyed working with all of you: Maximilian Baader, Mislav Balunovic,
Pavol Bielik, Jiayu Chen, Andrei Dan, Raphaël Dang Nhu, Dimitar I. Dimitrov,
Rupanshu Ganvir, Timon Gehr, Jingxuan He, Matthew Mirman, Christoph Müller,
and Wonryong Ryou.
I would also like to thank many past and present colleagues in the software
group at ETH Zurich, in particular Victoria Caparrós Cabezas, Makarchuk Gleb,
Georg Ofenbeck, Joao Rivera, Bastian Seifert, Francois Serre, Tyler Smith, Daniele
Spampinato, Alen Stojanov, Chris Wendler, Eliza Wszola, Luca Della Toffola, Afra
Amini, Benjamin Bichsel, Rudiger Birkner, Dana Drachsler Cohen, Dimitar K. Dim-
itrov, Marc Fischer, Inna Grijnevitch, Viktor Ivanov, Pesho Ivanov, Jonathan Mauer,
Sasa Misailovic, Rumen Paletov, Momchil Peychev, Veselin Raychev, Anian Ruoss,
Samuel Steffen, Petar Tsankov, Vytautas Astrauskas, Lucas Brutschy, Alexandra
Bugariu, Fábio Pakk Selmi Dei, Jérôme Dohrau, Marco Eilers, Uri Juhasz, Gau-
rav Parthasarathy, Federico Poli, Alexander Summers, Caterina Urban, Arshavir
Ter Gabrielyan, and Manuel Rigger for many insightful discussions about research
and beyond.
I would also like to thank the administrative staff at ETH that helped me navigate
through many of the Swiss rules related to immigration and beyond: Fiorella Meyer,
Mirella Rutz, Sandra Schneider, and Marlies Weissert.
I am grateful to all my friends and flatmates in Zurich. Without you, my time
in Zurich would have been quite boring. Special thanks to Kushagra Alankar and
Jagannath Biswakarma, we started our journey in Zurich together in the same
building. It has been great knowing you all these years and I will cherish our great
memories of cooking, traveling, and of course, watching cricket together.
Finally, I would like to thank my parents and sister, without whom none of this
would have been possible.

CONTENTS

abstract iii

acknowledgments ix

1 introduction 1
1.1 Abstract Interpretation . . . . . . . . . . . . . . . . . . . . . . . . . . . 6
1.2 Fast and Precise Numerical Program Analysis . . . . . . . . . . . . . 7
1.3 Fast and Precise Neural Network Certification . . . . . . . . . . . . . 11

I Fast and Precise Numerical Program Analysis 19


2 fast polyhedra analysis via online decomposition 21
2.1 Background on Polyhedra Analysis . . . . . . . . . . . . . . . . . . . 22
2.1.1 Representation of Polyhedra . . . . . . . . . . . . . . . . . . . 23
2.1.2 Polyhedra Domain . . . . . . . . . . . . . . . . . . . . . . . . . 24
2.1.3 Polyhedra Domain Analysis: Example . . . . . . . . . . . . . . 26
2.1.4 Transformers and Asymptotic Complexity . . . . . . . . . . . 27
2.2 Polyhedra Decomposition . . . . . . . . . . . . . . . . . . . . . . . . . 29
2.2.1 Partitions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 29
2.2.2 Transformers and Partitions . . . . . . . . . . . . . . . . . . . . 30
2.3 Polyhedra Domain Analysis with Partitions . . . . . . . . . . . . . . . 35
2.3.1 Polyhedra Encoding . . . . . . . . . . . . . . . . . . . . . . . . 35
2.3.2 Transformers and Permissible Partitions . . . . . . . . . . . . . 36
2.4 Polyhedra transformers . . . . . . . . . . . . . . . . . . . . . . . . . . 38
2.4.1 Auxiliary Transformers . . . . . . . . . . . . . . . . . . . . . . . 38
2.4.2 Meet (⊓) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 40
2.4.3 Inclusion (⊑) . . . . . . . . . . . . . . . . . . . . . . . . . . . 40
2.4.4 Conditional . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 41
2.4.5 Assignment . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 43
2.4.6 Widening (∇) . . . . . . . . . . . . . . . . . . . . . . . . . . . . 43
2.4.7 Join (⊔) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 44
2.5 Experimental Evaluation . . . . . . . . . . . . . . . . . . . . . . . . . . 50
2.5.1 Experimental Setup . . . . . . . . . . . . . . . . . . . . . . . . . 50
2.5.2 Experimental Results . . . . . . . . . . . . . . . . . . . . . . . . 50
2.6 Discussion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 54

3 generalizing online decomposition 55


3.1 Generic Model for Numerical Abstract Domains . . . . . . . . . . . . 56


3.2 Decomposing Abstract Elements . . . . . . . . . . . . . . . . . . . . . 60
3.3 Recipe for Decomposing Transformers . . . . . . . . . . . . . . . . . . 61
3.4 Decomposing Domain Transformers . . . . . . . . . . . . . . . . . . . 63
3.4.1 Conditional . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 64
3.4.2 Assignment . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 67
3.4.3 Meet (⊓) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 72
3.4.4 Join (⊔) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 73
3.4.5 Widening (∇) . . . . . . . . . . . . . . . . . . . . . . . . . . 76
3.5 Experimental Evaluation . . . . . . . . . . . . . . . . . . . . . . . . . . 78
3.5.1 Polyhedra . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 79
3.5.2 Octagon . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 81
3.5.3 Zone . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 83
3.5.4 Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 85
3.6 Related Work . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 85
3.7 Discussion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 88

4 reinforcement learning for numerical domains 89


4.1 Reinforcement Learning for Static Analysis . . . . . . . . . . . . . . . 92
4.1.1 Reinforcement Learning . . . . . . . . . . . . . . . . . . . . . . 92
4.1.2 Instantiation of RL to Static Analysis . . . . . . . . . . . . . . . 94
4.2 Polyhedra Analysis and Approximate Transformers . . . . . . . . . . 95
4.2.1 Block Splitting . . . . . . . . . . . . . . . . . . . . . . . . . . . . 95
4.2.2 Merging of Blocks . . . . . . . . . . . . . . . . . . . . . . . . . . 98
4.2.3 Approximation for Polyhedra Analysis . . . . . . . . . . . . . 98
4.3 Reinforcement Learning for Polyhedra Analysis . . . . . . . . . . . . 100
4.4 Experimental Evaluation . . . . . . . . . . . . . . . . . . . . . . . . . . 103
4.5 Related Work . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 106
4.6 Discussion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 109

II Fast and Precise Neural Network Certification 111


5 deeppoly domain for certifying neural networks 113
5.1 Overview . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 117
5.1.1 Running example on a fully-connected network with ReLU
activation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 117
5.2 Background: Neural Networks and Adversarial Regions . . . . . . . 123
5.3 Abstract Domain and Transformers . . . . . . . . . . . . . . . . . . . 125
5.3.1 ReLU Abstract Transformer . . . . . . . . . . . . . . . . . . . . 125
5.3.2 Sigmoid and Tanh Abstract Transformers . . . . . . . . . . . . 126
5.3.3 Maxpool Abstract Transformer . . . . . . . . . . . . . . . . . . 126
5.3.4 Affine Abstract Transformer . . . . . . . . . . . . . . . . . . . . 127

5.3.5 Neural Network Robustness Analysis . . . . . . . . . . . . . . 127


5.3.6 Correctness of Abstract Transformers . . . . . . . . . . . . . . 128
5.3.7 Soundness under Floating-Point Arithmetic . . . . . . . . . . 133
5.4 Refinement of Analysis Results . . . . . . . . . . . . . . . . . . . . . . 135
5.5 Experimental Evaluation . . . . . . . . . . . . . . . . . . . . . . . . . . 137
5.5.1 Experimental setup . . . . . . . . . . . . . . . . . . . . . . . . . 138
5.5.2 L∞ -Norm Perturbation . . . . . . . . . . . . . . . . . . . . . . . 139
5.5.3 Rotation perturbation . . . . . . . . . . . . . . . . . . . . . . . 146
5.6 Discussion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 147

6 combining abstractions with solvers 149


6.1 Overview . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 152
6.2 Refinement with solvers . . . . . . . . . . . . . . . . . . . . . . . . . . 154
6.3 k-ReLU relaxation framework . . . . . . . . . . . . . . . . . . . . . . . 159
6.3.1 Best convex relaxation . . . . . . . . . . . . . . . . . . . . . . . 160
6.3.2 1-ReLU . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 160
6.3.3 k-ReLU relaxations . . . . . . . . . . . . . . . . . . . . . . . . . 161
6.4 Instantiating the k-ReLU framework . . . . . . . . . . . . . . . . . . . 162
6.4.1 Computing key parameters . . . . . . . . . . . . . . . . . . . . 163
6.4.2 Certification and refinement with k-ReLU framework . . . . . 164
6.5 Evaluation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 164
6.5.1 Complete certification . . . . . . . . . . . . . . . . . . . . . . . 166
6.5.2 Incomplete certification . . . . . . . . . . . . . . . . . . . . . . 167
6.6 Related work . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 168
6.6.1 Neural Network Certification . . . . . . . . . . . . . . . . . . . 168
6.6.2 Constructing adversarial examples . . . . . . . . . . . . . . . . 170
6.6.3 Adversarial training . . . . . . . . . . . . . . . . . . . . . . . . 170
6.7 Discussion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 171

7 conclusion and future work 173


7.1 Numerical Program Analysis . . . . . . . . . . . . . . . . . . . . . . 174
7.2 Neural Network Certification . . . . . . . . . . . . . . . . . . . . . . 175
7.3 Formal Reasoning about Cyber-Physical Systems . . . . . . . . . . . 176
LIST OF FIGURES

Figure 1.1 The high-level idea behind abstract interpretation. . . . . . . 2


Figure 1.2 Number of problem instances certified by different certifiers
at VNN-COMP’20. . . . . . . . . . . . . . . . . . . . . . . . . . 4
Figure 1.3 An example of online decomposition. . . . . . . . . . . . . . 8
Figure 1.4 Precision and cost of the Zone, Octagon and Polyhedra do-
main with and without ELINA. . . . . . . . . . . . . . . . . . 9
Figure 1.5 Three Polyhedra analysis traces; the left-most and middle
traces obtain a precise result (the polyhedron at the bottom),
however the analysis cost of the middle trace is lower. The
right-most trace obtains an imprecise result. . . . . . . . . . . 10
Figure 1.6 Neural network certification problem. . . . . . . . . . . . . . 12
Figure 1.7 Different dimensions of the neural network certification
problem. The text in green, blue, and black respectively rep-
resent cases included in this thesis, those that we consider
in our work but not covered in this thesis, those that are not
considered in our work. . . . . . . . . . . . . . . . . . . . . . 14
Figure 1.8 ERAN certification framework. . . . . . . . . . . . . . . . . . 16
Figure 2.1 Two representations of polyhedron defined over variables x1
and x2 . (a) Bounded polyhedron; (b) unbounded polyhedron. 24
Figure 2.2 Code with assertion for static analysis. . . . . . . . . . . . . . 25
Figure 2.3 Polyhedra domain analysis (first iteration) on the example
program on the left. The polyhedra are shown in constraint
representation. . . . . . . . . . . . . . . . . . . . . . . . . . . . 26
Figure 2.4 Two examples of P ⊔ Q with πP = πQ = {{x1 }, {x2 }}. (a) P1 ≠
Q1 , P2 ≠ Q2 ; (b) P1 = Q1 , P2 ≠ Q2 . . . . . . . . . . . . . . . . 33
Figure 2.5 Example of complexity reduction through decomposition for
Polyhedra analysis on an example program. . . . . . . . . . . 37
Figure 2.6 Precision loss for static partitioning. . . . . . . . . . . . . . . 49
Figure 2.7 The join transformer during the analysis of the
usb_core_main0 benchmark. The x-axis shows the join
number and the y-axis shows the number of variables in
N = ⋃A∈π\U A (the subset of variables affected by the join)
and in X. The first figure shows these values for all joins
whereas the second figure shows it for one of the expensive
regions of the analysis. . . . . . . . . . . . . . . . . . . . . . . 53
Figure 4.1 Policies for balancing precision and speed in static analysis. 89
Figure 4.2 Reinforcement learning for static analysis. . . . . . . . . . . . 90


Figure 4.3 Graph G for P(Xt ) in Example 4.2.1 . . . . . . . . . . . . . . . 97


Figure 5.1 Two different attacks applied to MNIST images. . . . . . . . 115
Figure 5.2 Example fully-connected neural network with ReLU activa-
tions. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 117
Figure 5.3 The neural network from Fig. 5.2 transformed for analysis
with the DeepPoly abstract domain. . . . . . . . . . . . . . . 118
Figure 5.4 Convex approximations for the ReLU function: (a) shows
the convex approximation [69] with the minimum area in the
input-output plane, (b) and (c) show the two convex approx-
imations used in DeepPoly. In the figure, λ = ui /(ui − li )
and µ = −li · ui /(ui − li ). . . . . . . . . . . . . . . . . . . . . . 120
Figure 5.5 Certified robustness and average runtime for L∞ -norm per-
turbations by DeepPoly against AI2 , Fast-Lin, and DeepZ on
the MNIST FFNNSmall. DeepZ and Fast-Lin are equivalent in
robustness. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 140
Figure 5.6 Certified robustness and average runtime for L∞ -norm per-
turbations by DeepPoly and DeepZ on the MNIST FFNNMed
and FFNNBig networks. . . . . . . . . . . . . . . . . . . . . . . . 141
Figure 5.7 Average percentage of ReLU inputs that can take both pos-
itive and negative values for DeepPoly and DeepZ on the
MNIST FFNNSmall and FFNNMed networks. . . . . . . . . . . . . 142
Figure 5.8 Certified robustness and average runtime for L∞ -norm
perturbations by DeepPoly and DeepZ on the MNIST
FFNNSigmoid and FFNNTanh networks. . . . . . . . . . . . . . . . 142
Figure 5.9 Certified robustness and average runtime for L∞ -norm per-
turbations by DeepPoly and DeepZ on the MNIST ConvSmall
networks. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 143
Figure 5.10 Certified robustness and average runtime for L∞ -norm per-
turbations by DeepPoly and DeepZ on the CIFAR10 fully-
connected networks. . . . . . . . . . . . . . . . . . . . . . . . . 144
Figure 5.11 Certified robustness and average runtime for L∞ -norm
perturbations by DeepPoly and DeepZ on the CIFAR10
ConvSmall networks. . . . . . . . . . . . . . . . . . . . . . . . . 145
Figure 5.12 Results for robustness against rotations with the MNIST
FFNNSmall network. Each row shows a different attempt to
prove that the given image of the digit 3 can be perturbed
within an L∞ ball of radius ε = 0.001 and rotated by an ar-
bitrary angle θ between −45 and 65 degrees without changing
its classification. For the last two attempts, we show 4 repre-
sentative combined regions (out of 220, one per batch). The
running time is split into two components: (i) the time used
for interval analysis on the rotation algorithm and (ii), the
time used to prove the neural network robust with all of the
computed bounding boxes using DeepPoly. . . . . . . . . . . 146
Figure 6.1 The input space for the ReLU assignments y1 := ReLU(x1 ),
y2 := ReLU(x2 ) is shown on the left in blue. Shapes of the
relaxations projected to 3D are shown on the right in red. . . 150
Figure 6.2 Certification of property x9 ≤ 2. Refining DeepPoly with 1-
ReLU fails to prove the property whereas 2-ReLU adds extra
constraints (in green) that help in verifying the property. . . 153
Figure 6.3 DeepPoly relaxations for xi :=ReLU(xj ) using the original
bounds lj , uj (in blue) and the refined bounds l′j , u′j (in green)
for xj . The refined relaxations have smaller area in the xi xj -
plane. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 157
Figure 6.4 Steps to instantiating the k-ReLU framework. . . . . . . . . . 163

LIST OF TABLES

Table 1.1 Polyhedra analysis of Linux device drivers with ELINA vs.
PPL. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3
Table 2.1 Asymptotic complexity of Polyhedra operators with differ-
ent representations. . . . . . . . . . . . . . . . . . . . . . . . . 28
Table 2.2 Asymptotic time complexity of Polyhedra operators with de-
composition. . . . . . . . . . . . . . . . . . . . . . . . . . . . . 36
Table 2.3 Speedup of Polyhedra domain analysis for ELINA over
NewPolka and PPL. . . . . . . . . . . . . . . . . . . . . . . . 51
Table 2.4 Partition statistics for Polyhedra analysis with ELINA. . . . 52
Table 3.1 Instantiation of constraints expressible in various numerical
domains. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 58
Table 3.2 Speedup for the Polyhedra analysis with our decomposition
vs. PPL and ELINA. . . . . . . . . . . . . . . . . . . . . . . . 80
Table 3.3 Partition statistics for the Polyhedra domain analysis. . . . 81
Table 3.4 Asymptotic time complexity of the Octagon transformers. . 82


Table 3.5 Speedup for the Octagon domain analysis with our decom-
position over the non-decomposed and the decomposed ver-
sions of ELINA. . . . . . . . . . . . . . . . . . . . . . . . . . . 82
Table 3.6 Partition statistics for the Octagon domain analysis. . . . . . 83
Table 3.7 Speedup for the Zone domain analysis with our decomposi-
tion over the non-decomposed implementation. . . . . . . . 84
Table 3.8 Partition statistics for the Zone domain analysis. . . . . . . . 84
Table 4.1 Mapping of RL concepts to Static analysis concepts. . . . . . 94
Table 4.2 Features for describing RL state s (m ∈ {1, 2}, 0 ≤ j ≤ 8, 0 ≤
h ≤ 3). . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 101
Table 4.3 Instantiation of Q-learning to Polyhedra domain analysis. . 103
Table 4.4 Timings (seconds) and precision of approximations (%) w.r.t.
ELINA. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 105
Table 5.1 Neural network architectures used in our experiments. . . . 139
Table 5.2 Certified robustness by DeepZ and DeepPoly on the large
convolutional networks trained with DiffAI. . . . . . . . . . . 143
Table 6.1 Volume of the output bounding box from kPoly on the
MNIST FFNNMed network. . . . . . . . . . . . . . . . . . . . . . 151
Table 6.2 Neural network architectures and parameters used in our
experiments. . . . . . . . . . . . . . . . . . . . . . . . . . . . . 165
Table 6.3 Number of certified adversarial regions and runtime of
kPoly vs. DeepPoly and RefineZono. . . . . . . . . . . . . . . 166
1
INTRODUCTION

Providing formal guarantees on the safety and reliability of modern computing
systems is of fundamental importance in today’s digital society. However, ensuring
this objective has turned out to be a very challenging research problem as modern
systems continue to grow in scale, complexity, and diversity, which dictates the
need for ever more efficient, advanced, and rigorous methods able to provide the
required guarantees. In this thesis, we address this challenge with novel automated
reasoning methods for two critical problem domains: programs and deep learning
models. Our methods come with clean theoretical guarantees and scale the precise
and fast analysis of realistic systems beyond the reach of existing methods, thereby
advancing the state-of-the-art in automated formal reasoning.
Our methodology is based on the theory of abstract interpretation [55], an el-
egant mathematical framework for statically overapproximating (potentially infi-
nite) concrete behaviors with a finite representation. Abstract interpretation has
been successfully applied in many domains including the safety and correctness
verification of critical avionics software [27], windows libraries [134], untrusted
Linux kernels [80], hybrid systems [97], distributed networks [23], embedded sys-
tems [111], and neural networks [78]. Fig. 1.1 depicts the high-level idea behind
abstract interpretation. Assume we want to compute a function f for a given set
of inputs φ and check if a safety property ψ holds, i.e., the set of outputs satisfies
f(φ) ⊆ ψ. In most cases, and for the problem instances considered in this work,
computing the entire set f(φ) exactly is impossible. Abstract interpretation instead
provides a framework to compute an overapproximation g(φ) ⊇ f(φ). g(φ) can
then be used as a proxy for proving the safety property on f(φ) since g(φ) ⊆ ψ
implies f(φ) ⊆ ψ.
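
To make this concrete, the short Python sketch below is purely illustrative: the
concrete function f, the interval abstraction used for g, and the property ψ are made
up for the example and are not part of the systems developed in this thesis. It
computes an interval overapproximation g(φ) and uses it as a proxy for checking
f(φ) ⊆ ψ.

    # Toy illustration: an interval as a computable overapproximation g(phi) of f(phi).
    # The names f, g, and psi mirror the notation above; everything else is made up.

    def f(x):
        # concrete function: f(x) = x*x - 2*x
        return x * x - 2 * x

    def g(phi):
        # abstract counterpart of f on an interval phi = (lo, hi); it evaluates
        # x*x and 2*x independently, so it is sound but may lose precision
        lo, hi = phi
        sq_lo = 0.0 if lo <= 0.0 <= hi else min(lo * lo, hi * hi)
        sq_hi = max(lo * lo, hi * hi)
        lin_lo, lin_hi = 2 * lo, 2 * hi
        return (sq_lo - lin_hi, sq_hi - lin_lo)   # interval subtraction

    def proves(phi, psi):
        # psi is an output interval; g(phi) inside psi implies f(phi) inside psi
        out_lo, out_hi = g(phi)
        return psi[0] <= out_lo and out_hi <= psi[1]

    print(proves((0.0, 1.0), (-3.0, 2.0)))   # True: property certified
    print(proves((0.0, 1.0), (-1.2, 0.2)))   # False: f(phi) = [-1, 0] satisfies psi,
                                             # but g(phi) = [-2, 1] is too imprecise

The second query already illustrates the precision/cost tradeoff discussed next: the
cheap interval abstraction can fail to prove a property that actually holds.
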
There is an inherent precision/cost tradeoff in computing g(φ). An imprecise
g(φ), as in Fig. 1.1 (a), is fast to compute, but may be insufficient for proving ψ. On
the other hand, computing a precise g(φ), as in Fig. 1.1 (b), can be too expensive
for large benchmarks. Thus the main challenge in the adoption of abstract interpre-
tation for analyzing real-world systems lies in designing methods for computing
g(φ) that are both fast and precise. Further, such methods should be as generic as
possible so that they can be applied to a variety of computing systems. This leads
us to the main research question that we address in this thesis:


Figure 1.1: The high-level idea behind abstract interpretation.


Are there generic methods for designing fast and precise automated analyzers for mod-
ern systems?

main contributions Our core contributions are at the intersection of pro-
gramming language and machine learning research. We target numerical functions
f and properties ψ with abstract interpretation in two critical and challenging do-
mains: programs and deep learning models. Numerical abstract interpretation is an
essential component of modern program analyzers [27, 80, 134, 156, 200, 201, 204,
209]. It is crucial, for example, for proving in programs the absence of critical bugs
such as buffer overflow, division by zero, or non-termination. Our contributions
are divided into two lines of work:

1. Fast and precise numerical program analysis: We develop general methods to
make numerical program analysis significantly faster, often by orders of mag-
nitude, without losing precision. This enables precise analysis of real-world
programs beyond the reach of prior work.

2. Fast and precise neural network certification: We designed new automated meth-
ods for certifying large real-world neural networks. Our methods enable state-
of-the-art certification beyond the reach of other existing methods.

We briefly expand on these contributions next.

fast and precise numerical program analysis We developed a theo-
retical framework for speeding up existing numerical program analyzers without
any loss of precision. Our key idea is to dynamically decompose the computation
of g(φ) to reduce the asymptotic cost and enable orders of magnitude speedup
in many practical cases. We implemented our theory in the form of a library for
numerical program analysis, called ELINA (ETH Library for Numerical Analysis).
ELINA was developed from scratch, has ≈ 70K lines of code (LOC) in C, and is
currently the state-of-the-art for numerical analysis.
Table 1.1 compares the runtime and memory consumption for the Polyhedra
analysis of large real-world Linux device drivers with ELINA and state-of-the-art
Parma Polyhedra Library (PPL) [12]. The second column shows the maximum

number of variables occurring in a polyhedron during the analysis. In the table, the
entry TO means that the analysis timed out after 4 hours while entry MO represents
memory consumption > 16 GB. The last column of Table 1.1 shows the speedup
of ELINA over PPL. We provide a lower bound on the speedup in the case of a
timeout with PPL. The speedup when the Polyhedra analysis with PPL runs out
of memory is ∞ as it can never finish on the given machine. It can be seen that
ELINA is significantly more memory and time-efficient than PPL. It finishes the
analysis of 12 out of 13 benchmarks in a few seconds and never consumes more
than 1 GB of memory. In contrast, PPL either times out or runs out of memory
on 8 benchmarks. Currently, ELINA is used in several research projects in both
academia and industry. Its Github repository at the time of this writing has 77
stars and 31 forks.

Table 1.1: Polyhedra analysis of Linux device drivers with ELINA vs. PPL.

Benchmark           n      PPL time(s)  PPL memory(GB)  ELINA time(s)  ELINA memory(GB)  Speedup
firewire_firedtv    159    331          0.9             0.2            0.2               1527
net_fddi_skfp       589    6142         7.2             4.4            0.3               1386
mtd_ubi             528    MO           MO              1.9            0.3               ∞
usb_core_main0      365    4003         1.4             29             0.7               136
tty_synclinkmp      332    MO           MO              2.5            0.1               ∞
scsi_advansys       282    TO           TO              3.4            0.2               >4183
staging_vt6656      675    TO           TO              0.5            0.1               >28800
net_ppp             218    10530        0.1             891            0.1               11.8
p10_l00             303    121          0.9             5.4            0.2               22.4
p16_l40             874    MO           MO              2.9            0.4               ∞
p12_l57             921    MO           MO              6.5            0.3               ∞
p13_l53             1631   MO           MO              25             0.9               ∞
p19_l59             1272   MO           MO              12             0.6               ∞

In a second step, we leveraged machine learning for further speeding up nu-
merical reasoning in ELINA. Our key insight here is that there are redundant
computations during the intermediate steps for computing g(φ). To remove this
redundancy, we established a new connection between reinforcement learning and
program analysis by instantiating the concepts of reinforcement learning for nu-
merical program analysis and learned adaptive heuristics for further speeding up
ELINA. Our results show further speedups over ELINA of up to 1 or 2 orders of
magnitude with little loss in precision.

Figure 1.2: Number of problem instances certified by different certifiers at VNN-COMP’20.
Each panel plots certification time (sec) against the number of instances certified:
(a) ReLU-FCN, (b) COLT-CNN, (c) OVAL-CNN, and (d) Sigmoid-Tanh tool comparisons.
The competing tools shown include NNV, Venus, PeregriNN, MIPVerify, Oval, VeriNet,
nnenum, and ERAN.

fast and precise neural network certification In a second, indepen-
dent line of work, we applied numerical abstract interpretation to certify the safety
and robustness of deep neural networks. Our key idea is to design custom methods
for precisely approximating the non-linearities such as ReLU and sigmoid used in
modern neural networks. We designed numerical reasoning methods for the pre-
cise and scalable certification of large neural networks containing tens of thousands
of neurons within a few minutes. Our methods are implemented in the form of a
certifier called ERAN (ETH Robustness Analyzer for Neural Networks). ERAN is
currently the state-of-the-art for neural network certification and can certify proper-
ties beyond the reach of existing certifiers. It recently verified the highest number of
benchmarks in the neural network verification competition (VNN-COMP’20) held
as part of CAV’20.
Fig. 1.2 shows the number of robustness-based specifications certified by
the different competing state-of-the-art certifiers in the 4 (out of 5) most chal-
lenging categories of the VNN-COMP’20 competition within a specified time
limit. The plots are taken from the competition report publicly available at
https://sites.google.com/view/vnn20/vnncomp. Fig. 1.2 (a) compares the certi-
fication results on 3 ReLU-based MNIST fully-connected networks with 7 different
certifiers participating in this category. The number of problem instances in this
category was 150; the time limit was set to 15 minutes per instance. The largest
network had 6 layers containing 256 neurons each. It can be seen that ERAN (in
green) certified the most instances, 112/150.
Fig. 1.2 (b) compares the results on 4 MNIST and CIFAR10 ReLU-based convolu-
tional networks with 6 competing certifiers. These networks were trained by COLT

[148] and currently achieve state-of-the-art accuracy and provable robustness for
their respective datasets. The number of problem instances in this category was
337; the time limit was set to 5 minutes per instance. The largest network in this
category had ≈ 50K neurons. Out of the 6, ERAN (in purple) certified the highest
number of instances, 266/337.
Fig. 1.2 (c) compares the results on 3 CIFAR10 ReLU-based convolutional net-
works trained by [135]. The largest network had > 6K neurons. The specifications
for these networks are harder; therefore, the time limit was increased to 1 hour per
instance. Only 2 certifiers competed in this category, with ERAN (in red) certifying
286/300 instances within the time limit, 9 more than the competing certifier.
Fig. 1.2 (d) compares the number of instances certified on the MNIST Sigmoid
and Tanh based fully-connected networks. The number of instances in this category
was 128; the time limit was set to 15 minutes per instance. The largest network had
3 layers and 350 neurons. Only 3 competing certifiers supported networks with
Sigmoid and Tanh activations. It can be seen that ERAN (in sky blue) certified the
highest number of problem instances.
In the remaining category, ERAN verified the second-highest number of bench-
marks. We note that ERAN is the only verifier that competed in all the categories,
demonstrating its ability to handle diverse networks and specifications in a precise
and scalable manner. Based on its demonstrated performance and flexibility, ERAN
is already widely used in several research projects in both academia and industry.
Its repository on Github at the time of this writing has 145 stars and 49 forks.

main contributions: impact. In summary, this thesis provides state-of-the-
art methods for numerical program analysis and neural network certification. Our
methods scale to problem sizes not possible with prior work and are also precise.
We note that our approach for numerical program analysis differs significantly
from that for neural network certification. This is because the occurring functions
f are different in these domains. In programs, f is unbounded but often decom-
posable while f for neural networks is bounded but non-decomposable and highly
non-linear. Further, programs typically have loops, and thus their analysis may iter-
ate many times before reaching a fixpoint. In contrast, neural networks do not have
loops but contain orders of magnitude more variables than programs. As a result,
our novel and generic methods for program analysis are not suited for neural net-
works and vice-versa. Both ELINA and ERAN have a significant number of users
and thus our work already has real-world impact. Thus, in summary, we believe
that this thesis expands the limits of formal methods for ensuring the safety and
reliability of real-world systems.
Our work opens up a number of promising research directions such as applying
decomposition beyond numerical reasoning for programs (e.g., heaps), using ma-
chine learning techniques for improving the speed of formal methods in other do-
mains (e.g., operating systems), applying automated formal reasoning of machine
learning models to more general specifications (e.g., probabilistic) and models (e.g.,

generative), and formal reasoning of cyber-physical systems with machine learning


components (e.g., self-driving cars, robots). We provide more details in Chapter 7.
Next, we provide a brief informal overview of the abstract interpretation frame-
work to discuss our contributions in greater detail.

1.1 abstract interpretation

Abstract interpretation is a theory for computing overapproximations (g(φ) in
Fig. 1.1) of (potentially infinite) concrete sets (f(φ) in Fig. 1.1) through static anal-
ysis without actually running the system. The key concept in abstract interpreta-
tion is that of an abstract domain. An abstract domain has two main components:
abstract elements and abstract transformers. Abstract elements represent an over-
approximation of the potentially infinite and uncomputable concrete sets. The ab-
stract transformers operating over abstract elements overapproximate the effect of
applying transformations in the computing system on the corresponding concrete
sets.

Example 1.1.1. Consider a concrete set C containing only the integers that are both
positive and even. An abstraction of C in the Sign domain will represent it finitely
using its sign, i.e., +. A transformation that multiplies C by a negative integer
can be overapproximated by an abstract transformer that operates on + and
returns −, approximating the concrete output containing even and negative
integers.
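
A minimal sketch of Example 1.1.1 in Python (illustrative only; the Sign domain
here is simplified and is not part of ELINA or ERAN):

    # Sign domain sketch: abstract elements are '+', '-', '0', and 'T' (unknown sign).

    def alpha(concrete):
        """Abstraction: map a (finite) concrete set of integers to its sign."""
        if all(x > 0 for x in concrete):
            return '+'
        if all(x < 0 for x in concrete):
            return '-'
        if all(x == 0 for x in concrete):
            return '0'
        return 'T'

    def abstract_mul(a, b):
        """Abstract transformer overapproximating multiplication of concrete sets."""
        if a == '0' or b == '0':
            return '0'
        if a == 'T' or b == 'T':
            return 'T'
        return '+' if a == b else '-'

    C = {2, 4, 6}                                 # a sample of the positive, even integers
    print(alpha(C))                               # '+'
    print(abstract_mul(alpha({-3}), alpha(C)))    # '-', as in Example 1.1.1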

In this thesis, we focus on numerical abstract interpretation, i.e., both f(φ) and
its approximations g(φ) in Fig. 1.1 computed via an abstract domain are numerical
functions. There is a tradeoff between the expressivity of a numerical domain and
its cost. The Polyhedra domain [57] is the most expressive linear relational domain
as it can capture all linear constraints between variables, i.e., constraints of the form
Σi ai · xi ≤ b where ai , b ∈ Z and xi is a variable. In an ideal setting, one would
simply use the Polyhedra domain for numerical analysis. However, it has a worst-
case exponential cost in time and space. Thus, an analyzer using Polyhedra can
easily fail to analyze large programs by running out of memory or by timing out.
Because of this, the Polyhedra domain has been often thought to be impractical
[51, 122, 176]. On the other hand, the Interval domain [54] has a linear cost but is
very imprecise as it captures only constraints of the form li ≤ xi ≤ ui where li , ui ∈
Z, thus ignoring relationships between variables. To balance the expressivity-cost
tradeoff, researchers have designed domains that limit a domain’s expressivity in
exchange for better asymptotic complexity. Examples include Octagon [142], Zone
[140], Pentagon [134], SubPolyhedra [122] and Gauge [205].
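
For reference, these domains restrict the admissible constraints to the following
templates, where xi , xj are program variables and the constants range over the
respective domains (the Interval and Polyhedra forms are stated above; the Zone and
Octagon forms follow the standard definitions in [140, 142]):

    Interval:    li ≤ xi ≤ ui
    Zone:        xi − xj ≤ c   (plus bounds ±xi ≤ c)
    Octagon:     ±xi ± xj ≤ c
    Polyhedra:   Σi ai · xi ≤ b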

1.2 fast and precise numerical program analysis

The goal of numerical program analysis is computing the set f(φ) of numerical
values that the program variables can take during all program executions starting
from an initial program state φ. Here, φ encodes the set of initial variable values.
The problem of exactly computing f(φ) is in general undecidable for programs
due to Rice’s Theorem [168]. Numerical domains allow obtaining a numerical over-
approximation g(φ) of f(φ). The design of these domains for program analysis
remains an art requiring considerable and rare expertise. The various available
domains tend to work well for the particular applications for which they were de-
signed but may not be as effective on others. For example, the Octagon domain
with variable packing is fast and precise for analyzing avionics software [27], but it
is not as effective for windows libraries. Similarly, the Pentagon domain [134] used
for effectively analyzing Windows libraries is imprecise for analyzing avionics soft-
ware. The question then is whether there are general methods for improving the speed of
existing domains without sacrificing precision.
The starting point of our work is in identifying the considerable redundancy in
numerical program analysis: unnecessary computations that are performed with-
out affecting the final results. Removing this redundancy reduces the cost of anal-
ysis and improves speed without sacrificing precision. Numerical analysis has two
types of redundancies:
• Single-step redundancy: At each step of the analysis, redundant computa-
tions are performed which do not affect the output of that step.

• Redundancy across analysis steps: The results of expensive computations are


sometimes not needed to compute the final results as they are discarded later
in the analysis.
To remove the single-step redundancy, we next present our theoretical frame-
work of online decomposition.

key idea: online decomposition. Online decomposition is based on the
observation that the set of program variables can be partitioned into subsets with
respect to the abstract elements computed during the analysis such that linear con-
straints exist only between the variables in the same subset. This partitioning thus
decomposes the corresponding abstract elements into smaller elements. The trans-
formers then operate over these smaller elements, which reduces their asymptotic
complexity without losing precision.
Example 1.2.1. Fig. 1.3 shows an example of online decomposition where a poly-
hedron P is defined over 6 variables x1 − x6 . The variable set can be decomposed
into a partition πP with three subsets {{x1 , x2 , x3 }, {x4 , x5 }, {x6 }} . Using πP , P can be
decomposed into three pieces without losing any precision. The assignment trans-
former for x2 := 2x4 then only needs to be applied on the parts that it affects, in this
case, the first two, leaving the third part unchanged. This reduces complexity and
improves performance.

Figure 1.3: An example of online decomposition.
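
The partition in Example 1.2.1 can be recovered from the constraints themselves: two
variables belong to the same block exactly when they are transitively connected by
constraints. The Python sketch below illustrates this with a union-find structure; it
is a simplified stand-in and not ELINA's actual algorithm or data structures.

    # Computing the partition induced by an abstract element's constraints
    # (simplified sketch; ELINA's real algorithms and data structures differ).

    def find(parent, x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]     # path compression
            x = parent[x]
        return x

    def union(parent, x, y):
        parent[find(parent, x)] = find(parent, y)

    def partition(variables, constraints):
        """Each constraint is given as the set of variables it mentions;
        variables connected by constraints end up in the same block."""
        parent = {v: v for v in variables}
        for vs in map(list, constraints):
            for v in vs[1:]:
                union(parent, vs[0], v)
        blocks = {}
        for v in variables:
            blocks.setdefault(find(parent, v), set()).add(v)
        return list(blocks.values())

    variables = ['x1', 'x2', 'x3', 'x4', 'x5', 'x6']
    constraints = [{'x1', 'x2'}, {'x2', 'x3'}, {'x4', 'x5'}]     # as in Example 1.2.1
    print(partition(variables, constraints))
    # three blocks: {x1, x2, x3}, {x4, x5}, {x6}  (printed order may vary)

    # The assignment x2 := 2*x4 relates x2 and x4, so only their two blocks are fused;
    # the block {x6} is untouched and need not be recomputed.
    constraints.append({'x2', 'x4'})
    print(partition(variables, constraints))
    # two blocks: {x1, x2, x3, x4, x5} and {x6}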

The key challenge is in maintaining the partition dynamically, as different trans-
formers modify the partitioning in non-trivial ways during the analysis. Statically
fixing the partition ahead of the analysis instead loses significant precision, as the
modifications are based on the computed abstract elements. For example, the
partition πO of the output O in Fig. 1.3 depends on πP which is dynamically com-
puted from P. Thus, the main challenge is in maintaining the partitions efficiently
and dynamically during analysis without losing precision. Prior work from [93, 94]
applied online decomposition for speeding up the Polyhedra domain. However,
the computed partitions were too coarse, limiting the resulting speedup. Moreover,
their approach is not general and does not work for decomposing other existing
domains.
Our main contribution here is the design of mathematically clean, computation-
ally efficient, and general algorithms for computing partitions of the output for
all transformers (usually 40 or more) without precision loss. Our algorithms can
efficiently decompose all existing subpolyhedra numerical domains thus enabling
fast and precise analysis of large real-world programs beyond the reach of existing
approaches.

online decomposition for subpolyhedra domains. Our work [190] is
the first to show the applicability of the exponentially expensive Polyhedra abstrac-
tion for analyzing large real-world benchmarks. Prior work on applying online
decomposition for improving Polyhedra performance [93, 94] was based on syntac-
tic information from the input constraints and produced coarse partitions limiting
its effectiveness for analyzing large programs. We designed algorithms for produc-
ing finer partitions based on semantic information which drastically reduces the
observed complexity thus often obtaining two, three, or more orders of magnitude
speedup over existing state-of-the-art approaches. We describe this work in detail
in Chapter 3.

Figure 1.4: Precision and cost of the Zone, Octagon, and Polyhedra domains with and
without ELINA.

In our following work [191], we generalized the applicability of online decom-
position from the Polyhedra to all subpolyhedra domains. We provided theory to
refactor existing implementations of these domains to support decomposition. Our
theory gives a general construction for obtaining decomposed transformers from
original non-decomposed transformers for any subpolyhedra abstraction. We iden-
tified theoretical conditions where the decomposed analysis has the same precision
as the original non-decomposed analysis. We then showed that these conditions are
met by existing abstract domains. This allows for improving the performance of all
existing domains without sacrificing any precision. Our construction is described
in detail in Chapter 3. Follow-up work [56] from Cousot et al. showed that online
decomposition can be generalized for decomposing any relational abstraction, not
only numerical.

elina library. We have implemented all of the methods in the publicly avail-
able library ELINA available at https://github.com/eth-sri/ELINA. ELINA is the
current state-of-the-art for numerical domains containing complete end-to-end im-
plementations of the popular and expensive Zone, Octagon, and Polyhedra do-
mains. Besides online decomposition, ELINA also contains domain-specific algo-
rithmic improvements and performance optimizations including for cache locality,
vectorization, and more. Fig. 1.4 shows the cost and precision of the Zone, Octagon,
and Polyhedra domain analysis with and without ELINA. Both Zone and Octagon
domains have asymptotic worst-case cubic cost, while that of Polyhedra is exponential.

Figure 1.5: Three Polyhedra analysis traces; the left-most and middle traces obtain a precise
result (the polyhedron at the bottom), however the analysis cost of the middle
trace is lower. The right-most trace obtains an imprecise result.

It can be seen in Fig. 1.4 that ELINA reduces the cost of domain analysis
significantly without affecting precision.
ELINA enables the analysis of large real-world software, for example, Linux
device drivers, containing thousands of lines of code and >500 variables with the
exponentially expensive Polyhedra domain in a few seconds. Before our work, Polyhedra
analysis was considered practically infeasible for such benchmarks, as the resulting
analysis did not finish even after several hours. For the Octagon analysis, ELINA
gave typical speedups of up to 32x over prior state-of-the-art libraries. ELINA
is currently used for numerical analysis in several projects, including the popular
Seahorn verification framework [91].
Online decomposition is effective at removing redundant computations at each
analysis step without losing precision. Next we discuss our approach for removing
redundancies across different analysis steps.

machine learning for program analysis. Our next key observation for
speeding up numerical program analysis is that the results of all abstract trans-
formers applied for obtaining the final result in a sequence need not be precise,
i.e., it is possible to apply imprecise intermediate transformers and still obtain a
precise result. This points to redundancy in the abstract sequences. This redun-
dancy occurs because some of the precise intermediate results are discarded later
in the analysis. For example, an assignment transformer may remove all previous
constraints involving the assigned variable.

Example 1.2.2. Fig. 1.5 shows three Polyhedra analysis traces for an overview of
our approach. The nodes in the traces are the Polyhedra, and the edges represent
Polyhedra transformers. The green and orange nodes respectively denote precise
and imprecise Polyhedra. Similarly, the green and orange edges respectively denote
precise but expensive and imprecise but fast transformers. In the left-most trace,
the precise transformer is applied at each step to obtain a precise final result. In
the middle trace, an approximate transformer is applied at the first node; however,
this does not affect the final result, which is now computed faster. The choice of the node for the
approximate transformer is crucial, as the final result in the right-most trace after
applying the approximate transformer at the second node is imprecise.

Our key idea for removing redundancy across sequences is learning policies via
reinforcement learning for selectively losing precision at different analysis steps
such that the performance improves while the precision loss is as little as pos-
sible. We take this approach because, in practice, hand-crafted or fixed policies often
yield suboptimal results: the resulting analysis is either too slow or too imprecise.
This is because policies maximizing precision and minimizing cost need to
make adaptive decisions based on high-dimensional abstract elements computed
dynamically during the analysis. Further, the sequence of transformers is usually
quite long in practice. Using our approach, we showed for the first time that re-
inforcement learning can benefit static analysis in [192]. We created approximate
transformers for the Polyhedra domain that enforce different degrees of finer parti-
tions by explicitly removing constraints yielding several approximate transformers
with different speeds and precision. Reinforcement learning then obtains a pol-
icy that selects among different transformers based on the abstract elements. Our
overall approach is presented in detail in Chapter 4.
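
The Python sketch below conveys the shape of this instantiation; it is a deliberately
simplified stand-in, and the features, actions, and reward are placeholders rather
than those used in Chapter 4.

    import random

    # Simplified sketch of choosing between a precise and an approximate transformer
    # with a tabular Q-function; the features, actions, and reward are placeholders.

    ACTIONS = ['precise', 'approximate']

    def features(abstract_element):
        # e.g., bucketed number of constraints and number of blocks (illustrative only)
        return (len(abstract_element['constraints']) // 10,
                len(abstract_element['blocks']))

    def choose_action(Q, state, epsilon=0.1):
        if random.random() < epsilon:                           # exploration
            return random.choice(ACTIONS)
        return max(ACTIONS, key=lambda a: Q.get((state, a), 0.0))

    def q_update(Q, state, action, reward, next_state, alpha=0.5, gamma=0.9):
        best_next = max(Q.get((next_state, a), 0.0) for a in ACTIONS)
        old = Q.get((state, action), 0.0)
        Q[(state, action)] = old + alpha * (reward + gamma * best_next - old)

    # At each analysis step, the chosen transformer would be applied and rewarded by
    # a combination of the precision of its result and the time it took, e.g.
    # reward = precision_score - runtime_penalty.
    Q = {}
    state = features({'constraints': list(range(25)), 'blocks': [0, 1, 2]})
    action = choose_action(Q, state)
    q_update(Q, state, action, reward=1.0, next_state=state)
    print(action, Q)
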
This approach is helpful when the analysis is inherently non-decomposable or
the partitions with online decomposition become too coarse, causing slowdowns.
Our results show that analysis performance improves significantly, with up to 550x
speedup and without significant precision loss, enabling precise Polyhedra analysis of
large programs not possible otherwise. In our follow-up work [96] (not covered in
this thesis), we improve upon this approach by leveraging structured prediction
and also show that our concept of using machine learning for speeding up static
analysis is more general by applying it also to speed up the Octagon domain by
up to 28x over ELINA without losing significant precision.

1.3 fast and precise neural network certification

Neural networks are increasingly deployed for decision making in many real-world
applications such as self-driving cars [30], medical diagnosis [6], and finance [74].
However, recent research [86] has shown that neural networks are susceptible to
undesired behavior in many real-world scenarios, posing a threat to their reliability.
Thus there is a growing interest in ensuring that they behave in a provably reliable
manner. To address this challenge, we designed scalable and precise methods based on
numerical abstract interpretation for certifying the safety of deep neural networks.

Figure 1.6: Neural network certification problem. (a) Verified; (b) counter example.

problem statement. Fig. 1.6 shows the problem setting for neural network
certification. We are given a set φ of inputs to an already trained neural network f
and a property ψ over the network outputs. Our goal is to prove whether f(φ) ⊆
ψ holds, as in Fig. 1.6 (a) or produce an input i ∈ φ for which the property is
violated, as in Fig. 1.6 (b). φ and ψ are usually chosen by a domain expert in such
a way that a counter-example represents an undesired behavior. An example of
φ is the popular L∞ -norm based region [40] for images. φ here is constructed by
considering all images that can be obtained by perturbing the intensity of each
pixel in a correctly classified image by an amount of ε ∈ R independently. ψ, in
this case, is usually classification robustness: all images in φ should be classified
correctly.
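
For illustration, the following small Python sketch (not part of our certification tooling) phrases the L∞-norm region φ and an empirical check of classification robustness; the classifier f, the image x, and the perturbation bound eps are hypothetical placeholders, and random sampling can only falsify ψ, never prove it for the whole region.

import numpy as np

def linf_region(x, eps):
    # phi: all images whose pixel intensities deviate from x by at most eps,
    # clipped to the valid intensity range [0, 1].
    return np.clip(x - eps, 0.0, 1.0), np.clip(x + eps, 0.0, 1.0)

def is_robust_empirically(f, x, eps, n_samples=1000, rng=np.random.default_rng(0)):
    # psi: every image in phi gets the same label as x. Sampling can only find a
    # counterexample (Fig. 1.6 (b)); proving f(phi) is a subset of psi requires
    # certification as in Fig. 1.6 (a).
    lo, hi = linf_region(x, eps)
    target = f(x)
    for _ in range(n_samples):
        sample = rng.uniform(lo, hi)
        if f(sample) != target:
            return False, sample   # counterexample found
    return True, None              # inconclusive: no counterexample found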

dimensions of the problem. Neural network certification is an emerging
inter-disciplinary research direction that has been growing in the last three years.
The problem space is rich and has many dimensions as shown in Fig. 1.7. We note
that the dimensions in the figure are not exhaustive and represent only a subset.

For each dimension, the text in green represents cases that we handle in this thesis,
while the text in blue represents cases handled by our work but not covered in this thesis.
We do not handle the remaining cases currently.
The first dimension that we consider is that of the application domain. The com-
puter vision domain is currently the most popular for neural network certification,
but there is also growing interest in certifying models for natural language process-
ing (NLP), speech, and aviation. To the best of our knowledge, there is no existing
work on certifying neural networks for the remaining domains in Fig. 1.7.
The second dimension is the specification being certified, which has two com-
ponents: the set of inputs φ and the property over network outputs ψ. Our work
focuses on φ defined by changes to pixel or sound intensity, geometric transformations
on images including rotation, translation, and scaling, and changes to the sensor
values. For a given φ, the property ψ that we consider can be classification ro-
bustness, safety, or stability. We defined robustness above. Safety means that the
network outputs do not satisfy a given error condition while stability signifies that
the outputs are bounded within a given threshold. Both the error condition and
the threshold are provided by a domain expert.
The particular architecture of the considered neural network is the third dimen-
sion. Our work considers fully-connected (FCN), convolutional (CNN), residual
and recurrent neural networks (RNN) architectures. For such architectures, the
non-linear functions in the hidden layers that we handle are ReLU, Sigmoid, Tanh,
and Maxpool.
There are a variety of methods used in the literature for neural network certifi-
cation such as SMT solvers [37, 69, 113, 114], mixed-integer linear programming
[8, 32, 36, 49, 66, 135, 197], Lipschitz optimization [170], duality [67, 212], convex
relaxations [7, 31, 68, 78, 127, 163, 175, 186, 188, 199, 211, 221], and combination
of relaxations with solvers [187, 206]. Our work is based on custom convex relax-
ations for neural networks defined under the framework of abstract interpretation
and the combination of such relaxations with MILP solvers.
The final dimension is the type of formal guarantee. We provide deterministic
guarantees meaning that we prove whether the property ψ holds for the entire set
φ or not. We note that there are models such as those in probabilistic forecasting
[62] for which the outputs are probability distributions and probabilistic guarantees
are a natural fit for these.

abstract interpretation for neural network certification. Unfortunately,
complete certification as shown in Fig. 1.6 is infeasible for larger networks
because of the large number of neurons and non-linear operations. For example,
both the SMT and MILP solvers need to consider two branches per ReLU, which
creates an exponential blowup for hundreds or thousands of ReLUs as is common.
Thus, we focus on incomplete certification. As described later, we also combine
both incomplete and complete certification for achieving state-of-the-art complete
certification.

Figure 1.7: Different dimensions of the neural network certification problem. The text in
green, blue, and black respectively represents cases included in this thesis, cases
that we consider in our work but not covered in this thesis, and cases that are
not considered in our work.

As in Fig. 1.1, we overapproximate f(φ) with g(φ) using abstract interpretation.
Because of the overapproximation, we cannot provide a counterexample as
in Fig. 1.6 (b), i.e., if we fail to prove that f(φ) ⊆ ψ, the status of the problem is
unknown. It can be that either the property does not hold or g(φ) is too impre-
cise. The first work to use abstract interpretation for neural network certification
was that of Gehr et al. [78]. However, they used standard numerical domains used
in program analysis, which are not well suited for neural networks. For example,
even though the Polyhedra domain becomes practically feasible for program analy-
sis via our work on online decomposition, it remains infeasible for neural network
certification as the transformations in the networks create constraints between all
neurons. Consequently, the resulting instantiation of abstract interpretation for neural
network certification is either too imprecise or does not scale to larger networks.

key idea: specialized domains for neural networks. The main chal-
lenge in precisely certifying large neural networks is the fast and precise handling
of the non-linearities employed in the networks. The approximations of these non-
linearities in the implementations of the numerical domains commonly used in
program analysis are either too imprecise or too expensive. For example, in the
setting considered in this work, the input to a ReLU is always bounded while the
ReLU approximations employed in program analysis assume unbounded input.
Therefore to enable fast and precise certification of large neural networks, we de-
signed relaxations tailored for exploiting the setting of neural network certification.

custom zonotope relaxations. In our first work in this direction [186], not
included in this thesis, we designed new parallelizable Zonotope [81] relaxations
tailored for handling the commonly used ReLU, Sigmoid, and Tanh non-linearities
in neural networks and provided theoretical guarantees on their optimality. Importantly,
these approximations were sound with respect to floating-point arithmetic.
This means that they always contain all results possible under different rounding
modes and different orders of computations of floating-point operations. The re-
sulting CPU-based analysis verified the robustness of a large image classification
network with > 88K neurons against challenging L∞ -norm based intensity pertur-
bations in about 2 minutes.

deeppoly numerical domain. Our next work [188] is a main contribution of
this thesis. We designed a specialized numerical domain called DeepPoly with cus-
tom parallelizable transformers for handling the ReLU, Sigmoid, Tanh, and Max-
pool non-linearities. We also provided theoretical guarantees on their optimality
and soundness with respect to floating-point arithmetic. Further, we presented the
first method for certifying the robustness of neural networks against geometric per-
turbations such as rotations. The resulting analysis yields more precise and scalable
results than prior work. The DeepPoly domain is covered in detail in Chapter 5. In
ongoing work with a master student, GPUPoly, a custom implementation of DeepPoly
with specialized GPU algorithms, has also been developed; it can
precisely certify a residual network with 34 layers and about 1M neurons against
challenging L∞ -norm based intensity perturbations in approximately 80 seconds.

combining relaxations with solvers. Building on DeepPoly, we designed
a new approach in [187] combining the strengths of both approximation-
based methods and precise solvers. Our key observation here is that as the analy-
sis moves deeper into the network, the approximation error accumulates with each
layer causing too much precision loss. We recover precision by calling the solver to
obtain precise results and use those for refining our approximate analysis [186, 188].
To improve the scalability of the solver, we provide it with the bounds computed
by our relaxations. Overall, this improves the precision of incomplete verification
while maintaining sufficient scalability.
We also supply our approximations for refining the problem encoding of the
solver for complete certification. Our approximations reduce the search area for
the solver, e.g., by determining that certain branches are infeasible, thus improv-
ing its speed. For example, the complete certification of an ACAS Xu benchmark
[110] finished in 10 seconds with our approach, which is about 8x faster than the
previous best.
The refinement method in [187] relies on the MILP-based exact encoding of the
ReLU and does not scale for refining deeper layers in the networks. Refinement
with the existing best convex relaxation of ReLU [175] scales but does not improve
precision. In our most recent work [185], we designed a generic k-ReLU framework
for obtaining more precise convex relaxations of the ReLU than possible with prior
work. The generated relaxations are more scalable than the MILP-based encoding
of ReLU, enabling more effective refinement.

Figure 1.8: ERAN certification framework.

k-ReLU generates relaxations by considering multiple ReLUs jointly, a technique
overlooked by all prior works, thus resulting in relaxations that are more precise
than the prior single neuron based relaxations [175]. We also provide theoretical
guarantees for the optimality of the k-ReLU relaxations. The cost and precision of
the k-ReLU framework are tunable and the technique can be combined for improv-
ing the precision of all existing methods. Refining the results of DeepPoly with
MILP and k-ReLU relaxation yields the most precise and scalable incomplete cer-
tification results reported to date. We describe this approach in greater detail in
Chapter 6.

eran certification framework. All of the above-mentioned methods for neural
network certification are implemented in the ERAN certification framework,
publicly available at https://github.com/eth-sri/eran. Fig. 1.8 depicts its architec-
ture. ERAN is capable of analyzing a wide range of neural network architectures
(fully-connected, convolutional, residual, RNN), non-linearities (ReLU, Sigmoid,
Tanh, Maxpool), datasets (image classification, audio classification, flight sensor
data, drone speed), and φ, ψ specified as polyhedral constraints over inputs and
outputs respectively. We note that the input regions φ capturing geometric transformations
on images, or those arising after the transformations in the audio pre-processing pipeline,
are not polyhedral. A pre-analysis [15, 172, 188] is employed for computing a convex relax-
ation of such non-polyhedral input sets.
ERAN supports both complete and incomplete verification. ERAN provides a
number of certification methods with varying degrees of precision and perfor-
mance for incomplete certification. DeepPoly and DeepZ provide the most scalable
and precise certification on a CPU. Certification with GPUPoly, the GPU-based
extension of DeepPoly, yields the most scalable and precise results overall.
DeepPoly, DeepZ, and GPUPoly are sound with respect to floating-point
arithmetic which is essential since otherwise the certification results may be wrong
[107]. Both RefineZono and kPoly can be used to refine the results of DeepPoly,
DeepZ, and GPUPoly by incurring an extra cost.
For each certification instance, analysis with ERAN yields one of three outcomes:
(a) proves that the specification holds, (b) returns a counterexample (cex) when it
can determine that a point in the precondition φ violates the postcondition ψ, (c)
otherwise the certification status is unknown which can happen when running
incomplete certification or running complete certification with a time limit.
ERAN can be easily extended for other certification tasks and is currently the
state-of-the-art tool for both complete and incomplete certification of neural net-
works. As a result, ERAN is widely used in many research projects [90, 133, 180].
Beyond certification, our methods have also been used to train state-of-the-art ro-
bust neural networks [148].
PART I
FAST AND PRECISE NUMERICAL
PROGRAM ANALYSIS

2
FA S T P O LY H E D R A A N A LY S I S V I A O N L I N E D E C O M P O S I T I O N

We start the thesis with our contributions for fast and precise numerical program
analysis. In this chapter, we formally describe our theoretical framework and new
algorithms for making the exponentially expensive Polyhedra domain practical for
analyzing large real-world programs. We generalize the applicability of our meth-
ods to all subpolyhedra domains in Chapter 3 and provide theoretical guarantees
on the analysis precision. Finally, in Chapter 4, we leverage reinforcement learning
to further improve the performance of numerical program analysis. Overall, our
methods yield many orders of magnitude speedup over prior approaches.
For almost 40 years, program analysis with the Polyhedra domain has been con-
sidered impractical for large real-world programs due to its exponential complex-
ity [51, 122, 176]. In this chapter, we challenge this assumption and present a new
approach that enables the application of Polyhedra for analyzing large, realistic
programs, with speedups ranging between two to five orders of magnitude com-
pared to the state-of-the-art. This allows us to analyze large real-world programs
such as Linux device drivers beyond the reach of existing approaches within a few
seconds. We note that our approach does not lose precision, i.e., it computes the
same invariants as the original analysis.
The work in this chapter was published in [190].

key idea: online decomposition Our key insight is based on the observa-
tion that the set of program variables can be partitioned with respect to the Poly-
hedra generated during the analysis into subsets, called blocks, such that linear
constraints only exist between variables in the same subset [93, 94, 189]. We lever-
age this observation to decompose a large polyhedron into a set of smaller polyhe-
dra, thus reducing the asymptotic complexity of the Polyhedra domain. However,
maintaining decomposition online is challenging because over 40 Polyhedra trans-
formers change the partitions dynamically and in non-trivial ways: blocks in the
partitions can merge, split, grow, or shrink during analysis. Note that an exact par-
tition cannot be computed a priori as it depends on the exact Polyhedra generated
during the analysis. Therefore, static approaches for computing the partition lose
significant precision [27].
To ensure our method does not lose precision, we develop a theoretical frame-
work that asserts how partitions are modified during analysis. We then use our

theory to design new abstract transformers for Polyhedra. Our framework guaran-
tees that the original polyhedron can be recovered exactly from the decomposed
polyhedra at each step of the analysis. Thus our decomposed analysis produces
the same fixpoint and has the same convergence rate as the original analysis. In-
terestingly as we will show in the next chapter, with a non-trivial extension our
framework can be used for decomposing other numerical domains without losing
precision, not only Polyhedra.

main contributions Our main contributions are:

• A theoretical framework for decomposing Polyhedra analysis. This frame-


work allows for efficient maintenance of decomposition throughout the anal-
ysis without losing precision.

• New algorithms for Polyhedra transformers that leverage decomposition


based on the theory. The algorithms are further optimized using novel op-
timizations exploiting sparsity.

• A complete implementation of our Polyhedra transformers in the form of a


state-of-the-art library ELINA publicly available at https://github.com/eth-
sri/ELINA.

• An evaluation of the effectiveness of our approach showing massive gains


in both space and time over state-of-the-art approaches on a large number
of benchmarks, including Linux device drivers. For instance, we obtain a
170x speedup on the largest benchmark containing > 50K lines of code. In
many other cases, the analysis with our approach terminates whereas other
implementations abort without result.

We note that our decomposed Polyhedra domain analysis can be seen as an


instance of cofibred domains from [203].

2.1 background on polyhedra analysis

In this section, we first introduce the necessary background on Polyhedra domain
analysis. We present two ways to represent polyhedra and define the Polyhedra
domain including its transformers. We conclude the section by discussing their
asymptotic complexity.

notation Lower case letters (a, b, . . .) represent column vectors and integers
(g, k, . . .). Upper case letters A, D represent matrices whereas O, P, Q are polyhe-
dra. Greek (α, β, . . .) and calligraphic letters (P, C, . . .) represent scalars and sets
respectively.

2.1.1 Representation of Polyhedra

Let x = (x1 , x2 , . . . , xn )^T be a column vector of program variables. A convex closed
polyhedron P ⊆ Q^n that captures linear constraints among variables in x can be
represented in two equivalent ways: the constraint representation and the generator
representation [151]. Both are introduced next.

constraint representation This representation encodes a polyhedron P as
an intersection of:

• A finite number of closed half spaces of the form a^T x ≤ β.

• A finite number of subspaces of the form d^T x = ξ.

Collecting these yields matrices A, D and vectors of rational numbers b, e such that
the polyhedron P can be written as:

P = {x ∈ Q^n | Ax ≤ b and Dx = e}.    (2.1)

The associated constraint set C of P is defined as C = CP = {Ax ≤ b, Dx = e}.

generator representation This representation encodes the polyhedron P as
the convex hull of:

• A finite set V ⊂ Q^n of vertices vi .

• A finite set R ⊆ Q^n representing rays. ri ∈ R are direction vectors of infinite
  edges of the polyhedron with one end bounded. The rays always start from
  a vertex in V.

• A finite set Z ⊆ Q^n representing lines². zi ∈ Z are direction vectors of infinite
  edges of the polyhedron with both ends unbounded. Each such line passes
  through a vertex in V.

Thus, every x ∈ P can be written as:

x = Σ_{i=1}^{|V|} λi vi + Σ_{i=1}^{|R|} µi ri + Σ_{i=1}^{|Z|} νi zi ,    (2.2)

where λi , µi ≥ 0 and Σ_{i=1}^{|V|} λi = 1. The above vectors are the generators of P and
are collected in the set G = GP = {V, R, Z}.

2 one dimensional affine subspaces.



Figure 2.1: Two representations of a polyhedron defined over variables x1 and x2 . (a)
Bounded polyhedron; (b) unbounded polyhedron.

Example 2.1.1. Fig. 2.1 shows two examples of both representations for polyhedra.
In Fig. 2.1(a) the polyhedron P is bounded and can be represented as either the
intersection of four closed half spaces or as the convex hull of four vertices:

C = {−x1 ≤ −1, x1 ≤ 4, −x2 ≤ −2, x2 ≤ 4}, or
G = {V = {(1, 2), (1, 4), (4, 2), (4, 4)}, R = ∅, Z = ∅}.

Note that the sets of rays R and lines Z are empty in this case.
In Fig. 2.1(b), the polyhedron P is unbounded and can be represented either as
the intersection of two closed half planes or as the convex hull of two rays starting
at vertex (1, 2):

C = {−x2 ≤ −2, x2 ≤ 2 · x1 }, or
G = {V = {(1, 2)}, R = {(1, 2), (1, 0)}, Z = ∅}.
To reduce clutter, we abuse notation and often write P = (C, G) since our algo-
rithms, introduced later, maintain both representations. Both C and G represent
minimal sets, i.e., they do not contain redundancy.
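
For illustration, the following Python sketch (using NumPy; not part of our implementation) encodes the bounded polyhedron of Fig. 2.1 (a) in both representations and checks that convex combinations of the vertices satisfy the constraints.

import numpy as np

# Constraint representation of the box from Fig. 2.1 (a): A x <= b (no equalities).
A = np.array([[-1.0, 0.0], [1.0, 0.0], [0.0, -1.0], [0.0, 1.0]])
b = np.array([-1.0, 4.0, -2.0, 4.0])

# Generator representation: the four vertices (no rays or lines).
V = np.array([[1.0, 2.0], [1.0, 4.0], [4.0, 2.0], [4.0, 4.0]])

# Sanity check: every convex combination of the vertices satisfies all constraints.
rng = np.random.default_rng(0)
for _ in range(100):
    lam = rng.dirichlet(np.ones(len(V)))     # lambda_i >= 0, summing to 1
    x = lam @ V
    assert np.all(A @ x <= b + 1e-9)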

2.1.2 Polyhedra Domain

The Polyhedra domain is commonly used in static analysis to derive invariants
that hold for all executions of the program starting from a given initial state. These
invariants can be used to prove safety properties in programs like the absence of
buffer overflow, division by zero and others [200, 201, 209]. The Polyhedra domain
is a fully relational numerical domain, i.e., it can encode all possible linear con-
straints between program variables. Thus, it is more expressive than weakly rela-
tional domains such as Octagon [142], Pentagon [134] or Zone [140], which restrict

if(*) y:=2 · x-1; else y:=2 · x-2;


assert(y<=2 · x);

Figure 2.2: Code with assertion for static analysis.

the set of linear inequalities. The restrictions limit the set of assertions that can be
proved using these domains. For example, the assertion in the code in Fig. 2.2 can-
not be expressed using weakly relational domains whereas Polyhedra can express
and prove the property. The expressivity of the Polyhedra domain comes at higher
cost: it has asymptotic worst-case exponential complexity in both time and space.
The Polyhedra abstract domain consists of the polyhedra lattice (P, v, t, u, ⊥, >)
and a set of transformers. P is the set of convex closed polyhedra ordered by stan-
dard set inclusion: v = ⊆. The least upper bound (t) of two polyhedra P and Q
is the convex hull of P and Q, which, in general, is larger than the union P ∪ Q.
The greatest lower bound (u) of P and Q is simply the intersection P ∩ Q. The top
element > = Qn in the lattice is encoded by C = ∅ or generated by n linearly in-
dependent lines. The bottom element (⊥) is represented by any unsatisfiable set of
constraints in C or with G = ∅.

transformers The transformers used in the Polyhedra domain for program analysis
model the effect of various program statements such as assignments and
conditionals as well as control flow such as loops and branches on the program
states approximated by polyhedra. There are also transformers for checking and
accelerating analysis convergence towards a fixpoint. Overall, a standard imple-
mentation of the Polyhedra domain contains more than 40 transformers [12, 104].
We introduce the most frequently used transformers in Polyhedra domain:
Inclusion test: this transformer tests if P v Q for the given polyhedra P and Q.
Equality test: this transformer tests if two polyhedra P and Q are equal by double
inclusion.
Join: this transformer computes P t Q, i.e., the convex hull of P and Q.
Meet: this transformer computes P u Q = P ∩ Q.
Widening: as the polyhedra lattice has infinite height, the analysis requires widen-
ing to accelerate convergence. The result of the widening transformer [13] P∇Q
contains constraints from CQ that are either present in CP or that can replace a con-
straint in CP without changing P. Using the constraint representation it is defined
as:

    C_{P∇Q} = CQ ,            if P = ⊥;
    C_{P∇Q} = CP' ∪ CQ' ,     otherwise;                                     (2.3)

where:
    CP' = {c ∈ CP | CQ |= c},
    CQ' = {c ∈ CQ | ∃c' ∈ CP , CP |= c and ((CP \ c') ∪ {c}) |= c'}.
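
For illustration, the following Python sketch implements the widening of (2.3) over explicit constraint sets; the entailment check entails(C, c) deciding C |= c (e.g., via an LP solver) and the constraint representation are assumptions of the sketch, not part of our library.

def widen(CP, CQ, entails, P_is_bottom=False):
    # Widening as in (2.3): keep the constraints of P that are entailed by Q, plus
    # the constraints of Q that can replace some constraint of P without changing P.
    # `entails(C, c)` decides C |= c and is an assumed helper; constraints are
    # treated as opaque objects here.
    if P_is_bottom:
        return list(CQ)
    CP_kept = [c for c in CP if entails(CQ, c)]
    CQ_kept = [c for c in CQ
               if any(entails(CP, c) and
                      entails([d for d in CP if d is not cp] + [c], cp)
                      for cp in CP)]
    return CP_kept + CQ_kept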

x:=1;                      P1 = ⊤
y:=2·x;                    P2 = {x = 1}
while(x≤n){                P3 = {x = 1, y = 2·x},  P3' = {−x ≤ −1, x ≤ 2, y = 4·x − 2}
  x:=x+1;                  P4 = {x = 1, y = 2·x, x ≤ n}
  y:=y+2·x;                P5 = {x = 2, y = 2·x − 2, x ≤ n + 1}
}                          P6 = {x = 2, y = 4·x − 2, x ≤ n + 1}

Figure 2.3: Polyhedra domain analysis (first iteration) on the example program on the left.
The polyhedra are shown in constraint representation.

where C |= c tests whether the constraint c can be entailed by the constraints in C.


Next we introduce the transformers corresponding to program statements. For
simplicity, we assume that the expression δ on the right hand side of both condi-
tional and assignment statements is affine, i.e., δ = a^T · x + ε, where a ∈ Q^n and ε ∈ Q
are constants. Non-linear expressions can be approximated by affine expressions
using the techniques described in [141].
Conditional: Let ⊗ ∈ {≤, =}, 1 ≤ i ≤ n, and α ∈ Q. The conditional statement
α · xi ⊗ δ adds the constraint (α − ai ) · xi ⊗ δ − ai · xi to the constraint set C.
Assignment: The transformer for an assignment xi := δ first adds a new variable
xi' to the polyhedron P and then augments C with the constraint xi' − δ = 0. The
variable xi is then projected out [102] from the constraint set C ∪ {xi' − δ = 0}. Finally,
the variable xi' is renamed back to xi .
the variable xi0 is renamed back to xi .

2.1.3 Polyhedra Domain Analysis: Example

Fig. 2.3 shows a simple program that computes the sum of the first n even num-
bers where a polyhedron P` is associated with each line ` in the program. At the
fixpoint, the polyhedron P` represents invariants that hold for all executions of the
program before executing the statement at line `. Here, we work only with the
constraint representation of polyhedra. The analysis proceeds iteratively by select-
ing the polyhedron at a given line, say P1 , then applying the transformer for the
statement at that program point (x:=1 in this case) on that polyhedron, and produc-
ing a new polyhedron, in this case P2 . The analysis terminates when a fixpoint is
reached, i.e., when further iterations do not add extra points to any polyhedra.

first iteration The initial program state does not restrict possible values of
the program variables x, y, n. Thus initially, polyhedra P1 is set to top (>). Next,
the analysis applies the transformer for the assignment x:=1 to P1 , producing P2 .
The set C1 is empty and the transformer adds constraint x = 1 to obtain P2 . The
next statement assigns to y. Since C2 does not contain any constraint involving y,
the transformer for the assignment y:=2·x adds y = 2 · x to obtain P3 . Next, the con-
ditional statement for the loop is processed: that transformer adds the constraint
x 6 n to obtain polyhedron P4 . The assignment statement x:=x+1 inside the loop
assigns to x, which is already present in the set C4 . Thus, a new variable x' is intro-
duced and the constraint x' − x − 1 = 0 is added to C4 , producing:

C5' = {x = 1, y = 2 · x, x ≤ n, x' − x − 1 = 0}

The transformer then projects out x from C5' to produce:

C5'' = {x' = 2, y = 2 · x' − 2, x' ≤ n + 1}

Variable x' is then renamed to x to produce the final set for P5 :

C5 = {x = 2, y = 2 · x − 2, x ≤ n + 1}

The next assignment y:=y+2·x is handled similarly to produce P6 .

next iterations The analysis then returns to the head of the while loop and
propagates the polyhedron P6 to that point. To compute the new program state
at the loop head, it now needs to compute the union of P6 with the previous
polyhedron P3 at that point. Since the union of convex polyhedra is usually not
convex, it is approximated using the join transformer (t) to yield the polyhedron
P3'.
The analysis then checks if the new polyhedron P3' at the loop head is included
in P3 using inclusion testing (v). If yes, then no new information was added and
the analysis terminates. However, here, P3' is not included in P3 and so the analysis continues. After
several iterations, the widening transformer (∇) may be applied at the loop head
along with the join to accelerate convergence.

2.1.4 Transformers and Asymptotic Complexity

The asymptotic time complexity of Polyhedra transformers depends on how a poly-
hedron is represented, as shown in Table 2.1. In the table, n is the number of vari-
ables, m is the number of constraints in C, g = |V| + |R| + |Z| is the number of
generators in G and LP(m, n) is the complexity of solving a linear program with m
constraints and n variables. For binary transformers like join, meet and others, m
and g denote respectively the maximum of the number of constraints and genera-
tors in P and Q. The column Constraint shows the cost of computing the constraint

Table 2.1: Asymptotic complexity of Polyhedra operators with different representations.

Operator          Constraint             Generator               Both
Inclusion (v)     O(m · LP(m, n))        O(g · LP(g, n))         O(n · g · m)
Join (t)          O(n · m^{2^{n+1}})     O(n · g)                O(n · g)
Meet (u)          O(n · m)               O(n · g^{2^{n+1}})      O(n · m)
Widening (∇)      O(m · LP(m, n))        O(g · LP(g, n))         O(n · g · m)
Conditional       O(n)                   O(n · g^{2^{n+1}})      O(n)
Assignment        O(n · m^2)             O(n · g)                O(n · g)

set for the result starting from the input constraint set(s); the column Generator has
similar meaning for the generators. The column Both shows the asymptotic cost of
computing at least one of the representations for the result when both representa-
tions are available for the input(s).

transformers vs. representations Table 2.1 shows that transformers such as
meet (u) are considerably more efficient to compute using the constraint represen-
tation whereas other transformers, such as join (t), are cheaper using the generator
representation. Transformers such as inclusion testing (v) are most efficient when
one of the two participating polyhedra is represented via constraints and the other
via generators. As a result, popular libraries such as NewPolka [104] and PPL [12]
maintain both representations of polyhedra during analysis. We follow the same
approach here and thus each polyhedron P is represented as P = (C, G).
Maintaining both representations requires conversion. For example, the meet of
two polyhedra can be efficiently computed by taking the union of the respective
constraints. Conversion is then required to compute the corresponding generator
representation of the result. As is common, we use Chernikova’s [50, 123] algorithm
but with our own optimized implementation (Section 2.4.1) for converting from the
constraint to the generator representation and vice-versa. The conversion algorithm
also minimizes both representations. We note that other conversion algorithms [2,
10, 75] can also be used.

conversion between representations When both representations are available,
all Polyhedra transformers become polynomial (last column of Table 2.1)
and Chernikova’s algorithm becomes the bottleneck for the analysis as it has worst
case exponential complexity for conversion in either direction. We refer the reader
to [50, 123] for details of the algorithm. There are two approaches for reducing the
cost of these conversions: lazy and eager.
The lazy approach computes the conversion only when required to amortize
the cost over many operations. For example, in Fig. 2.3, there are a number of
conditional checks and assignments in succession so one can keep working with
the constraint representation and compute the generator one only when needed

(e.g., at the loop head when join is needed). The eager approach computes the
conversion after every operation. Chernikova’s algorithm is incremental, which
means that for transformers which add constraints or generators such as meet (u),
join (t), conditional and others, the conversion needs to be computed only for
the added constraints or generators. Because of this, in some cases eager can be
faster than lazy. While our transformers and algorithms are compatible with both
approaches, we use the eager approach in this work.

2.2 polyhedra decomposition

We next present our key insight and show how to leverage it to speed up program
analysis using Polyhedra. Our observation is that the set of program variables can
be partitioned into smaller subsets with respect to the polyhedra arising during
analysis such that no constraints exist between variables in different subsets. This
allows us to decompose a large polyhedron into a set of smaller polyhedra, which
reduces the space complexity of the analysis. For example, the n-dimensional hy-
percube requires 2^n generators whereas with decomposition only 2n generators are
required. The original polyhedron can be recovered exactly using the decomposed
Polyhedra; thus, analysis precision is not affected. Further, the decomposition al-
lows the expensive polyhedra transformers to operate on smaller polyhedra, thus
reducing their time complexity without losing precision.
We first introduce our notation for partitions. Then, we introduce the theoretical
underpinning of our work: the interaction between the Polyhedra domain trans-
formers and the partitions.

2.2.1 Partitions

Let X = {x1 , x2 , . . . , xn } be the set of n variables. For a given polyhedron, X can be
partitioned into subsets Xk we call blocks such that constraints only exist between
variables in the same block. Each unconstrained variable xi yields a singleton block
{xi }. We refer to this unique, finest partition as π = πP = {X1 , X2 , . . . , Xr }.
Example 2.2.1. Consider

X = {x1 , x2 , x3 } and
P = {x1 + 2 · x2 ≤ 3}.

Here, X is partitioned into two blocks: X1 = {x1 , x2 } and X2 = {x3 }. Now consider

P = {x1 + 2 · x2 ≤ 3, 3 · x2 + 4 · x3 ≤ 1}.

Here, the partition of X has only one block X1 = {x1 , x2 , x3 }.
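
For illustration, the following Python sketch computes the finest partition πP from the sets of variables occurring in each constraint using union-find; the representation is ours and not part of ELINA.

def finest_partition(variables, constraints):
    # Compute the finest partition pi_P of `variables`: variables that occur together
    # in some constraint end up in the same block (union-find over variables).
    parent = {v: v for v in variables}

    def find(v):
        while parent[v] != v:
            parent[v] = parent[parent[v]]   # path compression
            v = parent[v]
        return v

    def union(u, v):
        parent[find(u)] = find(v)

    for con_vars in constraints:            # each constraint given as its set of variables
        con_vars = list(con_vars)
        for u in con_vars[1:]:
            union(con_vars[0], u)

    blocks = {}
    for v in variables:
        blocks.setdefault(find(v), set()).add(v)
    return list(blocks.values())

# Example 2.2.1: x1 + 2*x2 <= 3 over {x1, x2, x3} yields blocks {x1, x2} and {x3}.
print(finest_partition({"x1", "x2", "x3"}, [{"x1", "x2"}]))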


The partition πP decomposes the polyhedron P defined over X into a set of
smaller polyhedra Pk which we call factors. Each factor Pk is defined only over

the variables in Xk . The polyhedron P can be recovered from the factors Pk by com-
puting the union of the constraints CPk and the Cartesian product of the generators
GPk . For this, we introduce the ./ transformer defined as:

P = P1 ./ P2 ./ . . . ./ Pr = (CP1 ∪ CP2 ∪ . . . ∪ CPr , GP1 × GP2 × . . . × GPr ).    (2.4)

Example 2.2.2. The polyhedron P in Fig. 2.1 (a) has no constraints between vari-
ables x1 and x2 . Thus, X = {x1 , x2 } can be partitioned into blocks: πP = {{x1 }, {x2 }}
with corresponding factors P1 = (CP1 , GP1 ) and P2 = (CP2 , GP2 ) where:

CP1 = {−x1 ≤ −1, x1 ≤ 4}        CP2 = {−x2 ≤ −2, x2 ≤ 4}
GP1 = {{(1), (4)}, ∅, ∅}         GP2 = {{(2), (4)}, ∅, ∅}

The original polyhedron can be recovered from P1 and P2 as P = P1 ./ P2 =
(CP1 ∪ CP2 , GP1 × GP2 ).

The set L consisting of all partitions of X forms a partition lattice (L, v, t, u, ⊥, >).
The elements π of the lattice are ordered as follows: π v π' if every block of π is
included in some block of π' (π "is finer" than π'). This lattice contains the usual
transformers of least upper bound (t) and greatest lower bound (u). In the partition
lattice, > = {X} and ⊥ = {{x1 }, {x2 }, . . . , {xn }}.

Example 2.2.3. For example,

{{x1 , x2 }, {x3 }, {x4 }, {x5 }} v {{x1 , x2 , x3 }, {x4 }, {x5 }}

Now consider,
π = {{x1 , x2 }, {x3 , x4 }, {x5 }} and
π' = {{x1 , x2 , x3 }, {x4 }, {x5 }}
Then,
π t π' = {{x1 , x2 , x3 , x4 }, {x5 }} and
π u π' = {{x1 , x2 }, {x3 }, {x4 }, {x5 }}
Definition 2.2.1. We call a partition π permissible for P if there are no variables xi
and xj in different blocks of π related by a constraint in P, i.e., if π w πP .

Note that the finest partition π> for the top (>) and the bottom (⊥) polyhedra is
the bottom element in the partition lattice, i.e., π> = π⊥ = ⊥. Thus, every partition
is permissible for these.
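
For illustration, the following Python sketch computes the least upper bound and greatest lower bound in the partition lattice for the partitions of Example 2.2.3; blocks are represented as Python sets, a choice made only for the sketch.

def partition_join(pi1, pi2):
    # Least upper bound in the partition lattice: merge blocks of pi1 and pi2
    # that (transitively) overlap.
    blocks = [set(b) for b in pi1] + [set(b) for b in pi2]
    merged = []
    for b in blocks:
        overlapping = [m for m in merged if m & b]
        for m in overlapping:
            b |= m
            merged.remove(m)
        merged.append(b)
    return merged

def partition_meet(pi1, pi2):
    # Greatest lower bound: pairwise intersections of blocks (dropping empty ones).
    return [b1 & b2 for b1 in pi1 for b2 in pi2 if b1 & b2]

pi  = [{"x1", "x2"}, {"x3", "x4"}, {"x5"}]
pi2 = [{"x1", "x2", "x3"}, {"x4"}, {"x5"}]
print(partition_join(pi, pi2))   # blocks {x1, x2, x3, x4} and {x5}
print(partition_meet(pi, pi2))   # blocks {x1, x2}, {x3}, {x4}, {x5}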

2.2.2 Transformers and Partitions

The decomposed transformers require the computation of the output partition be-
fore the output is actually computed. We next show how the output partitions

are computed. The optimality of the computed partitions depends upon the de-
gree to which the polyhedra are observed. The finest partition for the output of a
Polyhedra transformer can always be computed from scratch by taking the output
polyhedron and connecting the variables that occur in the same constraint in that
polyhedron. However, this nullifies any performance gains as standard transformer
needs to be applied for computing the output polyhedron. For efficiency, we com-
pute the output partitions based on limited observation of the inputs. The partition
for the output of transformers such as meet, conditionals, assignment, and widen-
ing is computed from the corresponding partitions of input polyhedra P and Q.
For the join however, to ensure we do not end up with a trivial and imprecise par-
tition, we need to examine P and Q (discussed later in the section). Our approach
to handling the join partition is key to achieving significant analysis speedups.
We also note that the same polyhedron can have multiple constraint representa-
tions with different finest partitions as shown in the example below.
Example 2.2.4. The polyhedron P = {x1 = 0, x2 = 0} has the partition πP =
{{x1 }, {x2 }}. This polyhedron can also be represented as P' = {x1 = 0, x2 = x1 } with
the associated partition πP' = {{x1 , x2 }}. Here P = P' but πP ≠ πP' .
We note that the conversion algorithm performs transformations to change the
polyhedron representation. The exact output after such transformations depends
on the polyhedron and cannot be determined statically. In this work, we do not
model the effect of such transformations.
We next provide optimal partitions under our observation model. For polyhe-
dron P, we denote the associated optimal partition in our model as π^obs_P . We will
present a refinement of our observation model to obtain finer output partitions at
small extra cost in Section 3.4. We will also present conditions when π^obs_P = πP .

meet The constraint set for the meet P u Q is the union CP ∪ CQ . Thus, overlap-
ping blocks Xi ∈ πP and Xj ∈ πQ will merge into one block in πPuQ . This yields
Lemma 2.2.1. Let P and Q be two polyhedra with P u Q ≠ ⊥. Then πPuQ v π^obs_PuQ =
πP t πQ .

conditional and assignment The conditional and assignment statements
(xi := δ and α · xi ⊗ δ) create new constraints between program variables. Thus, to
compute the partitions for the outputs of these transformers, we first compute a
block B which contains all variables affected by the statement. Let E be the set of
all variables xj with aj ≠ 0 in δ = a^T · x + ε; then B = E ∪ {xi }. To express the fusion
incurred by B, we introduce the following:
Definition 2.2.2. Let π be a partition of X and B ⊆ X, then π ↑ B is the finest
partition π' such that π v π' and B is a subset of an element of π'.
As discussed, the transformer for the conditional statement α · xi ⊗ δ adds con-
straint (α − ai ) · xi ⊗ δ − ai · xi to CP to produce the set CO for the output O. Thus,

in πO , all blocks Xi ∈ πP that overlap with B will merge into one, whereas non-
overlapping blocks remain independent. Thus, we get the following lemma for
calculating πO .
Lemma 2.2.2. Let P be the input polyhedron and let B be the block corresponding
to the conditional α · xi ⊗ δ. If O ≠ ⊥, then πO v π^obs_O = πP ↑ B.
πO for the output O of the transformer for the assignment xi := δ can be com-
puted similarly to that of the conditional transformer.
Lemma 2.2.3. Let P be the input polyhedron and let B be the block corresponding
to an assignment xi := δ. Then πO v π^obs_O = πP ↑ B.
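
For illustration, the following Python sketch implements the π ↑ B operation of Definition 2.2.2 on partitions represented as lists of sets (a representation chosen only for the sketch).

def fuse(pi, B):
    # pi "up-arrow" B (Definition 2.2.2): merge all blocks of pi that intersect B
    # into a single block; the remaining blocks are left untouched.
    B = set(B)
    touched, rest = [], []
    for block in pi:
        (touched if block & B else rest).append(set(block))
    merged = set().union(B, *touched) if touched else set(B)
    return rest + [merged]

# Conditional x1 <= x3 over pi = {{x1, x2}, {x3}, {x4}} fuses {x1, x2} and {x3}.
print(fuse([{"x1", "x2"}, {"x3"}, {"x4"}], {"x1", "x3"}))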

widening Like the join, the partition for widening (∇) depends not only on
partitions πP and πQ , but also on the exact form of P and Q. By definition, the
constraint set for P∇Q contains only constraints from Q. Thus, the partition for
P∇Q satisfies:
Lemma 2.2.4. For polyhedra P and Q, πP∇Q v π^obs_P∇Q = πQ .
Note that the widening transformer can potentially remove all constraints con-
taining a variable, making the variable unconstrained. Thus, in general, πP∇Q ≠ πQ .

join Let CP = {A1 · x ≤ b1 } and CQ = {A2 · x ≤ b2 }³ and Y = {x1', x2', . . . , xn', λ},
then the constraint set CPtQ for the join of P and Q can be computed by projecting
out variables yi ∈ Y from the following set S of constraints:
S = {A1 · x' ≤ b1 · λ, A2 · (x − x') ≤ b2 · (1 − λ), −λ ≤ 0, λ ≤ 1}.    (2.5)
The Fourier-Motzkin elimination algorithm [102] is used for this projection. The
algorithm starts with S0 = S and projects out variables iteratively one after another
so that CPtQ = Sn+1 . Let Si−1 be the constraint set obtained after projecting out the
first i − 1 variables in Y. Then yi ∈ Y is projected out to produce Si as follows:
S^+_{yi} = {c | c ∈ S_{i−1} and ai > 0},
S^−_{yi} = {c | c ∈ S_{i−1} and ai < 0},
S^0_{yi} = {c | c ∈ S_{i−1} and ai = 0},                                     (2.6)
S^±_{yi} = {µ · c1 + ν · c2 | (c1 , c2 ) ∈ S^+_{yi} × S^−_{yi} , µ, ν > 0, and µ · a1i + ν · a2i = 0},
Si = S^0_{yi} ∪ S^±_{yi} .

Each iteration can potentially produce a quadratic number of new constraints,
many of which are redundant. The redundant constraints are removed for effi-
ciency.
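
For illustration, the following Python sketch performs one Fourier-Motzkin elimination step in the spirit of (2.6) on inequality constraints a^T x ≤ β, stored as (coefficient list, bound); redundancy removal is omitted and the representation is ours.

def fm_eliminate(constraints, i):
    # One Fourier-Motzkin step: eliminate variable i from constraints a^T x <= beta,
    # each given as (a, beta) with a a list of coefficients. Redundant constraints
    # produced by the combination step are not removed here.
    pos  = [(a, b) for a, b in constraints if a[i] > 0]
    neg  = [(a, b) for a, b in constraints if a[i] < 0]
    zero = [(a, b) for a, b in constraints if a[i] == 0]
    combined = []
    for (a1, b1) in pos:
        for (a2, b2) in neg:
            mu, nu = -a2[i], a1[i]          # mu*a1[i] + nu*a2[i] = 0 with mu, nu > 0
            a = [mu * c1 + nu * c2 for c1, c2 in zip(a1, a2)]
            combined.append((a, mu * b1 + nu * b2))
    return zero + combined

# Eliminate x0 from {x0 + x1 <= 3, -x0 <= 0}: yields x1 <= 3.
print(fm_eliminate([([1, 1], 3), ([-1, 0], 0)], 0))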
The partition of P t Q depends in non-trivial ways on P and Q. In particular,
πPtQ has no general relationship to either πP t πQ or πP u πQ . The following exam-
ple illustrates this:
3 We assume equalities are encoded as symmetric pairs of opposing inequalities for simplicity.

Figure 2.4: Two examples of P t Q with πP = πQ = {{x1 }, {x2 }}. (a) P1 ≠ Q1 , P2 ≠ Q2 ; (b)
P1 = Q1 , P2 ≠ Q2 .

Example 2.2.5. Let

P = {{x1 − x2 ≤ 0, x1 ≤ 0}, {x3 = 1}} and
Q = {{x1 ≤ 2}, {x3 = 0}} with
πP = {{x1 , x2 }, {x3 }} and
πQ = {{x1 }, {x2 }, {x3 }}.

In this case we have,

P t Q = {{x1 + 2 · x3 ≤ 2, −x3 ≤ 0, x3 ≤ 1}} and


πPtQ = {{x1 , x3 }, {x2 }}.

However,
πP t πQ = {{x1 , x2 }, {x3 }} and
πP u πQ = {{x1 }, {x2 }, {x3 }}.
Thus, neither πP t πQ nor πP u πQ are permissible partitions for P t Q.

The theorem below identifies a case which enables us to compute a non-trivial
permissible partition for P t Q. The theorem states that we can “transfer” a block
from the input partitions to the output partition under certain conditions. It is a
key enabler for the speedups shown later.

Theorem 2.2.5. Let P and Q be two polyhedra with the same permissible partition π =
{X1 , X2 , . . . , Xr } and let π' be a permissible partition for the join, that is, πPtQ v π'. If for
any block Xk ∈ π, Pk = Qk , then Xk ∈ π'.

Proof. Since both P and Q are partitioned according to π, the constraint set in (2.5)
can be written for each Xk separately:

{A1k · xk' ≤ b1k · λ, A2k · (xk − xk') ≤ b2k · (1 − λ), −λ ≤ 0, λ ≤ 1}.    (2.7)

where xk is the column vector for the variables in Xk . λ occurs in the constraint set for
all blocks. For proving the theorem, we need to show that no variable in Xk will
have a constraint with a variable in Xk' ∈ π after the join. The variables in Xk can have
a constraint with the variables in Xk' only by projecting out λ. Since Pk = Qk , CPk
and CQk are equivalent, we can assume A1k = A2k and b1k = b2k .4 Inserting this
into (2.7) we get

{A1k · xk' ≤ b1k · λ, A1k · (xk − xk') ≤ b1k · (1 − λ), −λ ≤ 0, λ ≤ 1}.    (2.8)

The result of the projection is independent of the order in which the variables are
projected out. Thus, we can project out λ last. For proving the theorem, we need to
show that it is possible to obtain all constraints for CPk tQk before projecting out λ
in (2.8). We add A1k · xk' ≤ b1k · λ and A1k · (xk − xk') ≤ b1k · (1 − λ) in (2.8) to project
out all xk' and obtain:

{A1k · xk ≤ b1k , −λ ≤ 0, λ ≤ 1}.    (2.9)

Note that the constraint set in (2.9) does not contain all constraints generated by
the Fourier-Motzkin elimination. Since Pk = Pk t Pk , we have CPk tQk = CPk and CPk
is included in the constraint set of (2.9); thus, the remaining constraints generated
by the Fourier-Motzkin elimination are redundant. In (2.9), all constraints among
the variables in Xk are free from λ; therefore, projecting out λ does not create new
constraints for the variables in Xk . Thus, there cannot be any constraint from a
variable in Xk to a variable in Xk'.
The proof of the theorem also yields the following result.

Corollary 2.2.1. If Theorem 2.2.5 holds, then Pk (and Qk ) is a factor of P t Q.

Example 2.2.6. Fig. 2.4 shows two examples of P t Q where both P and Q have the
same partition πP = πQ = {{x1 }, {x2 }}. In Fig. 2.4(a),

P = {{x1 = 1, x1 = 4}, {x2 = 1, x2 = 2}},


Q = {{x1 = 2, x1 = 3}, {x2 = 2, x2 = 4}}.

In this case, P1 6= Q1 and P2 6= Q2 ; thus, P t Q contains constraints x2 = 2 · x1 and


x2 = −2 · x1 + 10 relating x1 and x2 , i.e., πPtQ = {{x1 , x2 }}.
In Fig. 2.4(b),
P = {{x1 = 1, x1 = 4}, {x2 = 1, x2 = 2}},
Q = {{x1 = 1, x1 = 4}, {x2 = 2, x2 = 4}}.
In this case, P1 = Q1 . Thus, by Theorem 2.2.5, {x1 } ∈ πPtQ , i.e., πPtQ = {{x1 }, {x2 }}.

4 One can always perform a transformation so that A1k = A2k and b1k = b2k holds.

2.3 polyhedra domain analysis with partitions

After presenting the theoretical background, we now discuss how we integrate
partitioning in the entire analysis flow. The basic idea is to perform the analysis
while maintaining the variable set partitioned, and thus the occurring polyhedra
decomposed, as fine-grained as possible. The results from the previous section
show that the main Polyhedra transformers can indeed maintain the partitions,
even though these partitions change during the analysis. Crucially, under certain
assumptions, even the join produces a non-trivial partitioned output. Note that
there are no guarantees that for a given program, the partitions do not become
trivial (i.e., equal to {X}); however, as our results later show, this is typically not the
case and thus significant speedups are obtained. This should not be surprising: in
complex programs, not all variables used are related to each other. For example, the
individual conditional and assignment statements are usually defined over only a
few variables. Similarly, the assumption for Theorem 2.2.5 usually holds as P could
be the polyhedron at the loop head and Q the polyhedron at the loop exit. Since
program loops modify only a few variables, the blocks of X unaffected by the loop
have equal factors in P and Q. However, there will be groups of variables that
indeed develop relationships and these groups may change during execution. Our
approach identifies and maintains such groups.

maintaining precision We emphasize that partitioning the variable set, and
thus decomposing polyhedra and the transformers working on them, does not
affect the overall precision of the result. That is, we neither lose nor gain precision
in our analysis compared to prior approaches which do not use online partition-
ing. The granularity of a partition only affects the cost, i.e., runtime and memory space,
required for the analysis, but not the precision of its results.
We now briefly discuss the data structures used for polyhedra and the main-
tenance of permissible partitions throughout the analysis. For the remainder of
the chapter, we work with permissible partitions, i.e., partitions π satisfying π w πP . The following
sections then provide more details on the respective transformers.

2.3.1 Polyhedra Encoding

For a given polyhedron, NewPolka and PPL store both the constraint set C and
the generator set G, each represented as a matrix. We follow a similar approach
adapted to our partitioned scenario. Specifically, assume a polyhedron P with per-
missible partition πP = {X1 , X2 , . . . , Xr }, i.e., associated factors {P1 , P2 , . . . , Pr }, where
Pk = (CPk , GPk ). The blocks of πP are stored as a linked list of variables and the poly-
hedron as a linked list of factors. Alternatively, trees can also be used. Each factor
is stored as two matrices. We do not explicitly store the factors and the blocks for
the unconstrained variables. For example, > is stored as ∅.
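
For illustration, the following Python sketch shows the data layout described above, with class and field names chosen only for the sketch (they do not correspond to ELINA's actual data structures).

class DecomposedPolyhedron:
    # A decomposed polyhedron: a list of (block, factor) pairs, where each block is a
    # set of variables and each factor stores both representations (a constraint
    # matrix C and a generator matrix G) restricted to that block. Blocks of
    # unconstrained variables are not stored explicitly; top is the empty list.
    def __init__(self):
        self.factors = []          # list of (set_of_vars, (C_matrix, G_matrix))

    def add_factor(self, block, C, G):
        self.factors.append((set(block), (C, G)))

    def block_of(self, var):
        # Return the stored block containing `var`, or a singleton block if
        # the variable is unconstrained.
        for block, _ in self.factors:
            if var in block:
                return block
        return {var}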

Table 2.2: Asymptotic time complexity of Polyhedra operators with decomposition.

Operator          Decomposed
Inclusion (v)     O(Σ_{i=1}^{r} ni · gi · mi )
Join (t)          O(Σ_{i=1}^{r} ni · gi · mi + nmax · gmax )
Meet (u)          O(Σ_{i=1}^{r} ni · mi )
Widening (∇)      O(Σ_{i=1}^{r} ni · gi · mi )
Conditional       O(nmax )
Assignment        O(nmax · gmax )

2.3.2 Transformers and Permissible Partitions

The results in Section 2.2.2 calculated for each input polyhedra P, Q with partitions
πP , πQ either the best (finest) or a permissible partition of the output polyhedron
O of a transformer. Inspection shows that each result can be adapted to the case
where the input partitions are only permissible. In this case, the output partition is
likewise only permissible.

Lemma 2.3.1. Given permissible input partitions πP and πQ , Lemmas 2.2.1–2.2.4


and Theorem 2.2.5 yield permissible partitions for the outputs of transformers.
Specifically, using prior notation:

i) Meet: πPuQ = πP t πQ is permissible if P u Q ≠ ⊥, otherwise ⊥ is permissible.

ii) Conditional: πP ↑ B is permissible if O ≠ ⊥, otherwise ⊥ is permissible.

iii) Assignment: πP ↑ B is permissible.

iv) Widening: πP∇Q = πQ is permissible.

v) Join: Let π = πP t πQ and U = {Xk | Pk = Qk , Xk ∈ π}. Then the following is
   permissible:

        πPtQ = U ∪ { ∪_{A ∈ π\U} A },

   i.e., the blocks in U are kept, and all remaining blocks of π are merged into a
   single block. A small code sketch of this computation follows below.
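
For illustration, the following Python sketch computes the permissible output partition of item v); the inputs are assumed to be already refactored to the common partition π, and the predicate factors_equal deciding Pk = Qk is an assumption of the sketch.

def join_output_partition(pi, factors_equal):
    # Permissible partition for P join Q per item (v): keep every block of pi whose
    # factors coincide (Pk = Qk, cf. Theorem 2.2.5) and merge all remaining blocks
    # into a single block. `factors_equal(block)` is an assumed predicate.
    kept = [set(b) for b in pi if factors_equal(b)]
    rest = [set(b) for b in pi if not factors_equal(b)]
    return kept + ([set().union(*rest)] if rest else [])

# Example: with pi = {{x, y}, {u, v}, {z}} and only the {z} factors coinciding,
# the blocks {x, y} and {u, v} are merged into {x, y, u, v}.
print(join_output_partition([{"x", "y"}, {"u", "v"}, {"z"}],
                            lambda b: b == {"z"}))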

Table 2.2 shows the asymptotic time complexity of the Polyhedra transformers
decomposed with our approach. For simplicity, we assume that for binary trans-
formers both inputs have the same partition. In the table, r is the number of blocks
in the partition, ni is the number of variables in the i-th block, gi and mi are the
number of generators and constraints in the i-th factor respectively. It holds that
n = Σ_{i=1}^{r} ni , m = Σ_{i=1}^{r} mi and g = Π_{i=1}^{r} gi . We denote the number of variables
and generators in the largest block by nmax and gmax , respectively. Since we follow
the eager approach for conversion, both representations are available for inputs,
i.e., the second column of Table 2.2 corresponds to column Both in Table 2.1. We do
not show the cost of conversion.

P1 :>
x:=5;
P2 :{{x = 5}}
u:=3;
P3 :{{x = 5}, {u = 3}}
if(x==y){
P4 :{{x = 5, x = y}, {u = 3}}
x:=2 · y;
P5 :{{y = 5, x = 2 · y}, {u = 3}}
}
P6 :{{−x 6 −5, x 6 10}, {u = 3}}
if(u==v){
P7 :{{−x 6 −5, x 6 10}, {u = 3, u = v}}
u :=3 · v;
P8 :{{−x 6 −5, x 6 10}, {v = 3, u = 3 · v}}
}
P9 :{{−x 6 −5, x 6 10}, {−u 6 −3, u 6 9}}
z:=x + u;
P10 :{{−x 6 −5, x 6 10, −u 6 −3, u 6 9, z = x + u}}

Figure 2.5: Example of complexity reduction through decomposition for Polyhedra analy-
sis on an example program.

Fig. 2.5 shows a representative program annotated with Polyhedra invariants at
each program point. The program contains five variables u, v, x, y, z and has two
conditional if-statements. It can be seen that the Polyhedra at different program
points can be decomposed and thus the Polyhedra transformers benefit from the
complexity reduction. For example, the assignment transformer for x:=2y and the
conditional transformer for x==y need to operate only on the factor corresponding
to the block {x, y}. The assignment transformer for u:=3v and the conditional trans-
former for u==v benefit similarly. Also note that the two blocks for the if statements
modify only the variables {x, y} and {u, v} respectively. Thus the factors correspond-
ing to the remaining variables remain equal in the inputs to the corresponding join.
As a result, both join outputs P6 and P9 are also decomposed. We next discuss the
algorithms for the core transformers using partitions.

2.4 polyhedra transformers

In this section, we describe our algorithms for the main Polyhedra transformers.
For each transformer, we first describe the base algorithm, followed by our adapta-
tion of that algorithm to use partitions. We also discuss useful code optimizations
for our algorithms. We follow an eager approach for the conversion; thus, the in-
puts and the output have both C and G available. Our choice allows us to always
apply the conversion incrementally for the expensive meet, conditional, and join
transformers while with the lazy approach it is possible that this cannot be done.
Join is the most challenging transformer to adapt with partitions as the partition
for the output depends on the exact form of the inputs. Our algorithms rely on two
auxiliary transformers, conversion and refactoring, which we describe first.

2.4.1 Auxiliary Transformers

We apply code optimizations to leverage sparsity in the conversion algorithm,
which makes our conversion faster. Refactoring is frequently required by our al-
gorithms to make the inputs conform to the same partition.

conversion transformer An expensive step in Chernikova’s algorithm is


the computation of a matrix-vector product which is needed at each iteration of
the algorithm. We observed that the vector is usually sparse, i.e., it contains mostly
zeros; thus, we need to consider only those entries in the matrix which can be
multiplied with the non-zero entries in the vector. Therefore at the start of each
iteration, we compute an index for the non-zero entries of the vector. The index is
discarded at the end of the iteration. This code optimization significantly reduces
the cost of conversion.
We also vectorized the matrix-vector product using single instruction, multiple
data (SIMD) AVX intrinsics; however, this does not provide as much speedup as
leveraging sparsity with the index.
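
For illustration, the following Python sketch shows the sparsity optimization in isolation: an index of the non-zero entries of the vector is built once and only the matching matrix columns participate in the product (NumPy is used only for the sketch).

import numpy as np

def sparse_matvec(M, v):
    # Matrix-vector product exploiting sparsity of v, as in our conversion optimization:
    # build an index of the non-zero entries of v, then only touch the matching
    # columns of M. Equivalent to M @ v, but cheaper when v is mostly zeros.
    nz = np.flatnonzero(v)            # index of non-zero entries, built per iteration
    return M[:, nz] @ v[nz]

M = np.arange(12.0).reshape(3, 4)
v = np.array([0.0, 2.0, 0.0, 0.0])   # sparse vector
assert np.allclose(sparse_matvec(M, v), M @ v)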

refactoring Let P and Q be defined over the same set of variables X =
{x1 , x2 , . . . , xn }, and let πP = {XP1 , XP2 , . . . , XPp }, πQ = {XQ1 , XQ2 , . . . , XQq } be per-
missible partitions for P and Q respectively and B ⊆ X. Usually πP 6= πQ ; thus, an
important step for the transformers such as meet, inclusion testing, widening and
join is refactoring the inputs P and Q so that the factors correspond to the same
partition π which is simply the least upper bound πP t πQ .
Similarly, usually B ∉ πP for the conditional and the assignment transformers.
Thus, P is refactored according to π = πP ↑ B.
P is refactored by merging all factors Pi whose corresponding blocks XPi are
included inside the same block Xj of π. The merging is performed using the ./

Algorithm 2.1 Refactor P with partition πP according to π


1: function refactor(P, πP , π)
2: Parameters:
3: P ← {P1 , P2 , . . . , Pp }
4: πP ← {XP1 , XP2 , . . . , XPp }
5: π ← {X1 , X2 , . . . , Xr }
6: for k ∈ {1, 2, . . . , r} do
7: Pk0 := >
8: end for
9: for i ∈ {1, 2, . . . , p} do
10: k := j, s.t., XPi ⊆ Xj , Xj ∈ π
11: Pk0 := Pk0 ./ Pi
12: end for
13: P 0 := {P10 , P20 , . . . , Pr0 }
14: return P 0
15: end function

transformer defined in (2.4). Refactoring is shown in Algorithm 2.1. We will use r


to denote the number of blocks in π.

Example 2.4.1. Consider5 :

X = {x1 , x2 , x3 , x4 , x5 , x6 },
P = {{x1 = x2 , x2 = 2}, {x3 ≤ 2}, {x5 = 1}, {x6 = 2}},
Q = {{x1 = 2, x2 = 2}, {x3 ≤ 2}, {x5 = 2}, {x6 = 3}}, with
πP = {{x1 , x2 }, {x3 , x4 }, {x5 }, {x6 }} and
πQ = {{x1 , x2 , x4 }, {x3 }, {x5 }, {x6 }}.

In this case, π is:


π = πP t πQ = {{x1 , x2 , x3 , x4 }, {x5 }, {x6 }}.
We find that both blocks XP1 = {x1 , x2 } and XP2 = {x3 , x4 } of πP are included in
the first block of πP t πQ ; thus, P1 and P2 are merged using the ./ transformer. We
merge Q1 and Q2 similarly. The resulting P 0 and Q 0 are shown below:

P 0 = {P1 ./ P2 , {x5 = 1}, {x6 = 2}} and


Q 0 = {Q1 ./ Q2 , {x5 = 2}, {x6 = 3}}

where,
P1 ./ P2 = {x1 = x2 , x2 = 2, x3 ≤ 2} and
Q1 ./ Q2 = {x1 = 2, x2 = 2, x3 ≤ 2}

After explaining refactoring, we now present our algorithms for the Polyhedra
transformers with partitions.

5 We show only constraints for simplicity.



2.4.2 Meet (u)

For the double representation, the constraint set CO of the output O = P u Q is the
union of the constraints of the inputs P and Q, i.e., CPuQ = CP ∪ CQ . GO is obtained
by incrementally adding the constraints in CQ to the polyhedron defined by GP
through the conversion transformer. If the conversion returns GO = ∅, then CO is
unsatisfiable and thus O = ⊥.

meet with partitions Our algorithm first computes the common partition πP t πQ . P and Q are then refactored according to this partition using Algorithm 2.1 to obtain P 0 and Q 0 . If Pk0 = Qk0 , then CPk0 ∪ CQk0 = CPk0 ; no conversion is required and we simply add Pk0 to O. If Pk0 6= Qk0 , we add CPk0 ∪ CQk0 to CO . Next, the constraints in CQk0 are incrementally added to the polyhedron defined by GPk0 through the conversion transformer, obtaining GPk0 uQk0 . If the conversion algorithm returns GPk0 uQk0 = ∅, then we set O = ⊥. We know from Section 2.3 that πO = πP t πQ if O 6= ⊥, otherwise πO = ⊥.

code optimization CPk0 and CQk0 usually contain a number of common constraints. The generators in GPk0 already correspond to the constraints that occur in both CPk0 and CQk0 . Thus, these constraints need not be considered for the conversion, which reduces its cost.
The check for common constraints can create an overhead as, in the worst case, we have to compare each vector in CQk0 with all vectors in CPk0 . To reduce this overhead, for a given vector in CQk0 , we keep track of the vector index which caused the equality check to fail for the previous vector in CPk0 . For the next vector in CPk0 , we first compare the vector values at this index, as the next vector, if not equal, is also likely to fail this check. The pseudo code for our meet transformer is shown in Algorithm 2.2. We omit tracking of the vector index in Algorithm 2.2 for simplicity.
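The following Python sketch shows the idea behind this check (illustrative only; the constraint representation and helper names are our assumptions): constraints are compared as coefficient vectors, and for each constraint of Q we remember the position that distinguished the previous pair and test that position first.

def remove_common_constraints(cons_p, cons_q):
    # Illustrative sketch; constraints are tuples of numbers (coefficients plus constant).
    # Returns the constraints of `cons_q` that do not literally occur in `cons_p`.
    not_common = []
    for cq in cons_q:
        fail_idx = 0     # position that made the previous comparison fail
        found = False
        for cp in cons_p:
            if len(cp) == len(cq) and cp[fail_idx] == cq[fail_idx]:
                # The cheap single-entry check passed; do the full comparison.
                if cp == cq:
                    found = True
                    break
                # Remember the first position where the two vectors differ.
                fail_idx = next(i for i, (a, b) in enumerate(zip(cp, cq)) if a != b)
        if not found:
            not_common.append(cq)
    return not_common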

2.4.3 Inclusion (v)

For the double representation, P v Q holds if all generators in GP satisfy all con-
straints in CQ . A vertex v ∈ VP satisfies the constraint set CQ if A · v 6 b and
D · v = e. A ray r ∈ RP satisfies CQ if A · r 6 0 and D · r = 0. A line z ∈ ZP satisfies
CQ if A · z = 0 and D · z = 0.
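For concreteness, a small Python/NumPy sketch of these checks is given below (our own illustration; ELINA uses exact rational arithmetic, so the floating-point tolerance here is only an artifact of the sketch, and the data layout is assumed):

import numpy as np

def generators_satisfy(vertices, rays, lines, A, b, D, e, eps=1e-9):
    # Illustrative sketch of the inclusion check on the double representation:
    # every vertex v must satisfy A·v <= b and D·v = e,
    # every ray r must satisfy A·r <= 0 and D·r = 0,
    # every line z must satisfy A·z = 0 and D·z = 0.
    for v in vertices:
        if np.any(A @ v > b + eps) or np.any(np.abs(D @ v - e) > eps):
            return False
    for r in rays:
        if np.any(A @ r > eps) or np.any(np.abs(D @ r) > eps):
            return False
    for z in lines:
        if np.any(np.abs(A @ z) > eps) or np.any(np.abs(D @ z) > eps):
            return False
    return True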

inclusion testing with partitions In our algorithm, we refactor P and Q according to the same partition πP t πQ . We only refactor the generators of P and the constraints of Q according to πP t πQ , obtaining GP 0 and CQ 0 respectively. We then check for each block Xk in πP t πQ whether all generators in GPk0 satisfy CQk0 .

Algorithm 2.2 Decomposed Polyhedra meet


1: function Meet(P, Q, πP , πQ )
2: Parameters:
3: P ← {P1 , P2 , . . . , Pp }
4: Q ← {Q1 , Q2 , . . . , Qq }
5: πP ← {XP1 , XP2 , . . . , XPp }
6: πQ ← {XQ1 , XQ2 , . . . , XQq }
7: P 0 := refactor(P, πP , πP t πQ )
8: Q 0 := refactor(Q, πQ , πP t πQ )
9: O=∅
10: for k ∈ {1, 2, . . . , r} do
11: if Pk0 = Qk0 then
12: O.add(Pk0 )
13: else
14: C := remove_common_con(CPk0 ∪ CQk0 )
15: G := incr_chernikova(C, CPk0 , GPk0 )
16: if G = ∅ then
17: O := ⊥
18: πO := ⊥
19: return (O, πO )
20: end if
21: O.add((C, G))
22: end if
23: end for
24: πO := πP t πQ
25: return (O, πO )
26: end function

code optimization The result of the inclusion testing transformer is usually negative, so we first check the smaller factors for inclusion. Thus, the factors are sorted in the order given by the product of the number of generators in GPk0 and the number of constraints in CQk0 . The pseudo code for our inclusion testing transformer is shown in Algorithm 2.3.

2.4.4 Conditional

For the double representation, the transformer for the conditional statement α · xi ⊗
δ adds the constraint c = (α − ai ) · xi ⊗ δ − ai · xi to the constraint set CP , producing
CO . GO is obtained by incrementally adding the constraint c to the polyhedron
defined by GP through the conversion transformer. The conversion returns GO = ∅,
if CO is unsatisfiable and thus we get O = ⊥.

conditional transformer with partitions Our algorithm refactors P according to πP ↑ B, producing P 0 . The constraint c is added to the constraint set CPk0 of the factor corresponding to the block Xk ∈ πP ↑ B containing B, producing COk . GOk is obtained by incrementally adding the constraint c to the polyhedron

Algorithm 2.3 Decomposed inclusion testing for Polyhedra


1: function Inclusion(P, Q, πP , πQ )
2: Parameters:
3: P ← {P1 , P2 , . . . , Pp }
4: Q ← {Q1 , Q2 , . . . , Qq }
5: πP ← {XP1 , XP2 , . . . , XPp }
6: πQ ← {XQ1 , XQ2 , . . . , XQq }
7: GP 0 := refactor_gen(GP , πP , πP t πQ )
8: CQ 0 := refactor_con(CQ , πQ , πP t πQ )
9: sort_by_size(GP 0 , CQ 0 )
10: for k ∈ {1, 2, . . . , r} do
11: if Pk0 6v Qk0 then
12: return false
13: end if
14: end for
15: return true
16: end function

Algorithm 2.4 Decomposed conditional transformer for Polyhedra


1: function Conditional(P, πP , stmt)
2: Parameters:
3: P ← {P1 , P2 , . . . , Pp }
4: πP ← {XP1 , XP2 , . . . , XPp }
5: stmt ← α · xi ⊗ δ
6: B := extract_block(stmt)
7: P 0 := refactor(P, πP , πP ↑ B)
8: O := ∅
9: πO := πP ↑ B
10: for k ∈ {1, 2, . . . , r} do
11: if B ⊆ πOk then
12: C := CPk0 ∪ {(α − ai ) · xi ⊗ δ − ai · xi }
13: G := incr_chernikova(C, CPk0 , GPk0 )
14: if G = ∅ then
15: O := ⊥
16: πO := ⊥
17: return (O, πO )
18: end if
19: O.add((C, G))
20: else
21: O.add(Pk0 )
22: end if
23: end for
24: return (O, πO )
25: end function

defined by GPk0 . If the conversion algorithm returns GOk = ∅, then we set O = ⊥. As shown in Section 2.3, πO = πP ↑ B if O 6= ⊥, otherwise πO = ⊥. The pseudo code for our conditional transformer is shown in Algorithm 2.4.

2.4.5 Assignment

In Section 2.1.2, the transformer for the assignment xi := δ, where δ = aT · x + , was defined using the constraint set CP of P. For the double representation, the transformer works on the generator set GP = {VP , RP , ZP }. The generators GO = {VO , RO , ZO } for the output are given by:

VO = {v 0 | vi0 = aT · v + , v ∈ VP },
RO = {r 0 | ri0 = aT · r, r ∈ RP }, (2.10)
ZO = {z 0 | zi0 = aT · z, z ∈ ZP }.

If the assignment is invertible, i.e., if ai 6= 0 (for example x:=x+1), the constraint set CO can be calculated by backsubstitution. Let xi0 be the new value of xi after the assignment, then xi0 = aT · x + . Thus, substituting xi = (xi0 − Σj6=i aj · xj − )/ai for xi in all constraints of the set CP = {A · x 6 b, D · x = e} and renaming xi0 to xi , we get the constraint set CO . For non-invertible assignments, the conversion transformer is applied on all generators in GO .
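The following Python sketch illustrates the backsubstitution step (our own illustration, not ELINA's code; the parameter `const` stands for the constant term of the assignment expression, which we assume here):

from fractions import Fraction

def backsubstitute(constraints, i, a, const):
    # Illustrative sketch for the invertible assignment x_i := a·x + const (a[i] != 0).
    # Each constraint is a pair (c, d) encoding c·x <= d (or c·x = d); the same
    # coefficient update applies to both. Substituting
    #     x_i = (x_i' - sum_{j != i} a[j]*x_j - const) / a[i]
    # and renaming x_i' back to x_i gives the coefficients computed below.
    ai = Fraction(a[i])
    new_constraints = []
    for c, d in constraints:
        ci = Fraction(c[i])
        new_c = list(map(Fraction, c))
        for j in range(len(c)):
            if j != i:
                new_c[j] = Fraction(c[j]) - ci * Fraction(a[j]) / ai
        new_c[i] = ci / ai                      # coefficient of the renamed x_i
        new_d = Fraction(d) + ci * Fraction(const) / ai
        new_constraints.append((new_c, new_d))
    return new_constraints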

assignment transformer with partitions In our algorithm, we refactor P according to πP ↑ B, producing P 0 . We compute the new generators using (2.10) only for the factor Pk0 corresponding to the block Xk ∈ πP ↑ B containing B. The constraints are computed only for Pk0 for both invertible and non-invertible assignments. This results in a large reduction of the operation count. As shown in Section 2.3, πO = πP ↑ B. We will present a refinement of πO in Chapter 3. The pseudo code for our assignment transformer is shown in Algorithm 2.5. The handle_assign function applies (2.10) on GPk0 .

2.4.6 Widening (∇)

For the double representation, the widening transformer requires the generators
and the constraints of P and the constraints of Q. A given constraint a · x ⊗ b,
where ⊗ ∈ {6, =}, saturates a vertex v ∈ V if a · v = b, a ray r ∈ R if a · r = 0, and a
line z ∈ Z if a · z = 0.
For a given constraint c and G, the set Sc,G is defined as:

Sc,G = {g | g ∈ G and c saturates g}. (2.11)

The standard widening transformer computes for each constraint cp ∈ CP , the set
Scp ,GP and for each constraint cq ∈ CQ , the set Scq ,GP . If Scq ,GP = Scp ,GP for any cp ,
then cq is added to the output constraint set CO . The widening transformer removes
the constraints from CQ , so the conversion is not incremental in the standard im-
plementations. Recent work [184] allows incremental conversion when constraints
or generators are removed.
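For concreteness, here is a Python sketch of the saturation sets from (2.11) and of the per-constraint check used by the standard widening (illustrative only; the data layout and helper names are our assumptions):

def saturates(c, gen):
    # Illustrative sketch. c = (a, b, kind) with kind '<=' or '='; gen = (g, typ)
    # with typ 'vertex', 'ray', or 'line'. A vertex is saturated if a·g = b,
    # rays and lines are saturated if a·g = 0.
    a, b, _ = c
    g, typ = gen
    prod = sum(ai * gi for ai, gi in zip(a, g))
    return prod == (b if typ == 'vertex' else 0)

def saturation_set(c, generators):
    # S_{c,G} from (2.11): the generators of G saturated by constraint c.
    return frozenset(i for i, gen in enumerate(generators) if saturates(c, gen))

def widen_constraints(C_P, C_Q, G_P):
    # Keep the constraints of C_Q whose saturation set on G_P equals the
    # saturation set of some constraint of C_P, as described above.
    sat_P = {saturation_set(cp, G_P) for cp in C_P}
    return [cq for cq in C_Q if saturation_set(cq, G_P) in sat_P]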

Algorithm 2.5 Decomposed assignment transformer for Polyhedra


1: function Assignment(P, πP , stmt)
2: Parameters:
3: P ← {P1 , P2 , . . . , Pp }
4: πP ← {XP1 , XP2 , . . . , XPp }
5: stmt ← xi := aT · x + 
6: B := extract_block(stmt)
7: P 0 := refactor(P, πP , πP ↑ B)
8: O := ∅
9: πO := πP ↑ B
10: for k ∈ {1, 2, . . . , r} do
11: if B ⊆ πOk then
12: G := handle_assign(GPk0 , stmt)
13: if ai 6= 0 then
14: C := backsubstitute(G, stmt)
15: else
16: C := chernikova(G)
17: end if
18: O.add((C, G))
19: else
20: O.add(Pk0 )
21: end if
22: end for
23: return (O, πO )
24: end function

widening with partitions In our algorithm, we refactor P according to πP t πQ , producing P 0 . For a given constraint cq ∈ CQi , we access the block Xk ∈ πP t πQ containing XQi and compute Scq ,GPk0 . If this set is equal to Scp ,GPk0 for any cp ∈ CPk0 , then cq is added to COi . If COi = CQi , then the conversion transformer is not required and Qi is added to O; otherwise it is applied on all constraints in COi . As shown in Section 2.3, πO = πQ . The pseudo code for our widening transformer is shown in Algorithm 2.6. The saturate function applies (2.11) on given c and G.
To possibly improve the granularity of πO , we check for each block Xk ∈ πP∇Q whether COk = ∅; if yes, then Xk is removed from πP∇Q and replaced by a set of singleton blocks, with each block corresponding to a variable in Xk .

2.4.7 Join (t)

For the double representation, the generators GO of the output O = P t Q of the join
are simply the union of the generators of the input polyhedra, i.e., GO = GP ∪ GQ .
CO is obtained by incrementally adding the generators in GQ to the polyhedron
defined by CP .

Algorithm 2.6 Decomposed polyhedra widening


1: function Widening(P, Q, πP , πQ )
2: Parameters:
3: P ← {P1 , P2 , . . . , Pp }
4: Q ← {Q1 , Q2 , . . . , Qq }
5: πP ← {XP1 , XP2 , . . . , XPp }
6: πQ ← {XQ1 , XQ2 , . . . , XQq }
7: P 0 := refactor(P, πP , πP t πQ )
8: O := ∅
9: for k ∈ {1, 2, . . . , r} do
10: for cp ∈ CPk0 do
11: Scp ,GPk0 := saturate(cp , GPk0 )
12: end for
13: end for
14: for i ∈ {1, 2, . . . , q} do
15: COi := ∅
16: k := j, s.t., XQi ⊆ Xj , Xj ∈ πP t πQ
17: for cq ∈ CQi do
18: Scq ,GPk0 := saturate(cq , GPk0 )
19: if ∃cp ∈ CPk0 , s.t., Scq ,GPk0 = Scp ,GPk0 then
20: COi := COi ∪ {cq }
21: end if
22: end for
23: if COi = CQi then
24: O.add(Qi )
25: else
26: GOi := chernikova(COi )
27: O.add((COi , GOi ))
28: end if
29: end for
30: return (O, πQ )
31: end function

join with partitions In our join transformer shown in Algorithm 2.8, we first refactor P and Q according to πP t πQ , obtaining P 0 and Q 0 respectively. The
join transformer can create constraints between the variables in different blocks of
πP t πQ . In the worst case, the join can merge all blocks into one to produce the >
partition, which blows up the number of generators due to the Cartesian product in
(2.4). However, in many cases common in the program analysis setting, the blocks
of πP t πQ need not be combined without sacrificing precision. Identifying such
cases is key in our work for avoiding the exponential blowup observed by prior
libraries [12, 104]. Theorem 2.2.5 identifies such cases.

computing the generators for the join If Pk0 = Qk0 holds, then Pk0 can be added to O by Corollary 2.2.1. Since no new generators are added, the conversion transformer is not required for these factors. This results in a large reduction of the operation count for the conversion transformer.

Algorithm 2.7 Compute generators for the join


1: function compute_gen_join(P 0 , Q 0 , πP t πQ )
2: Parameters:
3: P 0 ← {P10 , P20 , . . . , Pr0 }
4: Q 0 ← {Q10 , Q20 , . . . , Qr0 }
5: πP t πQ ← {X1 , X2 , . . . , Xr }
6: U := ∅
7: πO := ∅
8: PN0 := QN0 := >
9: O := ∅
10: for k ∈ {1, 2, . . . , r} do
11: if Pk0 = Qk0 then
12: U.add(Xk )
13: O.add(Pk0 )
14: else
15: πO := πO ∪ Xk
16: PN0 := PN0 ./ Pk0
17: QN0 := QN0 ./ Qk0
18: end if
19: end for
20: πO := U ∪ πO
21: return πO , O, PN0 , QN0
22: end function

As in Section 2.3, we compute π = πP t πQ and U = {Xk ∈ π | Pk0 = Qk0 }. The factors in P 0 and Q 0 corresponding to the blocks A ∈ π \ U are merged using the ./ transformer to produce PN0 and QN0 respectively. Next, we compute GO , which consists of the generator sets of the factors corresponding to the u = |U| blocks in U together with GPN0 ∪ GQN0 . The pseudo code for this step is shown in Algorithm 2.7.

computing the constraints for the join We know the constraint set for all factors corresponding to the blocks in U. CPN0 ∪QN0 is obtained by incrementally adding the generators in GQN0 to the polyhedron defined by CPN0 . Similar to the meet transformer in Section 2.4.2, we apply our code optimization of not computing the constraints for the generators common in both GPN0 and GQN0 .

Example 2.4.2. Consider

X = {x1 , x2 , x3 , x4 , x5 , x6 },
P = {{x1 = x2 , x2 = 2}, {x3 6 2}, {x5 = 1}, {x6 = 2}},
Q = {{x1 = 2, x2 = 2}, {x3 6 2}, {x5 = 2}, {x6 = 3}} with
πP = {{x1 , x2 }, {x3 , x4 }, {x5 }, {x6 }} and
πQ = {{x1 , x2 , x4 }, {x3 }, {x5 }, {x6 }}

Algorithm 2.8 Decomposed polyhedra join


1: function Join(P, Q, πP , πQ )
2: Parameters:
3: P ← {P1 , P2 , . . . , Pp }
4: Q ← {Q1 , Q2 , . . . , Qq }
5: πP ← {XP1 , XP2 , . . . , XPp }
6: πQ ← {XQ1 , XQ2 , . . . , XQq }
7: P 0 := refactor(P, πP , πP t πQ )
8: Q 0 := refactor(Q, πQ , πP t πQ )
9: (πO , O, PN0 , QN0 ) := compute_gen_join(P 0 , Q 0 , πP t πQ )
10: G := remove_common_gen(GPN0 ∪ GQN0 )
11: C := incr_chernikova(G, GPN0 , CPN0 )
12: O.add((C, G))
13: return (O, πO )
14: end function

In this case, the refactoring gives us,

πP t πQ = {{x1 , x2 , x3 , x4 }, {x5 }, {x6 }},


P 0 = {{x1 = x2 , x2 = 2, x3 6 2}, {x5 = 1}, {x6 = 2}},
Q 0 = {{x1 = 2, x2 = 2, x3 6 2}, {x5 = 2}, {x6 = 3}}.

We observe that only P10 = Q10 ; thus, we add P10 to the join O and {x1 , x2 , x3 , x4 } to U.
Applying Algorithm 2.7 we get,

N = {{x5 }, {x6 }},


PN0 = {x5 = 1, x6 = 2},
0
QN = {x5 = 2, x6 = 3},
O = {{x1 = x2 , x2 = 2, x3 6 2}},
πO = {{x1 , x2 , x3 , x4 }, {x5 , x6 }}.

GQN0 contains only one vertex (2, 3). The conversion transformer incrementally adds this vertex to the polyhedron defined by CPN0 . Thus, the factors O1 and O2 of O =
{O1 , O2 } are given by,

O1 = {x1 = x2 , x2 = 2, x3 6 2} and
O2 = {−x5 6 −1, x5 6 2, x6 = x5 + 1}.

As shown in Section 2.3, π̄PtQ = U ∪ ⋃A∈π\U A. Note that we can have π̄O 6= πO even though π̄P = πP and π̄Q = πQ . This is because the join transformer will not have a constraint involving a variable xi if either P or Q does not contain any constraint involving xi . We illustrate this with an example below:

Example 2.4.3. Consider

P = {{x1 = 0}, {x2 − x3 = 2, x3 − x4 = 3}} and
Q = {{x1 = 0}, {x2 − x4 = 5}} with
π̄P = πP = {{x1 }, {x2 , x3 , x4 }} and
π̄Q = πQ = {{x1 }, {x2 , x4 }, {x3 }}.

For this example, Algorithm 2.8 returns

O = {{x1 = 0}, {x2 − x4 = 5}} and
π̄O = {{x1 }, {x2 , x3 , x4 }},

whereas πO = {{x1 }, {x2 , x4 }, {x3 }}. Thus πO v π̄O .

improving the granularity of π̄O We lose performance since π̄O is usually not the finest partition for O. To possibly improve the partition obtained, we per-
form a preprocessing step before applying Algorithm 2.7 in our join transformer.
If all variables of a block Xk ∈ πP t πQ are unconstrained in either P or Q, then the
join does not require any constraints involving these variables. We replace Xk in
πP t πQ with a set of singleton blocks. This set has one block for each variable in
Xk . Pk0 and Qk0 are not considered for the join.
If only a subset of the variables of Xk ∈ πP t πQ are unconstrained in either P or Q, then we cannot remove the unconstrained variables from Xk , as the join may require constraints involving them. For example, x3 is unconstrained in Q in Example 2.4.3; however, the constraints involving x3 are required for the join or else we lose precision. In Chapter 3 we will present a refinement which will produce a finer output partition for the join.
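A Python sketch of the preprocessing step described above (illustrative, with hypothetical helper names): blocks of πP t πQ whose variables are all unconstrained in P or in Q are split into singletons and excluded from the join.

def split_unconstrained_blocks(common_partition, constrained_in_p, constrained_in_q):
    # Illustrative sketch; `constrained_in_p` / `constrained_in_q` are the sets of
    # variables that appear in some constraint of P and Q respectively. A block
    # whose variables are all unconstrained in one input cannot contribute
    # constraints to the join, so it is replaced by singleton blocks and skipped.
    kept, singletons = [], []
    for block in common_partition:
        if block.isdisjoint(constrained_in_p) or block.isdisjoint(constrained_in_q):
            singletons.extend({v} for v in block)
        else:
            kept.append(block)
    return kept, singletons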
It is important to note that the key to keeping the cost of the join down is to
reduce the application of the ./ transformer as it increases the number of generators
exponentially, which in turn, increases the cost of the expensive conversion. The
./ transformer is applied in Algorithm 2.8 during refactoring and while merging
factors corresponding to A. In practice, πP and πQ are usually similar so the ./
transformer adds a small number of generators while refactoring.

why theorem 2.2.5 works in practice In the program analysis setting, the
join is applied at the loop head or where the branches corresponding to if-else state-
ments merge. In case of a loop head, P represents the polyhedron before executing
the loop body and Q represents the polyhedron after executing the loop. The loop
usually modifies only a small number of variables. The factors corresponding to
the blocks containing only the unmodified variables are equal; thus |⋃A∈π\U A| is small. Hence, the application of the ./ transformer while merging factors corre-
sponding to A does not create an exponential number of new generators. Similarly
for the if-else, the branches modify only a small number of variables and thus
|⋃A∈π\U A| remains small.

x:=0;
y:=0;
if (∗){
x ++;
y ++;
}
z := x ;

Figure 2.6: Precision loss for static partitioning.

comparison with static partitioning It is also worth noting that deter-


mining unmodified blocks before running the analysis requires knowledge of the
partition at the start of the loop. Partitions computed based on the dependence
relation between program variables may not be permissible as the abstract seman-
tics of the Polyhedra transformers may relate more variables, resulting in precision
loss. This is illustrated by the code in Fig. 2.6.
Here an analysis based on the dependence relation [27] will yield the partition
{{x, z}, {y}} after the assignment z:=x, since the variables x and y are unrelated. How-
ever, the join due to the conditional if-statement creates a constraint between x and
y; thus, πP = {{x, y, z}} which is computed by our analysis.

complexity The performance of the join transformer is dominated by the cost


of the conversion transformer. The conversion incrementally adds the generators
corresponding to GQ to the polyhedron defined by the constraints in CP . The worst
case complexity of the conversion is exponential in the number of generators. For
the join without partitioning, the number of generators can be O(2n ) in the worst
case. The join transformer in Algorithm 2.8 applies the conversion only on the
generators in GQN0 . Using the notation from Section 2.3, let N = ⋃A∈π\U A be the union of all blocks in π for which the corresponding factors are not equal; then the number of generators in GQN0 can be O(2|N| ) in the worst case. In practice, usually 2|N| ≪ 2n , resulting in a huge reduction in operation count.
An alternative approach for computing CO could be to use (2.5), however this is
more expensive than applying the conversion. This is because the Fourier-Motzkin
elimination can generate a quadratic number of new constraints for each variable
that it projects out. Many of the generated constraints are redundant and should
be removed to keep the algorithm efficient. Redundancy checking is performed by
calling a linear solver for every constraint which slows down the computation. We
note that recent work [137, 138, 218] makes the redundancy removal more efficient
thereby improving its feasibility.

2.5 experimental evaluation

In this section, we evaluate the effectiveness of our decomposition approach for


analyzing realistic programs. We implemented all of our algorithms in the form
of a library for numerical domains which we call ELINA [1]. The source code of
ELINA is publicly available at http://elina.ethz.ch.
We compare the performance of ELINA against NewPolka [104] and PPL [12],
both widely used state-of-the-art libraries for Polyhedra domain analysis. PPL uses
the same basic algorithms as NewPolka, but uses a lazy approach for the conver-
sion whereas NewPolka uses an eager approach. PPL is faster than NewPolka
for some transformers and slower for others. Like NewPolka, ELINA uses an ea-
ger approach. The experimental results of our evaluation show that the polyhedra
arising during analysis can indeed be kept partitioned using our approach. We
demonstrate dramatic savings in both time and memory across all benchmarks.

2.5.1 Experimental Setup

In ELINA we use rational numbers encoded using 64-bit integers as in NewPolka


and PPL. In the case of an integer overflow, all libraries set the polyhedron to >.

platform All of our experiments were carried out on a 3.5 GHz Intel Quad
Core i7-4771 Haswell CPU. The sizes of the L1, L2, and L3 caches are 256 KB, 1024
KB, and 8192 KB, respectively, and the main memory has 16 GB. Turbo boost was
disabled for consistency of measurements. All libraries were compiled with gcc
5.2.1 using the flags -O3 -m64 -march=native.

analyzer We use the crab-llvm analyzer which is part of the SeaHorn [91] ver-
ification framework. The analyzer is written in C++ and analyzes LLVM bitcode
for C programs. It generates polyhedra invariants which are then checked for sat-
isfiability with an SMT-solver. The analysis is intra-procedural and the time for
analyzing different functions in the analyzed program varies.

2.5.2 Experimental Results

We measured the time and memory consumed for the Polyhedra analysis by New-
Polka, PPL, and ELINA on more than 1500 benchmarks. We used a time limit of 4
hours and a memory limit of 12 GB for our experiments.

benchmarks We tested the analyzer on the benchmarks of the popular soft-


ware verification competition [24]. The competition provides benchmarks in differ-
ent categories. We chose three categories which are suited for the analysis with a
numerical domain: (a) Linux Device Drivers (LD), (b) Control Flow (CF), and (c)

Table 2.3: Speedup of Polyhedra domain analysis for ELINA over NewPolka and PPL.

Benchmark         Category  LOC    NewPolka            PPL                 ELINA               Speedup ELINA vs.
                                   time(s)  mem(GB)    time(s)  mem(GB)    time(s)  mem(GB)    NewPolka  PPL
firewire_firedtv  LD        14506  1367     1.7        331      0.9        0.4      0.2        3343      828
net_fddi_skfp     LD        30186  5041     11.2       6142     7.2        9.2      0.9        547       668
mtd_ubi           LD        39334  3633     7          MO       MO         4        0.9        908       ∞
usb_core_main0    LD        52152  11084    2.7        4003     1.4        65       2          170       62
tty_synclinkmp    LD        19288  TO       TO         MO       MO         3.4      0.1        >4235     ∞
scsi_advansys     LD        21538  TO       TO         TO       TO         4        0.4        >3600     >3600
staging_vt6656    LD        25340  TO       TO         TO       TO         2        0.4        >7200     >7200
net_ppp           LD        15744  TO       TO         10530    0.15       924      0.3        >16       11.4
p10_l00           CF        592    841      4.2        121      0.9        11       0.8        76        11
p16_l40           CF        1783   MO       MO         MO       MO         11       3          ∞         ∞
p12_l57           CF        4828   MO       MO         MO       MO         14       0.8        ∞         ∞
p13_l53           CF        5816   MO       MO         MO       MO         54       2.7        ∞         ∞
p19_l59           CF        9794   MO       MO         MO       MO         70       1.7        ∞         ∞
ddv_all           HM        6532   710      1.4        85       0.5        0.05     0.1        12772     1700

Heap Manipulation (HM). Each of these categories contains hundreds of bench-


marks and invariants that cannot be expressed using weaker domains such as Oc-
tagon, Zone, or others.
Table 2.3 shows the time (in seconds) and the memory (in GB) consumed for Poly-
hedra analysis with NewPolka, PPL, and ELINA on 14 large benchmarks. In the
table, the entry TO means that the analysis did not finish within 4 hours. Similarly,
the entry MO means that the analysis exceeded the memory limit. The benchmarks
in the table were selected based on the following criteria:
• The analysis ran for > 10 minutes with NewPolka.

• There was no integer overflow during the analysis for the most time consum-
ing function in the analyzed program.
At each step of the analysis, our algorithms obtain mathematically/semantically
the same polyhedra as NewPolka and PPL, just represented differently (decom-
posed). In the actual implementation, since our representation contains different
numbers, ELINA may produce an integer overflow before NewPolka or PPL. How-
ever, on the benchmarks shown in Table 2.3, NewPolka overflowed 296 times
whereas ELINA overflowed 13 times. We also never overflowed on the procedures
in the benchmarks that are most expensive to analyze (neither did NewPolka and
PPL). Thus ELINA does not benefit from faster convergence due to integer over-
flows which sets the corresponding polyhedra to >.
We show the speedups for ELINA over NewPolka and PPL which range from
one to at least four orders of magnitude. In the case of a time out, we provide
a lower bound on the speedup, which is very conservative. Whenever there is
memory overflow, we show the corresponding speedup as ∞, because the analysis
can never finish on the given machine even if given arbitrary time.
Table 2.3 also shows the number of lines of code for each benchmark. The largest
benchmark is usb_core_main0 with 52K lines of code. ELINA analyzes it in 65 sec-
onds whereas NewPolka takes > 3 hours and PPL requires > 1 hour. PPL performs

Table 2.4: Partition statistics for Polyhedra analysis with ELINA.

Benchmark         |X| max  |X| avg  |N| max  |N| avg  nb max  nb avg  trivial/total
firewire_firedtv  159      80       24       5        31      6       10/577
net_fddi_skfp     589      111      89       24       89      15      76/5163
mtd_ubi           528      60       111      10       57      12      27/2518
usb_core_main0    365      72       267      29       61      15      80/14594
tty_synclinkmp    332      47       48       8        34      10      23/3862
scsi_advansys     282      67       117      11       82      19      11/2315
staging_vt6656    675      53       204      10       62      6       35/1330
net_ppp           218      59       112      33       19      5       1/2350
p10_l00           303      184      234      59       38      29      0/601
p16_l40           188      125      86       39       53      38      4/186
p12_l57           921      371      461      110      68      28      4/914
p13_l53           1631     458      617      149      78      28      5/1325
p19_l59           1272     476      867      250      65      21      9/1754
ddv_all           45       22       7        2        14      8       5/124

better than NewPolka on 5 benchmarks whereas NewPolka has better performance


than PPL on 2 benchmarks. Half of the benchmarks in the Linux Device Drivers
category do not finish within the time and memory limit with NewPolka and PPL.
net_ppp takes the longest to finish with ELINA (≈ 15 minutes).
All benchmarks in the Control Flow category run out of memory with both
NewPolka and PPL except for p10_l00 which is also the smallest. This is because
all benchmarks in this category contain a large number of join points which creates
an exponential number of generators for both libraries. With our approach, we
are able to analyze all benchmarks in this category in 6 3 GB. There are > 500
benchmarks in this category not shown in Table 2.3 that run out of memory with
both libraries whereas ELINA is able to analyze them.
There is only one large benchmark in the Heap Manipulation category. For it we
get a 12722x speedup and also save 14x in memory over NewPolka. The gain over
PPL is 1700x in time and 5x in memory.
We gathered statistics on the number of variables (|X|), the size of the largest
block (|N|) in the respective partition, and its number of blocks (nb) after each join
for all benchmarks. Table 2.4 shows max and average of these quantities. It can be
seen that the number of variables in N is significantly smaller than in X resulting in
complexity gains. The last column shows the fraction of the times the partition is
trivial (equal to {X}). It is very low and happens only when the number of variables
is very small.
The bottleneck for the analysis is the conversion applied on GPtQ during the join transformer. ELINA applies the conversion on GPN0 tQN0 , which contains variables from the set N = ⋃A∈π\U A, whereas NewPolka and PPL apply the conversion for all variables in the set X.

[Two plots: "Number of variables at join" (all joins) and "Number of variables at join: zoom-in on 13000 onwards"; x-axis: Join Number, y-axis: number of variables, series: NewPolka and ELINA.]

Figure 2.7: The join transformer during the analysis of the usb_core_main0 benchmark. The x-axis shows the join number and the y-axis shows the number of variables in N = ⋃A∈π\U A (subset of variables affected by the join) and in X. The first figure shows these values for all joins whereas the second figure shows it for one of the expensive regions of the analysis.

The first part of Fig. 2.7 plots the number of variables in N
and in X for all joins during the analysis of the usb_core_main0 benchmark. |X| varies
for all joins at different program points. It can be seen that the number of variables
in N is close to the number of variables in X till join number 5000. Although the
number of variables is large in this region, it is not the bottleneck for NewPolka
and PPL as the number of generators is linear in the number of variables. We get a
speedup of 4x mainly due to our conversion transformer which leverages sparsity.
The most expensive region of the analysis for both NewPolka and PPL is after
join number 5000 where the number of generators grows exponentially. In this
region, N contains 9 variables on average whereas X contains 54. The second part
of Fig. 2.7 zooms in one of these expensive regions. Since the cost of conversion
depends exponentially on the number of generators which in turn depends on the
number of variables, we get a large speedup.
We also measured the effect of optimizations not related to partitioning on the
overall speedup. The maximum difference was on the net_ppp benchmark which
was 2.4x slower without the optimizations.

remaining benchmarks Above we presented the results for 14 large bench-


marks. The remaining benchmarks either finish or run out of memory in < 10
minutes with NewPolka or the analysis produces an integer overflow in the most
time consuming function. The bound on the speedup for these benchmarks ranges
from 2x to 76x.

2.6 discussion

Program analysis with the exponentially expensive Polyhedra domain was believed
to be intractable for analyzing large real-world programs for around 40 years. In
this chapter, we presented a theoretical framework, and its implementation, for
speeding up the Polyhedra domain analysis by orders of magnitude without los-
ing precision. Our key idea is to decompose the analysis and its transformers to
work on sets of smaller polyhedra, thus reducing its asymptotic time and space
complexity. This is possible because the statements in real-world programs affect
only a few variables in the polyhedra. As a result, the variable set can be parti-
tioned into independent subsets. The challenge in maintaining these partitions is
in handling their continuous change during the analysis. These changes cannot
be predicted statically in advance. Our partition computations leverage dynamic
analysis state and the semantics of the Polyhedra transformers. These computa-
tions are fast and produce sufficiently fine partitions, which enables significant
speedups. Precision-wise, our decomposed analysis computes polyhedra seman-
tically equivalent to those produced by the original non-decomposed analysis at
each step. Overall our analysis computes the same invariants as the original one,
but significantly faster.
We provided a complete end-to-end implementation of the Polyhedra domain
analysis within ELINA [1]. Benchmarking against two state-of-the-art libraries for
the Polyhedra analysis, namely, NewPolka and PPL, on real-world programs in-
cluding Linux device drivers and heap manipulating programs showed orders of
magnitude speedup or successful completion where the others time-out or exceed
memory. We believe that our framework presents a significant step forward in mak-
ing Polyhedra domain analysis practical for real-world use.
In the next chapter, we will show that our theoretical framework of online decom-
position is generic and can be extended to any numerical domain that maintains
linear constraints between program variables.
3 GENERALIZING ONLINE DECOMPOSITION

In Chapter 2, we presented our theoretical framework for decomposing the stan-


dard implementation of the Polyhedra domain that is based on the most precise
Polyhedra transformers. However, the Polyhedra domain can also be implemented
differently, with other, less precise transformers. Further, there are other existing
popular numerical domains such as Octahedron [51], TVPI [183], Octagon [142],
and Zone [140] and these too can be implemented in a variety of ways. So the
basic question is how to apply the idea of online decomposition to all numerical
domains and potentially different implementations.
The online decomposition for the Polyhedra [190] domain (Chapter 2) is special-
ized for the particular implementations of the domain. In [189], we developed an
online decomposition for the Octagon domain. In both cases, the decomposition
was manually designed from scratch for the standard transformers of the particu-
lar domain. The downside of this approach is that the substantial effort invested in
decomposing the transformers of the specific implementation of the domain can-
not be reused and needs to be repeated for every new implementation. This task is
difficult and error-prone as it requires devising new algorithms and data structures
from scratch each time.
To illustrate the issue, consider an element I = {−x1 − x2 6 0, −x3 6 0, −x4 6 0}
in the Octagon domain (which captures constraints of the form ±xi ± xj 6 c, c ∈ R,
between the program variables) and the conditional expression x2 + x3 + x4 6 1.
There are multiple ways to define a sound conditional transformer in the Octagon
domain for the given conditional expression. One may define a sound conditional
transformer T1 that adds the non-redundant constraint −x1 + x4 6 1 to I resulting
in the output I 0 = {−x1 − x2 6 0, −x3 6 0, −x4 6 0, −x1 + x4 6 1} whereas another
transformer T2 may add x2 + x3 6 1 to I resulting in I 00 = {−x1 − x2 6 0, −x3 6
0, −x4 6 0, x2 + x3 6 1}. The specialized decomposition for the Octagon domain
[189] requires access to the exact definition of the transformer, i.e., it will produce
different decompositions for T1 and T2 as the set of variables in the constraints
added by the two transformers are disjoint.

this chapter Our key objective is to bring the power of decomposition to all
sub-polyhedra domains without requiring complex manual effort from the domain
designer. This enables domain designers to achieve speed-ups without requiring

them to rewrite all abstract transformers from scratch each time. More formally, our
goal is to provide a systematic correct-by-construction method that, given a sound
abstract transformer T in a sub-polyhedra domain (e.g., Zone), generates a sound
decomposed version of T that is faster than T and does not require any change to the
internals of T. In this chapter, we present a construction that achieves this objective
under certain conditions. We provide theoretical guarantees on the convergence,
monotonicity, and precision of the decomposed analysis with respect to the non-
decomposed analysis. We also show that the obtained decomposed transformers
are faster than the prior, hand-tuned decomposed domains from [189, 190].
The work in this chapter was published in [191].

main contributions We make the following contributions:


• We introduce a general construction for obtaining decomposed transform-
ers from given non-decomposed transformers of existing numerical domains.
Our construction is “black-box:” it does not require changes to the underlying
algorithms implemented in the original non-decomposed transformers.

• We provide conditions on the non-decomposed transformers under which


our decomposition maintains precision and equivalence at fixpoint.

• We apply our method to decompose standard transformers of three popular


and expensive domains: Polyhedra, Octagon, and Zone. For these, we provide
complete end-to-end implementations as part of ELINA [1].

• We evaluate the effectiveness of our decomposed analysis against state-of-the-


art implementations on large real-world benchmarks including Linux device
drivers. Our evaluation shows up to 6x and 2x speedups compared to state-of-
the-art manually decomposed domain implementations and orders of magni-
tude speedups compared to non-decomposed Polyhedra and Octagon imple-
mentations. For Zone, we achieve speedups of up to 6x compared to our own,
non-decomposed implementation.

3.1 generic model for numerical abstract domains

In this section, we introduce a generic model for the abstract domains to which
our theory applies. An abstract domain consists of a set of abstract elements and
a set of transformers that model the effect of program statements (assignment,
conditionals, etc.) and control flow (join etc.) on the abstract elements. Let X =
{x1 , x2 , . . . , xn } be a set of program variables. We consider sub-polyhedra domains,
i.e., numerical abstract domains D that encode linear relationships between the
variables in X of the form:
Σni=1 ai · xi ⊗ c, where xi ∈ X, ai ∈ Z, ⊗ ∈ {6, =}, c ∈ C. (3.1)

Typical choices for C include Q (rationals) and R (reals). As with any abstraction,
the design of a numerical domain is guided by the cost vs. precision tradeoff. For
instance, the Polyhedra domain [57] is the most precise domain yet it is also the
most expensive. On the other hand, the Interval domain is cheap but also very
imprecise as it does not preserve relational information between variables. Between
these two sit a number of domains with varying degrees of precision and cost:
examples include Two Variables Per Inequality (TVPI) [143], Octagon [142], and
Zone [140].

representing domain constraints We introduce notation for describing


the set of constraints a given domain D can express for the variables in X. This
set of constraints is referred to as LX,D and is determined by four components
(n, R, T, C):

• The size n of the variable set X.

• A relation R ⊆ R1 × R2 × · · · × Rn to describe the universe of possible co-


efficients. Each Ri ⊆ Z is a set of integers defining the allowed values for
the coefficient ai . Typical examples for Ri include Z, U = {−1, 0, 1}, and
L = {−2k , 0, 2k | k ∈ Z}.

• The set T ⊆ {6, =} determining equality/inequality constraints.

• The set C containing the allowed values for the constant c in (3.1). Typical
examples include Q and R.

Table 3.1 shows common constraints in the above notation allowed by different
numerical domains. The set of constraints LX,D representable by a domain D con-
tains all constraints of the form Σni=1 ai · xi ⊗ c where: (i) the coefficient list of each expression Σni=1 ai · xi is a permutation of a tuple in R, (ii) ⊗ ∈ T, and (iii) the
constant c ∈ C. For instance, the possible constraints LX,Octagon for the Octagon
domain over real numbers are described via the tuple (n, U2 × {0}n−2 , {6, =}, R).

Example 3.1.1. Consider a program with four variables and a fictive domain that
can relate at most two:

X = {x1 , x2 , x3 , x4 } and LX,D : (4, U2 × {0}2 , {6, =}, {1, 2}).

Here, the constraint 2x1 + 3x4 6 2 6∈ LX,D as no permutation of tuples in U2 × {0}2


can produce (2, 0, 0, 3). Similarly, x2 − x3 6 3 6∈ LX,D even though there exists a
permutation of tuples in U2 × {0}2 that can produce (0, 1, −1, 0), but 3 6∈ C. However,
the constraints x2 − x3 6 1 and x2 − x3 = 2 are in LX,D .
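To make the (n, R, T, C) description concrete, the following Python sketch (our own helper, not part of ELINA; it is intentionally naive) checks whether a candidate constraint belongs to LX,D for the fictive domain of Example 3.1.1:

from itertools import permutations

def in_domain_language(coeffs, op, const, R, T, C):
    # Illustrative sketch of the three membership conditions for L_{X,D}:
    # (i) the coefficient list is a permutation of some tuple in R,
    # (ii) the comparison operator is in T, and (iii) the constant is in C.
    fits_R = any(tuple(coeffs) in set(permutations(r)) for r in R)
    return fits_R and op in T and const in C

# The fictive domain of Example 3.1.1: at most two unit coefficients, C = {1, 2}.
U = (-1, 0, 1)
R = [(a, b, 0, 0) for a in U for b in U]          # U^2 x {0}^2
T = {'<=', '='}
C = {1, 2}

print(in_domain_language([0, 1, -1, 0], '<=', 1, R, T, C))  # x2 - x3 <= 1 -> True
print(in_domain_language([0, 1, -1, 0], '<=', 3, R, T, C))  # x2 - x3 <= 3 -> False (3 not in C)
print(in_domain_language([2, 0, 0, 3], '<=', 2, R, T, C))   # 2x1 + 3x4 <= 2 -> False (no matching tuple)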

Table 3.1: Instantiation of constraints expressible in various numerical domains.


Domain R T C Reference

Polyhedra Zn {6, =} Q, R [57]


Linear equality Zn {=} Q, R [112]
Octahedron Un {6, =} Q, R [51]
Stripes {(a, a, −1, 0, . . . , 0) | a ∈ Z}∪ {6, =} Q, R [71]
{(0, a, −1, 0, . . . , 0) | a ∈ Z}
TVPI Z2 × {0}n−2 {6, =} Q, R [183]
Octagon U2 × {0}n−2 {6, =} Q, R [142]
Logahedra L2 × {0}n−2 {6, =} Q, R [101]
Zone {1, 0} × {0, −1} × {0}n−2 {6, =} Q, R [140]
Upper bound {1} × {−1} × {0}n−2 {6} {0} [134]
Interval {1, −1} × {0}n−1 {6, =} Q, R [54]

defining an abstract domain An abstract element I in a domain D is


a conjunction of a finite number of constraints from LX,D . By abuse of notation
we will represent I as a set of constraints (interpreted as a conjunction of the con-
straints in the set). We ignore the equivalent generator representation of an abstract
element where the element is encoded as a collection of vertices, rays, and lines
(Section 2.1.1) as the constraint representation leads to a clearer exposition of the
ideas. However, our technical results are also valid with the generator representa-
tion. The set of all possible abstract elements in D is denoted with PD and typically
forms a lattice (PD , v, t, u, >, ⊥) with respect to the defined domain order v. Given
abstract elements I and I 0 , I t I 0 is the smallest element in the domain covering both
I and I 0 and is computed or approximated by the join transformer. Similarly I u I 0
is the meet, computed, e.g., as I ∪ I 0 . While our theory handles all transformers in
a given domain D, our presentation focuses on the core transformers (as in Chap-
ter 2), namely: conditional containing a linear constraint, assignment with a linear
expression, meet (u), join (t), and widening (5). We chose these because they are
the most expensive domain transformers and thus their design shows the most
variation, i.e., they can be implemented in multiple ways.
As is standard, we use the meet-preserving concretization function γ to denote
with γ(I) the concrete element (polyhedron) represented by the abstract element I.
We note that it is possible for I to include redundant constraints, that is, removing
a constraint from I may not change the represented concrete element γ(I). Further,
the minimal (without any redundancy) representation of a concrete element γ(I)
need not be unique, i.e., there could be two distinct abstract elements I and I 0 with
γ(I) = γ(I 0 ):
Example 3.1.2. I = {x1 = 0, x2 = 0} and I 0 = {x1 = 0, x2 = 0, x1 = x2 } represent
the same concrete element γ(I) in the Polyhedra domain. However, I 0 contains the
redundant constraint x1 = x2 . I is not the only minimal representation as I 00 =
{x1 = 0, x1 = x2 } is also minimal for γ(I).

We next define what it means for an abstract transformer to be sound.1

Definition 3.1.1. A given abstract transformer T is sound w.r.t to its concrete trans-
former T # iff for any element I ∈ D, T # (γ(I)) ⊆ γ(T (I)).

The soundness criterion above is naturally extended to transformers with multi-


ple arguments.

Definition 3.1.2. We say an abstract domain D is closed (also called forward com-
plete in [83, 164]) for a concrete transformer T # (e.g., conditional, meet) iff it can
be done precisely in the domain, i.e., if there exists an abstract transformer T cor-
responding to that concrete transformer such that for any abstract element I in D,
γ(T (I)) = T # (γ(I)).

The Polyhedra domain is closed for conditional, assignment, and meet, but not
for the join. All other domains in Table 3.1 are only closed for the meet. Indeed,
a crucial aspect of abstract interpretation is to permit sound approximations of
transformers for which the domain is not closed.

Example 3.1.3. The Octagon domain is not closed for the conditional transformer.
For example, if the condition is x1 − 2x2 6 0 and the abstract element is I = {x1 6
1, x2 6 0}, then the concrete element T # (γ(I)) = {x1 6 1, x2 6 0, x1 − 2x2 6 0} is not
representable exactly in the Octagon domain (because the constraint x1 − 2x2 6 0
is not exactly representable).

A useful concept in analysis (and one we refer to throughout the chapter) is that
of a best abstract transformer.

Definition 3.1.3. A (unary) abstract transformer T in D is best iff for any other
sound unary abstract transformer T 0 (corresponding to the same concrete trans-
former T # ) it holds that for any element I in D, T always produces a more precise
result (in the concrete), that is, γ(T (I)) ⊆ γ(T 0 (I)). The definition is naturally lifted
to multiple arguments.

In Example 3.1.3, a possible sound approximation for the output in the Octagon
domain is I 00 = I while a best transformer would produce {x1 6 0, x2 6 0, x1 − x2 6
0}. Since there can be multiple abstract elements with the same concretization, there
can be multiple best abstract transformers in D. We require the sub-polyhedra
abstract domains to be equipped with a best transformer and also be closed under
meet. Due to these restrictions, our theory does not apply to the Zonotope [81, 82,
191] and DeepPoly [188] domains.

1 Throughout the chapter we will simply use the term transformer to mean a sound abstract trans-
former.

3.2 decomposing abstract elements

In this section, we introduce the needed notation and concepts for decomposing
abstract elements and transformers. We extend the terminology of partitions, blocks,
and factors as introduced in Section 2.2.1 for handling the decomposition of the
elements of the abstract domain D. We write πI for referring to the unique finest
partition for an element I in D. Each factor Ik ⊆ I is defined by the constraints that
exist between the variables in the corresponding block Xk ∈ πI . I can be recovered
from the set of factors by taking the union of the constraint sets Ik .
Example 3.2.1. Consider the element I = {x1 − x2 6 1, x3 6 0, x4 6 0} in the TVPI
domain
X = {x1 , x2 , x3 , x4 } and LX,TVPI : (4, Z2 × {0}2 , {6, =}, Q).
Here X can be partitioned into three blocks with respect to I resulting in three
factors:

πI = {{x1 , x2 }, {x3 }, {x4 }}, I1 = {x1 − x2 6 1}, I2 = {x3 6 0}, and I3 = {x4 6 0}.

For a given D, π⊥ = π> = ⊥ = {{x1 }, {x2 }, . . . , {xn }}. More generally, note that
I v I 0 does not imply that πI 0 is finer, coarser, or comparable to πI .

different partitions for equivalent elements To gain a deeper under-


standing of partitions for abstract elements, there are two interesting points worth
noting. First, it is possible that two semantically equivalent abstract elements I, I 0
in the domain have different partitions. That is, even if γ(I) = γ(I 0 ), it may be the
case that πI 6= πI 0 or πI @ πI 0 :
Example 3.2.2. Consider I = {x1 6 x2 , x2 = 0, x3 = 0} with the finest partition
πI = {{x1 , x2 }, {x3 }}, I 0 = {x1 6 0, x2 = 0, x3 = 0} with πI 0 = {{x1 }, {x2 }, {x3 }} and
I 00 = {x1 6 x3 , x2 = 0, x3 = 0} with πI 00 = {{x1 , x3 }, {x2 }} in the Polyhedra domain.
Here γ(I) = γ(I 0 ) = γ(I 00 ), but the partitions are pairwise different.
Second, it is possible that for a given abstract element I, there exists an equivalent
element I 0 with finer partition but I 0 is not representable in the domain. This shows
a potential limitation of syntactic partitions used in our framework.
Example 3.2.3. Consider the Stripes domain

X = {x1 , x2 , x3 }, LX,Stripes : {3, {(a, a, −1) | a ∈ Z} ∪ {(0, a, −1) | a ∈ Z}, {6, =}, Q},

and the abstract element

I = {x1 + x2 − x3 = 0, −x2 + x3 = 0} with πI = {x1 , x2 , x3 }.

This domain cannot represent the equivalent element I 0 = {x1 = 0, x2 − x3 = 0} with


partition πI 0 = {{x1 }, {x2 , x3 }}, which is finer than πI . This is because the constraint
x1 = 0 is not representable in the Stripes domain.

It is important to guarantee that, regardless of how approximate a given transformer T is, the partition we end up computing for T is always sound (permissible) for the output abstract element I produced by T . Next, we extend Definition 2.2.1:

Definition 3.2.1. A partition π is permissible for an abstract element I in D if it is


coarser than πI , that is, if π w πI .

The variables related in πI are also related in any permissible partition of I,


but not vice-versa. In Example 3.2.1, {{x1 , x2 }, {x3 , x4 }} is permissible for I while
{{x1 }, {x2 , x3 , x4 }} is not. We will use πI to denote a permissible partition for I.

3.3 recipe for decomposing transformers

A primary objective of this work is to define a mechanical recipe which takes as


input a sound abstract transformer and produces as output a sound and decom-
posed variant of that transformer, thus resulting in better analysis performance. In
this section we describe the general recipe and illustrate its actual use.
At first glance the above challenge appears fundamentally difficult because there
are many ways to define a sound transformer in a domain D. Standard implemen-
tations of popular numerical domains like Octagon, Zone, TVPI, and others, do
not necessarily implement the best transformers as they can be expensive; instead
the domains often approximate them. Interestingly, as pointed out earlier, such an
approximation can make the associated partition both coarser or finer. That is, the
partitioning function is not monotone. Here is an example illustrating this point:

Example 3.3.1. Consider the elements I = {x1 6 0, x2 6 0, x1 − x2 6 0} with πI =


{{x1 , x2 }} and I 0 = {x1 6 0, x2 6 0} with πI 0 = {{x1 }, {x2 }} in the Octagon domain.
Here, γ(I) ⊂ γ(I 0 ) and πI w πI 0 . Now consider the element I 00 = {x1 + x2 6 0} with
πI 00 = {{x1 , x2 }}. Here, γ(I 0 ) ⊂ γ(I 00 ) and πI 0 v πI 00 .

Definition 3.3.1. A transformer T in D is decomposable for input I iff the output


IO = T (I) results in a partition πIO 6= >. For binary transformers, the definition is
analogous.

There are many ways to define sound approximations of the best transformers in
D. As a consequence, it is possible to have two transformers T1 , T2 in D on the same
input I such that one produces the > partition for the output while the other does
not. There are two principal ways to obtain a decomposable transformer: (a) white
box: here, one designs the transformer from scratch, maintaining the (changing)
partitions during the analysis, and (b) black box: here, one provides a construction
for decomposing existing transformers without knowing their internals. In the next
section, we pursue the second approach and show that it is possible and, under
certain conditions there is no loss of precision. As a preview, we describe the high-
level steps that one needs to perform dynamically in our black-box decomposition.

a construction for online transformer decomposition There are four main steps for decomposing a given transformer:

1. compute (if needed) partitions for the input(s) of the transformer,

2. compute a partition for the output based on the program statement/expres-


sion and input partition(s) from step 1,

3. refactor the inputs according to the partition from step 2,

4. apply the transformer on one or more factors of the inputs from step 3.

We next describe these steps in greater detail.


In an ideal setting, one would always work with the finest partition for the inputs
and the output to (optimally) reduce the cost of the transformer. The finest parti-
tion for the inputs of a given transformer can always be computed from scratch
by taking the abstract element and connecting the variables that occur in the same
constraint in that element. The downside is that this computation may incur signif-
icant overhead. For example, computing the finest partition for an element in the
Octagon domain from scratch has the same quadratic complexity (in the number
of variables) as the standard conditional, meet, and assignment transformers. Thus,
it may nullify potential performance gains from decomposing these transformers.
In our approach we iteratively maintain permissible partitions.
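For reference, a Python sketch (our own illustration, not ELINA's code) of computing the finest partition of an element from scratch by connecting variables that occur in the same constraint; as noted above, doing this at every step can be as expensive as the transformer itself.

from collections import defaultdict, deque

def finest_partition(variables, constraints):
    # Illustrative sketch. `constraints` is an iterable of variable sets (the
    # variables occurring in each constraint). Two variables end up in the same
    # block iff they are connected through a chain of constraints.
    adj = defaultdict(set)
    for con_vars in constraints:
        con_vars = list(con_vars)
        for v in con_vars[1:]:
            adj[con_vars[0]].add(v)
            adj[v].add(con_vars[0])

    partition, seen = [], set()
    for v in variables:
        if v in seen:
            continue
        block, queue = set(), deque([v])
        seen.add(v)
        while queue:
            u = queue.popleft()
            block.add(u)
            for w in adj[u]:
                if w not in seen:
                    seen.add(w)
                    queue.append(w)
        partition.append(block)
    return partition

# I = {x1 - x2 <= 1, x3 <= 0, x4 <= 0} yields {{x1, x2}, {x3}, {x4}} (Example 3.2.1).
print(finest_partition(["x1", "x2", "x3", "x4"], [{"x1", "x2"}, {"x3"}, {"x4"}]))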
To compute the output partition, a naive way is to first run the transformer,
obtain an abstract element as a result, and then compute the partition for that el-
ement. Of course, this approach is useless since running the standard transformer
prevents performance gains. Thus, the challenge is to determine first a permissible
output partition that is not too coarse at little cost so that then the transformers can
be applied only to relevant factors. Indeed, in our construction we compute a per-
missible partition for the output based on permissible partitions of the input, the
program statement, and possibly additional information that is cheaply available.
In practice, our computed partitions are reasonably fine so that we get significant
performance gains. We provide conditions under which computed partitions are
optimal in Section 3.4. One can also refine the partitions by computing them from
scratch occasionally in case they become too coarse.
Next we refactor the input(s) so that the partition corresponds to the output
partition according to Algorithm 2.1. In the last step, once the output partition
is obtained, the associated abstract element is computed directly in decomposed
form by applying the original transformer to one or more factor(s) of the input(s).
Applying this transformer on smaller factors reduces its complexity and results in
increased performance. In certain cases, the permissible partition for the output can
be further refined after applying the transformer and without adding significant
overhead. We identify such cases in Section 3.4.
Our approach is generic in nature and can decompose the standard transform-
ers of the existing sub-polyhedra numerical abstract domains. We implemented our

recipe and applied it to Polyhedra, Octagon, and Zone. Using a set of large Linux
device drivers, we show later in Section 3.5 the performance of our generated de-
composed transformers vs. transformers obtained via state-of-the-art hand-tuned
decomposition [189, 190]. Our approach yields up to 6x speed-ups for Polyhedra
and up to 2x speed-ups for Octagon. This speed-up is due to our decomposition
theorems (discussed next) that enable, in certain cases, finer decomposition of ab-
stract elements than previously possible. Speedups compared to the original trans-
formers without decomposition are orders of magnitude larger. Further, we also
decompose the Zone domain using our approach (for which no previous decom-
position exists) without changing the existing domain transformers. We obtain a
speedup of up to 6x over the non-decomposed implementation of the Zone domain.
In summary, our recipe is generic in nature yet leads to state-of-the-art performance
for classic abstract transformers.

3.4 decomposing domain transformers

In this section we show a construction that takes as input a sound and monotone
transformer in a given domain D and produces a decomposed variant of the same
transformer that operates on part(s) of the input(s). The resulting decomposed
transformer is always sound. We define classes of transformers for which the out-
put produced by the decomposed transformer has the same concretization as the
original non-decomposed transformer, i.e., there is no loss of precision. Although
our results apply to all transformers, we focus on the conditional, assignment, meet,
join, and widening transformers. We also show how to obtain finer partitions than
the manually decomposed transformers for Polyhedra (Chapter 2) and Octagon
considered in prior work [189, 190].

partitioned abstract elements and factors Given an abstract element


I with permissible partition πI = {X1 , . . . , Xr }, we denote the associated factors
with I(Xk ), 1 6 k 6 r. We will write a decomposed abstract element I as a set of
constraints (and not as set of factors) just as we did before, but always explicitly
mention the associated partition.

common partition and refactoring We assume that inputs I, I 0 for bi-


nary transformers are partitioned according to a common permissible partition
πcommon . This partition can always be computed as πcommon = πI t πI 0 , where
πI , πI 0 are permissible partitions for I, I 0 respectively. For all transformers, we as-
sume that the inputs get refactored as per the output partition whenever needed
by the corresponding transformer based on Algorithm 2.1.

abstract ordering Ordering in the decomposed abstract domain is defined


as I v I 0 ≡ γ(I) ⊆ γ(I 0 ).

precision of decomposed transformers We define classes of conditional,


assignment, meet, join and widening transformers for which the outputs of both
the given non-decomposed T and our associated decomposed T d have the same
concretization, i.e., γ(T (I)) = γ(T d (I)) for all I in D, which implies that T d inherits
monotonicity from T . If the transformer is not in these classes, then both γ(T (I)) ⊂ γ(T d (I)) and γ(T (I)) ⊃ γ(T d (I)) are possible and monotonicity may not hold.
If in addition to γ(T (I)) = γ(T d (I)), the given transformers satisfy
γ(I) = γ(I 0 ) ⇒ γ(T (I)) = γ(T (I 0 )),
then the analysis with our associated decomposed transformers produces the same
semantic invariants at fixpoint (fixpoint equivalence).

3.4.1 Conditional
We consider conditional statements of the form e ⊗ c where e = Σni=1 ai · xi with
ai ∈ Z, ⊗ ∈ {6, =}, and c ∈ Q, R, on an abstract element I with an associated
permissible partition πI in domain D. The conditional transformer computes the
effect of adding the constraint e ⊗ c to I. As discussed in Section 3.1, most exist-
ing domains are not closed for the conditional transformer. Moreover, computing
the best transformers is expensive in these domains and thus the transformer is
usually approximated to strike a balance between precision and cost. The example
below illustrates two sound conditional transformers on the same input: the first
transformer produces a decomposable output whereas the output of the second
results in the > partition.
Example 3.4.1. Consider

X = {x1, x2, x3, x4, x5, x6}, L_{X,Polyhedra} : (6, Z^6, {≤, =}, Q),
I = {x1 + x2 ≤ 0, x3 + x4 ≤ 5} with πI = π̄I = {{x1, x2}, {x3, x4}, {x5}, {x6}}.

For the conditional statement x5 + x6 ≤ 0, a best transformer T1 may return:

IO = {x1 + x2 ≤ 0, x3 + x4 ≤ 5, x5 + x6 ≤ 0} with πIO = π̄IO = {{x1, x2}, {x3, x4}, {x5, x6}},

which is decomposable. However, another sound transformer T2 may return the non-decomposable output:

I′O = {x1 + x2 + x3 + x4 + x5 + x6 ≤ 5} with πI′O = π̄I′O = ⊤.

Let Bcond = {xi | ai ≠ 0} be the set of variables with non-zero coefficients in the constraint Σ_{i=1}^{n} ai · xi ⊗ c. The block B∗cond = ∪_{Xk ∩ Bcond ≠ ∅} Xk fuses all blocks Xk ∈ πI that have non-empty intersection with Bcond.


Example 3.4.2. Consider X = {x1, x2, x3, x4, x5, x6} and an element I in the Polyhedra domain with πI = {{x1, x2, x3}, {x4, x5}, {x6}}. For the conditional x3 + x6 ≤ 0, Bcond = {x3, x6} and B∗cond = {x1, x2, x3, x6}.

Algorithm 3.1 Decomposed conditional transformer T^d_cond
1: function Conditional((I, πI), stmt, Tcond)
2:   B∗cond := compute_block(stmt, πI)
3:   Icond := I(B∗cond)
4:   πIO := {A ∈ πI | A ∩ B∗cond = ∅} ∪ {B∗cond}
5:   Irest := I(X \ B∗cond)
6:   IO := Tcond(Icond) ∪ Irest
7:   return (IO, πIO)
8: end function

construction for conditional Algorithm 3.1 shows our construction for decomposing a given conditional transformer Tcond. Given an input element I with a permissible partition πI in domain D, the algorithm first extracts the block B∗cond based on the conditional statement and the permissible partition πI as described above. The block B∗cond coarsens the input partition to yield the output partition. Finally, the original transformer is applied to the abstract element Icond associated with the block B∗cond; the remaining constraints are kept as is in the result.
In Algorithm 3.1 the output of the decomposed transformer T^d_cond on input I is computed as T^d_cond(I) = Tcond(Icond) ∪ Irest. One can show that T^d_cond is sound, but we focus on also maintaining precision and thus monotonicity. Thus, we define a class Cond(D) of conditional transformers Tcond where γ(Tcond(I)) = γ(T^d_cond(I)) (this is one of the two conditions discussed earlier that ensure fixpoint equivalence).
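To illustrate the shape of Algorithm 3.1 in code, here is a small Python sketch (our own simplification, not the ELINA implementation): a constraint is a pair (text, vars) with vars the frozenset of variables it mentions, an abstract element is a set of such pairs, and t_cond is a caller-supplied conditional transformer of the underlying domain; all names are illustrative.

def decomposed_conditional(I, pi_I, cond_vars, t_cond):
    # B*_cond fuses all blocks of pi_I intersecting the condition's variables
    b_star = set(cond_vars)
    for A in pi_I:
        if A & set(cond_vars):
            b_star |= A
    I_cond = {c for c in I if c[1] <= b_star}   # factor I(B*_cond)
    I_rest = I - I_cond                         # constraints left untouched
    pi_O = [A for A in pi_I if not (A & b_star)] + [frozenset(b_star)]
    return t_cond(I_cond) | I_rest, pi_O        # Tcond applied only locally

# Toy Octagon-like element: adding x3 <= 0 only touches the {x2, x3} block.
I = {("x1 <= 0", frozenset({"x1"})), ("x2 + x3 <= 0", frozenset({"x2", "x3"}))}
pi_I = [frozenset({"x1"}), frozenset({"x2", "x3"})]
add_x3 = lambda Ic: Ic | {("x3 <= 0", frozenset({"x3"}))}  # stand-in for Tcond
print(decomposed_conditional(I, pi_I, {"x3"}, add_x3))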
Definition 3.4.1. A (sound and monotone) transformer Tcond for the conditional expression e ⊗ c is in Cond(D) iff for any element I and any associated permissible partition πI, the output Tcond(I) satisfies:

• Tcond(I) = I ∪ I′ ∪ I″ where I′ contains non-redundant constraints between the variables from B∗cond only and I″ is a set of redundant constraints between the variables in X.

• γ(Tcond(Icond)) = γ(I′ ∪ Icond).


Theorem 3.4.1. If Tcond ∈ Cond(D), then γ(Tcond(I)) = γ(T^d_cond(I)) for all inputs I in D. In particular, T^d_cond is sound and monotone.

Proof.
γ(Tcond(I)) = γ(I ∪ I′ ∪ I″)               (by Definition 3.4.1)
            = γ(I ∪ I′)                    (I″ is redundant)
            = γ(Irest ∪ (Icond ∪ I′))      (as I = Irest ∪ Icond)
            = γ(Irest) ∩ γ(Icond ∪ I′)     (γ is meet-preserving)
            = γ(Irest) ∩ γ(Tcond(Icond))   (by Definition 3.4.1)
            = γ(Irest ∪ Tcond(Icond))      (γ is meet-preserving)
            = γ(T^d_cond(I)).

Note that we can strengthen the condition in Definition 3.4.1 by replacing B∗cond with Bcond. This makes it independent of permissible partitions but would reduce the size of the class Cond(D).
In Example 3.4.1, T1 ∈ Cond(D) whereas T2 ∉ Cond(D) since T2 does not keep the original constraints. Most standard transformers used in practice satisfy the two conditions and can thus be decomposed with our construction without losing any precision. The following example illustrates our construction for decomposing the standard conditional transformer Tcond in the Octagon domain.

Example 3.4.3. Consider

X = {x1, x2, x3}, L_{X,Octagon} : (3, U^2 × {0}, {≤, =}, Q),
I = {x1 ≤ 0, x2 + x3 ≤ 0} with πI = π̄I = {{x1}, {x2, x3}}.

Consider the conditional statement x3 ≤ 0 with Bcond = {x3}. Tcond adds the constraint x3 ≤ 0 to I and then applies Octagon closure on the resulting element to produce the output:

Tcond(I) = {x1 ≤ 0, x2 + x3 ≤ 0, x3 ≤ 0, x1 + x3 ≤ 0},

which matches Definition 3.4.1 with I′ = {x3 ≤ 0} and I″ = {x1 + x3 ≤ 0}.
Algorithm 3.1 computes B∗cond = {x2, x3}, πO = {{x1}, {x2, x3}}, Icond = {x2 + x3 ≤ 0} and Irest = {x1 ≤ 0}. The algorithm applies Tcond on Icond and keeps Irest untouched to produce:

T^d_cond(I) = {x1 ≤ 0, x2 + x3 ≤ 0, x3 ≤ 0} with πO.

Since I′ ∪ Icond = Tcond(Icond), Tcond satisfies the conditions for Cond(D) in this case and there is no change in precision.

Note that best transformers are not necessarily in Cond(D). This is due to constraints on the coefficient set R or the constant set C in D. We provide an example of a domain D which does not have any best transformer in Cond(D).

Example 3.4.4. We consider a fictive domain

X = {x1, x2} and L_{X,D} : (2, Z^2, {≤, =}, {0, 1, 1.5}).

We assume I = {x1 ≤ 1, x2 ≤ 1} with permissible partition {{x1}, {x2}} and the conditional x2 ≤ 0.5. In this case B∗cond = {x2}. Using only the constraints with variables in B∗cond yields Tcond(Icond) = {x2 ≤ 1} as 0.5 ∉ C. This means that the most precise result we can express which fits our conditions in the definition will be semantically equivalent to I. However, a best transformer would produce an abstract element semantically equivalent to {x1 ≤ 1, x2 ≤ 1, x1 + x2 ≤ 1.5}, which is more precise than I. Thus, no best transformer is in Cond(D).

We next identify domains for which a best conditional transformer is in Cond(D).

Theorem 3.4.2. A best conditional transformer T^best_cond of domain D is in Cond(D) if for D = (n, R, T, C) we have that R = R1 × R2 × . . . × Rn where each Ri ⊆ Z and 0 ∈ Ri, T = {≤, =}, and C = R or Q.

Proof. We show the proof for C = R. The extension to C = Q is straightforward. We first present an inefficient but correct construction of T^best_cond(I) for an element I ∈ D with the associated partition πI. We note that the result of the concrete conditional transformer is I ∪ {e ⊗ c}. We compute its abstraction in D by initializing T^best_cond(I) = ∅ and computing an upper bound cmax for every representable expression Σ_{i=1}^{n} a′i · xi in D constrained under the set I ∪ {e ⊗ c} using LP. If cmax ≠ ∞, then Σ_{i=1}^{n} a′i · xi ≤ cmax is added to T^best_cond(I).
We now show that T^best_cond(I) satisfies Definition 3.4.1. We note that our construction will not remove any existing constraint from I by computing a bound of ∞; thus, I ⊆ T^best_cond(I). Since the constraint e ⊗ c only affects the variables in B∗cond in the concrete, any constraint in T^best_cond(I) involving only the variables of X \ B∗cond is either already in I or is redundant in T^best_cond(I).
Let us now consider a constraint ι := Σ_{i=1}^{n1} αi · bi + Σ_{i=1}^{n2} µi · ui ≤ c in T^best_cond(I), where bi ∈ B∗cond, ui ∈ X \ B∗cond, n1, n2 > 0, n1 + n2 ≤ n, involving the variables of B∗cond and X \ B∗cond. Since B∗cond and X \ B∗cond are separate in π_{I∪{e⊗c}}, ι is redundant in I ∪ {e ⊗ c}. There must exist constraints ι1 := Σ_{i=1}^{n1} αi · bi ≤ c1 and ι2 := Σ_{i=1}^{n2} µi · ui ≤ c2 in I ∪ {e ⊗ c} with c1 + c2 ≤ c. Since ι is representable in D, the tuple (α1, . . . , αn1, µ1, . . . , µn2) is in R. Since 0 ∈ Ri, the tuples (α1, . . . , αn1, 0, . . . , 0) and (µ1, . . . , µn2, 0, . . . , 0) are also in R. Since c1, c2 ∈ C = R, the constraints ι1 and ι2 are representable in D and are therefore part of T^best_cond(I), which makes the constraint ι redundant. Therefore, all non-redundant constraints added by T^best_cond are only between the variables in block B∗cond. Further, our construction ensures that these can be recovered by applying T^best_cond on Icond.
We note that the algorithm for T^best_cond in Theorem 3.4.2 will not terminate if Ri is infinite. Existing domains including Polyhedra, Octagon and Zone satisfy the conditions of Theorem 3.4.2 and thus a best conditional transformer in these domains is in Cond(D). The following obvious corollary provides a condition under which the output partition πIO computed by Algorithm 3.1 is finest, i.e., πIO = π̄IO.

Corollary 3.4.1. For the conditional e ⊗ c, πIO = π̄IO if πI = π̄I and IO = I ∪ {e ⊗ c}.

3.4.2 Assignment

As in Section 2.1.2, we consider linear assignments of the form xj := δ on an abstract element I with an associated permissible partition πI in D, where δ := Σ_{i=1}^{n} ai · xi + c with ai ∈ Z and c ∈ Q or R. An assignment is invertible if aj ≠ 0 (for example x1 := x1 + x2). We write Ixj ⊆ I for the subset of constraints containing xj.
As discussed in Section 3.1, a number of existing domains are not closed under
assignment. As for the conditional, the best assignment transformers are usually
expensive and may need to be approximated. We provide an example, very similar
to Example 3.4.1, that shows how approximation can affect decomposition.

Example 3.4.5. Consider

X = {x1, x2, x3, x4, x5, x6}, L_{X,Polyhedra} : (6, Z^6, {≤, =}, Q),
I = {x1 + x2 = 0, x3 + x4 = 5, x5 − x3 = 0} with πI = π̄I = {{x1, x2}, {x3, x4, x5}, {x6}}.

For the assignment x5 := −x6, a best sound assignment transformer T1 may return the decomposable output:

IO = {x1 + x2 = 0, x3 + x4 = 5, x5 + x6 = 0} with πIO = π̄IO = {{x1, x2}, {x3, x4}, {x5, x6}}.

However, another sound transformer T2 may return the non-decomposable output:

I′O = {x1 + x2 + x3 + x4 + x5 + x6 = 5} with πI′O = π̄I′O = ⊤.
Let Bassign = {xi | ai ≠ 0} ∪ {xj} be the set of variables affected by δ := Σ_{i=1}^{n} ai · xi + c (we also include xj). The block B∗assign = ∪_{Xk ∩ Bassign ≠ ∅} Xk fuses all blocks Xk ∈ πI having non-empty intersection with Bassign.

Example 3.4.6. Consider X = {x1, x2, x3, x4, x5, x6} and an element I in the Polyhedra domain with πI = {{x1, x2}, {x3, x4}, {x5, x6}}. For the assignment x3 := x1 + x2, Bassign = {x1, x2, x3} and B∗assign = {x1, x2, x3, x4}.
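The fused block is easy to compute; the following hypothetical Python helper (names and partition representation as in the earlier sketches) reproduces Example 3.4.6.

def b_star_assign(pi_I, rhs_vars, x_j):
    # variables affected by x_j := sum a_i*x_i + c, including x_j itself
    b_assign = set(rhs_vars) | {x_j}
    b_star = set(b_assign)
    for A in pi_I:                 # fuse every block intersecting B_assign
        if A & b_assign:
            b_star |= A
    return b_star

# Example 3.4.6: x3 := x1 + x2 with pi_I = {{x1,x2},{x3,x4},{x5,x6}}
print(b_star_assign([{"x1", "x2"}, {"x3", "x4"}, {"x5", "x6"}],
                    {"x1", "x2"}, "x3"))   # -> the block {x1, x2, x3, x4}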

We briefly explain the standard assignment transformer as background to motivate the later definition of Assign(D) (the class of assignment transformers that do not lose precision with our decomposition).

standard transformer: invertible assignment The standard assignment transformer first removes all constraints in Ixj from I. It then computes a set of constraints Iinv by substituting, in all constraints in Ixj, xj with (xj − Σ_{i≠j} ai · xi − c)/aj. Finally, Iinv may be approximated by a set I′inv of representable constraints (in D) over the same variable set. The result is IO = (I \ Ixj) ∪ I′inv.

standard transformer: non-invertible assignment For a non-invertible assignment, the transformer also first removes Ixj from I. Next, it computes a set of constraints Inon-inv by projecting out xj from all constraints in Ixj using variable elimination [102]. Then it adds {xj − e = 0} to Inon-inv. Finally, Inon-inv may be approximated by I′non-inv to make it representable in D over the same variable set. The result is IO = (I \ Ixj) ∪ I′non-inv.

Algorithm 3.2 Decomposed assignment transformer T^d_assign
1: function Assignment((I, πI), stmt, Tassign)
2:   B∗assign := compute_block(stmt, πI)
3:   Iassign := I(B∗assign)
4:   πIO := {A ∈ πI | A ∩ B∗assign = ∅} ∪ {B∗assign}
5:   Irest := I(X \ B∗assign)
6:   IO := Tassign(Iassign) ∪ Irest
7:   πIO := refine(πIO)
8:   return (IO, πIO)
9: end function

construction for assignment Algorithm 3.2 shows our construction for decomposing a given assignment transformer Tassign. It operates analogously to the decomposed conditional transformer, except for the partition refinement in line 7, which we explain at the end of this section. The output of the decomposed transformer T^d_assign on I is T^d_assign(I) = Tassign(Iassign) ∪ Irest. Next we define a class Assign(D) of assignment transformers Tassign where γ(Tassign(I)) = γ(T^d_assign(I)) for all I. Again, this will ensure soundness and monotonicity.

Definition 3.4.2. A (sound and monotone) assignment transformer Tassign in D for the statement xj := e is in Assign(D) iff for any element I and any associated permissible partition πI in D, the output Tassign(I) satisfies the following conditions:

• Tassign(I) = (I \ Ixj) ∪ I′ ∪ I″ where I′ contains non-redundant constraints between the variables from B∗assign only, and I″ is a set of redundant constraints between the variables in X.

• γ(Tassign(Iassign)) = γ((Iassign \ Ixj) ∪ I′).


Theorem 3.4.3. If Tassign ∈ Assign(D), then γ(Tassign(I)) = γ(T^d_assign(I)) for all inputs I in D. In particular, T^d_assign is sound and monotone.

Proof. B∗assign contains xj by definition, thus I \ Ixj = Irest ∪ (Iassign \ Ixj). We have,
γ(Tassign(I)) = γ((I \ Ixj) ∪ I′ ∪ I″)              (by Definition 3.4.2)
             = γ((I \ Ixj) ∪ I′)                   (I″ is redundant)
             = γ(Irest ∪ (Iassign \ Ixj) ∪ I′)      (from above)
             = γ(Irest) ∩ γ((Iassign \ Ixj) ∪ I′)   (γ is meet-preserving)
             = γ(Irest) ∩ γ(Tassign(Iassign))        (by Definition 3.4.2)
             = γ(Irest ∪ Tassign(Iassign))           (γ is meet-preserving)
             = γ(T^d_assign(I)).

As for Cond(D), Definition 3.4.2 can be tightened to make it independent of permissible partitions at the price of a smaller Assign(D).
In Example 3.4.5, T1 ∈ Assign(D) whereas T2 ∉ Assign(D) as T2 does not keep the constraints in I \ Ix5. Most standard assignment transformers used in practice satisfy the two conditions and can thus be decomposed with our construction without losing any precision. The following example illustrates our construction for decomposing the standard assignment transformer Tassign in the TVPI domain.

Example 3.4.7. Consider

X = {x1, x2, x3}, L_{X,TVPI} : (3, Z^2 × {0}, {≤, =}, Q),
I = {x1 ≤ 0, x2 + x3 ≤ 0, x3 ≤ 3} with πI = π̄I = {{x1}, {x2, x3}}.

Consider the non-invertible assignment x2 := 2 · x3 with Bassign = {x2, x3}. Tassign determines that Ix2 = {x2 + x3 ≤ 0}, projects out x2, which yields the empty set, and then adds x2 − 2 · x3 = 0 to obtain Inon-inv, which is representable. Overall this results in {x1 ≤ 0, x2 − 2 · x3 = 0, x3 ≤ 3}.
Next, the transformer applies TVPI completion to produce the final output:

Tassign(I) = {x1 ≤ 0, x2 − 2 · x3 = 0, x3 ≤ 3, x2 ≤ 6, x1 + x2 ≤ 6, x1 + x3 ≤ 3, x2 + x3 ≤ 9},

which has the form of Definition 3.4.2 with

I′ = {x2 − 2 · x3 = 0} and I″ = {x2 ≤ 6, x1 + x2 ≤ 6, x1 + x3 ≤ 3, x2 + x3 ≤ 9}.

Algorithm 3.2 computes B∗assign = {x2, x3}, Iassign = {x2 + x3 ≤ 0, x3 ≤ 3}, πIO = {{x1}, {x2, x3}}, and Irest = {x1 ≤ 0}. The algorithm applies Tassign on Iassign and keeps Irest untouched to produce:

IO = T^d_assign(I) = {x1 ≤ 0, x2 − 2 · x3 = 0, x3 ≤ 3, x2 ≤ 6, x2 + x3 ≤ 9} with πIO.

Here I′ contains non-redundant constraints between variables from B∗assign only, and we have γ((Iassign \ Ixj) ∪ I′) = γ({x2 − 2 · x3 = 0, x3 ≤ 3}) = γ(Tassign(Iassign)). Thus Tassign satisfies the conditions for Assign(D) in this case and there is no loss of precision.

We next identify domains for which a best assignment transformer is in Assign(D).

Theorem 3.4.4. A best assignment transformer T^best_assign of domain D is in Assign(D) if for D = (n, R, T, C) we have that R = R1 × R2 × . . . × Rn where each Ri ⊆ Z and 0 ∈ Ri, T = {≤, =}, and C = R or Q.

Proof. The result of the concrete assignment transformer is (I \ Ixj) ∪ I_{B∗assign}, where I_{B∗assign} contains the non-redundant constraints, only between the variables in B∗assign, added by the concrete transformer. Since I \ Ixj is representable in D, T^best_assign can be implemented by adding constraints from I_{B∗assign} to I \ Ixj via T^best_cond. From Theorem 3.4.2, we know that T^best_cond for D is in Cond(D). Thus T^best_cond will add non-redundant constraints only between the variables in the block B∗assign to I \ Ixj. These constraints can be obtained by applying T^best_cond on Iassign \ Ixj. Thus T^best_assign is in Assign(D).
We note that most existing domains such as Polyhedra, Octagon, Zone, Octa-
hedron satisfy the conditions of Theorem 3.4.4 and thus a best assignment trans-
former in these domains is in Assign(D).

refinement So far we have assumed that line 7 of Algorithm 3.2 is the identity (that is, refinement does not affect πIO). We now discuss refinement in more detail.

Definition 3.4.3 (Refinement condition). The output partition πIO of a non-invertible assignment transformer Tassign satisfying Definition 3.4.2 is a candidate for refinement if Xt ∩ Bassign = {xj}, where Xt is the block of πI containing xj. Here, I is the abstract element upon which the transformer is applied and πI is a permissible partition for I.

If the above condition holds during analysis (it can be checked efficiently), then refinement can split B∗assign from πIO into two blocks Xt \ {xj} and B∗assign \ (Xt \ {xj}), provided no redundant constraint (I″ in Definition 3.4.2) fuses these two blocks. The result is a finer partition πIO. Note that our observation model for computing the output partition is more refined compared to Section 2.2.2, where we do not distinguish between invertible and non-invertible assignments and also do not check whether the condition Xt ∩ Bassign = {xj} holds.
Example 3.4.8. Consider

X = {x1, x2, x3, x4, x5}, L_{X,Zone} : (5, {1, 0} × {0, −1} × {0}^3, {≤, =}, Q),
I = {x1 ≤ x2, x2 ≤ x3, x4 ≤ x5} with πI = π̄I = {{x1, x2, x3}, {x4, x5}}.

Consider the non-invertible assignment x2 := x4 with Bassign = {x2, x4} and the standard Zone assignment transformer Tassign. Without refinement, we obtain the partition πIO = {X} = ⊤. However, our refinement condition enables us to obtain a finer output partition. We have that Xt = {x1, x2, x3} and Xt ∩ Bassign = {x2} and thus the dynamic refinement condition applies, splitting the block B∗assign = X into two blocks: Xt \ {x2} = {x1, x3} and B∗assign \ (Xt \ {x2}) = {x2, x4, x5}. This produces a finer partition for the output:

IO = {x1 ≤ x3, x2 − x4 = 0, x4 ≤ x5, x2 ≤ x5} with πIO = π̄IO = {{x1, x3}, {x2, x4, x5}}.
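A minimal Python sketch of this refinement step (same illustrative representation of partitions as before; the check that no redundant constraint fuses the two resulting blocks is assumed to be done elsewhere) reproduces the split of Example 3.4.8.

def refine_assignment_partition(pi_O, pi_I, b_star, b_assign, x_j):
    # Definition 3.4.3: Xt is the block of pi_I containing x_j
    Xt = next(A for A in pi_I if x_j in A)
    if Xt & b_assign != {x_j}:            # refinement condition fails
        return pi_O
    left, right = Xt - {x_j}, b_star - (Xt - {x_j})
    return [A for A in pi_O if A != b_star] + [b for b in (left, right) if b]

# Example 3.4.8: x2 := x4, B_assign = {x2,x4}, B*_assign = X
pi_I = [frozenset({"x1", "x2", "x3"}), frozenset({"x4", "x5"})]
X = frozenset({"x1", "x2", "x3", "x4", "x5"})
print(refine_assignment_partition([X], pi_I, X, {"x2", "x4"}, "x2"))
# -> the blocks {x1, x3} and {x2, x4, x5}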
As with the conditional, πIO ≠ π̄IO in general even if πI = π̄I. The following corollary provides conditions under which πIO = π̄IO after applying Algorithm 3.2.

Corollary 3.4.2. For the assignment xj := δ, πIO = π̄IO holds if πI = π̄I and, in the invertible case, IO = (I \ Ixj) ∪ Iinv or, in the non-invertible case, IO = (I \ Ixj) ∪ Inon-inv.

3.4.3 Meet (⊓)

As discussed in Section 3.1, all domains we consider are closed under the meet (⊓) transformer and thus it is common to implement a best transformer. In fact, any meet transformer T⊓ that obeys T⊓(I, I′) ⊑ I, I′ is precise. Thus, we assume a given best meet transformer, i.e., γ(T⊓(I, I′)) = γ(I) ∩ γ(I′) for all I, I′. As a consequence, our decomposed construction will always yield an equivalent transformer, without any conditions.

construction for meet (⊓) Algorithm 3.3 shows our construction of a decomposed transformer for a given meet transformer T⊓ on input elements I, I′ with the respective permissible partitions πI, πI′ in domain D. The algorithm computes a common permissible partition πI ⊔ πI′ for the inputs and then applies T⊓ separately on the individual factors of I, I′ corresponding to the blocks in πI ⊔ πI′.
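Concretely, the blockwise application can be sketched in a few lines of Python, reusing lub_partition and the constraint representation from the earlier sketches; t_meet stands for the given best meet transformer and all names are our own illustrative choices.

def decomposed_meet(I1, pi1, I2, pi2, t_meet):
    pi_common = lub_partition(pi1, pi2)      # common permissible partition
    I_O = set()
    for A in pi_common:
        f1 = {c for c in I1 if c[1] <= A}    # factor I(A)
        f2 = {c for c in I2 if c[1] <= A}    # factor I'(A)
        I_O |= t_meet(f1, f2)                # meet applied per block
    return I_O, pi_common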

Theorem 3.4.5. γ(T⊓(I, I′)) = γ(T^d_⊓(I, I′)) for all inputs I, I′ in D. In particular, T^d_⊓ is sound and monotone.

Proof.
γ(T⊓(I, I′)) = γ(I) ∩ γ(I′)
            = ∩_A (γ(I(A)) ∩ γ(I′(A)))       (I = ∪_A I(A), I′ = ∪_A I′(A))
            = ∩_A γ(T⊓(I(A), I′(A)))         (T⊓ is precise)
            = γ(∪_A T⊓(I(A), I′(A)))         (γ is meet-preserving)
            = γ(T^d_⊓(I, I′)).

The following example illustrates the decomposition of a best meet transformer T⊓ in the Octahedron domain using Algorithm 3.3.

Example 3.4.9. Consider

X = {x1, x2, x3, x4}, L_{X,Octahedron} : (4, U^4, {≤, =}, Q),
I = {x1 ≤ 1, x2 ≤ 0, x3 + x4 ≤ 1} with πI = π̄I = {{x1}, {x2}, {x3, x4}} and
I′ = {x1 − x3 − x4 ≤ 2, x2 ≤ 1} with πI′ = π̄I′ = {{x1, x3, x4}, {x2}}.

T⊓ computes the union I ∪ I′ and then removes redundant constraints to produce the output:

T⊓(I, I′) = {x1 ≤ 1, x2 ≤ 0, x3 + x4 ≤ 1, x1 − x3 − x4 ≤ 2}.



Algorithm 3.3 Decomposed meet transformer T^d_⊓
1: function Meet((I, πI), (I′, πI′), T⊓)
2:   πIO := πI ⊔ πI′
3:   IO := ∪_{A∈πIO} T⊓(I(A), I′(A))
4:   return (IO, πIO)
5: end function

Algorithm 3.3 computes the common partition πI ⊔ πI′ = {{x1, x3, x4}, {x2}} and applies T⊓ separately on the factors of I, I′ corresponding to the common partition and produces:

T^d_⊓(I, I′) = {x1 ≤ 1, x2 ≤ 0, x3 + x4 ≤ 1, x1 − x3 − x4 ≤ 2} with πIO = {{x1, x3, x4}, {x2}}.

The following corollary provides conditions under which the output partition is finest.

Corollary 3.4.3. πIO = π̄IO if πI = π̄I, πI′ = π̄I′, and IO = I ∪ I′.

3.4.4 Join (⊔)

As discussed in Section 3.1, none of the sub-polyhedra domains we consider are closed under the join (⊔). Thus, the join transformer approximates the union of I and I′ in D and is usually the most expensive transformer in D. As with other transformers, an arbitrary approximation can result in the ⊤ partition. The example below shows this situation with two sound join transformers in the Zone domain.
Example 3.4.10. Consider

X = {x1, x2, x3, x4, x5, x6}, L_{X,Zone} : (6, {1, 0} × {0, −1} × {0}^4, {≤, =}, R),
I = {x1 = 1, x2 = 2, x3 ≤ 3, x4 = 4, x5 = 0, x6 = 0} and
I′ = {x1 = 1, x2 = 2, x3 ≤ 3, x4 = 4, x5 = 1, x6 = 1} with
πI = πI′ = π̄I = ⊥.

A sound transformer T1 may return the decomposed output:

IO = {x1 = 1, x2 = 2, x3 ≤ 3, x4 = 4, −x5 ≤ 0, x5 ≤ 1, −x6 ≤ 0, x6 ≤ 1} with πIO = π̄IO = ⊥.

Another sound transformer T2 for the join could produce the output I′O with the ⊤ partition:

I′O = {x2 − x1 ≤ 1, x1 − x5 ≤ 1, x3 − x2 ≤ 1, x3 − x4 ≤ −1, x5 − x6 = 0} with πI′O = π̄I′O = ⊤.

Algorithm 3.4 Decomposed join transformer T^d_⊔
1: function Join((I, πI), (I′, πI′), T⊔)
2:   πcommon := πI ⊔ πI′
3:   N := ∪ {A ∈ πcommon | I(A) ≠ I′(A)}
4:   I⊔ := I(N)
5:   I′⊔ := I′(N)
6:   Irest := I(X \ N)
7:   πIO := {A ∈ πcommon | A ∩ N = ∅} ∪ {N}
8:   IO := T⊔(I⊔, I′⊔) ∪ Irest
9:   πIO := refine(πIO)
10:  return (IO, πIO)
11: end function

Let πcommon := πI ⊔ πI′ and N = ∪ {A ∈ πcommon | I(A) ≠ I′(A)} be the union of all blocks for which the corresponding factors I(A) and I′(A) are not semantically equal. In Example 3.4.10, we have N = {x5, x6}.

construction for join (⊔) Algorithm 3.4 shows our construction of a decomposed join transformer for a given T⊔ on input elements I, I′ with the respective permissible partitions πI, πI′ in domain D. The algorithm first computes a common permissible partition πcommon = πI ⊔ πI′. For each block A ∈ πcommon, it checks if the corresponding factors I(A), I′(A) are (semantically) equal. If equality holds, the algorithm adds I(A) to the output IO and adds the corresponding block A to the partition πIO. Those not equal are collected (using Algorithm 2.1) in the bigger factors I⊔, I′⊔ on which T⊔ is applied, which reduces complexity. The associated block in πIO is N. The partition refinement in line 9 of Algorithm 3.4 is explained at the end of this section.
In Algorithm 3.4 the output of the decomposed transformer T^d_⊔ on inputs I, I′ is computed as T^d_⊔(I, I′) = T⊔(I⊔, I′⊔) ∪ Irest. Next we define a class Join(D) of join transformers T⊔ for which γ(T⊔(I, I′)) = γ(T^d_⊔(I, I′)) for all inputs I, I′ in D. This ensures soundness and monotonicity.
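The core of Algorithm 3.4 can again be sketched in Python (same illustrative representation as before). The construction compares factors semantically; the sketch below approximates that test by syntactic equality of the constraint sets, which is a simplification on our part rather than what the construction above prescribes.

def decomposed_join(I1, pi1, I2, pi2, t_join):
    pi_common = lub_partition(pi1, pi2)
    factor = lambda I, A: frozenset(c for c in I if c[1] <= A)
    differing = [A for A in pi_common if factor(I1, A) != factor(I2, A)]
    N = frozenset().union(*differing) if differing else frozenset()
    I_t1 = {c for c in I1 if c[1] <= N}            # I(N)
    I_t2 = {c for c in I2 if c[1] <= N}            # I'(N)
    I_rest = {c for c in I1 if not (c[1] & N)}     # equal factors, copied over
    pi_O = [A for A in pi_common if not (A & N)] + ([N] if N else [])
    return t_join(I_t1, I_t2) | I_rest, pi_O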

Definition 3.4.4. A join transformer T⊔ is in Join(D) iff for all pairs of input elements I, I′ and all associated common permissible partitions πcommon, the output T⊔(I, I′) satisfies the following conditions:

• T⊔(I, I′) = Irest ∪ J′ ∪ J″ where Irest = I(X \ N), J′ contains non-redundant constraints between only the variables from N, and J″ contains redundant constraints between the variables in X.

• γ(T⊔(I⊔, I′⊔)) = γ(J′).

Theorem 3.4.6. If T⊔ ∈ Join(D), then γ(T⊔(I, I′)) = γ(T^d_⊔(I, I′)) for all inputs I, I′ in D. In particular, T^d_⊔ is sound and monotone.

Proof.
γ(T⊔(I, I′)) = γ(Irest ∪ J′ ∪ J″)           (by Definition 3.4.4)
            = γ(Irest ∪ J′)                 (J″ is redundant)
            = γ(Irest) ∩ γ(J′)              (γ is meet-preserving)
            = γ(Irest) ∩ γ(T⊔(I⊔, I′⊔))     (by Definition 3.4.4)
            = γ(Irest ∪ T⊔(I⊔, I′⊔))        (γ is meet-preserving)
            = γ(T^d_⊔(I, I′)).

The join transformer T1 in Example 3.4.10 is in Join(D) whereas T2 is not in Join(D) as it does not keep the constraints in Irest. Most standard and best join transformers in practice satisfy the conditions for Join(D) and thus they are decomposable with our construction without any change in precision.
The following example illustrates the decomposition of a best join transformer T⊔ in the Octagon domain using Algorithm 3.4.
Example 3.4.11. Consider

X = {x1, x2, x3}, L_{X,Octagon} : (3, U^2 × {0}, {≤, =}, R),
I = {x1 ≤ 2, x2 ≤ 1, x3 ≤ 3}, I′ = {x1 ≤ 1, x2 ≤ 3, x3 ≤ 3} with πI = πI′ = π̄I = ⊥.

T⊔ performs strong closure on both I, I′ to produce:

I* = {x1 ≤ 2, x2 ≤ 1, x3 ≤ 3, x1 + x2 ≤ 3, x1 + x3 ≤ 5, x2 + x3 ≤ 4},
I′* = {x1 ≤ 1, x2 ≤ 3, x3 ≤ 3, x1 + x2 ≤ 4, x1 + x3 ≤ 4, x2 + x3 ≤ 6}.

It then takes the pairwise maximum of bounds for each constraint to produce the output:

T⊔(I, I′) = {x1 ≤ 2, x2 ≤ 3, x3 ≤ 3, x1 + x2 ≤ 4, x1 + x3 ≤ 5, x2 + x3 ≤ 6},

which matches Definition 3.4.4 with

Irest = {x3 ≤ 3}, J′ = {x1 ≤ 2, x2 ≤ 3, x1 + x2 ≤ 4}, and J″ = {x1 + x3 ≤ 5, x2 + x3 ≤ 6}.

Since πI = πI′, we have πcommon = πI. Here I1 ≠ I′1, I2 ≠ I′2 and I3 = I′3. Algorithm 3.4 computes N = {x1, x2} and combines I1, I2 into a single factor I⊔ using Algorithm 2.1. Similarly, it combines I′1, I′2 into I′⊔:

I⊔ = {x1 ≤ 2, x2 ≤ 1}, I′⊔ = {x1 ≤ 1, x2 ≤ 3}.

The algorithm applies T⊔ only on I⊔, I′⊔ whereas I3 is added to the output directly:

IO = T^d_⊔(I, I′) = {x1 ≤ 2, x2 ≤ 3, x1 + x2 ≤ 4, x3 ≤ 3} with πIO = π̄IO = {{x1, x2}, {x3}}.

In the example, Irest contains non-redundant constraints only between the variables from X \ N, J′ contains non-redundant constraints between only the variables from N, and γ(T⊔(I⊔, I′⊔)) = γ({x1 ≤ 2, x2 ≤ 3, x1 + x2 ≤ 4}) = γ(J′), and thus T⊔ satisfies the conditions for Join(D) in this case.

refinement We can sometimes refine the output partition πIO after computing the output IO without inspecting or modifying IO. Namely, if a variable xi is unconstrained in either I or I′, then it is also unconstrained in IO. πIO can thus be refined by removing xi from the block containing it and adding the singleton set {xi} to πIO. This refinement can be performed after applying T^d_⊔. Note that our observation model in Section 2.2.2 did not check if the variables were unconstrained for polyhedra inputs and thus it produces coarser partitions. The following theorem formalizes this refinement.

Theorem 3.4.7. Let I, I′ be abstract elements in D with the associated permissible partitions πI, πI′ respectively. Let U = {x ∈ X | x is unconstrained in I or I′} = {u1, . . . , ur} and let πIO be as computed in line 7 of Algorithm 3.4. Then the following partition is permissible for the output IO:

πIO ⊓ {X \ U, {u1}, . . . , {ur}}.

The proof of Theorem 3.4.7 is immediate from the discussion above. Unlike other transformers, we do not know of conditions for checking whether πIO = π̄IO.

3.4.5 Widening (∇)

The widening transformer T∇ is applied during analysis to accelerate convergence towards a fixpoint. It is a binary transformer that guarantees: (i) the output satisfies T∇(I, I′) ⊒ I, I′, and (ii) the analysis terminates after a finite number of steps. In general, widening transformers are not monotonic or commutative. Further, best widening transformers do not exist for any numerical domain. In theory, it may be possible to design arbitrary widening transformers that always result in the ⊤ partition. In practice, the standard widening transformers are of two types:

syntactic For syntactic widening [140, 142], the output satisfies IO ⊆ I. A constraint ι := Σ_{i=1}^{n} ai · xi ≤ c ∈ I is in the output IO iff there is a constraint ι′ := Σ_{i=1}^{n} ai · xi ≤ c′ ∈ I′ with the same linear expression and c′ ≤ c.

semantic The semantic widening [13] requires the set of constraints in the input I to be non-redundant. The output satisfies IO ⊆ I ∪ I′. Specifically, IO contains the constraints from I that are satisfied by I′ and the constraints ι′ from I′ that are mutually redundant with a constraint ι in I.
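For intuition, syntactic widening over inequality constraints can be written in a few lines of Python; this sketch is our own simplification (it handles only ≤ constraints and represents a constraint as a pair (expr, bound) with expr a hashable key for the linear expression).

def syntactic_widening(I1, I2):
    # tightest bound per linear expression occurring in I'
    bounds2 = {}
    for expr, c in I2:
        bounds2[expr] = min(c, bounds2.get(expr, c))
    # keep a constraint of I iff I' bounds the same expression by at most c
    return {(expr, c) for expr, c in I1
            if expr in bounds2 and bounds2[expr] <= c}

# x <= 3 survives (I' has x <= 2); y <= 1 is dropped (its bound grew to 4)
print(syntactic_widening({("x", 3), ("y", 1)}, {("x", 2), ("y", 4)}))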
Both these transformers are decomposable in practice. The following example
illustrates the semantic and the syntactic widening on the Octagon domain.

Example 3.4.12. Consider

X = {x1, x2, x3, x4}, L_{X,Octagon} : (4, U^2 × {0}^2, {≤, =}, Q),
I = {x1 − x2 = 0, x2 = 0, x3 ≤ 0, x4 ≤ 1} with πI = π̄I = {{x1, x2}, {x3}, {x4}} and
I′ = {x1 = 0, x3 + x4 ≤ 2} with πI′ = π̄I′ = {{x1}, {x2}, {x3, x4}}.

Algorithm 3.5 Decomposed widening transformer T^d_∇
1: function Widening((I, πI), (I′, πI′), T∇)
2:   πIO := πI ⊔ πI′
3:   IO := ∪_{A∈πIO} T∇(I(A), I′(A))
4:   πIO := refine(πIO)
5:   return (IO, πIO)
6: end function

The semantic widening transformer T1 yields:

IO = {x1 = 0} with πIO = π̄IO = ⊥.

On the other hand, the syntactic widening transformer T2 yields:

I′O = ∅ with πI′O = π̄I′O = ⊥.

Thus, both are decomposable in this case.

construction for widening Algorithm 3.5 shows our construction of a decomposed widening transformer on input elements I, I′ with respective permissible partitions πI, πI′ in D. The algorithm computes a common permissible partition πI ⊔ πI′ and then applies the widening transformer T∇ separately on the individual factors of I, I′ corresponding to the blocks of πI ⊔ πI′. The refinement of the output partition in line 4 of Algorithm 3.5 is explained at the end of this section.
Next, we define a class Widen(D) of widening transformers T∇ for which it holds that γ(T∇(I, I′)) = γ(T^d_∇(I, I′)) for all inputs I, I′ in D.

Definition 3.4.5. A widening transformer T∇ is in Widen(D) iff for all pairs of input elements I, I′ and all associated common permissible partitions πcommon, the output T∇(I, I′) satisfies:

γ(T∇(I, I′)) = ∩_{A} γ(T∇(I(A), I′(A))).

Theorem 3.4.8. If T∇ ∈ Widen(D), then γ(T∇(I, I′)) = γ(T^d_∇(I, I′)) for all inputs I, I′ in D. Thus, T^d_∇ is sound.

Proof.
γ(T∇(I, I′)) = ∩_A γ(T∇(I(A), I′(A)))      (by Definition 3.4.5)
            = γ(∪_A T∇(I(A), I′(A)))       (γ is meet-preserving)
            = γ(T^d_∇(I, I′)).

Both syntactic and semantic Octagon widening transformers from Example 3.4.12 are in Widen(D). It can be shown that the standard transformers in existing domains are in Widen(D). For syntactic widening, γ(I) = γ(I′) does not imply γ(T∇(I″, I)) = γ(T∇(I″, I′)) in general, and thus fixpoint equivalence is not guaranteed with the corresponding decomposed transformer. The following example illustrates the decomposition of the standard semantic TVPI widening transformer T∇ using Algorithm 3.5.
Example 3.4.13. Consider

X = {x1, x2, x3, x4}, L_{X,TVPI} : (4, Z^2 × {0}^2, {≤, =}, Q),
I = {x1 = 1, x2 = 0, x3 + x4 ≤ 1} with πI = {{x1}, {x2}, {x3, x4}} and
I′ = {2 · x1 − 3 · x2 ≤ 2, x1 + x2 = 1, x3 ≤ 0, x4 ≤ 0} with πI′ = {{x1, x2}, {x3}, {x4}}.

T∇ keeps the constraint x3 + x4 ≤ 1 from I as it is satisfied by I′ (using x3 ≤ 0, x4 ≤ 0). It also adds the constraint x1 + x2 = 1 from I′ to the output as it is mutually redundant with the constraint x1 = 1 in I. The output of T∇ is:

T∇(I, I′) = {x1 + x2 = 1, x3 + x4 ≤ 1}.

Algorithm 3.5 computes the common permissible partition πcommon = πI ⊔ πI′ = {{x1, x2}, {x3, x4}} and then computes the output IO by applying T∇ separately on the individual factors of I, I′ corresponding to the blocks of πcommon:

IO = T^d_∇(I, I′) = {x1 + x2 = 1, x3 + x4 ≤ 1} with πIO = {{x1, x2}, {x3, x4}}.

Here γ(T∇(I, I′)) = ∩_A γ(T∇(I(A), I′(A))) and thus T∇ is in Widen(D).

refinement T∇ in Algorithm 3.5 does not create constraints between variables in different blocks of the common partition in the output IO. By construction πIO = πcommon = πI ⊔ πI′. For syntactic widening, IO ⊆ I and thus the output partition πIO can be refined to πI after computing the output IO.
The following corollaries provide conditions under which πIO = π̄IO for the semantic and the syntactic widening, respectively.

Corollary 3.4.4. For semantic widening, πIO = π̄IO if πI = π̄I, πI′ = π̄I′ and IO = I ∪ I′.

Corollary 3.4.5. For syntactic widening, πIO = π̄IO if πI = π̄I and IO = I.

3.5 experimental evaluation

In this section we evaluate the performance of our generic decomposition approach


on three popular domains: Polyhedra, Octagon, and Zone. Using standard imple-
mentations of these domains, we show that our decomposition of their transform-
ers leads to substantial performance improvements, often surpassing existing trans-
formers designed for specific domains.

Our decomposed implementation for these domains is available as part (i.e., an


update) of the ELINA library [1]. Below, we compare to the prior ELINA as de-
scribed in [189, 190].

experimental setup We used the same setup as in Section 2.5.

benchmarks The benchmarks were taken from the popular software verifica-
tion competition [24]. The benchmark suite is divided into categories suited for
different kinds of analyses, e.g., pointer, array, numerical, and others. We chose
two categories suited for numerical analysis: (i) Linux Device Drivers (LD), and (ii)
Control Flow (CF). Each of these categories contains hundreds of benchmarks and
we evaluated the performance of our analysis on each of these. We use the crab-
llvm analyzer, part of the SeaHorn verification framework [91], for performing the
analysis as in Section 2.5 but a different version. Therefore our reported numbers
for the baselines are different than in Table 2.3.

3.5.1 Polyhedra

The standard implementation of the Polyhedra domain contains the best condi-
tional, assignment, meet, and join transformers together with a semantic widening
transformer (as described in Chapter 2). All these transformers are in the classes of
decomposable transformers defined in Section 3.4.
We refer the reader to Table 2.2 and Table 2.1 for the asymptotic complexity
of the Polyhedra transformers in the standard implementation with and without
decomposition [190] respectively. We compare the runtime and memory consump-
tion for end-to-end Polyhedra analysis with our generic decomposed transformers
versus the original non-decomposed transformers from the Parma Polyhedra Li-
brary (PPL) [12] and the decomposed transformers from ELINA [190] presented in
Chapter 2. Note that we do not compare against NewPolka [104] as it performed
worse than PPL in our previous evaluation in Section 2.5. PPL, ELINA, and our
implementation store the constraints and the generators using matrices with 64-bit
integers. PPL stores a single matrix for either representation whereas both ELINA
and our implementation use a set of matrices corresponding to the factors, which
requires exponential space in the worst case.
Table 3.2 shows the results on 13 large, representative benchmarks. These bench-
marks were chosen based on the following criteria which is similar to the one in
Section 2.5:

• The most time consuming function in the benchmark did not produce any
integer overflow with ELINA or our approach.

• The benchmark ran for at least 2 minutes with PPL.



Our decomposition maintains semantic equivalence with both ELINA and PPL as
long as there is no integer overflow and thus gets the same semantic invariants.
All three implementations set the abstract element to > when an integer overflow
occurs. The total number of integer overflows on the chosen benchmarks were 58,
23 and 21 for PPL, ELINA, and our decomposition, respectively. We also had fewer
integer overflows than both ELINA and PPL on the remaining benchmarks. Thus,
our decomposition improves in some cases also the precision of the analysis with
respect to both ELINA and PPL.

Table 3.2: Speedup for the Polyhedra analysis with our decomposition vs. PPL and ELINA.
Columns: Benchmark, PPL time(s), PPL memory(GB), ELINA time(s), ELINA memory(GB), Our time(s), Our memory(GB), Speedup vs. PPL, Speedup vs. ELINA.
firewire_firedtv 331 0.9 0.4 0.2 0.2 0.2 1527 2
net_fddi_skfp 6142 7.2 9.2 0.9 4.4 0.3 1386 2
mtd_ubi MO MO 4 0.9 1.9 0.3 ∞ 2.1
usb_core_main0 4003 1.4 65 2 29 0.7 136 2.2
tty_synclinkmp MO MO 3.4 0.1 2.5 0.1 ∞ 1.4
scsi_advansys TO TO 4 0.4 3.4 0.2 >4183 1.2
staging_vt6656 TO TO 2 0.4 0.5 0.1 >28800 4
net_ppp 10530 0.1 924 0.3 891 0.1 11.8 1
p10_l00 121 0.9 11 0.8 5.4 0.2 22.4 2
p16_l40 MO MO 11 3 2.9 0.4 ∞ 3.8
p12_l57 MO MO 14 0.8 6.5 0.3 ∞ 2.1
p13_l53 MO MO 54 2.7 25 0.9 ∞ 2.2
p19_l59 MO MO 70 1.7 12 0.6 ∞ 5.9

Table 3.2 shows our experimental findings. The entries MO and TO in the table have the same meaning as in Table 2.3. We follow the same convention of reporting speedups in the case of a memory overflow or a timeout as in Table 2.3.
In the table, PPL either ran out of memory or did not finish within four hours on 8 out of the 13 benchmarks. Both ELINA and our decomposition are able to analyze all benchmarks. We are faster than ELINA on all benchmarks with a maximum speedup of 5.9x on the p19_l59 benchmark. We also save significant memory over ELINA. The speedups on the remaining (not shown) benchmarks over the decomposed version of ELINA vary from 1.1x to 4x with an average of ≈ 1.4x.

better partitioning leads to performance improvements Table 3.3 shows further statistics about the category (LD or CF) and the number of lines of code in each benchmark. As can be seen, the benchmarks are quite large and contain up to 50K lines of code. Further, after each join, we measured the total number of variables n and report the maximum and the average. For the decomposed analyses (ELINA and ours) we measured the size of the largest block and report again maximum and average under n^elina_max and n^our_max. To assess the quality of the

Table 3.3: Partition statistics for the Polyhedra domain analysis.


Columns: Benchmark, Category, LOC, n (max / avg), n^elina_max (max / avg), n^our_max (max / avg), n^finest_max (max / avg).


firewire_firedtv LD 14506 159 25 81 7 40 4 39 3
net_fddi_skfp LD 30186 589 88 111 25 45 9 13 4
mtd_ubi LD 39334 528 59 111 14 28 5 23 4
usb_core_main0 LD 52152 365 72 267 30 60 11 40 7
tty_synclinkmp LD 19288 332 49 48 10 40 6 26 4
scsi_advansys LD 21538 282 63 117 18 49 12 41 9
staging_vt6656 LD 25340 675 53 204 17 25 4 12 3
net_ppp LD 15744 218 58 112 40 51 28 43 20
p10_l00 CF 592 303 174 234 54 79 16 14 6
p16_l40 CF 1783 874 266 86 31 39 14 5 3
p12_l57 CF 4828 921 261 461 78 21 7 4 3
p13_l53 CF 5816 1631 342 617 111 26 10 9 3
p19_l59 CF 9794 1272 358 867 187 31 8 12 3

partitions, we also computed (with the needed overhead) the finest partition after each join and show the largest blocks under n^finest_max (maximum and average). As can be observed, our partitions are strictly finer than the ones produced by our Polyhedra decomposition in Chapter 2 on all benchmarks, due to the refinements for the assignment and join transformers. Moreover, it can be seen that the average size of our partitions is sometimes close to that of the finest partition, but in many cases there is room for further improvement. We consider this an interesting item for future work.

3.5.2 Octagon

The standard implementation of the Octagon domain works only with the con-
straint representation and approximates the best conditional and best assignment
transformers but implements best join and meet transformers. The widening is
defined syntactically. All of these transformers are in the classes of (decompos-
able) transformers from Section 3.4. Since the syntactic widening does not pro-
duce semantically equivalent outputs for semantically equivalent but syntactically
different inputs, our fixpoint can be different than the one computed by non-
decomposed analysis. However, we still get the same semantic invariants at fix-
point on most of our benchmarks. The standard implementation requires a strong
closure operation for the efficiency and precision of transformers such as join, con-
ditional, assignment, and others.
Table 3.4 shows the asymptotic complexity of standard Octagon transformers as well as the strong closure operation with and without decomposition [189]. In the table, n, ni, nmax have the same meaning as in Table 2.2. It can be seen that strong

Table 3.4: Asymptotic time complexity of the Octagon transformers.

Transformer      Non-Decomposed   Decomposed
Conditional      O(n^2)           O(n_max^2)
Assignment       O(n^2)           O(n_max^2)
Meet (⊓)         O(n^2)           O(Σ_{i=1}^{r} n_i^2)
Join (⊔)         O(n^2)           O(Σ_{i=1}^{r} n_i^2)
Widening (∇)     O(n^2)           O(Σ_{i=1}^{r} n_i^2)
Strong Closure   O(n^3)           O(Σ_{i=1}^{r} n_i^3)

Table 3.5: Speedup for the Octagon domain analysis with our decomposition over the non-
decomposed and the decomposed versions of ELINA.
Columns: Benchmark, ELINA-ND time(s), ELINA-D time(s), Our time(s), Speedup vs. ELINA-ND, Speedup vs. ELINA-D.
firewire_firedtv 0.4 0.07 0.07 5.7 1
net_fddi_skfp 28 2.6 1.9 15 1.4
mtd_ubi 3411 979 532 6.4 1.8
usb_core_main0 107 6.1 4.9 22 1.2
tty_synclinkmp 8.2 1 0.8 10 1.2
scsi_advansys 9.3 1.5 0.8 12 1.9
staging_vt6656 4.8 0.3 0.2 24 1.5
net_ppp 11 1.1 1.2 9.2 0.9
p10_l00 20 0.5 0.5 40 1
p16_l40 8.8 0.6 0.5 18 1.2
p12_l57 19 1.2 0.7 27 1.7
p13_l53 43 1.7 1.3 33 1.3
p19_l59 41 2.8 1.2 31 2.2

closure is the most expensive operation with cubic complexity. It is applied incre-
mentally with quadratic cost for the conditional and the assignment transformers.
We compare the performance of our approach for the standard Octagon analysis,
using the non-decomposed ELINA (ELINA-ND) and the decomposed (ELINA-D)
transformers from ELINA. All of these implementations store the constraint repre-
sentation using a single matrix with 64-bit doubles. The matrix requires quadratic
space in n. Thus, overall memory consumption is the same for all implementations.

We compare the runtime and report speedups for the end-to-end Octagon anal-
ysis in Table 3.5. We achieve up to 40x speedup for the end-to-end analysis over
the non-decomposed implementation. More importantly, we are either faster or
have the same runtime as the decomposed version of ELINA on all benchmarks
but one. The maximum speedup over the decomposed version of ELINA is 2.2x.
The speedups on the remaining (not shown) benchmarks vary between 1x and 1.6x

Table 3.6: Partition statistics for the Octagon domain analysis.


Columns: Benchmark, Category, LOC, n (max / avg), n^elina_max (max / avg), n^our_max (max / avg), n^finest_max (max / avg).


firewire_firedtv LD 14506 159 25 31 6 40 4 27 3
net_fddi_skfp LD 30186 573 86 49 18 30 10 14 7
mtd_ubi LD 39334 553 46 111 65 22 9 16 9
usb_core_main0 LD 52152 364 72 59 22 39 9 35 7
tty_synclinkmp LD 19288 324 49 84 15 26 6 25 4
scsi_advansys LD 21538 293 64 94 19 41 6 20 5
staging_vt6656 LD 25340 651 52 63 7 25 4 14 3
net_ppp LD 15744 218 54 40 23 55 29 39 19
p10_l00 CF 592 305 173 19 10 77 16 17 9
p16_l40 CF 1783 874 266 32 12 13 7 10 5
p12_l57 CF 4828 954 265 55 15 13 4 11 4
p13_l53 CF 5816 1635 337 41 12 22 7 10 5
p19_l59 CF 9794 1291 363 79 14 22 4 18 3

with an average of about 1.2x. Notice that on the mtd_ubi benchmark, the Octagon
analysis takes longer than the Polyhedra analysis. This is because the Octagon
widening takes longer to converge compared to the Polyhedra widening.
Table 3.6 shows the partition statistics for the Octagon analysis (as we did for
the Polyhedra analysis). It can be seen that while our refinements often produce
finer partitions than the decomposed version of ELINA, they are coarser on 3 of
the 13 benchmarks. This is because the decomposed transformers in ELINA are
specialized for the standard approximations of the conditional and assignment
transformers. We still achieve comparable performance on these benchmarks. Note
that the average size of our partitions is close to that of the finest in most cases.

3.5.3 Zone

The standard Zone domain uses only the constraint representation. The conditional
and assignment transformers are approximate whereas the meet and join are best
transformers [140]. The widening is defined syntactically. All of these transformers
are in the class of (decomposable) transformers from Section 3.4. As for Octagon,
fixpoint equivalence is not guaranteed due to syntactic widening. However, we still
get the same semantic invariants at fixpoint on most of our benchmarks. As for the
Octagon domain, a cubic closure operation is required. The domain transformers
have the same asymptotic complexity as in the Octagon domain.
We implemented both, a non-decomposed version as well as a version with our
decomposition method of the standard transformers. Both implementations store
the constraints using a single matrix with 64-bit doubles that requires quadratic
space in n. We compare the runtime and report speedups for the Zone analy-

Table 3.7: Speedup for the Zone domain analysis with our decomposition over the non-
decomposed implementation.
Columns: Benchmark, Non-Decomposed time(s), Our Decomposition time(s), Speedup vs. Non-Decomposed.
firewire_firedtv 0.05 0.05 1
net_fddi_skfp 3 1.5 2
mtd_ubi 1.4 0.7 2
usb_core_main0 10.3 4.6 2.2
tty_synclinkmp 1.1 0.7 1.6
scsi_advansys 0.9 0.7 1.3
staging_vt6656 0.5 0.2 2.5
net_ppp 1.1 0.7 1.5
p10_l00 1.9 0.4 4.6
p16_l40 1.7 0.7 2.5
p12_l57 3.5 0.9 3.9
p13_l53 8.7 2.1 4.2
p19_l59 9.8 1.6 6.1

Table 3.8: Partition statistics for the Zone domain analysis.


Columns: Benchmark, Category, LOC, n (max / avg), n^our_max (max / avg), n^finest_max (max / avg).


firewire_firedtv LD 14506 159 25 40 4 17 3
net_fddi_skfp LD 30186 578 88 30 9 13 5
mtd_ubi LD 39334 553 59 23 5 14 3
usb_core_main0 LD 52152 362 71 37 8 33 7
tty_synclinkmp LD 19288 328 49 26 6 25 5
scsi_advansys LD 21538 293 65 41 8 21 7
staging_vt6656 LD 25340 675 53 25 3 13 2
net_ppp LD 15744 219 58 54 29 47 24
p10_l00 CF 592 303 174 77 16 17 8
p16_l40 CF 1783 856 261 13 7 10 6
p12_l57 CF 4828 882 249 12 4 10 3
p13_l53 CF 5816 1557 317 22 7 20 5
p19_l59 CF 9794 1243 331 14 4 13 3

sis in Table 3.7. Our decomposition achieves speedups of up to 6x over the non-
decomposed implementation. The speedups over the remaining benchmarks not
shown in the table vary between 1.1x and 5x with an average of ≈ 1.6x.
Table 3.8 shows the partition statistics for the Zone analysis. It can be seen that
partitioning is the core reason for the speed-ups obtained and that the average size
of our partitions is close to that of the finest in most cases.

3.5.4 Summary

Overall, our results show that the generic decomposition method proposed in
this chapter works well. It speeds up analysis compared to non-decomposed do-
mains significantly, and, importantly, the more expensive the domain, the higher
the speed-ups. Our generic method also compares favorably with the prior man-
ually decomposed domains provided by ELINA due to refined partitioning for
the outputs of the assignment and join transformers presented in Section 3.4. The
refinement is possible because we refine our model for observing the abstract el-
ements. We also show that the partitions computed during analysis are close to
optimal for Octagon and Zone but have further room for improvement for Polyhe-
dra. The challenge is how to obtain those with reasonable cost. Further speed-ups
can also be obtained by different implementations of the transformers that are, for
example, selectively approximate to achieve finer partitions.

3.6 related work

We now discuss the work most closely related to Chapters 2 and 3 for improving
the performance of numerical program analysis.

polyhedra The concept of polyhedra partitioning has been explored before in


[93, 94]. Here, the partitions are based upon the decomposition of the matrix en-
coding the constraint representation of polyhedra. Their output partitions for the
join are coarser than ours. This is because the authors rely on a syntactic criteria for
computing the output partition which does not detect equal factors. This degrades
performance; for example, using the output join partitions computed by this ap-
proach in ELINA, it takes > 1 hour to analyze the usb_core_main0 benchmark in
Table 2.3. Further, their partition for the join requires the constraint representation
to be available which is not compatible with the eager approach for conversion
(Section 2.1.4).
The authors of [182] observe that the polyhedra arising during analysis of their
benchmarks show sparsity, i.e., a given variable occurs in only a few constraints.
The authors work only with the constraint representation and exploit sparsity in
the constraints to speed up the expensive join transformer. In case the output be-
comes too large, the join is approximated. We implemented this approach in ELINA
but without the approximation step so that we do not lose precision. For our bench-
marks, we found that the performance of this approach degrades quickly due to
frequent calls to the linear solver for redundancy removal.
Another work [143] decomposes polyhedra P and Q before applying the join
into two factors P = {P1, P2} and Q = {Q1, Q2}, such that P1 = Q1 and P2 ≠ Q2. Thus, the conversion is only required for G_{P2 ⊔ Q2}. This is similar to Theorem 2.2.5.
However, the authors rely on syntactic equality between the constraints for iden-

tifying the factors; in contrast, Theorem 2.2.5 relies on semantic equality. Further,
their partition is coarser as it has only two blocks which increases the number of
generators.
The work of [137, 138, 218] focuses on improving the performance of standard
Polyhedra transformers based on constraint representation using parametric linear
programming. We believe that our approach is complementary and their transform-
ers could benefit from our decomposition.
[10, 75] provide conversion algorithms that can be more efficient than the
Chernikova algorithm used in ELINA currently for certain polyhedra. In the fu-
ture, a straightforward way to speedup ELINA would be to run the Chernikova
algorithm and the ones from [10, 75] in parallel and pick the fastest one.
[184] proposed an incremental conversion algorithm when the constraints are re-
moved. This is useful for speeding up conversions for the assignment and widening
transformers. Integrating their algorithms in ELINA would yield further speedups.
[20] introduces a new double representation and efficient conversion algorithm
for the NNC (not necessarily closed) Polyhedra domain which is more expressive
than the closed Polyhedra domain considered in our work. The follow-up work
[21] provides a domain implementation based on the new representation and con-
version algorithm. [219] identifies a number of optimization opportunities to make
NNC Polyhedra domain even faster. We believe that online decomposition can fur-
ther speedup the NNC Polyhedra domain without precision loss.
[33] performs bottom-up interprocedural analysis and computes procedure sum-
maries as a disjunction of convex polyhedra. We note that our framework of online
decomposition presented here can also be extended to disjunctions of polyhedra.
[29] targets the analysis of particular kinds of programs which produce polyhe-
dra that are not decomposable using online decomposition. The authors develop
new data structures for the efficient analysis of the intended programs with the
Polyhedra domain. Similarly, [22] identifies a list of optimization opportunities
when analyzing hybrid systems. We believe that while our approach is agnostic to
the program being analyzed, tailoring the Polyhedra domain implementation for
the class of programs being analyzed can further improve our performance and it
is an interesting direction of future work.

octagon Variable packing [27, 99] has been used for decomposing the Octagon
transformers. It partitions X statically before running the analysis based on certain
criteria. For example, two variables are in the same block of the partition if they
occur together in the same program statement. Although variable packing could
also be generalized to decompose transformers of other domains, it is fundamen-
tally different from our dynamic decomposition; it is not guaranteed to preserve
precision. This is because the permissible partition depends on the Octagon pro-
duced during the analysis. Therefore the enforced static partition would not be
permissible throughout the analysis and thus the analysis loses precision. Further,
the dynamic decomposition often yields even finer partitions than can be detected

statically. So dynamic decomposition (of transformers within the classes defined)


provides both higher precision and faster execution.
Our prior work [189] maintains partitions dynamically for the Octagon domain
and does not lose precision. The partitions are hand-crafted for the Octagon do-
main. The partitions computed by our generic approach in Section 3.4 are finer
than those produced by [189]. As a result, our approach yields better results than
[189] in Section 3.5.
The work of [45, 46] presents a new algorithm for reducing the operation count
of incremental closure for the Octagon domain. In cases where the structure of the
program allows the analyzer to use incremental closure more frequently than the
full closure, incremental closure becomes a bottleneck for the overall analysis. The
work of [18] parallelizes the standard octagon operators on GPUs. [44] introduces
a new data structure for representing octagons by noticing that the entries in the
corresponding matrices are usually identical for their analyzed programs. This re-
duces the memory required for storing octagons. This approach is similar in spirit
to the works mentioned above that adapt the Polyhedra domain to the particu-
lar programs being analyzed. Follow-up work [44] proposes optimizations based
on the proposed data structure in [44] that reduce the performance gap between
octagons implemented with arbitrary-precision arithmetic and those implemented
with machine doubles. We believe that our framework is complementary to the
above approaches and a combination can yield more performance gains.
The work of [109] designs sparse algorithms for the Octagon domain. While the
proposed algorithms cannot be extended to more expressive domains, they could
be combined with our decomposition to potentially achieve better performance.

zone The work of [204] dynamically maintains partitions based on a syntactic


criteria for the Zone domain. They also explicitly relate all variables that are mod-
ified within a loop. This can lead to a coarser partition for the corresponding join.
Further, the resulting analysis is not guaranteed to be as precise as the original
non-decomposed analysis.
The authors of [76] observe that the analysis with the Zone domain on their
benchmarks is usually sparse. They exploit sparsity by using graph-based algo-
rithms for the Zone domain transformers. Due to the Zone widening being defined
syntactically, their sparse analysis is also (like our approach) not guaranteed to ob-
tain the same fixpoint as the original analysis. While these algorithms cannot be
extended to more expressive domains such as TVPI or Polyhedra, they could be
combined with our decomposition to potentially achieve better performance.

beyond numerical domains In a recent work [56], the authors generalize


our online decomposition framework beyond numerical domains and show that it
can be applied for decomposing any abstract domain. However, the authors do not
provide conditions on the structure of domains and the corresponding transform-
ers under which the resulting decomposed analysis does not lose precision. We believe
that providing such conditions is interesting future work.

3.7 discussion

Online decomposition is a promising avenue to make numerical domain analysis
faster, possibly by orders of magnitude, and thus practical for analyzing many real-
world programs. This is made possible thanks to the inherent “locality” in the way
program statements, and sequences of such, access variables. In this chapter, we
advanced the partitioning approach presented in Chapter 2 by showing that it is generally ap-
plicable to all sub-polyhedra domains and constructed decomposed transformers
from existing non-decomposed transformers. This way, existing implementations
can be re-factored to incorporate decomposition. We also showed that our decom-
position does not lose precision on most practical transformers already in use. Re-
cent research has shown that online decomposition can be extended to any abstract
domain, not only numerical.
We introduced techniques for refining the output partitions of domain trans-
formers at small extra cost, an improvement over the computed output partitions
in Chapter 2. We evaluated our generic approach on three expensive abstract do-
mains: Zone, Octagon, and Polyhedra. We obtain significant speedups over prior
work, including on the domain implementations that were previously manually
decomposed. Most importantly, based on our results, we observe that the more
expensive a domain is, the higher the speedups from online decomposition. Our
highest speedups are orders of magnitude on the exponentially expensive Poly-
hedra. We believe that decomposition can level the playing field among domains,
requiring a rethinking of the fundamental question when designing program analy-
sis for a particular application: selecting the domain with the best tradeoff between
analysis precision and performance.
In both Chapters 2 and 3, our focus has been on ensuring that our faster, de-
composed analysis produces the same result as the original one at every step. It is,
however, also possible to selectively lose precision for abstract transformers such
that even though the intermediate results are imprecise, the final computed invari-
ants are still the same as the ones from the original analysis. This concept is based
on our observation that many of the intermediate analysis results are not needed
for computing the final invariants at the fixpoint. Devising such a strategy requires
deciding where and how much precision to lose. We take a data-driven approach
to learning such strategies in Chapter 4. Our results show that our learned strate-
gies make context-sensitive decisions based on the abstract state and outperform
fixed manually designed heuristics.
4
REINFORCEMENT LEARNING FOR NUMERICAL DOMAINS

In Chapters 2 and 3, we presented a theoretical framework for dynamically de-
composing the abstract elements and transformers of numerical domains without
losing precision. The framework is effective in reducing redundancy at each step
of the analysis. However, redundancy remains over sequences of abstract trans-
formers. A precise but expensive transformer applied at each step may compute
intermediate results discarded downstream in the analysis. This means that it may
be possible to achieve the same fixpoint faster by selectively losing precision. A key
challenge then is coming up with effective, general dynamic approaches that can
decide where and how to lose precision for the best tradeoff between performance
and precision. Statically fixed policies [42, 99, 131, 156, 157] for losing precision
often produce imprecise results.
Our Work. We address the above challenge by offering a new approach to dy-
namically losing precision based on reinforcement learning (RL) [195]. The key idea
is to use RL and learn a policy that determines when and how the analyzer should
lose the least precision at an abstract state, to achieve the best performance gains.
The key insight of the work is to establish a correspondence between classic con-
cepts in static analysis with those in reinforcement learning, demonstrating that RL
is a viable approach for handling choices in the inner workings of an analyzer.

Figure 4.1: Policies for balancing precision and speed in static analysis.


Figure 4.2: Reinforcement learning for static analysis.

We illustrate this connection on the example shown in Fig. 4.1. Here, a static
analyzer starts from an initial abstract state s0 depicted by the root node of the
tree. It transitions to a new abstract state, i.e., one of its children, by applying a
transformer. At each step, the analyzer can select either a precise but expensive
transformer Tprecise or a fast but imprecise one Tfast . If the analyzer follows a fixed
policy that guarantees maximum precision (orange path in Fig. 4.1), it will always
apply Tprecise and obtain a precise fixpoint at the rightmost leaf. However, the com-
putation is slow. Analogously, by following a fixed policy maximizing performance
(yellow path in Fig. 4.1), the analyzer always chooses Tfast and obtains an imprecise
fixpoint at the leftmost leaf. A policy maximizing both speed and precision (green
path in Fig. 4.1) yields a precise fixpoint but is computed faster as the policy ap-
plies both Tfast and Tprecise selectively. A globally optimal sequence of transformer
choices that optimizes both objectives is generally very difficult to achieve. How-
ever, as we show in this chapter, effective policies that work well in practice can be
obtained using principled RL based methods.
Fig. 4.2 explains the connection with RL intuitively. The left-hand side of the
figure shows an RL agent in a state st at timestep t. The state st represents the
agents’ knowledge about its environment. It takes an action at and moves to a new
state st+1 . The agent obtains a numerical reward rt+1 for the action at in the state st .
The agents’ goal is maximizing long-term rewards. Notice that the obtained reward
depends on both the action and the state. A policy maximizing short-term rewards
at each step does not necessarily yield better long term gains as the agent may reach
an intermediate state from which all further rewards are negative. RL algorithms
typically learn the so-called Q-function, which quantifies the expected long term
gains by taking action at in the state st . This setting also mimics the situation that
arises in an iterative static analysis shown on the right-hand side of Fig. 4.2. Here
the analyzer obtains a representation of the abstract state (its environment) via a
set of features φ. The analyzer selects among a set of transformers T with different
precision and speed. The transformer choice represents the action. The analyzer
obtains a reward in terms of speed and precision. In Fig. 4.1, a learned policy
would determine at each step whether to choose Tprecise or Tfast . To do that, for a
given state and action, the analyzer would compute the value of the Q-function
using the features φ. Querying the Q-function would then return the suggested
action from that state.
While the overall connection between the two areas is conceptually clean, the
details of making it work in practice pose significant challenges. The first is the
design of suitable approximations to be able to gain performance when precision is
lost. The second is the design of the features φ, which should be cheap to compute
yet be expressive enough to capture key properties of abstract states so that the
learned policy generalizes to unseen states. And finally, a suitable reward function
is needed that combines both, precision and performance.
The work in this chapter was published in [192].

main contributions Our main contributions are:

• A new, general approach for speeding up static analysis with reinforcement
learning based on establishing a correspondence between concepts in both
areas (Section 4.1).

• An instantiation of the approach for speeding up Polyhedra domain analysis,
which is known to be expensive with worst-case exponential complexity. The
instantiation consists of several contributions:
– We build on recent work on online decomposition to systematically cre-
ate a space of approximate Polyhedra transformers (i.e., actions) with
different trade-offs between precision and cost (Section 4.2).
– We design a set of feature functions which capture key properties of ab-
stract states and transformers, yet are efficient to extract during analysis
(Section 4.3).
– We develop a complete instantiation of reinforcement learning for Poly-
hedra analysis based on Q-learning with linear function approximation
(i.e., actions, Q-function, reward function, and policy).

• We provide an end-to-end implementation and evaluation of our approach.
Given a training dataset of programs, we first learn a policy (based on the
Q-function) over analysis runs of these programs. We then use the resulting
policy during the analysis of new, unseen programs. The experimental results
on a set of realistic programs (e.g., Linux device drivers) show that our rein-
forcement learning based Polyhedra analysis achieves substantial speed-ups,
in many cases, two to four orders of magnitude over a heavily optimized
state-of-the-art implementation of Polyhedra.

Overall, we believe the recipe outlined in this chapter opens up the possibility
for speeding-up other analyzers with reinforcement learning based concepts.

4.1 reinforcement learning for static analysis

In this section we first introduce the general framework of reinforcement learning
and then discuss its instantiation for static analysis.

4.1.1 Reinforcement Learning

Reinforcement learning (RL) [195] involves an agent learning to achieve a long-term
goal by interacting with its environment. The agent starts from an initial represen-
tation of its environment in the form of an initial state s0 ∈ S where S is the set of
possible states. Then, at each time step t = 0, 1, 2, . . . , the agent performs an action
at ∈ A in state st (A is the set of possible actions) and moves to the next state st+1 .
The agent receives a numerical reward r(st , at , st+1 ) ∈ R for moving from the state
st to st+1 by taking the action at . The agent repeats this process until it reaches a
final state. Each sequence of states and actions from an initial state to the final state
is called an episode.
In RL, state transitions typically satisfy the Markov property: the next state st+1
depends only on the current state st and the action at taken from st . A policy
p : S → A is a mapping from states to actions: it specifies the action at = p(st )
that the agent will take when in state st . The agent’s goal is to learn a policy that
maximizes not an immediate but a cumulative reward for its actions in the long
term. The agent does this by selecting the action with the highest expected long-
term reward in a given state. The quality function (Q-function) Q : S × A → R
specifies the long term cumulative reward associated with choosing an action at
in state st . Learning this function, which is not available a priori, is essential for
determining the best policy and is explained next.

q-learning and approximating the q-function. Q-learning [208] can
be used to learn the Q-function over state-action pairs. Typically the size of the
state space is so large that it is not feasible to explicitly compute the Q-function
for each state-action pair and thus the function is approximated. In this chapter,
we consider a linear function approximation of the Q-function for three reasons: (i)
effectiveness: the approach is efficient, can handle large state spaces, and works well
in practice [79]; (ii) it leverages our application domain: in our setting, it is possible
to choose meaningful features (e.g., approximation of polyhedra volume and cost
of transformer) that relate to precision and performance of the static analysis and
thus it is not necessary to uncover them automatically (as done, e.g., by training
a neural net); and (iii) interpretability of policy: once the Q-function and associated
policy are learned they can be inspected and interpreted.
The Q-function is described as a linear combination of ℓ basis functions φi : S ×
A → R, i = 1, . . . , ℓ. Each φi is a feature that assigns a value to a (state, action) pair
and ℓ is the total number of chosen features. The choice of features is important

Algorithm 4.1 Q-learning algorithm


1: function Q-learn(S, A, r, γ, α, φ)
2: Input:
3: S ← set of states, A ← set of actions, r ← reward function
4: γ ← discount factor, α ← learning rate
5: φ ← set of feature functions over S and A
6: Output: parameters θ
7: θ = Initialize arbitrarily (which also initializes Q)
8: for each episode do
9: Start with an initial state s0 ∈ S
10: for t = 0, 1, 2, . . . , length(episode) do
11: Take action at , observe next state st+1 and r(st , at , st+1 )
12: θ := θ + α · (r(st , at , st+1 ) + γ · maxat+1 Q(st+1 , at+1 ) − Q(st , at )) · φ(st , at )
13: end for
14: end for
15: return θ
16: end function

and depends on the application domain. We collect the feature functions into a
vector φ(s, a) = (φ1 (s, a), φ2 (s, a), . . . , φℓ (s, a)); doing so, the Q-function has the
form:

Q(s, a) = ∑j=1..ℓ θj · φj (s, a) = φ(s, a) · θT , (4.1)

where θ = (θ1 , θ2 , . . . , θℓ ) is the parameter vector. The goal of Q-learning with linear
function approximation is thus to estimate (learn) θ.
Algorithm 4.1 shows the Q-learning procedure with linear function approxima-
tion. In the algorithm, 0 ≤ γ < 1 is the discount factor which represents the dif-
ference in importance between immediate and future rewards. γ = 0 makes the
agent only consider immediate rewards while γ ≈ 1 gives more importance to
future rewards. The parameter 0 < α ≤ 1 is the learning rate that determines the
extent to which the newly acquired information overrides the old information. The
algorithm first initializes θ randomly. Then at each step t in an episode, the agent
takes an action at , moves to the next state st+1 and receives a reward r(st , at , st+1 ).
Line 12 in the algorithm shows the equation for updating the parameters θ. Notice
that Q-learning is an off-policy learning algorithm as the update in the equation
assumes that the agent follows a greedy policy (from state st+1 ) while the action
(at ) taken by the agent (in st ) need not be greedy.
Once the Q-function is learned, a policy p∗ for maximizing the agent’s cumula-
tive reward is obtained as:

p∗ (s) = argmaxa∈A Q(s, a). (4.2)

In the application, p∗ is computed on the fly at each state s by computing Q for
each action a and choosing the one with maximal Q(s, a). Since the number of
actions is typically small, this incurs little overhead.
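
To make the update rule of Algorithm 4.1 and the greedy policy (4.2) concrete, the following minimal Python sketch implements Q-learning with a linear Q-function. The feature map phi (assumed to return a NumPy vector), the action set, and the reward are placeholders supplied by the caller; this illustrates the equations and is not the interface of our analyzer.

import numpy as np

def q_value(theta, phi, s, a):
    # Q(s, a) = phi(s, a) . theta, the linear approximation from (4.1).
    return float(np.dot(phi(s, a), theta))

def greedy_action(theta, phi, s, actions):
    # p*(s) = argmax_a Q(s, a), the learned policy from (4.2).
    return max(actions, key=lambda a: q_value(theta, phi, s, a))

def q_update(theta, phi, s, a, reward, s_next, actions, alpha=0.01, gamma=0.9):
    # One application of the update on line 12 of Algorithm 4.1.
    best_next = max(q_value(theta, phi, s_next, b) for b in actions)
    td_error = reward + gamma * best_next - q_value(theta, phi, s, a)
    return theta + alpha * td_error * phi(s, a)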

Table 4.1: Mapping of RL concepts to Static analysis concepts.


RL concept Static analysis concept
Agent Static analyzer
State s ∈ S Features of abstract state
Action a ∈ A Abstract transformer
Reward function r Transformer precision and runtime
Feature Value associated with abstract state features
and transformer

4.1.2 Instantiation of RL to Static Analysis

We now discuss a general recipe for instantiating the RL framework described
above to the domain of static analysis. The precise formal instantiation to the spe-
cific numerical (Polyhedra) analysis is provided later.
In Table 4.1, we show a mapping between RL and program analysis concepts.
Here, the analyzer is the agent that observes its environment, which is the ab-
stract program state (e.g., polyhedron) arising at every iteration of the analysis. In
general, the number of possible abstract states can be very large (or infinite) and
thus, to enable RL in this setting, we abstract the state through a set of features
(Table 4.2). An example of a feature could be the number of bounded program
variables in a polyhedron or its volume. The challenge is to define the features
to be fast to evaluate, yet sufficiently representative so the policy derived through
learning generalizes well to unseen abstract program states.
Further, at every abstract state, the analyzer should have the choice between
different actions corresponding to different abstract transformers. The transformers
should range from expensive and precise to cheap and approximate. The reward
function r is composed of a measure of precision and speed and should encourage
approximations that are both precise and fast.
The goal of our agent then is to learn an approximation policy that at each
analysis step selects an action that tries to minimize the loss of analysis precision
at fixpoint while improving overall performance. Learning this policy is typically
done offline using a given dataset D of programs (discussed in Section 4.4). We note
that this is computationally challenging because the dataset D can contain many
programs and each will need to be analyzed many times during training: even a
single run of the analysis can contain many (e.g., thousands) calls to transformers.
Generating all combinations of different approximate transformers applied at each
program point is infeasible. Hence, to improve the efficiency of learning in practice,
one would typically exercise the choice for multiple transformers/actions only at
selected program points. In our work, we approximate at the join points where the
most expensive transformer in the numerical domains is usually applied.

Another key challenge lies in defining a suitable space of transformers. As we
will see later, we accomplish this by leveraging recent advances in online decom-
position for numerical domains [189, 190, 191] presented in Chapters 2 and 3. We
show how to do that for the notoriously expensive Polyhedra analysis; however,
the approach is easily extendable to other popular numerical domains, which all
benefit from decomposition.

4.2 polyhedra analysis and approximate transformers

In this section we leverage online decomposition to define a flexible approxima-
tion framework for the Polyhedra domain analysis (see Chapters 2 and 3) that loses
precision in a way that directly translates into performance gains. The basic idea is
simple: we approximate by dropping constraints to reduce connectivity among con-
straints and thus to yield finer decompositions of abstract elements. These directly
translate into speedup. We consider various options for these approximations; re-
inforcement learning (in Section 4.3) will then learn a proper, context-sensitive
strategy that stipulates when and which approximation option to apply.
Next we describe our strategies for approximating a given transformer to yield
finer online decompositions. The strategies are generic and can be instantiated to
approximate any transformer in a given domain.

4.2.1 Block Splitting

The cost of a decomposed abstract transformer applied on the abstract element(s)
P depends on the sizes of the blocks in the permissible partition πP relevant to
the transformer and, more specifically, on the size of the largest such block. Thus,
it is desirable to bound this size by a threshold ∈ N. The common goal of all our
splitting strategies is to satisfy this bound (after splitting). This is done by first
identifying all blocks Xt ∈ πP with |Xt | > threshold that the transformer requires
and then removing constraints from P(Xt ) until it decomposes into blocks of sizes
< threshold. Since we only remove constraints from the abstract element, the result-
ing transformer remains sound. Obviously, there are many choices for removing
constraints as shown in the next example.

Example 4.2.1. Consider the following polyhedron and threshold = 4

Xt = {x1 , x2 , x3 , x4 , x5 , x6 },
P(Xt ) = {x1 − x2 + x3 ≤ 0, x2 + x3 + x4 ≤ 0, x2 + x3 ≤ 0,
x3 + x4 ≤ 0, x4 − x5 ≤ 0, x4 − x6 ≤ 0}.

We can remove M = {x4 − x5 ≤ 0, x4 − x6 ≤ 0} from P(Xt ) to obtain the constraint
set {x1 − x2 + x3 ≤ 0, x2 + x3 + x4 ≤ 0, x2 + x3 ≤ 0, x3 + x4 ≤ 0} with partition
{{x1 , x2 , x3 , x4 }, {x5 }, {x6 }}, which obeys the threshold.

Algorithm 4.2 Block Splitting algorithm


1: function block_split(Xt , P(Xt ), threshold)
2: Input:
3: Xt ← input block, P(Xt ) ← input factor, threshold ← threshold for the size of Xt
4: if |Xt | < threshold then . Nothing to decompose
5: πO := πO ∪ Xt
6: O := O ∪ P(Xt )
7: return (O, πO )
8: end if
9: r_algo := choose_removal_algorithm(Xt , P(Xt )) . Choose constraint removal algorithm
10: M := remove_cons(Xt , P(Xt ), r_algo) . Compute constraints M ⊆ P(Xt ) to be removed
11: O := P(Xt ) \ M
12: πO := finest_partition(O, Xt ) . Partition Xt w.r.t. O
13: return (O, πO )
14: end function

We could also remove M′ = {x2 + x3 + x4 ≤ 0, x3 + x4 ≤ 0} from P(Xt ) to get the
constraint set {x1 − x2 + x3 ≤ 0, x2 + x3 ≤ 0, x4 − x5 ≤ 0, x4 − x6 ≤ 0} with partition
{{x1 , x2 , x3 }, {x4 , x5 , x6 }}, which also obeys the threshold.

Algorithm 4.2 shows our generic function block_split for splitting a given block
Xt and the associated factor P(Xt ). If the size of the block Xt is below the thresh-
old, no decomposition is performed. Otherwise, one out of several possible con-
straint removal algorithms is chosen (these are explained below) using the func-
tion choose_removal_algorithm learned by RL. Using the removal algorithm, a set
of constraints is removed obtaining O such that Xt decomposes into blocks of size
≤ threshold. We note that O approximates P(Xt ) by construction. The associated
partition of Xt is computed from scratch by connecting the variables that occur in
the same constraint using the function finest_partition.
We discuss next choices for the removal algorithm that we consider. Depending
on the inputs, each may yield different decompositions and different precisions.

stoer-wagner min-cut. The first basic idea is to remove a minimal number
of constraints in P(Xt ) that decomposes the block Xt into two blocks. To do so,
we associate with P(Xt ) a weighted undirected graph G = (V, E), where V = Xt .
Further, there is an edge between xi and xj , if there is a constraint containing
both; its weight mij is the number of such constraints. We then apply the standard
Stoer-Wagner min-cut algorithm [194] to obtain a partition of Xt into Xt′ and Xt′′ .
M collects all constraints that need to be removed, i.e., those that contain at least
one variable from both Xt′ and Xt′′ . If Xt′ , Xt′′ have sizes > threshold, then the above
procedure is repeated until we get a partition of Xt where each block has size
≤ threshold.
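
The following Python sketch illustrates this splitting strategy using the stoer_wagner routine of the networkx library. It models constraints simply as sets of variable names; the helper names are ours and the actual C implementation in our library differs.

import itertools
import networkx as nx

def constraint_graph(variables, constraints):
    # Nodes are variables; the weight of an edge (xi, xj) is the number of
    # constraints that contain both xi and xj, as described above.
    g = nx.Graph()
    g.add_nodes_from(variables)
    for c in constraints:
        for u, v in itertools.combinations(sorted(c), 2):
            w = g.get_edge_data(u, v, {"weight": 0})["weight"]
            g.add_edge(u, v, weight=w + 1)
    return g

def mincut_split(variables, constraints, threshold):
    # Repeatedly remove the constraints crossing a Stoer-Wagner min-cut of an
    # oversized block until every block of the induced partition has size
    # <= threshold. Returns the kept constraints and the resulting blocks.
    kept = list(constraints)
    while True:
        g = constraint_graph(variables, kept)
        blocks = [set(c) for c in nx.connected_components(g)]
        oversized = [b for b in blocks if len(b) > threshold]
        if not oversized:
            return kept, blocks
        _, (side_a, _side_b) = nx.stoer_wagner(g.subgraph(oversized[0]))
        side_a = set(side_a)
        # Drop every constraint with variables on both sides of the cut.
        kept = [c for c in kept
                if set(c) <= side_a or set(c).isdisjoint(side_a)]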

Example 4.2.2. Fig. 4.3 shows the graph G for P(Xt ) in Example 4.2.1. Applying the
Stoer-Wagner min-cut on G once will cut off x5 or x6 by removing the constraint

Figure 4.3: Graph G for P(Xt ) in Example 4.2.1

x4 − x5 or x4 − x6 , respectively. In either case a block of size 5 remains, exceeding the
threshold of 4. After two applications, both constraints have been removed and the
resulting block structure is given by {{x1 , x2 , x3 , x4 }, {x5 }, {x6 }}. The associated factors
are {x1 − x2 + x3 ≤ 0, x2 + x3 + x4 ≤ 0, x2 + x3 ≤ 0, x3 + x4 ≤ 0} and x5 , x6 become
unconstrained.

weighted constraint removal. Our second approach for constraint removal
does not associate weights with edges but with constraints. It then greedily
removes the constraints with the highest weights. Specifically, we consider the following
two choices of constraint weights, yielding two different constraint removal poli-
cies:

• For each variable xi ∈ Xt , we first compute the number ni of constraints
containing xi . The weight of a constraint is then the sum of the ni over all
variables occurring in the constraint.

• For each pair of variables xi , xj ∈ Xt , we first compute the number nij of
constraints containing both xi and xj . The weight of a constraint is then the
sum of the nij over all pairs xi , xj occurring in the constraint.

Once the weights are computed, we remove the constraint with the maximum
weight. The intuition is that variables in this constraint most likely occur in other
constraints in P(Xt ) and thus they do not become unconstrained upon constraint
removal. This reduces the loss of information. The procedure is repeated until we
get the desired partition of Xt .

Example 4.2.3. Applying the first definition of weights in Example 4.2.1, we get
n1 = 1, n2 = 3, n3 = 4, n4 = 4, n5 = 1, n6 = 1. The constraint x2 + x3 + x4 ≤ 0
has the maximum weight of n2 + n3 + n4 = 11 and thus is chosen for removal.
Removing this constraint from P(Xt ) does not yet yield a decomposition; thus we
have to repeat. Doing so {x3 + x4 ≤ 0} is chosen. Now, P(Xt ) \ M = {x1 − x2 + x3 ≤
0, x2 + x3 ≤ 0, x4 − x5 ≤ 0, x4 − x6 ≤ 0} which can be decomposed into two factors
{x1 − x2 + x3 ≤ 0, x2 + x3 ≤ 0} and {x4 − x5 ≤ 0, x4 − x6 ≤ 0} corresponding to blocks
{x1 , x2 , x3 } and {x4 , x5 , x6 }, respectively, each of size ≤ threshold.
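
A small Python sketch of the two weighting schemes is shown below; constraints are modeled as sets of variable names, and one call drops a single constraint (the analyzer repeats this until the block decomposes into pieces of size ≤ threshold). This is an illustration of the strategy rather than the actual implementation.

from collections import Counter
from itertools import combinations

def drop_heaviest_constraint(constraints, weighting="single"):
    # Each constraint is a set of variable names; returns the list with the
    # maximum-weight constraint removed.
    if weighting == "single":
        occ = Counter(v for c in constraints for v in c)
        weight = lambda c: sum(occ[v] for v in c)
    else:  # "pair": weight by co-occurrence counts of variable pairs
        occ = Counter(p for c in constraints
                      for p in combinations(sorted(c), 2))
        weight = lambda c: sum(occ[p] for p in combinations(sorted(c), 2))
    heaviest = max(constraints, key=weight)
    return [c for c in constraints if c is not heaviest]

On the constraints of Example 4.2.1, the single-variable weighting assigns the maximum weight 11 to x2 + x3 + x4 ≤ 0, matching the first removal in Example 4.2.3.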

4.2.2 Merging of Blocks

Our basic objective when approximating is to ensure that the maximal block size
remains below a chosen threshold. Besides splitting to ensure this, there can also be
a benefit of merging small blocks, again provided the resulting block size remains
below the threshold. The merging itself does not change precision, but the resulting
transformer may be more precise when working on larger blocks. In particular this
can happen with the inputs of the join transformer as we will explain later.
We consider the following three merging strategies. To simplify the explanation,
we assume that the blocks in πP are ordered by ascending size:

1. No merge: None of the blocks are merged.

2. Merge smallest first: We start merging the smallest blocks as long as the size
stays below the threshold. These blocks are then removed and the procedure
is repeated on the remaining set.

3. Merge large with small: We start to merge the largest block with the smallest
blocks as long as the size stays below the threshold. These blocks are then
removed and the procedure is repeated on the remaining set.

Example 4.2.4. Consider threshold = 5 and πP with block sizes {1, 1, 2, 2, 2, 2, 3, 5, 7,
10}. Merging smallest first yields blocks 1 + 1 + 2, 2 + 2, 2 + 3 leaving the rest un-
changed. The resulting sizes are {4, 4, 5, 5, 7, 10}. Merging large with small leaves
10, 7, 5 unchanged and merges 3 + 1 + 1, 2 + 2, and 2 + 2. The resulting sizes are
also {4, 4, 5, 5, 7, 10} but the associated factors are different (since different blocks
are merged), which will yield different results in following transformations.
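
The two non-trivial merging strategies can be sketched in Python as follows (blocks are sets of variables; this is an illustration under those assumptions, not the implementation used in our library).

def merge_smallest_first(blocks, threshold):
    # Strategy 2: repeatedly fuse the smallest remaining blocks as long as
    # the fused size stays <= threshold.
    todo, merged = sorted(blocks, key=len), []
    while todo:
        cur = todo.pop(0)
        while todo and len(cur) + len(todo[0]) <= threshold:
            cur = cur | todo.pop(0)
        merged.append(cur)
    return merged

def merge_large_with_small(blocks, threshold):
    # Strategy 3: take the largest remaining block and absorb the smallest
    # ones while the size stays <= threshold, then repeat.
    todo, merged = sorted(blocks, key=len), []
    while todo:
        cur = todo.pop()
        while todo and len(cur) + len(todo[0]) <= threshold:
            cur = cur | todo.pop(0)
        merged.append(cur)
    return merged

On the block sizes of Example 4.2.4, both functions yield the sizes {4, 4, 5, 5, 7, 10} but group different blocks, as discussed above.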

4.2.3 Approximation for Polyhedra Analysis

Using the notation and terminology from Chapter 2, we now show how our ap-
proximation methods discussed above can be instantiated for the Polyhedra analy-
sis. So far the discussion has been rather generic, i.e., approximation could be done
at any time during analysis. We choose to perform approximation only with the
join transformer. This is because as explained in Section 2.4 and Section 2.5, the join
usually coarsens the partitions substantially and is the most expensive transformer
of the Polyhedra analysis.
Let πcommon = πP1 ⊔ πP2 be a common permissible partition for the inputs P1 , P2
of the join transformer. Then, from Chapter 2, a permissible partition for the (not
approximated) output is obtained by keeping all blocks Xt ∈ πcommon for which
P1 (Xt ) = P2 (Xt ) in the output partition πO , and fusing all remaining blocks into
one. Formally, πO = {N} ∪ U, where
N = ∪{Xk ∈ πcommon : P1 (Xk ) ≠ P2 (Xk )}, U = {Xk ∈ πcommon : P1 (Xk ) = P2 (Xk )}.

Algorithm 4.3 Approximation algorithm for Polyhedra join


1: function approximate_join((P1 , πP1 ), (P2 , πP2 ), threshold)
2: Input:
3: (P1 , πP1 ), (P2 , πP2 ) ← decomposed inputs to the join
4: threshold ← Upper bound on size of N
5: O := ∪{P1 (Xk ) : Xk ∈ U}
6: πO := U . initialize output partition
7: B := {Xk ∈ πP1 t πP2 : Xk ⊆ N}
8: Bt := {Xt ∈ B : |Xt | > threshold}
9:
. join factors for blocks in Bt and split the outputs
10: for Xt ∈ Bt do
11: P′ := P1 (Xt ) ⊔ P2 (Xt )
12: (C, π) := block_split(Xt , CP′ , threshold)
13: for Xt′ ∈ π do
14: G(Xt′ ) := conversion(C(Xt′ ))
15: O := O ∪ (C(Xt′ ), G(Xt′ ))
16: end for
17: πO := πO ∪ π
18: end for
19:
. merge blocks ∈ B \ Bt via a merge algorithm and apply join
20: m_algo := choose_merge_algorithm(B \ Bt )
21: Bm := merge(B \ Bt , m_algo)
22: for Xm ∈ Bm do
23: O := O ∪ (P1 (Xm ) ⊔ P2 (Xm ))
24: πO := πO ∪ {Xm }
25: end for
26: return (O, πO )
27: end function

The join transformer computes the generators GO for the output O as GO =
GP1 (X\N) × (GP1 (N) ∪ GP2 (N) ) where × is the Cartesian product. The constraint repre-
sentation CO is computed as CO = CP1 (X\N) ∪ conversion(GP1 (N) ∪ GP2 (N) ). The con-
version algorithm has worst-case exponential complexity and is the most expensive
step of the join. Note that the decomposed join applies it only on the generators
GP1 (N) ∪ GP2 (N) corresponding to the block N.
Let B ⊆ πcommon be the set of blocks that merge into N in the output O = P1 ⊔ P2 :

B = {Xk ∈ πcommon : Xk ∩ N ≠ ∅}.

A straightforward way of approximating the join is to compute it separately on
each block Xk in B as P1 (Xk ) ⊔ P2 (Xk ), which yields the output partition πcommon .
However, it may contain block sizes above threshold. Further, precision can be
gained for subsequent transformers by merging Xk with sizes below threshold. Thus
we merge small input blocks using a merge before the join, and split large output
blocks in Bt = {Xk ∈ B : |Xk | > threshold} after the join.

Algorithm 4.3 shows the overall algorithm for approximating the Polyhedra join
transformer and is explained in greater detail next.

splitting of large blocks For each block Xt in Bt we apply the join
on the associated factors: O(Xt ) = P1 (Xt ) ⊔ P2 (Xt ). Then we call the function
block_split (Algorithm 4.2) to split Xt into smaller blocks, each of size ≤ threshold.
The conversion is now applied on the set of constraint sets returned by block_split.
We perform a number of optimizations in our implementation that are not shown
in Algorithm 4.3 for simplicity. For example, {Xt } is permissible for CP′ , but may
not be finest. Thus we first compute the finest partition from scratch and then
call block_split only if it contains a block of size > threshold. The cost of this
preprocessing is cheap compared to the cost of the overall join.

merging blocks All blocks in B \ Bt obey the threshold size and we can ap-
ply merging to obtain larger blocks Xm of size ≤ threshold to increase precision of
the subsequent transformers. The merging function choose_merge_algorithm in Al-
gorithm 4.3 is learned by RL. The join is then applied on the factors P1 (Xm ), P2 (Xm )
and the result is added to the output O.

need for rl. Different choices of the threshold, splitting, and merge strategies
in Algorithm 4.3 yield a range of transformers with different performance and
precision depending on the inputs. Determining the suitability of a given choice
on inputs is highly non-trivial and thus we use RL to learn a policy that makes
decisions adapted to the join inputs. We note that all of our approximate trans-
formers are non-monotonic; however, the analysis always converges to a fixpoint
when combined with widening [13].

4.3 reinforcement learning for polyhedra analysis

We now describe how to instantiate reinforcement learning for approximating Poly-
hedra domain analysis. The instantiation consists of the following steps:

• Extracting the RL state s from the abstract program state numerically using a
set of features.

• Defining actions a as the choices among the threshold, merge and split meth-
ods defined in the previous section.

• Defining a reward function r favoring both high precision and fast execution.

• Defining the feature functions φ(s, a) to enable Q-learning.

states. We consider nine features for defining a state s for RL. The features ψi ,
their extraction complexity, and their typical range on our benchmarks are shown

Table 4.2: Features for describing RL state s (m ∈ {1, 2}, 0 ≤ j ≤ 8, 0 ≤ h ≤ 3).

Feature ψi | Extraction complexity | Typical range | ni | Buckets for feature ψi
|B| | O(1) | 1–10 | 10 | {[j + 1, j + 1]} ∪ {[10, ∞)}
min(|Xk | : Xk ∈ B) | O(|B|) | 1–100 | 10 | {[10 · j + 1, 10 · (j + 1)]} ∪ {[91, ∞)}
max(|Xk | : Xk ∈ B) | O(|B|) | 1–100 | 10 | {[10 · j + 1, 10 · (j + 1)]} ∪ {[91, ∞)}
avg(|Xk | : Xk ∈ B) | O(|B|) | 1–100 | 10 | {[10 · j + 1, 10 · (j + 1)]} ∪ {[91, ∞)}
min(| ∪m GPm (Xk ) | : Xk ∈ B) | O(|B|) | 1–1000 | 10 | {[100 · j + 1, 100 · (j + 1)]} ∪ {[901, ∞)}
max(| ∪m GPm (Xk ) | : Xk ∈ B) | O(|B|) | 1–1000 | 10 | {[100 · j + 1, 100 · (j + 1)]} ∪ {[901, ∞)}
avg(| ∪m GPm (Xk ) | : Xk ∈ B) | O(|B|) | 1–1000 | 10 | {[100 · j + 1, 100 · (j + 1)]} ∪ {[901, ∞)}
|{xi ∈ X : xi ∈ [lm , um ] in Pm }| | O(ng) | 1–25 | 5 | {[5 · h + 1, 5 · (h + 1)]} ∪ {[21, ∞)}
|{xi ∈ X : xi ∈ [lm , ∞) in Pm }| + |{xi ∈ X : xi ∈ (−∞, um ] in Pm }| | O(ng) | 1–25 | 5 | {[5 · h + 1, 5 · (h + 1)]} ∪ {[21, ∞)}

in Table 4.2. The first seven features capture the asymptotic complexity of the join,
as in Table 2.2, on the input polyhedra P1 , P2 . These are the number of blocks,
the distribution (using maximum, minimum and average) of their sizes, and the
distribution (using maximum, minimum and average) of the number of generators
in the different factors. The precision of the inputs is captured by considering the
number of variables xi ∈ X with finite upper and lower bound, and the number of
those with only a finite upper or lower bound in both P1 and P2 .
As shown in Table 4.2, each state feature ψi returns a natural number; however,
its range can be rather large, resulting in a massive state space. To ensure scalability
and generalization of learning, we use bucketing to reduce the state space size by
clustering states with similar precision and expected join cost. The number ni of
buckets for each ψi and their definition are shown in the last two columns of
Table 4.2. Using bucketing, the RL state s is then a 9-tuple consisting of the indices
of buckets where each index indicates the bucket that ψi ’s return value falls into.
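
A simplified sketch of this feature extraction and bucketing is shown below. The inputs (per-block sizes, per-block generator counts, and the two bounded-variable counts) stand in for quantities that the actual implementation reads directly from the decomposed polyhedra.

import math

# Bucket widths and counts per feature, following Table 4.2.
BUCKETS = [(1, 10), (10, 10), (10, 10), (10, 10),
           (100, 10), (100, 10), (100, 10), (5, 5), (5, 5)]

def bucket(value, width, n_buckets):
    # Buckets cover [1, width], [width+1, 2*width], ...; the last one is open.
    if value < 1:
        return 0
    return min(math.ceil(value / width) - 1, n_buckets - 1)

def rl_state(block_sizes, generator_counts, n_bounded, n_half_bounded):
    # block_sizes / generator_counts: per-block values for the blocks in B;
    # the last two arguments count (half-)bounded variables in the inputs.
    raw = [len(block_sizes),
           min(block_sizes), max(block_sizes),
           sum(block_sizes) / len(block_sizes),
           min(generator_counts), max(generator_counts),
           sum(generator_counts) / len(generator_counts),
           n_bounded, n_half_bounded]
    return tuple(bucket(v, w, n) for v, (w, n) in zip(raw, BUCKETS))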

actions. An action a is a 3-tuple (th, r_algo, m_algo) consisting of:

• th ∈ {1, 2, 3, 4} depending on threshold ∈ [5, 9], [10, 14], [15, 19], or [20, ∞).

• r_algo ∈ {1, 2, 3}: the choice of a constraint removal, i.e., splitting method.

• m_algo ∈ {1, 2, 3}: the choice of merge algorithm.

All three of these have been discussed in detail in Section 4.2. The threshold values
were chosen based on performance characterization on our benchmarks. With the
above, we have 36 possible actions per state.

reward. After applying the (approximated) join transformer according to ac-
tion at in state st , we compute the precision of the output polyhedron P1 ⊔ P2 by
first computing the smallest (often unbounded) box¹ covering P1 ⊔ P2 which has
complexity O(ng). We then compute the following quantities from this box:

• ns : number of variables xi with finite singleton interval, i.e., xi ∈ [l, u], l = u.

• nb : number of variables xi with finite upper and lower bounds, i.e., xi ∈
[l, u], l ≠ u.

• nhb : number of variables xi with either finite upper or finite lower bounds,
i.e., xi ∈ (−∞, u] or xi ∈ [l, ∞).

Further, we measure the runtime in CPU cycles cyc for the approximate join
transformer. The reward is then defined by

r(st , at , st+1 ) = 3 · ns + 2 · nb + nhb − log10 (cyc). (4.3)

As the order of precision for different types of intervals is: singleton > bounded
> half bounded interval, the reward function in (4.3) weighs their numbers by
3, 2, 1. The reward function in (4.3) favors both high performance and precision. It
also ensures that the precision part (3 · ns + 2 · nb + nhb ) has a similar magnitude
range as the performance part log10 (cyc) (see footnote 2).
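
A direct transcription of (4.3) in Python, assuming the bounding box is given as a mapping from variables to (lower, upper) pairs with None encoding an infinite bound:

import math

def reward(box, cycles):
    # `box` maps each variable to (lo, hi); None encodes an infinite bound.
    # `cycles` is the runtime of the approximate join in CPU cycles.
    ns = sum(1 for lo, hi in box.values()
             if lo is not None and lo == hi)                     # singletons
    nb = sum(1 for lo, hi in box.values()
             if lo is not None and hi is not None and lo != hi)  # bounded
    nhb = sum(1 for lo, hi in box.values()
              if (lo is None) != (hi is None))                   # half bounded
    return 3 * ns + 2 * nb + nhb - math.log10(cycles)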

q-function. As mentioned before, we approximate the Q-function by a linear
function (4.1). We define binary feature functions φijk for each (state, action) pair.
φijk (s, a) = 1 if the tuple s(i) lies in j-th bucket and action a = ak

φijk (s, a) = 1 ⇐⇒ s(i) = j and a = ak (4.4)

The Q-function is a linear combination of the state-action features φijk :

Q(s, a) = ∑i=1..9 ∑j=1..ni ∑k=1..36 θijk · φijk (s, a). (4.5)
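
Because the features (4.4) are indicators, the triple sum in (4.5) collapses to one weight lookup per state component. A compact Python sketch, storing θ as one table per feature (the action indexing is our own convention, not part of the implementation):

import numpy as np

N_BUCKETS = [10, 10, 10, 10, 10, 10, 10, 5, 5]   # n_i from Table 4.2
N_ACTIONS = 36

# theta[i][j, k] is the weight of the indicator feature phi_ijk.
theta = [np.zeros((n, N_ACTIONS)) for n in N_BUCKETS]

def q_value(state, action):
    # With binary features (4.4), (4.5) reduces to one lookup per component.
    return sum(theta[i][state[i], action] for i in range(len(N_BUCKETS)))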

q-learning. During the training phase, we are given a dataset of programs
D and we use Q-LEARN from Algorithm 4.1 on each program in D to perform Q-
learning. Q-learning is performed with input parameters instantiated as explained
above and summarized in Table 4.3. Each episode consists of a run of Polyhedra
analysis on a benchmark in D. We run the analysis multiple times on each program
in D and update the Q-function after each join by calling Q-LEARN.
The Q-function is typically learned using an ε-greedy policy [195] where the
agent takes greedy actions by exploiting the current Q-estimates with 1 − ε proba-
bility while also exploring randomly with ε probability. The policy requires initial
1 A natural measure of precision is the volume of P1 t P2 . However, calculating it is very expensive
and P1 t P2 is often unbounded.
2 The log is used since the runtime of join in cycles is exponential.

Table 4.3: Instantiation of Q-learning to Polyhedra domain analysis.


RL concept Polyhedra Analysis Instantiation
Agent Polyhedra analysis
State s ∈ S As described in Table 4.2
Action a ∈ A Tuple (th, r_algo, m_algo)
Reward function r Shown in (4.3)
Feature φ Defined in (4.4)
Q-function Q-function from (4.5)

random exploration to learn good Q-estimates that can be exploited later. The num-
ber of episodes required for obtaining such estimates is infeasible for the Polyhedra
analysis as an episode typically contains thousands of join calls. Therefore, we gen-
erate actions for Q-learning by exploiting the optimal policy for precision (which
always selects the precise join) and explore for performance by choosing a ran-
dom approximate join: both with a probability of 0.5. We note that we also tried
the exploitation probabilities of 0.7 and 0.9. However, the resulting policies had a
suboptimal performance during testing due to limited exploration.
Formally, the action at := p(st ) selected in state st during learning is given by
at = (th, r_algo, m_algo) where

th = rand() % 4 + 1 with probability 0.5,
th = min(4, (∑k=1..|B| |Xk |)/5 + 1) with probability 0.5, (4.6)
r_algo = rand() % 3 + 1, m_algo = rand() % 3 + 1.

obtaining the learned policy. After learning over the dataset D, the
learned approximating join transformer in state st chooses an action according
to (4.2) by selecting the maximal value over all actions. The value of th = 1, 2, 3, 4 is
decoded as threshold = 5, 10, 15, 20 respectively.
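
Putting the pieces together, evaluating the learned policy at a join can be sketched in Python as follows; the enumeration order of the 36 action tuples and the q_value helper are assumptions of this illustration rather than the actual implementation.

from itertools import product

THRESHOLDS = {1: 5, 2: 10, 3: 15, 4: 20}
# The 36 actions as (th, r_algo, m_algo) tuples; the ordering is arbitrary.
ACTIONS = list(product(range(1, 5), range(1, 4), range(1, 4)))

def learned_join_choice(state, q_value):
    # Pick the approximation for the next join: argmax of the learned
    # Q-function over all 36 actions, with th decoded to a threshold.
    best = max(range(len(ACTIONS)), key=lambda k: q_value(state, k))
    th, r_algo, m_algo = ACTIONS[best]
    return THRESHOLDS[th], r_algo, m_algo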

4.4 experimental evaluation

We implemented our approach in the form of a C-library for Polyhedra analysis,
called Poly-RL. We compare the performance and precision of Poly-RL against the
state-of-the-art ELINA [1] (Chapters 2 and 3), which uses online decomposition
for Polyhedra analysis without losing precision. In addition, we implemented two
Polyhedra analysis approximations (baselines) based on the following heuristics:

• Poly-Fixed: uses a fixed strategy based on the results of Q-learning. Namely,
we selected the threshold, split and merge algorithm most frequently chosen
by our (adaptive) learned policy during testing.

• Poly-Init: uses a random approximate join with probability 0.5 based on (4.6).

All Polyhedra implementations use 64-bit integers to encode rational numbers.


In the case of overflow, the corresponding polyhedron is set to top as in Section 2.5
and Section 3.5.

experimental setup All our experiments including learning the parameters θ
for the Q-function and the evaluation of the learned policy on unseen benchmarks
were carried out on a 2.13 GHz Intel Xeon E7-4830 Haswell CPU with 24 MB L3
cache and 256 GB memory. All Polyhedra implementations were compiled with
gcc 5.4.0 using the flags -O3 -m64 -march=native.

analyzer For both learning and testing, we used a newer version of the crab-
llvm analyzer that is different from the versions used in Section 2.5 and Section 3.5.

benchmarks We chose benchmarks from the Linux Device Drivers (LD) cate-
gory of the popular software verification competition [24]. Some of the benchmarks
we used for both learning and testing were also used in Section 2.5 and Section 3.5
while others are different.

training dataset We chose 70 large benchmarks for Q-learning. We ran each
benchmark a thousand times over a period of three days to generate sample traces
of Polyhedra analysis containing thousands of calls to the join transformer. Since
the crab-llvm analyzer is intra-procedural, we get an analysis trace for each function
leading to several traces per analyzed benchmark. We set a timeout of 5 minutes
per run and discarded incomplete traces in case of a timeout. In total, we performed
Q-learning over 110811 traces.

evaluation method For evaluating the effectiveness of our learned policy,
we then chose benchmarks based on the following criteria:

• No overfitting: the benchmark was not used for learning the policy.

• Challenging: ELINA takes > 5 seconds on the benchmark.

• Fair: there is no integer overflow in the expensive functions in the benchmark.
Because in the case of an overflow, the polyhedron is set to top resulting in a
trivial fixpoint at no cost and thus in a speedup that is due to overflow. This
fairness criterion is the same as the one used in Section 2.5 and Section 3.5.

Based on these criteria, we found 11 benchmarks on which we present our results.


We used a timeout of 1 hour and a memory limit of 100 GB for our experiments.

Table 4.4: Timings (seconds) and precision of approximations (%) w.r.t. ELINA.
Benchmark | #Program points | ELINA time | Poly-RL time | Poly-RL precision | Poly-Fixed time | Poly-Fixed precision | Poly-Init time | Poly-Init precision
wireless_airo | 2372 | 877 | 6.6 | 100 | 6.7 | 100 | 5.2 | 74
net_ppp | 680 | 2220 | 9.1 | 87 | TO | 34 | 7.7 | 55
mfd_sm501 | 369 | 1596 | 3.1 | 97 | 1421 | 97 | 2 | 64
ideapad_laptop | 461 | 172 | 2.9 | 100 | 157 | 100 | MO | 41
pata_legacy | 262 | 41 | 2.8 | 41 | 2.5 | 41 | MO | 27
usb_ohci | 1520 | 22 | 2.9 | 100 | 34 | 100 | MO | 50
usb_gadget | 1843 | 66 | 37 | 60 | 35 | 60 | TO | 40
wireless_b43 | 3226 | 19 | 13 | 66 | TO | 28 | 83 | 34
lustre_llite | 211 | 5.7 | 4.9 | 98 | 5.4 | 98 | 6.1 | 54
usb_cx231xx | 4752 | 7.3 | 3.9 | ≈100 | 3.7 | ≈100 | 3.9 | 94
netfilter_ipvs | 5238 | 20 | 17 | ≈100 | 9.8 | ≈100 | 11 | 94

inspecting the learned policy Our learned policy chooses in the majority
of cases threshold=20, the binary weighted constraint removal algorithm for split-
ting, and the merge smallest first algorithm for merging. Poly-Fixed always uses
these values for defining an approximate transformer, i.e., it follows a fixed strategy.
Our experimental results show that following this fixed strategy results in subop-
timal performance compared to our learned policy that makes adaptive, context-
sensitive decisions to improve performance.

results We measure the precision as a fraction of program points at which
the Polyhedra invariants generated by approximate analysis are semantically the
same or stronger than the ones generated by ELINA. This is a less biased and more
challenging measure than the number of discharged assertions [42, 156, 157] where
one can write weak assertions that even a weaker domain can prove.
Table 4.4 shows the number of program points,3 timings (in seconds), and the
precision (in %) of Poly-RL, Poly-Fixed, and Poly-Init w.r.t. ELINA on all 11 bench-
marks. In the table, the entry TO (MO) means that the analysis did not finish within 1
hour (exceeded the memory limit). For an analysis that did not finish, we compute
the precision by comparing program points for which the incomplete analysis can
produce invariants.

poly-rl vs elina In Table 4.4, Poly-RL obtains > 7x speed-up over ELINA
on 6 of the 11 benchmarks with a maximum of 515x speedup for the mfd_sm501
benchmark. It also obtains the same or stronger invariants on > 87% of program
points on 8 benchmarks. Note that Poly-RL obtains both large speedups and the
same invariants at all program points on 3 benchmarks.

3 The benchmarks contain up to 50K LOC but SeaHorn encodes each basic block as one program
point; thus, the number of points in Table 4.4 is significantly reduced.

Many of the constraints produced by the precise join transformer from ELINA
are removed by the subsequent transformers in the analysis which allows Poly-
RL to obtain the same invariants as ELINA despite the loss of precision dur-
ing join in most cases. Due to non-monotonic join transformers, Poly-RL can
produce fixpoints non-comparable to those produced by ELINA. Because of the
non-comparability, the quality of the obtained invariants cannot be established
using our precision metric. We take a conservative approach and mark all non-
comparable invariants as being imprecise. We note that this is the case for the 3
benchmarks in Table 4.4 where Poly-RL obtains low precision.
We also tested Poly-RL on 17 benchmarks from the product lines category.
ELINA did not finish within an hour on any of these benchmarks whereas Poly-RL
finished within 1 second. Poly-RL had 100% precision on the subset of program
points at which ELINA produces invariants. With Poly-RL, SeaHorn successfully
discharged the assertions. We did not include these results in Table 4.4 as the pre-
cision w.r.t. ELINA cannot be completely compared.

poly-rl vs poly-fixed Poly-Fixed is never significantly more precise than
Poly-RL in Table 4.4. Poly-Fixed is faster than Poly-RL on 4 benchmarks; how-
ever, the speedups are small. These results validate our hypothesis that a fixed
policy yields suboptimal performance and precision. We note that Poly-Fixed is
slower than ELINA on 3 benchmarks and times out on 2 of these. This is due to the
overhead of the binary weighted constraint removal algorithm and the approximate
analysis converging slower than ELINA.

poly-rl vs poly-init From (4.6), Poly-Init takes random actions and thus the
quality of its result varies depending on the run. Table 4.4 shows the results on a
sample run. Poly-RL is more precise than Poly-Init on all benchmarks in Table 4.4.
Poly-Init also does not finish on 4 benchmarks.

4.5 related work

As our work touches on several topics, we next survey some of the work that is
most closely related to ours.

learning in program analysis Our work in this chapter can be seen as
part of the general research direction of parametric program analysis [7, 42, 96, 99,
100, 105, 131, 156, 157], a high level approach for building program analyzers that
tune the precision and cost of the analysis by adapting to the analyzed program.
In this setting, the analysis has parameters that control its precision and cost.
The authors in [157] propose parameter learning as a blackbox optimization
problem, and use Bayesian optimization for finding the tuned parameters. The
work of [42, 99, 156] computes a fixed static partition of the set of variables X for
the Octagon domain analysis on a given program. The abstract transformers are
then applied individually on the relevant factors. The work of [131] provides algo-
rithms to learn minimal values of the tuning parameters for points-to analysis. The
work of [105] proposes a data-driven approach that automatically learns a good
heuristic rule for choosing important context elements for points-to analysis. The
difference between all of these works and ours is that the above approaches fix
the learning parameters for a given program. We believe that better tuning of cost
and precision can be achieved by changing the learning parameters dynamically
based on the abstract states encountered by the analyzer during the analysis. Fur-
thermore, these approaches measure precision by the number of queries proved
whereas we target the stronger notion of fixed point equivalence.
The work of [100] uses reinforcement learning to select a subset of variables to
be tracked in a flow sensitive manner with the weaker Interval domain so that the
memory footprint of the resulting analysis fits within a pre-determined memory
budget and the loss of precision is minimized. The work of [47] uses reinforcement
learning to guide relational proof search for verifying relational properties defined
over multiple programs. The authors in [7] use Bayesian optimization to learn a
verification policy that guides numerical domain analysis during proof search.
In our recent work [96], we presented a new structured learning method based
on graph neural networks for speeding up numerical domains. The method is
more generic than the one presented in this chapter as the approximate transform-
ers there are not derived via custom splitting and merging algorithms but simply
obtained via constraint removal. Further, the features are more precise than ours
and capture structural dependencies between constraints. The results show that the
new method outperforms Poly-RL and also outperforms [191] when instantiated
for the Octagon domain.
In recent work [119], the authors observe that many programs share com-
mon pieces of code. Thus analysis results can be reused across programs. To
achieve this, the authors use cross program training. This is a complementary ap-
proach to the one taken in this work and we believe that in the future the two
methods can be combined to further improve the performance of static analysis.
The work of [25] automatically learns abstract transformers from examples. This
is a rather different approach, instead, we build approximations on top of standard
transformers based on online decomposition. We believe that the approach of [25]
can be combined with ours in the future to automate the process of generating
approximate abstract transformers.

online decomposition The work of [189, 190, 191] improves the performance
of numerical domain analysis based on online decomposition without losing
precision. We compare against [191] in this chapter. As our experimental results
suggest, the performance of Polyhedra analysis can be significantly improved with
our approach. Further, some benchmarks are inherently dense, i.e., fully precise
online decomposition cannot decompose the set of variables X efficiently, and in
such cases our approach can be used to generate precise invariants efficiently.

abstraction refinement In general and as demonstrated by our experi-
ments, our method can sometimes be imprecise when our approximate join trans-
formers lose precision due to a low threshold value. We believe that in the fu-
ture, techniques similar to classic counterexample-guided abstraction refinement
(CEGAR) [52] and lazy abstraction [98] can be combined with our approach, i.e.,
concretely, to increase the precision of our approximate abstract transformers by
finding the ones that lead to the precision loss and (optionally) trying out larger
values of the threshold (i.e, a more precise transformer).

numerical solvers Machine learning methods have been extensively applied
for optimizing different solvers. Reinforcement learning based on linear function
approximation of the Q-function has been applied to learn branching rules for SAT
solvers in the work of [121]. The learned policies achieve performance levels equal
to those of the best branching rules. The work of [14, 115] learns branching rules via
empirical risk minimization for solving mixed integer linear programming prob-
lems. The work of [116] learns to solve combinatorial optimization problems over
graphs via reinforcement learning. FastSMT [16] learns a policy to apply appropri-
ate tactics to speed up numerical SMT solving. The work of [126] uses reinforce-
ment learning combined with graph neural networks for improving the efficiency
of solvers for quantified boolean formulas. The work of [135] learns branching rules
for mixed integer linear programming using graph neural networks to improve the
scalability of complete verifiers of neural networks.

inferring invariants Recent research has investigated the problem of in-
ferring numerical program invariants with machine learning. The works of
[77, 181, 222] use machine learning for inferring inductive loop invariants for pro-
gram verification. The learning algorithms in these works require specifications in
the form of pre/post conditions. Our work speeds up numerical abstract interpre-
tation which can be used to infer program invariants without pre/post conditions.

testing In recent years, there has been emerging interest in learning to produce
test inputs for finding program bugs or vulnerabilities. AFLFast [28] models pro-
gram branching behavior with Markov Chain, which guides the input generation.
Several works train neural networks to generate new test inputs, where the training
set can be obtained from an existing test corpus [61, 84], inputs generated earlier
in the testing process [178], or inputs generated by symbolic execution [95].

4.6 discussion

In this chapter, we further improved the performance of numerical program anal-
ysis based on online decomposition presented in Chapters 2 and 3. Our main con-
tribution here is to offer and demonstrate a new approach based on reinforcement
learning (RL) towards achieving scalable and precise analysis. The basic idea is
simple: using RL to compute an adaptive, context-sensitive policy on how to lose
as little precision as possible during analysis and to speed up the analyzer as much
as possible. To do so, we first showed that program analysis naturally maps to the
basic scenario and concepts needed to perform RL with Q-learning. To make the
approach work, three key ingredients are needed: a suitably designed set of fea-
tures that capture state/action (abstract element/transformer) pairs and that are
efficient to compute, a set of transformers with different precision/runtime trade-
offs, and a reward function that assesses the quality of choosing a transformer at
some state during analysis.
To evaluate these general ideas, we instantiated the approach for the notoriously
expensive Polyhedra abstract domain, which has worst-case exponential complex-
ity. To define transformers with different precision/performance trade-offs, we
built on recent successes with online decomposition. Namely, we defined a novel
set of flexible approximate transformers that enforce a finer decomposition, which
directly translates to reduced asymptotic complexity and thus faster execution. As
a feature set, we considered various readily available statistics on the block sizes
in the decomposition and the bounds within the polyhedra. Our reward function
composed both the precision of a polyhedron and the runtime of the transformer.
Based on this concrete instantiation, we then learned the appropriate parame-
ters of the Q-function from a training dataset of programs and obtained a learned
policy function, which, during analysis, makes context-sensitive choices on which
approximate transformer to employ.
We provided a complete implementation of our approach and evaluated it on
a range of realistic programs, including Linux device drivers, that are expensive
to analyze. The results demonstrate that RL-based analysis can provide massive
speed-ups of sometimes orders of magnitude over a highly optimized Polyhedra
analysis library, while often maintaining precision at most program points.
Overall, we believe the correspondence between reinforcement learning and
static analysis (as well as its concrete instantiation) established in this work, is
instructive and can serve as a starting point for exploring and exploiting this con-
nection for other types of analyzers.
PART II
FAST AND PRECISE NEURAL
NETWORK CERTIFICATION

5
DEEPPOLY DOMAIN FOR CERTIFYING NEURAL NETWORKS

In Chapters 2-4, we focused on designing precise and scalable reasoning methods
for programs. Next we shift our focus to the problem domain of deep learning
models mentioned in Chapter 1. In the next two chapters, we describe our contri-
butions for designing state-of-the-art neural network certification methods.
In this chapter, we target incomplete certification of neural networks using ab-
stract interpretation. Incompleteness implies that there can be cases where we in-
correctly fail to certify the network because the approximation is too coarse (as
in Fig. 1.1 (a)). We describe our new custom polyhedral domain DeepPoly for
neural network certification containing specialized parallelizable transformers for
handling non-linearities in neural networks. DeepPoly currently produces the most
scalable and precise neural network certification results. We also show how to com-
bine DeepPoly with a form of abstraction refinement based on trace partitioning.
This enables us to prove, for the first time, the robustness of the network when
the input image is subjected to complex perturbations such as rotations that em-
ploy linear interpolation. In Chapter 6, we will focus on combining our domain
with precise solvers. We will show that this increases the precision of incomplete
certification while also improving the scalability of complete certification. Before
proceeding, we note that parts of this and the next chapter include contributions
from Timon Gehr and Rupanshu Ganvir.
Over the last few years, deep neural networks have become increasingly popu-
lar and have now started penetrating safety-critical domains such as autonomous
driving [30] and medical diagnosis [6] where they are often relied upon for making
important decisions. As a result of this widespread adoption, it has become even
more important to ensure that neural networks behave reliably and as expected.
Unfortunately, reasoning about these systems is challenging due to their “black
box” nature: it is difficult to understand what the network does since it is typically
parameterized with thousands or millions of real-valued weights that are hard to
interpret. Further, it has been discovered that neural nets can sometimes be sur-
prisingly brittle and exhibit non-robust behaviors, for instance, by classifying two
very similar inputs (e.g., images that differ only in brightness or in one pixel) to
different labels [86].
To address the challenge of reasoning about neural networks, recent research
has started exploring new methods and systems which can automatically prove that

a given network satisfies a specific property of interest (e.g., robustness to certain


perturbations, pre/post conditions). State-of-the-art works include methods based
on SMT solving [37, 69, 113, 114], mixed integer linear programming [8, 32, 36,
49, 66, 135, 197], Lipschitz optimization [170], duality [67, 212], convex relaxations
[68, 78, 163, 175, 186, 199, 211], and combination of relaxations with solvers [206].
Despite the progress made by these works, more research is needed to reach
the point where we can solve the overall neural network reasoning challenge suc-
cessfully. In particular, we still lack an analyzer that can scale to large networks,
can handle popular neural architectures (e.g., fully-connected, convolutional), and
yet is sufficiently precise to prove relevant properties required by applications. For
example, the works based on SMT solving and mixed-integer linear programming
are precise yet can only handle very small networks. To mitigate the scalability is-
sues, the work of [78] introduced the first neural network certifier based on abstract
interpretation enabling the analysis of larger networks than solver-based methods.
However, it relies on existing generic abstract domains that either do not scale to
larger neural networks (such as Convex Polyhedra [57]) or are too imprecise (e.g.,
Zonotope [81]). We note that online decomposition described in Chapters 2 and 3
for improving the scalability of the Polyhedra domain does not work in the neu-
ral network setting as the transformations in neural networks create constraints
between all neurons. Recent work by [211] scales better than [78] but only han-
dles fully-connected networks and cannot handle the widely used convolutional
networks. Both [113] and [211] are, in fact, unsound for floating-point arithmetic,
which is heavily used in neural nets, and thus they can suffer from false negatives.
Recent work by [186] handles fully-connected and convolutional networks and is
sound for floating-point arithmetic; however, as we demonstrate experimentally, it
can lose significant precision when dealing with larger perturbations.

this work In this work, we propose a new polyhedral domain, called Deep-
Poly, that makes a step forward in addressing the challenge of certifying neural
networks with respect to both scalability and precision. The key technical idea be-
hind our work is a novel abstract interpreter specifically tailored to the setting of
neural networks. Concretely, our abstract domain is a combination of floating-point
polyhedra with intervals, coupled with custom abstract transformers for common
neural network functions such as affine transforms, the rectified linear unit (ReLU),
sigmoid and tanh activations, and the maxpool operator. These abstract transform-
ers are carefully designed to exploit key properties of these functions and balance
analysis scalability and precision. As a result, DeepPoly is more precise than [211],
[78] and [186], yet can handle large convolutional networks and is also sound for
floating-point arithmetic.

proving robustness: illustrative examples To provide an intuition for


the kinds of problems that DeepPoly can solve, consider the images shown in

Figure 5.1: Two different attacks applied to MNIST images (rows: L∞ perturbation and rotation; columns: Attack, Original, Lower, Upper).

Fig. 5.1. Here, we will illustrate two kinds of robustness properties: L∞ -norm based
perturbations (first row) and image rotations (second row).
In the first row, we are given an image of the digit 7 (under “Original”). Then,
we consider an attack where we allow a small perturbation to every pixel in the
original image (visually this may correspond to darkening or lightening the im-
age). That is, instead of a number, each pixel now contains an interval. If each of
these intervals has the same size, we say that we have formed an L∞ ball around
the image (typically with a given epsilon ε ∈ R). This ball is captured visually by
the Lower image (in which each pixel contains the smallest value allowed by its
interval) and the Upper image (in which each pixel contains the largest value al-
lowed by its interval). We call the modification of the original image to a perturbed
version inside this ball an attack, reflecting an adversary who aims to trick the net-
work. There have been various works which aim to find such an attack, otherwise
called an adversarial example (e.g., [40]), typically using gradient-based methods.
For our setting however, the question is: are all possible images “sitting between”
the Lower and the Upper image classified to the same label as the original? Or, in
other words, is the neural net robust to this kind of attack?
The set of possible images induced by the attack is also called an adversarial region.
Note that enumerating all possible images in this region and simply running the
network on each to check if it is classified correctly, is practically infeasible. For
example, an image from the standard MNIST [124] dataset contains 784 pixels
and a perturbation that allows for even two values for every pixel will lead to 2^784
images that one would need to consider. In contrast, our domain DeepPoly can
automatically prove that all images in the adversarial region classify correctly (that
is, no attack is possible) by soundly propagating the entire input adversarial region
through the abstract transformers of the network.
We also consider a more complex type of perturbation in the second row. Here,
we rotate the image by an angle and our goal is to show that any rotation up to
this angle classifies to the same label. In fact, we consider an even more challenging
problem where we not only rotate an image but first form an adversarial region
around the image and then reason about all possible rotations of any image in that
region. This is challenging, as again, the enumeration of images is infeasible when
using geometric transformations that perform linear interpolation (which is needed
to improve output image quality). Further, unlike the L∞ ball above, the entire set
of possible images represented by a rotation up to a given angle does not have a

closed-form and needs to somehow be captured. Directly approximating this set is


too imprecise and the analysis fails to prove the wanted property. Thus, we intro-
duce a method where we refine the initial approximation into smaller regions that
correspond to smaller angles (a form of trace partitioning [169]), use DeepPoly to
prove the property on each smaller region, and then deduce the property holds
for the initial, larger approximation. To our best knowledge, this is the first work
that shows how to prove the robustness of a neural network under complex input
perturbations such as rotations.
The work in this chapter was published in [188].

main contributions Our main contributions are:

• A new abstract domain for the certification of neural nets. The domain com-
bines floating-point polyhedra and intervals with custom abstract transform-
ers for affine transforms, ReLU, sigmoid, tanh, and maxpool functions. These
abstract transformers carefully balance the scalability and precision of the
analysis (Section 5.3).

• An approach for proving more complex perturbation specifications than con-


sidered so far, including rotations using linear interpolation, based on the
refinement of the abstract input. To our best knowledge, this is the first time
such perturbations have been certified (Section 5.4).

• A complete, parallelized implementation of our approach in a system called


ERAN fully available at https://github.com/eth-sri/eran. ERAN can handle
both fully-connected and convolutional neural networks (Section 5.5).

• An extensive evaluation on a range of datasets and networks including de-


fended ones, showing DeepPoly is more precise than prior work yet scales to
large networks (Section 5.5).

We note that in [152], we introduced new abstract transformers for handling


residual layers with DeepPoly integrated into ERAN. Overall, we believe that Deep-
Poly is a promising step towards addressing the challenge of reasoning about neu-
ral networks and a useful building block for proving complex specifications (e.g.,
rotations) and other applications of analysis. As an example, because our abstract
transformers for the output of a neuron are “point-wise” (i.e., can be computed in
parallel), they can be implemented on GPUs. In [152], we designed efficient GPU
algorithms for DeepPoly, and the resulting implementation which we call GPUPoly
enables the analysis of a 34-layer deep neural network containing up to 1M neu-
rons within a minute. In the future, we believe our transformers can also be directly
plugged into the latest systems training robust neural networks using abstract in-
terpretation on GPUs [144, 148]. As our transformers are substantially more precise
in practice than those from [144, 148], we expect they can help improve the overall
robustness of the trained network.

Figure 5.2: Example fully-connected neural network with ReLU activations (inputs i1, i2 ∈ [−1, 1]; the edge labels are the learned weights and the numbers above/below the neurons are the learned biases).

5.1 overview

In this section, we provide an overview of our abstract domain on a small illustrative
example. Full formal details are provided in later sections.

5.1.1 Running example on a fully-connected network with ReLU activation

We consider the simple fully-connected neural network with ReLU activations


shown in Fig. 5.2. This network has already been trained and we have the learned
weights shown in the figure. The network consists of four layers: an input layer,
two hidden layers, and an output layer with two neurons each. The weights on
the edges represent the learned coefficients of the weight matrix used by the affine
transformations done at each layer. Note that these values are usually detailed float-
ing point numbers (e.g., 0.03), however, here we use whole numbers to simplify the
presentation. The learned bias for each neuron is shown above or below it. All of
the biases in one layer constitute the translation vector of the affine transformation.
To compute its output, each neuron in the hidden layer applies an affine trans-
formation based on the weight matrix and bias to its inputs (these inputs are the
outputs of the neurons in the previous layer), producing a value v. Then, the neu-
ron applies an activation function to v, in our example ReLU, which outputs v, if
v > 0, and 0 otherwise. Thus, the input to every neuron goes through two stages:
first, an affine transformation, followed by an activation function application. In
the last layer, a final affine transform is applied to yield the output of the entire
network, typically a class label that describes how the input is classified.

specification Suppose we work with a hypothetical image that contains only


two pixels and the perturbation is such that it places both pixels in the range [−1, 1]
(pixels are usually in the range [0, 1], however, we use [−1, 1] to better illustrate our
analysis). Our goal will be to prove that the output of the network at one of the
output neurons is always greater than the output at the other one, for any possible
input of two pixels in the range [−1, 1]. If the proof is successful, it implies that the
network produces the same classification label for all of these images.

Figure 5.3: The neural network from Fig. 5.2 transformed for analysis with the DeepPoly abstract domain (each node xi is annotated with its lower and upper relational constraints and its concrete bounds li and ui).

abstract domain To perform the analysis, we introduce an abstract domain


with the appropriate abstract transformers that propagate the (abstract) input of
the network through the layers, computing an over-approximation of the possible
values at each neuron. Concretely, for our example, we need to propagate both
intervals [−1, 1] (one for each pixel) simultaneously. We now briefly discuss our
abstract domain, which aims to balance analysis scalability and precision. Then
we illustrate its effect on our example network and discuss why we made certain
choices in the approximation over others.
To perform the analysis, we first rewrite the network by expanding each neu-
ron into two nodes: one for the associated affine transform and one for the ReLU
activation. Transforming the network of Fig. 5.2 in this manner produces the net-
work shown in Fig. 5.3. Because we assign a variable to each node, the network
of Fig. 5.3 consists of n = 12 variables. Our abstract domain, formally described
in Section 5.3, associates two constraints with each variable xi : an upper polyhe-
dral constraint and a lower polyhedral constraint. Additionally, the domain tracks
auxiliary (concrete) bounds, one upper bound and one lower bound for each vari-
able, describing a bounding box of the concretization of the abstract element. Our
domain is less expressive than the Polyhedra domain [57] because it bounds the
number of conjuncts that can appear in the overall formula to 2 · n where n is
the number of variables of the network. Such careful restrictions are necessary
because supporting the full expressive power of convex polyhedra leads to an ex-
ponential number of constraints that make the analysis for thousands of neurons
practically infeasible. We now discuss the two types of constraints and the two
types of bounds, and show how they are computed on our example.

First, the lower (a≤_i) and upper (a≥_i) relational polyhedral constraints associated
with xi have the form v + Σ_j wj · xj where v ∈ R ∪ {−∞, +∞}, w ∈ Rn, and wj = 0 for all j ≥ i.
That is, a polyhedral constraint for xi can consider and refer to variables “before”
xi in the network, but cannot refer to variables “after” xi (because their coeffi-
cient is set to 0). Second, for the concrete lower and upper bounds of xi , we use
li , ui ∈ R ∪ {−∞, +∞}, respectively. All abstract elements a in our domain satisfy
the additional invariant that the interval [li , ui ] overapproximates the set of values
that the variable xi can take (we formalize this requirement in Section 5.3).
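To make this representation concrete, the following minimal Python sketch shows one possible way to store such an abstract element; the class and field names are purely illustrative and do not correspond to the actual ERAN data structures.

from dataclasses import dataclass, field
from typing import Dict, List

@dataclass
class VarConstraints:
    # Symbolic lower/upper bound: a constant plus coefficients over
    # variables with strictly smaller index (x_j with j < i).
    lower_const: float
    lower_coeffs: Dict[int, float]   # j -> w_j in the lower constraint
    upper_const: float
    upper_coeffs: Dict[int, float]   # j -> w_j in the upper constraint
    l: float                         # concrete lower bound l_i
    u: float                         # concrete upper bound u_i

@dataclass
class AbstractElement:
    # One entry per variable x_1, ..., x_n, in assignment order.
    vars: List[VarConstraints] = field(default_factory=list)

# Input variable x1 of the running example: -1 <= x1 <= 1.
x1 = VarConstraints(lower_const=-1.0, lower_coeffs={},
                    upper_const=1.0, upper_coeffs={}, l=-1.0, u=1.0)
a = AbstractElement(vars=[x1])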

abstract interpretation of the network We now illustrate the operation
of our abstract interpreter (using the DeepPoly domain above) on our example
network, abstract input ([−1, 1] for both pixels), and specification (which is to prove
that any image in the concretization of [−1, 1] × [−1, 1] classifies to the same label).
The analysis starts at the input layer, i.e., in our example from x1 and x2 , and
simply propagates the inputs, resulting in a≤_1 = a≤_2 = −1, a≥_1 = a≥_2 = 1, l1 =
l2 = −1, and u1 = u2 = 1. Next, the affine transform at the first layer updates the
constraints for x3 and x4 . The abstract transformer first adds the constraints:

x1 + x2 6 x3 6 x1 + x2
(5.1)
x1 − x2 6 x4 6 x1 − x2

The transformer uses these constraints and the constraints for x1 , x2 to compute
l3 = l4 = −2 and u3 = u4 = 2.
Next, the transformer for the ReLU activation is applied. In general, the out-
put xj of the ReLU activation on variable xi is equivalent to the assignment
xj := max(0, xi ). If ui 6 0, then our abstract transformer sets the state of the vari-
able xj to 0 6 xj 6 0, lj = uj = 0. In this case, our abstract transformer is exact. If
li > 0, then our abstract transformer adds xi 6 xj 6 xi , lj = li , uj = ui . Again, our
abstract transformer is exact in this case.
However, when li < 0 and ui > 0, the result cannot be captured exactly by
our abstraction and we need to decide how to lose information. Fig. 5.4 shows
several candidate convex approximations of the ReLU assignment in this case. The
approximation [69] of Fig. 5.4 (a) minimizes the area in the xi , xj plane, and would
add the following relational constraints and concrete bounds for xj :

xi 6 xj , 0 6 xj ,
xj 6 ui · (xi − li )/(ui − li ). (5.2)
lj = 0, uj = ui .

However, the approximation in (5.2) contains two lower polyhedra constraints for
xj , which we disallow in our abstract domain. The reason for this is the potential
blowup of the analysis cost as it proceeds. We will explain this effect in more detail
later in this section.

Figure 5.4: Convex approximations for the ReLU function: (a) shows the convex approximation [69] with the minimum area in the input-output plane, (b) and (c) show the two convex approximations used in DeepPoly. In the figure, λ = ui/(ui − li) and µ = −li · ui/(ui − li).

To avoid this explosion, we further approximate (5.2) by allowing only one lower
bound. There are two ways of accomplishing this, shown in Fig. 5.4 (b) and (c), both of
which can be expressed in our domain. During analysis, we always consider both
and choose the one with the smaller area dynamically.
The approximation from Fig. 5.4 (b) adds the following constraints and bounds
for xj :
0 6 xj 6 ui · (xi − li )/(ui − li ),
(5.3)
lj = 0, uj = ui .
The approximation from Fig. 5.4 (c) adds the following constraints and bounds:

xi 6 xj 6 ui · (xi − li )/(ui − li ),
(5.4)
lj = li , uj = ui .

Note that it would be incorrect to set lj = 0 in (5.4) above (instead of lj = li ). The


reason is that this would break a key domain invariant which we aim to maintain,
namely that the concretization of the two symbolic bounds for xj is contained
inside the concretization of the concrete bounds lj and uj (we discuss this domain
invariant later in Section 5.3). In particular, if we only consider the two symbolic
bounds for xj , then xj would be allowed to take on negative values and these
negative values would not be included in the region [0, ui ]. This domain invariant
is important to ensure efficiency of our transformers and as we prove later, all of
our abstract transformers maintain it.
Returning to our example, the area of the approximation in Fig. 5.4 (b) is 0.5 ·
ui · (ui − li ) whereas the area in Fig. 5.4 (c) is 0.5 · −li · (ui − li ). We choose the
tighter approximation; i.e., when ui 6 −li , we add the constraints and the bounds
from (5.3); otherwise, we add the constraints and the bounds from (5.4). We note

that the approximations in Fig. 5.4 (b) and (c) cannot be captured by the Zonotope
abstraction as used in [78, 191].
In our example, for both x3 and x4 , we have l3 = l4 = −2 and u3 = u4 = 2. The
areas are equal in this case; thus we choose (5.3) and get the following constraints
and bounds for x5 and x6 :
0 6 x5 6 0.5 · x3 + 1, l5 = 0, u5 = 2,
(5.5)
0 6 x6 6 0.5 · x4 + 1, l6 = 0, u6 = 2.
Next, we apply the abstract affine transformer, which first adds the following con-
straints for x7 and x8 :
x5 + x6 6 x7 6 x5 + x6 ,
(5.6)
x5 − x6 6 x8 6 x5 − x6 .
It is possible to compute bounds for x7 and x8 from the above equations by sub-
stituting the concrete bounds for x5 and x6 . However, the resulting bounds are in
general too imprecise. Instead, we can obtain better bounds by recursively substitut-
ing the polyhedral constraints until the bounds only depend on the input variables
for which we then use their concrete bounds. In our example we substitute the
relational constraints for x5 , x6 from equation (5.5) to obtain:
0 6 x7 6 0.5 · x3 + 0.5 · x4 + 2,
(5.7)
−0.5 · x4 − 1 6 x8 6 0.5 · x3 + 1.
Replacing x3 and x4 with the constraints in (5.1), we get:
0 6 x7 6 x1 + 2,
(5.8)
−0.5 · x1 + 0.5 · x2 − 1 6 x8 6 0.5 · x1 + 0.5 · x2 + 1.
Now we use the concrete bounds of ±1 for x1 , x2 to obtain l7 = 0, u7 = 3 and
l8 = −2, u8 = 2. Indeed, this is more precise than if we had directly substituted the
concrete bounds for x5 and x6 in (5.6) because that would have produced concrete
bounds l7 = 0, u7 = 4 (which are not as tight as the ones above).
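The gain from backsubstitution can be reproduced with plain interval arithmetic; the following short sketch (illustrative only) recomputes both bounds for x7 from the numbers above.

# Bound for x7 = x5 + x6 computed in two ways.
# (i) Direct substitution of the concrete bounds l5 = l6 = 0, u5 = u6 = 2:
direct = (0 + 0, 2 + 2)        # gives [0, 4]
# (ii) Backsubstitution to the inputs: 0 <= x7 <= x1 + 2 with x1 in [-1, 1]:
backsub = (0, 1 + 2)           # gives [0, 3]
assert direct == (0, 4) and backsub == (0, 3)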

avoiding exponential blowup of the analysis cost As seen above, to


avoid exponential cost, our analysis introduces exactly one polyhedral constraint
for the lower bound of a variable. It is instructive to understand the effect of intro-
ducing more than one constraint via the ReLU approximation of Fig. 5.4 (a). This
ReLU approximation introduces two lower relational constraints for both x5 and
x6 . Substituting them in (5.6) would have created four lower relational constraints
for x7 . More generally, if the affine expression for a variable xi contains p variables
with positive coefficients and n variables with negative coefficients, then the num-
ber of possible lower and upper relational constraints after substitution is 2^p and
2^n, respectively, leading to an exponential blowup. This is the reason why we keep
only one lower relational constraint for each variable in the network, which creates
exactly one lower and upper relational constraint after substitution, and use either
the ReLU transformer illustrated in Fig. 5.4 (b) or the one in Fig. 5.4 (c).

asymptotic runtime The computation of concrete bounds by the abstract


affine transformer in the hidden layers is the most expensive step of our analysis.
If there are L network layers and the maximum number of variables in a layer is
nmax, then this step for one variable is in O(nmax² · L). Storing the concrete bounds
ensures that the subsequent ReLU transformer has constant cost.
All our transformers work point-wise for the variables in a layer; i.e., they are
independent for different variables since they only read constraints and bounds
from the previous layers. This makes it possible to parallelize our analysis on both
CPUs and GPUs. The work of [144] defines pointwise Zonotope transformers for
training neural networks on GPUs to be more robust against adversarial attacks.
The more precise Zonotope transformers of [186] were used for training more ro-
bust networks than [144] in [148]. Our pointwise transformers are more precise in
practice than those from [144, 186]. We believe that our transformers can be used
to train even more robust neural networks in the future.

precision vs. performance trade-off We also note that our approach


allows one to easily vary the precision-performance knob of the affine transformer
in the hidden layers: (i) we can select a subset of variables for which to perform
complete substitution all the way back to the first layer (the example above showed
this for all variables), and (ii) we can decide at which layer we would like to stop
the substitution and select the concrete bounds at that layer.
Returning to our example, next, the ReLU transformers are applied again. Since
l7 = 0, the ReLU transformer is exact for the assignment to x9 and adds the re-
lational constraints x7 6 x9 6 x7 and the bounds l9 = 0, u9 = 3 for x9 . However,
the transformer is not exact for the assignment to x10 and the following constraints
and bounds for x10 are added:
0 6 x10 6 0.5 · x8 + 1,
(5.9)
l10 = 0, u10 = 2.
Finally, the analysis reaches the output layer and the abstract affine transformer
adds the following constraints for x11 and x12 :
x9 + x10 + 1 6 x11 6 x9 + x10 + 1
(5.10)
x10 6 x12 6 x10
Again, backsubstitution up to the input layer yields l11 = 1, u11 = 5.5 and l12 =
0, u12 = 2. This completes our analysis of the neural network.

checking the specification Next, we check our specification, namely


whether all concrete output values of one neuron are always greater than all con-
crete output values of the other neuron, i.e., if
∀i1 , i2 ∈ [−1, 1] × [−1, 1], x11 > x12 or
∀i1 , i2 ∈ [−1, 1] × [−1, 1], x12 > x11 ,

where x11 , x12 = Nfc (i1 , i2 ) are the concrete values for variables x11 and x12 pro-
duced by our small fully-connected (fc) neural network Nfc for inputs i1 , i2 .
In our simple example, this amounts to proving whether x11 − x12 > 0 or x12 −
x11 > 0 holds given the abstract results computed by our analysis. Note that using
the concrete bounds for x11 and x12 , that is, l11 , l12 , u11 , and u12 leads to the bound
[−1, 5.5] for x11 − x12 and [−5.5, 1] for x12 − x11 and hence we cannot prove that
either constraint holds. To address this imprecision, we first create a new temporary
variable x13 and apply our abstract transformer for the assignment x13 := x11 − x12 .
Our transformer adds the following constraint:

x11 − x12 6 x13 6 x11 − x12 (5.11)

The transformer then computes bounds for x13 by backsubstitution (to the first
layer), as described so far, which produces l13 = 1 and u13 = 4. As the (concrete)
lower bound of x13 is greater than 0, our analysis concludes that x11 − x12 > 0 holds.
Hence, we have proved our (robustness) specification. Of course, if we had failed
to prove the property, we would have tried the same analysis using the second
constraint (i.e., x12 > x11 ). And if that would fail, then we would declare that we
are unable to prove the property. For our example, this was not needed since we
were able to prove the first constraint.

5.2 background: neural networks and adversarial regions

In this section, we provide the minimal necessary background on neural networks


and adversarial regions. Further, we show how we represent neural networks for
our analysis.

neural networks Neural networks are functions N : Rm → Rn that can


be implemented using straight-line programs (i.e., without loops) of a certain
form. In this work, we focus on neural networks that follow a layered architec-
ture, but all our methods can be used unchanged for more general neural net-
work shapes. A layered neural network is given by a composition of l layers
f1 : Rm → Rn1 , . . . , fl : Rnl−1 → Rn . Each layer fi is one of the following: (i) an
affine transformation fi (x) = Ax + b for some A ∈ Rni ×ni−1 and b ∈ Rni (in par-
ticular, convolution with one or more filters is an affine transformation), (ii) the
ReLU activation function f(x) = max(0, x), where the maximum is applied compo-
nentwise, (iii) the sigmoid (σ(x) = e^x/(e^x + 1)) or the tanh (tanh(x) = (e^x − e^−x)/(e^x + e^−x)) activation
function (again applied componentwise), or (iv) a maxpool operator, which sub-
divides the input x into multiple parts, and returns the maximal value in each
part.

neurons and activations Each component of one of the vectors passed


along through the layers is called a neuron, and its value is called an activation.

There are three types of neurons: m input neurons whose activations form the
input to the network, n output neurons whose activations form the output of the
network, and all other neurons, called hidden, as they are not directly observed.

classification For a neural network that classifies its inputs to multiple pos-
sible labels, n is the number of distinct classes, and the neural network classifies a
given input x to a given class k if N(x)k > N(x)j for all j with 1 6 j 6 n and j 6= k.

adversarial region In our evaluation, we consider the following (standard,


e.g., see [40]) threat model: an input is drawn from the input distribution, perturbed
by an adversary and then classified by the neural network. The perturbations that
the adversary can perform are restricted and the set X ⊆ Rn of possible perturba-
tions for a given input is called an adversarial region. The maximal possible error
(i.e., the fraction of misclassified inputs) that the adversary can obtain by picking a
worst-case input from each adversarial region is called the adversarial error. A neu-
ral network is robust for a given adversarial region if it classifies all inputs in that
region the same way. This means that it is impossible for an adversary to influence
the classification by picking an input from the adversarial region.
In our evaluation, we focus on certifying robustness for adversarial regions that
can be represented using a set of interval constraints, i.e., X = ×_{i=1}^{m} [li, ui] for li, ui ∈
R ∪ {−∞, +∞}. We also show how to use our analyzer to certify robustness against
rotations which employ linear interpolation.
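For L∞ perturbations, such an interval region can be built directly from an image and a radius ε; the snippet below is a small illustrative sketch and assumes pixel intensities normalized to [0, 1].

def linf_region(image, eps):
    """Componentwise interval constraints for an L-infinity ball of radius eps,
    clipped to the valid pixel range [0, 1]."""
    lower = [max(0.0, p - eps) for p in image]
    upper = [min(1.0, p + eps) for p in image]
    return list(zip(lower, upper))

# Example: a three-pixel "image" perturbed by eps = 0.25.
print(linf_region([0.0, 0.5, 1.0], 0.25))   # [(0.0, 0.25), (0.25, 0.75), (0.75, 1.0)]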

network representation For our analysis, we represent neural networks


as a sequence of assignments, one per hidden and per output neuron. We need
four kinds of assignments: ReLU assignments xi ← max(0, xj ), sigmoid/tanh as-
signments xi ← g(xj ) for g = σ or g = tanh, maxpool assignments xi ← maxj∈J xj
and affine assignments xi ← v + Σ_j wj · xj. Convolutional layers can be described
with affine assignments [78].
For example, we represent the neural network from Fig. 5.2 as the following
program:
x3 ← x1 + x2 , x4 ← x1 − x2 , x5 ← max(0, x3 ), x6 ← max(0, x4 ),
x7 ← x5 + x6 , x8 ← x5 − x6 , x9 ← max(0, x7 ), x10 ← max(0, x8 ),
x11 ← x9 + x10 + 1, x12 ← x10 .
In Fig. 5.2, the adversarial region is given by X = [−1, 1] × [−1, 1]. The variables x1
and x2 are the input to the neural network, and the variables x11 and x12 are the
outputs of the network. Therefore, the final class of an input (x1 , x2 ) is 1 if x11 > x12 ,
and 2 if x11 < x12 . To prove the considered specification, we either need to prove
that ∀(x1 , x2 ) ∈ X. x11 > x12 or that ∀(x1 , x2 ) ∈ X. x12 > x11 .
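One possible encoding of this assignment sequence, used here purely for illustration (it is not the encoding of our implementation), is a plain Python list of tagged tuples:

# The example network as a sequence of assignments; indices refer to the x_i.
# ('affine', bias, {j: w_j}) and ('relu', j) are the only operation kinds needed here.
program = [
    (3,  'affine', 0.0, {1: 1.0, 2: 1.0}),    # x3 <- x1 + x2
    (4,  'affine', 0.0, {1: 1.0, 2: -1.0}),   # x4 <- x1 - x2
    (5,  'relu', 3), (6, 'relu', 4),          # x5 <- max(0, x3), x6 <- max(0, x4)
    (7,  'affine', 0.0, {5: 1.0, 6: 1.0}),    # x7 <- x5 + x6
    (8,  'affine', 0.0, {5: 1.0, 6: -1.0}),   # x8 <- x5 - x6
    (9,  'relu', 7), (10, 'relu', 8),         # x9 <- max(0, x7), x10 <- max(0, x8)
    (11, 'affine', 1.0, {9: 1.0, 10: 1.0}),   # x11 <- x9 + x10 + 1
    (12, 'affine', 0.0, {10: 1.0}),           # x12 <- x10
]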
We note that even though our experimental evaluation focuses on different kinds
of robustness, our method and abstract domain are general and can also be used
to prove other properties, such as those in Fig. 1.7.

5.3 abstract domain and transformers

In this section, we introduce our abstract domain as well as the abstract trans-
formers needed to analyze the four kinds of assignment statements mentioned in
Section 5.2.
Elements in our abstract domain An consist of a set of polyhedral constraints
of a specific form, over n variables. Each constraint relates one variable to a linear
combination of the variables of a smaller index. Each variable has two associated
polyhedral constraints: one lower bound and one upper bound. In addition, the
abstract element records derived interval bounds for each variable. Formally, an
abstract element a ∈ An over n variables can be written as a tuple a = ⟨a≤, a≥, l, u⟩ where

a≤_i, a≥_i ∈ {x ↦ v + Σ_{j∈[i−1]} wj · xj | v ∈ R ∪ {−∞, +∞}, w ∈ Ri−1} for i ∈ [n]

and l, u ∈ (R ∪ {−∞, +∞})n. Here, we use the notation [n] := {1, 2, . . . , n}. The concretization function γn : An → P(Rn) is then given by

γn(a) = {x ∈ Rn | ∀i ∈ [n]. a≤_i(x) ≤ xi ∧ a≥_i(x) ≥ xi}.

domain invariant All abstract elements in our domain additionally satisfy
the following invariant: γn(a) ⊆ ×_{i∈[n]} [li, ui]. In other words, every abstract el-
ement in our domain maintains concrete lower and upper bounds which over-
approximate the two symbolic bounds. This property is essential for creating effi-
cient abstract transformers.
To simplify our exposition of abstract transformers, we will only consider the
case where all variables are bounded, which is always the case when our analy-
sis is applied to neural networks. Further, we require that variables are assigned
exactly once, in increasing order of their indices. Our abstract transformers Tf# for
a deterministic function f : Am → An satisfy the following soundness property
based on Definition 3.1.1: Tf (γm (a)) ⊆ γn (Tf# (a)) for all a ∈ Am , where Tf is the
corresponding concrete transformer of f, given by Tf (X) = {f(x) | x ∈ X}.

5.3.1 ReLU Abstract Transformer

Let f : Ri−1 → Ri be a function that executes the assignment xi ← max(0, xj) for some j < i. The corresponding abstract ReLU transformer is Tf#(⟨a≤, a≥, l, u⟩) = ⟨a′≤, a′≥, l′, u′⟩ where a′≤_k = a≤_k, a′≥_k = a≥_k, l′_k = lk and u′_k = uk for k < i. For the new component i, there are three cases. If uj ≤ 0, then a′≤_i(x) = a′≥_i(x) = 0 and l′_i = u′_i = 0. If 0 ≤ lj, then a′≤_i(x) = a′≥_i(x) = xj, l′_i = lj and u′_i = uj.

Otherwise, the abstract ReLU transformer approximates the assignment by a set of linear constraints forming the convex hull of the polyhedra obtained by intersecting the interval constraints on the input lj ≤ xj ≤ uj with the constraints from the two ReLU branches, i.e., the convex hull of {lj ≤ xj ≤ uj, xj ≤ 0, xi = 0} and {lj ≤ xj ≤ uj, xj ≥ 0, xi = xj}:

0 ≤ xi, xj ≤ xi,
xi ≤ uj · (xj − lj)/(uj − lj).

As there is only one upper bound for xi, we obtain the following rule:

a′≥_i(x) = uj · (xj − lj)/(uj − lj).

On the other hand, we have two lower bounds for xi: xj and 0. Any convex combination of those two constraints is still a valid lower bound. Therefore, we can set

a′≤_i(x) = λ · xj,

for any λ ∈ [0, 1]. We select the λ ∈ {0, 1} that minimizes the area of the resulting shape in the (xi, xj)-plane. Finally, we set l′_i = λ · lj and u′_i = uj.
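The case analysis above can be written down compactly. The following Python sketch is illustrative rather than the actual implementation: symbolic bounds are encoded as (constant, coefficient-dictionary) pairs, and the function returns the new lower and upper constraints together with the new concrete bounds.

def relu_transformer(lj, uj, j):
    """Abstract transformer for x_i <- max(0, x_j).
    Returns (lower_expr, upper_expr, l_i, u_i); an expression is a pair
    (constant, {variable_index: coefficient})."""
    if uj <= 0:                        # ReLU output is identically 0
        return (0.0, {}), (0.0, {}), 0.0, 0.0
    if lj >= 0:                        # ReLU acts as the identity
        return (0.0, {j: 1.0}), (0.0, {j: 1.0}), lj, uj
    # Mixed case: one upper constraint, one of two candidate lower constraints.
    slope = uj / (uj - lj)
    upper = (-slope * lj, {j: slope})  # x_i <= u_j * (x_j - l_j) / (u_j - l_j)
    lam = 0.0 if uj <= -lj else 1.0    # pick the candidate with the smaller area
    lower = (0.0, {j: lam})            # x_i >= lam * x_j
    return lower, upper, lam * lj, uj

# Running example: l3 = -2, u3 = 2 for x5 <- max(0, x3) reproduces (5.5).
print(relu_transformer(-2.0, 2.0, 3))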

5.3.2 Sigmoid and Tanh Abstract Transformers

Let g : R → R be a continuous, twice-differentiable function with g′(x) > 0 and 0 ≤ g″(x) ⇔ x ≤ 0 for all x ∈ R, where g′ and g″ are the first and second derivatives of g. The sigmoid function σ(x) = e^x/(e^x + 1) and the tanh function tanh(x) = (e^x − e^−x)/(e^x + e^−x) both satisfy these conditions. For such a function g, let f : Ri−1 → Ri be the function that executes the assignment xi ← g(xj) for j < i.

The corresponding abstract transformer is Tf#(⟨a≤, a≥, l, u⟩) = ⟨a′≤, a′≥, l′, u′⟩ where a′≤_k = a≤_k, a′≥_k = a≥_k, l′_k = lk and u′_k = uk for k < i. For the new component i, we set l′_i = g(lj) and u′_i = g(uj). If lj = uj, then a′≤_i(x) = a′≥_i(x) = g(lj). Otherwise, we consider a′≤_i(x) and a′≥_i(x) separately. Let λ = (g(uj) − g(lj))/(uj − lj) and λ′ = min(g′(lj), g′(uj)). If 0 < lj, then a′≤_i(x) = g(lj) + λ · (xj − lj), otherwise a′≤_i(x) = g(lj) + λ′ · (xj − lj). Similarly, if uj ≤ 0, then a′≥_i(x) = g(uj) + λ · (xj − uj) and a′≥_i(x) = g(uj) + λ′ · (xj − uj) otherwise.
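A direct transcription of these rules for the sigmoid is sketched below (same illustrative encoding as in the ReLU sketch; tanh is handled identically with g = tanh and g′(x) = 1 − tanh(x)²).

import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def dsigmoid(x):
    s = sigmoid(x)
    return s * (1.0 - s)

def sigmoid_transformer(lj, uj, j, g=sigmoid, dg=dsigmoid):
    """Abstract transformer for x_i <- g(x_j); expressions are encoded as
    (constant, {variable_index: coefficient}) pairs."""
    li, ui = g(lj), g(uj)
    if lj == uj:
        return (li, {}), (li, {}), li, ui
    lam = (g(uj) - g(lj)) / (uj - lj)        # slope of the chord
    lamp = min(dg(lj), dg(uj))               # sound tangent slope
    low_slope = lam if lj > 0 else lamp
    up_slope = lam if uj <= 0 else lamp
    lower = (g(lj) - low_slope * lj, {j: low_slope})  # g(l_j) + s * (x_j - l_j)
    upper = (g(uj) - up_slope * uj, {j: up_slope})    # g(u_j) + s * (x_j - u_j)
    return lower, upper, li, ui

print(sigmoid_transformer(-1.0, 1.0, 4))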

5.3.3 Maxpool Abstract Transformer

Let f : Ri−1 → Ri be a function that executes xi ← max_{j∈J} xj for some J ⊆ [i − 1]. The corresponding abstract maxpool transformer is Tf#(⟨a≤, a≥, l, u⟩) = ⟨a′≤, a′≥, l′, u′⟩ where a′≤_k = a≤_k, a′≥_k = a≥_k, l′_k = lk and u′_k = uk for k < i. For the new component i, there are two cases. If there is some k ∈ J with uj < lk for all j ∈ J \ {k}, then a′≤_i(x) = a′≥_i(x) = xk, l′_i = lk and u′_i = uk. Otherwise, we choose k ∈ J such that lk is maximized and set a′≤_i(x) = xk, l′_i = lk and a′≥_i(x) = u′_i = max_{j∈J} uj.
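Both cases can be implemented with a single pass over the pooled indices; the sketch below is illustrative and reuses the (constant, coefficient-dictionary) encoding from the previous sketches.

def maxpool_transformer(bounds, J):
    """Abstract transformer for x_i <- max_{j in J} x_j, where `bounds`
    maps j to (l_j, u_j).  Returns (lower_expr, upper_expr, l_i, u_i)."""
    k = max(J, key=lambda j: bounds[j][0])            # index with maximal l_k
    lk, uk = bounds[k]
    if all(bounds[j][1] < lk for j in J if j != k):   # x_k dominates: exact case
        return (0.0, {k: 1.0}), (0.0, {k: 1.0}), lk, uk
    umax = max(bounds[j][1] for j in J)
    return (0.0, {k: 1.0}), (umax, {}), lk, umax      # x_k <= x_i <= max_j u_j

print(maxpool_transformer({1: (0.0, 2.0), 2: (1.0, 3.0)}, J=[1, 2]))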

5.3.4 Affine Abstract Transformer


Let f : Ri−1 → Ri be a function that executes xi ← v + Σ_{j∈[i−1]} wj · xj for some w ∈ Ri−1. The corresponding abstract affine transformer is Tf#(⟨a≤, a≥, l, u⟩) = ⟨a′≤, a′≥, l′, u′⟩ where a′≤_k = a≤_k, a′≥_k = a≥_k, l′_k = lk and u′_k = uk for k < i. Further, a′≤_i(x) = a′≥_i(x) = v + Σ_{j∈[i−1]} wj · xj.

To compute li and ui, we repeatedly substitute bounds for xj into the constraint, until no further substitution is possible. Formally, if we want to obtain l′_i, we start with b1(x) = a′≤_i(x). If we have bs(x) = v′ + Σ_{j∈[k]} w′_j · xj for some k ∈ [i − 1], v′ ∈ R, w′ ∈ Rk, then

b_{s+1}(x) = v′ + Σ_{j∈[k]} ( max(0, w′_j) · a′≤_j(x) + min(w′_j, 0) · a′≥_j(x) ).

We iterate until we reach b_{s′} with b_{s′}(x) = v″ (i.e., s′ is the smallest number with this property). We then set l′_i = v″.

We compute u′_i in an analogous fashion: to obtain u′_i, we start with c1(x) = a′≥_i(x). If we have ct(x) = v′ + Σ_{j∈[k]} w′_j · xj for some k ∈ [i − 1], v′ ∈ R, w′ ∈ Rk, then

c_{t+1}(x) = v′ + Σ_{j∈[k]} ( max(0, w′_j) · a′≥_j(x) + min(w′_j, 0) · a′≤_j(x) ).

We iterate until we reach c_{t′} with c_{t′}(x) = v″. We then set u′_i = v″.
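The lower-bound backsubstitution can be sketched as the following loop (illustrative encoding as before; input variables are assumed to be stored with constant symbolic bounds, which guarantees termination). The upper bound u′_i is computed by the analogous loop with the roles of the lower and upper symbolic bounds swapped.

def backsubstitute_lower(expr, elems):
    """Sound concrete lower bound of an affine expression.
    `expr` is (constant, {variable_index: coefficient}); `elems` maps a
    variable index to its (lower_expr, upper_expr, l, u) tuple, where input
    variables carry constant expressions (empty coefficient dictionaries)."""
    const, coeffs = expr
    while coeffs:
        new_const, new_coeffs = const, {}
        for j, w in coeffs.items():
            (lc, lcoefs), (uc, ucoefs), _, _ = elems[j]
            # A positive coefficient uses the lower bound of x_j, a negative
            # coefficient uses its upper bound (cf. max/min in the definition).
            bc, bcoefs = (lc, lcoefs) if w >= 0 else (uc, ucoefs)
            new_const += w * bc
            for k, wk in bcoefs.items():
                new_coeffs[k] = new_coeffs.get(k, 0.0) + w * wk
        const, coeffs = new_const, new_coeffs
    return const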

5.3.5 Neural Network Robustness Analysis

We now show how to use our analysis to prove robustness of a neural network
with p inputs, q hidden activations and r output classes, resulting in a total of
p + q + r activations. More explicitly, our goal is to prove that the neural network
classifies all inputs satisfying the given interval constraints (the adversarial region)
to a particular class k.
We first create an abstract element a = ⟨a≤, a≥, l, u⟩ over p variables, where a≤_i(x) = li and a≥_i(x) = ui for all i. The bounds li and ui are initialized such that they describe the adversarial region. For example, for the adversarial region in Fig. 5.2, we get
a = ⟨(x ↦ l1, x ↦ l2), (x ↦ u1, x ↦ u2), (−1, −1), (1, 1)⟩.
Then, the analysis proceeds by processing assignments for all q hidden activations
and the r output activations of the neural network, layer by layer, processing nodes
in ascending order of variable indices, using their respective abstract transformers.
Finally, the analysis executes the following r − 1 (affine) assignments in the abstract:
xp+q+r+1 ← xp+q+k − xp+q+1 , . . . , xp+q+r+(k−1) ← xp+q+k − xp+q+(k−1) ,
xp+q+r+k ← xp+q+k − xp+q+(k+1) , . . . , xp+q+r+(r−1) ← xp+q+k − xp+q+r .

As output class k has the highest activation if and only if those differences are all
positive, the neural network is proved robust if for all i ∈ {p + q + r + 1, . . . , p + q +
r + (r − 1)} we have 0 < li . Otherwise, our robustness analysis fails to certify.
For the neural network in Fig. 5.2, if we want to prove that class 1 is most likely,
this means we execute one additional assignment x13 ← x11 − x12 . Abstract inter-
pretation derives the bounds l13 = 1, u13 = 4. The neural network is proved robust,
because l13 is positive.
The above discussion showed how to use our abstract transformers to prove
robustness. However, a similar procedure could be used to prove standard pre/post
conditions (by performing the analysis starting with the pre-condition).
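Schematically, the certification step then reduces to lower-bounding the r − 1 output differences; in the sketch below, lower_bound_of stands for running the affine transformer and backsubstitution on the given expression, and the function name and indexing scheme are illustrative only.

def certify_class(lower_bound_of, k, num_classes, out_index):
    """Certify that output class k dominates all other classes.
    `out_index + c` is assumed to be the variable index of output c."""
    for c in range(num_classes):
        if c == k:
            continue
        # Difference x_{out_index+k} - x_{out_index+c} as an affine expression.
        diff = (0.0, {out_index + k: 1.0, out_index + c: -1.0})
        if lower_bound_of(diff) <= 0:
            return False    # cannot certify: class c might obtain a higher score
    return True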

5.3.6 Correctness of Abstract Transformers

In this section, we prove that our abstract transformers are sound, and that they
preserve the invariant. Formally, for Tf#(a) = a′ we have Tf(γi−1(a)) ⊆ γi(a′) and
γi(a′) ⊆ ×_{k∈[i]} [l′_k, u′_k].

soundness We first prove a lemma that is needed to prove soundness of our


ReLU transformer.
Lemma 5.3.1. For l < 0, 0 < u, l ≤ x ≤ u, and λ ∈ [0, 1] we have λ · x ≤ max(0, x) ≤ u · (x − l)/(u − l).
Proof. If x < 0, then λ · x ≤ 0 = max(0, x). If x ≥ 0, then λ · x ≤ x = max(0, x). If x < 0, then max(0, x) = 0 ≤ u · (x − l)/(u − l). If x ≥ 0 then max(0, x) = x ≤ u · (x − l)/(u − l) because x · (−l) ≤ u · (−l) ⇔ x · u − x · l ≤ x · u − u · l ⇔ x · (u − l) ≤ u · (x − l).
Theorem 5.3.2. The ReLU abstract transformer is sound.
Proof. Let f : Ri−1 → Ri execute the assignment xi ← max(0, xj ) for some j < i, and
let a ∈ Ai−1 be arbitrary. We have γi−1(a) ⊆ ×_{k∈[i−1]} [lk, uk] and

Tf (γi−1 (a)) = {f(x) | x ∈ γi−1 (a)}


= {(x1 , . . . , xi−1 , max(0, xj )) | (x1 , . . . , xi−1 ) ∈ γi−1 (a)}
= {x ∈ Ri | (x1 , . . . , xi−1 ) ∈ γi−1 (a) ∧ xi = max(0, xj )}
= {x ∈ Ri | (∀k ∈ [i − 1]. a6 >
k (x) 6 xk ∧ ak (x) > xk ) ∧ xi = max(0, xj )}.

If uj 6 0, we have that (∀k ∈ [i − 1]. a6 >


k (x) 6 xk ∧ ak (x) > xk ) implies xj 6 0, and

Tf (γi−1 (a)) = {x ∈ Ri | (∀k ∈ [i − 1]. a6 >


k (x) 6 xk ∧ ak (x) > xk ) ∧ xi = max(0, xj )
∧ xj 6 0}
= {x ∈ Ri | (∀k ∈ [i − 1]. a6 >
k (x) 6 xk ∧ ak (x) > xk ) ∧ xi = 0}
= {x ∈ Ri | ∀k ∈ [i]. ak06 (x) 6 xk ∧ ak0> (x) > xk }
= γi (Tf# (a)).

Otherwise, if 0 6 lj , we have that (∀k ∈ [i − 1]. a6 >


k (x) 6 xk ∧ ak (x) > xk ) implies
0 6 xj , and

Tf (γi−1 (a)) = {x ∈ Ri | (∀k ∈ [i − 1]. a6 >


k (x) 6 xk ∧ ak (x) > xk ) ∧ xi = max(0, xj )
∧ 0 6 xj }
= {x ∈ Ri | (∀k ∈ [i − 1]. a6 >
k (x) 6 xk ∧ ak (x) > xk ) ∧ xi = xj }
= {x ∈ Ri | ∀k ∈ [i]. ak06 (x) 6 xk ∧ ak0> (x) > xk }
= γi (Tf# (a)).

Otherwise, we have lj < 0 and 0 < uj and that (∀k ∈ [i − 1]. a6 >
k (x) 6 xk ∧ ak (x) >
xk ) implies lj 6 xj 6 uj and therefore

Tf (γi−1 (a)) = {x ∈ Ri | (∀k ∈ [i − 1]. a6 >


k (x) 6 xk ∧ ak (x) > xk ) ∧ xi = max(0, xj )}
xj − lj
⊆ {x ∈ Ri | (∀k ∈ [i − 1]. a6 >
k (x) 6 xk ∧ ak (x) > xk ) ∧ xi 6 uj · u − l
j j
∧ xi > λ · xj }
= {x ∈ Ri | ∀k ∈ [i]. ak06 (x) 6 xk ∧ ak0> (x) > xk }
= γi (Tf# (a)).

Therefore, in all cases, Tf (γi−1 (a)) ⊆ γi (Tf# (a)). Note that we lose precision only in
the last case.
Theorem 5.3.3. The sigmoid and tanh abstract transformers are sound.
Proof. A function g : R → R with g 0 (x) > 0 and 0 6 g 00 (x) ⇔ 0 6 x is monotoni-
cally increasing, and furthermore, g|(−∞,0] (the restriction to (−∞, 0]) is convex and
g|(0,∞) is concave.
Let f : Ri−1 → Ri execute the assignment xi ← g(xj ) for some j < i, and let
a ∈ Ai−1 be arbitrary. We have

Tf (γi−1 (a)) = {f(x) | x ∈ γi−1 (a)}


= {(x1 , . . . , xi−1 , g(xj )) | (x1 , . . . , xi−1 ) ∈ γi−1 (a)}
= {x ∈ Ri | (∀k ∈ [i − 1]. a6 >
k (x) 6 xk ∧ ak (x) > xk ) ∧ xi = g(xj )}.

If lj = uj , then (∀k ∈ [i − 1]. a6 >


k (x) 6 xk ∧ ak (x) > xk ) implies xj = lj and therefore

Tf (γi−1 (a)) = {x ∈ Ri | (∀k ∈ [i − 1]. a6 >


k (x) 6 xk ∧ ak (x) > xk ) ∧ xi = g(xj )}
= {x ∈ Ri | (∀k ∈ [i − 1]. a6 >
k (x) 6 xk ∧ ak (x) > xk ) ∧ xi = g(lj )}
= {x ∈ Ri | (∀k ∈ [i − 1]. a6 >
k (x) 6 xk ∧ ak (x) > xk ) ∧ g(lj ) 6 xi
∧ xj 6 g(lj )}
= {x ∈ Ri | ∀k ∈ [i]. ak06 (x) 6 xk ∧ ak0> (x) > xk }
= γi (Tf# (a)).

Therefore, the transformer is exact in this case.


Otherwise, we need to show that (∀k ∈ [i − 1]. a6 >
k (x) 6 xk ∧ ak (x) > xk ) ∧
xi = g(xj ) implies ai06 (x) 6 xi and ai0> (x) > xi . We let x ∈ Ri be arbitrary with
06 0>
(∀k ∈ [i − 1]. a6 >
k (x) 6 xk ∧ ak (x) > xk ) ∧ xi = g(xj ) and consider ai (x) and ai (x)
separately. Recall that λ = (g(uj ) − g(lj ))/(uj − lj ) and λ 0 = min(g 0 (lj ), g 0 (uj )). If
0 6 lj , then, because g is concave on positive inputs,
 
06 xj − lj xj − lj
ai (x) = g(lj ) + λ · (xj − lj ) = 1 − · g(lj ) + · g(uj )
uj − lj uj − lj
  
xj − lj xj − lj
6g 1− · lj + · uj = g(xj ) = xi .
uj − lj uj − lj

Otherwise, because g 0 is non-decreasing on (−∞, 0] and decreasing on (0, ∞), we


have that λ 0 = min(g 0 (lj ), g 0 (uj )) 6 g 0 (ξ) for all ξ ∈ [lj , uj ]. Therefore,
Z xj Z xj
06 0 0
ai (x) = g(lj ) + λ · (xj − lj ) = g(lj ) + λ dξ 6 g(lj ) + g 0 (ξ)dξ = g(xj ).
lj lj

The proof of ai0> (x) > xi is analogous.


We conclude

Tf (γi−1 (a)) = {x ∈ Ri | (∀k ∈ [i − 1]. a6 >


k (x) 6 xk ∧ ak (x) > xk ) ∧ xi = g(xj )}
06
⊆ {x ∈ Ri | (∀k ∈ [i − 1]. a6 >
k (x) 6 xk ∧ ak (x) > xk ) ∧ ak (x) 6 xi
∧ ai0> (x) > xi }
= {x ∈ Ri | (∀k ∈ [i]. ak06 (x) 6 xk ∧ ak0> (x) > xk )}
= γi (Tf# (a)).

where the inclusion is strict because we have dropped the constraint xi = g(xj ).
Therefore, the abstract transformer is sound.

Theorem 5.3.4. The maxpool abstract transformer is sound.

Proof. Let f : Ri−1 → Ri execute the assignment xi ← max_{j∈J} xj for some J ⊆
[i − 1], and let a ∈ Ai−1 be arbitrary. We have

Tf (γi−1 (a)) = {f(x) | x ∈ γi−1 (a)}


= {(x1 , . . . , xi−1 , max xj ) | (x1 , . . . , xi−1 ) ∈ γi−1 (a)}
j∈J
= {x ∈ R | (∀k ∈ [i − 1]. a6
i >
k (x) 6 xk ∧ ak (x) > xk ) ∧ xi = max xj }.
j∈J

There are two cases. If there is some k ∈ J with uj < lk for all j ∈ J \ {k}, then
(∀k ∈ [i − 1]. a6 >
k (x) 6 xk ∧ ak (x) > xk ) implies that maxj∈J xj = xk and therefore

Tf (γi−1 (a)) = {x ∈ Ri | (∀k ∈ [i − 1]. a6 >


k (x) 6 xk ∧ ak (x) > xk ) ∧ xi = max xj } j∈J
= {x ∈ R | i
(∀k ∈ [i − 1]. a6 >
k (x) 6 xk ∧ ak (x) > xk ) ∧ xi = xk }
= {x ∈ Ri | ∀k ∈ [i]. ak06 (x) 6 xk ∧ ak0> (x) > xk }
= γi (Tf# (a)).

Otherwise, the transformer chooses a k with maximal lk . We also know that (∀k ∈
[i − 1]. a6 >
k (x) 6 xk ∧ ak (x) > xk ) implies xj 6 uj for all j ∈ J, and therefore

Tf (γi−1 (a)) = {x ∈ Ri | (∀k ∈ [i − 1]. a6 >


k (x) 6 xk ∧ ak (x) > xk ) ∧ xi = max xj )} j∈J
⊆ {x ∈ Ri | (∀k ∈ [i − 1]. a6 >
k (x) 6 xk ∧ ak (x) > xk ) ∧ xk 6 xi
∧ max uj > xi )}
j∈J
= {x ∈ Ri | ∀k ∈ [i]. ak06 (x) 6 xk ∧ ak0> (x) > xk }
= γi (Tf# (a)).

In summary, in both cases, Tf (γi−1 (a)) ⊆ γi (Tf# (a)).

Theorem 5.3.5. The affine abstract transformer is sound and exact.


P
Proof. Let f : Ri−1 → Ri execute the assignment xi ← v + j∈[i−1] wj · xj for some
v ∈ R,w ∈ Ri−1 , and let a ∈ Ai−1 be arbitrary. We have

Tf (γi−1 (a)) = {f(x) | x ∈ γi−1 (a)}


P
= {(x1 , . . . , xi−1 , v + j∈[i−1] wj · xj ) | (x1 , . . . , xi−1 ) ∈ γi−1 (a)}
P
= {x ∈ Ri | (x1 , . . . , xi−1 ) ∈ γi−1 (a) ∧ xi = v + j∈[i−1] wj · xj )}
= {x ∈ Ri | (∀k ∈ [i − 1]. a6 >
k (x) 6 xk ∧ ak (x) > xk )
P
∧ xi = v + j∈[i−1] wj · xj )}
= {x ∈ Ri | ∀k ∈ [i]. ak06 (x) 6 xk ∧ ak0> (x) > xk }
= γi (Tf# (a)).

Thus, Tf (γi−1 (a)) = γi (Tf# (a)).

invariant We now prove that our abstract transformers preserve the


invariant. For each of our abstract transformers Tf# , we have to show that
for Tf#(a) = a′, we have γi(a′) ⊆ ×_{j∈[i]} [l′_j, u′_j]. Note that the constraints
(∀k ∈ [i]. ak06 (x)
6 xk ∧ ak0> (x)
> xk ) include all constraints of a. We first as-
sume that the invariant holds for a; thus, (∀k ∈ [i − 1]. a6 >
k (x) 6 xk ∧ ak (x) > xk )

implies the bounds (∀k ∈ [i − 1]. lk 6 xk 6 uk ), which are equivalent to


(∀k ∈ [i − 1]. lk0 6 xk 6 uk0 ), because our abstract transformers preserve
the bounds of existing variables. It therefore suffices to show that
(∀k ∈ [i]. ak06 (x) 6 xk ∧ ak0> (x) > xk ) implies li0 6 xi 6 ui0 .

Theorem 5.3.6. The ReLU abstract transformer preserves the invariant.

Proof. If uj 6 0, we have ai06 (x) = ai0> (x) = 0 and therefore (∀k ∈ [i]. ak06 (x) 6
xk ∧ ak0> (x) > xk ) implies 0 = li0 = ai06 (x) 6 xi 6 ai0> (x) = ui0 = 0. If 0 6 lj , we have
ai06 (x) = ai0> (x) = xj and therefore (∀k ∈ [i]. ak06 (x) 6 xk ∧ ak0> (x) > xk ) implies
li0 = lj 6 xj = xi 6 uj = ui0 . Otherwise, we have lj < 0 and 0 < uj , as well as
x −l
a 06 (x)i = λ · xj , a 0> (x)i = uj · ujj −ljj , and so (∀k ∈ [i]. ak06 (x) 6 xk ∧ ak0> (x) > xk )
implies li0 = λ · lj 6 xi 6 uj = uj0 .

Theorem 5.3.7. The sigmoid and tanh abstract transformers preserve the invariant.

Proof. The constraints (∀k ∈ [i]. ak06 (x) 6 xk ∧ ak0> (x) > xk ) imply lj 6 xj 6 uj and
by monotonicity of g, we obtain li0 = g(lj ) 6 xi 6 g(uj ) = ui0 using xi = g(xj ).

Theorem 5.3.8. The maxpool abstract transformer preserves the invariant.

Proof. The maxpool transformer either sets ai06 (x) = ai0> (x) = xk and li0 = lk and
ui0 = uk , in which case (∀k ∈ [i]. ak06 (x) 6 xk ∧ ak0> (x) > xk ) implies li0 = lk 6
xk = xi 6 uk = ui0 , or it sets ai06 (x) = xk , li0 = lk and ui0 = ai0> (x), such that
(∀k ∈ [i]. ak06 (x) 6 xk ∧ ak0> (x) > xk ), which implies li0 6 xi 6 ui0 .

Theorem 5.3.9. The affine abstract transformer preserves the invariant.

Proof. Note that s 0 and t 0 are finite, because in each step, the maximal index of a
variable whose coefficient in, respectively, bs and ct is nonzero decreases by at least
one. Assume ∀k ∈ [i]. ak06 (x) 6 xk ∧ ak0> (x) > xk . We have to show that bs 0 (x) 6 xi
and ct 0 (x) > xi . It suffices to show that ∀s ∈ [s 0 ]. bs (x) 6 xi and ∀t ∈ [t 0 ]. ct (x) > xi .
To show ∀s ∈ [s 0 ]. bs (x) 6 xi , we use induction on s. We have b1 (x) = ai06 (x) 6 xi .
P
Assuming bs (x) 6 xi and bs (x) = v 0 + j∈[k] wj0 · xj for some k ∈ [i − 1], v 0 ∈ R, w 0 ∈
Rk , we have
X
xi > bs (x) = v 0 + wj0 · xj
j∈[k]
X
= v0 + (max(0, wj0 ) ·xj + min(wj0 , 0) ·xj )
j∈[k]
| {z } | {z }
>0 60
X
> v0 + (max(0, wj0 ) · aj06 (x) + min(wj0 , 0) · aj0> (x))
j∈[k]
= bs+1 (x).

To show ∀t ∈ [t 0 ]. ct (x) > xi , we use induction on t. We have c1 (x) = ai0> (x) > xi .
P
Assuming ct (x) > xi and ct (x) = v 0 + j∈[k] wj0 · xj for some k ∈ [i − 1], v 0 ∈ R, w 0 ∈
Rk , we have
X
xi 6 ct (x) = v 0 + wj0 · xj
j∈[k]
X
= v0 + (max(0, wj0 ) ·xj + min(wj0 , 0) ·xj )
j∈[k]
| {z } | {z }
>0 60
X
6 v0 + (max(0, wj0 ) · aj0> (x) + min(wj0 , 0) · aj06 (x))
j∈[k]
= ct+1 (x).

Therefore, (∀k ∈ [i]. ak06 (x) 6 xk ∧ ak0> (x) > xk ) implies li0 6 xi 6 ui0 .

5.3.7 Soundness under Floating-Point Arithmetic

Our abstract domain and its transformers above are sound under real arithmetic
but unsound under floating-point arithmetic if one does not take care of the
rounding errors. To obtain soundness, let F be the set of floating-point values
and ⊕f, ⊖f, ⊗f, ⊘f be the floating-point interval addition, subtraction, multiplica-
tion, and division, respectively, as defined in [141] with lower bounds rounded
towards −∞ and upper bounds rounded towards +∞. For a real constant c, we
use c− , c+ ∈ F to denote the floating-point representation of c with rounding to-
wards −∞ and +∞ respectively. We use the standard interval linear form, where
the coefficients in the constraints are intervals instead of scalars, to define an ab-
stract element a ∈ An over n variables in our domain as a tuple a = ⟨a≤, a≥, l, u⟩ where for i ∈ [n]:

a≤_i, a≥_i ∈ {x ↦ [v−, v+] ⊕f Σ_{j∈[i−1]} [w−_j, w+_j] ⊗f xj | v−, v+ ∈ F ∪ {−∞, +∞}, w−, w+ ∈ Fi−1}

and l, u ∈ (F ∪ {−∞, +∞})n. For a floating-point interval [li, ui], let inf and sup be functions that return its lower and upper bound. The concretization function γn : An → P(Fn) is given by

γn(a) = {x ∈ Fn | ∀i ∈ [n]. inf(a≤_i(x)) ≤ xi ∧ xi ≤ sup(a≥_i(x))}.

We next modify our abstract transformers for soundness under floating-point


arithmetic. It is straightforward to modify the maxpool transformer so we only
show our modifications for the ReLU, sigmoid, tanh, and affine abstract transform-
ers assigning to the variable xi .

relu abstract transformer It is straightforward to handle the cases lj > 0


or uj 6 0. For the remaining case, we add the following constraints:

[λ, λ] ⊗f xj 6 [1, 1] ⊗f xi ,
[1, 1] ⊗f xi 6 [ψ− , ψ+ ] ⊗f xj ⊕f [µ− , µ+ ].

where λ ∈ {0, 1} and [ψ−, ψ+] = [u−_j, u+_j] ⊘f ([u−_j, u+_j] ⊖f [l−_j, l+_j]), [µ−, µ+] = ([−l+_j, −l−_j] ⊗f [u−_j, u+_j]) ⊘f ([u−_j, u+_j] ⊖f [l−_j, l+_j]). Finally, we set li = λ · lj and ui = u+_j.

sigmoid and tanh abstract transformers We consider the case when


lj < 0. We soundly compute an interval for the possible values of λ under any
rounding mode as [λ− , λ+ ] = ([g(uj )− , g(uj )+ ] f [g(lj )− , g(lj )+ ]) f ([u− +
j , u j ] f
[l− + 0 0
j , lj ]). Similarly, both g (lj ) and g (uj ) are soundly abstracted by the intervals
[g 0 (lj )− , g 0 (lj )+ ] and [g 0 (uj )− , g 0 (uj )+ ], respectively. Because of the limitations of
the floating-point format, it can happen that the upper polyhedral constraint with
slope λ passing through lj intersects the curve at a point < uj . This happens fre-
quently for smaller perturbations. To ensure soundness, we detect such cases and
return the box [g(lj )− , g(uj )+ ]. Other computations for the transformers can be han-
dled similarly.

affine abstract transformer The affine abstract transformer xi ← v + Σ_{j∈[i−1]} wj · xj for some w ∈ Fi−1 first adds the interval linear constraints a′≤_i(x) = a′≥_i(x) = [v−, v+] ⊕f Σ_{j∈[i−1]} [w−_j, w+_j] ⊗f xj.
We modify the backsubstitution for the computation of li and ui. Formally, if we want to obtain l′_i, we start with b1(x) = a′≤_i(x). If we have bs(x) = [v′−, v′+] ⊕f Σ_{j∈[k]} [w′−_j, w′+_j] ⊗f xj for some k ∈ [i − 1] and v′−, v′+, w′−_j, w′+_j ∈ F, then

b_{s+1}(x) = [v′−, v′+] ⊕f Σ_{j∈[k]} c_j, where

c_j = [w′−_j, w′+_j] ⊗f a′≤_j(x), if w′−_j ≥ 0,
c_j = [w′−_j, w′+_j] ⊗f a′≥_j(x), if w′+_j ≤ 0,
c_j = [θ−_l, θ+_l], otherwise.
Note that ⊕f and ⊗f as defined in [141] add extra error terms that are not shown
above for simplicity so that our results contain all values that can arise by executing
the different additions and multiplications in different orders. Here, θ−_l, θ+_l ∈ F are the floating-point values of the lower bound of the interval [w′−_j, w′+_j] ⊗f [lj, uj] rounded towards −∞ and +∞ respectively. We iterate until we reach b_{s′} with b_{s′}(x) = [v″−, v″+], i.e., s′ is the smallest number with this property. We then set l′_i = v″−. We compute u′_i analogously.
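The interval arithmetic of [141] is not reproduced here. The sketch below only illustrates the underlying idea of outward rounding: each result of a single correctly-rounded IEEE operation is widened by one ulp via math.nextafter (Python 3.9+), which is a conservative stand-in for, not a re-implementation of, the operations used in our system.

import math

def _down(x):
    return math.nextafter(x, -math.inf)

def _up(x):
    return math.nextafter(x, math.inf)

def iadd(a, b):
    # Sound interval addition: a single round-to-nearest operation is at most
    # one ulp away from the exact result, so widening each bound by one ulp
    # keeps the true result inside the interval.
    return (_down(a[0] + b[0]), _up(a[1] + b[1]))

def imul_scalar(w, a):
    # Sound multiplication of an interval by a scalar coefficient w.
    cands = (w * a[0], w * a[1])
    return (_down(min(cands)), _up(max(cands)))

print(iadd((0.1, 0.2), (0.3, 0.4)))
print(imul_scalar(-0.5, (0.1, 0.2)))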

Algorithm 5.1 Rotate image I by θ degrees.

procedure Rotate(I, θ)
    Input: I ∈ [0, 1]^{m×n}, θ ∈ [−π, π], Output: R ∈ [0, 1]^{m×n}
    for i ∈ {1, . . . , m}, j ∈ {1, . . . , n} do
        (x, y) ← (j − (n + 1)/2, (m + 1)/2 − i)
        (x′, y′) ← (cos(−θ) · x − sin(−θ) · y, sin(−θ) · x + cos(−θ) · y)
        (i′low, i′high) ← (max(1, ⌊(m + 1)/2 − y′⌋), min(m, ⌈(m + 1)/2 − y′⌉))
        (j′low, j′high) ← (max(1, ⌊x′ + (n + 1)/2⌋), min(n, ⌈x′ + (n + 1)/2⌉))
        t ← Σ_{i′=i′low}^{i′high} Σ_{j′=j′low}^{j′high} max(0, 1 − sqrt((j′ − x′)² + (i′ − y′)²))
        if t ≠ 0 then
            Ri,j ← (1/t) · Σ_{i′=i′low}^{i′high} Σ_{j′=j′low}^{j′high} max(0, 1 − sqrt((j′ − x′)² + (i′ − y′)²)) · Ii′,j′
        else
            Ri,j ← 0
        end if
    end for
    return R
end procedure

5.4 refinement of analysis results

In this section, we show how to apply a form of abstraction refinement based on


trace partitioning [169] to certify robustness for more complex adversarial regions,
which cannot be exactly represented using a set of interval constraints. In particular,
we will show how to handle adversarial regions that, in addition to permitting
small perturbations to each pixel, allow the adversary to rotate the input image by
an angle θ ∈ [α, β] within an interval.

certifying robustness against image rotations Consider Algo-


rithm 5.1, which rotates an m × n-pixel (grayscale) image by an angle θ. To compute
the intensity Ri,j of a given output pixel, it first computes the (real-valued) position
(x 0 , y 0 ) that would be mapped to the position of the center of the pixel. Then, it
performs linear interpolation: it forms a convex combination of pixels in the neigh-
borhood of (x 0 , y 0 ), such that the contribution of each pixel is proportional to the
distance to (x 0 , y 0 ), cutting off contributions at distance 1.
Our goal is to certify that a neural network N : Rm×n → Rr classifies all images
obtained by rotating an input image using Algorithm 5.1 with an angle θ ∈ [α, β] ⊆
[−π, π] in the same way. More generally, if we have an adversarial region X ⊆
Rm×n (represented using componentwise interval constraints), we would like to
certify that for any image I ∈ X and any angle θ ∈ [α, β], the neural network N
classifies Rotate(I, θ) to a given class k. This induces a new adversarial region
X 0 = {Rotate(I, θ) | I ∈ X, θ ∈ [α, β]}. Note that because we deal with regions (and
not only concrete images) as well as rotations that employ linear interpolation,

we cannot simply enumerate all possible rotations as done for simpler rotation
algorithms and concrete images [159].

interval specification of X 0 We certify robustness against rotations by de-


riving lower and upper bounds on the intensities of all pixels of the rotated image.
We then certify that the neural network classifies all images satisfying those bounds
to class k. To obtain bounds, we apply abstract interpretation to Algorithm 5.1,
using the interval domain (more powerful numerical domains could be applied).
We use standard interval domain transformers, except to derive bounds on t and
Ri,j, which we compute (at the same time) by enumerating all possible integer values of i'_low, i'_high, j'_low and j'_high (respecting the constraints i'_low + 1 ≥ i'_high and j'_low + 1 ≥ j'_high, and refining the intervals for x' and y' based on the known values of i'_low and j'_low) and joining the intervals resulting from each case. For each case, we
compute intervals for Ri,j in two ways: once using interval arithmetic, restricting
partial sums to the interval [0, 1], and once by observing that a convex combination
of pixel values will be contained in the union of intervals for the individual values.
We intersect the intervals resulting from both approaches.

refinement of abstract inputs by trace partitioning For large


enough intervals [α, β], the derived bounds often become too imprecise. Thus,
when our analyzer is invoked with these bounds, it may fail to certify the prop-
erty, even though it actually holds. We can make the following simple observation:
if we have n sets X'_1, . . . , X'_n that cover the adversarial region X', i.e. X' ⊆ ⋃_{i=1}^{n} X'_i, then it suffices to certify that the neural network N classifies all input images to class k for each individual input region X'_i for i ∈ {1, . . . , n}. We obtain X'_1, . . . , X'_n by subdividing the interval [α, β] into n equal parts: {Rotate(I, θ) | I ∈ X, θ ∈ [(i − 1)/n · (β − α) + α, i/n · (β − α) + α]} ⊆ X'_i. Note that each X'_i is obtained by running the interval analysis on the rotation code with the given angle interval and the adversarial region X. After obtaining all X'_i's, we run our neural network analyzer separately with each X'_i as input.

batching As interval analysis tends to be imprecise for large input intervals,


we usually need to subdivide the interval [α, β] into many parts to obtain precise
enough output intervals from the interval analysis (a form of trace partitioning
[169]). Running our neural network analysis for each of these can be too expensive.
Instead, we use a separate refinement step to obtain more precise interval bounds
for larger input intervals. We further subdivide each of the n intervals into m parts
each, for a total of n · m intervals in n batches. For each of the n batches, we then
run interval analysis m times, once for each part, and combine the results using
a join, i.e., we compute the smallest common bounding box of all output regions
in a batch. The additional refinement within each batch preserves dependencies

between variables that a plain interval analysis would ignore, and thus yields more
precise boxes X'_1, . . . , X'_n, on which we run the neural network analysis.
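The following is a minimal sketch of this partition-and-join loop (our own illustration): interval_rotate and certify are assumed callables standing in for the interval analysis of Algorithm 5.1 and for the neural network analyzer, and a region is represented as a list of per-pixel (lower, upper) pairs.

    def certify_rotation(X, alpha, beta, n_batches, batch_size, interval_rotate, certify):
        # Split [alpha, beta] into n_batches * batch_size angle intervals, join the
        # interval-analysis results within each batch, and run the network analyzer
        # once per batch.
        width = (beta - alpha) / (n_batches * batch_size)
        for b in range(n_batches):
            boxes = []
            for p in range(batch_size):
                lo = alpha + (b * batch_size + p) * width
                # Interval analysis of the rotation code for angles in [lo, lo + width].
                boxes.append(interval_rotate(X, lo, lo + width))
            # Join: smallest common bounding box of all regions in the batch.
            X_b = [(min(bx[i][0] for bx in boxes), max(bx[i][1] for bx in boxes))
                   for i in range(len(boxes[0]))]
            if not certify(X_b):
                return False   # certification fails for this batch
        return True            # every batch certified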
Using the approach outlined above, we were able to certify, for the first time,
that the neural network is robust to non-trivial rotations of all images inside an
adversarial region. Interval-based regions for geometric transformations were also
derived in the work of [149]. We note that in our follow-up work [15], we obtain
tighter polyhedral regions based on a combination of sampling and Lipschitz op-
timization. We also handle other geometric transformations such as translation,
scaling, shearing as well as any arbitrary composition of these. The robustness of
the network is then analyzed using DeepPoly for the obtained polyhedral region.
Our results using the polyhedral region are more precise than with the interval
region presented here. Further, the approach of [15], when combined with the GPU implementation GPUPoly [152] of DeepPoly, enables the analysis of an 18-layer neural network containing more than 0.5M neurons within 30 minutes.

5.5 experimental evaluation

In this section, we evaluate the effectiveness of our approach for certifying the
robustness of a large, challenging, and diverse set of neural networks for ad-
versarial regions generated by both changes in pixel intensity as well as im-
age rotations. We implemented our method in the ERAN analyzer [3]. ERAN
is written in Python and the abstract transformers of DeepPoly domain are im-
plemented on top of the ELINA library [1, 190] for numerical abstractions. We
have implemented both a sequential and a parallel version of our transformers.
All code, networks, datasets, and results used in our evaluation are available at
https://github.com/eth-sri/eran. We compared the precision and performance of
DeepPoly against the three state-of-the-art systems that can scale to larger net-
works:

• AI2 by [78] uses the Zonotope abstract domain [81] implemented in ELINA
for performing abstract interpretation of fully-connected and convolutional
ReLU networks. Their transformers are generic and based on standard nu-
merical domains used for program analysis. Therefore they do not exploit
the structure of ReLU. As a result, AI2 is often slow and imprecise.

• Fast-Lin by [211] performs layerwise linear approximations tailored to exploit


the structure of fully-connected ReLU networks. We note that Fast-Lin is not
sound under floating-point arithmetic and does not support convolutional
networks. Nonetheless, we still compare to it despite the fact it may contain
false negatives [107] (adapting their method to be sound in floating-point
arithmetic is non-trivial).

• DeepZ by [191] provides specialized Zonotope transformers for handling


ReLU, sigmoid, and tanh activations, and supports both fully-connected and

convolutional networks. It is worth mentioning that although Fast-Lin and


DeepZ employ very different techniques for robustness analysis, both can be
shown to have the same precision on fully-connected neural networks with
ReLU activations. On our benchmarks, DeepZ was often faster than Fast-Lin.
Our experimental results indicate that DeepPoly is always more precise and
faster than all three competing tools on our benchmarks. This demonstrates the
suitability of DeepPoly for the task of robustness certification of larger neural net-
works.

5.5.1 Experimental setup

All of our experiments for the feedforward networks were run on a 3.3 GHz 10 core
Intel i9-7900X Skylake CPU with a main memory of 64 GB; our experiments for the
convolutional networks were run on a 2.6 GHz 14 core Intel Xeon CPU E5-2690
with 512 GB of main memory. We next describe our experimental setup including
the datasets, neural networks, and adversarial regions.

evaluation datasets. We used the popular MNIST [124] and CIFAR10 [118]
image datasets for our experiments. MNIST contains grayscale images of size
28 × 28 pixels and CIFAR10 consists of RGB images of size 32 × 32 pixels. For
our evaluation, we chose the first 100 images from the test set of each dataset. For
the task of robustness certification, out of these 100 images, we considered only
those that were correctly classified by the neural network.

neural networks. Table 5.1 shows the MNIST and the CIFAR10 neural net-
work architectures used in our experiments. The architectures considered in our
evaluation contain up to 88K hidden units. We use networks trained with adversar-
ial training, i.e., defended against adversarial attacks, as well as undefended net-
works. We used DiffAI by [144] and projected gradient descent (PGD) from [64] for
adversarial training. In our evaluation, when we consider the certified robustness
of the defended and undefended networks with the same architecture together, we
append the suffix Point to the name of a neural network trained without adversarial
training and the name of the training procedure (either DiffAI or PGD) to the name
of a defended network. In the table, the FFNNSigmoid and FFNNTanh networks use sig-
moid and tanh activations, respectively. All other networks use ReLU activations.
The FFNNSmall and FFNNMed network architectures for both MNIST and CIFAR10
datasets were taken from [78] whereas the FFNNBig architectures were taken from
[211]. The ConvSmall, ConvBig, and ConvSuper architectures were taken from [144].

adversarial regions We consider the following adversarial regions:


1. L∞-norm [40]: This region is parameterized by a constant ε and contains all perturbed images x' where each pixel x'_i has a distance of at most ε from the

Table 5.1: Neural network architectures used in our experiments.


Dataset   Model        Type             #Hidden units   #Hidden layers

MNIST     FFNNSmall    fully-connected    510           6
          FFNNMed      fully-connected  1 610           9
          FFNNBig      fully-connected  3 072           4
          FFNNSigmoid  fully-connected  3 000           6
          FFNNTanh     fully-connected  3 000           6
          ConvSmall    convolutional    3 604           3
          ConvBig      convolutional   48 064           6
          ConvSuper    convolutional   88 544           6

CIFAR10   FFNNSmall    fully-connected    610           6
          FFNNMed      fully-connected  1 810           9
          FFNNBig      fully-connected  6 144           7
          ConvSmall    convolutional    4 852           3
          ConvBig      convolutional   62 464           6

corresponding pixel x_i in the original input x. We use different values of ε in our experiments. In general, we use smaller ε values for the CIFAR10 dataset compared to the MNIST dataset since the CIFAR10 networks are known to be less robust against L∞-norm based adversarial regions with larger ε values [211]. (A small sketch of constructing this region appears after this list.)
2. Rotation: The input image is first perturbed using a perturbation bounded by ε in the L∞-norm. All resulting images are then rotated by Algorithm 5.1 using an arbitrary θ ∈ [α, β]. The region R_{x,ε,[α,β]} contains all images that can be obtained in this way.
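As an illustration of item 1 (referenced there), the componentwise interval constraints of the L∞ region can be built as follows; the optional clipping to [0, 1] is our own assumption, reflecting the valid intensity range of the image datasets used here.

    def linf_region(x, eps, clip=True):
        # x is a flattened image; the result is a list of per-pixel (lower, upper) pairs.
        region = []
        for xi in x:
            lo, hi = xi - eps, xi + eps
            if clip:
                lo, hi = max(0.0, lo), min(1.0, hi)
            region.append((lo, hi))
        return region

    # Example: a 3-pixel "image" perturbed with eps = 0.1.
    region = linf_region([0.05, 0.5, 0.98], 0.1)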

5.5.2 L∞ -Norm Perturbation

We first compare the precision and performance of DeepPoly vs AI2 , Fast-Lin, and
DeepZ for robustness certification against L∞ -norm based adversarial attacks on
the MNIST FFNNSmall network. We note that it is straightforward to parallelize
Fast-Lin, DeepZ, and DeepPoly. However, the abstract transformers in AI2 cannot
be efficiently parallelized. To ensure fairness, we ran all four analyzers in single
threaded mode. Fig. 5.5 compares the percentage of certified adversarial regions
and the average runtime in seconds per ε-value of all four analyzers. We used six different values for ε shown on the x-axis. For all analyzers, the number of certified regions decreases with increasing values of ε. As can be seen, DeepPoly
is the fastest and the most precise analyzer on the FFNNSmall network. DeepZ has

[Plots: (a) certified robustness (%) and (b) average runtime (s) on the MNIST FFNNSmall network, shown for ε ∈ {0.005, 0.010, 0.015, 0.020, 0.025, 0.030} and for AI2, Fast-Lin, DeepZ, and DeepPoly.]

Figure 5.5: Certified robustness and average runtime for L∞ -norm perturbations by Deep-
Poly against AI2 , Fast-Lin, and DeepZ on the MNIST FFNNSmall. DeepZ and
Fast-Lin are equivalent in robustness.

the exact same precision as Fast-Lin but is up to 2.5x faster. AI2 has significantly
worse precision and higher runtime than all other analyzers.
Based on our results in Fig. 5.5, we compare the precision and performance of the
parallelized versions of DeepPoly and DeepZ for all of our remaining experiments.

mnist fully-connected networks Fig. 5.6 considers the MNIST FFNNMed


and FFNNBig networks and compares the percentage of adversarial regions on which
the neural networks are certified to be robust and the average runtime, per ε value,
for both DeepPoly and DeepZ. Both networks were trained without adversarial
training. DeepPoly certifies more than DeepZ on both networks. As an example,
considering ε = 0.01, we notice that DeepPoly certifies 69% of the regions on the
FFNNMed network, whereas DeepZ certifies only 46%. The corresponding numbers
on the FFNNBig network are 79% and 58% respectively. DeepPoly is also significantly
faster than DeepZ and achieves a speedup of up to 4x and 2.5x on the FFNNMed and
FFNNBig networks, respectively.
We compare the average percentage of ReLU inputs that can take both positive
and negative values per ε-value for the MNIST FFNNSmall and FFNNMed neural networks in Fig. 5.7. Since the ReLU transformer in both DeepPoly and DeepZ is inexact for such inputs, it is important to reduce their percentage. For both networks, DeepPoly produces strictly fewer inputs for which the ReLU transformer is inexact than DeepZ.
In Fig. 5.8, we compare the precision of DeepPoly and DeepZ on the MNIST
FFNNSigmoid and FFNNTanh networks. Both networks were trained using PGD-based
adversarial training. On both networks, DeepPoly is strictly more precise than
DeepZ. For the FFNNSigmoid network, there is a sharp decline in the number of

[Plots: certified robustness (%) and average runtime (s) for DeepZ and DeepPoly, ε ∈ {0.005, ..., 0.030}; panels (a)/(b) MNIST FFNNMed, panels (c)/(d) MNIST FFNNBig.]

Figure 5.6: Certified robustness and average runtime for L∞ -norm perturbations by Deep-
Poly and DeepZ on the MNIST FFNNMed and FFNNBig networks.

regions certified by DeepZ starting at ε = 0.02. DeepZ certifies only 23% of the regions when ε = 0.03; in contrast, DeepPoly certifies 80%. Similarly, for the FFNNTanh network, DeepZ only certifies 1% of the regions when ε = 0.015, whereas DeepPoly certifies 94%. We also note that DeepPoly is more than 2x faster than DeepZ on both these networks (we omit the relevant plots here as timings do not change with increasing values of ε): DeepZ has an average runtime of at most 35 seconds on both networks whereas DeepPoly has an average runtime of at most 15 seconds on both.

mnist convolutional networks Fig. 5.9 compares the precision and aver-
age runtime of DeepPoly vs DeepZ on the MNIST ConvSmall networks. We consider
three types of ConvSmall networks based on their training method: (a) undefended
(Point), (b) defended with PGD (PGD), and (c) defended with DiffAI (DiffAI). Note
that our convolutional networks are more robust than the fully-connected networks

[Plots: average percentage of ReLU inputs that can take both signs for DeepZ and DeepPoly, ε ∈ {0.005, ..., 0.030}; panels (a) MNIST FFNNSmall, (b) MNIST FFNNMed.]

Figure 5.7: Average percentage of ReLU inputs that can take both positive and negative val-
ues for DeepPoly and DeepZ on the MNIST FFNNSmall and FFNNMed networks.

[Plots: certified robustness (%) for DeepZ and DeepPoly, ε ∈ {0.005, ..., 0.030}; panels (a) MNIST FFNNSigmoid, (b) MNIST FFNNTanh.]

Figure 5.8: Certified robustness for L∞-norm perturbations by DeepPoly and DeepZ on the MNIST FFNNSigmoid and FFNNTanh networks.

and thus the values of ε considered in our experiments are higher than those for fully-connected networks.
As expected, both DeepPoly and DeepZ certify more regions on the defended
neural networks than on the undefended one. This is because the adversarially
trained networks produce fewer inputs, where the ReLU transformer loses signif-
icant precision. We notice that ConvSmall trained with DiffAI is the most provably
robust network. Overall, DeepPoly certifies more regions than DeepZ on all neu-
ral networks for all  values. The precision gap between DeepPoly and DeepZ
increases with increasing . For the largest  = 0.12, the percentage of regions cer-
tified by DeepZ on the Point, PGD, and DiffAI networks are 7%, 38%, and 53%

[Plots: (a) certified robustness (%) and (b) average runtime (s) for DeepZ and DeepPoly on the Point, PGD, and DiffAI variants of the MNIST ConvSmall network, ε ∈ {0.02, ..., 0.12}.]

Figure 5.9: Certified robustness and average runtime for L∞ -norm perturbations by Deep-
Poly and DeepZ on the MNIST ConvSmall networks.

Table 5.2: Certified robustness by DeepZ and DeepPoly on the large convolutional net-
works trained with DiffAI.
Dataset   Model      ε       % Certified robustness      Average runtime
                             DeepZ      DeepPoly         DeepZ     DeepPoly

MNIST     ConvBig    0.1     97         99               5         8
          ConvBig    0.2     79         88               7         8
          ConvBig    0.3     37         77               17        8
          ConvSuper  0.1     97         98               133       39

CIFAR10   ConvBig    0.006   50         52               39        23
          ConvBig    0.008   33         40               46        23

respectively, whereas DeepPoly certifies 17%, 67%, and 81% of the regions, respectively. The runtime of DeepZ increases with ε while that of DeepPoly is not affected significantly. DeepPoly runs the fastest on the DiffAI network and is faster than DeepZ for all ε values. DeepPoly is slower than DeepZ on the PGD and Point networks for smaller ε values but faster on the largest ε = 0.12.
Table 5.2 shows our experimental results on the larger MNIST convolutional
networks trained using DiffAI. For the ConvBig network, DeepPoly certifies significantly more regions than DeepZ for ε = 0.2 and 0.3. In particular, the percentage certified for ε = 0.3 with DeepPoly and DeepZ is 77% and 37%, respectively. For the ConvSuper network, DeepPoly certifies one more region than DeepZ for ε = 0.1. In terms of the runtime, DeepZ runs slower with increasing values of ε while DeepPoly is unaffected. DeepZ is slightly faster than DeepPoly on the ConvBig network

[Plots: certified robustness (%) and average runtime (s) for DeepZ and DeepPoly, ε ∈ {0.0002, ..., 0.0012}; panels (a)/(b) CIFAR10 FFNNSmall, (c)/(d) CIFAR10 FFNNMed, (e)/(f) CIFAR10 FFNNBig.]

Figure 5.10: Certified robustness and average runtime for L∞ -norm perturbations by Deep-
Poly and DeepZ on the CIFAR10 fully-connected networks.

for ε = 0.1 and 0.2 but is 2x slower for ε = 0.3. On the ConvSuper network, DeepPoly
is 3.4x faster than DeepZ.

cifar10 fully-connected networks Fig. 5.10 compares DeepPoly against


DeepZ on the CIFAR10 fully-connected networks. As with the MNIST fully-
connected networks, DeepPoly certifies more regions than DeepZ and is faster on
all the considered networks. Considering ε = 0.001, DeepPoly certifies 65%, 53%,
and 84% of the regions on the FFNNSmall, FFNNMed, and FFNNBig networks respec-
tively whereas DeepZ certifies 42%, 33%, and 64% of the regions. Notice that the
average runtime of both DeepPoly and DeepZ on the CIFAR10 FFNNMed is higher
than on the MNIST FFNNMed network even though the number of hidden units is
the same. The slowdown on the CIFAR10 networks is due to the higher number of
input pixels. DeepPoly is up to 7x, 5x, and 4.5x faster than DeepZ on the FFNNSmall,
FFNNMed, and FFNNBig networks, respectively.

[Plots: (a) certified robustness (%) and (b) average runtime (s) for DeepZ and DeepPoly on the Point, PGD, and DiffAI variants of the CIFAR10 ConvSmall network, ε ∈ {0.002, ..., 0.012}.]

Figure 5.11: Certified robustness and average runtime for L∞ -norm perturbations by Deep-
Poly and DeepZ on the CIFAR10 ConvSmall networks.

cifar10 convolutional networks Fig. 5.11 evaluates DeepPoly and


DeepZ on the CIFAR10 ConvSmall networks. We again consider undefended (Point)
networks and networks defended with PGD and DiffAI as was the case for the cor-
responding MNIST network. We again notice that the ConvSmall network trained
with DiffAI is the most provably robust network. DeepPoly certifies more regions
than DeepZ for all values of ε on the PGD and Point networks. DeepPoly is less precise than DeepZ on the DiffAI network for ε = 0.008 but certifies more for the largest ε = 0.012. In terms of runtime, DeepPoly is faster than DeepZ on all the con-

#Batches   Batch size   Region(s) (l, ½(l + u), u)   Analysis time    Certified?
1          1            [sample images omitted]      0.5s + 1.9s      No
1          10000        [sample images omitted]      22.2s + 1.8s     No
220        1            [sample images omitted]      1.2s + 5m51s     No
220        300          [sample images omitted]      2m29s + 5m30s    Yes

Figure 5.12: Results for robustness against rotations with the MNIST FFNNSmall network.
Each row shows a different attempt to prove that the given image of the digit
3 can be perturbed within an L∞ ball of radius ε = 0.001 and rotated by an arbitrary angle θ between −45 and 65 degrees without changing its classification.
For the last two attempts, we show 4 representative combined regions (out of
220, one per batch). The running time is split into two components: (i) the time
used for interval analysis on the rotation algorithm and (ii), the time used to
prove the neural network robust with all of the computed bounding boxes
using DeepPoly.

sidered ConvSmall networks for all ε values. As was the case on the corresponding
MNIST networks, DeepPoly runs fastest on the DiffAI network.
The last two rows in Table 5.2 compare the precision and performance of Deep-
Poly and DeepZ on the CIFAR10 ConvBig convolutional network trained with Dif-
fAI. It can be seen that DeepPoly certifies more regions than DeepZ for both ε = 0.006 and ε = 0.008 and is also up to 2x faster.

5.5.3 Rotation perturbation

As described in Section 5.4, we can apply refinement to the input so to prove a


neural network robust against rotations of a certain input image. Specifically, our
analysis can prove that the MNIST FFNNSmall network classifies a given image of
the digit 3 correctly, even if each pixel is first L∞ -perturbed with  6 0.001 and
then rotated using an arbitrary angle θ between −45 and 65 degrees. Fig. 5.12

shows example regions and analysis times for several choices of parameters to the
refinement approach. For example, #Batches = 220, Batch Size = 300 means that
we split the interval [α, β] into n = 220 batches. To analyze a batch, we split the cor-
responding interval into m = 300 input intervals for interval analysis, resulting in
300 regions for each batch. We then run DeepPoly on the smallest common bound-
ing boxes of all regions in each batch, 220 times in total. Fig. 5.12 shows a few such
bounding boxes in the Regions column. Note that it is not sufficient for certifica-
tion to compute a single region that captures all rotated images. Fig. 5.12 shows two
such attempts: one where we did not use batching (therefore, our interval analysis
approach was applied to the rotation algorithm using an abstract θ covering the
entire range), and one where we used a batch size of 10, 000 to compute the bound-
ing box of the perturbations rather precisely. However, those perturbations cannot
be captured well using interval constraints, therefore the bounding box contains
many spurious inputs and the certification fails.
We then considered two certification attempts with 220 batches, with each batch
covering a range of θ of length 0.5 degrees. It was not sufficient to use a batch size
of 1, as some input intervals become large. Using a batch size of 300, the neural
network can be proved robust for this perturbation.

5.6 discussion

We introduced a new method for certifying deep neural networks which balances
analysis precision and scalability. The core idea is an abstract domain based on
combining floating-point polyhedra and intervals equipped with abstract trans-
formers specifically designed for common neural network functions such as affine
transforms, ReLU, sigmoid, tanh, and maxpool. These abstract transformers enable
us to soundly handle both fully-connected and convolutional networks.
We implemented our method in the ERAN analyzer, and evaluated it extensively
on a wide range of networks of different sizes including defended and undefended
networks. Our experimental results demonstrate that DeepPoly is more precise and
faster than prior work. We also showed how to use DeepPoly to prove, for the first
time, the robustness of a neural network when the input image is perturbed by
complex transformations such as rotations employing linear interpolation.
In our follow-up work [152], we have extended the DeepPoly domain for han-
dling residual networks as well as designed efficient algorithms for adapting Deep-
Poly on GPUs. The resulting implementation GPUPoly enabled precise and fast
certification of large networks containing up to 1M neurons within a minute. In
[15], we extended our robustness certification against rotation to cover more geo-
metric transformations such as translation, scaling, and shearing as well as their
arbitrary composition. Combined with GPUPoly, we can verify neural networks
with up to 0.5M neurons against rotations within 30 minutes. All above results are

beyond the reach of any other existing certification method [8, 32, 36, 37, 67, 68, 69,
78, 113, 114, 135, 149, 163, 170, 175, 186, 197, 199, 206, 211, 212].
More recently, we designed new DeepPoly transformers for handling the specific
non-linearities in the RNN architectures and audio preprocessing pipeline in [172].
This enables the certification of audio classifiers against intensity perturbations in
the audio signal for the first time. We believe that the DeepPoly domain can similarly
be extended for handling other neural network architectures such as transformers
[179], domains such as natural language processing [108], and specifications such
as robustness against patches [217].
Overall, we believe this work is a promising step towards more effective reason-
ing about deep neural networks and a useful building block for proving interesting
specifications as well as other applications of analysis (for example, training more
robust networks).
6
combining abstractions with solvers

The DeepPoly domain presented in Chapter 5 provides scalable approximations of


the non-linearities employed in neural networks. This enables the analysis of larger
networks than possible with exact methods based on SMT solving [37, 69, 113, 114],
mixed-integer linear programming (MILP) [8, 32, 36, 49, 66, 135, 197], and Lips-
chitz optimization [170]. For deeper networks, the error from each approximation
accumulates, exponentially with each layer in many practical cases, causing impre-
cision. In this chapter, we present a new approach for recovering the lost precision
by combining abstract interpretation with precise solvers for ReLU based neural
networks. Besides improving the precision of the approximations, the combination
also improves the scalability of the solvers.

this work: boosting complete and incomplete certifiers Our first


key idea is combining state-of-the-art overapproximation techniques used by in-
complete methods together with MILP solvers. We refine the intermediate results
computed via incomplete methods by calling the MILP solver. We provide results
from the overapproximations to the MILP solver which improves its speed. This is
because the MILP solvers must consider two branches for every ReLU input that can
take both positive and negative values. The results from overapproximation can
eliminate some of these branches while also reducing the search space for others.
The above combination works well for refining the results in the first few layers,
however, the MILP solver still does not scale to deeper layers as the number of branches grows combinatorially.
The scalability issue can be addressed by employing convex relaxations of ReLU
for refinement that are tighter than those employed by incomplete methods, for
example, the DeepPoly domain, but more scalable than MILP solvers. A natural
candidate is the most precise convex relaxation of ReLU output based on the con-
vex hull of Polyhedra [57]. However, its computation is practically infeasible as it
requires an exponential number of convex hull computations, each with a worst-
case exponential complexity in the number of neurons. The most common convex
relaxation of y1 :=ReLU(x1 ) used in practice [175, 197] is the triangle relaxation from
[69] shown in Fig. 5.4 (a). We note that other works such as [31, 186, 211, 212, 221]
and the DeepPoly ReLU transformers shown in Fig. 5.4 (b) and (c) approximate
this relaxation. The triangle relaxation creates constraints only between y1 and x1

[Figure: left, the blue input region for (x1, x2), a diamond with vertices (±2, 0) and (0, ±2); right, the shapes of the relaxations for (y1, y2) plotted as a function of z = x1 + x2 (in red).]

Figure 6.1: The input space for the ReLU assignments y1 := ReLU(x1 ), y2 := ReLU(x2 ) is
shown on the left in blue. Shapes of the relaxations projected to 3D are shown
on the right in red.

and is optimal in the x1 y1 -plane. Because of this optimality, recent work [175] refers
to the triangle relaxation as the convex barrier, meaning the best convex approxi-
mation one can obtain when processing each ReLU separately. In our experiments,
using this relaxation does not yield significant precision gains. Our main insight is
that the triangle relaxation is not optimal when one considers multiple neurons at
a time as it ignores all dependencies between x1 and any other neuron x2 in the
same layer, and thus loses precision.
Our second key idea is proposing more precise but scalable convex relaxations
than possible with prior work. We introduce a novel parameterized framework,
called k-ReLU, for generating convex approximations that consider multiple ReLUs
jointly. Here, the parameter k determines how many ReLUs are considered jointly
with large k resulting in more precise output. For example, unlike prior work, our
framework can generate a convex relaxation for y1 :=ReLU(x1 ) and y2 :=ReLU(x2 ) that
is optimal in the x1 x2 y1 y2 -space. Next, we illustrate this point with an example.

precision gain with k-relu on an example Consider the input space of


x1 x2 as defined by the blue area in Fig. 6.1 and the ReLU operations y1 :=ReLU(x1 )
and y2 :=ReLU(x2 ). The input space is bounded by the relational constraints x2 − x1 6
2, x1 − x2 6 2, x1 + x2 6 2 and −x1 − x2 6 2. The relaxations produced are in a four
dimensional space of x1 x2 y1 y2 . For simplicity of presentation, we show the feasible
shape of y1 y2 as a function of z = x1 + x2 .
The triangle relaxation from [69] is in fact a special case of our framework with
k = 1, that is, 1-ReLU. 1-ReLU independently computes two relaxations - one in
the x1 y1 space and the other in the x2 y2 space. The final relaxation is the carte-
sian product of the feasible sets of the two individually computed relaxations
and is oblivious to any correlations between x1 and x2 . The relaxation adds tri-
angle constraints {y1 > 0, y1 > x1 , y1 6 0.5 · x1 + 1} between x1 and y1 as well as
{y2 > 0, y2 > x2 , y2 6 0.5 · x2 + 1} between x2 and y2 .

Table 6.1: Volume of the output bounding box from kPoly on the MNIST FFNNMed network.

k         1-ReLU           2-ReLU          3-ReLU

Volume    4.5272 · 10^14   5.1252 · 10^7   2.9679 · 10^5

In contrast, 2-ReLU considers the two ReLUs jointly and captures the relational constraints between x1 and x2. 2-ReLU computes the following relaxation:

{y1 ≥ 0, y1 ≥ x1, y2 ≥ 0, y2 ≥ x2, 2 · y1 + 2 · y2 − x1 − x2 ≤ 2}

The result is shown in Fig. 6.1 (c). In this case, unlike with the triangle relaxation, the shape of y1y2 depends on x1 + x2. At the same time, it is more precise than Fig. 6.1 (b) for all values of z. We note that the work of [163] computes semidefinite relaxations that consider multiple ReLUs jointly; however, these are not
optimal and do not scale to the large networks used in our experiments.
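As a quick, informal sanity check (ours, not part of the framework), note that 2 · max(0, x1) + 2 · max(0, x2) − x1 − x2 = |x1| + |x2|, which is at most 2 everywhere on the blue input region; random sampling confirms that the joint constraint is never violated.

    import random

    def two_relu_constraint_violated(samples=100000):
        for _ in range(samples):
            x1, x2 = random.uniform(-2, 2), random.uniform(-2, 2)
            if abs(x1 + x2) > 2 or abs(x1 - x2) > 2:
                continue                      # outside the input region
            y1, y2 = max(0.0, x1), max(0.0, x2)
            if 2 * y1 + 2 * y2 - x1 - x2 > 2 + 1e-9:
                return True
        return False

    print(two_relu_constraint_violated())     # expected: False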
The work in this chapter was published in [185, 187].

main contributions Our main contributions are:

• A refinement-based approach for certifying neural network robustness that


combines the strengths of fast overapproximation methods with MILP solvers
and convex relaxations.

• A novel framework, called k-ReLU, that computes optimal convex relaxations


for the output of k ReLU operations jointly. k-ReLU is generic and can be com-
bined with existing certifiers for improved precision while maintaining scala-
bility. Further, k-ReLU is also adaptive and can be tuned to balance precision
and scalability by varying k.

• A method for computing approximations of the optimal relaxations for larger


k, which is more precise than simply using l < k.

• An instantiation of k-ReLU with the DeepPoly domain [188] resulting in a


certifier called kPoly.

• An evaluation, showing that kPoly is more precise than existing state-of-


the-art incomplete certifiers [187, 188] on larger networks with up to 100K
neurons against challenging adversarial perturbations (e.g., L∞ balls with ε = 0.3) and faster (while being complete) than state-of-the-art complete cer-
tifiers [197, 206] on smaller networks.

precision gain in practice Table 6.1 quantitatively compares the precision


of kPoly instantiated with three relaxations: k = 1, k = 2, and k = 3. We measure
the volume of the output bounding box computed after propagating an L∞ -norm
based region with ε = 0.015 through the 9-layer-deep MNIST FFNNMed network of Ta-
ble 5.1. We observe that the output volume from 3-ReLU and 2-ReLU is respectively
9 and 7 orders of magnitude smaller than from 1-ReLU. We note that the networks
we consider, as for example the FFNNMed network above, are especially challeng-
ing for state-of-the-art certifiers, as these methods either lose precision unnecessarily
[31, 175, 186, 187, 188, 199, 211, 221] or simply do not scale [37, 67, 68, 163, 197, 206].

6.1 overview

We now show, on a simple example, how our certifier kPoly, which combines the k-ReLU concept with refinement, improves the results of state-of-the-art certifiers. In particular, we illustrate how the output of kPoly instantiated with 1-ReLU is refined by instantiating it with 2-ReLU. This is possible as the 2-ReLU relaxation
can capture extra relationships between neurons that 1-ReLU inherently cannot.
Consider the simple fully-connected neural network with ReLU activations
shown in Fig. 6.2. The network has two inputs each taking values independently
in the range [−1, 1], one hidden layer and one output layer each containing two
neurons. For simplicity, we split each layer into two parts: one for the affine trans-
formation and the other for the ReLU (as in Fig. 5.3). The weights of the affine
transformation are shown on the arrows and the biases are above or below the
respective neuron. The goal is to certify that x9 6 4 holds for the output x9 with
respect to all inputs.
We first show that 1-ReLU instantiated with the state-of-the-art DeepPoly [188]
abstract domain fails to certify the property. We refer the reader to Chapter 5 for
more details on the DeepPoly abstract domain. The bounds computed by our certi-
fier using this instantiation are shown as annotations in Fig. 6.2, in the same format
as in Fig. 5.3. We next show how our analysis proceeds layer-by-layer.

first layer The certifier starts by computing the bounds for x1 and x2 which
are simply taken from the input specification resulting in:

x1 > −1, x1 6 1, l1 = −1, u1 = 1,


x2 > −1, x2 6 1, l2 = −1, u2 = 1.

second layer Next, the affine assignments x3 := x1 + x2 and x4 := x1 − x2 are


handled. DeepPoly handles affine transformations exactly and thus no precision is
lost. The affine transformation results in the following bounds for x3 and x4 :

x3 > x1 + x2 , x3 6 x1 + x2 , l3 = −2, u3 = 2,
x4 > x1 − x2 , x4 6 x1 − x2 , l4 = −2, u4 = 2.

[Figure: the example network and its analysis. Inputs x1, x2 ∈ [−1, 1]; affine layer x3 := x1 + x2, x4 := x1 − x2; ReLU layer x5 := max(0, x3), x6 := max(0, x4); affine layer x7 := x5 + 2 · x6, x8 := x6 + 1.5; output ReLU layer x9 := max(0, x7), x10 := max(0, x8). Each neuron is annotated with its DeepPoly relational constraints and interval bounds. The 2-ReLU annotations (in green) add the input constraints x3 + x4 ≤ 2, x3 − x4 ≤ 2, x4 − x3 ≤ 2, −x3 − x4 ≤ 2 and the joint output constraint 2 · x5 + 2 · x6 − x3 − x4 ≤ 2, yielding x7 ≤ 4 and hence x9 ≤ 4.]
Figure 6.2: Certification of the property x9 ≤ 4. Refining DeepPoly with 1-ReLU fails to prove
the property whereas 2-ReLU adds extra constraints (in green) that help in
verifying the property.

DeepPoly can precisely handle ReLU assignments when the input neuron takes
only positive or negative values; otherwise, it loses precision. Since x3 and x4 can
take both positive and negative values, the approximation from Fig. 5.4 (b) is ap-
plied which for x5 yields:
x5 > 0, x5 6 1 + 0.5 · x3 , l5 = 0, u5 = 2. (6.1)

The lower and upper bounds are set to l5 = 0 and u5 = 2 respectively. Analo-
gously, for x6 we obtain:
x6 > 0, x6 6 1 + 0.5 · x4 , l6 = 0, u6 = 2. (6.2)

third layer Next, the affine assignments x7 := x5 + 2x6 and x8 := x6 + 1.5 are
handled. DeepPoly adds the constraints:
x7 > x5 + 2 · x6 , x7 6 x5 + 2 · x6 ,
(6.3)
x8 > x6 + 1.5, x8 6 x6 + 1.5.
To compute the upper and lower bounds for x7 and x8 , DeepPoly uses back-
substitution as described in Section 5.1. Doing so yields l7 = 0, u7 = 5 and
l8 = 1.5, u8 = 3.5.

refinement with 1-relu fails Because DeepPoly discards one of the lower
bounds from the triangle relaxations for the ReLU assignments in the previous
layer, it is possible to refine lower and upper bounds for x7 and x8 by encoding
the network up to the final affine transformation using the relatively tighter ReLU
relaxations based on the triangle formulation and then computing bounds for x7
and x8 with respect to this formulation via an LP solver. However, this does not
improve bounds and still yields l7 = 0, u7 = 5, l8 = 1.5, u8 = 3.5.
As the lower bounds for both x7 and x8 are non-negative, the DeepPoly ReLU
approximation simply propagates x7 and x8 to the output layer. Therefore the final
output is:
x9 > x7 , x9 6 x7 , l9 = 0, u9 = 5,
x10 > x8 , x10 6 x8 , l10 = 1.5, u10 = 3.5.
Because the upper bound is u9 = 5, the certifier fails to prove the property that
x9 6 4 holds.

refinement with 2-relu certifies the property Now we consider re-


finement with our 2-ReLU relaxation which considers the two ReLU assignments
x5 := ReLU(x3 ) and x6 := ReLU(x4 ) jointly. Besides the box constraints for x3 and x4 ,
it also considers the constraints x3 + x4 ≤ 2, x3 − x4 ≤ 2, −x3 − x4 ≤ 2, x4 − x3 ≤ 2
for computing the output of ReLU. The ReLU output contains the extra constraint
2 · x5 + 2 · x6 − x3 − x4 6 2 that 1-ReLU cannot capture. We again encode the network
up to the final affine transformation with the tighter ReLU relaxations obtained us-
ing 2-ReLU and refine the bounds for x7 , x8 via an LP solver. Now, we obtain better
upper bounds as u7 = 4. The better bound for u7 is then propagated to u9 and is
sufficient for proving the desired property.
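The refinement step above can be reproduced with a small LP. The sketch below (ours, using scipy.optimize.linprog; the variable ordering x1, ..., x6 and the encoding are our choices) maximizes x7 = x5 + 2 · x6 once over the triangle (1-ReLU) relaxation and once with the additional 2-ReLU constraint, recovering u7 = 5 and u7 = 4, respectively.

    from scipy.optimize import linprog

    c = [0, 0, 0, 0, -1, -2]                      # minimize -(x5 + 2*x6)
    A_eq = [[-1, -1, 1, 0, 0, 0],                 # x3 = x1 + x2
            [-1,  1, 0, 1, 0, 0]]                 # x4 = x1 - x2
    b_eq = [0, 0]
    triangle = [[0, 0,  1,    0, -1,  0],         # x5 >= x3
                [0, 0, -0.5,  0,  1,  0],         # x5 <= 1 + 0.5*x3
                [0, 0,  0,    1,  0, -1],         # x6 >= x4
                [0, 0,  0, -0.5,  0,  1]]         # x6 <= 1 + 0.5*x4
    b_tri = [0, 1, 0, 1]
    bounds = [(-1, 1), (-1, 1), (-2, 2), (-2, 2), (0, 2), (0, 2)]

    res1 = linprog(c, A_ub=triangle, b_ub=b_tri, A_eq=A_eq, b_eq=b_eq, bounds=bounds)
    res2 = linprog(c, A_ub=triangle + [[0, 0, -1, -1, 2, 2]], b_ub=b_tri + [2],
                   A_eq=A_eq, b_eq=b_eq, bounds=bounds)
    print(-res1.fun, -res2.fun)                   # u7 = 5.0 (1-ReLU) vs u7 = 4.0 (2-ReLU)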
We remark that while in this work we instantiate the k-ReLU concept with the
DeepPoly relaxation, the idea can be applied to other relaxations [67, 78, 163, 175,
188, 191, 206, 211, 212, 221].
Alternatively, one can also refine the bounds for the neurons x7 and x8 by replac-
ing 2-ReLU with the MILP encoding of ReLU from [197] and also adding the extra
constraints from the DeepPoly analysis to speed it up. Doing so also certifies the
property. However, the MILP encoding is less scalable than our k-ReLU framework
and is feasible only for the first few layers. In our experiments in Section 6.5, we
use the MILP encoding for refining up to the second layer and k-ReLU for the
remaining layers of our deep networks.

6.2 refinement with solvers

We now describe our refinement approach in more formal terms. As in Section 6.1,
we will consider affine transformations and ReLU activations as separate layers.
The key idea will be to combine abstract interpretation [55] with exact MILP or pre-
cise convex relaxation based formulations of the network, which are then solved, in

order to compute more precise results for neuron bounds. We begin by describing
the core components of abstract interpretation that our approach requires.
Our approach requires an abstract domain An over n variables (i.e., some set
whose elements can be encoded symbolically) such as Interval, Zonotope, Deep-
Poly, or Polyhedra. The abstract domain has a bottom element ⊥ ∈ An as well as
the following components:
• A (potentially non-computable) concretization function γn : An → P(Rn ) that
associates with each abstract element a ∈ An the set of concrete points from
Rn that it abstracts. We have γn (⊥) = ∅.

• An abstraction function α_n : B_n → A_n, where X ⊆ γ_n(α_n(X)) for all X ∈ B_n. We assume that α_n(∏_i[l_i, u_i]) is a computable function of l, u ∈ R^n. Here, B_n = {∏_i[l_i, u_i] | l, u ∈ R^n} is the set of all boxes, where ∏_i[l_i, u_i] = {x ∈ R^n | l_i ≤ x_i ≤ u_i}. (For many abstract domains, α_n can be defined on a larger domain than B_n, but in this work, we only consider Interval input regions.)
• A bounding box function ι_n : A_n → R^n × R^n, where γ_n(a) ⊆ ∏_i[l_i, u_i] for (l, u) = ι_n(a), for all a ∈ A_n.

• A meet operation a ⊓ L for each a ∈ A_n and linear constraints L over n real variables, where {x ∈ γ_n(a) | L(x)} ⊆ γ_n(a ⊓ L).

• An affine abstract transformer T^#_{x↦Ax+b} : A_m → A_n for each transformation of the form (x ↦ Ax + b) : R^m → R^n, where

    {Ax + b | x ∈ γ_m(a)} ⊆ γ_n(T^#_{x↦Ax+b}(a))

  for all a ∈ A_m.
• A ReLU abstract transformer T^#_{ReLU|∏_i[l_i,u_i]} : A_n → A_n, where

    {ReLU(x) | x ∈ γ_n(a) ∩ ∏_i[l_i, u_i]} ⊆ γ_n(T^#_{ReLU|∏_i[l_i,u_i]}(a))

  for all abstract elements a ∈ A_n and for all lower and upper bounds l, u ∈ R^n on input activations of the ReLU operation.

certification via abstract interpretation As first shown by [78], any


such abstract domain induces a method for robustness certification of neural net-
works with ReLU activations.
For example, assume that we want to certify that a given neural network
f : R^m → R^n considers class i more likely than class j for all inputs x̄ with ||x̄ − x||_∞ ≤ ε for a given x and ε. We can first use the abstraction function α_m to compute a symbolic overapproximation of the set of possible inputs x̄, namely

    a_in = α_m({x̄ ∈ R^m | ||x̄ − x||_∞ ≤ ε}).



Given that the neural network can be written as a composition of affine func-
tions and ReLU layers, we can then propagate the abstract element ain through the
corresponding abstract transformers to obtain a symbolic overapproximation aout
of the concrete outputs of the neural network.
For example, if the neural network f(x) = A' · ReLU(Ax + b) + b' has a single hidden layer with h hidden neurons, we first compute a' = T^#_{x↦Ax+b}(a_in), which is a symbolic overapproximation of the inputs to the ReLU activation function. We then compute (l, u) = ι_h(a') to obtain opposite corners of a bounding box of all possible ReLU input activations, such that we can apply the ReLU abstract transformer:

    a'' = T^#_{ReLU|∏_i[l_i,u_i]}(a').

Finally, we apply the affine abstract transformer again to obtain a_out = T^#_{x↦A'x+b'}(a''). Using our assumptions, we can conclude that the set γ_n(a_out) contains all output activations that f can possibly produce when given any of the inputs x̄. Therefore, if a_out ⊓ (x_i ≤ x_j) = ⊥, we have proved the property: for all x̄,
the neural network considers class i more likely than class j.
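A minimal sketch of this recipe, instantiated with the simplest abstract domain (Interval, i.e., boxes), is given below; the function names, the one-hidden-layer setting, and the plain list-of-lists matrices are our own choices for illustration.

    def T_affine(A, b, a):
        # Interval abstract transformer for x -> Ax + b; a is a list of (l, u) pairs.
        out = []
        for row, bi in zip(A, b):
            lo = bi + sum(w * (l if w >= 0 else u) for w, (l, u) in zip(row, a))
            hi = bi + sum(w * (u if w >= 0 else l) for w, (l, u) in zip(row, a))
            out.append((lo, hi))
        return out

    def T_relu(a):
        # Interval abstract transformer for the ReLU layer.
        return [(max(0.0, l), max(0.0, u)) for (l, u) in a]

    def certify_one_hidden_layer(A1, b1, A2, b2, x, eps, i, j):
        a_in = [(xi - eps, xi + eps) for xi in x]       # abstraction of the L-inf ball
        a_out = T_affine(A2, b2, T_relu(T_affine(A1, b1, a_in)))
        # Class i is provably more likely than class j if even in the worst case the
        # lower bound of output i exceeds the upper bound of output j.
        return a_out[i][0] > a_out[j][1]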

incompleteness While this approach is sound (i.e., whenever we prove the


property, it actually holds), it is incomplete (i.e., we might not prove the property,
even if it holds), because the abstract transformers produce a superset of the set of
concrete outputs that the corresponding concrete executions produce. This can be
quite imprecise for deep neural networks, because the overapproximations intro-
duced in each layer accumulate.

refining the bounds To combat spurious overapproximation, we use mixed


integer linear programming (MILP) to compute refined lower and upper bounds l', u' after applying each affine abstract transformer (except for the first layer). We then refine the abstract element using the meet operator of the underlying abstract domain and the linear constraints l'_j ≤ x_j ≤ u'_j for all input activations j, i.e., we replace the current abstract element a by a' = a ⊓ (⋀_j l'_j ≤ x_j ≤ u'_j), and continue the analysis with the refined abstract element.


Importantly, we obtain a more refined abstract transformer for ReLU than the
one used in DeepPoly by leveraging the new lower and upper bounds. That is,
using the tighter bounds l'_j, u'_j for x_j, we define the ReLU transformer, using the notation from Chapter 5, for x_i := max(0, x_j) as follows:

    ⟨a_i^≤(x), a_i^≥(x), l_i, u_i⟩ = ⎧ ⟨x_j, x_j, l'_j, u'_j⟩,                                          if l'_j ≥ 0,
                                     ⎨ ⟨0, 0, 0, 0⟩,                                                    if u'_j ≤ 0,
                                     ⎩ ⟨λ · x_j, u'_j · (x_j − l'_j)/(u'_j − l'_j), λ · l'_j, u'_j⟩,     otherwise,


Figure 6.3: DeepPoly relaxations for x_i := ReLU(x_j) using the original bounds l_j, u_j (in blue) and the refined bounds l'_j, u'_j (in green) for x_j. The refined relaxations have smaller area in the x_i x_j-plane.

where λ ∈ {0, 1}. The refined ReLU transformer benefits from the improved bounds. For example, when l_j < 0 and u_j > 0 hold for the original bounds, then after refinement:

• If l'_j ≥ 0, then the relational constraints are the same; however, the interval bounds are more precise.

• Else if u'_j ≤ 0, then the output is exact.

• Otherwise, as shown in Fig. 6.3, the approximation with the tighter l'_j and u'_j has smaller area (in green) in the input-output plane than the original transformer that uses the imprecise l_j and u_j (in blue).
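The case split above can be rendered as a small sketch (ours); relational bounds are represented as (coefficient, constant) pairs over x_j, and λ is picked by the area-minimizing heuristic of Chapter 5.

    def refined_relu_transformer(l_ref, u_ref):
        # Returns (lower expr, upper expr, l_i, u_i) for x_i := max(0, x_j), given the
        # refined bounds l_ref <= x_j <= u_ref. An expression (c, d) stands for c*x_j + d.
        if l_ref >= 0:
            return (1.0, 0.0), (1.0, 0.0), l_ref, u_ref          # identity
        if u_ref <= 0:
            return (0.0, 0.0), (0.0, 0.0), 0.0, 0.0              # constant zero
        lam = 1.0 if u_ref > -l_ref else 0.0                     # area heuristic
        slope = u_ref / (u_ref - l_ref)
        return (lam, 0.0), (slope, -slope * l_ref), lam * l_ref, u_ref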

obtaining constraints for refinement To enable refinement with MILP,


we need to obtain constraints which fully capture the behavior of the neural network up to the last layer whose abstract transformer has been executed. In our encoding, we have one variable for each neuron and we write x_i^(k) to denote the variable corresponding to the activation of the i-th neuron in the k-th layer, where the input layer has k = 0. Similarly, we write l_i^(k) and u_i^(k) to denote the best derived lower and upper bounds for this neuron.
From the input layer, we obtain constraints of the form l_i^(0) ≤ x_i^(0) ≤ u_i^(0); from affine layers, we obtain constraints of the form x_i^(k) = ∑_j a_ij^(k−1) · x_j^(k−1) + b_i^(k−1); and from ReLU layers, we obtain constraints of the form x_i^(k) = max(0, x_i^(k−1)).

milp Let ϕ(k) denote the conjunction of all constraints up to and including those
from layer k. To obtain the best possible lower and upper bounds for layer k with
p neurons, we need to solve the following 2 · p optimization problems:
    l_i'^(k) = min_{x_1^(0), ..., x_p^(k)} x_i^(k)   s.t. ϕ^(k)(x_1^(0), ..., x_p^(k)),   for i = 1, . . . , p,

    u_i'^(k) = max_{x_1^(0), ..., x_p^(k)} x_i^(k)   s.t. ϕ^(k)(x_1^(0), ..., x_p^(k)),   for i = 1, . . . , p.

As was shown by [8, 197], such optimization problems can be encoded exactly as MILP instances using the bounds computed by abstract interpretation, and the instances can then be solved using off-the-shelf MILP solvers [92] to compute l_i'^(k) and u_i'^(k).
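As an illustration, the following sketch (ours) encodes a single unstable ReLU with the big-M constraints of [8, 197] and queries a refined bound; we assume Gurobi's Python API (gurobipy), and in the full encoding the constraints of all preceding layers would be added to the same model.

    import gurobipy as gp
    from gurobipy import GRB

    def add_relu(model, x, l, u):
        # Encode y = max(0, x) exactly, given bounds l <= x <= u with l < 0 < u.
        y = model.addVar(lb=0.0, ub=u)
        d = model.addVar(vtype=GRB.BINARY)      # d = 1 iff the ReLU is active
        model.addConstr(y >= x)
        model.addConstr(y <= x - l * (1 - d))
        model.addConstr(y <= u * d)
        return y

    m = gp.Model()
    m.Params.OutputFlag = 0
    m.Params.TimeLimit = 10                     # anytime use: stop after 10s
    x = m.addVar(lb=-1.0, ub=2.0)               # bounds from abstract interpretation
    y = add_relu(m, x, -1.0, 2.0)
    m.setObjective(y, GRB.MAXIMIZE)             # query a refined upper bound for y
    m.optimize()
    print(m.ObjBound)                           # sound upper bound even on timeout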

convex relaxations While not introducing any approximation, unfortu-


nately, current MILP solvers do not scale to larger neural networks. It becomes
increasingly more expensive to refine bounds with the MILP-based formulation as
the analysis proceeds deeper into the network. However, for soundness it is not
crucial that the produced bounds are the best possible: for example, plain abstract
interpretation uses sound bounds produced by the bounding box function ι instead.
Therefore, for deeper layers in the network, we explore the trade-off between pre-
cision and scalability by also considering an intermediate method, which is faster
than exact MILP, but also more precise than abstract interpretation. We relax the
constraints in ϕ^(k) using the convex set computed by our k-ReLU framework, formally introduced in Section 6.3, to obtain a set of weaker linear constraints ϕ_LP^(k). We then use the solver to solve the relaxed optimization problems that are constrained by ϕ_LP^(k) instead of ϕ^(k), producing possibly looser bounds l'^(k) and u'^(k).
Note that the encoding of subsequent layers depends on the bounds computed
in previous layers, where tighter bounds reduce the amount of newly introduced
approximation.

anytime milp relaxation MILP solvers usually provide the option to set an explicit timeout T after which the solver must terminate. In return, the
solver may not be able to solve the instance exactly, but it will instead provide
lower and upper bounds on the objective function in a best-effort fashion. This
provides another way to compute sound but inexact bounds l 0(k) and u 0(k) .

neuron selection heuristic for refinement We select all neurons for


refinement in an affine layer that can be proven to be only taking positive values
using abstract interpretation.

kpoly: end-to-end approach To certify deep neural networks, we combine


MILP, LP relaxation, and abstract interpretation. We first pick numbers of layers
kMILP , kLP , kAI that sum to the total number of layers of the neural network. For the
analysis of the first kMILP layers, we refine bounds using anytime MILP relaxation
with the neuron selection heuristic. As an optimization, we do not perform refine-
ment after the abstract transformer for the first layer in case it is an affine trans-
formation, as the abstract domain computes the tightest possible bounding box for
an affine transformation of a box (this is always the case in our experiments). For
the next kLP layers, we refine bounds using LP relaxation (i.e., the network up to
the layer to be refined is encoded using linear constraints computed via k-ReLU
framework) combined with the neuron selection heuristic. For the remaining kAI
layers, we use abstract interpretation without additional refinement (however, this
also benefits from refinement that was performed in previous layers), and compute
the bounds using ι.

final property certification Let k be the index of the last layer and p be
the number of output classes. We can encode the final certification problem using
the output abstract element aout obtained after applying the abstract transformer
for the last layer in the network. If we want to prove that the output satisfies the
property ψ, where ψ is given by a CNF formula ∧i ∨j li,j with all literals li,j being
linear constraints, it suffices to show that a_out ⊓ (∧_j ¬l_{i,j}) = ⊥ for all i. If this fails, one can resort to complete verification using MILP: the property is satisfied if and only if the set of constraints ϕ^(k)(x_1^(0), . . . , x_p^(k)) ∧ (∧_j ¬l_{i,j}) is unsatisfiable for all i.
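A minimal box-based version of this sufficient check (with our own encoding: each literal is a pair (w, b) meaning w · x ≥ b) might look as follows; it certifies a clause as soon as one of its literals provably holds on the output bounding box, which already makes the meet with the negated literals empty.

    def certified(box, cnf):
        # box: list of (l, u) per output neuron; cnf: list of clauses, each a list of
        # literals (w, b) standing for the linear constraint w . x >= b.
        def lower(w):   # lower bound of w . x over the box
            return sum(wi * (l if wi >= 0 else u) for wi, (l, u) in zip(w, box))
        return all(any(lower(w) >= b for (w, b) in clause) for clause in cnf)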

6.3 k-relu relaxation framework

In this section we formally describe our k-ReLU framework for generating optimal
convex relaxations in the input-output space for k ReLU operations jointly. In the
next section, we discuss the instantiation of our framework with existing certifiers
which enables more precise results.
We consider a ReLU based fully-connected, convolutional, or residual neural
network with h neurons from a set H (that is h = |H|) and a bounded input region
I ⊆ Rm where m < h is the number of neural network inputs. As before, we treat
the affine transformation and the ReLUs as separate layers. We consider a convex
approximation method M that processes network layers in a topologically sorted
sequence from the input to the output layer passing the output of predecessor
layers as input to the successor layers. Let S ⊆ Rh be a convex set computed via M
approximating the set of values that neurons up to layer l-1 can take with respect
to I and B ⊇ S be the smallest bounding box around S. We use Conv(S1 , S2 ) and
S1 ∩ S2 to denote the convex hull and the intersection of convex sets S1 and S2 ,
respectively.

Let X, Y ⊆ H be respectively the set of input and output neurons in the l-th
layer consisting of n ReLU assignments of the form yi := ReLU(xi ) where xi ∈ X
and yi ∈ Y. We assume that each input neuron xi takes on both positive and
negative values in S. We define the polyhedra induced by the two branches of each ReLU assignment y_i := ReLU(x_i) as C_i^+ = {x_i ≥ 0, y_i = x_i} ⊆ R^h and C_i^− = {x_i ≤ 0, y_i = 0} ⊆ R^h. Let Q_J = {⋂_{i∈J} C_i^{s(i)} | s ∈ J → {−, +}} (where J ⊆ [n]) be the set of polyhedra Q ⊆ R^h constructed by the intersection of polyhedra C_i ⊆ R^h for neurons x_i, y_i indexed by the set J such that each C_i ∈ {C_i^+, C_i^−}.

Example 6.3.1. For the ReLU assignments y_i := ReLU(x_i) with 1 ≤ i ≤ 2, and J = [2] = {1, 2}, we have that C_1^+ = {x1 ≥ 0, y1 = x1}, C_1^− = {x1 ≤ 0, y1 = 0}, C_2^+ = {x2 ≥ 0, y2 = x2}, and C_2^− = {x2 ≤ 0, y2 = 0}. Q_J contains 4 polyhedra {C_1^+ ∩ C_2^+, C_1^+ ∩ C_2^−, C_1^− ∩ C_2^+, C_1^− ∩ C_2^−}, where the individual polyhedra are:

    C_1^+ ∩ C_2^+ = {x1 ≥ 0, y1 = x1, x2 ≥ 0, y2 = x2},
    C_1^+ ∩ C_2^− = {x1 ≥ 0, y1 = x1, x2 ≤ 0, y2 = 0},
    C_1^− ∩ C_2^+ = {x1 ≤ 0, y1 = 0, x2 ≥ 0, y2 = x2},
    C_1^− ∩ C_2^− = {x1 ≤ 0, y1 = 0, x2 ≤ 0, y2 = 0}.

We note that Q_J contains 2^{|J|} polyhedra. We next formulate the best convex relax-
ation of the output after all n ReLU assignments in a layer.

6.3.1 Best convex relaxation

The best convex relaxation after the n ReLU assignments is given by

    S_best = Conv_{Q∈Q_[n]}(Q ∩ S).   (6.4)

S_best considers all n assignments jointly. Computing it is practically infeasible as it involves computing 2^n convex hulls, each of which has exponential cost in the number of neurons h [190].

6.3.2 1-ReLU

We now describe the prior convex relaxation [69] through triangles (here called 1-
ReLU) that handles the n ReLU assignments separately. Here, the input to the i-th
assignment yi := ReLU(xi ) is the polyhedron P1-ReLU ⊇ S where for each xi ∈ X,
P1-ReLU,i contains only an interval constraint [li , ui ] that bounds xi , that is, li 6 xi 6
ui . Here, the interval bounds are simply obtained from the bounding box B of S.
The output of this method after n assignments is
    S_{1-ReLU} = S ∩ ⋂_{i=1}^{n} Conv(P_{1-ReLU,i} ∩ C_i^+, P_{1-ReLU,i} ∩ C_i^−).   (6.5)

The projection of Conv(P_{1-ReLU,i} ∩ C_i^+, P_{1-ReLU,i} ∩ C_i^−) onto the x_i y_i-plane is a triangle minimizing the area, as shown in Fig. 5.4 (a), and is the optimal convex relaxation in this plane. However, because the input polyhedron P_{1-ReLU} is a hyperrectangle
(when projected to X), it does not capture relational constraints between different
xi ’s in X (meaning it typically has to substantially over-approximate the set S).
Thus, as expected, the computed result S1-ReLU of the 1-ReLU method will incur
significant imprecision when compared with the Sbest result.

6.3.3 k-ReLU relaxations

We now describe our k-ReLU framework for computing a convex relaxation of the
output of n ReLUs in one layer by considering groups of k ReLUs jointly with
k > 1. For simplicity, we assume that n > k and k divides n. Let J be a partition
of the set of indices [n] such that each block Ji ∈ J contains exactly k indices. Let
Pk-ReLU,i ⊆ Rh be a polyhedron containing interval and relational constraints over
the neurons from X indexed by Ji . In our framework, Pk-ReLU,i is derived via B and
S and satisfies S ⊆ Pk-ReLU,i .
Our k-ReLU framework produces the following convex relaxation of the output:

    S_{k-ReLU} = S ∩ ⋂_{i=1}^{n/k} Conv_{Q∈Q_{J_i}}(P_{k-ReLU,i} ∩ Q).   (6.6)
The result of (6.6) is the optimal convex relaxation for the output of n ReLUs for
the given choice of S, k, J, and Pk-ReLU,i .
Theorem 6.3.1. For k > 1 and a partition J of indices, if there exists a Ji for which
Pk-ReLU,i ⊊ ⋂u∈Ji P1-ReLU,u holds, then Sk-ReLU ⊊ S1-ReLU.

Proof. Since Pk-ReLU,i ⊊ ⋂u∈Ji P1-ReLU,u for Ji, by monotonicity of intersection and
convex hull,

ConvQ∈QJi (Pk-ReLU,i ∩ Q) ⊊ ConvQ∈QJi ((⋂u∈Ji P1-ReLU,u) ∩ Q). (6.7)

For any Q ∈ QJi, we have that either Q ⊆ C+u or Q ⊆ C−u for u ∈ Ji. Thus, we
can replace all Q on the right hand side of (6.7) with either C+u or C−u such that,
for all u ∈ Ji, both C+u and C−u are used in at least one substitution, and obtain by
monotonicity

ConvQ∈QJi ((⋂v∈Ji P1-ReLU,v) ∩ Q)
  ⊆ Convu∈Ji ((⋂v∈Ji P1-ReLU,v) ∩ C+u, (⋂v∈Ji P1-ReLU,v) ∩ C−u)
  ⊆ Convu∈Ji (P1-ReLU,u ∩ C+u, P1-ReLU,u ∩ C−u)    (since ⋂v∈Ji P1-ReLU,v ⊆ P1-ReLU,u).

For the remaining i, ConvQ∈QJi (Pk-ReLU,i ∩ Q) ⊆ Convu∈Ji (P1-ReLU,u ∩ C+u, P1-ReLU,u ∩ C−u)
holds similarly. Since the strict inclusion ⊊ holds for at least one i and ⊆ holds
for the others, Sk-ReLU ⊊ S1-ReLU follows by the monotonicity of intersection.



Note that P1-ReLU,i only contains interval constraints whereas Pk-ReLU,i contains
both the same interval constraints and extra relational constraints. Thus, any convex
relaxation obtained using k-ReLU is typically strictly more precise than a 1-ReLU one.
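
The per-block convex hulls ConvQ∈QJi (Pk-ReLU,i ∩ Q) appearing in (6.6) can be computed with the double description method. The sketch below shows one possible way to do this via the pycddlib bindings of cdd, which kPoly uses for convex hulls (Section 6.5). It assumes the pycddlib 2.x class-based API and that every input polyhedron is non-empty and given in cdd's H-representation (a row (b, a1, ..., ah) encodes b + a·x ≥ 0); it is a simplified illustration rather than the actual kPoly code.

    import cdd  # pycddlib

    def hull_of_union(h_reps):
        """Convex hull of a union of non-empty polyhedra given in cdd H-representation."""
        generator_rows = []
        for rows in h_reps:
            mat = cdd.Matrix(rows, number_type='fraction')
            mat.rep_type = cdd.RepType.INEQUALITY
            # H-representation -> generators (vertices and rays) of this polyhedron.
            generator_rows.extend(list(cdd.Polyhedron(mat).get_generators()))
        gen = cdd.Matrix(generator_rows, number_type='fraction')
        gen.rep_type = cdd.RepType.GENERATOR
        # Combined generators -> inequalities: the H-representation of the hull.
        return cdd.Polyhedron(gen).get_inequalities()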

precise and scalable relaxations for large k For each Ji, the optimal
convex relaxation Ki = ConvQ∈QJi (Pk-ReLU,i ∩ Q) from (6.6) requires computing the
convex hull of 2^k convex sets, each of which has a worst-case exponential cost in
terms of k. Thus, computing Ki via (6.6) can become computationally expensive
for large values of k. We propose an efficient relaxation Ki′ for each block Ji ∈ J
(where |Ji| = k as described earlier) based on computing relaxations for all subsets
of Ji that are of size 2 ≤ l < k. Let Ri = {{j1, . . . , jl} | j1, . . . , jl ∈ Ji} be the set
containing all subsets of Ji containing l indices. For each R ∈ Ri, let P′l-ReLU,R ⊆ Rh
be a polyhedron containing interval and relational constraints between the neurons
from X indexed by R with S ⊆ P′l-ReLU,R.
The relaxation Ki′ is computed by applying l-ReLU (k choose l) times as:

Ki′ = ⋂_{R∈Ri} ConvQ∈QR (P′l-ReLU,R ∩ Q). (6.8)

Example 6.3.2. Consider k = 4 and l = 3 with Ji = {1, 2, 3, 4}; then Ri =
{{1, 2, 3}, {1, 2, 4}, {1, 3, 4}, {2, 3, 4}} contains all subsets of Ji of size 3. We first
compute P′3-ReLU,R for each R ∈ Ri; for example, P′3-ReLU,{1,2,3} contains relational and
interval constraints for the variables x1, x2, x3. Our approximation Ki′ of the optimal
4-ReLU output is computed using (6.8) by intersecting the result of 3-ReLU for
each R ∈ Ri.
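
The structure of (6.8) can be sketched as follows: every size-l subset R of the block Ji is expanded with l-ReLU, and the resulting constraint sets are conjoined. The callback l_relu_constraints is a placeholder standing in for a routine that returns the constraints of ConvQ∈QR (P′l-ReLU,R ∩ Q); it is not an existing ERAN function.

    from itertools import combinations

    def k_relu_via_subsets(J_i, l, l_relu_constraints):
        """Approximate relaxation K_i' of (6.8) for block J_i using size-l subsets."""
        constraints = []
        for R in combinations(sorted(J_i), l):        # (k choose l) calls to l-ReLU
            constraints.extend(l_relu_constraints(R))
        # The conjunction of all returned constraints is the intersection in (6.8).
        return constraints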
The layerwise convex relaxation S′k-ReLU = S ∩ ⋂_{i=1}^{n/k} Ki′ via (6.8) is tighter than
computing the relaxation Sl-ReLU via (6.6) with a partition J′ where for each block Ji′ ∈
J′ there exists an Rj corresponding to a block of J such that Ji′ ∈ Rj and
P′l-ReLU,Ji′ ⊆ Pl-ReLU,Ji′, where Pl-ReLU,Ji′ is the polyhedron in (6.6) for computing Sl-ReLU. In our
instantiations, we ensure that this condition always holds for gaining precision.

6.4 instantiating the k-relu framework

Our k-ReLU framework from Section 6.3 can be instantiated to produce different
relaxations depending on the parameters S, k, J, and Pk-ReLU,i. Fig. 6.4 shows the
steps for instantiating our framework. The inputs to the framework are the convex
set S computed via a convex relaxation method M and the partition J based on k.
These inputs are first used to produce a set containing n/k polyhedra {Pk-ReLU,i}.
Each polyhedron Pk-ReLU,i is then intersected with the polyhedra from the set QJi,
producing 2^k polyhedra which are then combined via the convex hull (each result
is called Ki). The Ki's are then combined with S to produce the final relaxation that
captures the values which the neurons can take after the ReLU assignments. This
relaxation is tighter than that produced by applying M directly on the ReLU layer,
enabling precision gains.
[Figure 6.4 depicts the pipeline of (6.6): the convex set S, computed via a convex
relaxation method M (e.g., SDP [68, 163], abstract interpretation [78, 188, 191],
linear relaxations [175, 206, 211, 221], or duality [67, 212]), and the partition J of [n]
are used to construct the polyhedra {Pk-ReLU,i}; each Pk-ReLU,i is intersected with
every Q ∈ QJi, the intersections are combined via a convex hull into Ki, and the
final relaxation S ∩ ⋂_{i=1}^{n/k} Ki is formed.]

Figure 6.4: Steps to instantiating the k-ReLU framework.

6.4.1 Computing key parameters

We next describe the choice of the parameters S, k, J, Pk-ReLU,i in our framework.


Input convex set Examples of a convex approximation method M for computing
S include [67, 78, 163, 175, 188, 191, 206, 211, 212, 221]. In this work, we use the
DeepPoly [188] relaxation, a state-of-the-art precise and scalable certifier for neural
networks, for computing S.
k and partition J We use (6.6) to compute the output relaxation when k ∈ {2, 3}.
For larger k, we compute the output based on (6.8). To maximize the precision gain,
we group together into a block those indices i for which the triangle relaxation for
yi := ReLU(xi) has a larger area in the xi yi-plane.
Computing Pk-ReLU,i We note that for a fixed block Ji, several polyhedra Pk-ReLU,i
are possible that produce convex relaxations with varying degrees of precision. Ideally,
one would like Pk-ReLU,i to be the projection of S onto the variables in the set
X indexed by the block Ji. However, computing this projection exactly is expensive
and therefore we compute an overapproximation of it.
We use the method M to compute Pk-ReLU,i by computing upper bounds
for linear relational expressions of the form ∑_{u=1}^{k} au · xu with respect to S. In our
experiments, we found that setting au ∈ {−1, 0, 1} yields maximum precision (excluding
the case where all au are zero). Thus Pk-ReLU,i ⊇ S contains 3^k − 1
constraints, which include the interval constraints for all xu.
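
The sketch below illustrates this construction: for every non-zero coefficient vector a ∈ {−1, 0, 1}^k, an upper bound on ∑u au · xu with respect to S is queried. The callback upper_bound_wrt_S stands in for the bound query offered by the method M (in our instantiation, DeepPoly back-substitution); it is a placeholder rather than an existing API.

    from itertools import product

    def octahedral_constraints(block_vars, upper_bound_wrt_S):
        """Assemble the 3^k - 1 constraints of P_{k-ReLU,i} for one block of k neurons."""
        constraints = []
        for a in product((-1, 0, 1), repeat=len(block_vars)):
            if all(c == 0 for c in a):
                continue                          # skip the all-zero coefficient vector
            ub = upper_bound_wrt_S(block_vars, a)
            constraints.append((a, ub))           # encodes sum_u a_u * x_u <= ub
        return constraints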

6.4.2 Certification and refinement with k-ReLU framework

The constraints generated by our framework for encoding the ReLU layers can be
added to the formula ϕ(k)LP defined in Section 6.2, which can then be used either
for refining the neuron bounds or for proving the property ψ.
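
As an illustration of the bound-refinement use, the following sketch sets up a linear program over the constraints collected in ϕ(k)LP with Gurobi (the solver used by kPoly) and re-optimizes each neuron in both directions. The constraint format (a coefficient dictionary plus a right-hand side) and the function are our own simplification; the actual kPoly encoding is more involved, and the sketch assumes every LP is feasible and bounded.

    import gurobipy as gp
    from gurobipy import GRB

    def refine_bounds(num_vars, linear_rows, lbs, ubs):
        """Tighten neuron bounds by minimizing/maximizing each variable over phi_LP^(k).

        linear_rows: list of (coeffs, rhs), coeffs a dict {var_index: coefficient},
                     encoding sum_j coeffs[j] * x_j <= rhs.
        """
        m = gp.Model("phi_LP_k")
        m.Params.OutputFlag = 0
        x = m.addVars(num_vars, lb=lbs, ub=ubs, name="x")
        for coeffs, rhs in linear_rows:
            m.addConstr(gp.quicksum(c * x[j] for j, c in coeffs.items()) <= rhs)
        refined = []
        for j in range(num_vars):
            m.setObjective(x[j], GRB.MINIMIZE)
            m.optimize()
            lo = m.ObjVal
            m.setObjective(x[j], GRB.MAXIMIZE)
            m.optimize()
            hi = m.ObjVal
            refined.append((lo, hi))
        return refined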

6.5 evaluation

We implemented our refinement approach combining abstract interpretation with
solvers in the form of a certifier called kPoly. kPoly runs DeepPoly analysis, which
is refined using an MILP- and k-ReLU-based formulation of the ReLU layers of the
network (Section 6.2). We note that in our instantiation of the k-ReLU framework,
the DeepPoly domain serves as the convex relaxation method M (Fig. 6.4). Both
the neuron bounds and the final certification results can be refined.
kPoly is written in Python and uses cdd [2, 4, 75] for computing convex hulls
and Gurobi [92] as the solver for refining the abstract interpretation results. We
made kPoly publicly available as part of our ERAN [3] framework at
https://github.com/eth-sri/eran.
We evaluated kPoly for the task of robustness certification of challenging deep
neural networks. We compare the speed and precision of kPoly for both complete
and incomplete certification against two state-of-the-art certifiers: DeepPoly [188]
and RefineZono [187]. DeepPoly has the same precision as [31, 221] whereas Re-
fineZono refines the results of DeepZ [191] and is more precise than [191, 211, 212].
Both DeepPoly and RefineZono are more scalable than [37, 67, 68, 163, 197, 206].
We show that kPoly is more precise than DeepPoly and RefineZono while also
scaling to large networks. Our results show that kPoly achieves faster complete
certification and more precise incomplete certification than prior work.
We next describe the neural networks, benchmarks and parameters used in our
experiments.

neural networks We used 7 MNIST [124], 3 CIFAR10 [118], and 1 ACAS
Xu [110] based neural networks shown in Table 6.2. Our networks have fully-connected
(FNNs), convolutional (CNNs), and residual architectures. All networks
except ResNet are taken from the ERAN website; ResNet is taken from
https://github.com/locuslab/convex_adversarial. Seven of the networks do not
use adversarial training while the rest use different variants of it. The MNIST
ConvBig network is trained with DiffAI [144], the two CIFAR10 convolutional networks
are trained with PGD [136], and the residual network is trained via [212].
In the table, the MNIST FNNs are named in the format m × n.

Table 6.2: Neural network architectures and parameters used in our experiments.

Dataset   Model      Type             #Neurons  #Layers  Defense       Refine ReLU  k
MNIST     2 × 50     fully-connected       110        3  None          No           N/A
          5 × 100    fully-connected       510        6  None          Yes          3
          8 × 100    fully-connected       810        9  None          Yes          2
          5 × 200    fully-connected     1 010        6  None          Yes          2
          8 × 200    fully-connected     1 610        9  None          Yes          2
          ConvSmall  convolutional       3 604        3  None          No           Adapt
          ConvBig    convolutional      48 064        6  DiffAI [144]  No           5
CIFAR10   ConvSmall  convolutional       4 852        3  PGD [136]     No           Adapt
          ConvBig    convolutional      62 464        6  PGD [136]     No           5
          ResNet     residual          107 496       13  Wong [212]    No           Adapt
ACAS Xu   6 × 50     fully-connected       300        6  None          No           N/A

These networks have m + 1 layers, where the first m layers have n neurons each and
the last layer has 10 neurons. We note that the MNIST 5 × 100 and 8 × 200 networks
are named FFNNSmall and FFNNMed in Section 5.5, respectively. The largest network in
our experiments contains > 100K neurons and has 13 layers.

robustness property We consider the L∞-norm [40] based adversarial region
around a correctly classified image from the test set, parameterized by the radius
ε ∈ R. Our goal is to certify that the network classifies all images in the adversarial
region correctly.
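
Concretely, for image classifiers such a region is a box around the image. The snippet below builds its elementwise lower and upper bounds; clipping to the pixel range [0, 1] is an assumption about the input normalization, and the code is illustrative only.

    import numpy as np

    def linf_region(image, eps):
        """Interval bounds of the L-infinity adversarial region of radius eps."""
        lb = np.clip(image - eps, 0.0, 1.0)   # elementwise lower bound
        ub = np.clip(image + eps, 0.0, 1.0)   # elementwise upper bound
        return lb, ub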

machines The runtimes of all experiments for the MNIST and ACAS Xu FNNs
were measured on a 3.3 GHz 10 Core Intel i9-7900X Skylake CPU with a main
memory of 64 GB whereas the experiments for the rest were run on a 2.6 GHz 14
core Intel Xeon CPU E5-2690 with 512 GB of main memory.

benchmarks For each MNIST and CIFAR10 network, we selected the first 1000
images from the respective test set and filtered out incorrectly classified images.
The number of correctly classified images by each network is shown in Table 6.3.
We chose challenging ε values for defining the adversarial region for each network.
For the ACAS Xu network, we consider the property φ9 as defined in [113]. We
note that our benchmarks (e.g., the 8 × 200 network with ε = 0.015) are quite
challenging to handle for state-of-the-art certifiers (as we will see below).

Table 6.3: Number of certified adversarial regions and runtime of kPoly vs. DeepPoly and RefineZono.

                                     DeepPoly [188]          RefineZono [187]        kPoly
Dataset   Model      #correct  ε     certified(#)  time(s)   certified(#)  time(s)   certified(#)  time(s)
MNIST     2 × 50        959   0.03        411        0.1         782         3.5         782         2.9
          5 × 100       960   0.026       160        0.3         312         310         441         307
          8 × 100       947   0.026       182        0.4         304         411         369         171
          5 × 200       972   0.015       292        0.5         341         570         574         187
          8 × 200       950   0.015       259        0.9         316         860         506         464
          ConvSmall     980   0.12        158        3           179         707         347         477
          ConvBig       929   0.3         711        21          648         285         736         40
CIFAR10   ConvSmall     630   2/255       359        4           347         716         399         86
          ConvBig       631   2/255       421        43          305         592         459         346
          ResNet        290   8/255       243        12          243         27          245         91

6.5.1 Complete certification

We next describe our results for the complete certification of the ACAS Xu 6 × 50
and the MNIST 2 × 50 network.

acas xu 6 × 50 network As this network has only 5 inputs, we split the pre-
condition defined by φ9 into smaller input regions by splitting each input dimen-
sion independently. Our splitting heuristic is similar to the one used in Neurify
[206] which is state-of-the-art for certifying ACAS Xu networks. We certify that the
post-condition defined by φ9 holds for each region with DeepPoly domain analysis.
kPoly certifies that φ9 holds for the network in 14 seconds. RefineZono uses the
same splits with the DeepZ domain and verifies in 10 seconds. We note that both
these timings are faster than Neurify which takes > 100 seconds.

mnist 2 × 50 network For complete certification of this network, kPoly first
runs DeepPoly analysis on the whole network, collecting the bounds for all neurons
in the network. If DeepPoly fails to certify the network, then the collected bounds
are used to encode the robustness certification as a MILP instance (discussed in
Section 6.2). RefineZono is based on the same approach but uses the DeepZ domain
[186]. We use ε = 0.03 for the L∞-norm attack. We note that complete certification
for this benchmark with RefineZono was previously reported in [187] to be slightly
faster than MIPVerify [197], which is a state-of-the-art complete certifier for MNIST
and CIFAR10 networks.
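
For reference, the sketch below shows a standard big-M style MILP encoding of a single ReLU y = ReLU(x), in the spirit of [197], which becomes usable once abstract interpretation has produced bounds l < 0 < u for x. It is illustrative and not the exact encoding used by kPoly or RefineZono.

    import gurobipy as gp
    from gurobipy import GRB

    def add_relu_milp(model, x, l, u):
        """Exactly encode y = ReLU(x), assuming valid bounds l < 0 < u on x."""
        y = model.addVar(lb=0.0, ub=u)
        a = model.addVar(vtype=GRB.BINARY)      # a = 1 iff the ReLU is active (x >= 0)
        model.addConstr(y >= x)                 # together with lb=0: y >= max(x, 0)
        model.addConstr(y <= x - l * (1 - a))   # a = 1: y <= x; a = 0: y <= x - l
        model.addConstr(y <= u * a)             # a = 0 forces y = 0
        return y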
The first row of Table 6.3 shows our results. Both kPoly and RefineZono certify
the neural network to be robust on 782 regions. The average runtime of kPoly and
RefineZono is 2.9 and 3.3 seconds respectively. DeepPoly is faster than both kPoly
and RefineZono but is also quite imprecise and certifies only 411 regions.

6.5.2 Incomplete certification

Both our works RefineZono and kPoly refine abstract interpretation results with
precise solvers but with different domains. Further, RefineZono uses only the 1-ReLU
approximation in ϕ(k)LP while kPoly uses k > 1. Next, we list the parameter values
for kPoly used in our experiments.

kpoly parameters We refine both the DeepPoly ReLU relaxation and the cer-
tification results for the MNIST FNNs. All neurons that are input to a ReLU oper-
ation and can take positive values based on the abstract interpretation results are
selected for refinement. As an optimization, we use the MILP ReLU encoding from
[197] when refining the ReLU relaxation for the second ReLU layer. Thus kMILP = 2
for these networks and kLP = m − kMILP , kAI = 0 where m is the number of layers.
Only the certification results are refined for the rest, thus kLP = kMILP = 0, kAI = m.
The last column of Table 6.2 shows the value of k for all networks. We use the
entry N/A for the ACAS Xu 6 × 50 and the MNIST 2 × 50 network as the k-ReLU
framework was not used for refinement on these. The entry Adapt means that k was
not fixed for all layers but computed dynamically. For the MNIST 5 × 100 network,
we use k = 3 for encoding all ReLU layers and use k = 2 for refining the remaining
FNNs. For the MNIST and CIFAR10 ConvBig networks, we encode the first 3 ReLU
layers with 1-ReLU while the remaining are encoded with 5-ReLU. We use l = 3
in (6.8) for encoding 5-ReLU. For the remaining 3 CNNs, we encode the first ReLU
layer with 1-ReLU while the remaining layers are encoded adaptively. Here, we
choose a value of k for which the total number of calls to 3-ReLU is ≤ 500. Next,
we discuss our experimental results shown in Table 6.3.

kpoly vs deeppoly and refinezono Table 6.3 compares the precision in the
number of adversarial regions certified and the average runtime per region in sec-
onds for kPoly, DeepPoly, and RefineZono. We refine the certification results with
kPoly and RefineZono only when DeepPoly and DeepZ fail to certify respectively.
It can be seen in the table that kPoly is more precise than both DeepPoly and Refine-
Zono on all networks. RefineZono is more precise than DeepPoly on the networks
trained without adversarial training. On the 8 × 200 and MNIST ConvSmall networks,
kPoly certifies 506 and 347 regions respectively whereas RefineZono certifies 316
and 179 regions respectively. The precision gain with kPoly over RefineZono and
DeepPoly is less on networks trained with adversarial training. kPoly certifies 25,
40, 38, and 2 regions more than DeepPoly on the last 4 CNNs in Table 6.3. kPoly is
faster than RefineZono on all networks and has an average runtime of < 8 minutes.
We note that the runtime of kPoly is not necessarily determined by the number
of neurons but rather by the complexity of the refinement instances. In the table,
the larger runtimes of kPoly occur on the MNIST 8 × 200 and ConvSmall networks.
These networks are quite small compared to the CIFAR10 ResNet network, on which
kPoly nevertheless has an average runtime of only 91 seconds.

1-relu vs k-relu We consider the first 100 regions for the MNIST ConvSmall
network and compare the number of regions certified by kPoly when run with
k-ReLU and 1-ReLU. We note that kPoly run with 1-ReLU is equivalent to [175].
kPoly with 1-ReLU certifies 20 regions whereas with k-ReLU it certifies 35. kPoly
with 1-ReLU has an average runtime of 9 seconds.

effect of heuristic for J We ran kPoly based on k-ReLU with random par-
titioning Jr using the same setup as for 1-ReLU. We observed that kPoly produced
worse bounds and certified 34 regions.

6.6 related work

We next discuss works related to ours in Chapters 5 and 6.

6.6.1 Neural Network Certification

There is a plethora of work on neural network certification, mostly for input regions
that can be encoded as boxes, such as those based on the L∞-norm. The approaches
can be broadly classified into two types: complete and incomplete.

complete certifiers Complete certifiers are based on MILP solvers [8, 32, 36,
49, 66, 135, 197], SMT solving [37, 69, 113, 114], Lipschitz optimization [170], and
input and neuron refinement [206, 207]. In our experience, MILP solvers [8, 197]
scale the most for complete certification with high dimensional inputs such as
MNIST or CIFAR10 networks, while input refinement [206, 207] works the best for
lower dimensional inputs such as those for ACAS Xu. In our approach for complete
certification in ERAN, we use both. Our results in Section 6.5 indicate that ERAN
gets state-of-the-art complete certification results. We believe that our performance
can be further improved by designing new algorithms for the MILP solvers that
take advantage of the particular structure of the problem instances [32, 36, 135].

incomplete certifiers The incomplete certifiers sacrifice the exactness of the
complete certifiers to gain extra scalability. The challenge is then to design a certifier
that is as precise as possible but also scales. The approaches here are based on
duality [67, 212], convex relaxations [7, 31, 68, 78, 127, 163, 175, 186, 188, 199, 211, 221],
and combinations of relaxations with solvers [187, 206]. DeepPoly and DeepZ
analysis are among the most precise and scalable certifiers. We note that the work of
[211, 212], although based on different principles, obtains the same precision
and similar speed as DeepZ. Similarly, the work of [31, 221] obtains the same precision
and similar speed as DeepPoly. We note that our refinement approach presented
in this chapter allows us to be more precise than all competing incomplete
certifiers, and our GPU implementation of DeepPoly in [152] allows scaling to larger
benchmarks with the precision of DeepPoly.
A complementary approach to the above is modifying neural networks to
make them easier to certify [85, 90, 154, 180, 215]. We believe that a combination of
this approach with ERAN can further improve the certification results.

other specifications The work of [160] considers the certification of various


non-linear specifications. There has been increasing interest in certifying against ad-
versarial regions generated by geometric transformations. The work of [158] was
the first to tackle these regions; however, they did not consider linear interpola-
tion which is often applied together with geometric transformations. Our work
[188] was the first to also consider linear interpolation and produced an interval
approximation of the resulting non-convex adversarial region. The work of [149]
also produces interval regions. In our recent work [15], we obtain more precise
polyhedral regions, making our approach more scalable and precise, and currently
the state-of-the-art for geometric certification.
The authors in [5] were among the first to consider certification of ReLU based
recurrent neural networks (RNNs). The work of [117] was the first to consider
certification of RNNs employed in natural language processing (NLP) tasks having
sigmoid and tanh activations. Their approach is based on an extension of their
previous work [221]. [108] uses the simpler Interval domain for the certification
of RNNs for NLP tasks. Our recent work [172] shows how the DeepPoly domain
can be adpated for the robustness certification of RNNs. Our results show that our
approach gets better precision and speed than [117] and is also the first one to
consider the certification of the audio classifiers. We note that the recent work of
[103] also considers the robustness of audio classifiers.
[217] is the first work to consider certifying robustness against adversarial
patches. The authors use the Interval domain in their work. [165] and [166] con-
sider the robustness certification of support vector machines and decision tree en-
sembles respectively via abstract interpretation. The work of [65] trains models
robust to data poisoning attacks based on abstract interpretation.
Further works on robustness certification include those on binarized neural net-
works [106, 153], transformers [179], video classifiers [214], fairness of models [171],
generative properties [145, 193, 216], and runtime monitoring [133].

probabilistic guarantees via smoothing Recently, there has been a


growing interest in an orthogonal line of work to ours based on randomized
smoothing. The approach is inspired by the concept of differential privacy and
considers a randomized classifier. Because of the randomness, the classifier can fail
on a previously correctly classified input with a small probability. Therefore the


guarantees on robustness are probabilistic here whereas our approach provides
deterministic guarantees. The first work on randomized smoothing for neural net-
works was presented by [125]. The considered adversarial regions were intensity-
based, like in our setting. Follow-up work of [53] improves the probabilistic bounds
of [125]. The work of [174] used adversarial training to further improve the perfor-
mance of smoothed classifiers. Randomized smoothing for geometric robustness
has been recently considered in the work of [72, 129].
We note that recently, the authors in [17] consider quantitative certification of
neural networks which is orthogonal to our qualitative approach.

6.6.2 Constructing adversarial examples

Another alternative line of work is that of empirical certification of neural networks.


Here, neural network robustness is demonstrated by the lack of an empirical ad-
versarial example using the strongest attacks within an adversarial region. As an
example, [19] under-approximates the behavior of the network under L∞ -norm
based perturbation and formally defines metrics of adversarial frequency and ad-
versarial severity to evaluate the robustness of a neural network against adversarial
attack. However, no formal robustness guarantees are provided, Thus the empiri-
cal guarantees can be broken by a better attack. For example, recent work by [198]
broke robustness guarantees of several existing works.
There is considerable interest in constructing adversarial examples for image
classifiers. The adversarial regions here are usually based on intensity changes [34,
40, 64, 86, 120, 136, 155, 158, 173, 196], geometric [70], and 3D transformations [9].
Beyond image classifiers, there are works on crafting adversarial attacks on videos
[210], speech [41, 128, 161], NLP [150], malware classification [88] and probabilistic
forecasting models [62]. We refer the reader to [38, 43] for a more detailed survey
on adversarial attacks.

6.6.3 Adversarial training

There is growing interest in adversarial training where neural networks are trained
against a model of adversarial attacks. Here, a robustness loss is added to the nor-
mal training loss. The robustness loss is calculated as the worst-case loss in an
adversarial region. It is not possible to exactly compute this loss; thus, it is ei-
ther estimated with a lower bound or an upper bound. Using the lower bound
leads to better empirical robustness [39, 64, 86, 89, 136] whereas the upper bound
[59, 60, 87, 130, 144, 147, 148, 162, 212, 213, 220] leads to models that are relatively
easier to certify. In both cases, there is a loss in the standard accuracy of the trained
model. The main challenge is then to produce models that are both robust and
accurate.

Interestingly, the work of [144, 147, 148] trains neural networks against adver-
sarial attacks using abstract interpretation. The work of [148] currently produces
state-of-the-art models for CIFAR10 and MNIST using our DeepZ abstraction and
using a certification method similar to RefineZono. We believe that the results of
[148] can be further improved by using the DeepPoly domain and the more precise
kPoly certification method. We note that recent work by [11] provides theoretical
results on the existence of a neural network that can be certified with abstract inter-
pretation to the same degree as a "normally" trained network with exact methods.
Beyond robustness, the work of [73, 132] trains the network so that it satisfies a
logical property.

6.7 discussion

We presented a refinement approach combining solver-based precise methods with


abstract interpretation for neural network certification. We designed a novel para-
metric framework k-ReLU for obtaining scalable convex relaxations that are more
precise than those produced by the single neuron triangle convex relaxation. k-
ReLU is generic and can be instantiated with existing convex relaxation methods.
The key idea of k-ReLU is to consider multiple ReLUs jointly. We presented our
state-of-the-art certifier kPoly, integrated into ERAN, which combines the Deep-
Poly domain, MILP, the k-ReLU framework, and input refinement for achieving
state-of-the-art complete and incomplete certification. Our results are beyond the
reach of existing certifiers.
We note that our k-ReLU framework is more general and can be extended for
computing more precise convex relaxations of other non-linearities commonly ap-
plied in neural networks such as sigmoid, tanh, and maxpool. This is because
the existing approximations of these non-linearities [188, 221] are also single neu-
ron based and would benefit from considering multiple neurons jointly. For ex-
ample, more precise sigmoid approximations than DeepPoly (Section 5.3) for the
sigmoid assignments y1 := sigmoid(x1 ) and y2 := sigmoid(x2 ) with lx1 , lx2 < 0 and
ux1 , ux2 > 0 can be produced: compute the DeepPoly approximations for both as-
signments by considering the input neuron values to be in the intervals [lxi , 0] and
[0, uxi ] with i ∈ {1, 2}. The resulting approximations can then be combined using
the convex hull operation.
Similarly, our overall refinement approach can also be extended beyond the neu-
ral network architectures presented in this chapter. We believe that our concepts
can be leveraged in the future to design state-of-the-art certifiers for handling the
certification of networks from other applications and against richer specifications.
7
CONCLUSION AND FUTURE WORK

In this dissertation, we presented new methods for enabling automated reasoning


in two practically critical problem domains: programs and deep learning models.
Our approach is based on the framework of numerical abstract interpretation and
involves designing specialized algorithms that exploit the structure of problem in-
stances arising during the analysis. Our key contribution for numerical program
analysis is the development of a new theory of online decomposition, which is
based on the common observation that program transformations only affect a sub-
set of the program variables. Our theory is quite general and can be used for speed-
ing up all existing subpolyhedra domains, often by orders of magnitude, without
any precision loss. In a second step, we leveraged data-driven machine learning for
obtaining heuristics that improve the speed of numerical program analysis with-
out sacrificing too much precision. This required establishing a new connection
between the concepts in static analysis and reinforcement learning.
We adopted an inherently different approach for neural network certification,
where online decomposition does not work. We designed a new abstract domain
for precise and scalable analysis of neural networks containing custom convex re-
laxations of common non-linearities used in neural networks such as ReLU, sig-
moid, and tanh. We also provided a framework for computing convex relaxations
of ReLU that are more precise than prior work based on considering multiple Re-
LUs jointly, which was ignored previously. We developed a new combination of
our relaxations with exact solvers to achieve state-of-the-art certification results.

systems We have released two state-of-the-art systems based on our contribu-


tion in this dissertation: ELINA for numerical program analysis and ERAN for
neural network certification. ELINA contains implementations of the popular Poly-
hedra, Octagon, and Zone domains and enables precise relational analysis of large
real-world Linux device drivers in a few seconds while prior work often timed
out or ran out of memory. ERAN is a flexible and extensible certifier and supports
several application domains, specifications, neural network architectures, and both
incomplete and complete certification of neural networks. It can precisely analyze
large neural networks containing hundreds of thousands of neurons in a few sec-
onds, producing certification results beyond the reach of other competing certifiers.


Next, we discuss several extensions of our work in both problem domains.

7.1 numerical program analysis

Our work opens up several open problems in numerical program analysis. We list
some of these below:

online decomposition beyond numerical program analysis. The


work of [56] extends our results on the applicability of online decomposition for
speeding up numerical program analysis and shows that it can be applied for de-
composing all abstract domains. However, the authors do not identify conditions
where the decomposition does not lose precision when extended to non-numerical
domains or reduced product of numerical and non-numerical domains. We believe
that this is an interesting direction for future work.

semantic online decomposition. The finest partitions possible with our


theory of online decomposition are sensitive to the set of constraints used to repre-
sent an abstract element. A possible direction of future work could be to investigate
whether the abstract domains permit partitions based on semantic criteria indepen-
dent of the particular representation of the abstract element. This can lead to a new
theory and potential speedups over the methods presented in this work.

floating point polyhedra. The Polyhedra domain discussed in Chapter


2 abstracts a set of rational points but does not capture floating-point behavior
needed for analyzing hybrid and embedded systems. Interval Polyhedra [48] can
be used for implementing floating-point polyhedra domain; however, the complex-
ity of the underlying algorithms is doubly exponential. It can be an interesting
problem to investigate whether specialized algorithms can be designed to reduce
some of the complexity barriers for making the floating-point Polyhedra domain
practically efficient.

domain specific language (dsl) for decomposing numerical do-


mains. We provided a mechanical recipe for constructing decomposed trans-
formers from original non-decomposed transformers in Chapter 3. However, gen-
erating an optimized implementation of decomposed transformers still requires
substantial effort. We believe that this process can be automated in the future via
the design of a DSL where one can specify the mathematical definition of abstract
elements and transformers of a given domain. The DSL can then produce a decom-
posed implementation of the abstract domain.

automated synthesis of numerical transformers. Designing abstract


transformers requires significant expertise and effort. We note that the transform-
ers of numerical domains satisfy certain mathematical properties [35] of sound-


ness and precision which can be leveraged to automatically synthesize sound-by-
construction abstract transformers. This combined with the DSL can enable auto-
mated generation of fast, decomposed transformers for numerical domains.

machine learning for systems We believe that our approach of using ma-
chine learning for speeding up numerical analysis presented in Chapter 4 is more
general and can be used to learn adaptive policies for balancing different tradeoffs
in system design. Examples include tuning the degree of compartmentalization in
operating systems [202] for improved performance without sacrificing system se-
curity and balancing the accuracy vs. performance tradeoff in IoT applications [26].
Another direction is to automate the learning process presented in Chapter 4 via
generative models for approximations and dataset generation.

7.2 neural network certification

We next discuss several directions and open problems for future research in neural
network certification:

extending eran. ERAN currently handles a subset of the different dimen-


sions of the neural network certification problem shown in Fig. 1.7. In the future,
ERAN can be extended by designing custom methods to support more applica-
tion domains (e.g., NLP, finance), specifications (e.g., robustness against patches,
or fairness), neural network architectures (e.g., transformers, GANs). Further prob-
abilistic abstract interpretation [58] could be applied for providing probabilistic
guarantees on forecasting models [62].

specialized solver for neural networks. We believe that using off-the-


shelf solvers for complete certification of neural networks produces suboptimal
results as these are not designed for handling the transformations in the neural
networks. Better results can be obtained by designing custom heuristics that exploit
the structure of transformations applied in the neural networks. For example, [135]
uses graph neural networks to learn branching rules for the MILP solver for ReLU
networks, which improves its scalability.

proof transfer. For certain specifications, it might be possible to use the


mathematical certificate of correctness obtained via the analysis for one problem
instance to prove another instance. Examples include proving robustness against
patches [217] where the different adversarial regions overlap and checking whether
a network was tampered with [139].

designing better training methods. Another direction for future inves-


tigation is of training neural networks that are both provably robust and accurate.
With existing methods, there is a significant drop in network accuracy when train-
ing the network to be provably robust. This problem is particularly grave for larger
networks and more complex datasets like ImageNet. We note that our custom ab-
stractions have already been applied for training state-of-the-art provable neural
networks [148]. In the future, custom training methods and better abstractions can
be designed for obtaining networks that are more provable.

designing better adversarial attacks. Attacks on neural networks pro-


vide an upper bound on their provable robustness whereas incomplete methods
provide a lower bound. For larger networks, there is a substantial gap between the
two bounds. Designing better attacks can reduce this gap. In the future, this di-
rection can be explored by designing custom attacks that leverage the information
from the certification method to reduce the search area for finding an attack. These
counterexamples can also be combined with the training method to achieve better
accuracy and robustness. Further, our relaxations can be used for producing robust
adversarial examples reproducible in the real-world [63].

designing specialized algorithms for interpretability. Improving


network interpretability is crucial for making deep learning models more trust-
worthy. In this direction, one can look into designing specialized algorithms for
explaining the decisions of a deep neural network that identify the set of inputs
and hidden neuron patterns causing a particular decision and generate explana-
tions [167] in the form of symbolic constraints over the inputs and hidden neurons.
The symbolic constraints can be combined with network training for generating
more interpretable networks.

7.3 formal reasoning about cyber-physical systems

An increasing number of cyber-physical systems these days, such as those used


in autonomous driving, medical diagnosis, and robots, contain both software and
machine learning components. An exciting direction of future research is to com-
bine techniques from both program analysis and neural network certification to
establish formal guarantees on the correctness of not only the individual compo-
nents but the entire system [177]. For example, in the case of self-driving cars, a
generative model of the environment can be learned, and then formal reasoning
can be used to establish that certain safety properties are satisfied by the car with
respect to the obtained model. The interactions between the components make the
problem harder, and therefore specialized combinations may need to be designed.
BIBLIOGRAPHY

[1] ELINA: ETH Library for Numerical Analysis. http://elina.ethz.ch. pages 50,
54, 56, 79, 103, 137

[2] Extended convex hull. Computational Geometry, 20(1):13 – 23, 2001. pages 28,
164

[3] ERAN: ETH Robustness Analyzer for Neural Networks, 2018. pages 137, 164

[4] pycddlib, 2018. pages 164

[5] M. E. Akintunde, A. Kevorchian, A. Lomuscio, and E. Pirovano. Verification


of rnn-based neural agent-environment systems. In Proc. AAAI Conference on
Artificial Intelligence (AAAI), pages 6006–6013, 2019. pages 169

[6] F. Amato, A. López, E. M. Peña-Méndez, P. Vaňhara, A. Hampl, and J. Havel.


Artificial neural networks in medical diagnosis. Journal of Applied Biomedicine,
11(2):47 – 58, 2013. pages 11, 113

[7] G. Anderson, S. Pailoor, I. Dillig, and S. Chaudhuri. Optimization and ab-


straction: A synergistic approach for analyzing neural network robustness. In
Proc. Programming Language Design and Implementation (PLDI), page 731–744,
2019. pages 13, 106, 107, 168

[8] R. Anderson, J. Huchette, C. Tjandraatmadja, and J. P. Vielma. Strong mixed-


integer programming formulations for trained neural networks. In Proc. In-
teger Programming and Combinatorial Optimization (IPCO), volume 11480 of
Lecture Notes in Computer Science, pages 27–42, 2019. pages 13, 114, 148, 149,
158, 168

[9] A. Athalye, L. Engstrom, A. Ilyas, and K. Kwok. Synthesizing robust ad-


versarial examples. In J. G. Dy and A. Krause, editors, Proc. International
Conference on Machine Learning (ICML), volume 80 of Proceedings of Machine
Learning Research, pages 284–293, 2018. pages 170

[10] D. Avis. A Revised Implementation of the Reverse Search Vertex Enumeration


Algorithm, pages 177–198. 2000. pages 28, 86

[11] M. Baader, M. Mirman, and M. T. Vechev. Universal approximation with


certified networks. In Proc. International Conference on Learning Representations
(ICLR), 2020. pages 171


[12] R. Bagnara, P. M. Hill, and E. Zaffanella. The Parma Polyhedra Library:


Toward a complete set of numerical abstractions for the analysis and verifi-
cation of hardware and software systems. Sci. Comput. Program., 72(1-2):3–21,
2008. pages 2, 25, 28, 45, 50, 79

[13] R. Bagnara, P. M. Hill, E. Ricci, and E. Zaffanella. Precise widening operators


for convex polyhedra. Science of Computer Programming, 58(1):28 – 56, 2005.
pages 25, 76, 100

[14] M. Balcan, T. Dick, T. Sandholm, and E. Vitercik. Learning to branch. In


Proc. International Conference on Machine Learning (ICML), pages 344–353, 2018.
pages 108

[15] M. Balunovic, M. Baader, G. Singh, T. Gehr, and M. T. Vechev. Certifying ge-


ometric robustness of neural networks. In Proc. Neural Information Processing
Systems (NeurIPS), pages 15287–15297, 2019. pages viii, 16, 137, 147, 169

[16] M. Balunovic, P. Bielik, and M. T. Vechev. Learning to solve SMT formulas.


In Proc. Neural Information Processing Systems (NeurIPS), pages 10317–10328,
2018. pages 108

[17] T. Baluta, Z. L. Chua, K. S. Meel, and P. Saxena. Scalable quantitative verifi-


cation for deep neural networks. CoRR, abs/2002.06864, 2020. pages 170

[18] F. Banterle and R. Giacobazzi. A fast implementation of the octagon abstract


domain on graphics hardware. In Proc. International Static Analysis Sympo-
sium (SAS), volume 4634 of Lecture Notes in Computer Science, pages 315–335.
Springer, 2007. pages 87

[19] O. Bastani, Y. Ioannou, L. Lampropoulos, D. Vytiniotis, A. V. Nori, and A. Cri-


minisi. Measuring neural net robustness with constraints. In Proc. Neural
Information Processing Systems (NIPS), pages 2621–2629, 2016. pages 170

[20] A. Becchi and E. Zaffanella. A direct encoding for nnc polyhedra. In Proc.
Computer Aided Verification (CAV), pages 230–248, 2018. pages 86

[21] A. Becchi and E. Zaffanella. An efficient abstract domain for not necessar-
ily closed polyhedra. In A. Podelski, editor, Proc. Static Analysis Symposium
(SAS), pages 146–165, 2018. pages 86

[22] A. Becchi and E. Zaffanella. Revisiting polyhedral analysis for hybrid sys-
tems. In Proc. Static Analysis Symposium (SAS), pages 183–202, 2019. pages
86

[23] R. Beckett, A. Gupta, R. Mahajan, and D. Walker. Abstract interpretation of


distributed network control planes. Proc. ACM Program. Lang., 4(POPL), 2019.
pages 1

[24] D. Beyer. Reliable and reproducible competition results with benchexec and
witnesses (report on sv-comp 2016). In Proc. Tools and Algorithms for the Con-
struction and Analysis of Systems (TACAS), pages 887–904, 2016. pages 50, 79,
104

[25] P. Bielik, V. Raychev, and M. Vechev. Learning a static analyzer from data.
pages 233–253, 2017. pages 107

[26] K. Birman, B. Hariharan, and C. D. Sa. Cloud-hosted intelligence for real-


time iot applications. ACM SIGOPS Oper. Syst. Rev., 53(1):7–13, 2019. pages
175

[27] B. Blanchet, P. Cousot, R. Cousot, J. Feret, L. Mauborgne, A. Miné, D. Monni-


aux, and X. Rival. A static analyzer for large safety-critical software. In Proc.
Programming Language Design and Implementation (PLDI), pages 196–207, 2003.
pages 1, 2, 7, 21, 49, 86

[28] M. Böhme, V. Pham, and A. Roychoudhury. Coverage-based greybox fuzzing


as markov chain. In Proc. Conference on Computer and Communications Security
(CCS), page 1032–1043, 2016. pages 108

[29] B. Boigelot and I. Mainz. Efficient symbolic representation of convex polyhe-


dra in high-dimensional spaces. In S. K. Lahiri and C. Wang, editors, Proc.
Automated Technology for Verification and Analysis (ATVA), pages 284–299, 2018.
pages 86

[30] M. Bojarski, D. D. Testa, D. Dworakowski, B. Firner, B. Flepp, P. Goyal, L. D.


Jackel, M. Monfort, U. Muller, J. Zhang, X. Zhang, J. Zhao, and K. Zieba. End
to end learning for self-driving cars. CoRR, abs/1604.07316, 2016. pages 11,
113

[31] A. Boopathy, T.-W. Weng, P.-Y. Chen, S. Liu, and L. Daniel. Cnn-cert: An effi-
cient framework for certifying robustness of convolutional neural networks.
In Proc. AAAI Conference on Artificial Intelligence (AAAI), pages 3240–3247,
2019. pages 13, 149, 152, 164, 168, 169

[32] E. Botoeva, P. Kouvaros, J. Kronqvist, A. Lomuscio, and R. Misener. Efficient


verification of relu-based neural networks via dependency analysis. In Proc.
AAAI Conference on Artificial Intelligence (AAAI), pages 3291–3299, 2020. pages
13, 114, 148, 149, 168

[33] R. Boutonnet and N. Halbwachs. Disjunctive relational abstract interpreta-


tion for interprocedural program analysis. In Proc. Verification, Model Check-
ing, and Abstract Interpretation (VMCAI), pages 136–159, 2019. pages 86

[34] T. B. Brown, D. Mané, A. Roy, M. Abadi, and J. Gilmer. Adversarial patch.


CoRR, abs/1712.09665, 2017. pages 170

[35] A. Bugariu, V. Wüstholz, M. Christakis, and P. Müller. Automatically testing


implementations of numerical abstract domains. In Proc. ACM/IEEE Auto-
mated Software Engineering (ASE), pages 768–778, 2018. pages 175

[36] R. Bunel, J. Lu, I. Turkaslan, P. H. S. Torr, P. Kohli, and M. P. Kumar. Branch


and bound for piecewise linear neural network verification. J. Mach. Learn.
Res., 21:42:1–42:39, 2020. pages 13, 114, 148, 149, 168

[37] R. Bunel, I. Turkaslan, P. H. Torr, P. Kohli, and M. P. Kumar. A unified view


of piecewise linear neural network verification. In Proc. Advances in Neural
Information Processing Systems (NeurIPS), pages 4795–4804, 2018. pages 13,
114, 148, 149, 152, 164, 168

[38] N. Carlini, A. Athalye, N. Papernot, W. Brendel, J. Rauber, D. Tsipras, I. J.


Goodfellow, A. Madry, and A. Kurakin. On evaluating adversarial robustness.
CoRR, abs/1902.06705, 2019. pages 170

[39] N. Carlini, G. Katz, C. Barrett, and D. L. Dill. Ground-truth adversarial ex-


amples. CoRR, abs/1709.10207, 2017. pages 170

[40] N. Carlini and D. A. Wagner. Towards evaluating the robustness of neural


networks. In Proc. IEEE Symposium on Security and Privacy (SP), pages 39–57,
2017. pages 12, 115, 124, 138, 165, 170

[41] N. Carlini and D. A. Wagner. Audio adversarial examples: Targeted attacks


on speech-to-text. In Proc. IEEE Security and Privacy Workshops, (SP), pages
1–7. IEEE Computer Society, 2018. pages 170

[42] K. Chae, H. Oh, K. Heo, and H. Yang. Automatically generating features for
learning program analysis heuristics for c-like languages. Proc. ACM Program.
Lang., 1(OOPSLA):101:1–101:25, 2017. pages 89, 105, 106

[43] A. Chakraborty, M. Alam, V. Dey, A. Chattopadhyay, and D. Mukhopad-


hyay. Adversarial attacks and defences: A survey. CoRR, abs/1810.00069,
2018. pages 170

[44] A. Chawdhary and A. King. Compact difference bound matrices. In Proc.


Asian Symposium on Programming Languages and Systems (APLAS), pages 471–
490, 2017. pages 87

[45] A. Chawdhary, E. Robbins, and A. King. Simple and efficient algorithms for
octagons. In Proc. Asian Symposium on Programming Languages and Systems
(APLAS), volume 8858 of Lecture Notes in Computer Science, pages 296–313.
Springer, 2014. pages 87

[46] A. Chawdhary, E. Robbins, and A. King. Incrementally closing octagons.


Formal Methods Syst. Des., 54(2):232–277, 2019. pages 87

[47] J. Chen, J. Wei, Y. Feng, O. Bastani, and I. Dillig. Relational verification using
reinforcement learning. Proc. ACM Program. Lang., 3(OOPSLA):141:1–141:30,
2019. pages 107

[48] L. Chen, A. Miné, and P. Cousot. A sound floating-point polyhedra abstract


domain. In Proc. Asian Symposium on Programming Languages and Systems
(APLAS), volume 5356 of Lecture Notes in Computer Science, pages 3–18, 2008.
pages 174

[49] C.-H. Cheng, G. Nührenberg, and H. Ruess. Maximum resilience of artificial


neural networks. In Proc. Automated Technology for Verification and Analysis
(ATVA), 2017. pages 13, 114, 149, 168

[50] N. Chernikova. Algorithm for discovering the set of all the solutions of a
linear programming problem. USSR Computational Mathematics and Mathe-
matical Physics, 8(6):282 – 293, 1968. pages 28

[51] R. Clarisó and J. Cortadella. The octahedron abstract domain. Science of


Computer Programming, 64:115 – 139, 2007. pages 6, 21, 55, 58

[52] E. Clarke, O. Grumberg, S. Jha, Y. Lu, and H. Veith. Counterexample-guided


abstraction refinement. In E. A. Emerson and A. P. Sistla, editors, Proc. Com-
puter Aided Verification (CAV), pages 154–169, 2000. pages 108

[53] J. M. Cohen, E. Rosenfeld, and J. Z. Kolter. Certified adversarial robustness


via randomized smoothing. In K. Chaudhuri and R. Salakhutdinov, editors,
Proc. International Conference on Machine Learning (ICML), volume 97, pages
1310–1320, 2019. pages 170

[54] P. Cousot and R. Cousot. Static determination of dynamic properties of programs,


pages 106–130. 1976. pages 6, 58

[55] P. Cousot and R. Cousot. Abstract interpretation: A unified lattice model for
static analysis of programs by construction or approximation of fixpoints. In
Proc. Symposium on Principles of Programming Languages (POPL), page 238–252,
1977. pages 1, 154

[56] P. Cousot, R. Giacobazzi, and F. Ranzato. A²I: Abstract² interpretation.
PACMPL, 3(POPL):42:1–42:31, 2019. pages 9, 87, 174

[57] P. Cousot and N. Halbwachs. Automatic discovery of linear restraints among


variables of a program. In Proc. Symposium on Principles of Programming Lan-
guages (POPL), pages 84–96, 1978. pages 6, 57, 58, 114, 118, 149

[58] P. Cousot and M. Monerau. Probabilistic abstract interpretation. In Proc.


Programming Languages and Systems, pages 169–193, 2012. pages 175

[59] F. Croce, M. Andriushchenko, and M. Hein. Provable robustness of relu


networks via maximization of linear regions. In Proc. International Conference
on Artificial Intelligence and Statistics (AISTATS), volume 89 of Proceedings of
Machine Learning Research, pages 2057–2066, 2019. pages 170

[60] F. Croce and M. Hein. Provable robustness against all adversarial lp-
perturbations for p ≥ 1. In Proc. International Conference on Learning Rep-
resentations (ICLR), 2020. pages 170

[61] C. Cummins, P. Petoumenos, A. Murray, and H. Leather. Compiler fuzzing


through deep learning. In Proc. International Symposium on Software Testing
and Analysis (ISSTA), page 95–105, 2018. pages 108

[62] R. Dang-Nhu, G. Singh, P. Bielik, and M. Vechev. Adversarial attacks on


probabilistic autoregressive forecasting models. 2020. pages viii, 13, 170, 175

[63] D. I. Dimitrov, G. Singh, T. Gehr, and M. Vechev. Scalable inference of sym-


bolic adversarial examples, 2020. pages viii, 176

[64] Y. Dong, F. Liao, T. Pang, H. Su, J. Zhu, X. Hu, and J. Li. Boosting adversarial
attacks with momentum. In Proc. Computer Vision and Pattern Recognition
(CVPR), pages 9185–9193, 2018. pages 138, 170

[65] S. Drews, A. Albarghouthi, and L. D’Antoni. Proving data-poisoning robust-


ness in decision trees. In Proc. Programming Language Design and Implementa-
tion (PLDI), pages 1083–1097. ACM, 2020. pages 169

[66] S. Dutta, S. Jha, S. Sankaranarayanan, and A. Tiwari. Output range analysis


for deep feedforward neural networks. In Proc. NASA Formal Methods (NFM),
2018. pages 13, 114, 149, 168

[67] K. Dvijotham, R. Stanforth, S. Gowal, T. Mann, and P. Kohli. A dual approach


to scalable verification of deep networks. In Proc. Uncertainty in Artificial
Intelligence (UAI), pages 162–171, 2018. pages 13, 114, 148, 152, 154, 163, 164,
168

[68] K. D. Dvijotham, R. Stanforth, S. Gowal, C. Qin, S. De, and P. Kohli. Efficient


neural network verification with exactness characterization. In Proc. Uncer-
tainty in Artificial Intelligence, UAI, page 164, 2019. pages 13, 114, 148, 152,
163, 164, 168

[69] R. Ehlers. Formal verification of piece-wise linear feed-forward neural net-


works. In Automated Technology for Verification and Analysis (ATVA), 2017.
pages xvi, 13, 114, 119, 120, 148, 149, 150, 160, 168

[70] L. Engstrom, D. Tsipras, L. Schmidt, and A. Madry. A rotation and


a translation suffice: Fooling cnns with simple transformations. CoRR,
abs/1712.02779, 2017. pages 170

[71] P. Ferrara, F. Logozzo, and M. Fähndrich. Safer unsafe code for .net. SIG-
PLAN Not., 43:329–346, 2008. pages 58

[72] M. Fischer, M. Baader, and M. Vechev. Certification of semantic perturbations


via randomized smoothing, 2020. pages 170

[73] M. Fischer, M. Balunovic, D. Drachsler-Cohen, T. Gehr, C. Zhang, and M. T.


Vechev. DL2: training and querying neural networks with logic. In Proc.
International Conference on Machine Learning (ICLR), volume 97 of Proceedings
of Machine Learning Research, pages 1931–1941, 2019. pages 171

[74] T. Fischer and C. Krauss. Deep learning with long short-term memory net-
works for financial market predictions. European Journal of Operational Re-
search, 270(2):654–669, 2018. pages 11

[75] K. Fukuda and A. Prodon. Double description method revisited. In M. Deza,


R. Euler, and I. Manoussakis, editors, Combinatorics and Computer Science,
pages 91–111, 1996. pages 28, 86, 164

[76] G. Gange, J. A. Navas, P. Schachte, H. Søndergaard, and P. J. Stuckey. Exploit-


ing Sparsity in Difference-Bound Matrices, pages 189–211. 2016. pages 87

[77] P. Garg, D. Neider, P. Madhusudan, and D. Roth. Learning invariants us-


ing decision trees and implication counterexamples. In Proc. Symposium on
Principles of Programming Languages (POPL), page 499–512, 2016. pages 108

[78] T. Gehr, M. Mirman, D. Drachsler-Cohen, P. Tsankov, S. Chaudhuri, and


M. Vechev. AI2: Safety and robustness certification of neural networks with
abstract interpretation. In Proc. IEEE Symposium on Security and Privacy (SP),
volume 00, pages 948–963, 2018. pages 1, 13, 14, 114, 121, 124, 137, 138, 148,
154, 155, 163, 168

[79] A. Geramifard, T. J. Walsh, and S. Tellex. A Tutorial on Linear Function Approx-


imators for Dynamic Programming and Reinforcement Learning. Now Publishers
Inc., Hanover, MA, USA, 2013. pages 92

[80] E. Gershuni, N. Amit, A. Gurfinkel, N. Narodytska, J. A. Navas, N. Rinetzky,


L. Ryzhyk, and M. Sagiv. Simple and precise static analysis of untrusted linux
kernel extensions. In Proc. Programming Language Design and Implementation
(PLDI), page 1069–1084, 2019. pages 1, 2

[81] K. Ghorbal, E. Goubault, and S. Putot. The zonotope abstract domain tay-
lor1+. In Proc. Computer Aided Verification (CAV), pages 627–633, 2009. pages
14, 59, 114, 137

[82] K. Ghorbal, E. Goubault, and S. Putot. A logical product approach to zono-


tope intersection. In Proc. Computer Aided Verification (CAV), page 212–226,
2010. pages 59

[83] R. Giacobazzi, F. Ranzato, and F. Scozzari. Making abstract interpretations


complete. J. ACM, 47(2):361–416, Mar. 2000. pages 59

[84] P. Godefroid, H. Peleg, and R. Singh. Learn&fuzz: Machine learning for


input fuzzing. In Proc. Automated Software Engineering (ASE), page 50–59,
2017. pages 108

[85] S. Gokulanathan, A. Feldsher, A. Malca, C. W. Barrett, and G. Katz. Sim-


plifying neural networks with the marabou verification engine. CoRR,
abs/1910.12396, 2019. pages 169

[86] I. Goodfellow, J. Shlens, and C. Szegedy. Explaining and harnessing adver-


sarial examples. In Proc. International Conference on Learning Representations
(ICLR), 2015. pages 11, 113, 170

[87] S. Gowal, K. Dvijotham, R. Stanforth, R. Bunel, C. Qin, J. Uesato, R. Arand-


jelovic, T. A. Mann, and P. Kohli. On the effectiveness of interval bound
propagation for training verifiably robust models. CoRR, abs/1810.12715,
2018. pages 170

[88] K. Grosse, N. Papernot, P. Manoharan, M. Backes, and P. D. McDaniel. Adver-


sarial perturbations against deep neural networks for malware classification.
CoRR, abs/1606.04435, 2016. pages 170

[89] S. Gu and L. Rigazio. Towards deep neural network architectures robust to


adversarial examples. In International Conference on Learning Representations
(ICLR), Workshop Track Proceedings, 2015. pages 170

[90] D. Guidotti, F. Leofante, L. Pulina, and A. Tacchella. Verification of neural net-


works: Enhancing scalability through pruning. CoRR, abs/2003.07636, 2020.
pages 17, 169

[91] A. Gurfinkel, T. Kahsai, A. Komuravelli, and J. A. Navas. The SeaHorn verifi-


cation framework. In Proc. Computer Aided Verification (CAV), pages 343–361,
2015. pages 10, 50, 79

[92] Gurobi Optimization, LLC. Gurobi optimizer reference manual, 2018. pages
158, 164

[93] N. Halbwachs, D. Merchat, and L. Gonnord. Some ways to reduce the space
dimension in polyhedra computations. Formal Methods in System Design
(FMSD), 29(1):79–95, 2006. pages 8, 21, 85

[94] N. Halbwachs, D. Merchat, and C. Parent-Vigouroux. Cartesian factoring of polyhedra in linear relation analysis. In Proc. Static Analysis Symposium (SAS), pages 355–365, 2003. pages 8, 21, 85

[95] J. He, M. Balunovic, N. Ambroladze, P. Tsankov, and M. T. Vechev. Learning to fuzz from symbolic execution with application to smart contracts. In Proc. Conference on Computer and Communications Security (CCS), pages 531–548, 2019. pages 108

[96] J. He, G. Singh, M. Püschel, and M. Vechev. Learning fast and precise numerical analysis. In Proc. Programming Language Design and Implementation (PLDI), pages 1112–1127. Association for Computing Machinery, 2020. pages viii, 11, 106, 107

[97] T. A. Henzinger and P.-H. Ho. A note on abstract interpretation strategies for hybrid automata. In Proc. Hybrid Systems II, pages 252–264, 1995. pages 1

[98] T. A. Henzinger, R. Jhala, R. Majumdar, and G. Sutre. Lazy abstraction. In Proc. Principles of Programming Languages (POPL), pages 58–70, 2002. pages 108

[99] K. Heo, H. Oh, and H. Yang. Learning a variable-clustering strategy for Octagon from labeled data generated by a static analysis. In Proc. Static Analysis Symposium (SAS), pages 237–256, 2016. pages 86, 89, 106

[100] K. Heo, H. Oh, and H. Yang. Resource-aware program analysis via online abstraction coarsening. In Proc. International Conference on Software Engineering (ICSE), 2019. pages 106, 107

[101] J. M. Howe and A. King. Logahedra: A new weakly relational domain. In Proc. Automated Technology for Verification and Analysis (ATVA), pages 306–320, 2009. pages 58

[102] J. L. Imbert. Fourier's elimination: Which to choose? Principles and Practice of Constraint Programming, pages 117–129, 1993. pages 26, 32, 68

[103] Y. Jacoby, C. W. Barrett, and G. Katz. Verifying recurrent neural networks using invariant inference. CoRR, abs/2004.02462, 2020. pages 169

[104] B. Jeannet and A. Miné. APRON: A library of numerical abstract domains for static analysis. In Proc. Computer Aided Verification (CAV), volume 5643, pages 661–667, 2009. pages 25, 28, 45, 50, 79

[105] M. Jeon, S. Jeong, and H. Oh. Precise and scalable points-to analysis via
data-driven context tunneling. Proc. ACM Program. Lang., 2(OOPSLA):140:1–
140:29, 2018. pages 106, 107

[106] K. Jia and M. Rinard. Efficient exact verification of binarized neural networks.
CoRR, abs/2005.03597, 2020. pages 169

[107] K. Jia and M. Rinard. Exploiting verified neural networks via floating point
numerical error. CoRR, abs/2003.03021, 2020. pages 17, 137

[108] R. Jia, A. Raghunathan, K. Göksel, and P. Liang. Certified robustness to adversarial word substitutions. In K. Inui, J. Jiang, V. Ng, and X. Wan, editors, Proc. Empirical Methods in Natural Language Processing (EMNLP), pages 4127–4140, 2019. pages 148, 169

[109] J.-H. Jourdan. Sparsity preserving algorithms for octagons. Electronic Notes in Theoretical Computer Science, 331:57–70, 2017. Workshop on Numerical and Symbolic Abstract Domains (NSAD). pages 87

[110] K. D. Julian, M. J. Kochenderfer, and M. P. Owen. Deep neural network compression for aircraft collision avoidance systems. CoRR, abs/1810.04240, 2018. pages 15, 164

[111] J. Bertrane, P. Cousot, R. Cousot, J. Feret, L. Mauborgne, A. Miné, and X. Rival. Static analysis by abstract interpretation of embedded critical software. SIGSOFT Softw. Eng. Notes, 36(1):1–8, 2011. pages 1

[112] M. Karr. Affine relationships among variables of a program. Acta Informatica, 6:133–151, 1976. pages 58

[113] G. Katz, C. W. Barrett, D. L. Dill, K. Julian, and M. J. Kochenderfer. Reluplex: An efficient SMT solver for verifying deep neural networks. In Proc. International Conference on Computer Aided Verification (CAV), pages 97–117, 2017. pages 13, 114, 148, 149, 165, 168

[114] G. Katz, D. A. Huang, D. Ibeling, K. Julian, C. Lazarus, R. Lim, P. Shah, S. Thakoor, H. Wu, A. Zeljić, D. L. Dill, M. J. Kochenderfer, and C. Barrett. The marabou framework for verification and analysis of deep neural networks. In Proc. Computer Aided Verification (CAV), pages 443–452, 2019. pages 13, 114, 148, 149, 168

[115] E. B. Khalil, P. L. Bodic, L. Song, G. L. Nemhauser, and B. Dilkina. Learning to branch in mixed integer programming. In Proc. AAAI Conference on Artificial Intelligence (AAAI), pages 724–731, 2016. pages 108

[116] E. B. Khalil, H. Dai, Y. Zhang, B. Dilkina, and L. Song. Learning combinatorial optimization algorithms over graphs. In Proc. Neural Information Processing Systems (NIPS), pages 6348–6358, 2017. pages 108

[117] C. Ko, Z. Lyu, L. Weng, L. Daniel, N. Wong, and D. Lin. POPQORN: quantifying robustness of recurrent neural networks. In Proc. International Conference on Machine Learning (ICML), volume 97 of Proceedings of Machine Learning Research, pages 3468–3477, 2019. pages 169

[118] A. Krizhevsky. Learning multiple layers of features from tiny images. Tech-
nical report, 2009. pages 138, 164

[119] S. Kulkarni, R. Mangal, X. Zhang, and M. Naik. Accelerating program analyses by cross-program training. In Proc. Object-Oriented Programming, Systems, Languages, and Applications (OOPSLA), pages 359–377, 2016. pages 107

[120] A. Kurakin, I. J. Goodfellow, and S. Bengio. Adversarial examples in the physical world. In Proc. International Conference on Learning Representations (ICLR). OpenReview.net, 2017. pages 170

[121] M. G. Lagoudakis and M. L. Littman. Learning to select branching rules in the dpll procedure for satisfiability. Electronic Notes in Discrete Mathematics, 9:344–359, 2001. pages 108

[122] V. Laviron and F. Logozzo. Subpolyhedra: A (more) scalable approach to infer linear inequalities. In Proc. Verification, Model Checking, and Abstract Interpretation (VMCAI), volume 5403, pages 229–244, 2009. pages 6, 21

[123] H. Le Verge. A note on Chernikova's algorithm. Technical Report 635, IRISA, 1992. pages 28

[124] Y. LeCun, L. Bottou, Y. Bengio, and P. Haffner. Gradient-based learning applied to document recognition. In Proc. of the IEEE, pages 2278–2324, 1998. pages 115, 138, 164

[125] M. Lécuyer, V. Atlidakis, R. Geambasu, D. Hsu, and S. Jana. Certified robustness to adversarial examples with differential privacy. In 2019 IEEE Symposium on Security and Privacy (SP), pages 656–672, 2019. pages 170

[126] G. Lederman, M. N. Rabe, S. Seshia, and E. A. Lee. Learning heuristics for quantified boolean formulas through reinforcement learning. In Proc. International Conference on Learning Representations (ICLR), 2020. pages 108

[127] J. Li, J. Liu, P. Yang, L. Chen, X. Huang, and L. Zhang. Analyzing deep
neural networks with symbolic propagation: Towards higher precision and
faster verification. In Proc. Static Analysis Symposium (SAS), volume 11822 of
Lecture Notes in Computer Science, pages 296–319, 2019. pages 13, 168

[128] J. Li, S. Qu, X. Li, J. Szurley, J. Z. Kolter, and F. Metze. Adversarial music: Real world audio adversary against wake-word detection system. In Proc. Advances in Neural Information Processing Systems (NeurIPS), pages 11908–11918, 2019. pages 170

[129] L. Li, M. Weber, X. Xu, L. Rimanic, T. Xie, C. Zhang, and B. Li. Provable robust
learning based on transformation-specific smoothing. CoRR, abs/2002.12398,
2020. pages 170

[130] L. Li, Z. Zhong, B. Li, and T. Xie. Robustra: Training provable robust neural networks over reference adversarial space. In Proc. International Joint Conference on Artificial Intelligence (IJCAI), pages 4711–4717, 2019. pages 170

[131] P. Liang, O. Tripp, and M. Naik. Learning minimal abstractions. In Proc. Symposium on Principles of Programming Languages (POPL), pages 31–42, 2011. pages 89, 106, 107

[132] X. Lin, H. Zhu, R. Samanta, and S. Jagannathan. ART: abstraction refinement-guided training for provably correct neural networks. CoRR, abs/1907.10662, 2019. pages 171

[133] J. Liu, L. Chen, A. Miné, and J. Wang. Input validation for neural networks
via runtime local robustness verification. CoRR, abs/2002.03339, 2020. pages
17, 169

[134] F. Logozzo and M. Fähndrich. Pentagons: A weakly relational abstract domain for the efficient validation of array accesses. In Proc. Symposium on Applied Computing, pages 184–188, 2008. pages 1, 2, 6, 7, 24, 58

[135] J. Lu and M. P. Kumar. Neural network branching for neural network verification. In Proc. International Conference on Learning Representations (ICLR), 2020. pages 5, 13, 108, 114, 148, 149, 168, 175

[136] A. Madry, A. Makelov, L. Schmidt, D. Tsipras, and A. Vladu. Towards deep learning models resistant to adversarial attacks. In Proc. International Conference on Learning Representations (ICLR), 2018. pages 164, 165, 170

[137] A. Maréchal, D. Monniaux, and M. Périn. Scalable minimizing-operators on polyhedra via parametric linear programming. In Proc. Static Analysis Symposium (SAS), pages 212–231, 2017. pages 49, 86

[138] A. Maréchal and M. Périn. Efficient elimination of redundancies in polyhedra by raytracing. In Proc. Verification, Model Checking, and Abstract Interpretation (VMCAI), pages 367–385, 2017. pages 49, 86

[139] E. L. Merrer and G. Tredan. Tampernn: Efficient tampering detection of deployed neural nets, 2019. pages 175

[140] A. Miné. A new numerical abstract domain based on difference-bound matrices. In Proc. Programs As Data Objects (PADO), pages 155–172, 2001. pages 6, 24, 55, 57, 58, 76, 83

[141] A. Miné. Relational abstract domains for the detection of floating-point runtime errors. In Proc. European Symposium on Programming (ESOP), pages 3–17, 2004. pages 26, 133, 134

[142] A. Miné. The octagon abstract domain. Higher Order and Symbolic Computation, 19(1):31–100, 2006. pages 6, 24, 55, 57, 58, 76

[143] A. Miné, E. Rodriguez-Carbonell, and A. Simon. Speeding up polyhedral analysis by identifying common constraints. Electronic Notes in Theoretical Computer Science, 267(1):127–138, 2010. pages 57, 85

[144] M. Mirman, T. Gehr, and M. Vechev. Differentiable abstract interpretation for provably robust neural networks. In Proc. International Conference on Machine Learning (ICML), pages 3575–3583, 2018. pages 116, 122, 138, 164, 165, 170, 171

[145] M. Mirman, T. Gehr, and M. Vechev. Robustness certification of generative models, 2020. pages 169

[146] M. Mirman, G. Singh, and M. Vechev. A provable defense for deep residual
networks, 2019. pages viii

[147] M. Mirman, G. Singh, and M. T. Vechev. A provable defense for deep residual
networks. CoRR, abs/1903.12519, 2019. pages 170, 171

[148] M. Balunovic and M. Vechev. Adversarial training and provable defenses: Bridging the gap. In Proc. International Conference on Learning Representations (ICLR), 2020. pages 5, 17, 116, 122, 170, 171, 176

[149] J. Mohapatra, T. Weng, P. Chen, S. Liu, and L. Daniel. Towards verifying robustness of neural networks against semantic perturbations. CoRR, abs/1912.09533, 2019. pages 137, 148, 169

[150] J. X. Morris, E. Lifland, J. Y. Yoo, and Y. Qi. Textattack: A framework for adversarial attacks in natural language processing, 2020. pages 170

[151] T. S. Motzkin, H. Raiffa, G. L. Thompson, and R. M. Thrall. The double description method. In Proc. Contributions to the theory of games, vol. 2, pages 51–73. 1953. pages 23

[152] C. Müller, G. Singh, M. Püschel, and M. Vechev. Neural network robustness verification on gpus, 2020. pages viii, 116, 137, 147, 169

[153] N. Narodytska, S. P. Kasiviswanathan, L. Ryzhyk, M. Sagiv, and T. Walsh. Verifying properties of binarized deep neural networks. In Proc. AAAI Conference on Artificial Intelligence (AAAI), pages 6615–6624, 2018. pages 169

[154] N. Narodytska, H. Zhang, A. Gupta, and T. Walsh. In search for a sat-friendly binarized neural network architecture. In Proc. International Conference on Learning Representations (ICLR), 2020. pages 169

[155] A. M. Nguyen, J. Yosinski, and J. Clune. Deep neural networks are easily
fooled: High confidence predictions for unrecognizable images. In Proc. IEEE
Computer Vision and Pattern Recognition (CVPR), pages 427–436, 2015. pages
170

[156] H. Oh, W. Lee, K. Heo, H. Yang, and K. Yi. Selective context-sensitivity guided by impact pre-analysis. In Proc. Programming Language Design and Implementation (PLDI), pages 475–484, 2014. pages 2, 89, 105, 106

[157] H. Oh, H. Yang, and K. Yi. Learning a strategy for adapting a program analysis via bayesian optimisation. In Proc. Object-Oriented Programming, Systems, Languages, and Applications (OOPSLA), pages 572–588, 2015. pages 89, 105, 106

[158] K. Pei, Y. Cao, J. Yang, and S. Jana. Deepxplore: Automated whitebox testing
of deep learning systems. In Proc. Symposium on Operating Systems Principles
(SOSP), pages 1–18, 2017. pages 169, 170

[159] K. Pei, Y. Cao, J. Yang, and S. Jana. Towards practical verification of machine
learning: The case of computer vision systems. CoRR, abs/1712.01785, 2017.
pages 136

[160] C. Qin, K. D. Dvijotham, B. O'Donoghue, R. Bunel, R. Stanforth, S. Gowal, J. Uesato, G. Swirszcz, and P. Kohli. Verification of non-linear specifications for neural networks. In Proc. International Conference on Learning Representations (ICLR), 2019. pages 169

[161] Y. Qin, N. Carlini, G. W. Cottrell, I. J. Goodfellow, and C. Raffel. Imperceptible, robust, and targeted adversarial examples for automatic speech recognition. In Proc. International Conference on Machine Learning (ICML), volume 97 of Proceedings of Machine Learning Research, pages 5231–5240, 2019. pages 170

[162] A. Raghunathan, J. Steinhardt, and P. Liang. Certified defenses against adversarial examples. In Proc. International Conference on Machine Learning (ICML), 2018. pages 170

[163] A. Raghunathan, J. Steinhardt, and P. S. Liang. Semidefinite relaxations for certifying robustness to adversarial examples. In Advances in Neural Information Processing Systems (NeurIPS), pages 10877–10887. 2018. pages 13, 114, 148, 151, 152, 154, 163, 164, 168

[164] F. Ranzato and F. Tapparo. Strong preservation as completeness in abstract interpretation. In Proc. European Symposium on Programming (ESOP), pages 18–32, 2004. pages 59

[165] F. Ranzato and M. Zanella. Robustness verification of support vector machines. In Proc. Static Analysis Symposium (SAS), volume 11822 of Lecture Notes in Computer Science, pages 271–295, 2019. pages 169

[166] F. Ranzato and M. Zanella. Abstract interpretation of decision tree ensemble classifiers. In Proc. AAAI Conference on Artificial Intelligence (AAAI), pages 5478–5486, 2020. pages 169

[167] M. T. Ribeiro, S. Singh, and C. Guestrin. "Why should I trust you?": Explaining the predictions of any classifier. In Proc. Knowledge Discovery and Data Mining (KDD), pages 1135–1144, 2016. pages 176

[168] H. G. Rice. Classes of recursively enumerable sets and their decision problems. Transactions of the American Mathematical Society, 74(2):358–366, 1953. pages 7

[169] X. Rival and L. Mauborgne. The trace partitioning abstract domain. ACM
Trans. Program. Lang. Syst., 29(5), 2007. pages 116, 135, 136

[170] W. Ruan, X. Huang, and M. Kwiatkowska. Reachability analysis of deep neural networks with provable guarantees. In Proc. International Joint Conference on Artificial Intelligence (IJCAI), 2018. pages 13, 114, 148, 149, 168

[171] A. Ruoss, M. Balunovic, M. Fischer, and M. T. Vechev. Learning certified individually fair representations. CoRR, abs/2002.10312, 2020. pages 169

[172] W. Ryou, J. Chen, M. Balunovic, G. Singh, A. M. Dan, and M. T. Vechev. Fast and effective robustness certification for recurrent neural networks. CoRR, abs/2005.13300, 2020. pages viii, 16, 148, 169

[173] S. Sabour, Y. Cao, F. Faghri, and D. J. Fleet. Adversarial manipulation of deep representations. In Y. Bengio and Y. LeCun, editors, Proc. International Conference on Learning Representations (ICLR), 2016. pages 170

[174] H. Salman, J. Li, I. P. Razenshteyn, P. Zhang, H. Zhang, S. Bubeck, and G. Yang. Provably robust deep learning via adversarially trained smoothed classifiers. In Proc. Advances in Neural Information Processing Systems (NeurIPS), pages 11289–11300, 2019. pages 170

[175] H. Salman, G. Yang, H. Zhang, C. Hsieh, and P. Zhang. A convex relaxation barrier to tight robustness verification of neural networks. In Proc. Neural Information Processing Systems (NeurIPS), pages 9832–9842, 2019. pages 13, 15, 16, 114, 148, 149, 150, 152, 154, 163, 168

[176] S. Sankaranarayanan, M. A. Colón, H. Sipma, and Z. Manna. Efficient strongly relational polyhedral analysis. In Proc. Verification, Model Checking, and Abstract Interpretation (VMCAI), pages 111–125, 2006. pages 6, 21

[177] S. A. Seshia, S. Jha, and T. Dreossi. Semantic adversarial deep learning. IEEE
Des. Test, 37(2):8–18, 2020. pages 176

[178] D. She, K. Pei, D. Epstein, J. Yang, B. Ray, and S. Jana. NEUZZ: efficient
fuzzing with neural program smoothing. In Proc. IEEE Symposium on Security
and Privacy (S&P), pages 803–817, 2019. pages 108

[179] Z. Shi, H. Zhang, K. Chang, M. Huang, and C. Hsieh. Robustness verification for transformers. In Proc. International Conference on Learning Representations (ICLR), 2020. pages 148, 169

[180] D. Shriver, D. Xu, S. G. Elbaum, and M. B. Dwyer. Refactoring neural networks for verification. CoRR, abs/1908.08026, 2019. pages 17, 169

[181] X. Si, H. Dai, M. Raghothaman, M. Naik, and L. Song. Learning loop invariants for program verification. In Proc. Neural Information Processing Systems (NeurIPS), pages 7751–7762, 2018. pages 108

[182] A. Simon and A. King. Exploiting sparsity in polyhedral analysis. In Proc. Static Analysis Symposium (SAS), pages 336–351, 2005. pages 85

[183] A. Simon and A. King. The two variable per inequality abstract domain.
Higher Order Symbolic Computation (HOSC), 23:87–143, 2010. pages 55, 58

[184] A. Simon, A. Venet, G. Amato, F. Scozzari, and E. Zaffanella. Efficient constraint/generator removal from double description of polyhedra. Electronic Notes in Theoretical Computer Science, 307:3–15, 2014. pages 43, 86

[185] G. Singh, R. Ganvir, M. Püschel, and M. Vechev. Beyond the single neuron convex barrier for neural network certification. In Advances in Neural Information Processing Systems (NeurIPS), pages 15098–15109. 2019. pages vii, 15, 151

[186] G. Singh, T. Gehr, M. Mirman, M. Püschel, and M. Vechev. Fast and effective
robustness certification. In Proc. Advances in Neural Information Processing
Systems (NeurIPS), pages 10825–10836. 2018. pages vii, 13, 14, 15, 114, 122,
148, 149, 152, 166, 168

[187] G. Singh, T. Gehr, M. Püschel, and M. Vechev. Boosting robustness certification of neural networks. In Proc. International Conference on Learning Representations (ICLR), 2019. pages vii, 13, 15, 151, 152, 164, 166, 168

[188] G. Singh, T. Gehr, M. Püschel, and M. Vechev. An abstract domain for certifying neural networks. Proc. ACM Program. Lang., 3(POPL):41:1–41:30, 2019. pages vii, 13, 15, 16, 59, 116, 151, 152, 154, 163, 164, 166, 168, 169, 171

[189] G. Singh, M. Püschel, and M. Vechev. Making numerical program analysis fast. In Proc. Programming Language Design and Implementation (PLDI), pages 303–313, 2015. pages vii, 21, 55, 56, 63, 79, 81, 87, 95, 107

[190] G. Singh, M. Püschel, and M. Vechev. Fast polyhedra abstract domain. In Proc. Principles of Programming Languages (POPL), pages 46–59, 2017. pages vii, 8, 21, 55, 56, 63, 79, 95, 107, 137, 160

[191] G. Singh, M. Püschel, and M. Vechev. A practical construction for decomposing numerical abstract domains. Proc. ACM Program. Lang., 2(POPL):55:1–55:28, 2017. pages vii, 9, 56, 59, 95, 107, 121, 137, 154, 163, 164

[192] G. Singh, M. Püschel, and M. Vechev. Fast numerical program analysis with reinforcement learning. In Proc. Computer Aided Verification (CAV), pages 211–229, 2018. pages vii, 11, 91

[193] M. Sotoudeh and A. V. Thakur. Computing linear restrictions of neural networks. In Proc. Advances in Neural Information Processing Systems (NeurIPS), pages 14132–14143, 2019. pages 169

[194] M. Stoer and F. Wagner. A simple min-cut algorithm. J. ACM, 44(4):585–591, 1997. pages 96

[195] R. S. Sutton and A. G. Barto. Introduction to Reinforcement Learning. MIT Press, Cambridge, MA, USA, 1st edition, 1998. pages 89, 92, 102

[196] P. Tabacof and E. Valle. Exploring the space of adversarial images. In Proc.
International Joint Conference on Neural Networks (IJCNN), pages 426–433, 2016.
pages 170

[197] V. Tjeng, K. Y. Xiao, and R. Tedrake. Evaluating robustness of neural networks with mixed integer programming. In Proc. International Conference on Learning Representations (ICLR), 2019. pages 13, 114, 148, 149, 151, 152, 154, 158, 164, 166, 167, 168

[198] F. Tramèr, N. Carlini, W. Brendel, and A. Madry. On adaptive attacks to adversarial example defenses. CoRR, abs/2002.08347, 2020. pages 170

[199] H. Tran, X. Yang, D. M. Lopez, P. Musau, L. V. Nguyen, W. Xiang, S. Bak, and T. T. Johnson. NNV: the neural network verification tool for deep neural networks and learning-enabled cyber-physical systems. CoRR, abs/2004.05519, 2020. pages 13, 114, 148, 152, 168

[200] C. Urban and A. Miné. An abstract domain to infer ordinal-valued ranking functions. In Proc. European Symposium on Programming (ESOP), pages 412–431, 2014. pages 2, 24

[201] C. Urban and A. Miné. A decision tree abstract domain for proving conditional termination. In Proc. Static Analysis Symposium (SAS), pages 302–318, 2014. pages 2, 24

[202] N. Vasilakis, B. Karel, N. Roessler, N. Dautenhahn, A. DeHon, and J. M. Smith. Towards fine-grained, automated application compartmentalization. In Proc. Programming Languages and Operating Systems (PLOS), PLOS'17, pages 43–50, 2017. pages 175

[203] A. Venet. Abstract cofibered domains: Application to the alias analysis of untyped programs. In R. Cousot and D. A. Schmidt, editors, Proc. Static Analysis Symposium (SAS), volume 1145 of Lecture Notes in Computer Science, pages 366–382, 1996. pages 22

[204] A. Venet and G. Brat. Precise and efficient static array bound checking for large embedded C programs. In Proc. Programming Language Design and Implementation (PLDI), pages 231–242, 2004. pages 2, 87

[205] A. J. Venet. The Gauge domain: Scalable analysis of linear inequality invariants. In Proc. Computer Aided Verification (CAV), pages 139–154, 2012. pages 6

[206] S. Wang, K. Pei, J. Whitehouse, J. Yang, and S. Jana. Efficient formal safety
analysis of neural networks. In Proc. Advances in Neural Information Processing
Systems (NeurIPS), pages 6369–6379. 2018. pages 13, 114, 148, 151, 152, 154,
163, 164, 166, 168

[207] S. Wang, K. Pei, J. Whitehouse, J. Yang, and S. Jana. Formal security analysis of neural networks using symbolic intervals. In Proc. USENIX Security Symposium (USENIX Security 18), pages 1599–1614, 2018. pages 168

[208] C. J. C. H. Watkins and P. Dayan. Q-learning. Machine Learning, 8(3):279–292, 1992. pages 92

[209] S. Wei, P. Mardziel, A. Ruef, J. S. Foster, and M. Hicks. Evaluating design tradeoffs in numeric static analysis for java. In Proc. European Symposium on Programming (ESOP), pages 653–682, 2018. pages 2, 24

[210] Z. Wei, J. Chen, X. Wei, L. Jiang, T. Chua, F. Zhou, and Y. Jiang. Heuristic
black-box adversarial attacks on video recognition models. In Proc. AAAI
Conference on Artificial Intelligence (AAAI), pages 12338–12345, 2020. pages
170

[211] L. Weng, H. Zhang, H. Chen, Z. Song, C.-J. Hsieh, L. Daniel, D. Boning, and I. Dhillon. Towards fast computation of certified robustness for ReLU networks. In Proc. International Conference on Machine Learning (ICML), volume 80, pages 5276–5285, 2018. pages 13, 114, 137, 138, 139, 148, 149, 152, 154, 163, 164, 168, 169

[212] E. Wong and J. Z. Kolter. Provable defenses against adversarial examples via the convex outer adversarial polytope. In Proc. International Conference on Machine Learning (ICML), volume 80, pages 5283–5292, 2018. pages 13, 114, 148, 149, 154, 163, 164, 165, 168, 169, 170

[213] E. Wong, F. R. Schmidt, J. H. Metzen, and J. Z. Kolter. Scaling provable adversarial defenses. In Proc. Advances in Neural Information Processing Systems (NeurIPS), pages 8410–8419, 2018. pages 170

[214] M. Wu and M. Kwiatkowska. Robustness guarantees for deep neural networks on videos. CoRR, abs/1907.00098, 2019. pages 169

[215] K. Y. Xiao, V. Tjeng, N. M. M. Shafiullah, and A. Madry. Training for faster adversarial robustness verification via inducing relu stability. In Proc. International Conference on Learning Representations (ICLR), 2019. pages 169

[216] Y. Yang and M. Rinard. Correctness verification of neural networks. CoRR, abs/1906.01030, 2019. pages 169

[217] P. Chiang, R. Ni, A. Abdelkader, C. Zhu, C. Studer, and T. Goldstein. Certified defenses for adversarial patches. In Proc. International Conference on Learning Representations (ICLR), 2020. pages 148, 169, 175

[218] H. Yu and D. Monniaux. An efficient parametric linear programming solver and application to polyhedral projection. In Proc. Static Analysis Symposium (SAS), pages 203–224, 2019. pages 49, 86

[219] E. Zaffanella. On the efficiency of convex polyhedra. Electronic Notes in Theoretical Computer Science, 334:31–44, 2018. Seventh Workshop on Numerical and Symbolic Abstract Domains (NSAD 2017). pages 86

[220] H. Zhang, H. Chen, C. Xiao, S. Gowal, R. Stanforth, B. Li, D. S. Boning, and C. Hsieh. Towards stable and efficient training of verifiably robust neural networks. In Proc. International Conference on Learning Representations (ICLR), 2020. pages 170

[221] H. Zhang, T.-W. Weng, P.-Y. Chen, C.-J. Hsieh, and L. Daniel. Efficient neural
network robustness certification with general activation functions. In Proc.
Advances in Neural Information Processing Systems (NeurIPS). 2018. pages 13,
149, 152, 154, 163, 164, 168, 169, 171

[222] H. Zhu, S. Magill, and S. Jagannathan. A data-driven CHC solver. In Proc. Programming Language Design and Implementation (PLDI), pages 707–721, 2018. pages 108
