Deterministic Parallel Fixpoint Computation

SUNG KOOK KIM, University of California, Davis, U.S.A.


ARNAUD J. VENET, Facebook, Inc., U.S.A.
ADITYA V. THAKUR, University of California, Davis, U.S.A.
arXiv:1909.05951v2 [cs.PL] 14 Nov 2019

Abstract interpretation is a general framework for expressing static program analyses. It reduces the problem
of extracting properties of a program to computing an approximation of the least fixpoint of a system of
equations. The de facto approach for computing the approximation of this fixpoint uses a sequential algorithm
based on weak topological order (WTO). This paper presents a deterministic parallel algorithm for fixpoint
computation by introducing the notion of weak partial order (WPO). We present an algorithm for constructing
a WPO in almost-linear time. Finally, we describe Pikos, our deterministic parallel abstract interpreter, which
extends the sequential abstract interpreter IKOS. We evaluate the performance and scalability of Pikos on a
suite of 1017 C programs. When using 4 cores, Pikos achieves an average speedup of 2.06x over IKOS, with a
maximum speedup of 3.63x. When using 16 cores, Pikos achieves a maximum speedup of 10.97x.
CCS Concepts: • Software and its engineering → Automated static analysis; • Theory of computation
→ Program analysis.

Additional Key Words and Phrases: Abstract interpretation, Program analysis, Concurrency
ACM Reference Format:
Sung Kook Kim, Arnaud J. Venet, and Aditya V. Thakur. 2020. Deterministic Parallel Fixpoint Computation.
Proc. ACM Program. Lang. 4, POPL, Article 14 (January 2020), 33 pages. https://doi.org/10.1145/3371082

1 INTRODUCTION
Program analysis is a widely adopted approach for automatically extracting properties of the
dynamic behavior of programs [Balakrishnan et al. 2010; Ball et al. 2004; Brat and Venet 2005;
Delmas and Souyris 2007; Jetley et al. 2008]. Program analyses are used, for instance, for program
optimization, bug finding, and program verification. To be effective, a program analysis needs to be
efficient, precise, and deterministic (the analysis always computes the same output for the same
input program) [Bessey et al. 2010]. This paper aims to improve the efficiency of program analysis
without sacrificing precision or determinism.
Abstract interpretation [Cousot and Cousot 1977] is a general framework for expressing static
program analyses. A typical use of abstract interpretation to determine program invariants involves:
C1 An abstract domain A that captures relevant program properties. Abstract domains have been
developed to perform, for instance, numerical analysis [Cousot and Halbwachs 1978; Miné
2004, 2006; Oulamara and Venet 2015; Singh et al. 2017; Venet 2012], heap analysis [Rinetzky
et al. 2005; Wilhelm et al. 2000], and information flow [Giacobazzi and Mastroeni 2004].

Authors’ addresses: Sung Kook Kim, Computer Science, University of California, Davis, Davis, California, 95616, U.S.A.,
[email protected]; Arnaud J. Venet, Facebook, Inc. Menlo Park, California, 94025, U.S.A., [email protected]; Aditya V. Thakur,
Computer Science, University of California, Davis, Davis, California, 95616, U.S.A., [email protected].

Permission to make digital or hard copies of part or all of this work for personal or classroom use is granted without fee
provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and
the full citation on the first page. Copyrights for third-party components of this work must be honored. For all other uses,
contact the owner/author(s).
© 2020 Copyright held by the owner/author(s).
2475-1421/2020/1-ART14
https://doi.org/10.1145/3371082

Proc. ACM Program. Lang., Vol. 4, No. POPL, Article 14. Publication date: January 2020.

C2 An equation system X = F (X) over A that captures the abstract program behavior:
X1 = F1(X1, . . . , Xn), X2 = F2(X1, . . . , Xn), . . . , Xn = Fn(X1, . . . , Xn)    (1)
Each index i ∈ [1, n] corresponds to a control point of the program, the unknowns Xi of the
system correspond to the invariants to be computed for these control points, and each Fi is a
monotone operator incorporating the abstract transformers and control flow of the program.
C3 Computing an approximation of the least fixpoint of Eq. 1. The exact least solution of the system
can be computed using Kleene iteration starting from the least element of A^n provided A is
Noetherian. However, most interesting abstract domains require the use of widening to ensure
termination, which may result in an over-approximation of the invariants of the program. A
subsequent narrowing iteration tries to improve the post solution via a downward fixpoint
iteration. In practice, abstract interpreters compute an approximation of the least fixpoint. In
this paper, we use “fixpoint” to refer to such an approximation of the least fixpoint.
The iteration strategy specifies the order in which the equations in Eq. 1 are applied during fixpoint
computation and where widening is performed. For a given abstraction, the efficiency, precision,
and determinism of an abstract interpreter depend on the iteration strategy. The iteration strategy
is determined by the dependencies between the individual equations in Eq. 1. If this dependency
graph is acyclic, then the optimal iteration strategy is any topological order of the vertices in the
graph. This is not true when the dependency graph contains cycles. Furthermore, each cycle in the
dependency graph needs to be cut by at least one widening point.
Since its publication, Bourdoncle’s algorithm [Bourdoncle 1993] has become the de facto approach
for computing an efficient iteration strategy for abstract interpretation. Bourdoncle’s algorithm
determines the iteration strategy from the weak topological order (WTO) of the vertices in the
dependency graph corresponding to the equation system. However, there are certain disadvantages to
Bourdoncle’s algorithm: (i) the iteration strategy computed by Bourdoncle’s algorithm is inherently
sequential: WTO gives a total order of the vertices in the dependency graph; (ii) computing WTO
using Bourdoncle’s algorithm has a worst-case cubic time complexity; (iii) the mutually-recursive
nature of Bourdoncle’s algorithm makes it difficult to understand (even for seasoned practitioners
of abstract interpretation); and (iv) applying Bourdoncle’s algorithm, as is, to deep dependency
graphs can result in a stack overflow in practice [Crab 2018; ReDex 2017].
This paper addresses the above disadvantages of Bourdoncle’s algorithm by presenting a concurrent
iteration strategy for fixpoint computation in an abstract interpreter (§5). This concurrent
fixpoint computation can be efficiently executed on modern multi-core hardware. The algorithm for
computing our iteration strategy has a worst-case almost-linear time complexity, and lends itself to
a simple iterative implementation (§6). The resulting parallel abstract interpreter, however, remains
deterministic: for the same program, all possible executions of the parallel fixpoint computation
give the same result. In fact, the fixpoint computed by our parallel algorithm is the same as that
computed by Bourdoncle’s sequential algorithm (§7).
To determine the concurrent iteration strategy, this paper introduces the notion of a weak partial
order (WPO) for the dependency graph of the equation system (§4). WPO generalizes the notion of
WTO: a WTO is a linear extension of a WPO (§7). Consequently, the almost-linear time algorithm for
WPO can also be used to compute WTO. The algorithm for WPO construction handles dependency
graphs that are irreducible [Hecht and Ullman 1972; Tarjan 1973]. The key insight behind our
approach is to adapt algorithms for computing loop nesting forests [Ramalingam 1999, 2002] to the
problem of computing a concurrent iteration strategy for abstract interpretation.
We have implemented our concurrent fixpoint iteration strategy in a tool called Pikos (§8). Using
a suite of 1017 C programs, we compare the performance of Pikos against the state-of-the-art
abstract interpreter IKOS [Brat et al. 2014], which uses Bourdoncle’s algorithm (§9). When using 4
cores, Pikos achieves an average speedup of 2.06x over IKOS, with a maximum speedup of 3.63x.
We see that Pikos exhibits a larger speedup when analyzing programs that took longer to analyze
using IKOS. Pikos achieved an average speedup of 1.73x on programs for which IKOS took less
than 16 seconds, while Pikos achieved an average speedup of 2.38x on programs for which IKOS
took greater than 508 seconds. The scalability of Pikos depends on the structure of the program
being analyzed. When using 16 cores, Pikos achieves a maximum speedup of 10.97x.
The contributions of the paper are as follows:
• We introduce the notion of a weak partial order (WPO) for a directed graph (§4), and show
how this generalizes the existing notion of weak topological order (WTO) (§7).
• We present a concurrent algorithm for computing the fixpoint of a set of equations (§5).
• We present an almost-linear time algorithm for WPO and WTO construction (§6).
• We describe our deterministic parallel abstract interpreter Pikos (§8), and evaluate its performance
on a suite of C programs (§9).
§2 presents an overview of the technique; §3 presents mathematical preliminaries; §10 describes
related work; §11 concludes.

2 OVERVIEW
Abstract interpretation is a general framework that captures most existing approaches for static
program analyses and reduces extracting properties of programs to approximating their semantics
[Cousot and Cousot 1977; Cousot et al. 2019]. Consequently, this section is not meant to capture all
possible approaches to implementing abstract interpretation or describe all the complex optimizations
involved in a modern implementation of an abstract interpreter. Instead it is only meant to set
the appropriate context for the rest of the paper, and to capture the relevant high-level structure of
abstract-interpretation implementations such as IKOS [Brat et al. 2014].
Fixpoint equations. Consider the simple program P represented by its control flow graph (CFG)
in Figure 1(a). We will illustrate how an abstract interpreter would compute the set of values
that variable x might contain at each program point i in P. In this example, we will use the
standard integer interval domain [Cousot and Cousot 1976, 1977] represented by the complete
lattice ⟨Int, ⊑, ⊥, ⊤, ⊔, ⊓⟩ with Int ≝ {⊥} ∪ {[l, u] | l, u ∈ Z ∧ l ≤ u} ∪ {[−∞, u] | u ∈ Z} ∪ {[l, ∞] |
l ∈ Z} ∪ {[−∞, ∞]}. The partial order ⊑ on Int is interval inclusion with the empty interval ⊥ = ∅
encoded as [∞, −∞] and ⊤ = [−∞, ∞].
Figure 1(b) shows the corresponding equation system X = F (X), where X = (X0 , X1 , . . . , X8 ).
Each equation in this equation system is of the form Xi = Fi (X0 , X1 , . . . , X8 ), where the variable
Xi ∈ Int represents the interval value at program point i in P and Fi is monotone. The operator
+ represents the (standard) addition operator over Int. As is common (but not necessary), the
dependencies among the equations reflect the CFG of program P.
The exact least solution X^lfp of the equation system X = F(X) would give the required set of
values for variable x at program point i. Let X^0 = (⊥, ⊥, . . . , ⊥) and X^{i+1} = F(X^i), i ≥ 0 represent
the standard Kleene iterates, which converge to X^lfp.
Chaotic iteration. Instead of applying the function F during Kleene iteration, one can use chaotic
iterations [Cousot 1977; Cousot and Cousot 1977] and apply the individual equations Fi . The order
in which the individual equations are applied is determined by the chaotic iteration strategy.

[Figure 1. Panel (a), a control flow graph, is not reproduced here; its program points are labeled
0:, 1: x=0, 2:, 3: x=x+1, 4:, 5: x=x+1, 6: x=0, 7:, and 8: x=1.]

(b)  X0 = ⊤                          (c)  X0 = ⊤
     X1 = [0, 0]                          X1 = [0, 0]
     X2 = X1 ⊔ X3                         X2 = X2 ▽ (X1 ⊔ X3)
     X3 = X2 + [1, 1]                     X3 = X2 + [1, 1]
     X4 = X2 ⊔ X8 ⊔ X5 ⊔ X6               X4 = X4 ▽ (X2 ⊔ X8 ⊔ X5 ⊔ X6)
     X5 = X4 + [1, 1]                     X5 = X4 + [1, 1]
     X6 = [0, 0]                          X6 = [0, 0]
     X7 = X4                              X7 = X4
     X8 = [1, 1]                          X8 = [1, 1]

Fig. 1. (a) A simple program P that updates x; (b) Corresponding equation system for interval domain;
(c) Corresponding equation system with vertices 2 and 4 as widening points.

Widening. For non-Noetherian abstract domains, such as the interval abstract domain, termination
of this Kleene iteration sequence requires the use of a widening operator (▽) [Cousot 2015;
Cousot and Cousot 1977]. A set of widening points W is chosen and the equation for i ∈ W is replaced
by Xi = Xi ▽ Fi(X0, . . . , Xn). An admissible set of widening points “cuts” each cycle in the dependency
graph of the equation system by the use of a widening operator to ensure termination [Cousot
and Cousot 1977]. Finding a minimal admissible set of widening points is an NP-complete problem
[Garey and Johnson 2002]. A possible widening operator for the interval abstract domain is
defined by: ⊥ ▽ I = I ▽ ⊥ = I for I ∈ Int, and [i, j] ▽ [k, l] = [if k < i then −∞ else i, if l > j then ∞ else j].
This widening operator is non-monotone. The application of a widening operator may result in a
crude over-approximation of the least fixpoint; more sophisticated widening operators as well as
techniques such as narrowing can be used to ensure precision [Amato and Scozzari 2013; Amato
et al. 2016; Cousot and Cousot 1977; Gopan and Reps 2006; Kim et al. 2016]. Although the discussion
of our fixpoint algorithm uses a simple widening strategy (§5), our implementation incorporates
more sophisticated widening and narrowing strategies implemented in IKOS (§8).
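
As a worked illustration of the widening operator defined above, the following instances apply the definition bound by bound. The specific intervals are chosen here for illustration and are not taken from the original text. The last two lines indicate why the operator is non-monotone in its first argument: [0, 1] ⊑ [0, 2], yet widening each of them with [0, 2] gives [0, ∞] and [0, 2], and [0, ∞] ⋢ [0, 2].

```latex
\begin{align*}
[0,0] \mathbin{\triangledown} [0,1] &= [0,\infty]
  && \text{lower bound kept since } 0 \not< 0;\ \text{upper bound widened since } 1 > 0 \\
[1,1] \mathbin{\triangledown} [0,2] &= [-\infty,\infty]
  && \text{both bounds widened since } 0 < 1 \text{ and } 2 > 1 \\
[0,\infty] \mathbin{\triangledown} [0,\infty] &= [0,\infty]
  && \text{no bound changes, so the value is stable} \\
[0,1] \mathbin{\triangledown} [0,2] &= [0,\infty] \\
[0,2] \mathbin{\triangledown} [0,2] &= [0,2]
\end{align*}
```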
Bourdoncle’s approach. Bourdoncle [1993] introduces the notion of hierarchical total order (HTO)
of a set and weak topological order (WTO) of a directed graph (see §7). An admissible set of widening
points as well as a chaotic iteration strategy, called the recursive strategy, can be computed using
a WTO of the dependency graph of the equation system. A WTO for the equation system in
Figure 1(b) is T ≝ 0 8 1 (2 3) (4 5 6) 7. The set of elements between two matching parentheses
is called a component of the WTO, and the first element of a component is called the head of the
component. Notice that components are non-trivial strongly connected components (“loops”) in
the directed graph of Figure 1(a). Bourdoncle [1993] proves that the set of component heads is an
admissible set of widening points. For Figure 1(b), the set of heads {2, 4} is an admissible set of
widening points. Figure 1(c) shows the corresponding equation system that uses widening.
The iteration strategy generated using WTO T is S1 ≝ 0 8 1 [2 3]∗ [4 5 6]∗ 7, where an occurrence of i
in the sequence represents applying the equation for Xi, and [. . .]∗ is the “iterate until stabilization”
operator. A component is stabilized if iterating over its elements does not change their values.
The component heads 2 and 4 are chosen as widening points. The iteration sequence S1 should be
interpreted as “apply the equation for X0, then apply the equation for X8, then apply the equation for X1,
then repeatedly apply the equations for X2 and X3 until stabilization” and so on. Furthermore, Bourdoncle
[1993] showed that stabilization of a component can be detected by the stabilization of its head.
For instance, stabilization of component {2, 3} can be detected by the stabilization of its head 2.
This property minimizes the number of (potentially expensive) comparisons between abstract
values during fixpoint computation. For the equation system of Figure 1(c), the use of Bourdoncle’s
recursive iteration strategy would give us X7^fp = [0, ∞].
Asynchronous iterations. The iteration strategy produced by Bourdoncle’s approach is necessarily
sequential, because the iteration sequence is generated from a total order. One could
alternatively implement a parallel fixpoint computation using asynchronous iterations [Cousot
1977]: each processor i computes the new value of Xi accessing the shared state consisting of X
using the appropriate equation from Figure 1(c). However, the parallel fixpoint computation using
asynchronous iterations is non-deterministic; that is, the fixpoint computed might differ based
on the order in which the equations are applied (as noted by Monniaux [2005]). This
non-determinism is due to the non-monotonicity of widening. For example, if the iteration
sequence S2 ≝ 0 8 [4 5 6]∗ 1 [2 3]∗ [4 5 6]∗ 7 is used to compute the fixpoint for the equations in
Figure 1(c), then X7^fp = [−∞, ∞], which differs from the value computed using iteration sequence S1.
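
The dependence of the result on the iteration order can be replayed with a small script. The following is a minimal sketch, not the IKOS or Pikos implementation: it assumes intervals are encoded as Python tuples (with `None` for ⊥), the helper names `join`, `add`, `widen`, and `run` are hypothetical, and a component is iterated until the value of its head stops changing, following the stabilization check described for Bourdoncle's strategy. The equations are those of Figure 1(c).

```python
import math

BOT = None                         # the empty interval ⊥ (encoding is an assumption)
TOP = (-math.inf, math.inf)        # ⊤ = [−∞, ∞]

def join(a, b):
    """Least upper bound ⊔ of two intervals."""
    if a is BOT:
        return b
    if b is BOT:
        return a
    return (min(a[0], b[0]), max(a[1], b[1]))

def add(a, b):
    """Interval addition; ⊥ is absorbing."""
    if a is BOT or b is BOT:
        return BOT
    return (a[0] + b[0], a[1] + b[1])

def widen(a, b):
    """The widening ▽ defined in the text: unstable bounds jump to infinity."""
    if a is BOT:
        return b
    if b is BOT:
        return a
    lo = -math.inf if b[0] < a[0] else a[0]
    hi = math.inf if b[1] > a[1] else a[1]
    return (lo, hi)

# Equation system of Figure 1(c); vertices 2 and 4 are the widening points.
EQ = {
    0: lambda X: TOP,
    1: lambda X: (0, 0),
    2: lambda X: widen(X[2], join(X[1], X[3])),
    3: lambda X: add(X[2], (1, 1)),
    4: lambda X: widen(X[4], join(join(X[2], X[8]), join(X[5], X[6]))),
    5: lambda X: add(X[4], (1, 1)),
    6: lambda X: (0, 0),
    7: lambda X: X[4],
    8: lambda X: (1, 1),
}

def run(strategy):
    """Apply a strategy: an int applies one equation, a list is a [h ...]* component."""
    X = {i: BOT for i in EQ}
    for step in strategy:
        if isinstance(step, list):
            head = step[0]
            while True:
                old = X[head]
                for i in step:
                    X[i] = EQ[i](X)
                if X[head] == old:     # stabilization is detected at the head
                    break
        else:
            X[step] = EQ[step](X)
    return X

S1 = [0, 8, 1, [2, 3], [4, 5, 6], 7]
S2 = [0, 8, [4, 5, 6], 1, [2, 3], [4, 5, 6], 7]

print("S1:", run(S1)[7])
print("S2:", run(S2)[7])
```

Under these assumptions the sketch yields [0, ∞] for X7 with S1 and [−∞, ∞] with S2. In S2 the component {4, 5, 6} is first iterated while X2 is still ⊥, so the widening at vertex 4 sees the lower bound decrease from 1 to 0 and jumps to −∞, and this imprecision is never recovered.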
Our deterministic parallel fixpoint computation. In this paper, we present a parallel fixpoint
computation that is deterministic, and, in fact, gives the same result as Bourdoncle’s sequential
fixpoint computation (§7). Our approach generalizes Bourdoncle’s hierarchical total order and
weak topological order to hierarchical partial order (HPO) and weak partial order (WPO) (§4). The
iteration strategy is then based on the WPO of the dependency graph of the equation system. The
use of partial orders, instead of total orders, enables us to generate an iteration strategy that is
concurrent (§5). For the equation system in Figure 1(c), our approach would produce the iteration
sequence represented as S3 ≝ 0 ((1 [2 3]∗) | 8) [4 (5 | 6)]∗ 7, where | represents concurrent
execution. Thus, the iteration (sub)sequences 1 [2 3]∗ and 8 can be computed in parallel, as well as
the subsequences 5 and 6. However, unlike iteration sequence S2, the value for X4 in S3 will not be
computed until the component {2, 3} stabilizes. Intuitively, determinism is achieved by ensuring
that no element outside the component will read the value of an element in the component until the
component stabilizes. In our algorithm, the value of 2 is read by elements outside of the component
{2, 3} only after the component stabilizes. Similarly, the value of 4 will be read by 7 only after the
component {4, 5, 6} stabilizes. Parallel fixpoint computation based on a WPO results in the same
fixpoint as the sequential computation based on a WTO (§7).

3 MATHEMATICAL PRELIMINARIES
A binary relation R on set S is a subset of the Cartesian product of S and S; that is, R ⊆ S × S.
Given S ′ ⊆ S, let R⇂S ′ = R ∩ (S ′ × S ′). A relation R on set S is said to be one-to-one iff for all
w, x, y, z ∈ S, (x, z) ∈ R and (y, z) ∈ R implies x = y, and (w, x) ∈ R and (w, y) ∈ R implies x = y. A
transitive closure of a binary relation R, denoted by R+ , is the smallest transitive binary relation
that contains R. A reflexive transitive closure of a binary relation R, denoted by R∗ , is the smallest
reflexive transitive binary relation that contains R.
A preorder (S, R) is a set S and a binary relation R over S that is reflexive and transitive. A partial
order (S, R) is a preorder where R is antisymmetric. Two elements u, v ∈ S are comparable in a
partial order (S, R) if (u, v) ∈ R or (v, u) ∈ R. A linear (total) order or chain is a partial order in
which every pair of its elements are comparable. A partial order (S, R′) is an extension of a partial
order (S, R) if R ⊆ R′; an extension that is a linear order is called a linear extension. There exists a
linear extension for every partial order [Szpilrajn 1930].
Given a partial order (S, R), define ⌊⌊x⌉_R ≝ {y ∈ S | (x, y) ∈ R}, ⌊x⌉⌉_R ≝ {v ∈ S | (v, x) ∈ R},
and ⌊⌊x, y⌉⌉_R ≝ ⌊⌊x⌉_R ∩ ⌊y⌉⌉_R. A partial order (S, R) is a forest if for all x ∈ S, (⌊x⌉⌉_R, R) is a chain.

Example 3.1. Let (Y, T) be a partial order with Y = {y1, y2, y3, y4} and T = {(y1, y2), (y2, y3),
(y2, y4)}∗. Let Y′ = {y1, y2} ⊆ Y, then T⇂Y′ = {(y1, y1), (y1, y2), (y2, y2)}.

⌊⌊y1⌉_T = {y1, y2, y3, y4}   ⌊⌊y2⌉_T = {y2, y3, y4}   ⌊⌊y3⌉_T = {y3}   ⌊⌊y4⌉_T = {y4}
⌊y1⌉⌉_T = {y1}   ⌊y2⌉⌉_T = {y1, y2}   ⌊y3⌉⌉_T = {y1, y2, y3}   ⌊y4⌉⌉_T = {y1, y2, y4}


We see that the partial order (Y, T) is a forest because for all y ∈ Y, (⌊y⌉⌉_T, T) is a chain.
⌊⌊y1, y1⌉⌉_T = {y1}   ⌊⌊y1, y2⌉⌉_T = {y1, y2}   ⌊⌊y1, y3⌉⌉_T = {y1, y2, y3}   ⌊⌊y1, y4⌉⌉_T = {y1, y2, y4}
⌊⌊y4, y1⌉⌉_T = ∅   ⌊⌊y4, y2⌉⌉_T = ∅   ⌊⌊y4, y3⌉⌉_T = ∅   ⌊⌊y4, y4⌉⌉_T = {y4}
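
The sets in Example 3.1 can be recomputed mechanically. The sketch below is only an illustration of the definitions of this section: the function names (`rt_closure`, `up`, `down`, `between`, `restrict`, `is_forest`) are hypothetical, and relations are represented as Python sets of pairs.

```python
def rt_closure(elems, pairs):
    """Smallest reflexive transitive relation on elems that contains pairs."""
    rel = set(pairs) | {(e, e) for e in elems}
    changed = True
    while changed:
        changed = False
        for (a, b) in list(rel):
            for (c, d) in list(rel):
                if b == c and (a, d) not in rel:
                    rel.add((a, d))
                    changed = True
    return rel

def up(x, S, R):
    """Elements y with (x, y) in R, i.e. the set written ⌊⌊x⌉_R."""
    return {y for y in S if (x, y) in R}

def down(x, S, R):
    """Elements v with (v, x) in R, i.e. the set written ⌊x⌉⌉_R."""
    return {v for v in S if (v, x) in R}

def between(x, y, S, R):
    """The intersection ⌊⌊x⌉_R ∩ ⌊y⌉⌉_R, written ⌊⌊x, y⌉⌉_R."""
    return up(x, S, R) & down(y, S, R)

def restrict(R, sub):
    """The restriction R⇂S' = R ∩ (S' × S')."""
    return {(a, b) for (a, b) in R if a in sub and b in sub}

def is_chain(sub, R):
    """Every pair of elements of sub is comparable in R."""
    return all((a, b) in R or (b, a) in R for a in sub for b in sub)

def is_forest(S, R):
    """(S, R) is a forest if every down-set is a chain."""
    return all(is_chain(down(x, S, R), R) for x in S)

Y = ["y1", "y2", "y3", "y4"]
T = rt_closure(Y, [("y1", "y2"), ("y2", "y3"), ("y2", "y4")])

print(sorted(up("y2", Y, T)))
print(sorted(between("y1", "y4", Y, T)))
print(is_forest(Y, T))
```

The quadratic closure loop is chosen for brevity; it is adequate for four elements but is not how one would compute reachability on a large graph.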

A directed graph G(V, ↣) is defined by a set of vertices V and a binary relation ↣ over V. The
reachability among vertices is captured by the preorder ↣∗: there is a path from vertex u to vertex
v in G iff u ↣∗ v. G is a directed acyclic graph (DAG) iff (V, ↣∗) is a partial order. A topological order
of a DAG G corresponds to a linear extension of the partial order (V, ↣∗). We use G⇂V′ to denote
the subgraph (V ∩ V′, ↣⇂V′). Given a directed graph G(V, ↣), a depth-first numbering (DFN) is the
order in which vertices are discovered during a depth-first search (DFS) of G. A post depth-first
numbering (post-DFN) is the order in which vertices are finished during a DFS of G. A depth-first
tree (DFT) of G is a tree formed by the edges used to discover vertices during a DFS. Given a DFT
of G, an edge u ↣ v is called (i) a tree edge if v is a child of u in the DFT; (ii) a back edge if v is an
ancestor of u in the DFT; (iii) a forward edge if it is not a tree edge and v is a descendant of u in the
DFT; and (iv) a cross edge otherwise [Cormen et al. 2009]. In general, a directed graph might contain
multiple connected components and a DFS yields a depth-first forest (DFF). The lowest common
ancestor (LCA) of vertices u and v in a rooted tree T is a vertex that is an ancestor of both u and v
and that has the greatest depth in T [Tarjan 1979]. It is unique for all pairs of vertices.
A strongly connected component (SCC) of a directed graph G(V, ↣) is a subgraph of G such that
u ↣∗ v for all u, v in the subgraph. An SCC is trivial if it only consists of a single vertex without
any edges. A feedback edge set B of a graph G(V, ↣) is a subset of ↣ such that (V, (↣ \ B)∗) is a
partial order; that is, the directed graph G(V, ↣ \ B) is a DAG. The problem of finding the minimum
feedback edge set is NP-complete [Karp 1972].
Example 3.2. Let G(V, ↣) be the directed graph shown in Figure 1(a). The ids used to label the
vertices V of G correspond to a depth-first numbering (DFN) of the directed graph G. The following
lists the vertices in increasing post-DFN numbering: 3, 5, 6, 7, 4, 2, 1, 8, 0. Edges (3, 2), (5, 4), and
(6, 4) are back edges for the DFF that is assumed by the DFN, edge (8, 4) is a cross edge, and the
rest are tree edges. The lowest common ancestor (LCA) of 3 and 7 in this DFF is 2. The subgraphs
induced by the vertex sets {2, 3}, {4, 5}, {4, 6}, and {4, 5, 6} are all non-trivial SCCs. The minimum
feedback edge set of G is F = {(3, 2), (5, 4), (6, 4)}. We see that the graph G(V, ↣ \ F) is a DAG. ■
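
The numbering and edge classification of Example 3.2 can be checked with a short depth-first search. This is a sketch under stated assumptions: the successor lists below are inferred from the equations of Figure 1(b) and from the edge kinds listed in the example (the figure itself is not reproduced), the successor order is chosen so that discovery order matches the vertex labels, and `dfs_classify` is a hypothetical helper name. The recursive formulation is used only for brevity.

```python
def dfs_classify(succ, root):
    """Return (dfn, post, kinds): discovery order, finish order, and edge kinds."""
    dfn, post, kinds = {}, {}, {}

    def visit(u):
        dfn[u] = len(dfn)
        for v in succ[u]:
            if v not in dfn:
                kinds[(u, v)] = "tree"
                visit(v)
            elif v not in post:
                kinds[(u, v)] = "back"       # v is still open, so it is an ancestor of u
            elif dfn[v] > dfn[u]:
                kinds[(u, v)] = "forward"    # v is an already finished descendant of u
            else:
                kinds[(u, v)] = "cross"
        post[u] = len(post)

    visit(root)
    return dfn, post, kinds

# Assumed edges of the graph in Figure 1(a).
SUCC = {
    0: [1, 8], 1: [2], 2: [3, 4], 3: [2],
    4: [5, 6, 7], 5: [4], 6: [4], 7: [], 8: [4],
}

dfn, post, kinds = dfs_classify(SUCC, 0)
post_order = sorted(post, key=post.get)
back_edges = {e for e, k in kinds.items() if k == "back"}
cross_edges = {e for e, k in kinds.items() if k == "cross"}

print(post_order)
print(sorted(back_edges), sorted(cross_edges))
```

With these assumed edges, the finish order is 3, 5, 6, 7, 4, 2, 1, 8, 0, the back edges are (3, 2), (5, 4), and (6, 4), and (8, 4) is the only cross edge, in agreement with the example.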

4 AXIOMATIC CHARACTERIZATION OF WEAK PARTIAL ORDER


This section introduces the notion of Weak Partial Order (WPO), presents its axiomatic characterization,
and proves relevant properties. A constructive characterization is deferred to §6. The
notion of WPO is built upon the notion of a hierarchical partial order, which we define first.
A Hierarchical Partial Order (HPO) is a partial order (S, ⪯) overlaid with a nesting relation
N ⊆ S × S that structures the elements of S into well-nested hierarchical components. As we will
see in §5, the elements in a component are iterated over until stabilization in a fixpoint iteration
strategy, and the partial order enables concurrent execution.
Definition 4.1. A hierarchical partial order H is a 3-tuple (S, ⪯, N) such that:
H1. (S, ⪯) is a partial order.
H2. N ⊆ S × S is one-to-one.
H3. (x, h) ∈ N implies h ≺ x.
H4. Partial order (C_H, ⊆) is a forest, where C_H ≝ {⌊⌊h, x⌉⌉_⪯ | (x, h) ∈ N} is the set of components.
H5. For all h, x, u, v ∈ S, h ⪯ u ⪯ x and (x, h) ∈ N and u ⪯ v implies either x ≺ v or v ⪯ x. ■


For each (x, h) ∈ N, the set ⌊⌊h, x⌉⌉_⪯ ≝ {u ∈ S | h ⪯ u ⪯ x} defines a component of the HPO,
with x and h referred to as the exit and head of the component. A component can be identified
using either its head or its exit due to condition H2; we use C_h or C_x to denote a component with
head h and exit x. Condition H3 states that the nesting relation N is in the opposite direction of the
partial order ⪯. The reason for this convention will be clearer when we introduce the notion of
WPO (Definition 4.3), where we show that the nesting relation N has a connection to the feedback
edge set of the directed graph. Condition H4 implies that the set of components C_H is well-nested;
that is, two components should be either mutually disjoint or one must be a subset of the other.
Condition H5 states that if an element v depends upon an element u in a component C_x, then
either v depends on the exit x or v is in the component C_x. Recall that (x, h) ∈ N and h ⪯ u ⪯ x
implies u ∈ C_x by definition. Furthermore, v ⪯ x and u ⪯ v implies v ∈ C_x. Condition H5 ensures
determinism of the concurrent iteration strategy (§5); this condition ensures that the value of u
does not “leak” from C_x during fixpoint computation until the component C_x stabilizes.
Example 4.2. Consider the partial order (Y , T) defined in Example 3.1. Let N1 = {(y3 , y1 ), (y4 , y2 )}.
(Y, T, N1) violates condition H4. In particular, the components C_y3 = C_y1 = {y1, y2, y3} and C_y4 =
C_y2 = {y2, y4} are neither disjoint nor is one a subset of the other. Thus, (Y, T, N1) is not an HPO.
Let N2 = {(y3, y1)}. (Y, T, N2) violates condition H5. In particular, y2 ∈ C_y3 and (y2, y4) ∈ T, but
we do not have y3 ≺ y4 or y4 ⪯ y3 . Thus, (Y , T, N2 ) is not an HPO.
Let N3 = {(y2 , y1 )}. (Y , T, N3 ) is an HPO satisfying all conditions H1–H5. ■
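
The three cases of Example 4.2 can be checked with a few lines of code. The sketch below makes several assumptions: the order T of Example 3.1 is encoded through a hypothetical `PARENT` map of its covering pairs, H1 is taken for granted because T is a partial order by construction, and H4 is tested in the well-nested form described in the text (any two components are disjoint or one contains the other). The helper names are hypothetical.

```python
Y = ["y1", "y2", "y3", "y4"]
PARENT = {"y2": "y1", "y3": "y2", "y4": "y2"}   # covering pairs of T (assumed encoding)

def leq(a, b):
    """a ⪯ b in T: a equals b or a is reached by following PARENT from b."""
    while b is not None:
        if a == b:
            return True
        b = PARENT.get(b)
    return False

def lt(a, b):
    """Strict order a ≺ b."""
    return a != b and leq(a, b)

def component(h, x):
    """The component ⌊⌊h, x⌉⌉ = {u | h ⪯ u ⪯ x}."""
    return frozenset(u for u in Y if leq(h, u) and leq(u, x))

def violated(N):
    """List which of H2..H5 fail for (Y, T, N); N is a set of (exit, head) pairs."""
    bad = []
    exits = [x for x, _ in N]
    heads = [h for _, h in N]
    if len(set(exits)) < len(exits) or len(set(heads)) < len(heads):
        bad.append("H2")                         # N is not one-to-one
    if not all(lt(h, x) for x, h in N):
        bad.append("H3")                         # nesting must oppose the order
    comps = [component(h, x) for x, h in N]
    if not all(c.isdisjoint(d) or c <= d or d <= c for c in comps for d in comps):
        bad.append("H4")                         # components are not well-nested
    if not all(lt(x, v) or leq(v, x)
               for x, h in N
               for u in component(h, x)
               for v in Y if leq(u, v)):
        bad.append("H5")                         # a value leaks out of a component
    return bad

N1 = {("y3", "y1"), ("y4", "y2")}
N2 = {("y3", "y1")}
N3 = {("y2", "y1")}

for name, N in [("N1", N1), ("N2", N2), ("N3", N3)]:
    print(name, violated(N))
```

Under this encoding N1 fails the well-nestedness check, N2 fails only H5 (through u = y2 and v = y4), and N3 passes every check, matching the example. The sketch also reports H5 for N1, which the example does not discuss.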

Building upon the notion of an HPO, we now define a Weak Partial Order (WPO) for a directed
graph G(V, ↣). In the context of fixpoint computation, G represents the dependency graph of the
fixpoint equation system. To find an effective iteration strategy, the cyclic dependencies in G need
to be broken. In effect, a WPO partitions the preorder ↣∗ into a partial order →∗ and an edge set
defined using a nesting relation ⇢.
Definition 4.3. A weak partial order W for a directed graph G(V, ↣) is a 4-tuple (V, X, →, ⇢)
such that:
W1. V ∩ X = ∅.
W2. ⇢ ⊆ X × V, and for all x ∈ X, there exists v ∈ V such that x ⇢ v.
W3. → ⊆ (V ∪ X) × (V ∪ X).
W4. H(V ∪ X, →∗, ⇢) is a hierarchical partial order (HPO).
W5. For all u ↣ v, either (i) u →+ v, or (ii) u ∈ ⌊⌊v, x⌉⌉_{→∗} and x ⇢ v for some x ∈ X. ■

Condition W4 states that H(V ∪ X, →∗, ⇢) is an HPO. Consequently, (V ∪ X, →∗) is a partial
order and ⇢ plays the role of the nesting relation in Definition 4.1. We refer to the relation → in
WPO W as the scheduling constraints, the relation ⇢ as stabilization constraints, and the set X as
the exits. Furthermore, the notion of components C_H of an HPO H as defined in Definition 4.1
can be lifted to components of WPO C_W ≝ {⌊⌊h, x⌉⌉_{→∗} | x ⇢ h}. Condition W4 ensures that the
concurrent iteration strategy for the WPO is deterministic (§5). Condition W1 states that exits are
always new. Condition W2 states that X does not contain any unnecessary elements.
Condition W5 connects the relation ↣ of the directed graph G with relations → and ⇢ used in
the HPO H(V ∪ X, →∗, ⇢) in condition W4. Condition W5 ensures that all dependencies u ↣ v in
G are captured by the HPO H either via a relation in the partial order u →+ v or indirectly via the
component corresponding to x ⇢ v, as formalized by the following theorem:
Theorem 4.4. For graph G(V, ↣) and its WPO W(V, X, →, ⇢), ↣∗ ⊆ (→ ∪ ⇢)∗.
Proof. By property W5, for each u ↣ v, either u →+ v, or u ∈ ⌊⌊v, x⌉⌉_{→∗} and x ⇢ v for some
x ∈ X. For the latter, u ∈ ⌊⌊v, x⌉⌉_{→∗} implies that u →+ x ⇢ v. Thus, ↣∗ ⊆ (→ ∪ ⇢)∗. □


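[Figure 2: the drawings of panels (a) and (b) are not reproduced here. Panel (a) shows graph G1 on
vertices 1 to 10; panel (b) shows WPO W1 on the same vertices together with the exits x2, x3, and x6.]

Fig. 2. (a) Directed graph G1. Vertices V are labeled using depth-first numbering (DFN); (b) WPO W1 for G1
with exits X = {x2, x3, x6}.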

Example 4.5. Consider the directed graph G1(V, ↣) in Figure 2(a). Figure 2(b) shows a WPO
W1(V, X, →, ⇢) for G1, where X = {x2, x3, x6}, and satisfies all conditions in Definition 4.3. One
can verify that (V ∪ X, →∗, ⇢) satisfies all conditions in Definition 4.1 and is an HPO.
Suppose we were to remove x6 → 5 and instead add 6 → 5 to W1 (to more closely match the
edges in G1), then this change would violate condition H5, and hence condition W4.
If we were to only remove x6 → 5 from W1, then it would still satisfy condition W4. However,
this change would violate condition W5. ■

Definition 4.6. For graph G(V, ↣) and its WPO W(V, X, →, ⇢), the back edges of G with respect to
the WPO W, denoted by B_W, are defined as B_W ≝ {(u, v) ∈ ↣ | ∃x ∈ X. u ∈ ⌊⌊v, x⌉⌉_{→∗} ∧ x ⇢ v}. ■
In other words, (u, v) ∈ B_W if u ↣ v satisfies condition W5-(ii) in Definition 4.3. Theorem 4.7
proves that B_W is a feedback edge set for G, and Theorem 4.9 shows that the subgraph (V, ↣ \ B_W)
forms a DAG. Together these two theorems capture the fact that the WPO W(V, X, →, ⇢) partitions
the preorder ↣∗ of G(V, ↣) into a partial order →∗ and a feedback edge set B_W.
Theorem 4.7. For graph G(V, ↣) and its WPO W(V, X, →, ⇢), B_W is a feedback edge set for G.
Proof. Let v1 ↣ v2 ↣ · · · ↣ vn ↣ v1 be a cycle of n distinct vertices in G. We will show that there
exists i ∈ [1, n) such that vi ↣ vi+1 ∈ B_W; that is, vi ∈ ⌊⌊vi+1, x⌉⌉_{→∗} and x ⇢ vi+1 for some x ∈ X.
If this were not true, then v1 →+ · · · →+ vn →+ v1 (using W5). Therefore, v1 →+ vn and vn →+ v1,
which contradicts the fact that →∗ is a partial order. Thus, B_W cuts all cycles at least once and is a
feedback edge set. □
Example 4.8. For the graph G1(V, ↣) in Figure 2(a) and WPO W1(V, X, →, ⇢) in Figure 2(b),
B_W1 = {(4, 3), (8, 6), (5, 2)}. One can verify that B_W1 is a feedback edge set for G1. ■
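
Definition 4.6 and condition W5 can be exercised on the smaller graph of Figure 1(a). The following sketch rests on assumptions that should be read as such: the graph edges are inferred from the equations of Figure 1(b) and Example 3.2, and the WPO is a hand-built candidate with exits named `x2` and `x4`, chosen to be consistent with the iteration sequence S3 of §2 rather than taken from a figure of the paper. The helper names are hypothetical.

```python
def transitive_closure(edges):
    """All pairs (s, v) such that v is reachable from s by one or more edges."""
    succ = {}
    for u, v in edges:
        succ.setdefault(u, set()).add(v)
    reach = set()
    for s in succ:
        stack, seen = list(succ[s]), set()
        while stack:
            v = stack.pop()
            if v not in seen:
                seen.add(v)
                stack.extend(succ.get(v, ()))
        reach |= {(s, v) for v in seen}
    return reach

# Assumed dependency edges of the graph in Figure 1(a).
G = [(0, 1), (0, 8), (1, 2), (2, 3), (3, 2), (2, 4), (8, 4),
     (4, 5), (4, 6), (5, 4), (6, 4), (4, 7)]

# Hand-built candidate WPO (an assumption): scheduling and stabilization constraints.
SCHED = [(0, 1), (0, 8), (1, 2), (2, 3), (3, "x2"), ("x2", 4), (8, 4),
         (4, 5), (4, 6), (5, "x4"), (6, "x4"), ("x4", 7)]
STAB = [("x2", 2), ("x4", 4)]

PLUS = transitive_closure(SCHED)          # the relation written →+

def leq(a, b):
    """The reflexive transitive closure →∗ of the scheduling constraints."""
    return a == b or (a, b) in PLUS

# Back edges per Definition 4.6: u lies in the component with head v and exit x.
B_W = {(u, v) for (u, v) in G
       if any(h == v and leq(v, u) and leq(u, x) for (x, h) in STAB)}

# W5: every dependency is either a scheduling path or a back edge.
w5_holds = all((u, v) in PLUS or (u, v) in B_W for (u, v) in G)

# Feedback edge set check: removing B_W leaves no cycle.
rest = [e for e in G if e not in B_W]
is_dag = all(u != v for (u, v) in transitive_closure(rest))

print(sorted(B_W), w5_holds, is_dag)
```

For this candidate, B_W comes out as {(3, 2), (5, 4), (6, 4)}, which coincides with the minimum feedback edge set F of Example 3.2, and the remaining edges form a DAG as Theorems 4.7 and 4.9 predict.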

Theorem 4.9. For graph G(V, ↣) and its WPO W(V, X, →, ⇢), (↣ \ B_W)+ ⊆ →+.

Proof. Each edge (u, v) ∈ B_W satisfies W5-(ii) by definition. Therefore, all edges in (↣ \ B_W)
must satisfy W5-(i). Thus, u →+ v for all edges (u, v) ∈ (↣ \ B_W), and (↣ \ B_W)∗ ⊆ →∗. □
Given the tuple W(V, X, →, ⇢) and a set S, we use W⇂S to denote the tuple (V ∩ S, X ∩
S, →⇂S, ⇢⇂S). The following two theorems enable us to decompose a WPO into sub-WPOs, which
allows us to use structural induction when proving properties of WPOs.


Theorem 4.10. For graph G(V, ↣) and its WPO W(V, X, →, ⇢), W⇂C is a WPO for subgraph
G⇂C for all C ∈ C_W.
Proof. We show that W⇂C satisfies all conditions W1–W5 in Definition 4.3 for all C ∈ C_W.
Conditions W1, W2, W3, W4-[H1, H2, H3, H4] trivially hold.
[W4-H5] If v ∉ C, H5 is true because (u, v) ∉ →∗⇂C. Else, H5 is still satisfied with →∗⇂C.
[W5] We show that u →+ v implies u →+⇂C v, if u, v ∈ C. Let C = ⌊⌊h, x⌉⌉_{→∗} with x ⇢ h. If u →+⇂C v
is false, there exists w ∈ ⌊⌊u⌉_{→∗} ∩ ⌊v⌉⌉_{→∗} such that w ∉ C. However, u →+ w and w ∉ C implies
x →+ w (using H5). This contradicts that (V ∪ X, →∗) is a partial order, because w ∈ ⌊v⌉⌉_{→∗} and
v ∈ ⌊⌊h, x⌉⌉_{→∗} implies w →+ x. Thus, W5 is satisfied. □

Theorem 4.11. For graph G(V, ↣) and its WPO W(V, X, →, ⇢), if V ∪ X = ⌊⌊h, x⌉⌉_{→*} for some
(x, h) ∈ ⇢, then W⇂S is a WPO for subgraph G⇂S, where S ≝ V ∪ X \ {h, x}.
Proof. We show that W⇂S satisfies all conditions W1–W5 in Definition 4.3. Conditions W1, W2,
W3, W4-[H1, H2, H3, H4] trivially hold.
[W4-H5] x has no outgoing and h has no incoming scheduling constraints. Thus, W⇂S still
satisfies H5.
[W5] Case (i) is still satisfied because h only had outgoing scheduling constraints and x only had
incoming scheduling constraints. Case (ii) is still satisfied due to H2 and H4. □
Example 4.12. The decomposition of WPO W_1 for graph G_1 in Figure 2 is:

[Diagram: WPO W_1 over the elements 1, 2, 3, 4, x_3, 5, x_2, 10, 6, 7, 8, 9, x_6, with the sub-WPOs W⇂C_2, W⇂C_3, and W⇂C_6 outlined using dotted lines and the sub-WPOs W⇂S_2, W⇂S_3, and W⇂S_6 outlined using solid lines.]

C_W = {C_2, C_3, C_6}, where C_2 = {2, 3, 4, x_3, 5, x_2}, C_3 = {3, 4, x_3}, and C_6 = {6, 7, 8, 9, x_6}. As proved
in Theorem 4.10, W⇂C_2, W⇂C_3, and W⇂C_6 (shown using dotted lines) are WPOs for the subgraphs
G⇂C_2, G⇂C_3, and G⇂C_6, respectively. Furthermore, Theorem 4.11 is applicable to each of these WPOs.
Therefore, W⇂S_2, W⇂S_3, and W⇂S_6 (shown using solid lines) are WPOs for subgraphs G⇂S_2, G⇂S_3, and
G⇂S_6, respectively, where S_h = C_h \ {h, x_h} for h ∈ {2, 3, 6}. For example, S_6 = C_6 \ {6, x_6} = {7, 8, 9}.
Note that W⇂C_6 is a WPO for the subgraph induced by vertices {6, 7, 8, 9}, while W⇂S_6 is a WPO for the
subgraph induced by vertices {7, 8, 9}. ■
Definition 4.13. Given a WPO W, C ∈ C_W is a maximal component if there does not exist another
component C′ ∈ C_W such that C ⊂ C′. C_W^0 denotes the set of maximal components of W. ■

Theorem 4.14. For graph G(V, ↣) and its WPO W(V, X, →, ⇢), if there is a cycle in G consisting
of vertices V′, then there exists C ∈ C_W^0 such that V′ ⊆ C.
Proof. Assume that the theorem is false. Then, there exist multiple maximal components that
partition the vertices in the cycle. Let (u, v) be an edge in the cycle where u and v are in different
maximal components. By W5, u →⁺ v, and by H5, x_u →⁺ v, where x_u is the exit of the maximal
component that contains u. By the definition of the component, v →⁺ x_v, where x_v is the exit of
the maximal component that contains v. Therefore, x_u →⁺ x_v. Applying the same reasoning for all
such edges in the cycle, we get x_u →⁺ x_v →⁺ ··· →⁺ x_u. This contradicts the fact that (V ∪ X, →*) is
a partial order for the WPO W. □

14:10 Sung Kook Kim, Arnaud J. Venet, and Aditya V. Thakur

Init:
    forall v ∈ V, X[v ↦ (v is entry ? ⊤ : ⊥)]        forall v ∈ V ∪ X, N[v ↦ 0]

NonExit:
    v ∈ V        N(v) = NumSchedPreds(v)
    ──────────────────────────────────────────────────────────────
    N[v ↦ 0]     ApplyF(v)     forall v → w, N[w ↦ (N[w] + 1)]

CompStabilized:
    x ∈ X        N(x) = NumSchedPreds(x)        ComponentStabilized(x)
    ──────────────────────────────────────────────────────────────
    N[x ↦ 0]     forall x → w, N[w ↦ (N(w) + 1)]

CompNotStabilized:
    x ∈ X        N(x) = NumSchedPreds(x)        ¬ComponentStabilized(x)
    ──────────────────────────────────────────────────────────────
    N[x ↦ 0]     SetNForComponent(x)

ApplyF(v) ≝ X[v ↦ (v ∈ image of ⇢ ? X(v) ▽ F_v(X) : F_v(X))]
ComponentStabilized(x) ≝ ∃h ∈ V. x ⇢ h ∧ F_h(X) ⊑ X(h)
SetNForComponent(x) ≝ forall v ∈ C_x, N[v ↦ NumOuterSchedPreds(v, x)]
NumSchedPreds(v) ≝ |{u ∈ V ∪ X | u → v}|
NumOuterSchedPreds(v, x) ≝ |{u ∈ V ∪ X | u → v, u ∉ C_x, v ∈ C_x}|

Fig. 3. Deterministic concurrent fixpoint algorithm for WPO. X maps an element in V to its value. N maps
an element in V ∪ X to its count of executed scheduling predecessors. Operations on N are atomic.

Corollary 4.15. For G(V, ↣) and its WPO W(V, X, →, ⇢), if G is a non-trivial strongly connected
graph, then there exist h ∈ V and x ∈ X such that C_W^0 = {⌊⌊h, x⌉⌉_{→*}} and ⌊⌊h, x⌉⌉_{→*} = V ∪ X.

Proof. Because there exists a cycle in the graph, there must exist at least one component in
the WPO. Let h ∈ V and x ∈ X be the head and exit of a maximal component in W. Because V ∪ X
contains all elements in the WPO, ⌊⌊h, x⌉⌉_{→*} ⊆ V ∪ X. Now, suppose ⌊⌊h, x⌉⌉_{→*} ⊉ V ∪ X. Then,
there exists v ∈ V ∪ X such that v ∉ ⌊⌊h, x⌉⌉_{→*}. If v ∈ V, because the graph is strongly connected,
there exists a cycle v ↣⁺ h ↣⁺ v. Then, by Theorem 4.14, v ∈ ⌊⌊h, x⌉⌉_{→*}, which is a contradiction. If
v ∈ X, then there exists w ∈ V such that v ⇢ w by W2. Due to H4, w ∉ ⌊⌊h, x⌉⌉_{→*}. By the same
reasoning as the previous case, this leads to a contradiction. □

5 DETERMINISTIC CONCURRENT FIXPOINT ALGORITHM


This section describes a deterministic concurrent algorithm for computing a fixpoint of an equation
system. Given the equation system X = F(X) with dependency graph G(V, ↣), we first construct a
WPO W(V, X, →, ⇢). The algorithm in Figure 3 uses W to compute the fixpoint of X = F(X). It
defines a concurrent iteration strategy for a WPO: equations are applied concurrently by following
the scheduling constraints →, while stabilization constraints ⇢ act as “iterate until stabilization”
operators, checking the stabilization at the exits and iterating the components.
Except for the initialization rule Init, which is applied once at the beginning, rules in Figure 3
are applied concurrently whenever some element in V ∪ X satisfies the conditions. The algorithm
uses a value map X, which maps an element in V to its abstract value, and a count map N, which
maps an element in V ∪ X to its count of executed scheduling predecessors. Access to the value
map X is synchronized by scheduling constraints, and operations on N are assumed to be atomic.
Rule Init initializes values for elements in V to ⊥ except for the entry of the graph, whose value is
initialized to ⊤. The counts for elements in V ∪ X are all initialized to 0.


Rule NonExit applies to a non-exit element v ∈ V whose scheduling predecessors are all
executed (N(v) = NumSchedPreds(v)). This rule applies the function F_v to update the value X_v
(ApplyF(v)). The definition of the function ApplyF shows that the widening is applied at the image of
⇢ (see Theorem 5.3). The rule then notifies the scheduling successors of v that v has executed by
incrementing their counts. Because elements within a component can be iterated multiple times,
the count of an element is reset after its execution. If there is no component in the WPO, then only
the NonExit rule is applicable, and the algorithm reduces to a DAG scheduling algorithm.
Rules CompStabilized and CompNotStabilized are applied to an exit x (x ∈ X) whose scheduling
predecessors are all executed (N(x) = NumSchedPreds(x)). If the component C_x is stabilized,
CompStabilized is applied, and CompNotStabilized otherwise. A component is stabilized if
iterating it once more does not change the values of elements inside the component. Boolean
function ComponentStabilized checks the stabilization of C_x by checking the stabilization of its
head (see Theorem 5.5). Upon stabilization, rule CompStabilized notifies the scheduling successors
of x and resets the count for x.
Example 5.1. Consider WPO W_1 in Figure 2(b). An iteration sequence generated by the concurrent
fixpoint algorithm for WPO W_1 is:

    Time step in ℕ →
    Scheduled element   1   2   3   4   x3  3   4   x3  3   4   x3  5   x2  10
    u ∈ V ∪ X               6   7   8   x6  6   7   8   x6
                                9               9

The initial value of N(8) is 0. Applying NonExit to 7 and 9 increments N(8) to 2. N(8) now
equals NumSchedPreds(8), and NonExit is applied to 8. Applying NonExit to 8 updates X_8 by
applying the function F_8, increments N(x_6), and resets N(8) to 0. Due to the reset, the same thing
happens when C_6 is iterated once more.
The initial value of N(x_6) is 0. Applying NonExit to 8 increments N(x_6) to 1, which equals
NumSchedPreds(x_6). The stabilization of component C_6 is checked at x_6. If it is stabilized,
CompStabilized is applied to x_6, which increments N(5) and resets N(x_6) to 0. ■

If the component C_x is not stabilized, rule CompNotStabilized is applied instead. This rule does
not notify the scheduling successors of x, blocking further advancement until the component
stabilizes. To drive the iteration over C_x, SetNForComponent(x) sets the count of each element v in C_x
to NumOuterSchedPreds(v, x), which is the number of its scheduling predecessors not in C_x. In
particular, the count for the head of C_x, whose scheduling predecessors are all not in C_x, is set to
the number of all its scheduling predecessors, allowing rule NonExit to be applied to the head. The
map NumOuterSchedPreds(v, x), which returns the number of outer scheduling predecessors of v
w.r.t. component C_x, can be computed by running the WPO construction twice: in the first run,
compute NumSchedPreds; in the second run, initialize NumOuterSchedPreds to 0s, and set
NumOuterSchedPreds(v′, exit[v]) to NumSchedPreds(v′) minus the number of scheduling predecessors
of v′ found so far, if a scheduling constraint targeting v′ with v ≠ exit[v] is found in Line 26 and
Line 47 of Algorithm 2 in §6. The rule also resets the count for x.
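The rules of Figure 3 can be exercised with a small sequential simulation. The sketch below is an illustration under stated assumptions, not the Pikos implementation: it replaces concurrency with a FIFO queue of ready elements, uses an interval domain with a standard interval widening, and lets the entry equation produce a constant instead of initializing the entry to ⊤. The WPO, the equations `F`, and the helpers `solve`, `join`, `meet`, `leq`, `widen`, and `add` are all hypothetical. The analyzed program is a loop `x = 0; while (x < 10) x = x + 1;` with head 2 and exit `x2`.

```python
from collections import deque
import math

INF, BOT = math.inf, None  # BOT is the bottom interval

def join(a, b):
    if a is BOT: return b
    if b is BOT: return a
    return (min(a[0], b[0]), max(a[1], b[1]))

def meet(a, b):
    if a is BOT or b is BOT: return BOT
    lo, hi = max(a[0], b[0]), min(a[1], b[1])
    return (lo, hi) if lo <= hi else BOT

def leq(a, b):
    if a is BOT: return True
    if b is BOT: return False
    return b[0] <= a[0] and a[1] <= b[1]

def widen(a, b):  # unstable bounds jump to infinity
    if a is BOT: return b
    if b is BOT: return a
    return (a[0] if a[0] <= b[0] else -INF, a[1] if a[1] >= b[1] else INF)

def add(a, k):
    return BOT if a is BOT else (a[0] + k, a[1] + k)

def reach(adj, src):
    seen, stack = {src}, [src]
    while stack:
        for w in adj.get(stack.pop(), ()):
            if w not in seen:
                seen.add(w); stack.append(w)
    return seen

def solve(elems, sched, stab, F):
    succ, pred = {}, {}
    for u, v in sched:
        succ.setdefault(u, []).append(v)
        pred.setdefault(v, []).append(u)
    head = dict(stab)                       # exit -> head
    heads = set(head.values())              # widening points (image of the stabilization constraints)
    comp = {x: reach(succ, h) & reach(pred, x) for x, h in head.items()}
    npreds = {v: len(pred.get(v, ())) for v in elems}   # NumSchedPreds
    val = {v: BOT for v in elems if v not in head}
    count = {v: 0 for v in elems}
    ready, trace = deque(v for v in elems if npreds[v] == 0), []

    def notify(u):                          # increment counts of scheduling successors
        for w in succ.get(u, ()):
            count[w] += 1
            if count[w] == npreds[w]:
                ready.append(w)

    while ready:
        v = ready.popleft()
        trace.append(v)
        count[v] = 0
        if v not in head:                   # rule NonExit
            new = F[v](val)
            val[v] = widen(val[v], new) if v in heads else new
            notify(v)
        elif leq(F[head[v]](val), val[head[v]]):        # rule CompStabilized
            notify(v)
        else:                               # rule CompNotStabilized
            for w in comp[v]:               # SetNForComponent
                count[w] = sum(1 for u in pred.get(w, ()) if u not in comp[v])
                if count[w] == npreds[w]:
                    ready.append(w)
    return val, trace

elems = [1, 2, 3, "x2", 4]
sched = [(1, 2), (2, 3), (3, "x2"), ("x2", 4)]
stab = [("x2", 2)]
F = {
    1: lambda X: (0, 0),                          # x = 0
    2: lambda X: join(X[1], X[3]),                # loop head
    3: lambda X: add(meet(X[2], (-INF, 9)), 1),   # x < 10; x = x + 1
    4: lambda X: meet(X[2], (10, INF)),           # after the loop: x >= 10
}
val, trace = solve(elems, sched, stab, F)
print(trace)
print(val)
```

In this run the component is iterated twice: the first check at `x2` fails, the head is rescheduled because its only scheduling predecessor lies outside the component, and widening at the head then yields a stable value.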
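Example 5.2. Let G_2(V, ↣) be [diagram: vertices 1, 2, 3, 4 in a row, with vertex 5 above and vertex 6
below] and its WPO W_2 be [diagram: elements 1, 2, 3, 4, x_3, x_2 in a row, with element 5 above and
element 6 below].
An iteration sequence generated by the concurrent fixpoint algorithm for WPO W_2 is:

    Time step in ℕ →
    Scheduled element   1   2   3   4   x3  3   4   x3  x2  2   3   4   x3  x2
    u ∈ V ∪ X               6   5                               5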


Consider the element 4, whose scheduling predecessors in the WPO are 3, 5, and 6. Further-
more, 4 ∈ C 3 and 4 ∈ C 2 with C 3 ⊊ C 2 . After NonExit is applied to 4, N (4) is reset to 0.
Then, if the stabilization check of C 3 fails at x 3 , CompNotStabilized sets N (4) to 2, which is
NumOuterSchedPreds(4, x 3 ). If it is not set to 2, then the fact that elements 5 and 6 are executed will
not be reflected in N (4), and the iteration over C 3 will be blocked at element 4. If the stabilization
check of C 2 fails at x 2 , CompNotStabilized sets N (4) to NumOuterSchedPreds(4, x 2 ) = 1. ■
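The counts in this example can be reproduced from the definitions of NumSchedPreds and NumOuterSchedPreds in Figure 3. The sketch below is only illustrative: the scheduling constraints in `sched` are an assumed reconstruction of W_2 that is consistent with the facts stated in the example (element 4 has scheduling predecessors 3, 5, and 6, with 4 ∈ C_3 ⊊ C_2), and the function names are hypothetical.

```python
def reach(adj, src):
    """Return the set of nodes reachable from src (src included)."""
    seen, stack = {src}, [src]
    while stack:
        for w in adj.get(stack.pop(), ()):
            if w not in seen:
                seen.add(w)
                stack.append(w)
    return seen

# Assumed scheduling and stabilization constraints of W_2 (a reconstruction).
sched = {(1, 2), (1, 6), (2, 3), (2, 5), (3, 4), (5, 4), (6, 4),
         (4, "x3"), ("x3", "x2")}
stab = {"x3": 3, "x2": 2}   # exit -> head

succ, pred = {}, {}
for u, v in sched:
    succ.setdefault(u, set()).add(v)
    pred.setdefault(v, set()).add(u)

# Component C_x: every element between the head and the exit in the scheduling order.
comp = {x: reach(succ, h) & reach(pred, x) for x, h in stab.items()}

def num_sched_preds(v):
    return len(pred.get(v, ()))

def num_outer_sched_preds(v, x):
    # Scheduling predecessors of v that lie outside component C_x.
    assert v in comp[x]
    return sum(1 for u in pred.get(v, ()) if u not in comp[x])

print(num_sched_preds(4))              # predecessors 3, 5, 6
print(num_outer_sched_preds(4, "x3"))  # 5 and 6 are outside C_3
print(num_outer_sched_preds(4, "x2"))  # only 6 is outside C_2
```

Under this reconstruction the head of each component has all of its scheduling predecessors outside the component, which is why resetting the counts makes the head immediately schedulable.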

In ApplyF, the image of ⇢ is chosen as the set of widening points. These are the heads of the
components. The following theorem proves that the set of component heads is an admissible set of
widening points, which guarantees the termination of the fixpoint computation:
Theorem 5.3. Given a dependency graph G(V, ↣) and its WPO W(V, X, →, ⇢), the set of
component heads is an admissible set of widening points.
Proof. Theorem 4.7 proves that B_W ≝ {(u, v) ∈ ↣ | ∃x ∈ X. u ∈ ⌊⌊v, x⌉⌉_{→*} ∧ x ⇢ v} is a
feedback edge set. Consequently, the set of component heads {h | ∃x ∈ X. x ⇢ h} is a feedback vertex
set. Therefore, this set is an admissible set of widening points [Cousot and Cousot 1977]. □
Example 5.4. The set of component heads {2, 3, 6} is an admissible set of widening points for the
WPO W1 in Figure 2(b). ■
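The feedback-vertex-set property used in the proof of Theorem 5.3 is easy to test on a concrete graph: removing the component heads must leave an acyclic graph. The following Python sketch assumes a hypothetical dependency graph with heads 2 and 3 (it is not Figure 2) and uses the standard-library `graphlib` module (Python 3.9 or later) for cycle detection; `is_acyclic` and `without` are illustrative helper names.

```python
from graphlib import TopologicalSorter, CycleError

def is_acyclic(vertices, edges):
    """Return True if the directed graph (vertices, edges) has no cycle."""
    ts = TopologicalSorter({v: set() for v in vertices})
    for u, v in edges:
        ts.add(v, u)          # v depends on u
    try:
        ts.prepare()          # raises CycleError when a cycle exists
        return True
    except CycleError:
        return False

def without(vertices, edges, removed):
    """Remove a set of vertices together with their incident edges."""
    vs = set(vertices) - set(removed)
    return vs, {(u, v) for u, v in edges if u in vs and v in vs}

# Hypothetical dependency graph with cycles 3-4, 2-3-4, and 2-5-4.
vertices = {1, 2, 3, 4, 5, 6}
edges = {(1, 2), (1, 6), (2, 3), (2, 5), (3, 4), (5, 4), (6, 4), (4, 3), (4, 2)}
heads = {2, 3}   # assumed component heads of a WPO for this graph

print(is_acyclic(vertices, edges))                      # the graph itself is cyclic
print(is_acyclic(*without(vertices, edges, heads)))     # heads cut every cycle
print(is_acyclic(*without(vertices, edges, {3})))       # head 3 alone is not enough
```

The last check shows why both heads are needed here: the cycle through 2, 5, and 4 does not pass through vertex 3.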

The following theorem justifies our definition of ComponentStabilized; viz., checking the
stabilization of X_h is sufficient for checking the stabilization of the component C_h.
Theorem 5.5. During the execution of the concurrent fixpoint algorithm with WPO W(V, X, →, ⇢),
stabilization of the head h implies the stabilization of the component C_h at its exit for all C_h ∈ C_W.
Proof. Suppose there exists an element v ∈ C_h that is not stabilized despite the stabilization of h.
That is, X_v changes if C_h is iterated once more. For this to be possible, there must exist u such that
u ↣ v and whose value X_u changed after the last application of function F_v. By W5 and u ↣ v, it is
either (i) u →⁺ v or (ii) u ∈ ⌊⌊v, x⌉⌉_{→*} = C_v and x ⇢ v for some x ∈ X. It cannot be case (i) because
if it were true, function F_u cannot be applied after F_v. Even if u were in some other component
C_x′, due to H5, x′ →⁺ v, resulting in the same conclusion. Therefore, it should be case (ii). By H4,
C_v ⊊ C_h. However, because u ∈ C_v, our algorithm checks the stabilization of v at the exit of C_v
after the last application of function F_u. This contradicts the assumption that X_u changed after the
last application of function F_v. □
A WPO W(V, X, →, ⇢) where V = {v}, X = → = ⇢ = ∅ is said to be a trivial WPO, which is
drawn as the single element v. It can only be a WPO for a trivial SCC with vertex v. A WPO W(V, X, →, ⇢)
where V = {h}, X = {x}, → = {(h, x)}, ⇢ = {(x, h)} is said to be a self-loop WPO, and is drawn
as the pair h x. It can only be a WPO for a trivial SCC with vertex h or a single vertex h with a self-loop.
The following theorem proves that the concurrent fixpoint algorithm in Figure 3 is deterministic.
Theorem 5.6. Given a WPO W(V, X, →, ⇢) for a graph G(V, ↣) and a set of monotone,
deterministic functions {F_v | v ∈ V}, the concurrent fixpoint algorithm in Figure 3 is deterministic,
computing the same approximation of the least fixpoint for the given set of functions.
Proof. We use structural induction on the WPO W to show this.
[Base case]: The two cases for the base case are (i) W is the trivial WPO v and (ii) W is the self-loop
WPO h x. If W is the trivial WPO v, v is the only vertex in G. Functions are assumed to be
deterministic, so applying the function F_v() in rule NonExit of Figure 3 is deterministic. Because
F_v() does not take any arguments, the computed value X_v is a unique fixpoint of F_v().


If W is the self-loop WPO h x, h is the only vertex in G. If h has a self-loop, function F_h(X_h) and widening
operator ▽ may need to be applied multiple times to reach a post-fixpoint (approximation of the least
fixpoint) of F_h(X_h). Rule NonExit applies them once on X_h and signals the exit x. If the post-fixpoint
of F_h(X_h) is not reached, ComponentStabilized returns false, and rule CompNotStabilized in
Figure 3 applies rule NonExit on h again. If the post-fixpoint is reached, ComponentStabilized
returns true, and rule CompStabilized in Figure 3 stops the algorithm. Because F_h(X_h) and ▽ are
deterministic, each iteration is deterministic, and the entire sequence of iterations is deterministic.
The computed value X_h is a post-fixpoint of F_h(X_h).
[Inductive step]: By condition H4 and Theorem 4.10, W can be decomposed into a set of WPOs
of its maximal components and trivial WPOs. The two cases for the inductive step are (i) the
decomposition of W is a single WPO of the maximal component and (ii) W is decomposed into
multiple WPOs.
If the decomposition of W is a single WPO of the maximal component ⌊⌊h, x⌉⌉_{→*}, then by
Theorem 4.11, W consists of the head h, the sub-WPO W⇂S, and the exit x, where
S ≝ ⌊⌊h, x⌉⌉_{→*} \ {h, x} and W⇂S is a WPO. By the
induction hypothesis, the fixpoint algorithm is deterministic for W⇂S. The head h of W is its unique
source, so each iteration begins with the application of rule NonExit on h. After applying F_h(·) and
▽, rule NonExit signals all its scheduling successors, initiating the iteration over W⇂S. Because
all sinks of W⇂S are connected to the exit x of W, x will be processed after the iteration finishes.
Thus, X_h remains fixed during the iteration. A single iteration over W⇂S is identical to running
the fixpoint algorithm on W⇂S with the set of functions {F′_v | v ∈ V′}, where W⇂S is a WPO for
subgraph G⇂S and function F′_v is a partial application of F_v that binds the argument that takes X_h to
its current value. The number of functions and the arity of each function decrease. Because F_h(·)
and ▽ are deterministic, and each iteration over W⇂S is deterministic, the algorithm is deterministic for W.
The algorithm iterates until the post-fixpoint of F_h(·) is reached, and by Theorem 5.5, the computed
value X_v is a post-fixpoint of F_v(·) for all v ∈ V.
If W is decomposed into multiple WPOs, then by the induction hypothesis, the fixpoint algorithm
deterministically computes the post-fixpoints for all sub-WPOs. Let W_i(V_i, X_i, →⇂(V_i ∪ X_i), ⇢⇂(V_i ∪ X_i))
be an arbitrary sub-WPO. For any u ∈ V \ V_i such that u ↣ v, we have u →⁺ v by W5 and H4.
Hence, v is processed after u. Combined with H5, X_u remains unchanged during the iteration of
W_i. A single iteration over W_i is equal to running the fixpoint algorithm on W_i with the set of
functions {F′_v | v ∈ V′}, where function F′_v is a partial application of F_v that binds the arguments
in V \ V_i to their current values. The number of functions decreases, and the arity of the functions
does not increase. The outer scheduling predecessors of W_i can be ignored in the iterations by
SetNForComponent in rule CompNotStabilized. Therefore, by the induction hypothesis, each
iteration over W_i is deterministic, and because the choice of W_i is arbitrary, the algorithm is
deterministic for W. Furthermore, (v, u) ∉ ↣⁺ for any u ∈ V \ V_i such that u ↣ v, because its
negation would contradict Theorem 4.14. Therefore, a change in X_v does not change X_u, and X_u is
still a post-fixpoint of F_u(·). □

6 ALGORITHMS FOR WPO CONSTRUCTION


This section presents two algorithms for constructing a WPO for a graph G(V, ↣). The first
algorithm, ConstructWPO^TD, is a top-down recursive algorithm that is inefficient but intuitive (§6.1).
The second one, ConstructWPO^BU, is an efficient bottom-up iterative algorithm that has almost-
linear time complexity (§6.2). Neither algorithm introduces superfluous scheduling constraints
that could restrict the parallelism during the fixpoint computation.


Algorithm 1: ConstructWPO^TD(G, D)

Input: Directed graph G(V, ↣), Depth-first forest D
Output: WPO W(V, X, →, ⇢)
 1  G_1, G_2, . . . , G_k := SCC(G)                       ▷ Maximal SCCs.
 2  foreach i ∈ [1, k] do
                                                          ▷ WPOs for SCCs.
 3      (V_i, X_i, →_i, ⇢_i), h_i, x_i := sccWPO(G_i, D)
 4  V, X, →, ⇢ := ⋃_{i=1..k} V_i, ⋃_{i=1..k} X_i, ⋃_{i=1..k} →_i, ⋃_{i=1..k} ⇢_i
                                                          ▷ Edges between different maximal SCCs.
 5  foreach u ↣ v s.t. u ∈ V_i ∧ v ∈ V_j ∧ i ≠ j do
 6      → := → ∪ {(x_i, v)}
 7  return (V, X, →, ⇢)

 8  def trivialWPO(h):
 9      return ({h}, ∅, ∅, ∅), h, h

10  def self-loopWPO(h):
11      x_h := new exit
12      return ({h}, {x_h}, {(h, x_h)}, {(x_h, h)}), h, x_h

13  def sccWPO(G, D):
14                                                        ▷ G is strongly connected.
15      h := arg min_{v ∈ V} DFN(D, v)                    ▷ Minimum DFN.
16      B := {(v, h) | v ∈ V and v ↣ h}
17      if |B| = 0 then return trivialWPO(h)
18      if |V| = 1 then return self-loopWPO(h)
19      x_h := new exit
20      V′ := V ∪ {x_h} \ {h}
21      ↣′ := ↣⇂V′ ∪ {(v, x_h) | (v, h) ∈ B}
                                                          ▷ WPO for modified graph.
22      V′, X′, →′, ⇢′ := ConstructWPO^TD((V′, ↣′), D⇂V′)
23      X := X′ ∪ {x_h}
24      → := →′ ∪ {(h, v) | v ∈ V and h ↣ v}
25      ⇢ := ⇢′ ∪ {(x_h, h)}
26      return (V, X, →, ⇢), h, x_h

6.1 Top-down Recursive Construction


Algorithm 1 presents a top-down recursive algorithm ConstructWPO^TD, which acts as a proxy
between the axiomatic characterization of WPO in §4 and the efficient construction algorithm
ConstructWPO^BU in §6.2.
ConstructWPO^TD is parametrized by the depth-first forest (DFF) of the graph G, and it may yield
a different WPO for a different DFF. ConstructWPO^TD begins with the identification of the maximal
strongly connected components (SCCs) in G on Line 1. An SCC G_i is maximal if there does not
exist another SCC that contains all vertices and edges of G_i. A WPO for an SCC G_i is constructed
by a call sccWPO(G_i, D) on Line 3. This call returns a WPO (V_i, X_i, →_i, ⇢_i), head h_i, and exit x_i. In
case of trivial SCCs, the head and the exit are assigned the vertex in G. In other cases, the returned
value satisfies (x_i, h_i) ∈ ⇢_i and ⌊⌊h_i, x_i⌉⌉_{→_i*} = V_i ∪ X_i (Lemma 6.2). Line 4 initializes the WPO
for the graph G to the union of the WPOs for the SCCs, showing the inductive structure mentioned
in Theorem 4.10. On Line 6, scheduling constraints are added for the dependencies that cross
the maximal SCCs: x → v is added for a dependency u ↣ v, where x is the exit of the maximal
component WPO that contains u but not v. This ensures that W5 and H5 are satisfied.
The function sccWPO takes as input an SCC and its DFF, and returns a WPO, a head, and an
exit for this SCC. It constructs the WPO by removing the head h to break the SCC, adding the
exit x_h as a unique sink, using ConstructWPO^TD to construct a WPO for the modified graph, and
appending necessary elements for the removed head. Ignoring the exit x_h, it shows the inductive
structure mentioned in Theorem 4.11. Line 15 chooses a vertex with minimum DFS numbering
(DFN) as the head. Incoming edges to the head h are back edges for DFF D on Line 16 because h
has the minimum DFN and can reach all other vertices in the graph. If there are no back edges,
the given SCC is trivial, and a trivial WPO with a single element is returned on Line 17. If there is only one
vertex in the SCC (with a self-loop), the corresponding self-loop WPO is returned on Line 18. For other
non-trivial SCCs, h is removed from the graph and the newly created x_h is added as a unique sink on
Lines 19–21. A call ConstructWPO^TD((V′, ↣′), D⇂V′) on Line 22 returns a WPO for the modified
graph. Exit x_h is moved from V′ to X′ on Line 23, scheduling constraints regarding the head h are
added on Line 24, and x_h ⇢ h is added on Line 25 to satisfy W5 for the removed back edges.

[Diagrams: (a) vertices 1 2 3 4 (top row) and 5 6 7 8 (bottom row); (b) elements 1 2 3 x_2 4 x_4 (top row) and 5 6 7 x_6 8 x_5 (bottom row).]

Fig. 4. (a) Directed graph G_3. Vertices V are labeled using depth-first numbering (DFN); (b) WPO W_3 for G_3
with exits X = {x_2, x_4, x_5, x_6}.

Example 6.1. Consider the graph G_3 in Figure 4(a). SCC(G_3) on Line 1 of ConstructWPO^TD
returns a trivial SCC with vertex 1 and three non-trivial SCCs with vertex sets {5, 6, 7, 8}, {2, 3},
and {4}. For the trivial SCC, sccWPO on Line 3 returns the trivial WPO 1 with head 1 and exit 1. For
the non-trivial SCCs, it returns the WPOs drawn as 5 6 7 x_6 8 x_5, as 2 3 x_2, and as 4 x_4, with
(head, exit) pairs (5, x_5), (2, x_2), and (4, x_4), respectively. On Line 6,
scheduling constraints 1 → 2, 1 → 5, x_5 → 3, x_5 → 4, and x_2 → 4 are added for the edges 1 ↣ 2,
1 ↣ 5, 7 ↣ 3, 8 ↣ 4, and 3 ↣ 4, respectively. The final result is identical to W_3 in Figure 4(b).
Now, consider the execution of sccWPO when given the SCC with vertex set V = {5, 6, 7, 8} as
input. On Line 15, the vertex 5 is chosen as the head h, because it has the minimum DFN among
V. The set B on Line 16 is B = {(8, 5)}. The SCC is modified on Lines 19–21, and ConstructWPO^TD
on Line 22 returns the WPO drawn as 6 7 x_6 8 x_5 for the modified graph. Moving x_5 from V′ to X′ on
Line 23, adding 5 → 6 on Line 24, and adding x_5 ⇢ 5 on Line 25 yields the WPO for the SCC. ■
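The recursion of Algorithm 1 can be followed in a short executable sketch. The Python code below is an illustrative transcription, not the authors' implementation: SCCs are found with a simple quadratic reachability check, exits are named by prefixing `x` to the head, and exits receive an infinite DFN. The edge set of `G3` is an assumption: it contains the edges named in Example 6.1 plus edges inferred to make the stated SCCs strongly connected. The sketch also assumes that the head of a multi-vertex SCC has no self-loop.

```python
def sccs(vertices, edges):
    """Maximal SCCs via pairwise reachability (quadratic; fine for a sketch)."""
    succ = {v: set() for v in vertices}
    for u, v in edges:
        succ[u].add(v)
    def reach(s):
        seen, stack = {s}, [s]
        while stack:
            for w in succ[stack.pop()]:
                if w not in seen:
                    seen.add(w); stack.append(w)
        return seen
    r = {v: reach(v) for v in vertices}
    comps, done = [], set()
    for v in vertices:
        if v not in done:
            c = {w for w in r[v] if v in r[w]}
            comps.append(c); done |= c
    return comps

def construct_wpo_td(vertices, edges, dfn):
    V, X, sched, stab, exit_of = set(), set(), set(), set(), {}
    for comp in sccs(vertices, edges):                       # Lines 1-4
        sub = {(u, v) for u, v in edges if u in comp and v in comp}
        (Vi, Xi, si, ti), h, x = scc_wpo(comp, sub, dfn)
        V |= Vi; X |= Xi; sched |= si; stab |= ti
        for v in comp:
            exit_of[v] = x
    for u, v in edges:                                       # Lines 5-6
        if exit_of[u] != exit_of[v]:       # u and v lie in different maximal SCCs
            sched.add((exit_of[u], v))
    return V, X, sched, stab

def scc_wpo(V, edges, dfn):
    h = min(V, key=lambda v: dfn[v])                         # Line 15
    B = {(v, w) for v, w in edges if w == h}                 # Line 16
    if not B:
        return ({h}, set(), set(), set()), h, h              # trivial WPO
    xh = f"x{h}"                                             # new exit (naming is an assumption)
    if len(V) == 1:
        return ({h}, {xh}, {(h, xh)}, {(xh, h)}), h, xh      # self-loop WPO
    V2 = (set(V) | {xh}) - {h}                               # Lines 19-21
    edges2 = {(u, v) for u, v in edges if u != h and v != h} | {(v, xh) for v, _ in B}
    _, X2, sched, stab = construct_wpo_td(V2, edges2, {**dfn, xh: float("inf")})
    sched = sched | {(h, v) for u, v in edges if u == h}     # Line 24
    return (set(V), X2 | {xh}, sched, stab | {(xh, h)}), h, xh

# Assumed edge set for G_3; DFN equals the vertex label.
G3 = {(1, 2), (1, 5), (2, 3), (3, 2), (3, 4), (4, 4), (5, 6),
      (6, 7), (7, 6), (7, 8), (8, 5), (7, 3), (8, 4)}
V, X, sched, stab = construct_wpo_td(set(range(1, 9)), G3, {v: v for v in range(1, 9)})
print(sorted(X))
print(sorted(stab))
```

With this edge set, the cross-SCC constraints added on Line 6 are exactly the five listed in Example 6.1, and the SCC {5, 6, 7, 8} yields the chain 5, 6, 7, x6, 8, x5.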
Before we prove that the output of ConstructWPO^TD in Algorithm 1 is a WPO, we prove that the
output of function sccWPO satisfies the property of a WPO for a non-trivial strongly connected graph
in Corollary 4.15.
Lemma 6.2. Given a non-trivial strongly connected graph G(V, ↣) and its depth-first forest D, the
returned value (V, X, →, ⇢), h, x_h of sccWPO(G, D) satisfies x_h ⇢ h and ⌊⌊h, x_h⌉⌉_{→*} = V ∪ X.
Proof. We use structural induction on the input G to prove this. The base case is when G has
no non-trivial nested SCCs, and the inductive step uses an induction hypothesis on the non-trivial
nested SCCs.
[Base case]: If the graph only has a single vertex h and a self-loop, self-loopWPO(h) is returned
on Line 18, whose value satisfies the lemma. Otherwise, because there are no non-trivial nested
SCCs inside the graph, removing the head h on Line 20 removes all the cycles in the graph. Also,
adding a new vertex x_h on Lines 19 and 21 does not create a cycle, so the modified graph is acyclic.
Therefore, (V′, →′) on Line 22 equals (V′, ↣′). Because the input graph is strongly connected,
every vertex in the graph is reachable from h. This is true even without the back edges because we
can ignore the cyclic paths. Also, because the exit x_h is a unique sink in the modified graph, x_h is
reachable from every vertex. It is unique because the negation would imply that there is a vertex
in the original graph that has no outgoing edges, contradicting that the input graph is strongly
connected. Finally, with the changes on Lines 24 and 25, we see that the lemma holds for the base case.
[Inductive step]: Let G_i be one of the maximal non-trivial nested SCCs that is identified on Line 1.
Because h_i has the minimum DFN in G_i, h_i must be an entry of G_i, and there must exist an incoming
scheduling constraint to h_i. By the induction hypothesis, u →⁺ h_i implies u →⁺ v for all v ∈ (V_i ∪ X_i)
and u ∉ (V_i ∪ X_i). Also, because scheduling constraints added on Line 6 have exits as their sources,
v →⁺ w implies x_i →⁺ w for all v ∈ (V_i ∪ X_i) and w ∉ (V_i ∪ X_i). The graph of super-nodes (an SCC
contracted to a single vertex) of the modified graph (V′, ↣′) is acyclic. Therefore, by applying
reasoning similar to the base case on this graph of super-nodes, we see that the lemma holds. □

Armed with the above lemma, we now prove that ConstructWPOTD constructs a WPO.

Theorem 6.3. Given a graph G(V, ↣) and its depth-first forest D, the returned value (V, X, →, ⇢)
of ConstructWPO^TD(G, D) is a WPO for G.

Proof. We show that the returned value (V, X, →, ⇢) satisfies all properties W1–W5 in
Definition 4.3.
[W1] V equals the vertex set of the input graph, and X consists only of the newly created exits.
[W2] For all exits, x_h ⇢ h is added on Lines 12 and 25. These are the only places where stabilization
constraints are created.
[W3] All scheduling constraints are created on Lines 6, 12, and 24.
[W4-H1] (→_i*, V_i ∪ X_i) is reflexive and transitive by definition. Because the graph with maximal
SCCs contracted to single vertices (super-nodes) is acyclic, scheduling constraints on Line 6 cannot
create a cycle. Also, Line 24 only adds outgoing scheduling constraints and does not create a cycle.
Therefore, (→*, V ∪ X) is antisymmetric.
[W4-H2] Exactly one stabilization constraint is created per exit on Lines 12 and 25. Because h is
removed from the graph afterwards, it does not become a target of another stabilization constraint.
[W4-H3] By Lemma 6.2, x_h ⇢ h implies h →⁺ x_h.
[W4-H4] Because the maximal SCCs on Line 3 are disjoint, by Lemma 6.2, all components
⌊⌊h_i, x_i⌉⌉_{→*} are disjoint.
[W4-H5] All additional scheduling constraints going outside of a component have exits as their
sources on Line 6.
[W5] For u ↣ v, either (i) a scheduling constraint is added on Line 6 or 24, or (ii) a stabilization
constraint is added on Line 18 or 25. In the case of (i), one can check that the property holds for
Line 24. For Line 6, if u is a trivially maximal SCC, x_i = u. Else, u →* x_i by Lemma 6.2, and with
the added x_i → v, u →⁺ v. In the case of (ii), u ∈ ⌊⌊h, x_h⌉⌉_{→*} by Lemma 6.2 where v = h. □

The next theorem proves that the WPO constructed by ConstructWPOTD does not include super-
fluous scheduling constraints, which could reduce concurrency during the fixpoint computation.

Theorem 6.4. For a graph G(V, ↣) and its depth-first forest D, the WPO W(V, X, →, ⇢) returned by
ConstructWPO^TD(G, D) has the smallest →* among the WPOs for G with the same set of ⇢.

Proof. We use structural induction on the input G to prove this.
[Base case]: The base case is when G is either (i) a trivial SCC or (ii) a single vertex with a self-loop.
If G is a trivial SCC, the algorithm outputs a single trivial WPO, whose →* is empty.
If G is a single vertex with a self-loop, there should be at least one exit in the WPO. The algorithm
outputs a self-loop WPO, whose →* contains only (h, x_h).
[Inductive step]: The two cases for the inductive step are (i) G is strongly connected and (ii) G is
not strongly connected. If G is strongly connected, then due to Corollary 4.15, all WPOs of G must
have a single maximal component that is equal to V ∪ X. We only consider WPOs with the same
set of ⇢, so h, with minimum DFN, is the head of all WPOs that we consider. By the inductive
hypothesis, the theorem holds for the returned value of the recursive call on Line 22, where the input
is the graph without h. Because all WPOs have to satisfy v →* x_h for all v ∈ V ∪ X, adding x_h does
not affect the size of →*. Also, Line 24 only adds the required scheduling constraints to satisfy W5
for the dependencies whose source is h.


If G is not strongly connected, then by the induction hypothesis, the theorem holds for the
returned values of sccWPO on Line 3 for all maximal SCCs. Line 6 only adds the required
scheduling constraints to satisfy W5 and H5 for dependencies between different maximal SCCs. □

6.2 Bottom-up Iterative Construction


Algorithm 2 presents ConstructWPO^BU(G, D, lift), an efficient, almost-linear time iterative algorithm
for constructing the WPO of a graph G(V, ↣); the role of the boolean parameter lift is explained in
§7, and it is assumed to be false throughout this section. ConstructWPO^BU is also parametrized
by the depth-first forest D of the graph G, and the WPO constructed by the algorithm may change
with a different forest. The DFF defines back edges (B) and cross or forward edges (CF) on Lines 2 and 3.
ConstructWPO^BU maintains a partitioning of the vertices with each subset containing vertices
that are currently known to be strongly connected. The algorithm uses a disjoint-set data structure
to maintain this partitioning. The operation rep(v) returns the representative of the subset that
contains vertex v, which is used to determine whether or not two vertices are in the same partition.
The algorithm assumes that the vertex with minimum DFN is the representative of the subset.
Initially, rep(v) = v for all vertices v ∈ V. When the algorithm determines that two subsets are
strongly connected, they are merged into a single subset. The operation merge(v, h) merges the
subsets containing v and h, and assigns h to be the representative for the combined subset.
Example 6.5. Consider the graph G_3 in Figure 4(a). Let 1 | 2 | 3 | 4 | 5 | 6 7 | 8 be the current
partition, where the representative of each subset is listed first. Thus, rep(1) = 1 and rep(6) = rep(7) = 6.
If it is found that vertices 5, 6, 7, 8 are strongly connected, then calls to merge(6, 5) and merge(8, 5)
update the partition to 1 | 2 | 3 | 4 | 5 6 7 8. Thus, rep(6) = rep(7) = rep(8) = rep(5) = 5. ■
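The rep and merge operations of this example can be sketched with a small disjoint-set structure. The Python class below is a minimal illustration rather than the data structure used in Pikos: the class name `Partition` is hypothetical, path halving is used as one common compression scheme, and `merge(v, h)` assumes that h is already the representative of its own subset, as in the example.

```python
class Partition:
    """Disjoint sets over vertices, with an explicitly chosen representative."""

    def __init__(self, vertices):
        # Initially rep(v) = v for every vertex.
        self.parent = {v: v for v in vertices}

    def rep(self, v):
        # Follow parent links to the representative, halving the path on the way.
        while self.parent[v] != v:
            self.parent[v] = self.parent[self.parent[v]]
            v = self.parent[v]
        return v

    def merge(self, v, h):
        # Attach the subset containing v under h's representative, so that
        # h (assumed to be a representative) represents the combined subset.
        self.parent[self.rep(v)] = self.rep(h)


p = Partition(range(1, 9))
p.merge(7, 6)                 # current partition: 1 | 2 | 3 | 4 | 5 | 6 7 | 8
before = [p.rep(v) for v in (1, 6, 7)]
p.merge(6, 5)                 # vertices 5, 6, 7, 8 found to be strongly connected
p.merge(8, 5)
after = [p.rep(v) for v in (5, 6, 7, 8)]
print(before, after)
```

Always attaching the merged subset under h keeps the minimum-DFN vertex as representative, which is the invariant the algorithm relies on; a union-by-rank variant would need to store that representative separately.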

Auxiliary data structures rep, exit, and R are initialized on Line 5. The map exit maps an SCC
(represented by its header h) to its corresponding exit x_h. Initially, exit[v] is set to v, and it is updated
on Line 32 when a non-trivial SCC is discovered by the algorithm.
The map R maps a vertex to a set of edges, and is used to handle irreducible graphs [Hecht
and Ullman 1972; Tarjan 1973]. Initially, R is set to ∅, and it is updated on Line 17. The function
findNestedSCCs relies on the assumption that the graph is reducible. This function follows the
edges backwards to find nested SCCs using rep(p) instead of predecessor p, as on Lines 36 and
43, to avoid repeatedly searching inside the nested SCCs. rep(p) is the unique entry to the nested
SCC that contains the predecessor p if the graph is reducible. To make this algorithm work for
irreducible graphs as well, cross or forward edges are removed from the graph initially by function
removeAllCrossForwardEdges (called on Line 6) to make the graph reducible. Removed edges
are then restored by function restoreCrossForwardEdges (called on Line 10) right before the
edges are used. The graph is guaranteed to be reducible when restoring a removed edge u ↣ v as
u ↣ rep(v) when h is the lowest common ancestor (LCA) of u, v in the depth-first forest. Cross
and forward edges are removed on Line 16 and are stored at their LCAs in D on Line 17. Then, as h
hits the LCAs in the loop, the removed edges are restored on Line 20. Because the graph edges are
modified, map O is used to track the original edges. O[v] returns the set of original non-back edges that
now target v after the modification. The map O is initialized on Line 8 and updated on Line 21.
The call to function constructWPOForSCC(h) on Line 11 constructs a WPO for the largest
SCC such that h = arg min_{v ∈ V′} DFN(D, v), where V′ is the vertex set of the SCC. For example,
constructWPOForSCC(5) constructs the WPO for the SCC with vertex set {5, 6, 7, 8}. Because the loop on
Line 9 traverses the vertices in descending DFN, the WPO for a nested SCC is constructed before
that of the enclosing SCC. For example, constructWPOForSCC(6) and constructWPOForSCC(8) are
called before constructWPOForSCC(5), and they construct WPOs for the nested SCCs with vertex sets
{6, 7} and {8}, respectively. Therefore, constructWPOForSCC(h) reuses the constructed sub-WPOs.

Proc. ACM Program. Lang., Vol. 4, No. POPL, Article 14. Publication date: January 2020.
14:18 Sung Kook Kim, Arnaud J. Venet, and Aditya V. Thakur

Algorithm 2: ConstructWPOBU (G, D, lift)

  Input: Directed graph G(V, →), Depth-first forest D, Boolean lift
  Output: WPO W(V, X, ↣, ⇢)

   1  X, ↣, ⇢ := ∅, ∅, ∅
      ▷ These are also feedback edges of G.
   2  B := {(u, v) | u → v is a back edge in D}
   3  CF := {(u, v) | u → v is a cross/fwd edge in D}
   4  foreach v ∈ V do
   5      rep(v) := exit[v] := v; R[v] := ∅
   6  removeAllCrossForwardEdges()
  ⋆7  foreach v ∈ V do
  ⋆8      O[v] := {(u, v) ∈ (→ \ B)}
   9  foreach h ∈ V in descending DFN(D) do
  10      restoreCrossForwardEdges(h)
  11      constructWPOForSCC(h)
 ⋆12  connectWPOsOfMaximalSCCs()
 ⋆13  return (V, X, ↣, ⇢)

  14  def removeAllCrossForwardEdges():
  15      foreach (u, v) ∈ CF do
  16          → := → \ {(u, v)}
              ▷ Removed edges will be restored
              ▷ when h is the LCA of u, v in D.
  17          R[lca_T(u, v)] := R[lca_T(u, v)] ∪ {(u, v)}

  18  def restoreCrossForwardEdges(h):
  19      foreach (u, v) ∈ R[h] do
  20          → := → ∪ {(u, rep(v))}
              ▷ Record the original edge.
 ⋆21          O[rep(v)] := O[rep(v)] ∪ {(u, v)}

  22  def constructWPOForSCC(h):
  23      N_h, P_h := findNestedSCCs(h)
  24      if P_h = ∅ then return
 ⋆25      foreach v ∈ N_h do
              ▷ (u, v′) is the original edge.
              ▷ rep(u) represents maximal SCC
              ▷ containing u but not v′.
 ⋆26          ↣ := ↣ ∪ {(exit[rep(u)], v′) | (u, v′) ∈ O[v]}
 ⋆27          if lift then
 ⋆28              ↣ := ↣ ∪ {(exit[rep(u)], v) | (u, v) ∈ (→ \ B)}
 ⋆29      x_h := new exit; X := X ∪ {x_h}
 ⋆30      ↣ := ↣ ∪ {(exit[rep(p)], x_h) | p ∈ P_h}
 ⋆31      ⇢ := ⇢ ∪ {(x_h, h)}
 ⋆32      exit[h] := x_h
  33      foreach v ∈ N_h do merge(v, h)

  34  def findNestedSCCs(h):
  35      ▷ Search backwards from the sinks.
 ⋆36      P_h := {rep(p) | (p, h) ∈ B}
  37      N_h := ∅                  ▷ Reps. of nested SCCs except h.
  38      W_h := P_h \ {h}          ▷ Worklist.
  39      while there exists v ∈ W_h do
  40          W_h, N_h := W_h \ {v}, N_h ∪ {v}
  41          foreach p s.t. (p, v) ∈ (→ \ B) do
  42              if rep(p) ∉ N_h ∪ {h} ∪ W_h then
  43                  W_h := W_h ∪ {rep(p)}
  44      return N_h, P_h

 ⋆45  def connectWPOsOfMaximalSCCs():
 ⋆46      foreach v ∈ V s.t. rep(v) = v do
 ⋆47          ↣ := ↣ ∪ {(exit[rep(u)], v′) | (u, v′) ∈ O[v]}
 ⋆48          if lift then
 ⋆49              ↣ := ↣ ∪ {(exit[rep(u)], v) | (u, v) ∈ (→ \ B)}

The call to function findNestedSCCs(h) on Line 23 returns N_h, the representatives of the nested
SCCs, as well as P_h, the predecessors of h along back edges. If P_h is empty, then the SCC is
trivial, and the function immediately returns on Line 24. Line 26 adds scheduling constraints for
the dependencies crossing the nested SCCs. As in ConstructWPOTD, for u → v′ the constraint must
originate from the exit of the maximal SCC that contains u but not v′. Because u → v′ is now
u → rep(v′), O[v], where v = rep(v′), is looked up to find u → v′. The map exit is used to find the
exit, where rep(u) is the representative of the maximal SCC that contains u but not v. If the
parameter lift is true, a scheduling constraint targeting v is also added, forcing all scheduling
predecessors outside of a component to be visited before the component’s head. Similarly, function
connectWPOsOfMaximalSCCs is called after the loop, on Line 12, to connect the WPOs of maximal SCCs.
The exit, the scheduling constraints to the exit, and the stabilization constraints are added on
Lines 29–31. After the WPO is constructed, Line 32 updates the map exit, and Line 33 updates the
partition.

Step     updates                     Current partition    Current WPO (elements)
Init     {(7, 3), (8, 4)} removed    1|2|3|4|5|6|7|8      1 2 3 4 5 6 7 8
h = 8    -                           1|2|3|4|5|6|7|8      1 2 3 4 5 6 7 8
h = 7    -                           1|2|3|4|5|6|7|8      1 2 3 4 5 6 7 8
h = 6    -                           1|2|3|4|5|67|8       1 2 3 4 5 6 7 x_6 8
h = 5    -                           1|2|3|4|5678         1 2 3 4 5 6 7 x_6 8 x_5
h = 4    -                           1|2|3|4|5678         1 2 3 4 x_4 5 6 7 x_6 8 x_5
h = 3    -                           1|2|3|4|5678         1 2 3 4 x_4 5 6 7 x_6 8 x_5
h = 2    -                           1|23|4|5678          1 2 3 x_2 4 x_4 5 6 7 x_6 8 x_5
h = 1    {(7, 2), (8, 4)} added      1|23|4|5678          1 2 3 x_2 4 x_4 5 6 7 x_6 8 x_5
Final    -                           1|23|4|5678          1 2 3 x_2 4 x_4 5 6 7 x_6 8 x_5

Table 1. Steps of ConstructWPOBU in Algorithm 2 for Figure 4.

Example 6.6. Table 1 describes the steps of ConstructWPOBU for the irreducible graph G_3 (Figure 4).
The updates column shows the modifications to the graph edges, the Current partition column
shows the changes in the disjoint-set data structure, and the Current WPO column shows the
WPO constructed so far. Each row shows the changes made in each step of the algorithm. Row
‘Init’ shows the initialization step on Lines 5–6. Row ‘h = k’ shows the iteration of the loop
on Lines 9–11 in which h = k. The loop iterates over the vertices in descending DFN: 8, 7, 6, 5, 4, 3, 2, 1.
Row ‘Final’ shows the final step after the loop on Line 12.
During initialization, the cross or forward edges {(7, 3), (8, 4)} are removed, making G_3 reducible.
These edges are added back as {(7, rep(3)), (8, rep(4))} = {(7, 2), (8, 4)} in h = 1, where 1 is the LCA
of both (7, 3) and (8, 4). G_3 remains reducible after restoration. In step h = 5, WPOs for the nested
SCCs are connected with 5 ↣ 6 and x_6 ↣ 8. The new exit x_5 is created, connected to the WPO via
8 ↣ x_5, and x_5 ⇢ 5 is added. Finally, 1 ↣ 2, 1 ↣ 5, x_5 ↣ 3, x_5 ↣ 4, and x_2 ↣ 4 are added, connecting
the WPOs for maximal SCCs. If lift is true, scheduling constraint x_5 ↣ 2 is added. ■
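The bottom-up construction can be sketched in Python for the special case of a reducible graph whose vertices are all reachable from a single root. Under that assumption no cross or forward edge needs to be rewritten, so O[v] is simply the set of non-back edges into v and the lift branch adds nothing new; both are therefore omitted. The function name `construct_wpo_reducible` and the exit names `"x2"`, `"x3"` are hypothetical, and the sketch is not the Pikos implementation.

```python
from collections import defaultdict


def construct_wpo_reducible(vertices, edges, root):
    succ = defaultdict(list)
    for u, v in edges:
        succ[u].append(v)

    # Depth-first numbering; an edge to a vertex still on the DFS path is a back edge.
    dfn, on_path, back = {}, set(), set()

    def dfs(u):
        dfn[u] = len(dfn) + 1
        on_path.add(u)
        for v in succ[u]:
            if v not in dfn:
                dfs(v)
            elif v in on_path:
                back.add((u, v))
        on_path.discard(u)

    dfs(root)
    preds, back_preds = defaultdict(list), defaultdict(list)
    for u, v in edges:
        (back_preds if (u, v) in back else preds)[v].append(u)

    rep = {v: v for v in vertices}

    def find(v):
        while rep[v] != v:
            rep[v] = rep[rep[v]]
            v = rep[v]
        return v

    exit_ = {v: v for v in vertices}
    X, sched, stab = [], set(), set()

    def connect(v):
        # Lines 26 and 47: constraints start at the exit of u's current maximal SCC.
        for u in preds[v]:
            sched.add((exit_[find(u)], v))

    for h in sorted(vertices, key=dfn.get, reverse=True):
        P = {find(p) for p in back_preds[h]}
        if not P:
            continue                      # trivial SCC (Line 24)
        nested, work = set(), P - {h}
        while work:                       # findNestedSCCs, Lines 39-43
            v = work.pop()
            nested.add(v)
            for p in preds[v]:
                r = find(p)
                if r != h and r not in nested and r not in work:
                    work.add(r)
        for v in nested:
            connect(v)
        x = f"x{h}"                       # new exit (Line 29)
        X.append(x)
        sched |= {(exit_[p], x) for p in P}
        stab.add((x, h))
        exit_[h] = x
        for v in nested:
            rep[v] = h                    # merge(v, h)

    for v in vertices:                    # connectWPOsOfMaximalSCCs
        if find(v) == v:
            connect(v)
    return X, sched, stab


# Hypothetical graph with a loop {3, 4} nested inside a loop {2, 3, 4}.
X, sched, stab = construct_wpo_reducible(
    [1, 2, 3, 4, 5], [(1, 2), (2, 3), (3, 4), (4, 3), (4, 2), (2, 5)], root=1)
```

Handling irreducible graphs would additionally require the edge removal and restoration of Lines 14–21, which this sketch leaves out.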

ConstructWPOBU adapts Tarjan-Havlak-Ramalingam’s almost-linear time algorithm [Ramalingam
1999] for constructing Havlak’s loop nesting forest (LNF) [Havlak 1997]. Similar to components in
the WPO, an LNF represents the nesting relationship among the SCCs. A WPO contains additional
information in the form of scheduling constraints, which are used to generate the concurrent
iteration strategy for fixpoint computation. Lines in ConstructWPOBU indicated by ⋆ were added
to the algorithm in Ramalingam [1999]. The following two theorems prove the correctness and
runtime efficiency of ConstructWPOBU .
Theorem 6.7. Given a graph G(V, →) and its depth-first forest D, ConstructWPOTD (G, D) and
ConstructWPOBU (G, D, false) construct the same WPO for G.
Proof. Using the constructive characterization of an LNF [Ramalingam 2002, Definition 3], an
LNF of a graph is constructed by identifying the maximal SCCs, choosing headers for the maximal
SCCs, removing incoming back edges to the headers to break the maximal SCCs, and repeating
this on the subgraphs. In particular, Havlak’s LNF is obtained when the vertex with minimum DFN
in the SCC is chosen as the header [Ramalingam 2002, Definition 6]. This construction is similar to
how the maximal SCCs are identified on Line 1 of ConstructWPOTD and how ConstructWPOTD is
called recursively on Line 22. Because ConstructWPOBU is based on the LNF construction algorithm,
both algorithms identify the same set of SCCs, resulting in the same X and ⇢.
Now consider u → v′ in the original graph that is not a back edge. Let v be rep(v′) when restoring
the edge if the edge is cross/forward, or v′ otherwise. If both u, v′ are in some non-trivial SCC,
then v ∈ N_h on Line 25 for some h. In this case, rep(u) must be in N_h. If not, u → v creates an entry
to the SCC other than h. We know that h must be the entry because it has minimum DFN. This
contradicts that the modified graph is reducible over the whole run [Ramalingam 1999, Section 5].
Because all nested SCCs have been already identified, rep(u) is the representative of the maximal
SCC that contains u but not v. Also, if no SCC contains both u, v′, v = rep(v) on Line 46. Because
all maximal SCCs are found, rep(u) returns the representative of the maximal SCC that contains u
but not v. Therefore, Line 26 and Line 47 construct the same ↣ as ConstructWPOTD . □

Theorem 6.8. The running time of ConstructWPOBU (G, D, ∗) is almost-linear.


Proof. The running time of Tarjan-Havlak-Ramalingam’s algorithm is almost-linear [Ramalingam
1999, Section 5]. The starred lines in ConstructWPOBU only add constant factors to the
algorithm. Thus, the running time of ConstructWPOBU (G, D, ∗) is almost-linear. □

7 CONNECTION TO WEAK TOPOLOGICAL ORDER


The weak partial order defined in this paper is a strict generalization of weak topological order
(WTO) defined by Bourdoncle [1993]. Let us first recall the definitions by Bourdoncle.
Definition 7.1. A hierarchical total order of a set S is a well-parenthesized permutation of this set
without two consecutive “(”. ■


Algorithm 3: ConstructWTOTD (G, D)

  Input: Directed graph G(V, →), Depth-first forest D
  Output: WTO

   1  G_1, G_2, . . . , G_k := SCC(G)        ▷ Maximal SCCs.
   2  foreach i ∈ [1, k] do
          ▷ WTOs for SCCs.
   3      WTO_i, h#_i := sccWTO(G_i, D)
   4  WTO := nil
   5  foreach i in increasing order of h#_i do
   6      WTO := WTO_i :: WTO                ▷ ’::’ means to append.
   7  return WTO

   8  def sccWTO(G, D):
   9      ▷ G is strongly connected.
  10      h := arg min_{v ∈ V} DFN(D, v)     ▷ Minimum DFN.
  11      B := {(v, h) | v ∈ V and v → h}
  12      if |B| = 0 then return h, post-DFN(D, h)
  13      if |V| = 1 then return (h), post-DFN(D, h)
  14      V′ := V \ {h}
  15      →′ := →⇂V′
  16      WTO := ConstructWTOTD ((V′, →′), D⇂V′)
  17      return (h :: WTO), post-DFN(D, h)

A hierarchical total order is a string over the alphabet S augmented with left and right parentheses.
A hierarchical total order of S induces a total order ⪯ over the elements of S. The elements between
two matching parentheses are called a component, and the first element of a component is called
the head. The set of heads of the components containing the element l is denoted by ω(l).
Definition 7.2. A weak topological order (WTO) of a directed graph is a hierarchical total order of
its vertices such that for every edge u → v, either u ≺ v or v ⪯ u and v ∈ ω(u). ■

A WTO factors out a feedback edge set from the graph using the matching parentheses and
topologically sorts the rest of the graph to obtain a total order of vertices. A feedback edge set
defined by a WTO is {(u, v) ∈ → | v ⪯ u and v ∈ ω(u)}.
Example 7.3. The WTO of graph G 1 in Figure 2(a) is 1 (2 (3 4) (6 7 9 8) 5) 10. The feedback edge
set defined by this WTO is {(4, 3), (8, 6), (5, 2)}, which is the same as that defined by WPO W1 . ■
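Definition 7.2 and the feedback edge set can be checked mechanically. The sketch below represents a hierarchical total order as nested Python lists (each inner list is a component whose first element is the head) and uses the WTO of Example 7.3. The function names are hypothetical, and the forward edges are illustrative placeholders, since the full edge set of G_1 is not reproduced in this section.

```python
def analyze(wto):
    """Return (position, heads): the total order and omega(l) for every element."""
    position, heads = {}, {}

    def walk(items, enclosing):
        for item in items:
            if isinstance(item, list):
                # The first element of a component is its head.
                walk(item, enclosing | {item[0]})
            else:
                position[item] = len(position)
                heads[item] = enclosing

    walk(wto, frozenset())
    return position, heads


def feedback_edges(wto, edges):
    position, heads = analyze(wto)
    return {(u, v) for u, v in edges
            if position[v] <= position[u] and v in heads[u]}


def is_wto(wto, edges):
    # Every edge must go forward in the order or be a feedback edge to an enclosing head.
    position, _ = analyze(wto)
    fb = feedback_edges(wto, edges)
    return all(position[u] < position[v] or (u, v) in fb for u, v in edges)


wto = [1, [2, [3, 4], [6, 7, 9, 8], 5], 10]
feedback = [(4, 3), (8, 6), (5, 2)]          # from Example 7.3
forward = [(1, 2), (2, 3), (3, 4)]           # hypothetical forward edges
```

An edge such as (8, 3) would violate the definition for this order, because 3 precedes 8 but is not the head of a component containing 8.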
Algorithm 3 presents a top-down recursive algorithm for constructing a WTO for a graph G
and its depth-first forest D. Notice the use of increasing post DFN order when merging the results
on Line 5. In general, a reverse post DFN order of a graph is its topological order. Therefore,
ConstructWTOTD , in effect, topologically sorts the DAG of SCCs recursively. Because it is recursive,
it preserves the components and their nesting relationship. Furthermore, by observing the
correspondence between ConstructWTOTD and ConstructWPOTD , we see that ConstructWTOTD (G, D) and
ConstructWPOTD (G, D) construct the same components with the same heads and nesting relationship.
The definition of HPO (Definition 4.1) generalizes the definition of hierarchical total order
(Definition 7.1) to partial orders, while the definition of WPO (Definition 4.3) generalizes the
definition of WTO (Definition 7.2) to partial orders. In other words, the two definitions define the
same structure if we strengthen H1 to a total order. If we view the exits in X as closing parentheses
“)” and x_h ⇢ h as matching parentheses (h . . . ), the correspondence between the two definitions
becomes clear. The conditions that a hierarchical total order must be well-parenthesized and that it
disallows two consecutive “(” correspond to conditions H4, H2, and W1. While H5 is not specified in
Bourdoncle’s definition, it directly follows from the fact that ⪯ is a total order. Finally, the condition
in the definition of WTO matches W5. Thus, using the notion of WPO, we can define a WTO as:
Definition 7.4. A weak topological order (WTO) for a graph G(V, →) is a WPO W(V, X, ↣, ⇢)
for G where (V ∪ X, ↣∗) is a total order. ■

Definition 7.4 hints at how a WTO for a graph can be constructed from a WPO. The key is
to construct a linear extension of the partial order (V ∪ X, ↣∗) of the WPO, while ensuring that
properties H1, H4, and H5 continue to hold. ConstructWTOBU (Algorithm 4) uses the above insight
to construct a WTO of G in almost-linear time, as proved by the following two theorems.

Algorithm 4: ConstructWTOBU (G, D)

  Input: Directed graph G(V, →), Depth-first forest D
  Output: WTO W(V, X, ↣, ⇢)

   1  V, X, ↣, ⇢ := ConstructWPOBU (G, D, true)
   2  stack := ∅
   3  foreach v ∈ V ∪ X do
   4      count[v] := 0
   5      if W.predecessors(v) = ∅ then
   6          stack.push(v)
   7  prev := nil
   8  while stack ≠ ∅ do
   9      v := stack.pop()
  10      if prev ≠ nil then ↣ := ↣ ∪ {(prev, v)}
  11      prev := v
  12      foreach v ↣ w do
  13          count[w] := count[w] + 1
  14          if count[w] = |W.predecessors(w)| then stack.push(w)
  15  return (V, X, ↣, ⇢)

Theorem 7.5. Given a directed graph G and its depth-first forest D, the returned value (V, X, ↣, ⇢)
of ConstructWTOBU (G, D) is a WTO for G.
Proof. The call ConstructWPOBU (G, D, true) on Line 1 constructs a WPO for G. With lift set to
true in ConstructWPOBU , all scheduling predecessors outside of a component are visited before
the head of the component. The algorithm then visits the vertices in topological order according
to ↣. Thus, the additions to ↣ on Line 10 do not violate H1 and lead to a total order (V ∪ X, ↣∗).
Furthermore, because a stack is used as the worklist and because of H5, once a head of a component
is visited, no element outside the component is visited until all elements in the component are
visited. Therefore, the additions to ↣ on Line 10 preserve the components and their nesting
relationship, satisfying H4. Because the exit is the last element visited in the component, no
scheduling constraint is added from inside of the component to outside, satisfying H5. Thus,
ConstructWTOBU (G, D) constructs a WTO for G. □
Example 7.6. For graph G_1 in Figure 2(a), ConstructWTOBU (G_1, D) returns the following WTO:

1 ↣ 2 ↣ 3 ↣ 4 ↣ x_3 ↣ 6 ↣ 7 ↣ 9 ↣ 8 ↣ x_6 ↣ 5 ↣ x_2 ↣ 10

ConstructWTOBU first constructs the WPO W_1 in Figure 2(b) using ConstructWPOBU . The partial
order ↣∗ of W_1 is extended to a total order by adding x_3 ↣ 6 and 7 ↣ 9. The components and
their nesting relationship in W_1 are preserved in the constructed WTO. This WTO is equivalent to
1 (2 (3 4) (6 7 9 8) 5) 10 in Bourdoncle’s representation (see Definition 7.2 and Example 7.3). ■
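The stack-based linearization of Algorithm 4 (Lines 2–14) can be sketched as follows. The input WPO here is a small hypothetical one, not W_1: elements 1, 2, 3, 4, 5 and the exit `"x2"`, with a component headed by 2. The function name `linearize` and the choice to return the visiting order together with the newly added constraints are assumptions of this sketch.

```python
def linearize(elements, sched):
    """Extend the scheduling constraints of a WPO to a total order."""
    sched_set = set(sched)
    succ = {e: [] for e in elements}
    npred = {e: 0 for e in elements}       # |W.predecessors(e)| before any addition
    for u, v in sched:
        succ[u].append(v)
        npred[v] += 1

    count = {e: 0 for e in elements}
    stack = [e for e in elements if npred[e] == 0]
    order, added, prev = [], [], None
    while stack:
        v = stack.pop()
        if prev is not None and (prev, v) not in sched_set:
            added.append((prev, v))        # Line 10: chain consecutive elements
        prev = v
        order.append(v)
        for w in succ[v]:
            count[w] += 1
            if count[w] == npred[w]:
                stack.append(w)            # all scheduling predecessors visited
    return order, added


elements = [1, 2, 3, "x2", 4, 5]
sched = [(1, 2), (1, 5), (2, 3), (3, "x2"), ("x2", 4), (5, 4)]
order, added = linearize(elements, sched)
```

Because the worklist is a stack, the component 2, 3, x2 is visited contiguously once its head is popped, which is the property the proof of Theorem 7.5 relies on.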

Theorem 7.7. The running time of ConstructWTOBU (G, D) is almost-linear.


Proof. The call to ConstructWPOBU in ConstructWTOBU takes almost-linear time, by Theorem 6.8.
After this call the elements and scheduling constraints are visited only once. Thus, the running
time of ConstructWTOBU is almost-linear. □
Bourdoncle [1993] presents a more efficient version of the ConstructWTOTD for WTO construction;
however, it has worst-case cubic time complexity. Thus, Theorem 7.7 improves upon the previously
known algorithm for WTO construction.
The next theorem shows that the output of ConstructWTOTD is the same as that of ConstructWTOBU .


Theorem 7.8. Given a directed graph G and its depth-first forest D, ConstructWTOBU (G, D) and
ConstructWTOTD (G, D) construct the same WTO for G.
Proof. As shown above, ConstructWTOTD (G, D) and ConstructWPOTD (G, D) construct the same
components with same heads and nesting relationship. Therefore, using Theorem 6.7, we can
conclude that ConstructWTOBU (G, D) and ConstructWTOTD (G, D) construct the same WTO. □
The following theorem shows that our concurrent fixpoint algorithm in Figure 3 computes the
same fixpoint as Bourdoncle’s sequential fixpoint algorithm:
Theorem 7.9. The fixpoint computed by the concurrent fixpoint algorithm in Figure 3 using the
WPO constructed by Algorithm 2 is the same as the one computed by Bourdoncle’s sequential
algorithm that uses the recursive iteration strategy.
Proof. With both stabilization constraints, ⇢, and matching parentheses, (. . . ), interpreted as the
“iteration until stabilization” operator, our concurrent iteration strategy for ConstructWTOBU (G, D)
computes the same fixpoint as Bourdoncle’s recursive iteration strategy for ConstructWTOTD (G, D).
The only change we make to a WPO in ConstructWTOBU is adding more scheduling constraints.
Further, Theorem 5.6 proved that our concurrent iteration strategy is deterministic. Thus, our
concurrent iteration strategy computes the same fixpoint when using the WPO constructed by
either ConstructWTOBU (G, D) or ConstructWPOBU (G, D, false). Therefore, our concurrent fixpoint
algorithm in Figure 3 computes the same fixpoint as Bourdoncle’s sequential fixpoint algorithm. □
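The "iteration until stabilization" reading of matching parentheses can be illustrated with a small sequential sketch of a recursive iteration strategy over a WTO given as nested lists. The transfer function is a toy one (each vertex collects itself and everything that reaches it), the lattice is finite so no widening is needed, and all names are hypothetical; it is not the IKOS or Pikos iterator.

```python
def recursive_fixpoint(wto, preds, vertices):
    """Iterate each component of the WTO until its head stabilizes."""
    value = {v: set() for v in vertices}

    def update(v):
        # Toy transfer function: join of the predecessors' values, plus v itself.
        new = value[v] | {v}
        for p in preds.get(v, ()):
            new |= value[p]
        changed = new != value[v]
        value[v] = new
        return changed

    def run(items):
        for item in items:
            if isinstance(item, list):
                head, body = item[0], item[1:]
                update(head)
                while True:
                    run(body)
                    if not update(head):
                        break          # the head is stable, so the component is stable
            else:
                update(item)

    run(wto)
    return value


# Hypothetical graph 1 -> 2 -> 3 -> 4 with the back edge 3 -> 2; its WTO is 1 (2 3) 4.
preds = {2: [1, 3], 3: [2], 4: [3]}
result = recursive_fixpoint([1, [2, 3], 4], preds, [1, 2, 3, 4])
```

A real abstract interpreter would replace the set union with the abstract domain's join and apply widening at component heads to guarantee termination.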

8 IMPLEMENTATION
Our deterministic parallel abstract interpreter, which we call Pikos, was built using IKOS [Brat
et al. 2014], an abstract-interpretation framework for C/C++ based on LLVM.
Sequential baseline IKOS. IKOS performs interprocedural analysis to compute invariants for all
program points, and can detect and prove the absence of runtime errors in programs. To compute
the fixpoint for a function, IKOS constructs the WTO of the CFG of the function and uses
Bourdoncle’s recursive iteration strategy [Bourdoncle 1993]. Context sensitivity during interprocedural
analysis is achieved by performing dynamic inlining during fixpoint computation: formal and actual
parameters are matched, the callee is analyzed, and the return value at the call site is updated after
the callee returns. This inlining also supports function pointers by resolving the set of possible
callees and joining the results.
Pikos. We modified IKOS to implement our deterministic parallel abstract interpreter using Intel’s
Threading Building Blocks (TBB) library [Intel 2019]. We implemented the almost-linear time
algorithm for WPO construction (§6). We implemented the deterministic parallel fixpoint iterator
(§5) using TBB’s parallel_do. Multiple callees at an indirect call site are analyzed in parallel using
TBB’s parallel_reduce. We refer to this extension of IKOS as Pikos; we use Pikos⟨k⟩ to refer to
the instantiation of Pikos that uses up to k threads.
Path-based task spawning in Pikos. Pikos relies on TBB’s tasks to implement the parallel
fixpoint iterator. Our initial implementation would spawn a task for each WPO element when it is
ready to be scheduled. Such a naive approach resulted in Pikos being slower than IKOS; there were
10 benchmarks where the speedup of Pikos⟨2⟩ was below 0.90x compared to IKOS, with a minimum
speedup of 0.74x. To counter such behavior, we implemented a simple path-based heuristic for
spawning tasks during fixpoint computation. We assign an id to each element in the WPO W as
follows: assign id 1 to the elements along the longest path in W , remove these elements from W
and assign id 2 to the elements along the longest path in the resulting graph, and so on. The length
of the path is based on the number of instructions as well as the size of the functions called along
the path. During the fixpoint computation, a new task is spawned only if the id of the current
element differs from that of the successor that is ready to be scheduled. Consequently, elements
along critical paths are executed in the same task.
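The id assignment described above can be sketched as repeated extraction of the heaviest path from a DAG. The sketch assumes the scheduling constraints form an acyclic successor map, that each element carries a positive weight standing in for its instruction count and callee sizes, and that ties are broken by element order; the function name and the weights are hypothetical.

```python
def assign_path_ids(weight, succ):
    """Peel off the heaviest remaining path repeatedly, numbering paths 1, 2, ..."""
    remaining = set(weight)
    ids, next_id = {}, 1
    while remaining:
        best, nxt = {}, {}

        def longest(u):
            # Heaviest path weight starting at u within the remaining elements.
            if u not in best:
                best[u], nxt[u] = weight[u], None
                for v in succ.get(u, ()):
                    if v in remaining and longest(v) + weight[u] > best[u]:
                        best[u], nxt[u] = best[v] + weight[u], v
            return best[u]

        u = max(sorted(remaining), key=longest)
        while u is not None:
            ids[u] = next_id
            remaining.discard(u)
            u = nxt[u]
        next_id += 1
    return ids


weight = {1: 1, 2: 3, 3: 2, 4: 1, 5: 4}          # hypothetical element weights
succ = {1: [2, 5], 2: [3], 3: [4], 5: [4]}       # hypothetical scheduling constraints
ids = assign_path_ids(weight, succ)
```

With these ids, a scheduler following the heuristic would continue in the current task when moving from 1 to 2 (same id) and spawn a new task when 5 becomes ready.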
Memory allocator for concurrency. We experimented with three memory allocators optimized
for parallelism: Tcmalloc [Google 2019], Jemalloc [Evans 2019], and Tbbmalloc [Intel 2019]. Tcmalloc
was chosen because it performed the best in our settings for both Pikos and IKOS.
Abstract domain. Our fixpoint computation algorithm is orthogonal to the abstract domain in use.
Pikos⟨k⟩ works for all abstract domains provided by IKOS as long as the domain is thread-safe.
These abstract domains include interval [Cousot and Cousot 1977], congruence [Granger 1989],
gauge [Venet 2012], and DBM [Miné 2001]. Variable-packing domains [Gange et al. 2016] could
not be used because their implementations were not thread-safe. We intend to explore thread-safe
implementations for these domains in the future.

9 EXPERIMENTAL EVALUATION
In this section, we study the runtime performance of Pikos (§8) on a large set of C programs using
IKOS as the baseline. The experiments were designed to answer the following questions:
RQ0 [Determinism] Is Pikos deterministic? Is the fixpoint computed by Pikos the same as that
computed by IKOS?
RQ1 [Performance] How does the performance of Pikos⟨4⟩ compare to that of IKOS?
RQ2 [Scalability] How does the performance of Pikos⟨k⟩ scale as we increase the number of
threads k?
Platform. All experiments were run on Amazon EC2 C5 instances, which use 3.00 GHz Intel Xeon Platinum
8124M CPUs. IKOS and Pikos⟨k⟩ with 1 ≤ k ≤ 4 were run on c5.2xlarge (8 vCPUs, 4 physical cores,
16GB memory), Pikos⟨k⟩ with 5 ≤ k ≤ 8 on c5.4xlarge (16 vCPUs, 8 physical cores, 32GB memory),
and Pikos⟨k⟩ with 9 ≤ k on c5.9xlarge (36 vCPUs, 18 physical cores, 72GB memory). Dedicated
EC2 instances and BenchExec [Beyer et al. 2019] were used to improve reliability of timing results.
The Linux kernel version was 4.4, and gcc 8.1.0 was used to compile Pikos⟨k⟩ and IKOS.
Abstract Domain. We experimented with both the interval and gauge domains, and the analysis
precision was set to track immediate values, pointers, and memory. The results were similar for
both domains. We show the results using the interval domain. Because we are
only concerned with the time taken to perform fixpoint computation, we disabled program checks,
such as buffer-overflow detection, in both IKOS and Pikos.
Benchmarks. We chose 4319 benchmarks from the following two sources:
SVC We selected all 2701 benchmarks from the Linux, control-flows, and loops categories of
SV-COMP 2019 [Beyer 2019]. These categories are well suited for numerical analysis, and
have been used in recent work [Singh et al. 2018a,b]. Programs from these categories have
indirect function calls with multiple callees at a single call site, large switch statements,
nested loops, and irreducible CFGs.
OSS We selected all 1618 programs from the Arch Linux core packages that are primarily written
in C and whose LLVM bitcode is obtainable by gllvm [gllvm 2019]. These include, but are
not limited to, apache, coreutils, cscope, curl, dhcp, fvwm, gawk, genius, ghostscript,
gnupg, iproute, ncurses, nmap, openssh, postfix, r, socat, vim, and wget.
We checked that the time taken by IKOS and Pikos⟨1⟩ was the same; thus, any speedup achieved
by Pikos⟨k⟩ is due to parallelism in the fixpoint computation. Note that the time taken for WTO
and WPO construction is very small compared to actual fixpoint computation, which is why
Pikos⟨1⟩ does not outperform IKOS. The almost-linear algorithm for WPO construction (§6.2)
is an interesting theoretical result, which shows a new connection between the algorithms of
Bourdoncle and Ramalingam. However, the practical impact of the new algorithm is in preventing
stack overflow in the analyzer that occurs when using a recursive implementation of Bourdoncle’s
WTO construction algorithm. Such stack overflows occur when analyzing SV-COMP benchmarks
as well as production code [Crab 2018; ReDex 2017].

Fig. 5. Log-log scatter plot of analysis time taken by IKOS and Pikos⟨4⟩ on 1017 benchmarks. Speedup is
defined as the analysis time of IKOS divided by analysis time of Pikos⟨4⟩. 1.00x, 2.00x, and 4.00x speedup
lines are shown. Benchmarks that took longer to analyze in IKOS tended to have higher speedup.

There were 130 benchmarks for which IKOS took longer than 4 hours. To include these
benchmarks, we made the following modification to the dynamic function inliner, which implements
the context sensitivity in interprocedural analysis in both IKOS and Pikos: if the call depth during
the dynamic inlining exceeds the given limit, the analysis returns ⊤ for that callee. For each of
the 130 benchmarks, we determined the largest limit for which IKOS terminated within 4 hours.
Because our experiments are designed to understand the performance improvement in fixpoint
computation, we felt this was a reasonable thing to do.

9.1 RQ0: Determinism of Pikos


We experimentally validated Theorem 5.6, which states that Pikos computes the same fixpoint
regardless of the underlying schedule. We ran Pikos multiple times with varying numbers of threads
and abstract domains, and checked that the final fixpoints were indeed the same. Furthermore, we
experimentally validated Theorem 7.9 by comparing the fixpoints computed by Pikos and IKOS.
Recall that IKOS uses Bourdoncle’s recursive iteration strategy based on WTOs. This research
question is more of a sanity check for our theoretical results.

9.2 RQ1: Performance of Pikos⟨4⟩ compared to IKOS


We exclude results for benchmarks for which IKOS took less than 5 seconds. For these benchmarks,
the average analysis time of IKOS was 0.76 seconds. The analysis time of IKOS minus the analysis
time of Pikos⟨4⟩ ranged from +2.81 seconds (speedup in Pikos⟨4⟩) to -0.61 seconds (slowdown
in Pikos⟨4⟩), with an average of +0.16 seconds. Excluding these benchmarks left us with 1017
benchmarks, consisting of 518 SVC and 499 OSS benchmarks.
Figure 5 shows a log-log scatter plot, comparing the analysis time of Pikos⟨4⟩ with that of
IKOS for each of the 1017 benchmarks. Speedup is defined as the analysis time of IKOS divided by
the analysis time of Pikos⟨4⟩. The maximum speedup was 3.63x, which is close to the maximum
possible speedup of 4.00x. The arithmetic, geometric, and harmonic means of the speedup were
2.06x, 1.95x, and 1.84x, respectively. Total speedup of all the benchmarks was 2.16x. As we see in
Figure 5, benchmarks for which IKOS took longer to analyze tended to have greater speedup in
Pikos⟨4⟩. The top 25% of benchmarks in terms of the analysis time in IKOS had higher averages than
the benchmarks overall, with arithmetic, geometric, and harmonic means of 2.38x, 2.29x, and 2.18x,
respectively. Table 2 shows the speedups for the five benchmarks with the highest speedup and the
five benchmarks with the longest analysis time in IKOS.

(a) 0% ~ 25% (5.02 seconds ~ 16.01 seconds)
(b) 25% ~ 50% (16.04 seconds ~ 60.45 seconds)
(c) 50% ~ 75% (60.85 seconds ~ 508.14 seconds)
(d) 75% ~ 100% (508.50 seconds ~ 14368.70 seconds)

Fig. 6. Histograms of speedup of Pikos⟨4⟩ for different ranges. Figure 6(a) shows the distribution of benchmarks
that took from 5.02 seconds to 16.01 seconds in IKOS. They are the bottom 25% in terms of the analysis time
in IKOS. The distribution tended toward a higher speedup in the upper range.

Figure 6 provides details about the distribution of the speedup achieved by Pikos⟨4⟩. Frequency
on y-axis represents the number of benchmarks that have speedups in the bucket on x-axis. A
bucket size of 0.25 is used, ranging from 0.75 to 3.75. Benchmarks are divided into 4 ranges using the
analysis time in IKOS, where 0% represents the benchmark with the shortest analysis time in IKOS
and 100% represents the longest. The longer the analysis time was in IKOS (higher percentile), the
more the distribution tended toward a higher speedup. The most frequent bucket was 1.25x-1.50x
with frequency of 52 for the range 0% ~ 25%. For the range 25% ~ 50%, it was 1.50x-1.75x with
frequency of 45; for the range 50% ~ 75%, 3.00x-3.25x with frequency of 38; and for the range 75% ~
100%, 3.00x-3.25x with frequency of 50. Overall, 533 benchmarks out of 1017 (52.4%) had speedups
over 2.00x. The number of benchmarks with more than 3.00x speedup was 106 out of 1017 (10.4%).


Benchmark                                  Src.    IKOS (s)    Pikos⟨4⟩ (s)    Speedup

audit-2.8.4/aureport                       OSS       684.29         188.25      3.63x
feh-3.1.3/feh                              OSS      9004.83        2534.91      3.55x
ldv-linux-4.2-rc1/43_2a-crypto             SVC     10051.39        2970.02      3.38x
ratpoison-1.4.9/ratpoison                  OSS      1303.73         387.70      3.36x
ldv-linux-4.2-rc1/08_1a-gpu-amd            SVC      2002.80         602.06      3.33x
fvwm-2.6.8/FvwmForm                        OSS     14368.70        6913.47      2.08x
ldv-linux-4.2-rc1/32_7a-ata                SVC     14138.04        7874.58      1.80x
ldv-linux-4.2-rc1/43_2a-ata                SVC     14048.39        7925.82      1.77x
ldv-linux-4.2-rc1/43_2a-scsi-mpt3sas       SVC     14035.69        4322.59      3.25x
ldv-linux-4.2-rc1/08_1a-staging-rts5208    SVC     13540.72        7147.69      1.89x

Table 2. A sample of the 1017 results in Figure 5. The first 5 rows list benchmarks with the highest speedup,
and the remaining 5 rows list benchmarks with the longest analysis time in IKOS.
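The summary statistics used in this section (per-benchmark speedup, its arithmetic, geometric, and harmonic means, and the total speedup as a ratio of summed times) can be reproduced for the ten rows of Table 2 with the sketch below. It covers only this sample, so its means are not the paper's figures for all 1017 benchmarks; the variable names are hypothetical.

```python
from statistics import mean, geometric_mean, harmonic_mean

# (benchmark, IKOS seconds, Pikos<4> seconds), copied from Table 2.
rows = [
    ("audit-2.8.4/aureport", 684.29, 188.25),
    ("feh-3.1.3/feh", 9004.83, 2534.91),
    ("ldv-linux-4.2-rc1/43_2a-crypto", 10051.39, 2970.02),
    ("ratpoison-1.4.9/ratpoison", 1303.73, 387.70),
    ("ldv-linux-4.2-rc1/08_1a-gpu-amd", 2002.80, 602.06),
    ("fvwm-2.6.8/FvwmForm", 14368.70, 6913.47),
    ("ldv-linux-4.2-rc1/32_7a-ata", 14138.04, 7874.58),
    ("ldv-linux-4.2-rc1/43_2a-ata", 14048.39, 7925.82),
    ("ldv-linux-4.2-rc1/43_2a-scsi-mpt3sas", 14035.69, 4322.59),
    ("ldv-linux-4.2-rc1/08_1a-staging-rts5208", 13540.72, 7147.69),
]

# Speedup is the IKOS time divided by the Pikos<4> time.
speedups = [ikos / pikos for _, ikos, pikos in rows]

summary = {
    "arithmetic": mean(speedups),
    "geometric": geometric_mean(speedups),
    "harmonic": harmonic_mean(speedups),
    # Total speedup weights every benchmark by its running time.
    "total": sum(r[1] for r in rows) / sum(r[2] for r in rows),
}
```

The three means always satisfy harmonic ≤ geometric ≤ arithmetic, which is why reporting all three gives a sense of how skewed the speedup distribution is.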

(a) Box plot (b) Violin plot

Fig. 7. Box and violin plot for speedup of Pikos⟨k⟩ with k ∈ {2, 4, 6, 8}.

Benchmarks with high speedup contained code with large switch statements nested inside loops.
For example, ratpoison-1.4.9/ratpoison, a tiling window manager, had an event handling loop
that dispatches the events using the switch statement with 15 cases. Each switch case called an event
handler that contained further branches, leading to more parallelism. Most of the analysis time for
this benchmark was spent in this loop. On the other hand, benchmarks with low speedup usually
had a dominant single execution path. An example of such a benchmark is xlockmore-5.56/xlock,
a program that locks the local X display until a password is entered.
Maximum speedup gained by parallelism in Pikos⟨4⟩ was 3.63x, where 4.00x is the maximum
possible speedup. Arithmetic, geometric, and harmonic mean of the speedup were 2.06x, 1.95x,
1.84x, respectively. The performance was generally better for the benchmarks for which IKOS
took longer to analyze.

9.3 RQ2: Scalability of Pikos


To understand how the performance of Pikos⟨k⟩ scales as we increase the number of threads k, we
carried out the same measurements as in RQ1 using Pikos⟨k⟩ with k ∈ {2, 4, 6, 8}.


(a) Speedup of Pikos⟨k⟩ for 3 benchmarks with different scalability coefficients. The lines show the linear
regressions of these benchmarks.
(b) Distribution of scalability coefficients for 1017 benchmarks. (x, y) in the plot means that y number
of benchmarks have scalability coefficient at least x.

Fig. 8. Scalability coefficient for Pikos⟨k⟩.

                                                                    Speedup of Pikos⟨k⟩
Benchmark (5 out of 1017. Criteria: Scalability)  Src.   IKOS (s)   k = 4    k = 8    k = 12   k = 16

audit-2.8.4/aureport                              OSS      684.29   3.63x    6.57x    9.02x    10.97x
feh-3.1.3/feh.bc                                  OSS     9004.83   3.55x    6.57x    8.33x     9.39x
ratpoison-1.4.9/ratpoison                         OSS     1303.73   3.36x    5.65x    5.69x     5.85x
ldv-linux-4.2-rc1/32_7a-net-ethernet-intel-igb    SVC     1206.27   3.10x    5.44x    5.71x     6.46x
ldv-linux-4.2-rc1/08_1a-net-wireless-mwifiex      SVC    10224.21   3.12x    5.35x    6.20x     6.64x

Table 3. Five benchmarks with the highest scalability out of 1017 benchmarks.

Figure 7 shows the box and violin plots for the speedup obtained by Pikos⟨k⟩, k ∈ {2, 4, 6, 8}. Box
plots show the quartiles and the outliers, and violin plots show the estimated distribution of the
observed speedups. The box plot [Tukey 1977] on the left summarizes the distribution of the results
for each k using the lower inner fence (Q1 − 1.5 ∗ (Q3 − Q1)), first quartile (Q1), median, third quartile
(Q3), and upper inner fence (Q3 + 1.5 ∗ (Q3 − Q1)). Data beyond the inner fences (outliers) are
plotted as individual points. The box plot reveals that while the benchmarks above the median (middle
line in the box) scaled, the speedups for those below the median saturated. The violin plot [Hintze and
Nelson 1998] on the right supplements the box plot by plotting the probability density of the results
between the minimum and maximum. In the best case, the speedup scaled from 1.77x at k = 2 to 3.63x,
5.07x, and 6.57x at k = 4, 6, and 8. The arithmetic means for k = 2, 4, 6, and 8 were 1.48x, 2.06x,
2.26x, and 2.46x; the geometric means were 1.46x, 1.95x, 2.07x, and 2.20x; and the harmonic means
were 1.44x, 1.84x, 1.88x, and 1.98x.
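
The inner fences used by the box plot can be sketched as follows. The speedups below are hypothetical, and the quartiles are computed with the default method of Python's `statistics.quantiles`; plotting tools may use a slightly different quartile convention, so the exact fence values are an assumption of this sketch.

```python
from statistics import quantiles

# Hypothetical speedups for a single thread count k (not data from the paper).
speedups = [1.0, 1.2, 1.4, 1.5, 1.6, 1.8, 2.0, 2.1, 3.9]

# n=4 yields the three cut points Q1, median, Q3.
q1, median, q3 = quantiles(speedups, n=4)
iqr = q3 - q1

# Inner fences as defined in the text: Q1 - 1.5*(Q3 - Q1) and Q3 + 1.5*(Q3 - Q1).
lower_fence = q1 - 1.5 * iqr
upper_fence = q3 + 1.5 * iqr

# Points beyond the inner fences are drawn individually as outliers.
outliers = [s for s in speedups if s < lower_fence or s > upper_fence]
print(q1, median, q3, lower_fence, upper_fence, outliers)
```

In this sample only the largest value lies beyond the upper fence, so it would be plotted as an individual point.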
To better measure the scalability of Pikos⟨k⟩ for individual benchmarks, we define a scalability
coefficient as the slope of the linear regression of the number of threads and the speedups. The
maximum scalability coefficient is 1, meaning that the speedup increases linearly with the number
of threads. If the scalability coefficient is 0, the speedup is the same regardless of the number of
threads used. If it is negative, the speedup goes down as the number of threads increases. The
measured scalability coefficients are shown in Figure 8. Figure 8(a) illustrates benchmarks exhibiting
different scalability coefficients. For the benchmark with coefficient 0.79, the speedup of Pikos
roughly increases by 4, from 2x to 6x, with 6 more threads. For the benchmark with coefficient 0, the
speedup does not increase with more threads. Figure 8(b) shows the distribution of scalability
coefficients for all benchmarks. From this plot we can infer, for instance, that 124 benchmarks have
a scalability coefficient of at least 0.4. For these benchmarks, the speedup increased by at least 2 when 5
more threads were given.

Fig. 9. CFG of function per_event_detailed in aureport for which Pikos had the maximum scalability.
This function calls the appropriate handler based on the event type. This function is called inside a loop.

Table 3 shows the speedup of Pikos⟨k⟩ for k ≥ 4 for a selection of five benchmarks that had the
highest scalability coefficient in the prior experiment. In particular, we wanted to explore the limits
of scalability of Pikos⟨k⟩ for this smaller selection of benchmarks. With scalability coefficient 0.79,
the speedup of audit-2.8.4/aureport reached 10.97x using 16 threads. This program is a tool
that produces summary reports of the audit system logs. Like ratpoison, it has an event-handler
loop consisting of a large switch statement as shown in Figure 9.
In the best case, the speedup of Pikos⟨k⟩ scaled from 1.77x to 3.63x, 5.07x, and 6.57x with k =
2, 4, 6, and 8. For this benchmark, the speedup reached 10.97x with 16 threads. The scalability
depends on the structure of the analyzed program: programs with multiple paths of similar
lengths exhibit high scalability.
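
The scalability coefficient can be recomputed from the best-case speedups quoted above. The sketch below assumes an ordinary least-squares fit with an intercept over k ∈ {2, 4, 6, 8}; the text does not spell out the regression details, and the function name `scalability_coefficient` is introduced here only for illustration.

```python
def scalability_coefficient(threads, speedups):
    """Least-squares slope of speedup as a function of the number of threads."""
    n = len(threads)
    mean_k = sum(threads) / n
    mean_s = sum(speedups) / n
    # Slope = covariance(k, speedup) / variance(k); the 1/n factors cancel.
    cov = sum((k - mean_k) * (s - mean_s) for k, s in zip(threads, speedups))
    var = sum((k - mean_k) ** 2 for k in threads)
    return cov / var

threads = [2, 4, 6, 8]
best_case = [1.77, 3.63, 5.07, 6.57]  # best-case speedups reported in the text

coefficient = scalability_coefficient(threads, best_case)
print(round(coefficient, 2))  # prints 0.79
```

Under these assumptions the slope agrees with the coefficient of 0.79 reported for audit-2.8.4/aureport. A perfectly linear speedup (speedup equal to k) gives a slope of 1, and a constant speedup gives a slope of 0.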

10 RELATED WORK
Since its publication in 1993, Bourdoncle’s algorithm [Bourdoncle 1993] has become the de facto
approach to solving equations in abstract interpretation. Many advances have been developed since,
but they rely on Bourdoncle’s algorithm; in particular, different ways of intertwining widening and
narrowing during fixpoint computation with an aim to improve precision [Amato and Scozzari
2013; Amato et al. 2016; Halbwachs and Henry 2012].
C Global Surveyor (CGS) [Venet and Brat 2004], which performed array bounds checking, was the
first attempt at distributed abstract interpretation. It performed distributed batch processing, and a
relational database was used for both storage and communication between processes. As a result, the
communication costs were too high, and the analysis did not scale beyond four CPUs.
Monniaux [2005] describes a parallel implementation of the ASTRÉE analyzer [Cousot et al.
2005]. It relies on dispatch points that divide the control flow between long executions, which are
found in embedded applications; the tool analyzes these two control paths in parallel. Unlike our
approach, this parallelization technique is not applicable to programs with irreducible CFGs. The
particular parallelization strategy can also lead to a loss in precision. The experimental evaluation
found that the analysis does not scale beyond 4 processors.
Dewey et al. [2015] present a parallel static analysis for JavaScript by dividing the analysis into
an embarrassingly parallel reachability computation on a state transition system, and a strategy for
selectively merging states during that reachability computation.
Prior work has explored the use of parallelism for specific program analyses. BOLT [Albarghouthi
et al. 2012] uses a map-reduce framework to parallelize a top-down analysis for use in verification
and software model checking. Graspan [Wang et al. 2017] implements a single-machine disk-based
graph system to solve graph reachability problems for interprocedural static analysis. Graspan is
not a generic abstract interpreter, and solves data-flow analyses in the IFDS framework [Reps et al.
1995]. Su et al. [2014] describe a parallel points-to analysis via CFL-reachability. Garbervetsky et al.
[2017] use an actor model to implement distributed call-graph analysis.
McPeak et al. [2013] parallelize the Coverity Static Analyzer [Bessey et al. 2010] to run on an
8-core machine by mapping each function to its own work unit. Tricorder [Sadowski et al. 2015]
is a cloud-based static-analysis platform used at Google. It supports only simple, intraprocedural
analyses (such as code linters), and is not designed for distributed whole-program analysis.
Sparse analysis [Oh et al. 2014, 2012] and database-backed analysis [Weiss et al. 2015] are
orthogonal approaches that improve the memory cost of static analysis. Newtonian program
analysis [Reps 2018; Reps et al. 2017] provides an alternative to Kleene iteration used in this paper.

11 CONCLUSION
We presented a generic, parallel, and deterministic algorithm for computing a fixpoint of an
equation system for abstract interpretation. The iteration strategy used for fixpoint computation is
constructed from a weak partial order (WPO) of the dependency graph of the equation system. We
described an axiomatic and constructive characterization of WPOs, as well as an efficient almost-
linear time algorithm for constructing a WPO. This new notion of WPO generalizes Bourdoncle’s
weak topological order (WTO). We presented a linear-time algorithm to construct a WTO from a
WPO, which results in an almost-linear algorithm for WTO construction given a directed graph.
The previously known algorithm for WTO construction had a worst-case cubic time-complexity.
We also showed that the fixpoint computed using the WPO-based parallel fixpoint algorithm is the
same as that computed using the WTO-based sequential fixpoint algorithm.
We presented Pikos, our implementation of a WPO-based parallel abstract interpreter. Using a
suite of 1017 open-source programs and SV-COMP 2019 benchmarks, we compared the performance
of Pikos against the IKOS abstract interpreter. Pikos⟨4⟩ achieves an average speedup of 2.06x over
IKOS, with a maximum speedup of 3.63x. Pikos⟨4⟩ showed greater than 2.00x speedup for 533
benchmarks (52.4%) and greater than 3.00x speedup for 106 benchmarks (10.4%). Pikos⟨4⟩ exhibits
a larger speedup when analyzing programs that took longer to analyze using IKOS. Pikos achieved
an average speedup of 1.73x on programs for which IKOS took less than 16 seconds, while Pikos
achieved an average speedup of 2.38x on programs for which IKOS took more than 508 seconds.
The scalability of Pikos depends on the structure of the program being analyzed, with Pikos⟨16⟩
exhibiting a maximum speedup of 10.97x.

ACKNOWLEDGMENTS
The authors would like to thank Maxime Arthaud for help with IKOS. This material is based upon
work supported by a Facebook Testing and Verification research award, and AWS Cloud Credits
for Research.

REFERENCES
Aws Albarghouthi, Rahul Kumar, Aditya V Nori, and Sriram K Rajamani. 2012. Parallelizing top-down interprocedural
analyses. In ACM SIGPLAN conference on Programming Language Design and Implementation (PLDI). ACM.
Gianluca Amato and Francesca Scozzari. 2013. Localizing Widening and Narrowing. In Static Analysis Symposium (SAS)
(Lecture Notes in Computer Science), Vol. 7935. Springer, 25–42.
Gianluca Amato, Francesca Scozzari, Helmut Seidl, Kalmer Apinis, and Vesal Vojdani. 2016. Efficiently intertwining widening
and narrowing. Science of Computer Programming (SCP) 120 (2016), 1–24.
Gogul Balakrishnan, Malay K. Ganai, Aarti Gupta, Franjo Ivancic, Vineet Kahlon, Weihong Li, Naoto Maeda, Nadia
Papakonstantinou, Sriram Sankaranarayanan, Nishant Sinha, and Chao Wang. 2010. Scalable and precise program
analysis at NEC. In Formal Methods in Computer-Aided Design (FMCAD).
Thomas Ball, Byron Cook, Vladimir Levin, and Sriram K Rajamani. 2004. SLAM and Static Driver Verifier: Technology
transfer of formal methods inside Microsoft. In Integrated formal methods.
Al Bessey, Ken Block, Ben Chelf, Andy Chou, Bryan Fulton, Seth Hallem, Charles Henri-Gros, Asya Kamsky, Scott McPeak,
and Dawson Engler. 2010. A few billion lines of code later: using static analysis to find bugs in the real world.
Communications of the ACM (CACM) 53, 2 (2010), 66–75.
Dirk Beyer. 2019. Automatic Verification of C and Java Programs: SV-COMP 2019. In Tools and Algorithms for the Construction
and Analysis of Systems, Dirk Beyer, Marieke Huisman, Fabrice Kordon, and Bernhard Steffen (Eds.). Springer International
Publishing, Cham, 133–155.
Dirk Beyer, Stefan Löwe, and Philipp Wendler. 2019. Reliable benchmarking: requirements and solutions. International
Journal on Software Tools for Technology Transfer 21, 1 (01 Feb 2019), 1–29. https://doi.org/10.1007/s10009-017-0469-y
François Bourdoncle. 1993. Efficient chaotic iteration strategies with widenings. In Formal Methods in Programming and
Their Applications. Springer Berlin Heidelberg, Berlin, Heidelberg, 128–141.
Guillaume Brat, Jorge A Navas, Nija Shi, and Arnaud Venet. 2014. IKOS: A framework for static analysis based on abstract
interpretation. In International Conference on Software Engineering and Formal Methods. Springer, 271–277.
Guillaume Brat and Arnaud Venet. 2005. Precise and scalable static program analysis of NASA flight software. In 2005 IEEE
Aerospace Conference. IEEE, 1–10.
Thomas H. Cormen, Charles E. Leiserson, Ronald L. Rivest, and Clifford Stein. 2009. Introduction to Algorithms, Third Edition
(3rd ed.). The MIT Press.
Patrick Cousot. 1977. Asynchronous iterative methods for solving a fixpoint system of monotone equations. Technical Report.
Research Report IMAG-RR-88, Université Scientifique et Médicale de Grenoble.
Patrick Cousot. 2015. Abstracting induction by extrapolation and interpolation. In Verification, Model Checking, and Abstract
Interpretation (VMCAI). Springer.
P. Cousot and R. Cousot. 1976. Static Determination of Dynamic Properties of Programs. In International Symposium on
Programming. Paris.
P. Cousot and R. Cousot. 1977. Abstract interpretation: a unified lattice model for static analysis of programs by construction
or approximation of fixpoints. In Conference Record of the Fourth Annual ACM SIGPLAN-SIGACT Symposium on Principles
of Programming Languages. ACM Press, New York, NY, Los Angeles, California, 238–252.
Patrick Cousot, Radhia Cousot, Jerôme Feret, Laurent Mauborgne, Antoine Miné, David Monniaux, and Xavier Rival. 2005.
The ASTRÉE analyzer. In European Symposium on Programming (ESOP), Vol. 5. 21–30.
Patrick Cousot, Roberto Giacobazzi, and Francesco Ranzato. 2019. A²I: abstract² interpretation. PACMPL 3, POPL (2019),
42:1–42:31. https://doi.org/10.1145/3290355
Patrick Cousot and Nicolas Halbwachs. 1978. Automatic discovery of linear restraints among variables of a program. In
ACM SIGPLAN-SIGACT Symposium on Principles of Programming Languages (POPL). ACM, 84–96.
Crab. 2018. Possibly stack overflow while computing WTO of a large CFG. https://github.com/seahorn/crab/issues/18.
Accessed November 2019.
David Delmas and Jean Souyris. 2007. Astrée: From research to industry. In Static Analysis Symposium (SAS).
Kyle Dewey, Vineeth Kashyap, and Ben Hardekopf. 2015. A parallel abstract interpreter for JavaScript. In Proceedings of the
13th Annual IEEE/ACM International Symposium on Code Generation and Optimization (CGO). 34–45.
Jason Evans. 2019. Jemalloc. https://github.com/jemalloc/jemalloc.
Graeme Gange, Jorge A. Navas, Peter Schachte, Harald Søndergaard, and Peter J. Stuckey. 2016. An Abstract Domain of
Uninterpreted Functions. In 17th International Conference on Verification, Model Checking, and Abstract Interpretation
(VMCAI).
Diego Garbervetsky, Edgardo Zoppi, and Benjamin Livshits. 2017. Toward Full Elasticity in Distributed Static Analysis: The
Case of Callgraph Analysis. In Foundations of Software Engineering (FSE).
Michael R Garey and David S Johnson. 2002. Computers and intractability. Vol. 29. W. H. Freeman, New York.
Roberto Giacobazzi and Isabella Mastroeni. 2004. Abstract non-interference: Parameterizing non-interference by abstract
interpretation. In ACM SIGPLAN-SIGACT Symposium on Principles of Programming Languages (POPL). ACM.
gllvm. 2019. gllvm. https://github.com/SRI-CSL/gllvm.
Google. 2019. Tcmalloc. https://github.com/gperftools/gperftools.
Denis Gopan and Thomas W. Reps. 2006. Lookahead Widening. In Computer Aided Verification (CAV).
Philippe Granger. 1989. Static analysis of arithmetical congruences. International Journal of Computer Mathematics 30, 3-4
(1989), 165–190.
Nicolas Halbwachs and Julien Henry. 2012. When the Decreasing Sequence Fails. In Static Analysis Symposium (SAS) (Lecture
Notes in Computer Science), Vol. 7460. Springer, 198–213.
Paul Havlak. 1997. Nesting of reducible and irreducible loops. ACM Transactions on Programming Languages and Systems
(TOPLAS) 19, 4 (1997), 557–567.
Matthew S Hecht and Jeffrey D Ullman. 1972. Flow graph reducibility. SIAM J. Comput. 1, 2 (1972), 188–202.
Jerry L. Hintze and Ray D. Nelson. 1998. Violin Plots: A Box Plot-Density Trace Synergism.
The American Statistician 52, 2 (1998), 181–184. https://doi.org/10.1080/00031305.1998.10480559
Intel. 2019. Threading Building Blocks (TBB). https://www.threadingbuildingblocks.org.
Raoul Praful Jetley, Paul L Jones, and Paul Anderson. 2008. Static analysis of medical device software using CodeSonar. In
Proceedings of the 2008 workshop on Static analysis (SAW).
Richard M. Karp. 1972. Reducibility among Combinatorial Problems. Springer US, Boston, MA, 85–103.
https://doi.org/10.1007/978-1-4684-2001-2_9
Sol Kim, Kihong Heo, Hakjoo Oh, and Kwangkeun Yi. 2016. Widening with thresholds via binary search. Softw., Pract. Exper.
46, 10 (2016), 1317–1328.
Scott McPeak, Charles-Henri Gros, and Murali Krishna Ramanathan. 2013. Scalable and incremental software bug detection.
In Foundations of Software Engineering (FSE). ACM, 554–564.
Antoine Miné. 2001. A New Numerical Abstract Domain Based on Difference-Bound Matrices. In Programs as Data Objects,
Second Symposium (PADO).
Antoine Miné. 2004. Relational Abstract Domains for the Detection of Floating-Point Run-Time Errors. In European
Symposium on Programming (ESOP) (Lecture Notes in Computer Science), Vol. 2986. Springer, 3–17.
Antoine Miné. 2006. The octagon abstract domain. Higher-Order and Symbolic Computation 19, 1 (2006), 31–100.
David Monniaux. 2005. The parallel implementation of the Astrée static analyzer. In Asian Symposium on Programming
Languages and Systems (APLAS), Vol. 3780. Springer, 86–96.
Hakjoo Oh, Kihong Heo, Wonchan Lee, Woosuk Lee, Daejun Park, Jeehoon Kang, and Kwangkeun Yi. 2014. Global Sparse
Analysis Framework. ACM Transactions on Programming Languages and Systems (TOPLAS) 36, 3 (2014), 8:1–8:44.
Hakjoo Oh, Kihong Heo, Wonchan Lee, Woosuk Lee, and Kwangkeun Yi. 2012. Design and implementation of sparse
global analyses for C-like languages. In ACM SIGPLAN Conference on Programming Language Design and Implementation,
(PLDI).
Mendes Oulamara and Arnaud J Venet. 2015. Abstract interpretation with higher-dimensional ellipsoids and conic
extrapolation. In Computer Aided Verification (CAV). Springer, 415–430.
Ganesan Ramalingam. 1999. Identifying loops in almost linear time. ACM Transactions on Programming Languages and
Systems (TOPLAS) 21, 2 (1999), 175–188.
Ganesan Ramalingam. 2002. On loops, dominators, and dominance frontiers. ACM Transactions on Programming Languages
and Systems (TOPLAS) 24, 5 (2002), 455–490.
ReDex. 2017. Workaround that prevents stack overflows in WTO computation.
https://github.com/facebook/redex/commit/6bbf8a5ddbaae0b282e9fd7183a21764db7fdf39. Accessed November 2019.
Thomas Reps, Susan Horwitz, and Mooly Sagiv. 1995. Precise interprocedural dataflow analysis via graph reachability. In
ACM SIGPLAN-SIGACT Symposium on Principles of Programming Languages (POPL). ACM, 49–61.
Thomas W. Reps. 2018. Program Analyses Using Newton’s Method (Invited Paper). In Networked Systems - 6th International
Conference, (NETYS). 3–16.
Thomas W. Reps, Emma Turetsky, and Prathmesh Prabhu. 2017. Newtonian Program Analysis via Tensor Product. ACM
Transactions on Programming Languages and Systems (TOPLAS) 39, 2 (2017), 9:1–9:72.
Noam Rinetzky, Jörg Bauer, Thomas Reps, Mooly Sagiv, and Reinhard Wilhelm. 2005. A semantics for procedure local heaps
and its abstractions. In ACM SIGPLAN-SIGACT Symposium on Principles of Programming Languages (POPL). ACM.
Caitlin Sadowski, Jeffrey Van Gogh, Ciera Jaspan, Emma Söderberg, and Collin Winter. 2015. Tricorder: Building a program
analysis ecosystem. In International Conference on Software Engineering (ICSE), Vol. 1. IEEE, 598–608.
Gagandeep Singh, Markus Püschel, and Martin T Vechev. 2017. Fast polyhedra abstract domain. In ACM SIGPLAN-SIGACT
Symposium on Principles of Programming Languages (POPL). 46–59.
Gagandeep Singh, Markus Püschel, and Martin T. Vechev. 2018a. Fast Numerical Program Analysis with Reinforcement
Learning. In 30th International Conference on Computer Aided Verification (CAV).
Gagandeep Singh, Markus Püschel, and Martin T. Vechev. 2018b. A practical construction for decomposing numerical
abstract domains. PACMPL 2, POPL (2018), 55:1–55:28.
Yu Su, Ding Ye, and Jingling Xue. 2014. Parallel pointer analysis with CFL-reachability. In 2014 43rd International Conference
on Parallel Processing (ICPP). IEEE, 451–460.
Edward Szpilrajn. 1930. Sur l’extension de l’ordre partiel. Fundamenta mathematicae 1, 16 (1930), 386–389.
Robert Tarjan. 1973. Testing flow graph reducibility. In Proceedings of the fifth annual ACM symposium on Theory of
computing. ACM, 96–107.
Robert Endre Tarjan. 1979. Applications of Path Compression on Balanced Trees. J. ACM 26, 4 (Oct. 1979), 690–715.
John W. Tukey. 1977. Exploratory data analysis. In Addison-Wesley series in behavioral science : quantitative methods.
Arnaud Venet. 2012. The Gauge Domain: Scalable Analysis of Linear Inequality Invariants. In Computer Aided Verification
(CAV). Springer, 139–154.
Arnaud Venet and Guillaume P. Brat. 2004. Precise and efficient static array bound checking for large embedded C programs.
In ACM SIGPLAN conference on Programming Language Design and Implementation (PLDI). ACM.
Kai Wang, Aftab Hussain, Zhiqiang Zuo, Guoqing Xu, and Ardalan Amiri Sani. 2017. Graspan: A single-machine disk-based
graph system for interprocedural static analyses of large-scale systems code. In Architectural Support for Programming
Languages and Operating Systems (ASPLOS). ACM, 389–404.
Cathrin Weiss, Cindy Rubio-González, and Ben Liblit. 2015. Database-Backed Program Analysis for Scalable Error
Propagation. In 37th IEEE/ACM International Conference on Software Engineering (ICSE).
Reinhard Wilhelm, Mooly Sagiv, and Thomas Reps. 2000. Shape analysis. In Compiler Construction (CC). Springer, 1–17.
