Learning Graph-Based Heuristics For Pointer Analysis Without Handcrafting Application-Specific Features
MINSEOK JEON, MYUNGHO LEE, and HAKJOO OH∗ , Korea University, Republic of Korea
We present Graphick, a new technique for automatically learning graph-based heuristics for pointer analysis.
Striking a balance between precision and scalability of pointer analysis requires designing good analysis
heuristics. For example, because applying context sensitivity to all methods in a real-world program is
impractical, pointer analysis typically uses a heuristic to employ context sensitivity only when it is necessary.
Past research has shown that exploiting the program’s graph structure is a promising way of developing
cost-effective analysis heuristics, promoting the recent trend of "graph-based heuristics" that work on the
graph representations of programs obtained from a pre-analysis. Although promising, manually developing
such heuristics remains challenging, requiring a great deal of expertise and laborious effort. In this paper, we
aim to reduce this burden by learning graph-based heuristics automatically, in particular without hand-crafted
application-specific features. To do so, we present a feature language to describe graph structures and an
algorithm for learning analysis heuristics within the language. We implemented Graphick on top of Doop and
used it to learn graph-based heuristics for object sensitivity and heap abstraction. The evaluation results
show that our approach is general and can generate high-quality heuristics. For both instances, the learned
heuristics are as competitive as the existing state-of-the-art heuristics designed manually by analysis experts.
CCS Concepts: • Software and its engineering → Automated static analysis;
Additional Key Words and Phrases: Data-driven static analysis, Machine learning for program analysis, Pointer
analysis, Context sensitivity, Heap abstraction
ACM Reference Format:
Minseok Jeon, Myungho Lee, and Hakjoo Oh. 2020. Learning Graph-Based Heuristics for Pointer Analysis
without Handcrafting Application-Specific Features. Proc. ACM Program. Lang. 4, OOPSLA, Article 179
(November 2020), 30 pages. https://doi.org/10.1145/3428247
1 INTRODUCTION
Pointer analysis is a fundamental program analysis technique that serves as a key component of
various software engineering tools. The goal of pointer analysis is to statically and conservatively
estimate heap objects that pointer variables may refer to at runtime. The pointer information is
essential for virtually all kinds of program analysis tools, including bug detectors [Blackshear
et al. 2015; Livshits and Lam 2003; Naik et al. 2006, 2009; Sui et al. 2014], security analyzers [Arzt
et al. 2014; Avots et al. 2005; Grech and Smaragdakis 2017; Tripp et al. 2009; Yan et al. 2017],
program verifiers [Fink et al. 2008], symbolic executors [Kapus and Cadar 2019], and program
repair tools [Gao et al. 2015; Hong et al. 2020; Lee et al. 2018; Xu et al. 2019]. The success of
∗ Corresponding author
Authors’ address: Minseok Jeon, [email protected]; Myungho Lee, [email protected]; Hakjoo Oh, [email protected], Department of Computer Science and Engineering, Korea University, 145, Anam-ro, Sungbuk-gu, Seoul,
02841, Republic of Korea.
This work is licensed under a Creative Commons Attribution 4.0 International License.
© 2020 Copyright held by the owner/author(s).
2475-1421/2020/11-ART179
https://doi.org/10.1145/3428247
these tools depends eventually on the precision and scalability of the underlying pointer analysis
algorithm.
Developing a fast and precise pointer analysis requires coming up with good analysis heuristics.
For example, context sensitivity is critical for accurately analyzing object-oriented programs as it
distinguishes a method’s local variables and objects in different calling contexts [Smaragdakis and
Balatsouras 2015]. In reality, however, it is too expensive to apply deep context sensitivity (e.g.
2-object-sensitivity) to all methods in a nontrivial program. Therefore practical pointer analysis
applies context sensitivity selectively using a context abstraction heuristic that determines the
amount of context sensitivity that each method should receive [Jeong et al. 2017; Li et al. 2018a; Lu
and Xue 2019; Smaragdakis et al. 2014]. Similarly, the performance of pointer analysis depends
heavily on how heap objects are represented [Kanvar and Khedker 2016]. Pointer analysis usually
employs allocation-site-based heap abstraction, which models heap objects with their allocation
sites. However, because uniformly applying it to all heap objects is costly, a heap abstraction
heuristic can be used to apply it selectively and otherwise use a less precise scheme such as
type-based abstraction [Tan et al. 2017].
Trend: Graph-based Heuristics. A recent trend in state-of-the-art pointer analyses is the use of
graph-based analysis heuristics [Li et al. 2018a,b; Lu and Xue 2019; Tan et al. 2016, 2017]. These
graph-based heuristics commonly work in the following two steps: (1) they first use a cheap pre-
analysis to construct a graph representation of the input program and (2) they reason about the
graph structure to produce a program-specific policy for the main analysis.
For example, Tan et al. [2016] presented Bean, which first runs a context-insensitive pre-analysis
to generate the object allocation graph (OAG) and infers from it a policy for improving the precision
of k-object-sensitive analysis. Li et al. [2018b] proposed Scaler, which also uses a context-insensitive
pre-analysis to derive the object allocation graph and analyzes its structure to identify method
calls that are likely to blow up the analysis cost during the 2-object-sensitive analysis. Li et al.
[2018a] presented Zipper, another graph-based heuristic for context-sensitive analysis. Zipper uses
a pre-analysis to generate a so-called precision flow graph (PFG) and identifies precision-critical
method calls that may lose precision significantly if context insensitivity is used. Lu and Xue [2019]
presented a graph-based heuristic, called Eagle, that uses a CFL-reachability-based pre-analysis to
find out variables and objects that need context sensitivity in the main analysis. Tan et al. [2017]
developed Mahjong, a graph-based heap abstraction heuristic that first runs a cheap pre-analysis
to derive a field points-to graph (FPG) and decides when to merge and differentiate heap objects
based on the structure of the points-to graph.
This Work. In this paper, we aim to advance this line of research by automating the process of
creating graph-based analysis heuristics for pointer analysis. While all of the existing graph-based
heuristics have been designed manually by analysis experts, our technique generates such heuristics
automatically from a given graph without any human effort, significantly increasing applicability
and accessibility of the emerging and promising approach in pointer analysis.
We achieve this goal by developing (1) a feature language for describing graph structures and (2)
an algorithm for learning analysis heuristics in terms of the sentences of the language. We first
present a language for describing structural features of nodes in a graph. This feature description
language is simple and general, allowing it to be reused for various analysis instances (e.g. object
sensitivity and heap abstraction). Second, we present a learning algorithm that takes training
programs (and their graph representations) and produces graph-based heuristics by automatically
discovering features appropriate for the given analysis task. Compared to prior data-driven static
analysis techniques [He et al. 2020; Jeon et al. 2018; Jeong et al. 2017; Singh et al. 2018], a salient
characteristic of our technique is that it does not require pre-designed, application-specific features;
instead, it uses a feature language to generate a proper set of features during the learning process.
By contrast, existing learning-based techniques for static analysis need a different set of hand-tuned
features for each analysis task.
The evaluation results show that our technique is effective and general; it can automatically
produce competitive heuristics for two different analysis instances. We implemented our approach
on top of the Doop pointer analysis framework for Java [Bravenboer and Smaragdakis 2009].
We used our approach to produce an object-sensitivity heuristic from the object allocation graph
on which the state-of-the-art object-sensitivity heuristic Scaler [Li et al. 2018b] was developed.
Additionally, we learned a heap abstraction heuristic from the field points-to graph, which is used
in the state-of-the-art heap abstraction heuristic Mahjong [Tan et al. 2017]. For both instances,
our approach successfully generated high-quality heuristics that are as competitive as Scaler and
Mahjong in terms of the precision and scalability of the main analysis. In particular, the heuristic generated by our framework successfully analyzes large programs that the state-of-the-art heap abstraction heuristic, Mahjong, cannot handle within a time budget.
2 OVERVIEW
We illustrate what our graph-based heuristic looks like and how it works with an example.
Example Program. Figure 1a is an example program with two queries checking the down-
casting safety. This example has a main method that calls the method foo with two different
receiver objects F1 and F2. Class C provides getter and setter methods to manipulate its field data.
Class F has a method foo which allocates two objects C1 and C2 to variables c1 and c2, respectively.
These variables call the set method with newly allocated objects A1 and B1. There are two queries
query1 and query2 asking the down-casting safety at lines 21 and 23. The safety holds because
the get method returns objects A1 and B1 at lines 21 and 23, respectively.
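For concreteness, the following Java sketch is a hypothetical reconstruction of a program consistent with the description of Figure 1a (the class and allocation-site names such as C1, A1, and F1 follow the text; the exact code and line numbering in the figure may differ):

```java
class A {}
class B {}

class C {                        // provides a getter and a setter for its field data
  Object data;
  void set(Object d) { this.data = d; }
  Object get() { return this.data; }
}

class F {
  void foo() {
    C c1 = new C();              // allocation site C1
    C c2 = new C();              // allocation site C2
    c1.set(new A());             // allocation site A1
    c2.set(new B());             // allocation site B1
    A a = (A) c1.get();          // query1: down-cast at line 21 of Fig. 1a (safe)
    B b = (B) c2.get();          // query2: down-cast at line 23 of Fig. 1a (safe)
  }
}

class Main {
  public static void main(String[] args) {
    new F().foo();               // receiver object F1
    new F().foo();               // receiver object F2
  }
}
```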
Goal: Selective Object Sensitivity. Our goal is to analyze the program cost-effectively by apply-
ing context sensitivity only when it is necessary. To prove the queries, we need an object-sensitive
analysis that differentiates the methods called under receiver objects C1 and C2. Without object
sensitivity, the analysis merges the methods get and set called on receiver objects C1 and C2;
eventually, the analysis misjudges that the get method can return both A1 and B1 at lines 21 and 23,
and fails to prove the down-casting safety. The context-sensitive analysis, however, is not necessary
for the method foo called from other objects F1 and F2 because foo is not related to the queries. If
we apply context sensitivity to this method, it only increases the analysis cost without any precision
gain. Thus, our heuristic aims to infer the following policy for the main analysis:
Apply object sensitivity only to method calls whose receiver objects are C1 or C2.
Graph-based Heuristics. To generate such a policy, graph-based heuristics first run a cheap
pre-analysis (e.g. context-insensitive analysis) to obtain a graph representation of the program.
For object-sensitivity heuristics, the object allocation graph (OAG) has been considered as a good
program representation [Li et al. 2018b; Tan et al. 2016]. Nodes in an OAG are heap objects
(represented by allocation sites) and edges represent the connections between objects and their
allocators. Figure 1b shows the OAG of the example program. In Figure 1b, for instance, two objects
F1 and F2 have edges toward the objects A1, B1, C1, and C2 because these four objects are allocated
inside the method foo that is called on the receiver objects F1 and F2. Given the OAG, the goal
of graph-based heuristics is to choose a set of nodes in the graph. Ideally, a good heuristic would
accurately identify the set {C1, C2} that needs object sensitivity during the main analysis.
How Our Heuristic Works. Our heuristic consists of a set of features, where a feature describes
a set of nodes in the given graph. A feature is of the form (prev, ([a, b], [c, d]), succ), where [a, b]
and [c, d] are intervals, and prev and succ are sequences of pairs of intervals. A node n in a graph
is described by the feature iff the number of incoming edges of n is between a and b, the number of
outgoing edges is between c and d, and the node has a sequence of predecessors satisfying prev,
and the node has a sequence of successors satisfying succ.
For example, Figure 1c shows a heuristic comprising a single feature (prev, ([0, ∞], [0, ∞]), succ), where prev and succ are each a single pair of intervals (i.e. prev = ([0, ∞], [2, ∞]) and succ = ([2, ∞], [0, ∞])). It describes nodes that may have any number of incoming and outgoing edges, have a predecessor with at least two outgoing edges, and have a successor with at least two incoming edges. In Figure 1b, C1 and C2 are the only nodes that satisfy these conditions because
they have a successor (i.e. O) with two incoming edges and a predecessor (F1 or F2) with four
outgoing edges. From a set of training programs, our learning algorithm in Section 4.4 can generate
such features automatically.
Given a graph and a set of features, our heuristic finds out all the nodes that satisfy one of the
features. This information is used by the main analysis to perform a selective object-sensitive
analysis; the methods called under receiver objects C1 and C2 are analyzed with 1-object-sensitivity
while the others are analyzed context insensitively.
Note that the performance of the main analysis heavily depends on the features in learned
heuristics. For example, assume that a heuristic contains the following feature, which drops the requirement on the predecessor of the target node from the feature in Figure 1c: (prev, ([0, ∞], [0, ∞]), succ) with prev empty and succ = ([2, ∞], [0, ∞]).
Unlike the feature in Figure 1c, which only C1 and C2 satisfy, the above feature is satisfied by four nodes: C1, C2, F1, and F2. Because the feature still includes the precision-critical nodes C1 and C2, the heuristic is able to prove the queries; however, it pays additional analysis cost because the selected set also includes F1 and F2, which are not related to the queries. As such, inappropriately learned heuristics can degrade the cost and even the precision of the main analysis. Therefore, the goal of our learning algorithm is to find qualified heuristics that retain as many precision-critical nodes as possible while excluding the others.
3 PRELIMINARIES
In this section, we define the baseline pointer analysis for Java-like languages (Section 3.1) and
explain how to parameterize its context sensitivity (Section 3.2) and heap abstraction (Section 3.2).
Fig. 2. Pointer analysis rules with object sensitivity and allocation-site-based heap abstraction
Analysis Output. The goal of the analysis is to compute the following information:
• VarPtsTo : V × C → ℘(H × HC)
• FldPtsTo : H × HC × F → ℘(H × HC)
• MethodCtx : M → ℘(C)
The points-to information is classified into VarPtsTo and FldPtsTo. VarPtsTo maps each pointer
variable qualified with a calling context to a set of abstract heaps, where an abstract heap consists
of an allocation site and a heap context. FldPtsTo maps each object’s field locations to abstract heaps.
MethodCtx maps each method to the set of its reachable contexts.
In recent pointer analyses, graph representations of the analysis results have been widely used
and our technique also leverages them. Notable examples include object allocation graph (OAG) [Tan
et al. 2016] and field points-to graph (FPG) [Tan et al. 2017]. The object allocation graph is a directed graph, G_OAG = (N_OAG, ↪_OAG), where nodes are allocation sites in the program (i.e. N_OAG = H) and edges (↪_OAG) ⊆ H × H describe the object allocation relation defined as follows:

h ↪_OAG h′ ⇐⇒ ∃m ∈ M. (h, _) ∈ VarPtsTo(this_m, _) and (_, h′, m) ∈ Alloc.

In words, we have h ↪_OAG h′ if h is a receiver object of method m, i.e. (h, _) ∈ VarPtsTo(this_m, _), and m allocates h′, i.e. (_, h′, m) ∈ Alloc. Intuitively, the object allocation graph is the "call graph" of object sensitivity: it provides information about how each context is constructed in a k-object-sensitive analysis [Li et al. 2018b]. The field points-to graph G_FPG = (N_FPG, ↪_FPG) is simply a context-insensitive representation of the FldPtsTo relation. We define N_FPG = H and h ↪_FPG h′ ⇐⇒ (h′, _) ∈ FldPtsTo(h, _, _).
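To make these constructions concrete, the following Java sketch (our own illustration, not code from Doop) builds the OAG and FPG edge sets from simplified, hypothetical encodings of the pre-analysis relations VarPtsTo, Alloc, and FldPtsTo:

```java
import java.util.*;

// Hypothetical, simplified encodings of the (context-insensitive) pre-analysis relations.
record VarPtsToFact(String var, String heap) {}
record AllocFact(String var, String heap, String inMeth) {}
record FldPtsToFact(String baseHeap, String field, String heap) {}

class GraphBuilder {
  // OAG edge h -> h': h is a receiver object of method m and m allocates h'.
  static Map<String, Set<String>> buildOAG(Map<String, String> thisVarOf,  // method -> its 'this' variable
                                           List<VarPtsToFact> varPtsTo,
                                           List<AllocFact> alloc) {
    Map<String, Set<String>> oag = new HashMap<>();
    for (Map.Entry<String, String> e : thisVarOf.entrySet()) {
      String m = e.getKey(), thisVar = e.getValue();
      for (VarPtsToFact v : varPtsTo) {
        if (!v.var().equals(thisVar)) continue;            // v.heap() is a receiver object of m
        for (AllocFact a : alloc) {
          if (a.inMeth().equals(m)) {                      // m allocates a.heap()
            oag.computeIfAbsent(v.heap(), k -> new HashSet<>()).add(a.heap());
          }
        }
      }
    }
    return oag;
  }

  // FPG edge h -> h': some field of h may point to h'.
  static Map<String, Set<String>> buildFPG(List<FldPtsToFact> fldPtsTo) {
    Map<String, Set<String>> fpg = new HashMap<>();
    for (FldPtsToFact f : fldPtsTo) {
      fpg.computeIfAbsent(f.baseHeap(), k -> new HashSet<>()).add(f.heap());
    }
    return fpg;
  }
}
```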
Analysis Rules. Figure 2 shows the rules for computing the analysis results. Let maxK and
maxH be the maximum lengths to maintain for call and heap contexts, respectively. Suppose that
(var, heap, inMeth ) is in Alloc, ctx is a reachable context of inMeth (i.e. ctx ∈ MethodCtx(inMeth )),
and hctx is a heap context obtained by keeping only the last maxH elements of ctx (i.e. hctx = ⌈ctx⌉_maxH). Then, VarPtsTo(var, ctx) should include (heap, hctx). Analysis rules for Move, FldLoad,
and FldStore are defined similarly. The rule for Call describes the standard k-object-sensitive analy-
sis [Milanova et al. 2005; Smaragdakis et al. 2011]. Suppose a method is called on a base variable
base with a context ctx , (heap, hctx ) is a receiver object, and ctx ′ is a new calling context. The
context ctx ′ is obtained by appending heap to the heap context hctx of the receiver object (i.e.
hctx ++ heap) and truncating the result (i.e. ⌈hctx ++ heap⌉maxK ). Then, ctx ′ becomes a reachable
context of the callee (i.e. ctx ′ ∈ MethodCtx(callee )), the points-to set of the formal parameter of
the callee (denoted param callee ) is updated with that of the actual parameter, the this variable of
the callee points-to the receiver object, and the points-to set of the return variable of the callee
(denoted return callee ) is transferred to the return variable of the caller.
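The context manipulation in the Alloc and Call rules can be summarized by small helpers (our own sketch; contexts are modeled as lists of allocation-site identifiers and ⌈·⌉_k keeps the last k elements):

```java
import java.util.*;

class Contexts {
  // ⌈ctx⌉_k : keep only the last k elements of a context.
  static List<String> truncate(List<String> ctx, int k) {
    int from = Math.max(0, ctx.size() - k);
    return new ArrayList<>(ctx.subList(from, ctx.size()));
  }

  // Heap context of an object allocated under calling context ctx (Alloc rule): ⌈ctx⌉_maxH.
  static List<String> heapContext(List<String> ctx, int maxH) {
    return truncate(ctx, maxH);
  }

  // New calling context for a callee invoked on receiver 'heap' with heap context hctx (Call rule):
  // ctx' = ⌈hctx ++ heap⌉_maxK
  static List<String> calleeContext(List<String> hctx, String heap, int maxK) {
    List<String> appended = new ArrayList<>(hctx);
    appended.add(heap);
    return truncate(appended, maxK);
  }
}
```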
3.2 Parameterization
Next, we parameterize the baseline pointer analysis.
Parametric Object Sensitivity. The analysis in Figure 2 uses the same maxK value for every
method call. The parametric object-sensitive analysis generalizes it to be able to assign different
call depths for different method calls. To do so, the parameterized analysis uses the rule in Figure 3
instead of the last rule in Figure 2. In Figure 3, we use the function ContextAbstraction : H →
[0, maxK ], which assigns a context depth between 0 and maxK to each method call. When a
method is called on a receiver object heap, ContextAbstraction produces an appropriate context
depth for it. In Section 4, we present a technique for automatically learning a heuristic that produces
the ContextAbstraction information for a given program.
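A minimal sketch of how the Call rule changes under this parameterization (again our own illustration): instead of the fixed maxK, the truncation bound comes from ContextAbstraction applied to the receiver's allocation site.

```java
import java.util.*;
import java.util.function.ToIntFunction;

class ParametricContexts {
  // ctx' = ⌈hctx ++ heap⌉_{ContextAbstraction(heap)}
  static List<String> calleeContext(List<String> hctx, String heap,
                                    ToIntFunction<String> contextAbstraction) {
    List<String> appended = new ArrayList<>(hctx);
    appended.add(heap);
    int depth = contextAbstraction.applyAsInt(heap);   // 0 .. maxK, chosen per receiver object
    int from = Math.max(0, appended.size() - depth);
    return new ArrayList<>(appended.subList(from, appended.size()));
  }
}
```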
Parametric Heap Abstraction. The analysis in Figure 2 uses allocation-site-based heap abstraction for every heap object. We can generalize it to support selective use of allocation-site- and type-based
heap abstractions. We first need to generalize the analysis results as follows:
• VarPtsTo : V × C → ℘((H + T) × HC)
4 GRAPHICK
In this section, we present our approach for automatically learning graph-based analysis heuristics.
In Section 4.1, we define static analyses with k-limited abstractions. Section 4.2 presents a feature
description language for directed graphs, which is important for the generality and effectiveness of
our approach. In Section 4.3, we define a parameterized abstraction heuristic based on the feature
language. Section 4.4 presents our algorithm for learning parameters of the heuristic.
In this paper, we generally assume the analysis F_P is monotone with respect to the abstractions in the sense that more refined abstractions imply higher analysis precision:

a ⊑ a′ =⇒ proved(F_P(a)) ⊆ proved(F_P(a′)).    (1)
Many static analysis problems are monotone [Jeong et al. 2017; Li et al. 2018a; Liang and Naik
2011; Liang et al. 2011; Tan et al. 2017; Zhang et al. 2014] and therefore our approach is directly
applicable to them. For non-monotone analyses (e.g. interval analysis with widening [Cha et al.
2016]), our approach is still applicable in practice but it does not guarantee its theoretical property
(Theorem 4.2).
Finally, a feature (⟨p̂0, p̂1, . . . , p̂q⟩, n̂, ⟨ŝ0, ŝ1, . . . , ŝr⟩) ∈ Feature denotes the set of nodes in γ_G(n̂) whose predecessors and successors are described by ⟨p̂0, p̂1, . . . , p̂q⟩ and ⟨ŝ0, ŝ1, . . . , ŝr⟩, respectively:

γ_G(⟨p̂0, p̂1, . . . , p̂q⟩, n̂, ⟨ŝ0, ŝ1, . . . , ŝr⟩) = {n ∈ γ_G(n̂) | ∃p0, p1, . . . , pq, s0, s1, . . . , sr ∈ N. ⟨p0, p1, . . . , pq⟩ ∈ γ_G(⟨p̂0, p̂1, . . . , p̂q⟩), pq ↪ n ↪ s0, ⟨s0, s1, . . . , sr⟩ ∈ γ_G(⟨ŝ0, ŝ1, . . . , ŝr⟩)}.
For example, the feature (ϵ, ([0, 3], [5, ∞]), ⟨([0, 2], [0, 0])⟩) describes the set of nodes that have 1) three or fewer incoming edges and five or more outgoing edges, and 2) a successor node with two or fewer incoming edges and no outgoing edges. For another example, the following feature

(⟨([0, 0], [0, 5]), ([1, 2], [3, ∞])⟩, ([0, 3], [100, ∞]), ⟨([1, 1], [2, 2])⟩)

describes a node n iff 1) n has three or fewer incoming edges and 100 or more outgoing edges, 2) n has a predecessor p with one or two incoming edges and three or more outgoing edges, 3) p in turn has a predecessor with no incoming edges and five or fewer outgoing edges, and 4) n has a successor s with a single incoming edge and two outgoing edges.
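The following Java sketch is our own simplified rendering of this concretization, where the predecessor and successor sequences are interpreted as chains walked backward and forward from the target node:

```java
import java.util.*;

record Interval(int lo, int hi) {                        // hi = Integer.MAX_VALUE encodes ∞
  boolean contains(int v) { return lo <= v && v <= hi; }
}
record AbstractNode(Interval in, Interval out) {}        // ([a,b],[c,d]) over in/out-degrees
record Feature(List<AbstractNode> prev, AbstractNode node, List<AbstractNode> succ) {}

class Graph {
  final Map<String, Set<String>> succs = new HashMap<>();
  final Map<String, Set<String>> preds = new HashMap<>();

  void addEdge(String a, String b) {
    succs.computeIfAbsent(a, k -> new HashSet<>()).add(b);
    preds.computeIfAbsent(b, k -> new HashSet<>()).add(a);
    succs.computeIfAbsent(b, k -> new HashSet<>());
    preds.computeIfAbsent(a, k -> new HashSet<>());
  }
  Set<String> succ(String n) { return succs.getOrDefault(n, Set.of()); }
  Set<String> pred(String n) { return preds.getOrDefault(n, Set.of()); }

  boolean satisfies(String n, AbstractNode a) {
    return a.in().contains(pred(n).size()) && a.out().contains(succ(n).size());
  }
  // Is there a chain of successors (forward) or predecessors (backward) of n matching 'chain'?
  boolean hasChain(String n, List<AbstractNode> chain, int i, boolean forward) {
    if (i == chain.size()) return true;
    for (String m : forward ? succ(n) : pred(n)) {
      if (satisfies(m, chain.get(i)) && hasChain(m, chain, i + 1, forward)) return true;
    }
    return false;
  }
  // γ_G membership: does node n belong to the concretization of the feature?
  boolean matches(String n, Feature f) {
    List<AbstractNode> prevFromTarget = new ArrayList<>(f.prev());
    Collections.reverse(prevFromTarget);                 // immediate predecessor corresponds to p̂q
    return satisfies(n, f.node())
        && hasChain(n, prevFromTarget, 0, false)
        && hasChain(n, f.succ(), 0, true);
  }
  // All nodes selected by a heuristic (a set of features).
  Set<String> select(Collection<Feature> features) {
    Set<String> out = new HashSet<>();
    for (String n : succs.keySet())
      for (Feature f : features)
        if (matches(n, f)) { out.add(n); break; }
    return out;
  }
}
```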
The learning objective is as follows:
Find Π = ⟨F_1, F_2, . . . , F_k⟩ such that ∀P_i ∈ P. H_Π(G_i) is a minimal abstraction for P_i,

where G_i is a graph obtained by running a pre-analysis on P_i (e.g. G_i = graph(F_{P_i}(0))). The definition of minimal abstractions is as follows:
Definition 4.1 (Minimal Abstraction [Liang et al. 2011]). An abstraction a is a minimal abstraction for program P if
(1) a is precise: proved(F_P(a)) = proved(F_P(k)), and
(2) a is minimal: (a′ ⊑ a ∧ proved(F_P(a′)) = proved(F_P(a))) =⇒ a′ = a.
Algorithm 2 presents our algorithm for efficiently computing a minimal abstraction for program
P. Our algorithm is similar to the ScanCoarsen algorithm by Liang et al. [2011], but ours is more
efficient than the prior algorithm as we exploit the high-level structure of k-limited abstractions to
reduce the search space. The algorithm by Liang et al. [2011] first transforms k-limited abstractions
into binary abstractions (where k is 1), losing the opportunity to leverage the properties of the
search space induced by monotone k-limited analyses. As a result, the size of the search space is (k + 1)^{|C_P|} for the existing algorithm [Liang et al. 2011]. We safely reduce the space to k · 2^{|C_P|}.
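For concreteness, with hypothetical values k = 3 and |C_P| = 20, the original space contains (3 + 1)^20 = 2^40 ≈ 1.1 × 10^12 abstractions, whereas the reduced space contains at most 3 · 2^20 ≈ 3.1 × 10^6.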
At line 2, we set C to all program components CP . The algorithm begins with the most precise
abstraction (line 3). At lines 4–15, it considers each of the abstraction degrees 1, 2, . . . , k in reverse.
Iterating the abstraction degrees in reverse (from k to 1) is important to reduce the search space
safely. At lines 6–13, it iteratively picks a program component (line 7) and assigns the lower
abstraction degree i − 1 to it (line 9). At line 10, the algorithm checks if the refined abstraction still
preserves the precision; if so, the lower abstraction degree is sufficient for that program component.
Otherwise, the program component needs the degree i to preserve the precision. At the end of
the iteration (line 14), we exclude from C the program components that are determined to require
the current degree i (i.e. {c | a(c) = i}). In the worst case (when the minimal abstraction is λc.0),
our algorithm iterates k · |C | times where the search space for each degree i is 2C and we have k
different degrees. Although the algorithm considers a significantly smaller search space than the
original one, it still guarantees to find a minimal abstraction:
Theorem 4.2. Algorithm 2 returns a minimal abstraction for the input program P.
Proof. See the proof in the link.1 □
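The search can be sketched in Java as follows (our own simplified modeling: an abstraction is a map from components to degrees, and preservesPrecision stands for the check at line 10, i.e., whether the candidate abstraction still proves the same queries as the most precise one):

```java
import java.util.*;
import java.util.function.Predicate;

class MinimalAbstraction {
  // Components are identified by strings; an abstraction maps each component to a degree in [0, k].
  static Map<String, Integer> compute(Set<String> components, int k,
                                      Predicate<Map<String, Integer>> preservesPrecision) {
    Set<String> c = new HashSet<>(components);          // line 2: C := all program components
    Map<String, Integer> a = new HashMap<>();
    for (String comp : components) a.put(comp, k);      // line 3: start from the most precise abstraction
    for (int i = k; i >= 1; i--) {                      // lines 4-15: degrees k, k-1, ..., 1
      for (String comp : new ArrayList<>(c)) {          // lines 6-13: try lowering each remaining component
        a.put(comp, i - 1);                             // line 9: tentatively assign degree i-1
        if (!preservesPrecision.test(a)) {              // line 10: precision check
          a.put(comp, i);                               // degree i is needed; roll back
        }
      }
      final int degree = i;
      c.removeIf(comp -> a.get(comp) == degree);        // line 14: fix components that require degree i
    }
    return a;                                           // a minimal abstraction (Theorem 4.2)
  }
}
```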
Learning a Set of Features. Algorithm 3 describes the algorithm for learning a set of features.
It takes the abstraction level i, minimal abstractions AP , and graphs G P as input. It returns as
output a set of features F that best describe the nodes assigned the abstraction level i according
to minimal abstractions in AP . At line 2, it collects all program components C (e.g. nodes) whose
abstraction degrees are i according to minimal abstractions. At line 3, it initializes F to be the
empty set. At lines 4–8, the algorithm iteratively calls LearnFeature to generate a feature. The
algorithm adds the generated features to F until the features cover all program components, and
returns F as learned features when F does so.
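A sketch of this loop (our own rendering; learnFeature corresponds to Algorithm 4 and covered to the concretization of a feature over the training graphs):

```java
import java.util.*;
import java.util.function.BiFunction;

class LearnFeatureSet {
  // c: program components whose minimal-abstraction degree is i, collected over all training programs.
  // Assumes each learned feature covers at least one remaining component, so the loop terminates.
  static <F, G> Set<F> learn(Set<String> c, List<G> graphs,
                             BiFunction<Set<String>, List<G>, F> learnFeature,   // Algorithm 4
                             BiFunction<F, List<G>, Set<String>> covered) {      // nodes selected by f
    Set<F> features = new HashSet<>();
    Set<String> uncovered = new HashSet<>(c);
    while (!uncovered.isEmpty()) {                      // lines 4-8: add features until all of C is covered
      F f = learnFeature.apply(uncovered, graphs);
      features.add(f);
      uncovered.removeAll(covered.apply(f, graphs));
    }
    return features;
  }
}
```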
Learning a Feature. Algorithm 4 presents how each feature f in F is learned. LearnFeature
takes as input the components C and the graphs G_P, and aims to generate a feature f that maximizes the following score function:

Score(f, C) = Σ_{P ∈ P} |C ∩ γ_{G_P}(f)| / Σ_{P ∈ P} |γ_{G_P}(f)|,
1 http://doi.org/10.5281/zenodo.4216569
where the score is a real number between 0 and 1. Intuitively, the score describes how accurately a
feature describes the program components in C. For example, the score attains the highest value 1 when ∀P ∈ P. γ_{G_P}(f) ⊆ C. The score decreases as the feature selects components not in C.
The algorithm starts from the most general feature, i.e., (ϵ, ([0, ∞], [0, ∞]), ϵ), and iteratively refines it until the feature becomes sufficiently informative, meaning that the score of the refined feature becomes higher than the hyperparameter θ. The value of θ has a great impact on the performance of the learned heuristics, and we discuss how we determine the value of θ in Section 5.2. At lines 4–10, the algorithm iteratively calls Refine to make the current feature f more specific. When no more improvement is possible (i.e. Score(f′, C) ≤ Score(f, C)), the loop terminates and
the algorithm returns the current feature f. We define the refinement function Refine as follows:

Refine(f, C) = argmax_{f′ ∈ Append(f) ∪ Replace(f)} Score(f′, C),

where Append(f) and Replace(f) produce new features that are more specific than f. From this set of new features, Refine chooses the one with the highest score. Append(f) denotes the features obtained by appending a new abstract node to the predecessor or successor sequence of f, and Replace(f) denotes the features obtained by making an existing interval in f more specific.
(2) It refines the current feature to (ϵ, ([0, 97], [0, ∞]), ϵ), which has the highest score, 0.06, among the 12 candidate features.
(3) Because the score is still less than 0.5, it refines the feature again, in the same manner, to the following more specific one, which comes from Append(ϵ, ([0, 97], [0, ∞]), ϵ):
(ϵ, ([0, 97], [0, ∞]), ⟨([97, ∞], [0, ∞])⟩).
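Putting the pieces together, the refinement loop of LearnFeature can be sketched as follows (our own Java rendering; score implements the Score function above, the refiner enumerates Append(f) ∪ Replace(f), and theta is the hyperparameter θ):

```java
import java.util.*;
import java.util.function.ToDoubleFunction;

class LearnOneFeature {
  interface Refiner<F> { List<F> candidates(F f); }     // enumerates Append(f) ∪ Replace(f)

  // Greedy refinement: start from the most general feature and refine while the score improves,
  // stopping once the score exceeds the threshold θ or no candidate improves it.
  static <F> F learn(F mostGeneral, Refiner<F> refiner, ToDoubleFunction<F> score, double theta) {
    F f = mostGeneral;
    double s = score.applyAsDouble(f);
    while (s < theta) {
      F best = null;
      double bestScore = s;
      for (F candidate : refiner.candidates(f)) {       // Refine(f, C): pick the best refinement
        double cs = score.applyAsDouble(candidate);
        if (cs > bestScore) { best = candidate; bestScore = cs; }
      }
      if (best == null) break;                          // no improvement possible; return current f
      f = best;
      s = bestScore;
    }
    return f;
  }
}
```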
5 EVALUATION
In this section, we experimentally evaluate our technique for learning graph-based heuristics. We
aim to answer the following research questions:
• Effectiveness and Generality: How effectively does the learned heuristic perform com-
pared to the state-of-the-art heuristics? Is it generally applicable for different analysis tasks
without manual effort for designing application-specific features?
• Learning Algorithm: How much does the learning cost? How does the hyper-parameter θ
affect the performance of the learned heuristics?
• Learned Insight: Does our approach produce explainable heuristics? What are the insights
learned from the generated heuristics?
Overall Setting. We implemented our approach, as a tool Graphick, on top of Doop [Bravenboer
and Smaragdakis 2009], a pointer analysis framework for Java that has been widely used in prior
works [Jeon et al. 2018; Jeong et al. 2017; Smaragdakis et al. 2014; Tan et al. 2016, 2017]. For the
precision and scalability metrics, we follow existing works [Jeong et al. 2017; Li et al. 2018a,b;
Tan et al. 2017] and use the number of may-fail-cast alarms and the time spent on each analysis.
We also use the number of polymorphic call sites (i.e. call sites whose targets are not uniquely
determined by each pointer analysis) and call-graph edges as additional precision metrics. For all
precision metrics, the lower the better. We set the time budget to 3 hours (10,800 sec) for all
analyses. For the hyper-parameter θ, we chose one among various values (e.g., 0.1, 0.2, ..., 0.9) via cross validation (we explain how this is done in Section 5.2). We limit each feature to have at most three nodes for scalability. All the experiments were done on a machine with an i7 CPU
and 64 GB RAM running Ubuntu 16.04 (64bit). We used the OpenJDK (1.6.0_24) library.
We used a total of 17 programs: 10 programs (luindex, lusearch, antlr, pmdm , chart, eclipse,
fop, bloat, xalan, and jython) from the DaCapo 2006-10-MR2 benchmark suite [Blackburn et al.
2006] and 7 programs (pmds , jedit, briss, soot, findbugs, JPC, and checkstyle) obtained from the
artifacts provided by Tan et al. [2017] and Li et al. [2018b]. Here, we used two different versions
of pmd where pmdm is a small program used by Tan et al. [2017], and pmds is an open-source
application used by Li et al. [2018b]. We split the benchmark programs into training, validation,
and test sets. The training and validation sets are used for learning a heuristic, and the test set is
used for evaluating the performance of the learned heuristic. For the training set, we used relatively
small benchmarks, because our algorithm includes a process to obtain minimal abstractions and
this task is too expensive to run for large programs. The validation set is used for choosing the
hyper-parameter θ ; we chose the one that leads the heuristic to the best performance on the
validation set.
5.1.1 Comparison with Scaler.
Setting. Scaler is a context-sensitivity heuristic that works on the object allocation graph
(OAG) [Li et al. 2018b]. From the OAG, it infers a policy to assign one of 2-object-sensitivity,
2-type-sensitivity, 1-type-sensitivity, and context-insensitivity to each method. We used the same
pre-analysis of Scaler to obtain the OAG and let our technique produce a heuristic. We set maxK
in Section 3.2 to 3, where 0, 1, 2, and 3 correspond to context-insensitivity, 1-type-sensitivity,
2-type-sensitivity, and 2-object-sensitivity, respectively. Unlike Scaler, our heuristic assigns a
context for each heap allocation site. It poses 4^N possibilities where N denotes the number of allocation sites in the program.
Although the primary objective in this evaluation is to compare with Scaler, we evaluated two
more heuristics as well: Zipper [Li et al. 2018a] and Data [Jeong et al. 2017]. Zipper is another
graph-based context-sensitivity heuristic that works on the precision flow graph (PFG). Data is not
graph-based, but we include it because Data is currently the state-of-the-art data-driven pointer
analysis algorithm (with hand-crafted features). In short, we compare the following pointer analysis
algorithms:
• Scaler: A hand-crafted graph-based object-sensitivity heuristic for OAG [Li et al. 2018b]
• Graphick: Our learning-based graph-based object-sensitivity heuristic for OAG
• Zipper: A hand-crafted graph-based object-sensitivity heuristic for PFG [Li et al. 2018a]
• Data: A state-of-the-art learning-based object-sensitivity heuristic [Jeong et al. 2017]
• 2objH: The 2-object-sensitivity with 1-context-sensitive heap (precision upper bound)
• Insens: The context-insensitive analysis (scalability upper bound)
We used three programs (luindex, lusearch, antlr) as the training set, one program (findbugs)
as the validation set, and the remaining thirteen programs (pmds , chart, eclipse, jedit, briss, soot,
jython, pmdm , fop, bloat, JPC, checkstyle, xalan) as the test set. We chose findbugs as a validation
program because it is a popular Java application and requires suitable heuristics to be analyzed
cost-effectively. For example, 2objH does not terminate on this program even after running for thousands of seconds.
Results. Tables 1 and 2 present the performance of the context-sensitivity heuristics described above. The number in parentheses for graph-based heuristics (i.e. Graphick, Zipper, and Scaler)
represents the sum of time spent on performing the pre-analysis (i.e. context-insensitive analysis)
and running the heuristics on the graphs for extracting context abstractions.
The results show that our technique can automatically generate a cost-effective heuristic that
performs as competitive as the state-of-the-art object-sensitivity heuristics. Compared to the
baseline heuristic Scaler, which employs the same graph OAG, Graphick shows a better precision
than Scaler with some losses in scalability for the test programs pmds , eclipse, and briss. For
example, Graphick reports 101 fewer may-fail-cast alarms than Scaler for the test program pmds while taking 216 more seconds. In addition, Graphick shows better performance in both precision
and scalability than Scaler on the test programs (except pmdm ) in Table 2. For example, in
jedit, Graphick produces 201 fewer alarms with 35% less analysis time. In comparison to Zipper, Graphick consistently outperforms it in scalability. For example, Graphick successfully analyzed pmds, jedit, and briss at remarkably lower cost, whereas Zipper fails to analyze them within the time budget. In comparison to Data, the results show that Graphick performs far better in precision. Although Data presents better scalability than Graphick, it produces more than 92 additional alarms for the test programs pmds, eclipse, jedit, and briss. Compared to 2objH, Graphick shows better scalability on the majority of test programs, which 2objH fails to analyze within the given time budget (3 hours).
5.1.2 Comparison with Mahjong.
Setting. Mahjong is a graph-based heap abstraction heuristic that works on the field points-to
graph (FPG) [Tan et al. 2017]. From the FPG, which is obtained by running a context-insensitive pre-
analysis, Mahjong infers a policy that determines whether to merge objects allocated in different
allocation sites. We used the same pre-analysis to obtain the FPG and let our technique produce a
heap abstraction heuristic. Unlike Mahjong, our heap abstraction heuristic (i.e. Graphick) assigns
‘type’ (type-based heap abstraction) or ‘alloc’ (allocation-site-based heap abstraction) to each heap
Table 1. Performance of the context-sensitivity heuristics on the benchmarks. For all metrics, lower is better. For the precision metrics, we use the number of may-fail casts (#may-fail casts) and of polymorphic call sites (#poly-call sites), i.e. call sites whose targets are not uniquely determined by each pointer analysis. For the scalability metric, we use analysis time; the number in parentheses presents the time spent on the pre-analysis. #call-graph-edges for the training and validation programs are omitted due to lack of space.
Table 2. Performance comparison among the various context-sensitivity heuristics on the remaining six benchmarks. All notations are the same as in Table 1.
allocation site, which poses 2^N possibilities where N denotes the number of allocation sites in the
program. We compare the following four analyses:
• Mahjong: The state-of-the-art graph-based heap abstraction heuristic [Tan et al. 2017]
• Graphick: Our learning-based graph-based heap abstraction heuristic
• Alloc-Based: The uniform allocation-site-based heap abstraction (precision upper bound)
• Type-Based: The uniform type-based heap abstraction (scalability upper bound)
Following Mahjong [Tan et al. 2017], all analyses above use 3-object-sensitivity with 2-context-
sensitive heap.
For this evaluation, we used the same benchmark programs as in Section 5.1.1. We used four programs (luindex, lusearch, antlr, pmdm) as the training set and twelve programs (fop, chart, bloat, xalan, JPC, checkstyle, eclipse, pmds, jedit, briss, soot, jython) as the test set. We also used findbugs as a validation program.
Results. Tables 3 and 4 show that our technique can produce a competitive graph-based heap abstraction heuristic from the FPG. In comparison with Mahjong, Graphick shows far better scalability while losing a bit of precision. Mahjong produced the same number of may-fail casts as the most precise analysis, Alloc-Based, but it was unable to analyze large programs like chart and bloat within the time budget (3 hours). Although Graphick produced more alarms (at most 103)
Table 3. Performance of the heap abstraction heuristics on the benchmarks. The notations are the same as those in Table 1.
Table 4. Performance comparison between the heap abstraction heuristics on the remaining benchmarks.
than Mahjong, it successfully analyzed the programs (i.e. chart and bloat) that Mahjong failed to analyze. Currently, the overhead of our heuristic (the time taken to extract an abstraction from the FPG) is larger than that of Mahjong, because Mahjong comes with an efficient, purpose-built algorithm for producing an abstraction from the FPG while ours is not optimized to minimize this cost. The results, however, still demonstrate that Graphick is competitive and has a strength in scalability compared to the state-of-the-art technique, as it successfully analyzed the large programs chart and bloat, which Mahjong cannot handle.
Fig. 5. Scores of the learned heuristics for hyperparameter values θ from 0.1 to 0.9, measured on the training programs and the validation program (X-axis: value of the hyperparameter θ; Y-axis: score of the learned heuristic).

Figure 5 shows how the hyperparameter θ affects the performance of the learned heuristics. The X-axis presents
the value of θ used to learn each heuristic, and the Y-axis presents the scores we measured for the performance of each heuristic H_θ according to Σ_{P ∈ P} proved(F_P(H_θ(G_P))) / Σ_{P ∈ P} cost(F_P(H_θ(G_P))), where cost denotes analysis time. This score function presents the number of queries proved per second; thus, the more precise and scalable the analysis, the higher the score. The red dotted and black solid lines present how the
scores change over the training programs P and the validation program, respectively. For the
training programs, the score of the learned heuristic increases as θ increases because the heuristic becomes more fitted to the training programs. In our evaluation, however, both learned heuristics perform best on the validation program when θ is 0.5; thus, Graphick in Table 1 and Table 3 corresponds to H_0.5.
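The selection of θ can be summarized by the following sketch (our own illustration; provedQueries and analysisTime are hypothetical callbacks that run the main analysis with the heuristic learned under a given θ on the validation program):

```java
import java.util.function.DoubleUnaryOperator;

class SelectTheta {
  // Pick the θ whose learned heuristic maximizes (proved queries) / (analysis time in seconds)
  // on the validation program.
  static double choose(double[] thetas,
                       DoubleUnaryOperator provedQueries,
                       DoubleUnaryOperator analysisTime) {
    double best = thetas[0], bestScore = Double.NEGATIVE_INFINITY;
    for (double theta : thetas) {
      double score = provedQueries.applyAsDouble(theta) / analysisTime.applyAsDouble(theta);
      if (score > bestScore) { bestScore = score; best = theta; }
    }
    return best;
  }
}
```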
Fig. 6. Top-5 features learned by our technique, and concrete nodes implied by the top-1 feature. Gray colored
abstract nodes in the features correspond to the target nodes and others are predecessors or successors.
Gray colored nodes in the column Concretization are precision-critical nodes selected by the top-1 feature; other nodes are predecessors or successors that make the gray colored nodes satisfy the feature.
Table 5. Performance of our manually-designed graph-based heap abstraction heuristic for FPG
For each feature, Figure 6 reports the portion of satisfying nodes over the total precision-critical nodes in the given graphs (portion) and its score (score). The rightmost column, Concretization, visualizes the concretization of the first feature in the Top 5 Features column, where the gray colored nodes correspond to the target abstract nodes of that feature. For space reasons, we draw each node with at most 13 incoming and outgoing edges although it can have more.
Insights. The generated features during the learning process provide hints on designing analy-
sis heuristics from the graphs. For example, we investigated the features of the heap abstraction
heuristic in Figure 6 and found two commonalities. First, the features have the form (ϵ, n̂, ŝ) where ŝ is not ϵ, which implies that we should consider successors more than predecessors when designing heap abstraction heuristics from points-to graphs. The second commonality is that ŝ or n̂ tends to include an abstract node Node^E that describes nodes with many outgoing edges, i.e., Node^E = (itv, [b, ∞]) where b is about 3% of the total number of nodes in the graph of a training program. From these observations, we manually designed a graph-based heap abstraction heuristic that assigns allocation-site-based heap abstraction to a target node if the target node itself or one of its successors has at least b′ outgoing edges (i.e. H = ⟨{(ϵ, ([0, ∞], [b′, ∞]), ϵ), (ϵ, ⊤, ⟨([0, ∞], [b′, ∞])⟩), (ϵ, ⊤, ⟨⊤, ([0, ∞], [b′, ∞])⟩), . . . }⟩, where ⊤ is the most general abstract node ([0, ∞], [0, ∞]) and b′ is 3% of the total nodes in the given graph).
Otherwise, the heuristic assigns type-based heap abstraction to the others. Table 5 demonstrates
the performance of the manually-crafted heuristic. In comparison to Alloc-Based, it reduces about
99% of analysis cost while producing only 2% more alarms.
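As a concrete rendering of this hand-designed rule (our own sketch; the FPG is given as a successor map in which every node appears as a key, depth bounds the feature length, and b′ is 3% of the total number of nodes):

```java
import java.util.*;

class ManualFpgHeuristic {
  // Assign allocation-site-based abstraction to a node if it, or one of its successors reachable
  // within 'depth' steps, has at least b' outgoing edges; otherwise the node gets type-based abstraction.
  static Set<String> allocSiteNodes(Map<String, Set<String>> fpg, int depth) {
    int bPrime = Math.max(1, (int) Math.ceil(0.03 * fpg.size()));
    Set<String> selected = new HashSet<>();
    for (String n : fpg.keySet()) {
      if (hasHeavySuccessor(fpg, n, depth, bPrime)) selected.add(n);
    }
    return selected;
  }

  private static boolean hasHeavySuccessor(Map<String, Set<String>> fpg, String n,
                                           int depth, int bPrime) {
    if (fpg.getOrDefault(n, Set.of()).size() >= bPrime) return true;
    if (depth == 0) return false;
    for (String s : fpg.getOrDefault(n, Set.of())) {
      if (hasHeavySuccessor(fpg, s, depth - 1, bPrime)) return true;
    }
    return false;
  }
}
```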
Intuitively, the nodes with lots of successors in FPG should be analyzed precisely because merging
the objects with others would produce lots of spurious analysis results. For example, if there exists
an object with lots of field objects which we want to merge with another one with a few field
objects, it eventually produces lots of spurious results stating that the both heaps can have lots
of field objects. Such insight is related with that of Mahjong which merges the objects if their
successors have the same type; statistically, if an object has lots of successors, there hardly exist
the other objects with exactly the same types of successors. Surprisingly, it is easy to find such
insight through the features generated by our technique. Note that this insight is general as it is
not dependent to Java programs. For example, when analyzing a C program, it is a required task
not to merge such heaps with others as it would produce lots of spurious results.
Interestingly, Figure 6 also shows the difference between the statistically-learned insight be-
hind Graphick and the logical insight behind Scaler in deciding which nodes to analyze more
precisely. Based on the logical insight, Scaler relies heavily on the number of incoming edges as
that number in the object allocation graph indicates how many contexts will be constructed in
object sensitivity. Graphick, however, treats the number of neighbor nodes’ outgoing edges more
importantly, as shown in Figure 6. Such differences result in the performance gap between the two
object-sensitivity heuristics.
Generality of the learned heuristic. We found that the heuristic learned for object sensitivity also generalizes to hybrid context sensitivity [Kastrinis and Smaragdakis 2013]. Table 6 presents the performance of the conventional 2-hybrid-context-sensitive analysis (S2objH) and of 2-hybrid-context sensitivity combined with the learned heuristic (Graphick) used in Section 5.1.1. The table shows that Graphick is also cost-effective compared to S2objH. For example, on the test program bloat, Graphick produces only 22 more alarms while reducing about 90% of the analysis cost.
Table 7. Performance comparison among heuristics learned from various combinations of training sets (i.e. {luindex}, {luindex, lusearch}, {luindex, lusearch, antlr}, and {luindex, lusearch, antlr, pmd}) and an ideal heuristic (ideal) against the validation program findbugs. #proven casts presents the number of casts proved to be safe; a more precise analysis produces a larger #proven casts. The row score presents the quality of the heuristics, computed as #proven casts / analysis time (s).
Our training programs provide sufficient training data to learn cost-effective heuristics in this sense.
More precisely, the smallest program (luindex) has 4,752 allocation-sites, and the remaining three training programs (lusearch, antlr, pmdm) provide 14,068 unique allocation-sites in total; we have a
total of 18,820 allocation-sites for training data.
In practice, we recommend choosing programs with less than 400 classes as training programs, for which we found Graphick typically works well. Although limited, our experience
shows that a collection of such programs can provide useful training data.
variable points-to set is one of the most general clients that affect the others. The clients we used
in our evaluation (i.e. may-fail-casts, poly-call-sites, and call-graph-edges) are computed based on
the context-insensitive variable points-to set and therefore minimizing it would likely minimize
other clients too.
6 RELATED WORK
In this section, we discuss the prior works related to ours.
Heuristics for Static Analysis. Designing heuristics for precise and scalable static analysis has
been an active research area. For example, Smaragdakis et al. [2014] proposed a context-sensitivity
heuristic that runs a pre-analysis (e.g., a context-insensitive analysis) to identify method calls that would be detrimental to scalability if context sensitivity were applied; it analyzes those methods context-insensitively to obtain tractable scalability while sacrificing a bit of precision. Oh et al. [2014] presented the idea of
impact pre-analysis, which first estimates the impact of applying context sensitivity with a fully
context-sensitive yet coarse pre-analysis and then performs selective context sensitivity during the
main analysis. Hassanshahi et al. [2017] aimed to find a parameter which determines context depths
for each heap. They performed context-insensitive analysis as a pre-analysis, and from the analysis
results, determined the heap context depths for each object to achieve reasonable scalability without
losing too much precision. Kastrinis and Smaragdakis [2013] introduced a hybrid context-sensitivity
heuristic that applies object-sensitivity for the virtual calls while applying call-site-sensitivity to
the static calls. Xu and Rountev [2008] proposed a technique to identify equivalence classes of contexts; it merges the contexts in the same class in order to improve scalability without any precision loss.
Recently, graph-based heuristics have emerged as a trending technique for designing cost-effective analysis policies [Li et al. 2018a,b; Lu and Xue 2019; Tan et al. 2016, 2017]; our work lies in this line of research and aims to generate such graph-based heuristics automatically.
Data-driven Static Analysis. Our work also belongs to the family of techniques known as data-
driven static analysis [Cha et al. 2016, 2018; He et al. 2020; Heo et al. 2017; Jeong et al. 2017; Oh et al.
2015]. Data-driven static analysis leverages machine learning to produce favorable program analysis
heuristics automatically. Oh et al. [2015] proposed a data-driven technique based on Bayesian
optimization to learn flow- and context-sensitivity heuristics. They designed features for variables
and functions in C programs to learn flow- and context-sensitivity heuristics which are presented
as linear combinations of the features. Later, the linear-model approach was extended to capture disjunctive program properties [Jeon et al. 2019; Jeong et al. 2017]. Jeon et al. [2018] introduced an
approach, called data-driven context tunneling, which constructs contexts with the most important
k context elements instead of using the most recent k context elements as the conventional k
context abstraction does. To learn context tunneling heuristics, they designed features for methods
of Java programs to present which method calls require context tunneling for better performance
in both precision and scalability. Heo et al. [2016] proposed a supervised learning algorithm to
learn variable clustering strategy in the Octagon domain where the learned heuristics determine
whether to keep relation between variables during analysis. He et al. [2020] introduced a data-driven
approach Lait that learns neural policies for removing substantially redundant constraints that
need not be computed in numeric program analysis. Singh et al. [2018] leveraged reinforcement
learning to speed up numeric analysis with the Polyhedra domain. The prior works above require
manually designed features to learn suitable heuristics. By contrast, our technique proposes to use
a feature language to reduce the burden of manual effort on designing features.
Closely related to our work, Chae et al. [2017] also automatically generated features for data-
driven static analysis. Given programs, it runs a program reducer to convert the programs into
small feature programs which only maintain the query-related program components, and generates
features for data-driven static analysis from data-flow graphs obtained from the feature programs.
Besides being specialized for C programs, the technique is hardly applicable to learning context-sensitivity heuristics because reducing programs spanning multiple procedures into reasonably small feature programs while maintaining the query-related components is challenging [Chae et al. 2017]. In this paper, we present a new technique based on a feature language, which does not have such a limitation and is effectively applicable to context-sensitive analysis for Java.
Finding Optimal Abstractions. Various techniques have been proposed to efficiently find optimal abstractions that precisely analyze only the query-related program parts [Liang et al. 2011; Zhang et al. 2014, 2013]. Our work differs from them in that we aim for good-enough abstractions
with much smaller overheads. Zhang et al. [2013] suggested a counterexample-guided abstraction
refinement technique that iteratively refines an abstraction toward a desirable one in dataflow
analysis. This approach was improved further by Zhang et al. [2014], which can find desirable
abstractions in parametric program analysis written in Datalog. Liang et al. [2011] proposed an
efficient algorithm to find minimal abstractions that precisely analyze the components related to
queries only. In our work, we improved the algorithm by Liang et al. [2011] in terms of the size of
search space (Section 4.4).
7 CONCLUSION
In this paper, we presented a technique, Graphick, that automatically learns graph-based analysis
heuristics. Recently, designing heuristics over graph representations of programs has emerged as a promising approach in pointer analysis. Graphick aims to automate this design process
using a feature language and a learning algorithm. To demonstrate the performance of our approach,
we implemented it on top of the Doop pointer analysis framework for Java. The experimental
results show that Graphick successfully produces high-quality analysis heuristics that are as
competitive as the existing state-of-the-art heuristics designed manually by analysis experts. We
hope our work facilitates the recent development of graph-based heuristics for pointer analysis.
ACKNOWLEDGMENTS
We thank Donghoon Jeon for helpful comments on Algorithm 2. This work was supported by
Samsung Research Funding & Incubation Center of Samsung Electronics under Project Number
SRFC-IT1701-51. This work was partly supported by Institute of Information & communications
Technology Planning & Evaluation (IITP) grant funded by the Korea government(MSIT) (No.2020-
0-01337, (SW STAR LAB) Research on Highly-Practical Automated Software Repair and No.2017-0-
00184, Self-Learning Cyber Immune Technology Development).
REFERENCES
Steven Arzt, Siegfried Rasthofer, Christian Fritz, Eric Bodden, Alexandre Bartel, Jacques Klein, Yves Le Traon, Damien
Octeau, and Patrick McDaniel. 2014. FlowDroid: Precise Context, Flow, Field, Object-sensitive and Lifecycle-aware
Taint Analysis for Android Apps. In Proceedings of the 35th ACM SIGPLAN Conference on Programming Language
Design and Implementation (Edinburgh, United Kingdom) (PLDI ’14). ACM, New York, NY, USA, 259–269. https://doi.org/10.1145/2594291.2594299
Dzintars Avots, Michael Dalton, V. Benjamin Livshits, and Monica S. Lam. 2005. Improving Software Security with a C
Pointer Analysis. In Proceedings of the 27th International Conference on Software Engineering (St. Louis, MO, USA) (ICSE
’05). ACM, New York, NY, USA, 332–341. https://doi.org/10.1145/1062455.1062520
Stephen M. Blackburn, Robin Garner, Chris Hoffmann, Asjad M. Khang, Kathryn S. McKinley, Rotem Bentzur, Amer Diwan,
Daniel Feinberg, Daniel Frampton, Samuel Z. Guyer, Martin Hirzel, Antony Hosking, Maria Jump, Han Lee, J. Eliot B.
Moss, Aashish Phansalkar, Darko Stefanović, Thomas VanDrunen, Daniel von Dincklage, and Ben Wiedermann. 2006. The
DaCapo Benchmarks: Java Benchmarking Development and Analysis. In Proceedings of the 21st Annual ACM SIGPLAN
Conference on Object-oriented Programming Systems, Languages, and Applications (Portland, Oregon, USA) (OOPSLA ’06).
ACM, New York, NY, USA, 169–190. https://doi.org/10.1145/1167473.1167488
Sam Blackshear, Bor-Yuh Evan Chang, and Manu Sridharan. 2015. Selective Control-flow Abstraction via Jumping. In
Proceedings of the 2015 ACM SIGPLAN International Conference on Object-Oriented Programming, Systems, Languages, and
Applications (Pittsburgh, PA, USA) (OOPSLA 2015). ACM, New York, NY, USA, 163–182. https://doi.org/10.1145/2814270.
2814293
Martin Bravenboer and Yannis Smaragdakis. 2009. Strictly Declarative Specification of Sophisticated Points-to Analyses.
SIGPLAN Not. 44, 10 (Oct. 2009), 243–262. https://doi.org/10.1145/1639949.1640108
Sooyoung Cha, Sehun Jeong, and Hakjoo Oh. 2016. Learning a Strategy for Choosing Widening Thresholds from a Large
Codebase. Springer International Publishing, Cham, 25–41. https://doi.org/10.1007/978-3-319-47958-3_2
Sooyoung Cha, Sehun Jeong, and Hakjoo Oh. 2018. A scalable learning algorithm for data-driven program analysis.
Information and Software Technology 104 (2018), 1–13. https://doi.org/10.1016/j.infsof.2018.07.002
Kwonsoo Chae, Hakjoo Oh, Kihong Heo, and Hongseok Yang. 2017. Automatically Generating Features for Learning
Program Analysis Heuristics for C-like Languages. Proc. ACM Program. Lang. 1, OOPSLA, Article 101 (Oct. 2017), 25 pages.
https://doi.org/10.1145/3133925
Stephen J. Fink, Eran Yahav, Nurit Dor, G. Ramalingam, and Emmanuel Geay. 2008. Effective Typestate Verification in the
Presence of Aliasing. ACM Trans. Softw. Eng. Methodol. 17, 2, Article 9 (May 2008), 34 pages. https://doi.org/10.1145/
1348250.1348255
Qing Gao, Yingfei Xiong, Yaqing Mi, Lu Zhang, Weikun Yang, Zhaoping Zhou, Bing Xie, and Hong Mei. 2015. Safe
Memory-leak Fixing for C Programs. In Proceedings of the 37th International Conference on Software Engineering - Volume 1
(Florence, Italy) (ICSE ’15). IEEE Press, Piscataway, NJ, USA, 459–470. http://dl.acm.org/citation.cfm?id=2818754.2818812
Neville Grech and Yannis Smaragdakis. 2017. P/Taint: Unified Points-to and Taint Analysis. Proc. ACM Program. Lang. 1,
OOPSLA, Article 102 (Oct. 2017), 28 pages. https://doi.org/10.1145/3133926
Behnaz Hassanshahi, Raghavendra Kagalavadi Ramesh, Padmanabhan Krishnan, Bernhard Scholz, and Yi Lu. 2017. An
Efficient Tunable Selective Points-to Analysis for Large Codebases. In Proceedings of the 6th ACM SIGPLAN International
Workshop on State Of the Art in Program Analysis (Barcelona, Spain) (SOAP 2017). ACM, New York, NY, USA, 13–18.
https://doi.org/10.1145/3088515.3088519
Jingxuan He, Gagandeep Singh, Markus Püschel, and Martin Vechev. 2020. Learning Fast and Precise Numerical Analysis. In
Proceedings of the 41st ACM SIGPLAN Conference on Programming Language Design and Implementation (London, UK) (PLDI
2020). Association for Computing Machinery, New York, NY, USA, 1112–1127. https://doi.org/10.1145/3385412.3386016
Kihong Heo, Hakjoo Oh, and Hongseok Yang. 2016. Learning a Variable-Clustering Strategy for Octagon from Labeled Data
Generated by a Static Analysis. Springer Berlin Heidelberg, Berlin, Heidelberg, 237–256. https://doi.org/10.1007/978-3-
662-53413-7_12
Kihong Heo, Hakjoo Oh, and Kwangkeun Yi. 2017. Machine-Learning-Guided Selectively Unsound Static Analysis. In
Proceedings of the 39th International Conference on Software Engineering (Buenos Aires, Argentina) (ICSE ’17). IEEE Press,
519–529. https://doi.org/10.1109/ICSE.2017.54
Seongjoon Hong, Junhee Lee, Jeongsoo Lee, and Hakjoo Oh. 2020. SAVER: Scalable, Precise, and Safe Memory-Error Repair.
In Proceedings of the ACM/IEEE 42nd International Conference on Software Engineering (Seoul, South Korea) (ICSE ’20).
Association for Computing Machinery, New York, NY, USA, 271–283. https://doi.org/10.1145/3377811.3380323
Minseok Jeon, Sehun Jeong, Sungdeok Cha, and Hakjoo Oh. 2019. A Machine-Learning Algorithm with Disjunctive
Model for Data-Driven Program Analysis. ACM Trans. Program. Lang. Syst. 41, 2, Article 13 (June 2019), 41 pages.
https://doi.org/10.1145/3293607
Minseok Jeon, Sehun Jeong, and Hakjoo Oh. 2018. Precise and Scalable Points-to Analysis via Data-driven Context Tunneling.
Proc. ACM Program. Lang. 2, OOPSLA, Article 140 (Oct. 2018), 29 pages. https://doi.org/10.1145/3276510
Sehun Jeong, Minseok Jeon, Sungdeok Cha, and Hakjoo Oh. 2017. Data-driven Context-sensitivity for Points-to Analysis.
Proc. ACM Program. Lang. 1, OOPSLA, Article 100 (Oct. 2017), 28 pages. https://doi.org/10.1145/3133924
Vini Kanvar and Uday P. Khedker. 2016. Heap Abstractions for Static Analysis. ACM Comput. Surv. 49, 2 (2016), 29:1–29:47.
https://doi.org/10.1145/2931098
Timotej Kapus and Cristian Cadar. 2019. A Segmented Memory Model for Symbolic Execution. In Proceedings of the 2019
27th ACM Joint Meeting on European Software Engineering Conference and Symposium on the Foundations of Software
Engineering (Tallinn, Estonia) (ESEC/FSE 2019). Association for Computing Machinery, New York, NY, USA, 774–784.
https://doi.org/10.1145/3338906.3338936
George Kastrinis and Yannis Smaragdakis. 2013. Hybrid Context-sensitivity for Points-to Analysis. In Proceedings of the
34th ACM SIGPLAN Conference on Programming Language Design and Implementation (Seattle, Washington, USA) (PLDI
’13). ACM, New York, NY, USA, 423–434. https://doi.org/10.1145/2491956.2462191
Junhee Lee, Seongjoon Hong, and Hakjoo Oh. 2018. MemFix: Static Analysis-Based Repair of Memory Deallocation Errors
for C. In Proceedings of the 2018 26th ACM Joint Meeting on European Software Engineering Conference and Symposium
on the Foundations of Software Engineering (Lake Buena Vista, FL, USA) (ESEC/FSE 2018). Association for Computing
Machinery, New York, NY, USA, 95–106. https://doi.org/10.1145/3236024.3236079
Yue Li, Tian Tan, Anders Møller, and Yannis Smaragdakis. 2018a. Precision-guided Context Sensitivity for Pointer Analysis.
Proc. ACM Program. Lang. 2, OOPSLA, Article 141 (Oct. 2018), 29 pages. https://doi.org/10.1145/3276511
Yue Li, Tian Tan, Anders Møller, and Yannis Smaragdakis. 2018b. Scalability-first Pointer Analysis with Self-tuning Context-
sensitivity. In Proceedings of the 2018 26th ACM Joint Meeting on European Software Engineering Conference and Symposium
on the Foundations of Software Engineering (Lake Buena Vista, FL, USA) (ESEC/FSE 2018). ACM, New York, NY, USA,
129–140. https://doi.org/10.1145/3236024.3236041
Percy Liang and Mayur Naik. 2011. Scaling Abstraction Refinement via Pruning. In Proceedings of the 32nd ACM SIGPLAN
Conference on Programming Language Design and Implementation (San Jose, California, USA) (PLDI ’11). ACM, New York,
NY, USA, 590–601. https://doi.org/10.1145/1993498.1993567
Percy Liang, Omer Tripp, and Mayur Naik. 2011. Learning Minimal Abstractions. In Proceedings of the 38th Annual ACM
SIGPLAN-SIGACT Symposium on Principles of Programming Languages (Austin, Texas, USA) (POPL ’11). ACM, New York,
NY, USA, 31–42. https://doi.org/10.1145/1926385.1926391
V. Benjamin Livshits and Monica S. Lam. 2003. Tracking Pointers with Path and Context Sensitivity for Bug Detection in
C Programs. In Proceedings of the 9th European Software Engineering Conference Held Jointly with 11th ACM SIGSOFT
International Symposium on Foundations of Software Engineering (Helsinki, Finland) (ESEC/FSE-11). ACM, New York, NY,
USA, 317–326. https://doi.org/10.1145/940071.940114
Jingbo Lu and Jingling Xue. 2019. Precision-Preserving yet Fast Object-Sensitive Pointer Analysis with Partial Context
Sensitivity. Proc. ACM Program. Lang. 3, OOPSLA, Article 148 (Oct. 2019), 29 pages. https://doi.org/10.1145/3360574
Ana Milanova, Atanas Rountev, and Barbara G. Ryder. 2002. Parameterized Object Sensitivity for Points-to and Side-effect
Analyses for Java. In Proceedings of the 2002 ACM SIGSOFT International Symposium on Software Testing and Analysis
(Roma, Italy) (ISSTA ’02). ACM, New York, NY, USA, 1–11. https://doi.org/10.1145/566172.566174
Ana Milanova, Atanas Rountev, and Barbara G. Ryder. 2005. Parameterized Object Sensitivity for Points-to Analysis for
Java. ACM Trans. Softw. Eng. Methodol. 14, 1 (Jan. 2005), 1–41. https://doi.org/10.1145/1044834.1044835
Mayur Naik, Alex Aiken, and John Whaley. 2006. Effective Static Race Detection for Java. In Proceedings of the 27th ACM
SIGPLAN Conference on Programming Language Design and Implementation (Ottawa, Ontario, Canada) (PLDI ’06). ACM,
New York, NY, USA, 308–319. https://doi.org/10.1145/1133981.1134018
Mayur Naik, Chang-Seo Park, Koushik Sen, and David Gay. 2009. Effective Static Deadlock Detection. In Proceedings of the
31st International Conference on Software Engineering (ICSE ’09). IEEE Computer Society, Washington, DC, USA, 386–396.
https://doi.org/10.1109/ICSE.2009.5070538
Hakjoo Oh, Wonchan Lee, Kihong Heo, Hongseok Yang, and Kwangkeun Yi. 2014. Selective Context-sensitivity Guided
by Impact Pre-analysis. In Proceedings of the 35th ACM SIGPLAN Conference on Programming Language Design and
Implementation (Edinburgh, United Kingdom) (PLDI ’14). ACM, New York, NY, USA, 475–484. https://doi.org/10.1145/
2594291.2594318
Hakjoo Oh, Hongseok Yang, and Kwangkeun Yi. 2015. Learning a Strategy for Adapting a Program Analysis via Bayesian
Optimisation. In Proceedings of the 2015 ACM SIGPLAN International Conference on Object-Oriented Programming,
Systems, Languages, and Applications (Pittsburgh, PA, USA) (OOPSLA 2015). ACM, New York, NY, USA, 572–588. https:
//doi.org/10.1145/2814270.2814309
Gagandeep Singh, Markus Püschel, and Martin Vechev. 2018. Fast Numerical Program Analysis with Reinforcement Learning.
In Computer Aided Verification, Hana Chockler and Georg Weissenbacher (Eds.). Springer International Publishing, Cham,
211–229.
Yannis Smaragdakis and George Balatsouras. 2015. Pointer Analysis. Foundations and Trends in Programming Languages 2,
1 (2015), 1–69. https://doi.org/10.1561/2500000014
Yannis Smaragdakis, Martin Bravenboer, and Ondrej Lhoták. 2011. Pick Your Contexts Well: Understanding Object-sensitivity.
In Proceedings of the 38th Annual ACM SIGPLAN-SIGACT Symposium on Principles of Programming Languages (Austin,
Texas, USA) (POPL ’11). ACM, New York, NY, USA, 17–30. https://doi.org/10.1145/1926385.1926390
Yannis Smaragdakis, George Kastrinis, and George Balatsouras. 2014. Introspective Analysis: Context-sensitivity, Across
the Board. In Proceedings of the 35th ACM SIGPLAN Conference on Programming Language Design and Implementation
(Edinburgh, United Kingdom) (PLDI ’14). ACM, New York, NY, USA, 485–495. https://doi.org/10.1145/2594291.2594320
SPEC SPECjvm98. 1999. Release 1.03. Standard Performance Evaluation Corporation (1999).
Yulei Sui, Ding Ye, and Jingling Xue. 2014. Detecting Memory Leaks Statically with Full-Sparse Value-Flow Analysis. IEEE Transactions
on Software Engineering 40, 2 (Feb. 2014), 107–122. https://doi.org/10.1109/TSE.2014.2302311
Tian Tan, Yue Li, and Jingling Xue. 2016. Making k-Object-Sensitive Pointer Analysis More Precise with Still k-Limiting. In
Static Analysis, Xavier Rival (Ed.). Springer Berlin Heidelberg, Berlin, Heidelberg, 489–510.
Tian Tan, Yue Li, and Jingling Xue. 2017. Efficient and Precise Points-to Analysis: Modeling the Heap by Merging Equivalent
Automata. In Proceedings of the 38th ACM SIGPLAN Conference on Programming Language Design and Implementation
(Barcelona, Spain) (PLDI 2017). ACM, New York, NY, USA, 278–291. https://doi.org/10.1145/3062341.3062360
Omer Tripp, Marco Pistoia, Stephen J. Fink, Manu Sridharan, and Omri Weisman. 2009. TAJ: Effective Taint Analysis of Web
Applications. In Proceedings of the 30th ACM SIGPLAN Conference on Programming Language Design and Implementation
(Dublin, Ireland) (PLDI ’09). ACM, New York, NY, USA, 87–97. https://doi.org/10.1145/1542476.1542486
Guoqing Xu and Atanas Rountev. 2008. Merging Equivalent Contexts for Scalable Heap-cloning-based Context-sensitive
Points-to Analysis. In Proceedings of the 2008 International Symposium on Software Testing and Analysis (Seattle, WA,
USA) (ISSTA ’08). ACM, New York, NY, USA, 225–236. https://doi.org/10.1145/1390630.1390658
Xuezheng Xu, Yulei Sui, Hua Yan, and Jingling Xue. 2019. VFix: Value-Flow-Guided Precise Program Repair for Null Pointer
Dereferences. In Proceedings of the 41st International Conference on Software Engineering (Montreal, Quebec, Canada)
(ICSE ’19). IEEE Press, 512–523. https://doi.org/10.1109/ICSE.2019.00063
Hua Yan, Yulei Sui, Shiping Chen, and Jingling Xue. 2017. Machine-Learning-Guided Typestate Analysis for Static Use-
After-Free Detection. In Proceedings of the 33rd Annual Computer Security Applications Conference (Orlando, FL, USA)
(ACSAC 2017). ACM, New York, NY, USA, 42–54. https://doi.org/10.1145/3134600.3134620
Xin Zhang, Ravi Mangal, Radu Grigore, Mayur Naik, and Hongseok Yang. 2014. On Abstraction Refinement for Program Anal-
yses in Datalog. In Proceedings of the 35th ACM SIGPLAN Conference on Programming Language Design and Implementation
(Edinburgh, United Kingdom) (PLDI ’14). ACM, New York, NY, USA, 239–248. https://doi.org/10.1145/2594291.2594327
Xin Zhang, Mayur Naik, and Hongseok Yang. 2013. Finding Optimum Abstractions in Parametric Dataflow Analysis.
In Proceedings of the 34th ACM SIGPLAN Conference on Programming Language Design and Implementation (Seattle,
Washington, USA) (PLDI ’13). Association for Computing Machinery, New York, NY, USA, 365–376. https://doi.org/10.
1145/2491956.2462185