
Sparse Flow-Sensitive Pointer Analysis for Multithreaded Programs

Yulei Sui, Peng Di, and Jingling Xue

UNSW Australia

Abstract

For C programs, flow-sensitivity is important to enable pointer analysis to achieve highly usable precision. Despite significant recent advances in scaling flow-sensitive pointer analysis sparsely for sequential C programs, relatively little progress has been made for multithreaded C programs.

In this paper, we present FSAM, a new Flow-Sensitive pointer Analysis that achieves its scalability for large Multithreaded C programs by performing sparse analysis on top of a series of thread interference analysis phases. We evaluate FSAM with 10 multithreaded C programs (with more than 100K lines of code for the largest) from Phoenix-2.0, Parsec-3.0 and open-source applications. For two programs, raytrace and x264, the traditional data-flow-based flow-sensitive pointer analysis is unscalable (under two hours) but our analysis spends just under 5 minutes on raytrace and 9 minutes on x264. For the rest, our analysis is 12x faster and uses 28x less memory.

Categories and Subject Descriptors F.3.2 [Semantics of Programming Languages]: Program Analysis

General Terms Algorithms, Languages, Performance

Keywords Pointer Analysis, Sparse Analysis, Flow-Sensitivity

CGO'16, March 12–18, 2016, Barcelona, Spain. © 2016 ACM. ISBN 978-1-4503-3778-6/16/03. http://dx.doi.org/10.1145/2854038.2854043

1. Introduction

C, together with its OO incarnation C++, is the de facto standard for implementing system software (e.g., operating systems and language runtimes), server and client applications. A substantial number of these applications are multithreaded in order to better utilize multicore computing resources. However, multithreading poses a major challenge for pointer analysis, since shared memory locations can be accessed non-deterministically by concurrent threads.

Pointer analysis is a fundamental static analysis, on which many other analyses and optimizations are built. The more precisely a pointer is resolved, the more effective these client analyses will likely be. By improving its precision and scalability for multithreaded C programs, we can directly improve the effectiveness of many clients, including data race detection [24], deadlock detection [30], compiler optimization reuse [14], control-flow integrity enforcement [8], memory safety verification [20], and memory leak detection [28].

For such client applications operating on C programs, pointer analysis needs to be flow-sensitive (by respecting control flow) in order to achieve highly usable precision. There have been significant recent advances in applying sparse analysis to scale flow-sensitive pointer analysis for sequential C programs [10, 11, 21, 29, 32, 33]. However, applying these techniques directly to multithreaded C programs using Pthreads will lead to unsound (respectively, imprecise) results if thread interference on shared memory locations is ignored (respectively, grossly over-approximated). In the case of pointer analysis for OO languages like Java, context-sensitivity instead of flow-sensitivity is generally regarded as essential in improving precision [19, 27, 31]. So far, relatively little progress has been made in improving the scalability of flow-sensitive pointer analysis for multithreaded C programs. Below we describe some challenges and insights for tackling this problem and introduce a sparse approach for solving it efficiently.

1.1 Challenges and Insights

One challenge lies in dealing with an unbounded number of thread interleavings. Two threads interfere with each other when one writes into a memory location that may be accessed by the other. In Figure 1(a), c = *p can load the points-to values from x that are stored into x by *p = r in the same (main) thread or *p = q in a parallel thread t. As a result, the points-to set of c is pt(c) = {y, z}.

In addition, computing sound (i.e., over-approximate) points-to sets flow-sensitively relies on a so-called may-happen-in-parallel (MHP) analysis to discover parallel code regions. Unlike structured languages such as Cilk [25] and X10 [1], which provide high-level concurrency constructs

[Figure 1: five example programs, each beginning with p = &x; q = &y; r = &z (plus x = &a in (d) and u = &v in (e)). In each, main() forks a thread running foo() (and, in (b), foo() forks a further thread running bar()); the threads store into *p (and *x in (d)) while main() executes *p = r and c = *p, with join and lock/unlock operations inserted in (b)-(e). The resulting points-to sets are: (a) Interleaving: pt(c) = {y, z}; (b) Soundness: pt(c) = {y, z}; (c) Precision: pt(c) = {y}; (d) Data-flow: pt(c) = {y}; (e) Sparsity: pt(c) = {y, z}.]

Figure 1: Examples for illustrating some challenges faced by flow-sensitive pointer analysis for multithreaded C programs (with irrelevant code elided). For brevity, fork() and join() represent pthread_create() and pthread_join() in the Pthreads API.

[Figure 2 pipeline: a pre-analysis first computes fork/join sites, memory accesses, lock/unlock sites, and thread-oblivious def-use edges; an interleaving analysis then produces MHP pairs, a value-flow analysis produces aliased pairs, and a lock analysis resolves interference among them; the resulting thread-aware def-use edges feed a sparse flow-sensitive analysis that computes flow-sensitive points-to information.]
Figure 2: FSAM: a sparse flow-sensitive pointer analysis framework for multithreaded C programs.

with restricted parallelism patterns, unstructured and low-level constructs in the Pthreads API allow programmers to express richer parallelism patterns. However, such flexible non-lexically-scoped parallelism significantly complicates MHP analysis. For example, a thread may outlive its spawning thread or can be joined partially along some program paths or indirectly in one of its child threads. In Figure 1(b), thread t2 executes independently of its spawning thread t1 and will stay alive even after t1 has been joined by the main thread. Thus, *p = r executed in the main thread may interleave with the two statements *p = q and c = *p in bar() executed by t2. A sound points-to set for c is pt(c) = {y, z}.

How to maintain precision can also be challenging. Synchronization statements (e.g., fork/join and lock/unlock) must be tracked carefully to reduce spurious interleavings among non-parallel statements. In Figure 1(c), *p = r, *p = q, and c = *p are always executed serially in that order. By performing a strong update at *p = q with respect to thread ordering, we can discover that c points to y stored in x by *p = q (not z stored in x at *p = r, since x has been strongly updated with &y, killing &z). Thus, pt(c) = {y}.

How do we scale flow-sensitive pointer analysis for large multithreaded C programs? One option is to adopt a data-flow analysis that iteratively propagates the points-to facts generated at a statement s to every other statement s' that is either reachable along the control flow or may happen in parallel with s, without knowing whether the facts are needed at s' or not. This traditional approach computes and maintains a separate points-to graph at each program point in order to accommodate the side-effects of all parallel threads. Blindly propagating the points-to information this way under all thread interleavings is inefficient in both time and space. In Figure 1(d), c = *p in the main thread can interleave with *p = q and *x = r in thread t. However, propagating the points-to information generated at *x = r to c = *p is not necessary, since *p and *x are not aliases. So pt(c) = {y}.

Finally, how do we improve scalability by propagating points-to facts along only a set of pre-computed def-use chains sparsely? It turns out that this pre-computation is much more challenging in the multithreaded setting than in the sequential setting [10]. Imprecise handling of synchronization statements (e.g., fork/join and lock/unlock) may lead to spurious def-use chains, reducing both the scalability and precision of the subsequent sparse analysis. In Figure 1(e), pt(c) = {y, z} if l1 and l2 are must aliases pointing to the same lock. However, if a pre-computed def-use edge is added from *u = v to c = *p, then following this spurious edge makes the analysis not only less efficient but also less precise by concluding that pt(c) = {y, z, v} is possible.

1.2 Our Solution

In this paper, we present FSAM, a new Flow-Sensitive pointer Analysis for handling large Multithreaded C programs (using Pthreads). We address the aforementioned challenges by performing sparse analysis along the def-use chains precomputed by a pre-analysis and a series of thread interference analysis phases, as illustrated in Figure 2. To bootstrap the sparse analysis, a pre-analysis (by applying Andersen's pointer analysis algorithm [2]) is first

performed flow- and context-insensitively to discover over-approximately the points-to information in the program. Based on the pre-analysis, some thread-oblivious def-use edges are identified. Then thread interleavings are analyzed to discover all the missing thread-sensitive def-use edges. Our interleaving analysis reasons about fork and join operations flow- and context-sensitively to discover may-happen-in-parallel (MHP) statement pairs. Our value-flow analysis adds the thread-aware def-use edges for MHP statement pairs with common value flows to produce so-called aliased pairs. Our lock analysis analyzes lock/unlock operations flow- and context-sensitively to identify the interfering aliased pairs based on the happens-before relations established among their corresponding mutex regions.

Finally, a sparse flow-sensitive pointer analysis algorithm is applied by propagating the points-to facts sparsely along the pre-computed def-use chains, rather than along all program points with respect to the program's control flow.

This paper makes the following contributions:

• We present the first sparse flow-sensitive pointer analysis for unstructured multithreaded C programs.

• We describe several techniques (including thread interference analyses) for pre-computing def-use information so that it is sufficiently accurate in bootstrapping sparse flow-sensitive analysis for multithreaded C programs.

• We show that FSAM (implemented in LLVM 3.5.0) is superior to the traditional data-flow analysis, denoted NONSPARSE, in terms of scalability on 10 multithreaded C programs from Phoenix-2.0, Parsec-3.0 and open-source applications. For two programs, raytrace and x264, NONSPARSE is unscalable (under two hours) but FSAM spends just under 5 minutes on raytrace and 9 minutes on x264. For the remaining programs, FSAM is 12x faster and uses 28x less memory.

2. Background

We introduce the partial SSA form used for representing a C program and sparse pointer analysis in the sequential setting.

2.1 Partial SSA Form

A program is represented by putting it into LLVM's partial SSA form, following [10, 17, 18, 32]. The set of all program variables V is separated into two subsets: A, containing all possible targets, i.e., address-taken variables of a pointer, and T, containing all top-level variables, where V = T ∪ A.

After the SSA conversion, a program is represented by five types of statements: p = &a (ADDROF), p = q (COPY), p = *q (LOAD), *p = q (STORE), and p = φ(q, r) (PHI), where p, q, r ∈ T and a ∈ A. Top-level variables are put directly in SSA form, while address-taken variables are only accessed indirectly via LOAD or STORE. For an ADDROF statement p = &a, known as an allocation site, a is a stack or global variable with its address taken or a dynamically created abstract heap object (at, e.g., a malloc() site).

(a) C code:       p = &a;  a = &b;  q = &c;  *p = *q;
(b) Partial SSA:  p = &a;  t1 = &b;  *p = t1;  q = &c;  t2 = *q;  *p = t2;

Figure 3: A C code fragment and its partial SSA form.

Figure 3 shows a code fragment and its corresponding partial SSA form, where p, q, t1, t2 ∈ T and a, b, c ∈ A. Note that a is indirectly accessed at the store *p = t1, which introduces a top-level pointer t1 in the partial SSA form. Complex statements like *p = *q are decomposed into the basic ones by introducing a top-level pointer t2.

2.2 Sparse Flow-Sensitive Pointer Analysis for Sequential C Programs

The traditional data-flow-based flow-sensitive pointer analysis computes and maintains points-to information at every program point with respect to the program's control flow. This is costly, as it propagates points-to information blindly from each node in the CFG of the program to its successors without knowing if the information will be used there or not.

To address the scalability issue in analyzing large sequential C programs, sparse analysis [10] is proposed by staging the pointer analysis: the def-use chains in a program are first approximated by applying a fast but imprecise pre-analysis (e.g., Andersen's analysis), and the precise flow-sensitive analysis is conducted next by propagating points-to facts only along the pre-computed def-use chains sparsely.

The core representation of sparse analysis is a def-use graph, where a node represents a statement and an edge between two nodes, e.g., s1 →v s2, represents a def-use relation for a variable v ∈ V, with its def at statement s1 and its use at statement s2. This representation is sparse since the intermediate program points between s1 and s2 are omitted. In partial SSA form, the uses of any top-level pointer have a unique definition (with φ functions inserted at confluence points, as is standard). A def-use s1 →t s2, where t ∈ T, can thus be found easily without requiring pointer analysis.

As address-taken variables are not (yet) in SSA form, their indirect uses at loads may be defined indirectly at multiple stores. Their def-use chains are built in several steps following [10], as illustrated in Figure 4. We go through the sequence of steps needed in building the def-use chains for a ∈ A. The def-use chains for b ∈ A are built similarly.

First, indirect defs and uses (i.e., may-defs and may-uses) are exposed at loads and stores, based on the points-to information obtained during the pre-analysis (Figure 4(a)). A load, e.g., s = *r, is annotated with a function μ(a), where

[Figure 4: the pre-computed points-to sets are pt(p) = {a, b}, pt(w) = {b}, pt(x) = {a} and pt(r) = {a} for the statements s1: *p = q, s2: v = *w, s3: *x = y and s4: s = *r. Panel (a) shows the statements annotated with μ/χ functions (e.g., a = χ(a) and b = χ(b) at s1, and μ(a) at s4); panel (b) shows the annotations after SSA conversion (e.g., a1 = χ(a0) at s1 and a2 = χ(a1) at s3); panel (c) shows the resulting def-use graph, with the indirect edges s1 →a s3, s3 →a s4 and s1 →b s2.]

Figure 4: A sparse def-use graph.

[Figure 5 rules:

[T-FORK] (transitivity of spawning): from t ⇒(c,fk_i) t' and t' ⇒(c',fk_i') t'', conclude t ⇒(c,fk_i) t''.

[T-JOIN] (transitivity of joining through full joins): from t ⇐(c,jn_i) t' and t' ⇐full(c',jn_i') t'', conclude t ⇐(c,jn_i) t''.

[T-SIBLING]: from t'' ⇒(c,fk_i) t and t'' ⇒(c',fk_i') t', where i ≠ i' ∨ c ≠ c', conclude t ⋈ t'.]

Figure 5: Static modeling of fork and join operations.

a ∈ A may be pointed to by r, to represent a potential use of a at the load. Similarly, a store, e.g., *x = y, is annotated with a function a = χ(a) to represent a potential def and use of a at the store. If a can be strongly updated, then a receives whatever y points to and the old contents in a are killed. Otherwise, a must also incorporate its old contents, resulting in a weak update to a. Second, each address-taken variable, e.g., a, is converted into SSA form (Figure 4(b)), with each μ(a) treated as a use of a, and each a = χ(a) as both a def and a use of a. Finally, an indirect def-use chain of a is added from a definition of a, identified as a_n (version n) at a store, to its uses at a store or a load, resulting in two indirect def-use edges of a, i.e., s1 →a s3 and s3 →a s4 (Figure 4(c)). Any φ function introduced for an address-taken variable a during the SSA conversion is ignored, as a is not versioned.

Every callsite is also annotated with μ and χ functions to expose its indirect uses and defs. As is standard, passing arguments into and returning results from functions are modeled by copies, so the def-use chains across procedural boundaries are added similarly. For details, we refer to [10].

Once the def-use chains are in place for the program, flow-sensitive pointer analysis can be performed sparsely, i.e., by propagating points-to information only along these pre-computed def-use edges. For example, the points-to sets of a computed at s1 are propagated to s3 with s2 bypassed, resulting in significant savings in both time and memory.

3. The FSAM Approach

We first describe a static thread model used for handling fork and join operations (Section 3.1). We then introduce our FSAM framework (Figure 2), focusing on how to pre-compute def-use chains (Sections 3.2 and 3.3), and discuss thereafter how to perform the subsequent sparse analysis for multithreaded C programs (Section 3.4).

3.1 Static Thread Model

Abstract Threads A program starts its execution from its main function in the main (root) thread. An abstract thread t refers to a call of pthread_create() at a context-sensitive fork site during the analysis. Thus, a thread t always refers to a context-sensitive fork site, i.e., a unique runtime thread unless t is multi-forked, in which case t may represent more than one runtime thread.

Definition 1 (Multi-Forked Threads). A thread t ∈ M is a multi-forked thread if its fork site, say, fk_i, resides in a loop or a recursion cycle, or its spawner thread t' ∈ M.

Intra-Thread CFG For an abstract thread t, its intra-thread control flow graph, ICFG_t, is constructed as in [15], where a node s represents a program statement and an edge from s1 to s2 signifies a possible transfer of control from s1 to s2. For convenience, a call site is split into a call node and a return node. Three kinds of edges are distinguished: (1) an intra-procedural control flow edge s → s' from node s to its successor s', (2) an interprocedural call edge s →call_i s' from a call node s to the entry node s' of a callee at callsite i, and (3) an interprocedural return edge s →ret_i s' from an exit node s of a callee to the return node s' at callsite i. There are no outgoing edges for a fork or join site. Function pointers are resolved by the pre-analysis.

Modeling Thread Forks and Joins Figure 5 gives three rules for modeling fork and join operations statically. We write t ⇒(c,fk_i) t' to represent the spawning relation that a spawner thread t creates a spawnee thread t' at a context-sensitive fork site (c, fk_i), where c is a context stack represented by a sequence of callsites, [cs0, ..., csn], from the entry of the main function to the fork site fk_i. Note that the callsites inside each strongly-connected cycle in the call graph of the program are analyzed context-insensitively.

For a thread t forked at (c, fk_i), we write S_t to stand for its start procedure, where the execution of t begins. Entry(S_t) = (c', s) maps S_t to its first statement (c', s), where c' = c.push(i), context-sensitively.

Consider the three rules in Figure 5. The spawning relation t ⇒(c,fk_i) t' is transitive, representing the fact that t can create t' directly or indirectly at a fork site fk_i ([T-FORK]).

We will handle only the join operations identified by [T-JOIN] and ignore the rest in the program. The joining relation t ⇐(c,jn_i) t' indicates that a spawnee t' is joined by its spawner t at a join site (c, jn_i). As our pre-analysis is
[Figure 6(a): program P:

void main() {
    s1: *p = ...;
    fk1: fork(t1, foo);
    s2: *p = ...;
    jn1: join(t1);
    s3: ... = *p;
}
void foo() {
    s4: *q = ...;
    s5: ... = *q;
}

Panels (b)-(d) show, for threads t0 (main) and t1, (b) the def-use edges over o computed for Pseq, (c) the same graph with the fork-related edge s1 →o s2 added, and (d) the graph with the join-related edge s4 →o s3 further added.]

Figure 6: Thread-oblivious def-use edges (where p and q are found to point to o during the pre-analysis).

flow- and context-insensitive, we achieve soundness by requiring t' joined at a join site pthread_join() in the program to be excluded from M, so that t' represents a unique runtime thread (under all contexts). Note that the joining relation is not transitive in the same sense as the spawning relation. In Pthreads programs, a thread can be joined fully along all program paths or partially along some but not all paths. Given t ⇐(c,jn_i) t' and t' ⇐(c',jn_i') t'', t ⇐(c,jn_i) t'' holds when t' ⇐(c',jn_i') t'' is a full join, denoted t' ⇐full(c',jn_i') t''.

If neither t ⇒(c,fk_i) t' nor t' ⇒(c',fk_i') t holds, then t and t' are siblings, denoted t ⋈ t' ([T-SIBLING]). In this case, t and t', where t ≠ t', share a common ancestor thread t''. Furthermore, t and t' do not happen in parallel if one happens before the other (as defined below).

Definition 2 (Happens-Before (HB) Relation for Sibling Threads). Given two sibling threads t and t', t happens before t', denoted t > t', if the fork site of t' is backward reachable to a join site of t along every program path.

Presently, FSAM does not model other synchronization constructs such as barriers and signal/wait, resulting in sound, i.e., over-approximate, results.

3.2 Computing Thread-Oblivious Def-Use Chains

Given a multithreaded C program P, we transform P into a sequential version Pseq, representing one possible thread interleaving of P. We then derive the def-use chains of Pseq as the thread-oblivious def-use chains for P, based on the points-to information obtained during the pre-analysis. There are three steps, illustrated in Figure 6.

In Step 1, we transform P into Pseq by replacing every fork statement fk in P with calls to the start procedures of all the threads spawned at fk. Let S_fk be the set of these start procedures. We keep the join operations that we can handle by [T-JOIN] and ignore the rest. We then follow [10] to compute the def-use chains for Pseq, as discussed in Section 2.2. Given P in Figure 6(a), where S_fk1 = {foo}, we obtain its Pseq by replacing fork(t1, foo) with a call to foo(). The def-use chains for Pseq are given in Figure 6(b).

In Step 2, we add the missing fork-related def-use edges at a fork site fk by assuming S_fk = ∅, since every start procedure, say fun, in S_fk may be executed nondeterministically later. This means that any value flow added in Step 1 that goes through fun can also bypass it altogether. For every such def-use chain that starts at a statement s before the callsite for fun and ends at a statement s' after it, we add a def-use edge from s to s'. (Technically, a weak update is performed for every a = χ(a) function associated with the callsite for fun, so that the old contents of a prior to the call are preserved.) In our example, Figure 6(b) becomes Figure 6(c) with the fork-related def-use edge s1 →o s2 added.

In Step 3, we deal with every direct join operation handled by our static thread model ([T-JOIN]). Let join(t') be a candidate join site executed in the spawner thread t, which implies, as discussed in Section 3.1, that t' is a unique runtime thread to be joined. Let fun' be the start procedure of t'. In one possible thread interleaving, this join statement plays a similar role to an exception-catching statement for an exception thrown at the end of fun'. Given this implicit control flow, we need to make the modification side-effects of fun' visible at the join site. Let a ∈ A be an address-taken variable defined at the exit of fun'. For the first use of a reachable from the join site along every program path in ICFG_t, we add a def-use edge from that definition to the use. In our example, Figure 6(c) becomes Figure 6(d) with the join-related def-use edge s4 →o s3 added.

3.3 Computing Thread-Aware Def-Use Chains

For a program P, we must also discover the def-use chains formed by all the other thread interleavings besides Pseq. Such def-use chains are thread-aware and are computed with the three thread interference analyses incorporated in FSAM.

3.3.1 Interleaving Analysis

As shown in Figure 2, FSAM invokes this as the first of the three interference analyses to compute thread-aware def-use chains. The objective here is to reason about fork and join operations to identify all MHP statements in the program.

Our interleaving analysis operates flow- and context-sensitively on the ICFGs of all the threads (but uses points-to information from the pre-analysis). For a statement s in thread t's ICFG_t, our analysis approximates which threads may run in parallel with t when s is executed, denoted as

[Figure 7 rules:

[I-DESCENDANT]: from t ⇒(c,fk_i) t', (t, c, fk_i) → (t, c, s), and (c', s') = Entry(S_t'), conclude {t'} ⊆ I(t, c, s) and {t} ⊆ I(t', c', s').

[I-SIBLING]: from t ⋈ t', (c, s) = Entry(S_t), (c', s') = Entry(S_t'), and t ≯ t' ∧ t' ≯ t, conclude {t} ⊆ I(t', c', s') and {t'} ⊆ I(t, c, s).

[I-JOIN]: from t ⇐(c,jn_i) t', conclude I(t, c, jn_i) = I(t, c, jn_i) \ {t'}.

[I-CALL]: from (t, c, s) →call_i (t, c', s') with c' = c.push(i), conclude I(t, c, s) ⊆ I(t, c', s').

[I-INTRA]: from (t, c, s) → (t, c, s'), conclude I(t, c, s) ⊆ I(t, c, s').

[I-RET]: from (t, c, s) →ret_i (t, c', s') with i = c.peek() and c' = c.pop(), conclude I(t, c, s) ⊆ I(t, c', s').]

Figure 7: Interleaving analysis (where → denotes a control flow edge in a thread's ICFG introduced in Section 3.1).

[Figure 8(a): program:

main() {
    s1;
    fk1: fork(t1, foo1);
    s2;
    jn1: join(t1);
    fk2: fork(t2, foo2);
    s3;
    jn2: join(t2);
}
foo1() {
    fk3: fork(t3, bar);
    jn3: join(t3);
}
foo2() {
    cs4: bar();
    s4;
}
bar() {
    s5;
}

(b) Thread relations. Fork: t0 ⇒([],fk1) t1; t1 ⇒([1],fk3) t3; t0 ⇒([],fk1) t3; t0 ⇒([],fk2) t2. Join: t0 ⇐([],jn1) t1; t1 ⇐([1],jn3) t3; t0 ⇐([],jn1) t3; t0 ⇐([],jn2) t2. Sibling: t1 ⋈ t2; t3 ⋈ t2. HB: t1 > t2; t3 > t2.

(c) Thread interleavings: I(t0, [], s1) = ∅; I(t0, [], s2) = {t1, t3}; I(t0, [], s3) = {t2}; I(t2, [2], s4) = {t0}; I(t3, [1,3], s5) = {t0, t1}; I(t2, [2,4], s5) = {t0}.

(d) MHP pairs: (t0, [], s2) ∥ (t3, [1,3], s5); (t0, [], s3) ∥ (t2, [2,4], s5); (t0, [], s3) ∥ (t2, [2], s4).]

Figure 8: An illustrative example for interleaving analysis (with t0 denoting the main thread).

I(t, c, s), where c is a calling context capturing one instance of s when its enclosing method is invoked under c. For example, if I(t1, c, s) = {t2, t3}, then threads t2 and t3 may be alive when s is executed under context c in t1.

Statement s1 in thread t1 may happen in parallel with statement s2 in thread t2, denoted (t1, c1, s1) ∥ (t2, c2, s2), if the following holds (with M from Definition 1):

    t2 ∈ I(t1, c1, s1) ∧ t1 ∈ I(t2, c2, s2)   if t1 ≠ t2
    t1 ∈ M                                     otherwise

Given ⇒(c,fk_i) (spawning relation), ⇐(c,jn_i) (joining relation), ⋈ (thread sibling) and > (HB from Definition 2), our interleaving analysis is formulated as a forward data-flow problem (V, ⊓, F) (Figure 7). Here, V represents the set of all thread interleaving facts, ⊓ is the meet operator (∪), and F : V → V represents the set of transfer functions associated with each node in an ICFG.

[I-DESCENDANT] handles thread creation t ⇒(c,fk_i) t' at a fork site (c, fk_i). The statement (c, s) that appears immediately after (c, fk_i) in ICFG_t may happen in parallel with the entry statement (c', s') of the start procedure of thread t'.

Given two sibling threads t and t', the entry statements (c, s) and (c', s') of their start procedures may interleave with each other if neither t > t' nor t' > t ([I-SIBLING]).

[I-JOIN] represents the fact that a descendant thread will no longer be alive after it has been joined at a join site.

For a thread t, [I-CALL] and [I-RET] propagate data-flow facts interprocedurally by matching calls and returns context-sensitively, while [I-INTRA] propagates them intraprocedurally.

Example 1. We illustrate our interleaving analysis with the program in Figure 8. As shown in Figure 8(a), the main thread t0 creates two threads t1 and t2 at fork sites fk1 and fk2, respectively. In its start procedure foo1, t1 spawns another thread t3 and fully joins it later at jn3. Figure 8(b) shows all the thread relations. Note that t2 continues to execute after its two sibling threads t1 and t3 have terminated due to jn1, which joins t1 directly and t3 indirectly.

The results of applying the rules in Figure 7 are listed in Figure 8(c). Due to context-sensitivity, our analysis has identified precisely the three MHP relations given in Figure 8(d). As bar() is called under two contexts, s5 has two different instances, (t3, [1,3], s5) and (t2, [2,4], s5). The former

[Figure 9 code:

main() {
    ...
    cs1: bar();
    fk2: fork(t1, foo1);
    fk3: fork(t2, foo2);
}
foo1() {
    s1: *p = ...;
    lock(l1);
    s2: *p = ...;
    s3: *p = ...;
    unlock(l1);
}
foo2() {
    lock(l2);
    cs4: bar();
    unlock(l2);
}
bar() {
    s4: ... = *q;
}
// p and q point to the same object o
// l1 and l2 point to the same lock

Pre-computed aliases: AS(*p, *q) = {o}.
Lock spans: (t1, [2], s2) ∈ sp_l1; (t1, [2], s3) ∈ sp_l1; (t2, [3,4], s4) ∈ sp_l2.
MHP relations: (t1, [2], s1) ∥ (t2, [3,4], s4); (t1, [2], s2) ∥ (t2, [3,4], s4); (t1, [2], s3) ∥ (t2, [3,4], s4).
Inter-thread value-flows between t1 and t2: the edges s1 →o s4 and s3 →o s4 are kept, while s2 →o s4 (crossed out) is avoided.]

Figure 9: A lock analysis example (with irrelevant code elided), avoiding s2 →o s4 that would be added by [THREAD-VF].

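The span-based filtering that this figure previews (formalized by Definitions 4–6 in Section 3.3.3) can be sketched as follows. This is an illustrative mock-up, not FSAM's implementation: spans, statements and def-use edges are modeled with plain Python tuples and sets.

```python
# Illustrative sketch of span heads/tails: a store s in a lock-release span
# sp_l may only interfere with a statement s2 in a span sp_l2 of the same
# lock if s is a tail write of sp_l and s2 is a head access of sp_l2.
# Statements are (thread, context, stmt) triples; intra-span def-use edges
# on the object o are (stmt, stmt) pairs.

def span_head(span, edges_o):
    """HD(sp_l, o): statements with no in-span def reaching them on o."""
    defined = {dst for (_, dst) in edges_o}
    return {s for s in span if s[2] not in defined}

def span_tail(span, edges_o, stores):
    """TL(sp_l, o): stores not overwritten by a later in-span store."""
    overwritten = {src for (src, dst) in edges_o if dst in stores}
    return {s for s in span if s[2] in stores and s[2] not in overwritten}

def non_interference(s, sp_l, edges_l, s2, sp_l2, edges_l2, stores):
    """Definition 6: the pair (s, s2) needs no inter-thread def-use edge."""
    return (s not in span_tail(sp_l, edges_l, stores)
            or s2 not in span_head(sp_l2, edges_l2))

# Mirroring Figure 9: sp_l1 holds the stores s2 and s3 (with s2 -> s3
# in-span), sp_l2 holds the load s4. s3 overwrites s2 before l1 is
# released, so s2's edge to s4 is filtered out while s3's edge is kept.
sp1 = {("t1", (2,), "s2"), ("t1", (2,), "s3")}
sp2 = {("t2", (3, 4), "s4")}
edges1, stores = {("s2", "s3")}, {"s2", "s3"}
print(non_interference(("t1", (2,), "s2"), sp1, edges1,
                       ("t2", (3, 4), "s4"), sp2, set(), stores))  # True
print(non_interference(("t1", (2,), "s3"), sp1, edges1,
                       ("t2", (3, 4), "s4"), sp2, set(), stores))  # False
```

A `True` result means the pair can be skipped when adding thread-aware def-use edges, which is exactly how the crossed-out edge in the figure is avoided.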
one may happen in parallel with (t0, [], s2) and the latter one with (t0, [], s3). As our analysis is context-sensitive, (t0, [], s3) ∥ (t2, [2], s4) but (t0, [], s2) ∦ (t2, [2], s4).

3.3.2 Value-Flow Analysis

Given a pair of MHP statements, we make use of the points-to information discovered during the pre-analysis to add the potential (thread-aware) def-use edges in between. In partial SSA form, the top-level pointers in T are kept in registers and are thus thread-local. However, the address-taken variables in A can be accessed by concurrent threads via loads and stores. It is therefore only necessary to consider inter-thread value-flows for MHP store-load and store-store pairs [(t, c, s), (t′, c′, s′)], where s is a store *p = ... and s′ is a load ... = *q or a store *q = .... Hence, [THREAD-VF] comes into play, where AS(*p, *q) is the set of objects pointed to by both p and q (due to the pre-analysis).
                s: *p = ...    s′: ... = *q  or  *q = ...
                (t, c, s) ∥ (t′, c′, s′)    o ∈ AS(*p, *q)
  [THREAD-VF]   ───────────────────────────────────────────
                              s ↪o s′

Example 2. For the program in Figure 6(a), we apply [THREAD-VF] to add all the missing thread-aware def-use chains on top of Figure 6(d). According to the pre-analysis, AS(*p, *q) = {o}. As (t0, [], s2) ∥ (t1, [1], s4), s2 ↪o s4 is added. As (t0, [], s2) ∥ (t1, [1], s5), s2 ↪o s5 is added. While (t1, [1], s4) ∥ (t0, [], s2), s4 ↪o s2 has been added earlier as a thread-oblivious def-use edge (Section 3.2).
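For illustration, the edge addition performed by [THREAD-VF] can be sketched as follows; the data structures and function names are a hypothetical mock-up of the rule, not FSAM's actual implementation.

```python
# Sketch of [THREAD-VF]: for every MHP store-load or store-store pair whose
# pointers may alias, add a def-use edge (s, o, s2) for each object o that
# both pointers may point to. All data structures are illustrative.

def alias_set(pts, p, q):
    """AS(*p, *q): objects pointed to by both p and q (pre-analysis)."""
    return pts[p] & pts[q]

def thread_vf(mhp_pairs, stores, loads, pts):
    """mhp_pairs: {((t,c,s), (t2,c2,s2))}; stores/loads map stmt -> pointer."""
    edges = set()
    for (s, s2) in mhp_pairs:
        stmt, stmt2 = s[2], s2[2]
        if stmt in stores and (stmt2 in loads or stmt2 in stores):
            p = stores[stmt]
            q = loads[stmt2] if stmt2 in loads else stores[stmt2]
            for o in alias_set(pts, p, q):
                edges.add((stmt, o, stmt2))  # def-use edge stmt -o-> stmt2
    return edges

# Mirroring Example 2: s2 is a store through p; s4 and s5 read through q;
# the pre-analysis gives pt(p) = pt(q) = {o}.
pts = {"p": {"o"}, "q": {"o"}}
stores = {"s2": "p"}
loads = {"s4": "q", "s5": "q"}
mhp = {(("t0", (), "s2"), ("t1", (1,), "s4")),
       (("t0", (), "s2"), ("t1", (1,), "s5"))}
print(sorted(thread_vf(mhp, stores, loads, pts)))
# -> [('s2', 'o', 's4'), ('s2', 'o', 's5')]
```

The rule fires only when the stored-through and loaded-through pointers share at least one pre-computed points-to target, which is what keeps the added edges from exploding on unrelated MHP pairs.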
3.3.3 Lock Analysis

Statements from different mutex regions are interference-free if these regions are protected by a common lock. By capturing lock correlations, we can avoid some spurious def-use edges introduced by [THREAD-VF] in the two lock-release spans defined below. We do this by performing a flow- and context-sensitive analysis of lock/unlock operations (based on the points-to information from the pre-analysis).

Definition 3 (Lock-Release Spans). A lock-release span sp_l at a context-sensitive lock site (t, c, lock(l)) consists of the statements from (c, lock(l)) to the corresponding release site (c′, unlock(l′)) in ICFG_t, obtained with a forward reachability analysis in which calls and returns are matched context-sensitively, where l and l′ point to the same singleton (i.e., runtime) lock object, denoted l ≡ l′.

Just as in the case of MHP analysis for fork/join operations, context-sensitivity ensures that lock analysis can distinguish the different calling contexts under which a statement appears inside a lock-release span. In Figure 9, bar() is called twice, but only the instance of statement (t2, [3, 4], s4) called from cs4 is inside the lock-release span sp_l2.

Definition 4 (Span Head). For an object o ∈ A, HD(sp_l, o) represents the set of context-sensitive loads or stores that may access o at the head of the span sp_l: HD(sp_l, o) = {(t, c, s) ∈ sp_l | ∄ (t′, c′, s′) ∈ sp_l : s′ ↪o s}.

Definition 5 (Span Tail). For an object o ∈ A, TL(sp_l, o) represents the set of context-sensitive stores that may access o at the tail of the span sp_l: TL(sp_l, o) = {(t, c, s) ∈ sp_l | s is a store, ∄ (t′, c′, s′) ∈ sp_l : (s′ is a store ∧ s ↪o s′)}.

Definition 6 (Non-Interference Lock Pairs). Let (t, c, s) ∥ (t′, c′, s′) be an MHP statement pair, where s is a store, such that both statements are protected by at least one common lock, i.e., ∃ l, l′ : (t, c, s) ∈ sp_l ∧ (t′, c′, s′) ∈ sp_l′ ∧ l ≡ l′. We say that the pair is a non-interference lock pair if (t, c, s) ∉ TL(sp_l, o) ∨ (t′, c′, s′) ∉ HD(sp_l′, o) holds.

By refining [THREAD-VF] with Definition 6 taken into account, some spurious value-flows are filtered out.

Example 3. In Figure 9, the two lock-release spans sp_l1 and sp_l2 are protected by a common lock, since *l1 and *l2 are found to be must aliases. By applying [THREAD-VF] alone, all three def-use edges shown in Figure 9 would be added. By Definition 6, however, s2 inside sp_l1 cannot interleave with s4 inside sp_l2. So s2 ↪o s4 is spurious and can be ignored.

3.4 Sparse Analysis

Once all the def-use chains have been built, the sparse flow-sensitive pointer analysis algorithm developed for sequential C programs [10], given in Figure 10, can be reused in the
Table 1: Program statistics.

Benchmark      Description                          LOC
word_count     Word counter based on map-reduce     6330
kmeans         Iterative clustering of 3-D points   6008
radiosity      Graphics                             12781
automount      Manage autofs mount points           13170
ferret         Content similarity search server     15735
bodytrack      Body tracking of a person            19063
httpd_server   Http server                          52616
mt_daapd       Multi-threaded DAAP Daemon           57102
raytrace       Real-time raytracing                 84373
x264           Media processing                     113481
Total                                               380659

[P-ADDR]   s: p = &o  ⟹  {o} ⊆ pt(s, p)

[P-COPY]   s: p = q,  s1 ↪q s  ⟹  pt(s1, q) ⊆ pt(s, p)

[P-PHI]    s: p = φ(q, r),  s1 ↪q s,  s2 ↪r s  ⟹  pt(s1, q) ⊆ pt(s, p),  pt(s2, r) ⊆ pt(s, p)

[P-LOAD]   s: p = *q,  s2 ↪q s,  o ∈ pt(s2, q),  s1 ↪o s  ⟹  pt(s1, o) ⊆ pt(s, p)

[P-STORE]  s: *p = q,  s2 ↪p s,  o ∈ pt(s2, p),  s1 ↪q s  ⟹  pt(s1, q) ⊆ pt(s, o)

[P-SU/WU]  s: *p = ...,  s1 ↪o s,  o ∈ A ∖ kill(s, p)  ⟹  pt(s1, o) ⊆ pt(s, o)

           kill(s, p) = {o′}   if pt(s, p) = {o′} ∧ o′ ∈ singletons
                      = A      else if pt(s, p) = ∅
                      = ∅      otherwise

Figure 10: Sparse flow-sensitive pointer analysis.
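The strong/weak-update choice made by [P-SU/WU] and its kill set in Figure 10 can be illustrated with a small sketch: a single store's transfer function over a points-to map. This is a mock-up for exposition, not FSAM's code.

```python
# Sketch of kill(s, p) from [P-SU/WU]: at a store *p = q, the pointed-to
# object is strongly updated only when p targets exactly one singleton
# object; otherwise the old contents are kept (weak update). Illustrative
# only; `singletons` excludes heap objects, arrays and locals in recursion.

def kill(pt_p, singletons, all_objs):
    if len(pt_p) == 1 and next(iter(pt_p)) in singletons:
        return set(pt_p)        # strong update: old contents discarded
    if not pt_p:
        return set(all_objs)    # store through an empty points-to set
    return set()                # weak update: nothing killed

def apply_store(pt, p, q, singletons, all_objs):
    """Points-to map after the store s: *p = q."""
    killed = kill(pt[p], singletons, all_objs)
    out = {v: set() if v in killed else set(objs) for v, objs in pt.items()}
    for o in pt[p]:             # each possible target may receive pt(q)
        out[o] |= pt[q]
    return out

# Strong update: p points only to singleton 'a', so 'a' loses its old
# target 'x' and ends up pointing only to q's target 'y'.
pt = {"p": {"a"}, "q": {"y"}, "a": {"x"}, "x": set(), "y": set()}
out = apply_store(pt, "p", "q", singletons={"a"}, all_objs={"a", "x", "y"})
print(sorted(out["a"]))   # ['y']

# Weak update: p may point to 'a' or 'b', so 'a' keeps 'x' as well.
pt = {"p": {"a", "b"}, "q": {"y"}, "a": {"x"}, "b": set(),
      "x": set(), "y": set()}
out = apply_store(pt, "p", "q", singletons={"a", "b"},
                  all_objs={"a", "b", "x", "y"})
print(sorted(out["a"]))   # ['x', 'y']
```

The sketch makes the precision payoff visible: a strong update removes stale targets outright, whereas a weak update can only accumulate them.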

multithreaded setting. For a variable v, pt(s, v) denotes its points-to set computed immediately after statement s.

The first five rules deal with the five types of statements introduced in Section 2.1, by following the pre-computed def-use chains ↪. The last rule enables a strong or weak update at a store, whichever is appropriate, where singletons [17] is the set of objects in A representing unique locations, obtained by excluding heap objects, arrays, and local variables in recursion.

FSAM is sound since (1) its pre-analysis is sound, (2) the def-use chains constructed for the program (as described in Sections 3.2 and 3.3) are over-approximate, and (3) the sparse analysis given in Figure 10 is as precise as the traditional iterative data-flow analysis [10].

4. Evaluation

The objective is to show that our sparse flow-sensitive pointer analysis, FSAM, is significantly faster, while consuming less memory, than the traditional data-flow-based flow-sensitive pointer analysis, denoted NonSparse, in analyzing large multithreaded C programs using Pthreads.

4.1 Experimental Setup

We have selected a set of 10 multithreaded C programs, including the two largest (word_count and kmeans) from Phoenix-2.0, the five largest (radiosity, ferret, bodytrack, raytrace and x264) from Parsec-3.0, and three open-source applications (automount, mt_daapd and httpd_server), as shown in Table 1. All our experiments were conducted on a platform with a 2.70GHz Intel Xeon Quad Core CPU and 64 GB of memory, running Ubuntu Linux (kernel version 3.11.0).

The source code of each program is compiled into bitcode files using clang and then merged together using the LLVM Gold Plugin at the link-time optimization (LTO) stage to produce a whole-program bc file. In addition, the compiler option mem2reg is turned on to promote memory into registers.

4.2 Implementation

We have implemented FSAM in LLVM (version 3.5.0). Andersen's analysis (using the constraint resolution techniques from [23]) is used to perform its pre-analysis indicated in Figure 2. In order to distinguish the concrete runtime threads represented by an abstract multi-forked thread (Definition 1) inside a loop, we use LLVM's SCEV alias analysis to correlate a fork-join pair. Figure 11 shows a code snippet from word_count, where a fixed number of threads are forked and joined in two "symmetric" loops. FSAM can recognize that any statement in a slave thread (with its start routine wordcount_map) does not happen in parallel with the statements after its join executed in the main thread.

// word_count-pthread.c
140   for (i = 0; i < num_procs; i++) {
166       pthread_create(&tid[i], &attr,
              wordcount_map, (void*)out) != 0);
167   }
170   for (i = 0; i < num_procs; i++) {
173       pthread_join(tid[i],
              (void **)(void*)&ret_val) != 0);
175   }
      ...

Figure 11: A multi-forked example in word_count.

FSAM is field-sensitive. Each field of a struct is treated as a separate object, but arrays are considered monolithic. Positive weight cycles (PWCs) that arise from processing fields are detected and collapsed [22]. The call graph of a program is constructed on the fly. Distinct allocation sites are modeled by distinct abstract objects [10, 32].
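The field-sensitive object model just described can be sketched as follows; the naming scheme is hypothetical, for illustration only.

```python
# Illustrative sketch of a field-sensitive object model: each field of a
# struct gets its own abstract object, while all elements of an array
# collapse to a single abstract object (arrays are monolithic).

def abstract_object(alloc_site, access_path):
    """alloc_site names the allocation; access_path lists the accesses,
    e.g. ('field', 'tail') for o->tail or ('index', 3) for o[3]."""
    name = alloc_site
    for kind, key in access_path:
        if kind == "field":
            name += "." + key    # a distinct object per struct field
        else:
            name += "[*]"        # every array index maps to one object
    return name

print(abstract_object("o", [("field", "tail")]))   # o.tail
print(abstract_object("o", [("index", 3)]))        # o[*]
print(abstract_object("o", [("index", 7)]))        # o[*]  (same as o[3])
```

Keeping fields apart lets stores such as tq->tail = task update only the tail object, while collapsing arrays avoids tracking one object per element.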
4.3 Methodology

We are not aware of any flow-sensitive pointer analysis for multithreaded C programs with Pthreads in the literature, or of any publicly available implementation. RR [25] is closest; it performs an iterative flow-sensitive data-flow-based pointer analysis on structured parallel code regions in Cilk programs. However, C programs with Pthreads are unstructured, requiring MHP analysis to discover their parallel code regions. PCG [14] is a recent MHP analysis for Pthreads that distinguishes whether two procedures may execute concurrently. We have therefore also implemented RR in LLVM (3.5.0) for multithreaded C programs, with their parallel regions discovered by PCG; this implementation, denoted NonSparse, is our baseline.

To understand FSAM better, we also analyze the impact of each of its phases on the performance of sparse flow-sensitive points-to resolution. To do this, we measure the slowdown of FSAM with each phase turned off individually: (1) No-Interleaving: with our interleaving analysis turned off and the results from PCG used instead; (2) No-Value-Flow: with our value-flow analysis turned off (i.e., o ∈ AS(*p, *q) in [THREAD-VF] disregarded); and (3) No-Lock: with our lock analysis turned off.

Note that some spurious def-use edges may be avoided by more than one phase. Despite this, these three configurations allow us to measure their relative performance impact.

4.4 Results and Analysis

Table 2 gives the analysis times and memory usage of FSAM against NonSparse.

Table 2: Analysis time and memory usage. (OOT: the analysis did not finish within two hours.)

Program         Time (secs)            Memory (MB)
                FSAM     NonSparse     FSAM      NonSparse
word_count      3.04     17.40         13.79     53.76
kmeans          2.50     18.19         18.27     53.19
radiosity       6.77     29.29         38.65     95.00
automount       8.66     83.82         27.56     364.67
ferret          13.49    87.10         52.14     934.57
bodytrack       128.80   2809.89       313.66    12410.16
httpd_server    191.22   2079.43       55.78     6578.46
mt_daapd        90.67    2667.55       37.92     3403.26
raytrace        284.61   OOT           135.06    OOT
x264            531.55   OOT           129.58    OOT

FSAM spends less than 22 minutes altogether in analyzing all 10 programs (totaling 380 KLOC). For the two largest programs, raytrace and x264, FSAM spends just under 5 and 9 minutes, respectively, while NonSparse fails to finish analyzing either within two hours. For the remaining 8 programs analyzable by both, FSAM is 12x faster and uses 28x less memory than NonSparse, on average. For the two programs with over 50 KLOC, FSAM is 11x faster and uses 117x less memory for httpd_server, and is 29x faster and uses 89x less memory for mt_daapd.

For small programs, such as word_count and kmeans, FSAM yields little performance benefit over NonSparse due to the relatively few statements and simple thread synchronizations involved. For larger ones, which contain more pointers, loads/stores and complex thread synchronization primitives, FSAM has a more distinct advantage, with the best speedup (39x) observed for bodytrack and the best memory usage reduction for httpd_server. FSAM achieves these better results by propagating and maintaining significantly less points-to information than NonSparse.

Figure 12 shows the relative impact of each of FSAM's three thread interference analysis phases on its analysis efficiency for the three configurations defined in Section 4.3. The performance impact of each phase varies considerably across the programs evaluated. On average, value-flow analysis is more beneficial than the other two in reducing spurious def-use edges passed to the final sparse analysis.

Interleaving analysis is very useful for kmeans, httpd_server and mt_daapd in avoiding spurious MHP pairs. These programs adopt the master-slave pattern: the slave threads perform their tasks in their start procedures while the master thread handles some post-processing task after having joined all the slave threads. Precise handling of join operations is critical in avoiding spurious MHP relations between the statements in the slave threads and those after their join sites in the master thread.

Value-flow analysis is effective in reducing redundant def-use edges among concurrent threads in most of the programs evaluated. For automount, ferret and mt_daapd, value-flow analysis avoids adding over 80% of the (spurious) def-use edges. In these programs, the concurrent threads frequently manipulate not only global variables but also their local variables. Thus, value-flow analysis prevents the subsequent sparse analysis from blindly propagating a lot of points-to information for non-shared memory locations.

Lock analysis is beneficial for programs such as automount and radiosity that make extensive use of locks (with hundreds of lock-release spans) to protect their critical code sections. In these programs, some lock-release spans can cover many statements accessing globally shared objects. Figure 13 gives a pair of lock-release spans with a common lock accessing the shared global->task_queue in two threads. The spurious def-use chains from the write at line 457 in dequeue_task to all the statements accessing the shared task_queue object in enqueue_task are avoided by our analysis.

5. Related Work

We discuss the related work on sparse flow-sensitive pointer analysis and on pointer analysis for multithreaded programs.

Sparse Flow-Sensitive Pointer Analysis  Sparse analysis, a recent improvement over the classic iterative data-flow approach, can achieve flow-sensitivity more efficiently by
[Figure 12 is a bar chart plotting, for each of the ten benchmarks, the slowdown over FSAM (0x to 8x on the y-axis; three off-scale bars reach 19.7x, 13.9x and 18.8x) under the No-Interleaving, No-Value-Flow and No-Lock configurations.]

Figure 12: Impact of FSAM's three thread interference analysis phases on its analysis efficiency.

// taskman.C
377  void enqueue_task(long qid, Task *task, long mode) {
382      tq = &global->task_queue[qid];
385      LOCK(tq->q_lock);
387      if (tq->tail == 0)      // read
390          tq->tail = task;    // write
         ......
412      UNLOCK(tq->q_lock);
413  }
418  Task *dequeue_task(long qid, long max_visit, long process_id) {
443      tq = &global->task_queue[qid];
449      LOCK(tq->q_lock);
         .....
457      tq->tail = NULL;        // write
470      tq->tail = prev;        // write
         .....
475      UNLOCK(tq->q_lock);
494  }

Figure 13: Effectiveness of lock analysis for radiosity.

propagating points-to facts sparsely across pre-computed def-use chains [10, 11, 21]. Initially, sparsity was experimented with in [12, 13] on a Sparse Evaluation Graph [4], a refined CFG with irrelevant nodes removed. Further progress was made later on various SSA form representations (e.g., factored SSA [5], HSSA [6] and partial SSA [16]). The def-use chains for top-level pointers, once put in SSA form, can be explicitly and precisely identified, giving rise to a semi-sparse flow-sensitive analysis [11]. Recently, the idea of staged analysis [9, 10], which uses pre-computed points-to information to bootstrap a later, more precise analysis, has been leveraged to make pointer analysis full-sparse for both top-level and address-taken variables [10, 21, 29, 33].

Pointer Analysis for Multithreaded Programs  This has been an area that is not well studied and understood, due to the challenges discussed in Section 1.1. Earlier, Rugina and Rinard [25] introduced a pointer analysis for Cilk programs with structured parallelism. They solved a standard data-flow problem to propagate points-to information iteratively along the control flow and evaluated their analysis with benchmarks of up to 4500 lines of code.

However, unstructured multithreaded C or Java programs are more challenging to analyze due to the use of non-lexically-scoped synchronization statements (e.g., fork/join and lock/unlock). For Java programs, a compositional approach [26] analyzes pointer and escape information for the variables in a method that may be escaped and accessed by other threads. The approach performs a flow-sensitive lock-free analysis to analyze each method modularly but iteratively, without considering strong updates. The proposed approach was evaluated on six small benchmarks (with up to 18K lines of bytecode). To maintain scalability for large Java programs, modern pointer analysis tools for Java embrace context-sensitivity instead of flow-sensitivity [27, 31].

However, flow-sensitivity is important to achieve the precision required for C programs. To the best of our knowledge, this paper presents the first sparse flow-sensitive pointer analysis for C programs using Pthreads. The prior analyses handle thread synchronizations conservatively, by ignoring locks [26] or joins [14], or by dealing with only partial and/or nested joins [3]. In contrast, FSAM models such synchronization operations more accurately, by building on our recent work on MHP analysis [7], to produce the first multithreaded flow-sensitive points-to analysis that scales successfully to programs of up to 100K lines of code.

6. Conclusion

We have designed and implemented FSAM, a new sparse flow-sensitive pointer analysis for multithreaded C programs, and demonstrated its scalability over the traditional data-flow approach. Some further details can be found in its artifact. In future work, we plan to evaluate the effectiveness of FSAM in helping bug-detection tools detect concurrency bugs such as data races and deadlocks in multithreaded C programs. We also plan to combine FSAM with dynamic analysis tools such as Google's ThreadSanitizer to reduce their instrumentation overhead.

Acknowledgments

We thank all the reviewers for their constructive comments on an earlier version of this paper. This research is supported by ARC grants DP130101970 and DP150102109.
References

[1] S. Agarwal, R. Barik, V. Sarkar, and R. K. Shyamasundar. May-happen-in-parallel analysis of X10 programs. In PPoPP '07, pages 183–193.
[2] L. Andersen. Program analysis and specialization for the C programming language. PhD thesis, 1994.
[3] R. Barik. Efficient computation of may-happen-in-parallel information for concurrent Java programs. In LCPC '05, pages 152–169.
[4] J.-D. Choi, R. Cytron, and J. Ferrante. Automatic construction of sparse data flow evaluation graphs. In POPL '91, pages 55–66.
[5] J.-D. Choi, R. Cytron, and J. Ferrante. On the efficient engineering of ambitious program analysis. IEEE Transactions on Software Engineering, 20(2):105–114, 1994.
[6] F. Chow, S. Chan, S. Liu, R. Lo, and M. Streich. Effective representation of aliases and indirect memory operations in SSA form. In CC '96, pages 253–267.
[7] P. Di, Y. Sui, D. Ye, and J. Xue. Region-based may-happen-in-parallel analysis for C programs. In ICPP '15, pages 889–898.
[8] I. Evans, F. Long, U. Otgonbaatar, H. Shrobe, M. Rinard, H. Okhravi, and S. Sidiroglou-Douskos. Control Jujutsu: On the weaknesses of fine-grained control flow integrity. In CCS '15, pages 901–913.
[9] S. J. Fink, E. Yahav, N. Dor, G. Ramalingam, and E. Geay. Effective typestate verification in the presence of aliasing. ACM Transactions on Software Engineering and Methodology, 17(2):1–34, 2008.
[10] B. Hardekopf and C. Lin. Flow-sensitive pointer analysis for millions of lines of code. In CGO '11, pages 289–298.
[11] B. Hardekopf and C. Lin. Semi-sparse flow-sensitive pointer analysis. In POPL '09, pages 226–238.
[12] M. Hind, M. Burke, P. Carini, and J.-D. Choi. Interprocedural pointer alias analysis. ACM Transactions on Programming Languages and Systems, 21(4):848–894, 1999.
[13] M. Hind and A. Pioli. Assessing the effects of flow-sensitivity on pointer alias analyses. In SAS '98, pages 57–81.
[14] P. G. Joisha, R. S. Schreiber, P. Banerjee, H. J. Boehm, and D. R. Chakrabarti. A technique for the effective and automatic reuse of classical compiler optimizations on multithreaded code. In POPL '11, pages 623–636.
[15] W. Landi and B. Ryder. A safe approximate algorithm for interprocedural aliasing. In PLDI '92, pages 235–248.
[16] C. Lattner and V. Adve. LLVM: A compilation framework for lifelong program analysis & transformation. In CGO '04, pages 75–86.
[17] O. Lhoták and K.-C. A. Chung. Points-to analysis with efficient strong updates. In POPL '11, pages 3–16.
[18] L. Li, C. Cifuentes, and N. Keynes. Boosting the performance of flow-sensitive points-to analysis using value flow. In FSE '11, pages 343–353.
[19] Y. Li, T. Tan, Y. Sui, and J. Xue. Self-inferencing reflection resolution for Java. In ECOOP '14, pages 27–53.
[20] S. Nagarakatte, M. M. K. Martin, and S. Zdancewic. Everything you want to know about pointer-based checking. In SNAPL '15, pages 190–208.
[21] H. Oh, K. Heo, W. Lee, W. Lee, and K. Yi. Design and implementation of sparse global analyses for C-like languages. In PLDI '12, pages 229–238.
[22] D. Pearce, P. Kelly, and C. Hankin. Efficient field-sensitive pointer analysis of C. ACM Transactions on Programming Languages and Systems, 30(1), 2007.
[23] F. Pereira and D. Berlin. Wave propagation and deep propagation for pointer analysis. In CGO '09, pages 126–135.
[24] P. Pratikakis, J. S. Foster, and M. W. Hicks. LOCKSMITH: context-sensitive correlation analysis for race detection. In PLDI '06, pages 320–331.
[25] R. Rugina and M. Rinard. Pointer analysis for multithreaded programs. In PLDI '99, pages 77–90.
[26] A. Salcianu and M. Rinard. Pointer and escape analysis for multithreaded programs. In PPoPP '01, pages 12–23.
[27] Y. Smaragdakis, M. Bravenboer, and O. Lhoták. Pick your contexts well: understanding object-sensitivity. In POPL '11, pages 17–30.
[28] Y. Sui, D. Ye, and J. Xue. Static memory leak detection using full-sparse value-flow analysis. In ISSTA '12, pages 254–264.
[29] Y. Sui, S. Ye, J. Xue, and P.-C. Yew. SPAS: Scalable path-sensitive pointer analysis on full-sparse SSA. In APLAS '11, pages 155–171.
[30] Y. Wang, T. Kelly, M. Kudlur, S. Lafortune, and S. A. Mahlke. Gadara: Dynamic deadlock avoidance for multithreaded programs. In OSDI '08, pages 281–294.
[31] J. Whaley and M. S. Lam. Cloning-based context-sensitive pointer alias analysis using binary decision diagrams. In PLDI '04, pages 131–144.
[32] S. Ye, Y. Sui, and J. Xue. Region-based selective flow-sensitive pointer analysis. In SAS '14, pages 319–336.
[33] H. Yu, J. Xue, W. Huo, X. Feng, and Z. Zhang. Level by level: making flow- and context-sensitive pointer analysis scalable for millions of lines of code. In CGO '10, pages 218–229.

A. Artifact Description

Summary: The artifact includes the full implementation of the FSAM and NonSparse analyses, benchmarks and scripts to reproduce the data in this paper.

Description: You may find the artifact package and all the instructions on how to use FSAM via the following link: http://www.cse.unsw.edu.au/~corg/fsam

A brief checklist is as follows:
• index.html: the detailed instructions for reproducing the experimental results in the paper.
• FSAM.ova: a virtual image file (4.6 GB) containing an installed Ubuntu OS and the FSAM project.
• Full source code of FSAM, developed on top of the SVF framework: http://unsw-corg.github.io/SVF.
• Scripts used to reproduce the data in the paper, including ./table2.sh and ./figure12.sh.
• Micro-benchmarks to validate pointer analysis results.

Platform: All the results related to analysis times and memory usage in our paper were obtained on a 2.70GHz Intel Xeon Quad Core CPU running Ubuntu Linux with 64 GB of memory. For the VM image, we recommend allocating at least 16 GB of memory to the virtual machine. The OS in the virtual machine image is Ubuntu 12.04. VirtualBox version 4.1.12 or newer is required to run the image.

License: LLVM Release License (The University of Illinois/NCSA Open Source License (NCSA))
