Algorithms For Computing The Static Single Assignment Form
GIANFRANCO BILARDI
Università di Padova, Padova, Italy
AND
KESHAV PINGALI
Cornell University, Ithaca, New York
Abstract. The Static Single Assignment (SSA) form is a program representation used in many opti-
mizing compilers. The key step in converting a program to SSA form is called φ-placement. Many
algorithms for φ-placement have been proposed in the literature, but the relationships between these
algorithms are not well understood.
In this article, we propose a framework within which we systematically derive (i) properties of the
SSA form and (ii) φ-placement algorithms. This framework is based on a new relation called merge
which captures succinctly the structure of a program’s control flow graph that is relevant to its SSA
form. The φ-placement algorithms we derive include most of the ones described in the literature, as
well as several new ones. We also evaluate experimentally the performance of some of these algorithms
on the SPEC92 benchmarks.
Some of the algorithms described here are optimal for a single variable. However, their repeated
application is not necessarily optimal for multiple variables. We conclude the article by describing
such an optimal algorithm, based on the transitive reduction of the merge relation, for multi-variable
φ-placement in structured programs. The problem for general programs remains open.
Categories and Subject Descriptors: D.3.4 [Programming Languages]: Processors—compilers and
optimization; I.1.2 [Symbolic and Algebraic Manipulation]: Algorithms—analysis of algorithms
General Terms: Algorithms, Languages, Theory
Additional Key Words and Phrases: Control dependence, optimizing compilers, program optimization,
program transformation, static single assignment form
G. Bilardi was supported in part by the Italian Ministry of University and Research and by the Italian
National Research Council. K. Pingali was supported by NSF grants EIA-9726388, ACI-9870687,
EIA-9972853, ACI-0085969, ACI-0090217, and ACI-0121401.
Section 6 of this article contains an extended and revised version of an algorithm that appeared in a
paper in a Proceedings of the ACM SIGPLAN Conference on Programming Language Design and
Implementation, ACM, New York, 1995, pp. 32–46.
Authors’ addresses: G. Bilardi, Dipartimento di Ingegneria dell’Informazione, Università di Padova,
35131 Padova, Italy, e-mail: [email protected]; K. Pingali, Department of Computer Science,
Cornell University, Upson Hall, Ithaca, NY 14853, e-mail: [email protected].
Permission to make digital or hard copies of part or all of this work for personal or classroom use is
granted without fee provided that copies are not made or distributed for profit or direct commercial
advantage and that copies show this notice on the first page or initial screen of a display along with the
full citation. Copyrights for components of this work owned by others than ACM must be honored.
Abstracting with credit is permitted. To copy otherwise, to republish, to post on servers, to redistribute
to lists, or to use any component of this work in other works requires prior specific permission and/or
a fee. Permissions may be requested from Publications Dept., ACM, Inc., 1515 Broadway, New York,
NY 10036 USA, fax: +1 (212) 869-0481, or [email protected].
© 2003 ACM 0004-5411/03/0500-0375 $5.00
Journal of the ACM, Vol. 50, No. 3, May 2003, pp. 375–425.
376 G. BILARDI AND K. PINGALI
1. Introduction
Many program optimization algorithms become simpler and faster if programs are
first transformed to Static Single Assignment (SSA) form [Shapiro and Saint 1970;
Cytron et al. 1991] in which each use1 of a variable is reached by a single definition
of that variable. The conversion of a program to SSA form is accomplished by
introducing pseudo-assignments at confluence points, that is, points with multiple
predecessors, in the control flow graph (CFG) of the program. A pseudo-assignment
for a variable Z is a statement of the form Z = φ(Z, Z, . . . , Z), where the φ-function
on the right-hand side has one argument for each incoming CFG edge at
that confluence point. Intuitively, a φ-function at a confluence point in the CFG
merges multiple definitions that reach that point. Each occurrence of Z on the right-hand
side of a φ-function is called a pseudo-use of Z.
reaching definitions information after φ-placement is to rename the left-hand side
of every assignment and pseudo-assignment of Z to a unique variable, and use
the new name at all uses and pseudo-uses reached by that assignment or pseudo-
assignment. In the CFG of Figure 1(a), φ-functions for Z are placed at nodes B and
E; the program after conversion to SSA form is shown in Figure 1(b). Note that no
φ-function is needed at D, since the pseudo-assignment at B is the only assignment
or pseudo-assignment of Z that reaches node D in the transformed program.
An SSA form of a program can be easily obtained by placing φ-functions for all
variables at every confluence point in the CFG. In general, this approach introduces
more φ-functions than necessary. For example, in Figure 1, an unnecessary φ-
function for Z would be introduced at node D.
In this article, we study the problem of transforming an arbitrary program into
an equivalent SSA form by inserting φ-functions only where they are needed. A
φ-function for variable Z is certainly required at a node v if assignments to variable
Z occur along two nonempty paths u →+ v and w →+ v intersecting only at v. This
observation suggests the following definition [Cytron et al. 1991]:
Definition 1. Given a CFG G = (V, E) and a set S ⊆ V of its nodes such
that START ∈ S, J(S) is the set of all nodes v for which there are distinct nodes
u, w ∈ S such that there is a pair of paths u →+ v and w →+ v, intersecting only at
v. The set J(S) is called the join set of S.
If S is the set of assignments to a variable Z , we see that we need pseudo-
assignments to Z at least in the set of nodes J (S). By considering the assignments
in S and these pseudo-assignments in J (S), we see that we might need pseudo-
assignments in the nodes J (S ∪ J (S)) as well. However, as shown by Weiss [1992]
and proved in Section 2.3, J (S ∪ J (S)) = J (S). Hence, the φ-assignments in the
nodes J (S) are sufficient.2
The need for J sets arises also in the computation of the weak control dependence
relation [Podgurski and Clarke 1990], as shown in [Bilardi and Pingali 1996], and
briefly reviewed in Section 5.1.1.
1. Standard definitions of concepts like control flow graph, dominance, defs, uses, etc. can be found
in the Appendix.
2. Formally, we are looking for the least set φ(S) (where pseudo-assignments must be placed) such that
J(S ∪ φ(S)) ⊆ φ(S). If subsets of V are ordered by inclusion, the function J is monotonic. Therefore,
φ(S) is the largest element of the sequence {}, J(S), J(S ∪ J(S)), . . . . Since J(S ∪ J(S)) = J(S),
φ(S) = J(S).
Algorithms for Computing the Static Single Assignment Form 377
Once the set J(S_Z) has been determined for each variable Z of the program, the
following renaming steps are necessary to achieve the desired SSA form. (i) For each
v ∈ S_Z ∪ J(S_Z), rename the assignment to Z as an assignment to Z_v. (ii) For each
v ∈ J(S_Z), determine the arguments of the φ-assignment Z_v = φ(Z_{x1}, . . . , Z_{xq}).
(iii) For each node u ∈ U_Z where Z is used in the original program, replace Z by
the appropriate Z_v. The above steps can be performed efficiently by an algorithm
proposed in [Cytron et al. 1991]. This algorithm visits the CFG according to a
top-down ordering of its dominator tree, and works in time
    T_renaming = O(|V| + |E| + Σ_Z (|S_Z| + |J(S_Z)| + |U_Z|)).    (2)
Preprocessing time T_p is at least linear in the size |V| + |E| of the program, and
query time T_q(S_Z) is at least linear in the size of its input and output sets
(|S_Z| + |J(S_Z)|). Hence, assuming the number of uses Σ_Z |U_Z| to be comparable
with the number of definitions Σ_Z |S_Z|, we see that the main cost of SSA conversion
is that of φ-placement. Therefore, the present article focuses on φ-placement algorithms.
1.1. SUMMARY OF PRIOR WORK. A number of algorithms for φ-placement have
been proposed in the literature. An outline of an algorithm was given by Shapiro
and Saint [1970]. Reif and Tarjan [1981] extended the Lengauer and Tarjan [1979]
3. Ramalingam [2000] has proposed a variant of the SSA form which may place φ-functions at nodes
other than those of the SSA form as defined by Cytron et al. [1991]; thus, it is outside the scope of
this article.
above, this corresponds to computing g( f (x)), by computing f (x) first and passing
its output to g.
The main virtue of two-phase algorithms is simplicity. In Section 4, we describe
two such algorithms: edge-scan, a predecessor-oriented algorithm first proposed
here, and node-scan, a successor-oriented algorithm due to Cytron et al. [1991].
Both algorithms use preprocessing time T_p = O(|V| + |E| + |DF|) and preprocessing
space S_p = O(|V| + |DF|). To compute a set J(S), they visit the portion
of G_DF reachable from S, in time T_q = O(|V| + |DF|).
1.2.2. Lock-Step Algorithms. A potential drawback of two-phase algorithms is
that the size of the DF relation can be quite large (e.g., |DF| = Ω(|V|²), even for
some very sparse (|E| = O(|V|)), structured CFGs) [Cytron et al. 1991]. A lock-step
algorithm interleaves the computation of the reachable set DF+(S) with that
of the DF relation. Once a node is reached, further paths leading to it do not add
useful information, which ultimately makes it possible to construct only a subgraph
G′_DF = f′(G, S) of the DF graph that is sufficient to determine J(S) = g′(S, G′_DF).
The idea of simplifying the computation of f (g(x)) by interleaving the computa-
tions of f and g is quite general. In the context of loop optimizations, this is similar
to loop jamming [Wolfe 1995], which may permit optimizations such as scalariza-
tion. Frontal algorithms for out-of-core sparse matrix factorizations [George and
Liu 1981] exploit similar ideas.
In Section 5, we discuss two lock-step algorithms, a predecessor-oriented pulling
algorithm and a successor-oriented pushing algorithm; for both, T p , S p , Tq =
O(|V | + |E|). A number of structural properties of the merge and dominance
frontier relations, established in this section, are exploited by the pulling and push-
ing algorithms. In particular, we exploit a result that permits us to topologically
sort a suitable acyclic condensate of the dominance frontier graph without actually
constructing this graph.
1.2.3. Lazy Algorithms. A potential source of inefficiency of lock-step algo-
rithms is that they perform computations at all nodes of the graph, even though
only a small subset of these nodes may be relevant for computing M(S) for a given
S. A second source of inefficiency in lock-step algorithms arises when several sets
J (S1 ), J (S2 ) · · · have to be computed, since the DF information is derived from
scratch for each query.
Both issues are addressed in Section 6 with the introduction of the aug-
mented dominator tree, a data structure similar to the augmented postdomina-
tor tree [Pingali and Bilardi 1997]. The first issue is addressed by construct-
ing the DF graph lazily as needed by the reachability computation. The idea
of lazy algorithms is quite general and involves computing f (g(x)) by com-
puting only that portion of g(x) that is required to produce the output of f
[Peterson et al. 1997]. In our context, this means that we compute only that por-
tion of the DF relation that is required to perform the reachability computation.
The second issue is addressed by precomputing and caching DF sets for cer-
tain carefully chosen nodes in the dominator tree. Two-phase algorithms can be
viewed as one extreme of this approach in which the entire DF computation is
performed eagerly.
In Section 7, lazy algorithms are evaluated experimentally, both on a micro-
benchmark and on the SPEC benchmarks.
Although these φ-placement algorithms are efficient in practice, a query time
of O(|V | + |E|) is not asymptotically optimal when φ sets have to be found for
several variables in the same program. In Section 8, for the special case of struc-
tured programs, we achieve Tq = O(|S| + |J (S)|), which is asymptotically optimal
because it takes at least this much time to read the input (set S) and write the
output (set J (S)). We follow the two-phase approach; however, the total transitive
reduction Mr of M is computed instead of DF. This is because Mr for a structured
program is a forest which can be constructed, stored, and searched very efficiently.
Achieving query time Tq = O(|S| + |J (S)|) for general programs remains an
open problem.
In summary, the main contributions of this article are the following:
(1) We define the merge relation on nodes of a CFG and use it to derive systemat-
ically all known properties of the SSA form.
(2) We place existing φ-placement algorithms into a simple framework (Figure 3).
(3) We present two new O(|V | + |E|) algorithms for φ-placement, pushing and
pulling, which emerged from considerations of this framework.
(4) For the special case of structured programs, we present the first approach to
answer φ-placement queries in optimal time O(|S| + |J (S)|).
FIG. 3. Overview of φ-placement algorithms. O() estimates are reported for preprocessing time T p ,
preprocessing space S p , and query time Tq .
For any node w, the merge set of node w, denoted by M(w), is the set {v|(w, v) ∈
M}. Similarly, we let M −1 (v) = {w|(w, v) ∈ M}.
Intuitively, M(w) is the set of the nodes where φ-functions must be placed if the
only assignments to the variable are at START and w; conversely, a φ-function is
needed at v if the variable is assigned in any node of M −1 (v). Trivially, M(START) =
{}. Next, we show that if S contains START, then J (S) is the union of the merge sets
of the elements of S.
THEOREM 1. Let G = (V, E) and {START} ⊆ S ⊆ V. Then, J(S) = ∪_{w∈S} M(w).

PROOF. It is easy to see from the definitions of J and M that ∪_{w∈S} M(w) ⊆
J(S). To show that J(S) ⊆ ∪_{w∈S} M(w), consider a node v ∈ J(S). By Definition 1,
there are paths a →+ v and b →+ v, with a, b ∈ S, intersecting only at v. By Definition
18, there is also a path START →+ v. There are two cases:

(1) Path START →+ v intersects path a →+ v only at v. Then, v ∈ M(a), hence
v ∈ ∪_{w∈S} M(w).
(2) Path START →+ v intersects path a →+ v at some node different from v. Then,
let z be the first node on path START →+ v occurring on either a →+ v or b →+ v.
Without loss of generality, let z be on a →+ v. Then, there is clearly a path
START →+ z →+ v intersecting with b →+ v only at v, so that v ∈ M(b), hence
v ∈ ∪_{w∈S} M(w).
The control flow graph in Figure 4(a) is the running example used in this paper.
Relation M defines a graph G M = (V, M). The M graph for the running example
is shown in Figure 4(c). Theorem 1 can be interpreted graphically as follows: for
any subset S of the nodes in a CFG, J (S) is the set of neighbors of these nodes in
the corresponding M graph. For example, J ({b, c}) = {b, c, f, a}.
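Theorem 1 has a direct operational reading: once the M graph is available, J(S) is simply the union of the out-neighbor sets of the members of S. A minimal Python sketch (the adjacency dictionary below is a hypothetical M graph for a small one-loop CFG, not the paper's Figure 4(c) data):

```python
def join_set(M, S):
    """J(S) = union of M(w) over w in S (Theorem 1).

    M maps each node w to the set M(w) of its neighbors in the M graph.
    """
    J = set()
    for w in S:
        J |= M.get(w, set())
    return J

# Hypothetical M graph: 'b' is a loop header, so M(b) = M(c) = {'b'}.
M = {'b': {'b'}, 'c': {'b'}}
```

For S = {START, c}, the query returns {b}: a single φ-function at the loop header.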
Inductive step. Let the length of P be at least two, so that P = w → y →+ v.
By the inductive assumption, there are paths R1 = START →+ v and R2 = y →+ v
intersecting only at v. Let C be the path obtained by concatenating the edge w → y
to the path R2, and consider the following two cases:
—w ∉ (R1 − {v}). Then, let P1 = R1 and P2 = C. Figures 5(i) and 5(ii) illustrate
the subcases w ≠ v and w = v, respectively.
—w ∈ (R1 − {v}). Let D be the suffix w →+ v of R1, and observe that C and
D intersect only at their endpoints w and v (see Figure 5(iii)). Let also T =
START →+ v be a path that does not contain w (the existence of T was established
earlier). Let n be the first node on T that is contained in either C or D (such
a node must exist since all three paths terminate at v). Consider the following
cases:
(1) n = v. Then, we let P1 = T and P2 = C.
(2) n ∈ (D − C). Referring to Figure 5, let P1 be the concatenation of the prefix
START →+ n of T with the suffix n →+ v of D, which is disjoint from P2 = C
except for v.
(3) n ∈ (C − D). The proof is analogous to the previous case and is omitted.
The dominator tree for the running example of Figure 4(a) is shown in Figure 4(b).
Consider the path P = e → b → d → f in Figure 4(a). This path does not contain
idom( f ) = a. As required by the theorem, there are paths P1 = START → a →
b → d → f and P2 = e → f with only f in common, that is, f ∈ M(e).
The preceding result motivates the following definition of M-paths.
Definition 3. Given a CFG G = (V, E), an M-path is a path w →+ v that does
not contain idom(v).
Note that M-paths are paths in the CFG, not in the graph of the M relation. They
enjoy the following important property, illustrated in Figure 6.
LEMMA 1. If P = w →+ v is an M-path, then (i) idom(v) strictly dominates all
nodes on P, hence (ii) no strict dominator of v occurs on P.
PROOF
(i) (By contradiction.) Let n be a node on P that is not strictly dominated by
idom(v). Then, there is a path Q = START → n that does not contain idom(v);
concatenating Q with the suffix n → v of P, we get a path from START to v
that does not contain idom(v), a contradiction.
(ii) Since dominance is tree-structured, any strict dominator d of v dominates
idom(v), hence d is not strictly dominated by idom(v) and, by (i), cannot occur
on P.
We note that in Figure 6, idom(v) strictly dominates w (Lemma 1(i)); so from
the definition of idom, it follows that idom(v) also dominates idom(w).
2.2. COMPUTING THE MERGE RELATION. Approaches to computing M can be
naturally classified as being successor oriented (for each w, M(w) is determined) or
predecessor oriented (for each v, M −1 (v) is determined). Next, based on Theorem 2,
we describe a predecessor-oriented algorithm that uses graph reachability and a
successor-oriented algorithm that solves a backward dataflow problem.
2.2.1. Reachability Algorithm. The reachability algorithm shown in Figure 7
computes the set M −1(y) for any node y in the CFG by finding the set of nodes
reachable from y in the graph obtained by deleting idom(y) from the CFG and
reversing all edges in the remaining graph (we call this graph (G − idom(y))^R).
The correctness of this algorithm follows immediately from Theorem 2.
PROPOSITION 1. The reachability algorithm for SSA has preprocessing time
T_p = O(|V||E|), preprocessing space S_p = O(|V| + |M|) ≤ O(|V|²), and query
time T_q = O(Σ_{w∈S} |M(w)|).
Procedure Merge(CFG);
{
1: Assume CFG = (V, E);
2: M = {};
3: for v ∈ V do
4:    Let G′ = (G − idom(v))^R;
5:    Traverse G′ from v, appending (w, v) to M for each visited w.
6: od
7: return M;
}

Procedure φ-placement(M, S);
{
1: J = {};
2: for each v ∈ S
3:    for each (v, w) ∈ M append w to J;
4: return J;
}
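Procedure Merge transcribes almost literally into executable form. The sketch below is ours: the CFG, its edge set, and the precomputed idom dictionary are a made-up one-loop example, not the paper's running example. For each node v, it deletes idom(v), reverses the remaining edges, and records every node that reaches v:

```python
def merge_relation(nodes, edges, idom):
    """Compute M = {(w, v) | w ->+ v avoiding idom(v)} by reachability
    in (G - idom(v))^R, following Procedure Merge."""
    M = set()
    for v in nodes:
        if v not in idom:              # START has no immediate dominator
            continue
        d = idom[v]
        # Reversed adjacency of G with node d (and its edges) removed.
        rev = {n: [] for n in nodes if n != d}
        for (x, y) in edges:
            if x != d and y != d:
                rev[y].append(x)
        # Traverse (G - d)^R from v; record each visited w as (w, v).
        stack, visited = list(rev[v]), set()
        while stack:
            w = stack.pop()
            if w in visited:
                continue
            visited.add(w)
            M.add((w, v))
            stack.extend(rev[w])
    return M

# Small example: a loop b -> c -> b below a, entered from START.
nodes = {'START', 'a', 'b', 'c', 'd'}
edges = {('START', 'a'), ('a', 'b'), ('b', 'c'), ('c', 'b'), ('b', 'd')}
idom = {'a': 'START', 'b': 'a', 'c': 'b', 'd': 'b'}
M = merge_relation(nodes, edges, idom)
```

On this graph, M = {(c, b), (b, b)}: only the loop header b can require a φ-function, and any definition inside the loop triggers it.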
w in the CFG. This yields a system of backward dataflow equations that can be
solved by any one of the numerous methods in the literature [Aho et al. 1986].
Here and in several subsequent discussions, it is convenient to partition the edges
of the control flow graph G = (V, E) as E = E_tree + E_up, where (u → v) ∈ E_tree
(a tree edge of the dominator tree of the graph) if u = idom(v), and (u → v) ∈ E_up
(an up-edge) otherwise. Figure 4(a,b) shows a CFG and its dominator tree. In
Figure 4(a), a → b and g → h are tree edges, while h → a and e → b are
up-edges. For future reference, we introduce the following definition.
Definition 4. Given a CFG G = (V, E), (u → v) ∈ E is an up-edge if
u ≠ idom(v). The subgraph (V, E_up) of G containing only the up-edges is called
the α-DF graph.
Figure 4(d) shows the α-DF graph for the CFG of Figure 4(a). Since an up-edge
(u → v) is a path from u to v that does not contain idom(v), its existence implies
v ∈ M(u) (from Theorem 2); then, from the transitivity of M, E_up+ ⊆ M. In general,
the latter relation does not hold with equality (e.g., in Figure 4, a ∈ M(g) but a
is not reachable from g in the α-DF graph). Fortunately, the set M(w) can be
expressed as a function of α-DF(w) and the sets M(u) for all CFG successors u of
w as follows. We let children(w) represent the set of children of w in the dominator
tree.
THEOREM 3. The merge sets of the nodes of a CFG satisfy the following set of
relations, for w ∈ V:

    M(w) = α-DF(w) ∪ (∪_{u∈succ(w)} M(u) − children(w)).    (3)
PROOF
(a) We first prove that M(w) ⊆ α-DF(w) ∪ (∪_{u∈succ(w)} M(u) − children(w)).
If v ∈ M(w), Theorem 2 implies that there is a path P = w →+ v that does
not contain idom(v); therefore, w ≠ idom(v). If the length of P is 1, then
FIG. 8. Equations set up and solved by the dataflow algorithm, for the CFG in Figure 4(a).
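Since every set on the right-hand side of Eq. (3) can only grow during the solution process, a naive round-robin iteration reaches the least fixpoint. A sketch of such a solver (the example data is our own small one-loop CFG, mirroring the style of Figure 8 but not its actual contents):

```python
def merge_sets_dataflow(nodes, succ, children, alpha_df):
    """Least-fixpoint solution of the system of Theorem 3:
    M(w) = alpha-DF(w) U ( U_{u in succ(w)} M(u)  -  children(w) ),
    by naive round-robin iteration until no set changes."""
    M = {w: set(alpha_df.get(w, ())) for w in nodes}
    changed = True
    while changed:
        changed = False
        for w in nodes:
            flow = set()
            for u in succ.get(w, ()):
                flow |= M[u]
            new = set(alpha_df.get(w, ())) | (flow - children.get(w, set()))
            if new != M[w]:
                M[w] = new
                changed = True
    return M

# Hypothetical one-loop CFG: START -> a -> b -> {c, d}, back edge c -> b.
nodes = ['START', 'a', 'b', 'c', 'd']
succ = {'START': ['a'], 'a': ['b'], 'b': ['c', 'd'], 'c': ['b']}
children = {'START': {'a'}, 'a': {'b'}, 'b': {'c', 'd'}}
alpha_df = {'c': {'b'}}          # the only up-edge is c -> b
M = merge_sets_dataflow(nodes, succ, children, alpha_df)
```

The fixpoint gives M(b) = M(c) = {b} and empty merge sets elsewhere, matching what direct reachability computes on the same graph.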
M is the goal. Unless M is acyclic (i.e., the graph G_M is a dag), R is not necessarily
unique. However, if the strongly connected components of M are collapsed into
single vertices, the resulting acyclic condensate (call it M_c) has a unique transitive
reduction M_r, which can be computed in time O(|V||M_c|) [Cormen et al. 1992] or
O(|V|^γ) by using an O(n^γ) matrix multiplication algorithm.4 In summary:
PROPOSITION 3. The reachability algorithm for φ-placement (with transitive
reduction preprocessing) has preprocessing time T_p = O(|V|(|E| + min(|M|,
|V|^{γ−1}))), preprocessing space S_p = O(|V| + |M_r|), and query time T_q = O(|V| +
|M_r^+(S)|).
Clearly, preprocessing time is too high for this algorithm to be of much practical
interest. It is natural to ask whether the merge relation M has any special structure
that could facilitate the transitive reduction computation. Unfortunately, for general
programs, the answer is negative. Given an arbitrary relation R ⊆ (V − {START}) ×
(V − {START}), it can be easily shown that the CFG G = (V, R ∪ ({START} × (V −
{START}))) has exactly R+ as its own merge relation M. In particular, if R is transitive
to start with, then M = R.
Rather than pursuing the total transitive reduction of M, we investigate partial
reductions next.
4. For instance, γ = 3 for the standard algorithm and γ = log2 7 ≈ 2.81 for Strassen's algorithm
[Cormen et al. 1992].
Assume now that (w, v) ∈ DF. Then, by Definition 6, there is in the CFG an
edge u → v such that (i) w dominates u and (ii) w does not strictly dominate v.
By (i) and Lemma 8, there is a path Q = w →∗ u on which each node is dominated
by w. If we let R = w →∗ u be the smallest suffix of Q whose first node equals w,
then each node on R except for the first one is strictly dominated by w. This fact
together with (ii) implies that the path P = R(u → v) satisfies Properties (1) and
(2) of Proposition 5, hence it is a prime M-path from w to v.
The developments of this section lead to the sought partial reduction of M.
THEOREM 6. M = DF+ .
PROOF. The stated equality follows from the equivalence of the sequence of
statements listed below, where the reason for the equivalence of a statement to its
predecessor in the list is in parentheses.
—(w, v) ∈ M;
—there exists an M-path P from w to v (by Theorem 2);
—for some k ≥ 1, P = P1 P2 · · · Pk, where Pi = w_i →+ v_i are prime M-paths such
that w_1 = w, v_k = v, and for i = 2, . . . , k, w_i = v_{i−1} (by Proposition 4 and
Theorem 5);
—for some k ≥ 1, for i = 1, . . . , k, (w_i, v_i) ∈ DF, with w_1 = w, v_k = v, and for
i = 2, . . . , k, w_i = v_{i−1} (by Proposition 6);
—(w, v) ∈ DF+ (by definition of transitive closure).
In general, DF is neither transitively closed nor transitively reduced, as can be
seen in Figure 4(e). The presence of c → f and f → a and the absence of c → a
in the DF graph show that it is not transitively closed. The presence of edges d → c,
c → f , and d → f shows that it is not transitively reduced.
Combining Theorems 1 and 6, we obtain a simple graph-theoretic interpretation
of a join set J (S) = g(S, G DF ) as the set of nodes reachable in the DF graph by
nonempty paths originating at some node in S.
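Operationally, the query g(S, G_DF) of a two-phase algorithm is therefore an ordinary graph search. A sketch, with the DF graph supplied as an adjacency dictionary (the data below is a hypothetical DF graph for a one-loop CFG, not the paper's Figure 4(e)):

```python
from collections import deque

def join_by_df_reachability(DF, S):
    """J(S) = nodes reachable from S by nonempty paths in the DF graph
    (combining Theorems 1 and 6)."""
    J = set()
    work = deque(S)
    while work:
        w = work.popleft()
        for v in DF.get(w, ()):
            if v not in J:           # a node of S joins J only via an edge,
                J.add(v)             # i.e., via a nonempty path
                work.append(v)
    return J

# Hypothetical DF graph of a one-loop CFG: DF(c) = DF(b) = {b}.
DF = {'c': {'b'}, 'b': {'b'}}
```

For S = {START, c}, the search returns {b}, the loop header.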
3.2. TWO IDENTITIES FOR THE DF RELATION. Most of the algorithms described
in the rest of this article are based on the computation of all or part of the DF graph
G DF = f (G) corresponding to the given CFG G. We now discuss two identities
for the DF relation, the first one enabling efficient computation of DF−1 (v) sets (a
predecessor-oriented approach), and the second one enabling efficient computation
of DF(w) sets (a successor-oriented approach).
Definition 7. Let T = ⟨V, F⟩ be a tree. For x, y ∈ V, let [x, y] denote the
set of vertices on the simple path connecting x and y in T, and let [x, y) denote
[x, y] − {y}. In particular, [x, x) is empty.
For example, in the dominator tree of Figure 4(b), [d, a] = {d, b, a}, [d, a) =
{d, b}, and [d, g] = {d, b, a, f, g}.
THEOREM 7. E_DF = ∪_{(u→v)∈E} [u, idom(v)) × {u → v}, where
[u, idom(v)) × {u → v} = {(w, u → v) | w ∈ [u, idom(v))}.
PROOF
⊇: Suppose (w, a → b) ∈ ∪_{(u→v)∈E} [u, idom(v)) × {u → v}. Therefore, [a,
idom(b)) is nonempty, which means that (a → b) is an up-edge. Applying
Lemma 1 to this edge, we see that idom(b) strictly dominates a. Therefore, w
dominates a but does not strictly dominate b, which implies that (w, b) ∈ DF from
Definition 6.
⊆: If (w, v) ∈ DF, there is an edge (u → v) such that w dominates u but does not
strictly dominate v. Therefore, w ∈ [u, START] − [idom(v), START], which implies
u ≠ idom(v). From Lemma 1, this means that idom(v) dominates u. Therefore, the
expression [u, START] − [idom(v), START] can be written as [u, idom(v)), and the
required result follows.
Based on Theorem 7, DF−1 (v) can be computed as the union of the sets
[u, idom(v)) for all incoming edges (u → v). Theorem 7 can be viewed as the
DF analog of the reachability algorithm of Figure 7 for the M relation: to find
DF−1 (v), we overlay on the dominator tree all edges (u → v) whose destination
is v and find all nodes reachable from v without going through idom(v) in the
reverse graph.
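In code, Theorem 7 amounts to a walk up the dominator tree from each predecessor u of v, stopping just before idom(v); for up-edges, Lemma 1 guarantees the walk terminates. A sketch under the usual dictionary conventions (the predecessor and idom tables are a hypothetical example):

```python
def df_inverse(v, preds, idom):
    """DF^{-1}(v) = union, over incoming edges u -> v, of the dominator-tree
    interval [u, idom(v)) (Theorem 7)."""
    target = idom[v]
    result = set()
    for u in preds.get(v, ()):
        w = u
        while w != target:     # walk [u, idom(v)); empty when u = idom(v)
            result.add(w)
            w = idom[w]
    return result

# Hypothetical one-loop CFG: START -> a -> b -> {c, d}, back edge c -> b.
preds = {'a': ['START'], 'b': ['a', 'c'], 'c': ['b'], 'd': ['b']}
idom = {'a': 'START', 'b': 'a', 'c': 'b', 'd': 'b'}
```

Here DF^{-1}(b) = {c, b}: the interval walked for the up-edge c -> b contains both c and the loop header itself.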
The next result [Cytron et al. 1991] provides a recursive characterization of
DF(w) in terms of the DF sets of the children of w in the dominator tree.
There is a striking analogy with the expression for M(w) in Theorem 3. However,
the dependence of the DF expression on the dominator-tree children (rather
than on the CFG successors needed for M) is a great simplification, since it enables
solution in a single pass, made according to any bottom-up ordering of the
dominator tree:

    DF(w) = α-DF(w) ∪ (∪_{c∈children(w)} DF(c) − children(w)).
PROOF
(⊆) We show that, if v ∈DF(w), then v is contained in the set described by the
right-hand side expression. Applying Definition 6, we see that there must be an
edge (u → v) such that w dominates u but does not strictly dominate v. There are
two cases to consider:
(⊇) We show that if v is contained in the set described by the right-hand side
expression, then v ∈ DF(w). There are two cases to consider.
(1) If v ∈ α−DF(w), there is a CFG edge (w → v) such that w does not strictly
dominate v. Applying Definition 6 with u = w, we see that v ∈ DF(w).
(2) If v ∈ (∪c∈children(w) DF(c) − children(w)), there is a child c of w and an edge
(u → v) such that (i) c dominates u, (ii) c does not strictly dominate v, and (iii)
v is not a child of w. From (i) and the fact that w is the parent of c, it follows
that w dominates u.
Furthermore, if w were to strictly dominate v, then either (a) v would be a
child of w, or (b) v would be a proper descendant of some child of w. Possibility
(a) is ruled out by fact (iii). Fact (ii) means that v cannot be a proper descendant
of c. Finally, if v were a proper descendant of some child l of w other than c,
then idom(v) would not dominate u, which contradicts Lemma 1. Therefore,
w cannot strictly dominate v. This means that v ∈ DF(w), as required.
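The characterization just proved is what the node-scan algorithm of Cytron et al. evaluates in a single bottom-up pass over the dominator tree. A sketch (recursive postorder derived from the idom dictionary; the example graph is our own, not the paper's):

```python
def df_sets_bottom_up(nodes, idom, alpha_df):
    """DF(w) = alpha-DF(w) U ( U_{c in children(w)} DF(c) - children(w) ),
    evaluated in postorder over the dominator tree."""
    children = {w: set() for w in nodes}
    for v, d in idom.items():
        children[d].add(v)
    DF = {}

    def postorder(w):
        for c in children[w]:
            postorder(c)
        from_children = set()
        for c in children[w]:
            from_children |= DF[c]
        DF[w] = set(alpha_df.get(w, ())) | (from_children - children[w])

    root = next(w for w in nodes if w not in idom)   # START
    postorder(root)
    return DF

# Hypothetical one-loop CFG: START -> a -> b -> {c, d}, back edge c -> b.
nodes = {'START', 'a', 'b', 'c', 'd'}
idom = {'a': 'START', 'b': 'a', 'c': 'b', 'd': 'b'}
alpha_df = {'c': {'b'}}          # the only up-edge is c -> b
DF = df_sets_bottom_up(nodes, idom, alpha_df)
```

Each DF set is assembled from already-final children sets, which is exactly why a single bottom-up pass suffices.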
3.3. STRONGLY CONNECTED COMPONENTS OF THE DF AND M GRAPHS. There
is an immediate and important consequence of Theorem 7, which is useful in proving
many results about the DF and M relations. The level of a node in the dominator
tree can be defined in the usual way: the root has a level of 0; the level of any
other node is 1 more than the level of its parent. From Theorem 7, it follows that
if (w, v) ∈ DF, then there is an edge (u → v) ∈ E such that w ∈ [u, idom(v));
therefore, level (w) ≥ level (v). Intuitively, this means that DF (and M) edges are
oriented in a special way with respect to the dominator tree: a DF or M edge
overlayed on the dominator tree is always directed “upwards” or “sideways” in
this tree, as can be seen in Figure 4. Furthermore, if (w, v) ∈ DF, then idom(v)
dominates w (this is a special case of Lemma 1). For future reference, we state
these facts explicitly.
LEMMA 2. Given a CFG = (V, E) and its dominator tree D, let level (v) be the
length of the shortest path in D from START to v. If (w, v) ∈ DF, then level (w) ≥
level (v) and idom(v) dominates w. In particular, if level(w) = level (v), then w
and v are siblings in D.
This result leads to an important property of strongly connected components
(scc’s) in the DF graph. If x and y are two nodes in the same scc, every node
reachable from x is reachable from y and vice-versa; furthermore, if x is reachable
from a node, y is reachable from that node too, and vice-versa. In terms of the
M relation, this means that M(x) = M(y) and M −1 (x) = M −1 (y). The following
lemma states that the scc’s have a special structure with respect to the dominator tree.
LEMMA 3. Given a CFG = (V, E) and its dominator tree D, all nodes in a
strongly connected component of the DF relation (equivalently, the M relation) of
this graph are siblings in D.
PROOF. Consider any cycle n 1 → n 2 → n 3 → · · · → n 1 in the scc. From
Lemma 2, it follows that level (n 1 ) ≥ level (n 2 ) ≥ level (n 3 ) ≥ · · · ≥ level (n 1 );
therefore, it must be true that level (n 1 ) = level (n 2 ) = level (n 3 ) · · · . From
Lemma 2, it also follows that n 1 , n 2 , etc. must be siblings in D.
In Section 5, we show how the strongly connected components of the DF graph
of a CFG (V, E) can be identified in O(|E|) time.
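The O(|E|) identification promised for Section 5 exploits the sibling structure above; any linear-time SCC algorithm also works on the DF graph. A generic sketch using Kosaraju's two-pass method (this is a standard technique, not the paper's Section 5 algorithm; the DF graph below is hypothetical):

```python
def sccs(graph):
    """Strongly connected components of a directed graph (Kosaraju's
    two-pass algorithm).  graph maps a node to an iterable of successors."""
    nodes = set(graph) | {v for vs in graph.values() for v in vs}
    succ = {n: set(graph.get(n, ())) for n in nodes}
    pred = {n: set() for n in nodes}
    for u in nodes:
        for v in succ[u]:
            pred[v].add(u)
    # First pass: record nodes in order of DFS completion.
    order, seen = [], set()
    def finish(u):
        seen.add(u)
        for v in succ[u]:
            if v not in seen:
                finish(v)
        order.append(u)
    for u in nodes:
        if u not in seen:
            finish(u)
    # Second pass: sweep the reverse graph in decreasing finish order;
    # each sweep that starts fresh collects exactly one component.
    comps, assigned = [], set()
    for u in reversed(order):
        if u in assigned:
            continue
        comp, stack = set(), [u]
        while stack:
            x = stack.pop()
            if x in assigned:
                continue
            assigned.add(x)
            comp.add(x)
            stack.extend(pred[x])
        comps.append(comp)
    return comps

# Hypothetical DF graph in which nodes b and c form one nontrivial scc:
DF = {'b': {'c'}, 'c': {'b'}, 'd': {'b'}}
components = sccs(DF)
```

By Lemma 3, every component found this way on a real DF graph consists of siblings in the dominator tree.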
PROOF
(⇒) Assume G is irreducible. Then, G has a cycle C on which no node dominates
all other nodes on C. Therefore, there must be two nodes a and b for which neither
idom(a) nor idom(b) is contained in C. Cycle C obviously contains two paths
P1 = a →+ b and P2 = b →+ a. Since C does not contain idom(b), neither does P1,
which is therefore an M-path, implying that b ∈ M(a). Symmetrically, a ∈ M(b).
Therefore, there is a nontrivial cycle containing nodes a and b in the M graph.
(⇐) Assume the M graph has a nontrivial cycle. Let a and b be any two nodes on
this cycle. From Lemma 3, idom(a) = idom(b). By Theorem 2, there are nontrivial
CFG paths P1 = a →+ b, which does not contain idom(b) (equivalently, idom(a)),
and P2 = b →+ a, which does not contain idom(a) (equivalently, idom(b)). Therefore,
the concatenation C = P1 P2 is a CFG cycle containing a and b but not containing
idom(a) or idom(b). Clearly, no node in C dominates all other nodes, so that CFG
G is irreducible.
It can also be easily seen that the absence from M of self loops (which implies
the absence of nontrivial cycles) characterizes acyclic programs.
4. Two-Phase Algorithms
Two-phase algorithms compute the entire DF graph G DF = f (G) in a preprocessing
phase before doing reachability computations J (S) = g(S, G DF ) to answer queries.
4.1. EDGE SCAN ALGORITHM. The edge scan algorithm (Figure 9) is essen-
tially a direct translation of the expression for DF given by Theorem 7. A little
care is required to achieve the time complexity of T p = O(|V | + |DF|) given in
Proposition 8. Let v be the destination of a number of up-edges (say u 1 → v,
u 2 → v, . . . ). A naive algorithm would first visit all the nodes in the interval
[u 1 , idom(v)) adding v to the DF set of each node in this interval, then visit all
nodes in the interval [u 2 , idom(v)) adding v to the DF sets of each node in this
interval, etc. However, these intervals in general are not disjoint; if l is the least
common ancestor of u 1 , u 2 , . . . , nodes in the interval [l, idom(v)) will in general
be visited once for each up-edge terminating at v, but only the first visit would do
useful work. To make the preprocessing time proportional to the size of the DF sets,
all up-edges that terminate at a given CFG node v are considered together. The DF
sets at each node are maintained essentially as a stack in the sense that the first node
of an (ordered) DF set is the one that was added most recently. The traversal of the
nodes in the interval [u k , idom(v)) checks each node to see if v is already in the DF
set of that node by examining the first element of that DF set in constant time; if
that element is v, the traversal is terminated.
Once the DF relation is constructed, procedure φ-placement is executed for each
variable Z to determine, given the set S where Z is assigned, all nodes where
φ-functions for Z are to be placed.
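The interval walk and its early termination can be sketched as follows. This is an illustrative sketch, not the paper's Figure 9: the CFG (nodes a–d with a single loop between b and c), the dict representation, and all identifiers are assumptions. For every CFG edge u → v that is not a dominator-tree edge, v is added to DF(w) for each w on the dominator-tree path [u, idom(v)); because each DF set is kept as a stack (most recent element last), a walk for v stops in constant time at the first node whose most recent DF entry is already v.

```python
def edge_scan_df(nodes, idom, edges):
    """Edge-scan DF construction: walk [u, idom(v)) for each up-edge u -> v."""
    df = {n: [] for n in nodes}
    for u, v in edges:
        w = u
        while w != idom[v]:            # walk the interval [u, idom(v))
            if df[w] and df[w][-1] == v:
                break                  # interval already processed for v
            df[w].append(v)
            w = idom[w]
    return df

def phi_placement(df, S):
    """J(S): nodes reachable from S by nonempty paths in the DF graph."""
    work, out = list(S), set()
    while work:
        n = work.pop()
        for v in df[n]:
            if v not in out:
                out.add(v)
                work.append(v)
    return out

# Illustrative CFG: a -> b, b -> c, c -> b (loop), b -> d.
idom = {'a': None, 'b': 'a', 'c': 'b', 'd': 'b'}
edges = [('a', 'b'), ('b', 'c'), ('c', 'b'), ('b', 'd')]
df = edge_scan_df(idom.keys(), idom, edges)
assert df == {'a': [], 'b': ['b'], 'c': ['b'], 'd': []}
assert phi_placement(df, {'c'}) == {'b'}   # b is the merge point of the loop
```

Iterating over all edges (tree edges fall through the loop immediately) has the same effect as grouping up-edges by destination, since the stack check terminates repeated walks for the same target.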
PROPOSITION 8. The edge scan algorithm for SSA in Figure 9 has preprocessing
time T p = O(|V | + |DF|), preprocessing space S p = O(|V | + |DF|), and query
time Tq = O(Σ v∈(S∪M(S)) |DF(v)|).
PROOF. In the preprocessing stage, time O(|V | + |E|) is spent to visit the CFG,
and additional constant time is spent for each of the |DF| entries of (V, DF), for a
total preprocessing time T p = O(|V | + |E| + |DF|) as described above. The term
|E| can be dropped from the last expression since |E| = |E tree |+|E up | ≤ |V |+|DF|.
The preprocessing space is that needed to store (V, DF). Query is performed by
procedure φ-placement of Figure 9. Query time is proportional to the size of the
portion of (V, DF) reachable from S.
4.2. NODE SCAN ALGORITHM. The node scan algorithm (Figure 9) scans the
nodes according to a bottom-up walk in the dominator tree and constructs the
entire set DF(w) when visiting w, following the approach in Theorem 8. The DF
sets can be represented, for example, as linked lists of nodes; then, union and
difference operations can be done in time proportional to the size of the operand
sets, exploiting the fact that they are subsets of V . Specifically, we make use of
an auxiliary Boolean array B, indexed by the elements of V and initialized to 0.
To obtain the union of two or more sets, we scan the corresponding lists. When
a node v is first encountered (B[v] = 0), it is added to the output list and then
B[v] is set to 1. Further occurrences of v are then detected (B[v] = 1) and are not
appended to the output. Finally, for each v in the output list, B[v] is reset to 0, to
leave B properly initialized for further operations. Set difference can be handled
by similar techniques.
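The Boolean-array technique can be sketched directly; this is an illustrative fragment (node names assumed to be integers in 0..|V|−1), showing union and difference in time linear in the operand sizes, with the array restored to all-False afterwards so it can be reused.

```python
def union_lists(lists, B):
    """Union of node lists using auxiliary Boolean array B (all False)."""
    out = []
    for lst in lists:
        for v in lst:
            if not B[v]:        # first occurrence of v
                B[v] = True
                out.append(v)
    for v in out:               # restore the all-False invariant
        B[v] = False
    return out

def difference_list(xs, ys, B):
    """Nodes of xs not in ys, by the same marking technique."""
    for v in ys:
        B[v] = True
    out = [v for v in xs if not B[v]]
    for v in ys:
        B[v] = False
    return out

B = [False] * 8
assert union_lists([[1, 3, 5], [3, 5, 7]], B) == [1, 3, 5, 7]
assert difference_list([1, 3, 5, 7], [3, 7], B) == [1, 5]
assert not any(B)               # B left clean for further operations
```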
PROPOSITION 9. The node scan algorithm for SSA in Figure 9 has preprocessing
time T p = O(|V | + |DF|), preprocessing space S p = O(|V | + |DF|), and query
time Tq = O(Σ v∈(S∪M(S)) |DF(v)|).
PROOF. Time O(|V | + |E|) is required to walk over CFG edges and compute
the α-DF sets for all nodes. In the bottom-up walk, the work performed at node
w is bounded as follows:
work(w) ∝ |α(w)| + Σ c∈children(w) |DF(c)| + |children(w)|.
Therefore, the total work for preprocessing is bounded by O(|V | + |E| + |DF|)
which, as before, is O(|V | + |DF|). The preprocessing space is the space needed
to store (V, DF). Query time is proportional to the size of the subgraph of (V, DF)
that is reachable from S.
4.3. DISCUSSION. Node scan is similar to the algorithm given by Cytron et al.
[1991]. As we can see from Propositions 8 and 9, the performance of two-phase
algorithms is very sensitive to the size of the DF relation. We have seen in Section 3
that the size of the DF graph can be much larger than that of the CFG. However, real
programs often have shallow dominator trees; hence, their DF graph is comparable
in size to the CFG; thus, two-phase algorithms may be quite efficient.
5. Lock-Step Algorithms
In this section, we describe two lock-step algorithms that visit all the nodes of the
CFG but compute only a subgraph G′DF = f ′(G, S) of the DF graph that is sufficient
to determine J (S) = g′(S, G′DF ). Specifically, the set of nodes reachable by nonempty
paths that start at a node in S is the same in G′DF as in G DF . The f ′ and g′
computations are interleaved: when a node v is reached through the portion of the
DF graph already built, there is no further need to examine other DF edges pointing to v.
The set DF+ (S) of nodes reachable from an input set S via nonempty paths can
be computed efficiently in an acyclic DF graph, by processing nodes in topological
order. At each step, a pulling algorithm would add the current node to DF+ (S) if
any of its predecessors in the DF graph belongs to S or has already been reached,
that is, already inserted in DF+ (S). A pushing algorithm would add the successors
of the current node to DF+ (S) if that node belongs to S or has already been reached.
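Both schemes can be sketched in a few lines; the DAG below (a diamond) and the dict representation are illustrative assumptions. Since `order` is a topological ordering, every DF predecessor of a node is processed before the node itself, so one pass suffices.

```python
def pull(order, preds, S):
    """Pulling: a node joins DF+(S) if some DF predecessor is in S or reached."""
    reached = set()
    for v in order:
        if any(p in S or p in reached for p in preds.get(v, [])):
            reached.add(v)
    return reached

def push(order, succs, S):
    """Pushing: a node in S or already reached marks all its DF successors."""
    reached = set()
    for v in order:
        if v in S or v in reached:
            reached.update(succs.get(v, []))
    return reached

# Illustrative acyclic DF graph: a -> b, b -> c, b -> d.
preds = {'b': ['a'], 'c': ['b'], 'd': ['b']}
succs = {'a': ['b'], 'b': ['c', 'd']}
order = ['a', 'b', 'c', 'd']
assert pull(order, preds, {'a'}) == push(order, succs, {'a'}) == {'b', 'c', 'd'}
```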
The class of programs with an acyclic DF graph is quite extensive since it is
identical to the class of reducible programs (Proposition 7). However, irreducible
programs have DF graphs with nontrivial cycles, such as the one between nodes
b and c in Figure 4(e). A graph with cycles can be conveniently preprocessed by
collapsing into a “supernode” all nodes in the same strongly connected component,
as they are equivalent as far as reachability is concerned [Cormen et al. 1992]. We
show in Section 5.1 that it is possible to exploit Lemma 3 to compute a topological
ordering of (the acyclic condensate of) the DF graph in O(|E|) time, directly from
the CFG, without actually constructing the DF graph. This ordering is exploited
by the pulling and the pushing algorithms presented in subsequent subsections.
5.1. TOPOLOGICAL SORTING OF THE DF AND M GRAPHS. It is convenient to
introduce the M-reduced CFG, obtained from a CFG G by collapsing nodes that
are part of the same scc in the M graph of G. Figure 10 shows the M-reduced CFG
corresponding to the CFG of Figure 4(a). The only nontrivial scc in the M graph
(equivalently, in the DF graph) of the CFG in Figure 4(a) contains nodes b and c,
and these are collapsed into a single node named bc in the M-reduced graph. The
dominator tree for the M-reduced graph can be obtained by collapsing these nodes
in the dominator tree of the original CFG.
Definition 8. Given a CFG G = (V, E), the corresponding M-reduced CFG
is the graph G̃ = (Ṽ , Ẽ) where Ṽ is the set of strongly connected components of
M, and (a → b) ∈ Ẽ if and only if there is an edge (u → v) ∈ E such that u ∈ a
and v ∈ b.
Without loss of generality, the φ-placement problem can be solved on the reduced
CFG. In fact, if M̃ denotes the merge relation in G̃, and w̃ ∈ Ṽ denotes the
component to which w belongs, then M(w) = ∪x̃∈ M̃(w̃) x̃ is the union of all the
scc’s x̃ reachable via M̃-paths from the scc w̃ containing w. The key observation
permitting the efficient computation of scc’s in the DF graph is Lemma 3, which
states that all the nodes in a single scc of the DF graph are siblings in the dominator
tree. Therefore, to determine scc’s, it is sufficient to consider the subset of the DF
graph, called the ω-DF graph, that is defined next.
Definition 9. The ω-DF relation of a CFG is the subrelation of its DF relation
that contains only those pairs (w, v) for which w and v are siblings in the dominator
tree of that CFG.
Figure 4(f) shows the ω-DF graph for the running example. Figure 11 shows an
algorithm for computing this graph.
6: Procedure Visit(u);
7: Push u on Stack;
8: for each edge e = (u → v) ∈ E do
9: if u ≠ idom(v) then
10: let c = node pushed after idom(v) on Stack;
11: Append edge c → v to DFω ;
12: endif
13: od
14: for each child d of u do
15: Visit(d); od
16: Pop u from Stack;
}
LEMMA 4. The ω-DF graph for CFG G = (V, E) is constructed in O(|E|) time
by the algorithm in Figure 11.
PROOF. From Theorem 7, we see that each CFG up-edge generates one edge
in the ω-DF graph. Therefore, for each CFG up-edge u → v, we must identify
the child c of idom(v) that is an ancestor of u, and introduce the edge (c → v) in
the ω-DF graph. To do this in constant time per edge, we build the ω-DF graph
while performing a depth-first walk of the dominator tree, as shown in Figure 11.
This walk maintains a stack of nodes; a node is pushed on the stack when it is first
encountered by the walk, and is popped from the stack when it is exited by the
walk for the last time. When the walk reaches a node u, we examine all up-edges
u → v; the child of idom(v) that is an ancestor of u is simply the node pushed after
idom(v) on the node stack.
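The stack-based walk of the proof can be sketched as follows; the CFG (a single loop between b and c), the dict representation, and the recording of each node's stack position in `pos` are illustrative assumptions, standing in for the pseudocode of Figure 11.

```python
def omega_df(root, children, idom, cfg_succ):
    """Build the omega-DF edge list during one DFS of the dominator tree."""
    stack, pos, omega = [], {}, []
    def visit(u):
        pos[u] = len(stack)
        stack.append(u)
        for v in cfg_succ.get(u, []):
            if u != idom[v]:                   # up-edge u -> v
                # child of idom(v) that is an ancestor of u: the node
                # pushed right after idom(v) on the stack
                c = stack[pos[idom[v]] + 1]
                omega.append((c, v))
        for d in children.get(u, []):
            visit(d)
        stack.pop()
    visit(root)
    return omega

# Illustrative CFG: a -> b, b -> c, c -> b, b -> d.
children = {'a': ['b'], 'b': ['c', 'd']}
idom = {'b': 'a', 'c': 'b', 'd': 'b'}
cfg_succ = {'a': ['b'], 'b': ['c', 'd'], 'c': ['b']}
assert omega_df('a', children, idom, cfg_succ) == [('b', 'b')]
```

The single edge found is the self-loop at b, reflecting b ∈ DF(b): the up-edge c → b projects onto b, the child of idom(b) that is an ancestor of c.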
PROPOSITION 10. Given the CFG G = (V, E), its M-reduced version G̃ =
(Ṽ , Ẽ) can be constructed in time O(|V | + |E|).
PROOF. The steps involved are the following, each taking linear time:
(1) Construct the dominator tree [Buchsbaum et al. 1998].
(2) Construct the ω-DF graph (V, DFω ) as shown in Figure 11.
(3) Compute strongly connected components of (V, DFω ) [Cormen et al. 1992].
(4) Collapse each scc into one vertex and eliminate duplicate edges.
It is easy to see that the dominator tree of the M-reduced CFG can be obtained by
collapsing the scc’s of the ω-DF graph in the dominator tree of the original CFG.
For the CFG in Figure 4(a), the only nontrivial scc in the ω-DF graph is {b, c}, as
is seen in Figure 4(f). By collapsing this scc, we get the M-reduced CFG and its
dominator tree shown in Figures 10(a) and 10(b).
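Step (4) of the construction can be sketched once an scc partition of the ω-DF graph is in hand (any standard linear-time scc algorithm suffices). The partition and edge list below are illustrative; following the literal reading of Definition 8, intra-component CFG edges become self-loops at the supernode, and duplicate edges disappear in the set representation.

```python
def collapse(edges, scc_of):
    """Map each CFG edge onto supernodes; a set removes duplicates."""
    return {(scc_of[u], scc_of[v]) for u, v in edges}

# Illustrative partition: b and c share an scc, collapsed to supernode 'bc'.
scc_of = {'a': 'a', 'b': 'bc', 'c': 'bc', 'd': 'd'}
cfg_edges = [('a', 'b'), ('b', 'c'), ('c', 'b'), ('b', 'd')]
assert collapse(cfg_edges, scc_of) == {('a', 'bc'), ('bc', 'bc'), ('bc', 'd')}
```

The dominator tree of the reduced CFG is obtained the same way, by mapping each tree edge through `scc_of` and discarding the resulting self-loops.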
It remains to compute a topological sort of the DF graph of the M-reduced
CFG (without building the DF graph explicitly). Intuitively, this is accomplished
by topologically sorting the children of each node according to the ω-DF graph of
the M-reduced CFG and concatenating these sets in some bottom-up order such as
post-order in the dominator tree. We can describe this more formally as follows:
Definition 10. Given a M-reduced CFG G = (V, E), let the children of each
node in the dominator tree be ordered left to right according to a topological sorting
of the ω-DF graph. A postorder visit of the dominator tree is said to yield an
ω-ordering of G.
The ω-DF graph of the M-reduced CFG of the running example is shown in
Figure 10(d). Note that the children of each node in the dominator tree are ordered
so that the left-to-right ordering of the children of each node is consistent with a topo-
logical sorting of these nodes in the ω-DF graph. In particular, node bc is ordered
before its sibling f . The postorder visit yields the sequence < d, e, bc, h, g, f, a >,
which is a topological sort of the acyclic condensate of the DF graph of the original
CFG in Figure 4(a).
THEOREM 9. An ω-ordering of an M-reduced CFG G = (V, E) is a topolog-
ical sorting of the corresponding dominance frontier graph (V, DF) and merge
graph (V, M) and it can be computed in time O(|E|).
PROOF. Consider an edge (w → v) ∈ DF. We want to show that, in the
ω-ordering, w precedes v.
From Theorem 7, it follows that there is a sibling s of v such that (i) s is an
ancestor of w and (ii) there is an edge (s → v) in the DF (and ω-DF) graph. Since
the ω-ordering is generated by a postorder walk of the dominator tree, w precedes
s in this order; furthermore, s precedes v because an ω-ordering is a topological
sorting of the ω-DF graph. Since M = DF+ , an ω-ordering is a topological sorting
of the merge graphs as well. The time bound follows from Lemma 4, Proposition 10,
Definition 10, and the fact that a postorder visit of a tree takes linear time.
From Proposition 7, it follows that for reducible CFGs, there is no need to
determine the scc’s of the ω-DF graph in order to compute ω-orderings.
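An ω-ordering per Definition 10 can be sketched by combining a per-sibling-set topological sort (Kahn's algorithm here) with a postorder walk. The diamond CFG below and all names are illustrative, not the paper's running example.

```python
def topo(nodes, edges):
    """Kahn's algorithm, restricted to the given sibling set."""
    succ = {n: [] for n in nodes}
    indeg = {n: 0 for n in nodes}
    for u, v in edges:
        if u in indeg and v in indeg and u != v:
            succ[u].append(v)
            indeg[v] += 1
    order = []
    ready = [n for n in nodes if indeg[n] == 0]
    while ready:
        n = ready.pop()
        order.append(n)
        for m in succ[n]:
            indeg[m] -= 1
            if indeg[m] == 0:
                ready.append(m)
    return order

def omega_ordering(root, children, omega_edges):
    """Postorder walk with children sorted topologically by omega-DF edges."""
    out = []
    def post(u):
        for c in topo(children.get(u, []), omega_edges):
            post(c)
        out.append(u)
    post(root)
    return out

# Diamond CFG a -> {b, c}, {b, c} -> d, d -> e; omega-DF edges (b,d), (c,d).
children = {'a': ['b', 'c', 'd'], 'd': ['e']}
order = omega_ordering('a', children, [('b', 'd'), ('c', 'd')])
assert order == ['c', 'b', 'e', 'd', 'a']
assert order.index('b') < order.index('d')   # respects DF edge b -> d
```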
5.1.1. An Application to Weak Control Dependence. In this section, we take a
short detour to illustrate the power of the techniques just developed by applying
these techniques to the computation of weak control dependence. This relation,
introduced in [Podgurski and Clarke 1990], extends standard control dependence
to include nonterminating program executions. We have shown in [Bilardi and
Pingali 1996] that, in this context, the standard notion of postdominance must be
replaced with the notion of loop postdominance. Furthermore, loop postdominance
is transitive and its transitive reduction is a forest that can be obtained from the
postdominator tree by disconnecting each node in a suitable set B from its parent.
As it turns out, B = J (K ∪ {START}), where K is the set of self-loops of the merge
relation of the reverse CFG, which are called the crowns. The following proposition
is concerned with the efficient computation of the self-loops of M.
PROPOSITION 11. The self-loops of the M-graph for CFG G = (V, E) can be
found in O(|V | + |E|) time.
PROOF. It is easy to see that there is a self-loop for M at a node w ∈ V if
and only if there is a self-loop at w̃ (the scc containing w) in the M-reduced graph
G̃ = (Ṽ , Ẽ). By Proposition 10, G̃ can be constructed in time O(|V | + |E|) and
its self-loops can be easily identified in the same amount of time.
When applied to the reverse CFG, Proposition 11 yields the set of crowns K .
Then, J (K ∪ {START}) can be obtained from K ∪ {START} by using any of the
φ-placement algorithms presented in this article, several of which also run in time
O(|V | + |E|). In conclusion, the loop postdominance forest can be obtained from
the postdominator tree in time proportional to the size of the CFG. As shown in
[Bilardi and Pingali 1996], once the loop postdominance forest is available, weak
control dependence sets can be computed optimally by the algorithms of [Pingali
and Bilardi 1997].
In the remainder of this section, we assume that the CFG is M-reduced.
5.2. PULLING ALGORITHM. The pulling algorithm (Figure 12) is a variation of
the edge scan algorithm of Section 4.1. A bit-map representation is kept for the
input set S and for the output set J (S) = DF+ (S), which is built incrementally.
We process nodes in ω-ordering and maintain, for each node u, an off/on binary
tag, initially off and turned on when processing the first dominator of u that is in
S ∪ DF+ (S), denoted w u . Specifically, when a node v is processed, either because it
belongs to S or because it is found to belong to DF+ (S), a top-down walk of the dominator subtree
rooted at v is performed turning on all visited nodes. If we visit a node x already
turned on, clearly the subtree rooted at x must already be entirely on, making it
unnecessary to visit that subtree again. Therefore, the overall overhead to maintain
the off/on tags is O(|V |).
To determine whether to add a node v to DF+ (S), each up-edge u → v incoming
into v is examined: if u is turned on, then v is added and its processing can stop.
Let TurnOn(D,w u ) be the call that has switched u on. Clearly, w u belongs to
the set [u, idom(v)) of the ancestors of u that precede v in ω-ordering which, by
Theorem 7, is a subset of DF−1 (v). Hence, v is correctly added to DF+ (S) if and
only if one of its DF predecessors (w u ) is in S ∪ DF+ (S). Such a predecessor could
be v itself, if v ∈ S and there is a self-loop at v; for this reason, when v ∈ S, the
call TurnOn(D,v) (Line 4) is made before processing the incoming edges. Clearly,
the overall work to examine and process the up-edges is O(|E up |) = O(|E|). In
summary, we have:
PROPOSITION 12. The pulling algorithm for SSA of Figure 12 has preprocessing
time T p = O(|V | + |E|), preprocessing space S p = O(|V | + |E|), and query time
Tq = O(|V | + |E|).
Which subgraph G′DF = f ′(G, S) of the DF graph gets (implicitly) built by the
pulling algorithm? The answer is that, for each v ∈ DF+ (S), G′DF contains edge
(w u → v), where u is the first predecessor in the CFG adjacency list of node v that
has been turned on when v is processed, and w u is the ancestor that turned it on.
As a corollary, G′DF contains exactly |DF+ (S)| edges.
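A condensed sketch of the pulling algorithm follows; the CFG (single loop between b and c), the precomputed ω-ordering, and all names are illustrative assumptions. `turn_on` stops at already-on nodes, so tag maintenance is O(|V|) overall, and calling it for nodes of S before scanning their incoming edges covers DF self-loops, as described above.

```python
def pulling(order, cfg_pred, idom, children, S):
    """Pulling phi-placement: order must be an omega-ordering."""
    on = {v: False for v in order}
    def turn_on(x):
        if not on[x]:
            on[x] = True
            for c in children.get(x, []):
                turn_on(c)
    J = set()
    for v in order:
        if v in S:
            turn_on(v)                       # before edges: covers self-loops
        for u in cfg_pred.get(v, []):
            if u != idom[v] and on[u]:       # up-edge from a turned-on node
                J.add(v)
                turn_on(v)
                break
    return J

# Illustrative CFG a -> b, b -> c, c -> b, b -> d; omega-order c, d, b, a.
order = ['c', 'd', 'b', 'a']
cfg_pred = {'b': ['a', 'c'], 'c': ['b'], 'd': ['b']}
idom = {'a': None, 'b': 'a', 'c': 'b', 'd': 'b'}
children = {'a': ['b'], 'b': ['c', 'd']}
assert pulling(order, cfg_pred, idom, children, {'c'}) == {'b'}
```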
5.3. PUSHING ALGORITHM. The pushing algorithm (Figure 13) is a variation
of the node scan algorithm in Section 4.2. It processes nodes in ω-ordering and
builds DF+ (S) incrementally; when a node w ∈ S ∪ DF+ (S) is processed, nodes in
DF(w) that are not already in set DF+ (S) are added to it. A set PDF(S, w), called
the pseudo-dominance frontier, is constructed with the property that any node in
DF(w) − PDF(S, w) has already been added to DF+ (S) by the time w is processed.
Hence, it is sufficient to add to DF+ (S) the nodes in PDF(S, w) ∩ DF(w), which
are characterized by being after w in the ω-ordering. Specifically, PDF(S, w) is
defined (and computed) as the union of α-DF(w) with the PDFs of those children
of w that are not in S ∪ DF+ (S).
It is efficient to represent each PDF set as a singly linked list with a header
that has a pointer to the start and one to the end of the list, enabling constant-time
concatenations. The union at Line 7 of procedure Pushing is implemented as list
concatenation, hence in constant time per child for a global O(|V |) contribution. The
resulting list may have several entries for a given node, but each entry corresponds
to a unique up-edge pointing at that node. If w ∈ S ∪ DF+ (S), then each node v in
the list is examined and possibly added to DF+ (S). Examination of each list entry
takes constant time. Once examined, a list no longer contributes to the PDF set of
any ancestor; hence, the global work to examine lists is O(|E|). In conclusion, the
complexity bounds are as follows:
PROPOSITION 13. The pushing algorithm for φ-placement of Figure 13 is cor-
rect and has preprocessing time T p = O(|V | + |E|), preprocessing space S p =
O(|V | + |E|), and query time Tq = O(|V | + |E|).
PROOF. Theorem 8 implies that a node in the set PDF(S, w) computed in Line 7
either belongs to DF(w) or is dominated by w. Therefore, every node that is added
to DF+ (S) by Line 10 belongs to DF+ (S) (since w <ω v implies that v is not dominated
by w). We must also show that every node in DF+ (S) gets added by this procedure.
We proceed by induction on the length of the ω-ordering. The first node in such an
ordering must be a leaf and, for a leaf w, PDF(S, w) = DF(w). Assume inductively
that for all nodes n before w in the ω-ordering, those in DF(n) − PDF(S, n) are
added. Since all the children of w precede it in the ω-ordering, it is easy to see that
all nodes in DF(w) − PDF(S, w) have already been added by the time w is visited,
completing the induction.
The DF subgraph G′DF = f ′(G, S) implicitly built by the pushing algorithm
contains, for each v ∈ DF+ (S), the DF edge (w → v) where w is the first node
of DF−1 (v) ∩ (S ∪ DF+ (S)) occurring in ω-ordering. In general, this is a different
subgraph from the one built by the pulling algorithm, except when the latter
works on a CFG representation where the predecessors of each node are listed in
ω-ordering.
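A condensed sketch of the pushing scheme, on the same illustrative loop CFG used throughout; all names are assumptions. Python list `+` stands in for the constant-time concatenation of the header-linked lists described in the text, and the test `pos[v] >= pos[w]` keeps nodes at or after w in ω-ordering (equality covering a DF self-loop at w).

```python
def pushing(order, children, alpha_df, S):
    """Pushing phi-placement: order must be an omega-ordering."""
    pos = {v: i for i, v in enumerate(order)}
    in_plus = set()                          # DF+(S), built incrementally
    pdf = {}                                 # pseudo-dominance frontiers
    for w in order:
        lst = list(alpha_df.get(w, []))
        for c in children.get(w, []):
            if c not in S and c not in in_plus:
                lst = lst + pdf[c]           # child's PDF is still pending
        pdf[w] = lst
        if w in S or w in in_plus:
            for v in lst:
                if pos[v] >= pos[w]:         # v in DF(w): at/after w
                    in_plus.add(v)
    return in_plus

# CFG a -> b, b -> c, c -> b, b -> d; the only up-edge is c -> b.
order = ['c', 'd', 'b', 'a']
children = {'a': ['b'], 'b': ['c', 'd']}
alpha_df = {'c': ['b']}
assert pushing(order, children, alpha_df, {'c'}) == {'b'}
assert pushing(order, children, alpha_df, {'b'}) == {'b'}
```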
5.4. DISCUSSION. The ω-DF graph was introduced in [Bilardi and Pingali 1996]
under the name of sibling connectivity graph to solve the problem of optimal
computation of weak control dependence [Podgurski and Clarke 1990].
The pulling algorithm can be viewed as an efficient version of the reachability
algorithm of Figure 7. At any node v, the reachability algorithm visits all nodes that
are reachable from v in the reverse CFG along paths that do not contain idom(v),
while the pulling algorithm visits all nodes that are reachable from v in the reverse
CFG along a single edge that does not contain (i.e., originate from) idom(v). The
pulling algorithm achieves efficiency by processing nodes in ω-order, which ensures
that information relevant to v can be found by traversing single edges rather than
entire paths. It is the simplest φ-placement algorithm that achieves linear worst-case
bounds for all three measures T p , S p and Tq .
For the pushing algorithm, the computation of the M-reduced graph can be
eliminated and nodes can simply be considered in bottom-up order in the dominator
tree, at the cost of having to revisit a node if it gets marked after it has been visited
for computing its PDF set.
6. Lazy Algorithms
A drawback of lock-step algorithms is that they visit all the nodes in the CFG,
including those that are not in M(S). In this section, we discuss algorithms that
compute sets EDF(w) lazily, that is, only if w belongs to M(S), potentially saving
the effort to process irrelevant parts of the DF graph. Lazy algorithms have the
same the asymptotic complexity as lock-step algorithms, but outperform them in
practice (Section 7).
We first discuss a lazy algorithm that is optimal for computing EDF sets, based on
the approach of [Pingali and Bilardi 1995, 1997] to compute the control dependence
relation of a CFG. Then, we apply these results to φ-placement. The lazy algorithm
works for arbitrary CFGs (i.e., M-reduction is not necessary).
Procedure TopDownEDF(QueryNode);
{
1: EDF = {};
2: Visit(QueryNode, QueryNode);
3: return EDF;
}
Procedure Visit(QueryNode, VisitNode);
{
1: for each edge (u → v) ∈ L[VisitNode] do
2: if idom(v) is a proper ancestor of QueryNode
3: then EDF = EDF ∪ {(u → v)}; endif
4: od ;
5: if VisitNode is not a boundary node
6: then
7: for each child C of VisitNode
8: do
9: Visit(QueryNode,C)
10: od ;
11: endif ;
}
partitioned into smaller trees called zones. For example, in Figure 15(c), there are
seven zones, with node sets: {START}, {END}, {a}, {b, d}, {c, e}, { f }, {g, h}. A
query TopDownEDF(q) visits the portion of a zone below node q, which we call
the subzone associated with q. Formally:
In the implementation, we assume that for each node there is a Boolean variable
Bndry? set to true for boundary nodes and set to false for interior nodes. In Line 2
of Procedure Visit, testing whether idom(v) is a proper ancestor of QueryNode
can be done in constant time by comparing their dfs (depth-first search) number
or their level number. (Both numbers are easily obtained by preprocessing; the dfs
number is usually already available as a byproduct of dominator tree construction.)
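The constant-time ancestry test can be sketched with preorder/postorder interval numbering of the dominator tree; the tree below is illustrative, and the paper's dfs or level numbers serve the same purpose.

```python
def number(root, children):
    """Assign each node a preorder and postorder timestamp."""
    pre, post, clock = {}, {}, [0]
    def dfs(u):
        pre[u] = clock[0]; clock[0] += 1
        for c in children.get(u, []):
            dfs(c)
        post[u] = clock[0]; clock[0] += 1
    dfs(root)
    return pre, post

def proper_ancestor(a, b, pre, post):
    # a is a proper ancestor of b iff a's interval strictly contains b's
    return pre[a] < pre[b] and post[a] > post[b]

pre, post = number('a', {'a': ['b'], 'b': ['c', 'd']})
assert proper_ancestor('a', 'c', pre, post)
assert not proper_ancestor('c', 'a', pre, post)
assert not proper_ancestor('b', 'b', pre, post)   # proper: excludes itself
```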
It follows immediately that the query time Tq is proportional to the sum of the
number of visited nodes and the number of reported edges:
To limit query time, we shall define zones so that, in terms of a design parameter
β (a positive real number), for every node q we have:
|Z q | ≤ β|EDF(q)| + 1. (5)
Intuitively, the number of nodes visited when q is queried is at most one more than
some constant proportion of the answer size. We observe that, when EDF(q) is
empty (e.g., when q = START or when q = END), Condition (5) forces Z q = {q},
for any β.
When we sum over w ∈ X both sides of Inequality (8), we see that the right-hand
side evaluates at most to |V |/β, since all subzones Z u involved in the resulting
double summation are disjoint. Hence, Σ w∈X |EDF(w)| ≤ |V |/β, which, used in
Relation (7), yields:

Σ w∈V |L[w]| ≤ |E up | + |V |/β.     (9)
Therefore, to store this data structure, we need O(|V |) space for the dominator
tree, O(|V |) further space for the Bndry? bit and for list headers, and finally, from
Inequality (9), O(|E up | + |V |/β) for the list elements. Altogether, we have S p =
O(|E up | + (1 + 1/β)|V |).
5 The removal of this simplifying condition might lead to further storage reductions.
We summarize the Augmented Dominator Tree ADT for answering EDF queries:
(1) T : dominator tree that permits top-down and bottom-up traversals.
(2) dfs[v]: dfs number of node v.
(3) Bndry?[v]: Boolean. Set to true if v is a boundary node, and set to false other-
wise.
(4) L[v]: list of CFG edges. If v is a boundary node, L[v] is EDF(v); otherwise, it
is α-DF(v).
6.1.2. ADT Construction. The preprocessing algorithm that constructs the
search structure ADT takes three inputs:
—The dominator tree T , for which we assume that the relative order of two nodes
one of which is an ancestor of the other can be determined in constant time.
—The set E up of up-edges (u → v) ordered by idom(v).
—Real parameter β > 0, which controls the space/query-time trade-off.
The stages of the algorithm are explained below and translated into pseudocode in
Figure 16.
(1) For each node x, compute the number b[x] (respectively, t[x]) of up-edges
(u → v) with u = x (respectively, idom(v) = x). Set up two counters initialized
to zero and, for each (u → v) ∈ E up , increment the appropriate counters of
its endpoints. This stage takes time O(|V | + |E up |), for the initialization of the
2|V | counters and for the 2|E up | increments of such counters.
(2) For each node x, compute |EDF(x)|. It is easy to see that |EDF(x)| = b[x] −
t[x] + Σ y∈children(x) |EDF(y)|. Based on this relation, the |EDF(x)| values can
be computed in bottom-up order, using the values of b[x] and t[x] computed
in Step (1), in time O(|V |).
(3) Determine boundary nodes, by appropriate setting of a Boolean variable
Bndry?[x] for each node x. Letting z[x] = |Z x |, Definition 13 becomes:
If x is a leaf or (1 + Σ y∈children(x) z[y]) > (β|EDF(x)| + 1), then x is a
boundary node, and z[x] is set to 1. Otherwise, x is an interior node, and
z[x] = (1 + Σ y∈children(x) z[y]).
Again, z[x] and Bndry?[x] are easily computed in bottom-up order, taking
time O(|V |).
(4) Determine, for each node x, the next boundary node NxtBndry[x] in the path
from x to the root. If the parent of x is a boundary node, then it is the next
boundary for x. Otherwise, x has the same next boundary as its parent. Thus,
NxtBndry[x] is easily computed in top-down order, taking O(|V |) time. The
next boundary for the root of T is set to a conventional value −∞, considered
a proper ancestor of every node in the tree.
(5) Construct list L[x] for each node x. By Definition 11, given an up-edge
(u → v), v appears in list L[x] for x ∈ Wuv = {w 0 = u, w 1 , . . . , w k }, where
Wuv contains u as well as all boundary nodes contained in the dominator-tree
path [u, idom(v)) from u (included) to idom(v) (excluded).
Specifically, w i = NxtBndry[w i−1 ], for i = 1, 2, . . . , k, and w k is the proper
descendant of idom(v) such that idom(v) is a descendant of NxtBndry[w k ].
and

β2 (G) = max q∈Y (|Dq | − 1)/|EDF(q)|.     (11)
6 Technically, we assume Y is not empty, a trivial case that, under Definition 18, arises only when the
CFG consists of a single path from START to END.
Finally, in the range β1 (G) ≤ β < β2 (G), one can expect intermediate behaviors
where the ADT stores something in between α-EDF and EDF.
To obtain linear space and query time, β must be chosen to be a constant, inde-
pendent of G. A reasonable choice can be β = 1, illustrated in Figure 15(c) for
the running example. Depending on the values of β1 (G) and β2 (G), this choice can
yield anywhere from no caching to full caching. For many CFG’s arising in prac-
tice, β1 (G) < 1 < β2 (G); for such CFG’s, β = 1 corresponds to an intermediate
degree of caching.
6.2. LAZY PUSHING ALGORITHM. We now develop a lazy version of the
pushing algorithm. Preprocessing consists in constructing the ADT data structure.
The query to find J (S) = DF+ (S) proceeds along the following lines:
—The successors DF(w) are determined only for nodes w ∈ S ∪ J (S).
—Set DF(w) is obtained by a query EDF(w) to the ADT , modified to avoid
reporting of some nodes already found to be in J (S).
—The elements of J (S) are processed according to a bottom-up ordering of the
dominator tree.
To develop an implementation of the above guidelines, consider first the simpler
problem where a set I ⊆ V is given, with its nodes listed in order of nonincreasing
level, and the set ∪w∈I EDF(w) must be computed. For each element of I in the given
order, an EDF query is made to the ADT . To avoid visiting tree nodes repeatedly
during different EDF queries, a node is marked when it is queried and the query
procedure of Figure 14 is modified so that it never visits nodes below a marked
node. The time Tq′ (I ) to answer this simple form of query is proportional to the size
of the set Vvis ⊆ V of nodes visited and the total number of up-edges in the L[v]
lists of these nodes. Considering Bound (9) on the latter quantity, we obtain

Tq′ (I ) = O(|Vvis | + |E up | + |V |/β) = O(|E| + (1 + 1/β)|V |). (15)

For constant β, the above time bound is proportional to program size.
In our context, set I = I (S) = S ∪DF+ (S) is not given directly; rather, it must be
incrementally constructed and sorted, from input S. This can be accomplished by
keeping those nodes already discovered to be in I but not yet queried for EDF in a
priority queue [Cormen et al. 1992], organized by level number in the tree. Initially,
the queue contains only the nodes in S. At each step, a node w of highest level is
extracted from the priority queue and an EDF (w) query is made in the ADT ; if a
reported node v is not already in the output set, it is added to it as well as inserted
into the queue. From Lemma 2, level (v) ≤ level (w); hence, the level number is
nonincreasing throughout the entire sequence of extractions from the priority queue.
The algorithm is described in Figure 17. Its running time can be expressed as

Tq (S) = Tq′ (I (S)) + TPQ (I (S)). (16)
The first term accounts for the ADT processing and satisfies Eq. (15). The second
term accounts for priority queue operations. The range for the keys has size
K , equal to the number of levels of the dominator tree. If the priority queue is
implemented using a heap, the time per operation is O(log K ) [Cormen et al.
1992], whence TPQ (I (S)) = O(|I (S)| log K ). A more sophisticated data structure,
exploiting the integer nature of the keys, achieves O(log log K ) time per operation
[Van Emde Boas et al. 1977]; hence, TPQ (I (S)) = O(|I (S)| log log K ).
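The priority-queue driver can be sketched with a binary heap (Python's min-heap `heapq`, with negated keys so the deepest node comes out first). The DF sets here are an explicit illustrative dict; in the algorithm of Figure 17 they come from (marked) EDF queries on the ADT, and all names below are assumptions.

```python
import heapq

def lazy_phi(df, level, S):
    """Lazy query: process discovered nodes in order of decreasing level."""
    out, queued = set(), set(S)
    heap = [(-level[v], v) for v in S]
    heapq.heapify(heap)
    while heap:
        _, w = heapq.heappop(heap)       # node of highest remaining level
        for v in df[w]:
            if v not in out:
                out.add(v)               # v joins DF+(S)
            if v not in queued:
                queued.add(v)            # query v's DF set later
                heapq.heappush(heap, (-level[v], v))
    return out

# Illustrative DF sets and dominator-tree levels for CFG a->b, b->c/d, c->b.
df = {'a': [], 'b': ['b'], 'c': ['b'], 'd': []}
level = {'a': 0, 'b': 1, 'c': 2, 'd': 2}
assert lazy_phi(df, level, {'c'}) == {'b'}
```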
7. Experimental Results
In this section, we evaluate the lazy pushing algorithm of Figure 17 experimentally,
focusing on the impact that the choice of parameter β has on performance. These
experiments shed light on the two-phase and fully lazy approaches because the
lazy algorithm reduces to these approaches for extreme values of β, as explained in
Section 6.1.3. Intermediate values of β in the lazy algorithm let us explore trade-
offs between preprocessing time (a decreasing function of β) and query time (an
increasing function of β).
The programs used in these experiments include a standard model problem
and the SPEC92 benchmarks. The SPEC programs tend to have sparse dominance
frontier relations, so we can expect a two-phase approach to benefit from
small query time without paying much penalty in preprocessing time and space;
in contrast, the fully lazy approach might be expected to suffer from excessive
recomputation of dominance frontier information. The standard model problem
on the other hand exhibits a dominance frontier relation that grows quadratically
with program size, so we can expect a two-phase approach to suffer considerable
overhead, while a fully lazy algorithm can get by with little preprocessing
effort. The experiments support these intuitive expectations and at the same
time show that intermediate values of β (say, β = 1) are quite effective for all
programs.
Next, we describe the experiments in more detail.
A model problem for SSA computation is a nest of l repeat-until loops, whose
CFG we denote G l , illustrated in Figure 18. Even though G l is structured, its DF
relation grows quadratically with program size, making it a worst-case scenario for
two-phase algorithms. The experiments reported here are based on G 200 . Although
a 200-deep loop nest is unlikely to arise in practice, it is large enough to exhibit
the differences between the algorithms discussed in this article. We used the lazy
pushing algorithm to compute DF+ (n) for different nodes n in the program, and
measured the corresponding running time as a function of β on a SUN-4. In the 3D
plot in Figure 19, the x axis is the value of log2 (β), the y-axis is the node number
n, and the z-axis is the time for computing DF+ (n).
Algorithms for Computing the Static Single Assignment Form 413
The 2D plot in Figure 18 shows slices parallel to the yz plane of the 3D plot for
three different values of β—a very large value (Sreedhar–Gao), a very small value
(Cytron et al.), and 1.
From these plots, it is clear that for small values of β (full caching/two-phase),
the time to compute DF+ grows quadratically as we go from outer loop nodes to
FIG. 19. Time for φ-placement in model problem G 200 by lazy pushing with parameter β.
inner loop nodes. In contrast, for large values of β (no caching/fully lazy), this time
is essentially constant. These results can be explained analytically as follows.
The time to compute DF+ sets depends on the number of nodes and the number
of DF graph edges that are visited during the computation. It is easy to show that,
for 1 ≤ n ≤ l, we have DF(n) = DF(2l − n + 1) = {1, 2, . . . , n}.
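This shape of the dominance frontiers can be checked by brute force on a small instance. The sketch below assumes one plausible encoding of G_l (a chain of nodes 1, ..., 2l with loop n closed by a back edge from node 2l − n + 1, and a node 0 serving as START); it computes dominance and DF directly from their definitions, not by any of the algorithms in this article.

```python
def model_cfg(l):
    # Assumed encoding of the model problem G_l: nodes 1..2l form a chain,
    # and loop n (1 <= n <= l) is closed by a back edge 2l-n+1 -> n.
    # Node 0 acts as START.
    succ = {v: set() for v in range(2 * l + 1)}
    for v in range(2 * l):
        succ[v].add(v + 1)
    for n in range(1, l + 1):
        succ[2 * l - n + 1].add(n)
    return succ

def dominates(succ, w, v):
    # w dominates v iff every path from START (node 0) to v contains w;
    # brute force: delete w and test whether v is still reachable.
    if w == v:
        return True
    seen, stack = set(), [0]
    while stack:
        x = stack.pop()
        if x in seen or x == w:
            continue
        seen.add(x)
        stack.extend(succ[x])
    return v not in seen

def df(succ, w):
    # DF(w) = { v : w dominates some predecessor of v but not strictly v }.
    pred = {v: set() for v in succ}
    for u, vs in succ.items():
        for v in vs:
            pred[v].add(u)
    return {v for v in succ
            if any(dominates(succ, w, u) for u in pred[v])
            and not (w != v and dominates(succ, w, v))}

g = model_cfg(4)
assert df(g, 3) == {1, 2, 3}   # DF(n) = {1, ..., n}
assert df(g, 6) == {1, 2, 3}   # DF(2l - n + 1) = {1, ..., n}
```

Under this encoding the DF sets do grow linearly with loop depth, so the full DF relation has the quadratic size claimed for two-phase preprocessing.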
For very small values of β, the dominance frontier information of every node is
stored at that node (full caching). For 1 ≤ n ≤ l, computing DF+ (n) requires a visit
to all nodes in the set {1, 2, . . . , n}. The number of DF edges examined during these
visits is 1 + 2 + · · · + n = n(n + 1)/2; each of these edge traversals involves a visit
to the target node of the DF edge. The reader can verify that a symmetric formula
holds for nodes numbered between l and 2l. These results explain the quadratic
growth of the time for DF+ set computation when full caching is used.
For large values of β, we have no caching of dominance frontier information.
Assume that 1 ≤ n ≤ l. To compute DF(n), we visit all nodes in the dominator
tree subtree below n, and traverse l edges to determine that DF(n) = {1, 2, . . . , n}.
Subsequently, we visit nodes (n − 1), (n − 2) etc., and at each node, we visit only
that node and the node immediately below it (which is already marked); since no
FIG. 20. Time for φ-placement in SPEC92 benchmarks by lazy pushing with parameter β.
DF edges are stored at these nodes, we traverse no DF edges during these visits.
Therefore, we visit (3l + n) nodes, and traverse l edges. Since n is small compared
to 3l, we see that the time to compute DF+ (n) is almost independent of n, which is
borne out by the experimental results.
Comparing the two extremes, we see that for small values of n, full caching
performs better than no caching. Intuitively, this is because we suffer the overhead
of visiting all nodes below n to compute DF(n) when there is no caching; with full
caching, the DF set is available immediately at the node. However, for large values
of n, full caching runs into the problem of repeatedly discovering that certain nodes
are in the output set—for example, in computing DF+ (n), we find that node 1 is
in the output set when we examine DF(m) for every m between n and 1. It is easy
to see that with no caching, this discovery is made exactly once (when node 2l is
visited during the computation of DF+ (n)). The cross-over value of n at which no
caching performs better than full caching is difficult to estimate analytically but
from Figure 19, we see that a value of β = 1 outperforms both extremes for almost
all problem sizes.
Since deeply nested control structures are rare in real programs, we would expect
the time required for φ-function placement in practice to look like a slice of Figure 19
parallel to the xz plane for a small value of n. That is, we would expect full caching
to outperform no caching, and we would expect the use of β = 1 to outperform full
caching by a small amount. Figure 20 shows the total time required to do φ-function
placement for all unaliased scalar variables in all of the programs in the SPEC92
benchmarks. It can be seen that full caching (small β) outperforms no caching
(large β) by a factor between 3 and 4. Sreedhar and Gao [1995] reported that
their algorithm, essentially lazy pushing with no caching, outperformed the Cytron
PROOF. We give the proof only for (1) and omit the proof for (2), which is
similar.
(⇒) By the assumption (e → s) ∈ EDF(w) and Definition 6, we have that
(ii) w dominates e and (iii) w does not strictly dominate s. Thus, (ii) is immediately
established. To establish (i), we show that (iv) e does not strictly dominate w, that
(v) s dominates w, and then invoke part (3) of Lemma 5.
Indeed, (iv) follows from (ii) and the asymmetry of dominance.
Observe next that both s and w are dominators of e (from part (1) of Lemma 5
and (ii), respectively); hence, one of them must dominate the other. In view of (iii),
the only remaining possibility is (v).
(⇐) By assumption, (ii) w dominates e. Also by assumption, w ∈ W so that,
by part (3) of Lemma 5, (v) s dominates w. By (v) and asymmetry of dominance,
we have that (iii) w does not strictly dominate s. By (ii), (iii), and Definition 6, it
follows that (e → s) ∈ EDF(w).
Lemma 6 indicates that DF(w) can be determined by examining the loop and con-
ditional regions C that contain w and checking whether w dominates an appropriate
node. By part (4) of Lemma 5, this check amounts to determining whether w belongs
to the interior of some conditional region C ⊆ W . Since the regions containing w
are not disjoint, by part (5) of Lemma 5, they form a sequence ordered by inclu-
sion. Thus, each region in a suitable prefix of this sequence contributes one node
to DF(w). To help formalize these considerations, we introduce some notation.
Definition 17. Given a node w in a structured CFG, let H1(w) ⊂ H2(w) ⊂
· · · ⊂ Hd(w)(w) be the sequence of loop regions containing w and of conditional
regions containing w as an interior node. We also let ℓ(w) be the largest index ℓ for
which H1(w), . . . , Hℓ(w)(w) are all loop regions.
Figure 21(a) illustrates a structured CFG. The sequence of regions for node k
is H1(k) = < j, l >, H2(k) = < i, m >, H3(k) = < h, n >, H4(k) = < g, q >,
H5(k) = < a, r >, with d(k) = 5 and ℓ(k) = 1, since H2(k) is the first conditional
region in the sequence. With the help of the dominator tree shown in Figure 21(b),
one also sees that DF(k) = { j, m} = {START(H1(k)), END(H2(k))}. For node c, we
have H1(c) = < b, e >, H2(c) = < a, r >, d(c) = 2, ℓ(c) = 0, and DF(c) = {r} =
{END(H1(c))}.
PROPOSITION 15. For w ∈ V , if ℓ(w) < d(w), then we have:

DF(w) = {START(H1(w)), . . . , START(Hℓ(w)(w)), END(Hℓ(w)+1(w))},

else (ℓ(w) = d(w), that is, no conditional region contains w in its interior) we
have:

DF(w) = {START(H1(w)), . . . , START(Hℓ(w)(w))}.
PROOF. · · · ⊆ DF(w). Consider a node START(Hi(w)) where i ≤ ℓ(w). By
definition, w ∈ Hi(w) and there is no conditional region C ⊂ Hi(w) that contains w
as an internal node; by part (4) of Lemma 5, w dominates END(Hi(w)). By Lemma 6,
START(Hi(w)) ∈ DF(w). A similar argument establishes that END(Hℓ(w)+1(w)) ∈
DF(w).
DF(w) ⊆ · · · . Let (u → v) ∈ EDF(w). If (u → v) is the back-edge of a loop
region W =< v, u >, Lemma 6 asserts that w dominates u and is contained in
[FIG. 23. Contents of the stack at Line 10 of Figure 22, shown for each node of
the program of Figure 21 processed in ω-order: c, d, f, e, b, l, k, j, p, q, n, m, i,
h, g, r, a.]
Second, if the top of the stack is w itself, then it is removed from the stack. Third,
if the top of the stack is now a sibling of w, it also gets removed. We show that, at
Line 10 of the algorithm, the stack contains the nodes of DF(w) in ω-order from top
to bottom. Therefore, examination of the top of the stack is sufficient to determine
whether there is a self-loop at w in the M-graph and to find the parent of w in
the forest Mr , if it exists. Figure 23 shows the contents of the stack at Line 10 of
Figure 22 when it is processing the nodes of the program of Figure 21 in ω-order.
PROPOSITION 16. Let G = (V, E) be a structured CFG. Then, the parent iM(w)
of each node w ∈ V in forest Mr and the presence of a self-loop at w can be
computed in time O(|E| + |V |) by the algorithm of Figure 22.
PROOF. Let w 1 , w 2 , . . . , w |V | be the ω-ordered sequence in which nodes are
visited by the loop beginning at Line 7. We establish the loop invariant In : at Line
10 of the nth loop iteration, the stack holds the nodes in DF(w n ), in ω-order from
top to bottom. This ensures that self-loops and iM(w) are computed correctly. The
proof is by induction on n.
Base case. The stack is initially empty and Lines 8 and 9 will push the nodes
of α-DF(w 1 ), in reverse-ω-order. Since w 1 is a leaf of the dominator tree, by
Theorem 8, DF(w 1 ) = α-DF(w 1 ), and I1 is established.
Inductive step. We assume In and prove In+1 . From the properties of post-order
walks of trees, three cases are easily seen to exhaust all possible mutual positions
of w n and w n+1 .
(1) w n+1 is the leftmost leaf of the subtree rooted at the first sibling r of w n
to the right of w n . From Lemma 7 applied to parent(w n ), there is a region
< parent(w n ), e > = < w n , e1 > ⊗ < s2 , e2 >. From Proposition 15,
DF(w n ) ⊆ {w n , e}. Nodes w n and e will be popped off the stack by the time
control reaches the bottom of the loop at the nth iteration, leaving an empty
stack at Line 7 of the (n + 1)st iteration. Then the nodes in α-DF(w n+1 ) will
be pushed on the stack in reverse-ω order. Since w n+1 is a leaf, DF(w n+1 ) =
α-DF(w n+1 ) and In+1 holds.
(2) w n is the rightmost child of w n+1 , with w n+1 having other children. From
Lemma 7, < w n+1 , w n > is a conditional region. Since every loop and con-
ditional region that contains w n also contains w n+1 and vice-versa, it follows
from Proposition 15 that DF(w n+1 ) = DF(w n ). Furthermore, the children of
w n+1 cannot be in DF(w n+1 ), so they cannot be in DF(w n ) either. By assump-
tion, at Line 10 of the nth iteration, the stack contains DF(w n ). We see that
nothing is removed from the stack in Lines 10–19 during the nth iteration be-
cause neither w n nor the siblings of w n are in DF(w n ). Also, α-DF(w n+1 )
is empty, as no up-edges emanate from the end of a conditional, so nothing is
pushed on the stack at Line 9 of the (n +1)-st iteration, which then still contains
DF(w n ) = DF(w n+1 ). Thus, In+1 holds.
(3) w n is the only child of w n+1 . By Theorem 8, DF(w n+1 ) = α-DF(w n+1 ) ∪
(DF(w n ) − {w n }). At the nth iteration, the stack contains DF(w n ), from which
Lines 10–14 will remove w n from the stack, if it is there, and Lines 15–
19 will not pop anything, since w n has no siblings. At the (n + 1)st itera-
tion, Lines 8–9 will push the nodes in α-DF(w n+1 ) on the stack, which will
then contain DF(w n+1 ). It remains to show that the nodes on the stack are in
ω-order.
If α-DF(w n+1 ) is empty, ω-ordering is a corollary of In . Otherwise, there are
up-edges emanating from w n+1 . Since w n+1 is not a leaf, part (3) of Lemma 7
rules out case (2) of Lemma 6. Therefore, w n+1 must be the end node of a loop
< s, w n+1 > and α-DF(w n+1 ) = {s}.
From Lemma 5, any other region W = < s′, e > that contains w n+1 in the
interior will properly include < s, w n+1 >, so that s′ strictly dominates s (from
part (1) of Lemma 5). If W is a loop region, then s ∈ DF(w n ) occurs before s′ in
ω-order. If W is a conditional region, then since e ∈ DF(w n ) is the rightmost
child of s′, s must occur before e in ω-order. In either case, s will correctly be
above s′ or e in the stack.
The complexity bound of O(|E| + |V |) for the algorithm follows from the obser-
vation that each iteration of the loop in Lines 7–20 pushes the nodes in α-DF(w)
(which is charged to O(|E|)) and performs a constant amount of additional work
(which is charged to O(|V |)).
The class of programs with forest-structured M contains the class of struc-
tured programs (by Theorem 11) and is contained in the class of reducible pro-
grams (by Proposition 7). Both containments turn out to be strict. For exam-
ple, it can be shown that for any CFG whose dominator tree is a chain Mr is
a forest even though such a program may not be structured, due to the pres-
ence of non-well-nested loops. One can also check that the CFG with edges
(s, a), (s, b, ), (s, c), (s, d), (a, b), (b, d), (a, c), (a, d) is reducible but its Mr re-
lation is not a forest.
If the Mr relation for a CFG G is a forest, then it can be shown easily that
iM(w) = min DF(w), where the min is taken with respect to an ω-ordering of the
nodes. Then, Mr can be constructed efficiently by a simple modification of the node-
scan algorithm, where the DF sets are represented as balanced trees, thus enabling
dictionary and merging operations in logarithmic time. The entire preprocessing
then takes time T p = O(|E| log |V |). Once the forest is available, queries can be
handled optimally as in Proposition 14.
8.4. APPLICATIONS TO CONTROL DEPENDENCE. In this section, we briefly and
informally discuss how the Mr forest enables the efficient computation of set DF(w)
for a given w. This is equivalent to the well-known problem of answering node
control dependence queries [Pingali and Bilardi 1997]. In fact, the node control
dependence relation in a CFG G is the same as the dominance frontier relation in
the reverse CFG G R , obtained by reversing the direction of all arcs in G. Moreover,
it is easy to see that G is structured if and only if G R is structured.
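This duality is easy to test by brute force. The sketch below is ours (the node numbering and the diamond example are assumptions): it computes dominance frontiers in a reversed CFG directly from the definitions and confirms that a branch target of an if-then-else is control dependent on the branch node.

```python
def dominates(succ, start, w, v):
    # w dominates v (w.r.t. start) iff deleting w leaves v unreachable from start.
    if w == v:
        return True
    seen, stack = set(), [start]
    while stack:
        x = stack.pop()
        if x in seen or x == w:
            continue
        seen.add(x)
        stack.extend(succ[x])
    return v not in seen

def df(succ, start, w):
    # DF(w) = { v : w dominates some predecessor of v but not strictly v }.
    pred = {v: set() for v in succ}
    for u, vs in succ.items():
        for v in vs:
            pred[v].add(u)
    return {v for v in succ
            if any(dominates(succ, start, w, u) for u in pred[v])
            and not (w != v and dominates(succ, start, w, v))}

# If-then-else diamond: START = 0 branches to 1 and 2, which rejoin at END = 3.
g = {0: {1, 2}, 1: {3}, 2: {3}, 3: set()}
# Reverse CFG G^R: every edge flipped; its start node is END = 3.
gr = {v: set() for v in g}
for u, vs in g.items():
    for v in vs:
        gr[v].add(u)
# Node 1 runs only when the branch at node 0 chooses it, so it is control
# dependent on 0; equivalently, 0 is in node 1's dominance frontier in G^R.
assert df(gr, 3, 1) == {0}
```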
By considering the characterization of DF(w) provided by Proposition 15, it is
not difficult to show that DF(w) contains w if and only if Mr has a self-loop at w;
in addition, DF(w) contains all the proper ancestors of w in Mr up to and including
the first one that happens to be the end node of a conditional region. Thus, a simple
9. Conclusions
This article is a contribution to the state of the art of φ-placement algorithms for
converting programs to SSA form. Our presentation is based on a new relation on
CFG nodes called the merge relation that we use to derive all known properties of
the SSA form in a systematic way. Consideration of this framework led us to invent
new algorithms for φ-placement that exploit these properties to achieve asymptotic
running times that match those of the best algorithms in the literature. We presented
both known and new algorithms for φ-placement in the context of this framework,
and evaluated performance on the SPEC benchmarks.
Although these algorithms are fast in practice, they are not optimal when
φ-placement has to be done for multiple variables. In the multiple variable problem,
a more ambitious goal can be pursued. Specifically, after suitable preprocessing of
the CFG, one can try to determine φ-placement for a variable in time O(|S|+|J (S)|)
(i.e., proportional to the number of nodes where that variable generates a defini-
tion in the SSA form). We showed how this could be done for the special case of
structured programs by discovering and exploiting the forest structure of the merge
relation. The extension of this result to arbitrary programs remains a challenging
open problem.
Appendix A.
Definition 18. A control flow graph (CFG) G = (V, E) is a directed graph in
which a node represents a statement and an edge u → v represents possible flow
of control from u to v. Set V contains two distinguished nodes: START, with no
predecessors and from which every node is reachable; and END, with no successors
and reachable from every node.
Definition 19. A path of length n from x0 to xn is a sequence of edges
x0 → x1 , x1 → x2 , . . . , xn−1 → xn . A path of length zero (n = 0) is said to
be empty. A path from x to y is denoted as x →∗ y in general and as x →+ y if it
is not empty. Two paths of the form P1 = x0 → x1 , . . . , xn−1 → xn and
P2 = xn → xn+1 , . . . , xn+m−1 → xn+m (last vertex on P1 equals first vertex
on P2 ) are said to be concatenable, and the path P = P1 P2 = x0 → x1 , x1 →
x2 , . . . , xn+m−1 → xn+m is referred to as their concatenation.
Definition 20. A node w dominates a node v, denoted (w, v) ∈ D, if every
path from START to v contains w. If, in addition, w ≠ v, then w is said to strictly
dominate v.
It can be shown that dominance is a transitive relation with a tree-structured
transitive reduction called the dominator tree, T = (V, Dr ). The root of this tree
is START. The parent of a node v (distinct from START) is called the immediate
dominator of v and is denoted by idom(v). We let children(w) = {v : idom(v) = w}
denote the set of children of node w in the dominator tree. The dominator tree can
be constructed in O(|E|α(|E|)) time by an algorithm due to Lengauer and Tarjan
[1979], or in O(|E|) time by a more complicated algorithm due to Buchsbaum et al.
[1998]. The following lemma is useful in proving properties that rely on dominance.
LEMMA 8. Let G = (V, E) be a CFG. If w dominates u, then there is a path
from w to u on which every node is dominated by w.
PROOF. Consider any acyclic path P = START →∗ u. Since w dominates u, P
must contain w. Let P1 = w →+ u be the suffix of path P that originates at node w.
Suppose there is a node n on path P1 that is not dominated by w. We can write
path P1 as w →+ n →+ u; let P2 be the suffix n →+ u of this path. Node w cannot
occur on P2 because P is acyclic.
Since n is not dominated by w, there is a path Q = START →+ n that does not
contain w. The concatenation of Q with P2 is a path from START to u not containing
w, which contradicts the fact that w dominates u.
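For small graphs, the dominator sets and immediate dominators defined above can be computed by the classic iterative data-flow method. The Python sketch below is a direct rendering of the fixed point implied by Definition 20; it is not the Lengauer–Tarjan or Buchsbaum et al. algorithm cited above, and its running time is far from their near-linear bounds.

```python
def dominators(succ, start):
    # Iterative fixed point of Definition 20:
    #   Dom(start) = {start};  Dom(v) = {v} ∪ intersection of Dom(p)
    #   over all predecessors p of v.
    # Assumes every node is reachable from start.
    pred = {v: set() for v in succ}
    for u, vs in succ.items():
        for v in vs:
            pred[v].add(u)
    dom = {v: set(succ) for v in succ}
    dom[start] = {start}
    changed = True
    while changed:
        changed = False
        for v in succ:
            if v == start or not pred[v]:
                continue
            new = {v} | set.intersection(*(dom[p] for p in pred[v]))
            if new != dom[v]:
                dom[v], changed = new, True
    return dom

def idom(dom, start, v):
    # The immediate dominator of v is its deepest strict dominator,
    # i.e., the strict dominator with the largest dominator set.
    if v == start:
        return None
    return max(dom[v] - {v}, key=lambda w: len(dom[w]))

# Diamond CFG: 0 branches to 1 and 2, which rejoin at 3.
g = {0: {1, 2}, 1: {3}, 2: {3}, 3: set()}
dom = dominators(g, 0)
assert dom[3] == {0, 3} and idom(dom, 0, 3) == 0
```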
A key data structure in optimizing compilers is the def-use chain [Aho et al.
1986]. Briefly, a statement in a program is said to define a variable Z if it may write
to Z , and it is said to use Z if it may read the value of Z before possibly writing
to Z . By convention, the START node is assumed to be a definition of all variables.
The def-use graph of a program is defined as follows:
Definition 21. The def-use graph of a control flow graph G = (V, E) for
variable Z is a graph DU = (V, F) with the same vertices as G and an edge
(n 1 , n 2 ) whenever n 1 is a definition of Z , n 2 is a use of Z , and there is a path
in G from n 1 to n 2 that does not contain a definition of Z other than n 1 or n 2 . If
(n 1 , n 2 ) ∈ F, then definition n 1 is said to reach the use of Z at n 2 .
In general, there may be several definitions of a variable that reach a use of that
variable. Figure 1(a) shows the CFG of a program in which nodes START, A and C
are definitions of Z . The use of Z in node F is reached by the definitions in nodes
A and C.
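Definition 21 can be rendered directly as a forward search from each definition that does not continue past other definitions. The sketch below is ours; the function name, the set-based CFG encoding, and the example graph (chosen to mirror the description of Figure 1(a), whose exact shape we do not reproduce) are all assumptions.

```python
def def_use_edges(succ, defs, uses):
    # Def-use edges per Definition 21: (d, u) is an edge when some CFG path
    # from d to u contains no definition of Z other than d or u.
    # From each definition, search forward without continuing past
    # other definitions.
    edges = set()
    for d in defs:
        seen, stack = set(), list(succ[d])
        while stack:
            v = stack.pop()
            if v in seen:
                continue
            seen.add(v)
            if v in uses:
                edges.add((d, v))
            if v not in defs:
                stack.extend(succ[v])
    return edges

# A guessed CFG in the spirit of Figure 1(a): START branches to A and C,
# which rejoin at F.  START, A, and C define Z; F uses Z.
cfg = {"START": {"A", "C"}, "A": {"F"}, "C": {"F"}, "F": set()}
assert def_use_edges(cfg, {"START", "A", "C"}, {"F"}) == {("A", "F"), ("C", "F")}
```

Note that the definition at START reaches no use here, because every path from START to F passes through the redefinition at A or at C.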
REFERENCES
AHO, A. V., SETHI, R., AND ULLMAN, J. D. 1986. Compilers: Principles, Techniques, and Tools. Addison-
Wesley, Reading, Mass.
BILARDI, G., AND PINGALI, K. 1996. A framework for generalized control dependence. In Proceedings
of the SIGPLAN ’96 Conference on Programming Language Design and Implementation. ACM, New
York, 291–300.
BUCHSBAUM, A. L., KAPLAN, H., ROGERS, A., AND WESTBROOK, J. R. 1998. Linear-time pointer-
machine algorithms for least common ancestors, MST verification, and dominators. In Proceedings of
the ACM Symposium on the Theory of Computing. ACM, New York, pp. 279–288.
CYTRON, R., AND FERRANTE, J. 1993. Efficiently computing φ-nodes on-the-fly. In Proceedings of the
6th Workshop on Languages and Compilers for Parallel Computing (Aug.). Lecture Notes in Computer
Science, vol. 768, Springer-Verlag, New York, pp. 461–476.
CYTRON, R., FERRANTE, J., ROSEN, B. K., WEGMAN, M. N., AND ZADECK, F. K. 1991. Efficiently
computing static single assignment form and the control dependence graph. ACM Trans. Prog. Lang.
Syst. 13, 4, (Oct.), 451–490.
CORMEN, T., LEISERSON, C., AND RIVEST, R. 1992. Introduction to Algorithms. The MIT Press,
Cambridge, Mass.
GEORGE, A., AND LIU, J. W.-H. 1981. Computer Solution of Large Sparse Positive Definite Systems.
Prentice-Hall, Englewood Cliffs, N.J.
JOHNSON, R., AND PINGALI, K. 1993. Dependence-based program analysis. In Proceedings of the
SIGPLAN ’93 Conference on Programming Language Design and Implementation (Albuquerque, N. M.,
June 23–25). ACM, New York, pp. 78–89.
LENGAUER, T., AND TARJAN, R. E. 1979. A fast algorithm for finding dominators in a flowgraph. ACM
Trans. Prog. Lang. Syst. 1, 1 (July), 121–141.
PETERSON, J. ET AL. 2002. Haskell: A purely functional language. http://www.haskell.org.
PEYTON JONES, S., AUGUSTSSON, L., BARTON, D., BOUTEL, B., BURTON, W., FASEL, J., HAMMOND, K.,
HINZE, R., HUDAK, P., HUGHES, J., JOHNSSON, T., JONES, M., LAUNCHBURY, J., MEIJER, E., PETERSON,
J., REID, A., RUNCIMAN, C., AND WADLER, P. 2002. Haskell 98 Language and Libraries: The Revised
Report. Available at www.haskell.org.
PINGALI, K., AND BILARDI, G. 1995. AP T : A data structure for optimal control dependence com-
putation. In Proceedings of the ACM SIGPLAN Conference on Programming Language Design and
Implementation. ACM, New York, pp. 32–46.
PINGALI, K., AND BILARDI, G. 1997. Optimal control dependence computation and the Roman Chariots
problem. ACM Trans. Prog. Lang. Syst. 19, 3 (May), 462–491.
PINGALI, K., BECK, M., JOHNSON, R., MOUDGILL, M., AND STODGHILL, P. 1991. Dependence flow graphs:
An algebraic approach to program dependencies. In Conference Record of the 18th Annual ACM Sym-
posium on Principles of Programming Languages (Jan.). ACM, New York, pp. 67–78.
PODGURSKI, A., AND CLARKE, L. 1990. A formal model of program dependences and its implications
for software testing, debugging and maintenance. IEEE Trans. Softw. Eng. 16, 9 (Sept.) 965–979.
RAMALINGAM, G. 2000. On loops, dominators, and dominance frontiers. In Proceedings of the ACM
SIGPLAN Conference on Programming Language Design and Implementation (PLDI'00). ACM,
New York, pp. 233–241.
REIF, J. H., AND TARJAN, R. E. 1982. Symbolic program analysis in almost-linear time. SIAM J.
Comput. 11, 1 (Feb.), 81–93.
SHAPIRO, R. M., AND SAINT, H. 1970. The representation of algorithms. Tech. Rep. CA-7002-1432,
Massachusetts Computer Associates.
SREEDHAR, V. C., AND GAO, G. R. 1995. A linear time algorithm for placing φ-nodes. In Conference
Record of POPL '95: 22nd ACM SIGPLAN-SIGACT Symposium on Principles of Programming Lan-
guages (San Francisco, Calif., Jan.). ACM, New York, pp. 62–73.
VAN EMDE BOAS, P., KAAS, R., AND ZIJLSTRA, E. 1977. Design and implementation of an efficient
priority queue. Math. Syst. Theory 10, 99–127.
WEISS, M. 1992. The transitive closure of control dependence: The iterated join. ACM Lett. Prog. Lang.
Syst. 1, 2 (June), 178–190.
WOLFE, M. 1995. High Performance Compilers for Parallel Computing. Addison-Wesley, Reading, Mass.