LCM V
1. INTRODUCTION
Morel and Renvoise discussed in their paper [5] a universal method for global code optimization in a
compiler. By suppressing partial redundancies their method achieves several optimization techniques -
such as moving loop invariant computations out of a loop or deleting redundant computations - at once.
The algorithm is based on a bi-directional data flow analysis. It moves computations to earlier places in
execution paths and thereby eliminates partial redundancies. The movement is controlled by some
heuristics which are mainly due to a term named CONST in [5].
The heuristics and algorithms used in Morel and Renvoise's original paper [5] sometimes cause
difficulties in practical use. Subsequent papers partly circumvented these difficulties by modifications of
the heuristics and the data flow equations to be solved. In [4] Knoop, Rüthing, and Steffen published an
optimal algorithm for the elimination of partial redundancies which entirely prevents these difficulties.
Their algorithm is based on flow graphs with single statements as nodes. From a practical point of view
flow graphs with entire basic blocks as nodes would be preferable.
Section 3 of this paper presents a variant of the idea in [4] which is based on flow graphs with entire
basic blocks rather than single instructions as nodes. Further, in contrast to [4] and similar to [3], the
computational model of a multi-register machine (with an arbitrary number of pseudo registers) is
assumed. This allows code motion to be performed after instruction selection and makes a special treatment of
temporaries superfluous (saving one of the systems of data flow equations). Like the algorithms in
[4], the algorithms presented in this paper entirely prevent the difficulties mentioned in [1], [2], and [3].
The idea from [4] of eliminating partial redundancies by moving computations to earlier places is kept,
but a different data flow framework is used to achieve this. Therefore, correctness and also optimality of
the new algorithms need to be proved again, which is done in section 4. Practical results are given in
section 5.
Moving a computation to an earlier place in an execution path increases the live range of registers
holding result values of the computations. Therefore, some of the difficulties with Morel and Renvoise's
original method are unnecessary (redundant) code movements. This problem was addressed in several
papers [1], [2], [3]. They improved the behaviour of code movement but were still not able to entirely
prevent redundant code movements.
Since Morel and Renvoise only inserted new computations at the end of basic blocks, they could not
eliminate all partial redundancies. To allow the elimination of more partial redundancies [1], [2], [3], and
[4] took edge placements into consideration (or assumed that critical edges have been prevented by
inserting empty blocks where necessary). The possibility of edge placement was already mentioned by
Morel and Renvoise but not used in their original paper [5].
ACM SIGPLAN Notices, Volume 28, No. 5 May 1993
Drechsler and Stadel discussed in [3] a solution to prevent the difficulty of the movement of complex
expressions to earlier places in execution paths than some of their subexpressions.
In general, solving a bi-directional system of data flow equations like that given in [5] is less efficient
than solving a uni-directional system. By using edge placement Drechsler and Stadel [3] could reduce the
data flow problem to a uni-directional system of equations. The systems of equations proposed by
Knoop, Rüthing, and Steffen in [4] are also all uni-directional. The systems of equations suggested by
Dhamdhere in [1] and [2] are weak bi-directional and therefore have the same low computational
complexity as uni-directional ones.
The method presented in this paper uses the same basic idea of two phases as Knoop, Rüthing, and
Steffen in [4]. As in [4], the first phase determines the earliest possible placements which allow the
elimination of as many partial redundancies as possible. In the second phase these placements are moved
forward to the latest possible places without affecting the elimination of partial redundancies. However,
the use of the information already contained in the previously computed values for availability (AVOUT)
and anticipability (ANTIN) leads to simpler and fewer data flow equations.
An expression e is available at entry of block i, AVINi (e), if every path π[entry, i[ contains at least
one computation of e, i.e. COMPk (e) = TRUE for some k on π[entry, i[, and TRANSPπ]k, i[ (e) = TRUE.
Availability at exit of block i is then defined by
AVOUTi (e) = AVINi (e) · TRANSPi (e) + COMPi (e).
Availability of an expression e can be computed (iteratively) as the maximal solution of the system of
equations
AVINi (e) = FALSE, if i is the entry block
AVINi (e) = ∏ j∈PREDi AVOUTj (e), otherwise
All these computations can be done concurrently for all expressions; Boolean operations ·, +, and ¬
then become operations on bit-vectors. For the sake of clarity in the following we will often consider
only one expression and thus drop argument (e) from Boolean properties.
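The iterative computation on bit-vectors can be sketched in a few lines. The following Python fragment is only an illustration under stated assumptions: it takes AVIN of a non-entry block to be the conjunction of AVOUT over its predecessors, stores each Boolean property as an integer bit-vector with one bit per expression, and uses hypothetical names (availability, preds, n_exprs) and a made-up four-block flow graph.

```python
def availability(blocks, preds, entry, comp, transp, n_exprs):
    """Maximal solution of the AVIN/AVOUT system for n_exprs expressions at once.

    comp[b] and transp[b] are ints used as bit-vectors (bit e = expression e).
    Assumption: AVIN of a non-entry block is the AND of AVOUT over its predecessors.
    """
    full = (1 << n_exprs) - 1  # bit-vector with every expression TRUE
    # Start optimistically (TRUE everywhere but the entry) to reach the maximal solution.
    avin = {b: 0 if b == entry else full for b in blocks}
    avout = {b: (avin[b] & transp[b]) | comp[b] for b in blocks}
    changed = True
    while changed:
        changed = False
        for b in blocks:
            new_in = 0 if b == entry else full
            if b != entry:
                for p in preds[b]:
                    new_in &= avout[p]
            new_out = (new_in & transp[b]) | comp[b]
            if (new_in, new_out) != (avin[b], avout[b]):
                avin[b], avout[b] = new_in, new_out
                changed = True
    return avin, avout


# Hypothetical diamond 1 -> {2, 3} -> 4 with two expressions:
# expression 0 is computed in block 2 only, expression 1 in blocks 2 and 3.
blocks = [1, 2, 3, 4]
preds = {1: [], 2: [1], 3: [1], 4: [2, 3]}
comp = {1: 0b00, 2: 0b11, 3: 0b10, 4: 0b00}
transp = {b: 0b11 for b in blocks}
avin, avout = availability(blocks, preds, 1, comp, transp, n_exprs=2)
print(bin(avin[4]))  # 0b10: only expression 1 is available at entry of block 4
```

Starting from TRUE everywhere except at the entry block is what makes the fixed point the maximal solution; a pessimistic start would give a smaller, less useful one.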
The following additional definition is helpful for proofs of theorems.
Definition
A path π[i, j[ is said to carry availability (for an expression e) if there is a basic block k on π such
that COMPk = TRUE and TRANSPπ]k, j[ = TRUE.
Note that e is available at entry of j if and only if all paths π[entry, j[ carry availability; e is partially
available at entry of j if and only if a path π[entry, j[ exists which carries availability. The concept of
partial availability was introduced in [5], but this paper does not explicitly make use of it.
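The definition of a path carrying availability can be restated as a small predicate on an explicit path. The Python sketch below is illustrative only; the function name carries_availability and the representation of a path [i, j[ as a list of blocks without the end block j are assumptions made for this example.

```python
def carries_availability(path, comp, transp):
    """path lists the blocks of a path [i, j[ in order (end block j excluded)."""
    for pos, k in enumerate(path):
        # Block k computes e and every later block on the path is transparent.
        if comp[k] and all(transp[b] for b in path[pos + 1:]):
            return True
    return False


# Hypothetical blocks: 1 computes e, 2 is transparent, 3 modifies an operand of e.
comp = {1: True, 2: False, 3: False}
transp = {1: True, 2: True, 3: False}
print(carries_availability([1, 2], comp, transp))     # True
print(carries_availability([1, 2, 3], comp, transp))  # False
```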
In a first phase for each edge (i, j) of the flow graph a property EARLIESTi, j is computed as follows:
EARLIESTi, j = ANTINj · ¬AVOUTi, if i is the entry block
EARLIESTi, j = ANTINj · ¬AVOUTi · (¬TRANSPi + ¬ANTOUTi), otherwise
Note that - in contrast to similar equations in [4] - this is a local property of an edge rather than a
global system of equations which would need to be solved iteratively. All the necessary global data flow
information has already been collected during the previous computation of AVOUT and ANTIN.
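Because EARLIEST is a purely local property, it can be evaluated in a single pass over the edges. The following Python sketch illustrates this for one expression. It assumes that for a non-entry source i the property is ANTINj · ¬AVOUTi · (¬TRANSPi + ¬ANTOUTi), which matches the implications used in the proof of Theorem 4; the function name and the three-block example are hypothetical.

```python
def earliest(edges, entry, antin, antout, avout, transp):
    """EARLIEST for every edge, for one expression (all inputs are dicts of bools).

    Assumption for non-entry sources i:
    EARLIEST(i, j) = ANTIN(j) and not AVOUT(i) and (not TRANSP(i) or not ANTOUT(i)).
    """
    result = {}
    for (i, j) in edges:
        value = antin[j] and not avout[i]
        if i != entry:
            value = value and (not transp[i] or not antout[i])
        result[(i, j)] = value
    return result


# Hypothetical chain 1 -> 2 -> 3: block 2 modifies an operand, block 3 computes e.
edges = [(1, 2), (2, 3)]
antin = {1: False, 2: False, 3: True}
antout = {1: False, 2: True, 3: False}
avout = {1: False, 2: False, 3: True}
transp = {1: True, 2: False, 3: True}
print(earliest(edges, 1, antin, antout, avout, transp))
# {(1, 2): False, (2, 3): True}
```

No iteration is needed here: each edge is inspected exactly once, using only the previously computed AVOUT and ANT values.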
In other words, for each path π reaching a computation of some expression e, EARLIESTi, j = TRUE
for the earliest edge on π where e is not available but where a computation of e could safely be
inserted.
As in [4], in a second phase a maximal solution of the following system of equations is
iteratively computed:
LATERINj = FALSE, if j is the entry block
LATERINj = ∏ i∈PREDj LATERi, j, otherwise
LATERi, j = LATERINi · ¬ANTLOCi + EARLIESTi, j
This is a uni-directional system which can efficiently be solved. In other words, LATER determines
whether insertions of a computation can also be done later in the flow graph with the same effect. There
is another intuitive explanation of LATER which is based on the following definition:
Definition
A path π[i, j[ is said to carry (earliest) placeability (for an expression e) if there is an edge (k, k') on π
such that EARLIESTk, k' = TRUE and TRANSPπ[k', j[ = TRUE and there is no computation (of e)
on π[k', j[.
Lemma 1 below shows that LATERINj = TRUE if and only if all paths reaching j carry placeability.
In this sense LATERINj is similar to AVINj.
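The second phase can be sketched as a forward fixed-point iteration. The Python fragment below is an illustration, not a definitive implementation. It assumes that LATERINj is the conjunction of LATERi, j over all predecessors i of j (FALSE for the entry block) and that LATERi, j = EARLIESTi, j + LATERINi · ¬ANTLOCi. The names and the diamond-shaped example graph are hypothetical, and the EARLIEST values are supplied by hand.

```python
def later(blocks, edges, entry, antloc, earliest):
    """Maximal solution of the LATER/LATERIN system for one expression.

    Assumed equations: LATERIN(j) = AND of LATER(i, j) over predecessors i of j
    (FALSE for the entry block); LATER(i, j) = EARLIEST(i, j) or
    (LATERIN(i) and not ANTLOC(i)).
    """
    preds = {b: [i for (i, j) in edges if j == b] for b in blocks}
    laterin = {b: b != entry for b in blocks}  # optimistic start
    later_edge = {e: True for e in edges}
    changed = True
    while changed:
        changed = False
        for (i, j) in edges:
            new = earliest[(i, j)] or (laterin[i] and not antloc[i])
            if new != later_edge[(i, j)]:
                later_edge[(i, j)] = new
                changed = True
        for b in blocks:
            new = b != entry and all(later_edge[(p, b)] for p in preds[b])
            if new != laterin[b]:
                laterin[b] = new
                changed = True
    return laterin, later_edge


# Hypothetical diamond 1 -> {2, 3} -> 4; e is computed in blocks 2 and 4.
blocks = [1, 2, 3, 4]
edges = [(1, 2), (1, 3), (2, 4), (3, 4)]
antloc = {1: False, 2: True, 3: False, 4: True}
earliest = {(1, 2): True, (1, 3): True, (2, 4): False, (3, 4): False}
laterin, later_edge = later(blocks, edges, 1, antloc, earliest)
print(laterin[4], later_edge[(3, 4)], later_edge[(2, 4)])  # False True False
```

In this example the earliest placement on edge (1, 3) can be postponed to edge (3, 4), while the computation in block 2 stops the postponement along the other branch.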
We consider the insertion of computations in edges rather than at the end of basic blocks. The
computation (of an expression e) is inserted in an edge (i, j) if INSERTi, j = TRUE, where:
INSERTi, j = LATERi, j · ¬LATERINj
The first computation (of an expression e) in a block i is deleted if DELETEi = TRUE, where:
DELETEi = FALSE, if i is the entry block
DELETEi = ANTLOCi · ¬LATERINi, otherwise
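Once LATER and LATERIN are known, the insertion and deletion points follow without further iteration. The Python sketch below illustrates this under the assumption that INSERTi, j = LATERi, j · ¬LATERINj and that, for a non-entry block i, DELETEi = ANTLOCi · ¬LATERINi. The function name and the example values (a diamond in which e is computed in blocks 2 and 4) are hypothetical.

```python
def insert_delete(blocks, edges, entry, antloc, laterin, later_edge):
    """INSERT per edge and DELETE per block for one expression.

    Assumed equations: INSERT(i, j) = LATER(i, j) and not LATERIN(j);
    DELETE(i) = ANTLOC(i) and not LATERIN(i), FALSE for the entry block.
    """
    insert = {(i, j): later_edge[(i, j)] and not laterin[j] for (i, j) in edges}
    delete = {b: b != entry and antloc[b] and not laterin[b] for b in blocks}
    return insert, delete


# Hypothetical diamond 1 -> {2, 3} -> 4; e is computed in blocks 2 and 4.
blocks = [1, 2, 3, 4]
edges = [(1, 2), (1, 3), (2, 4), (3, 4)]
antloc = {1: False, 2: True, 3: False, 4: True}
laterin = {1: False, 2: True, 3: True, 4: False}
later_edge = {(1, 2): True, (1, 3): True, (2, 4): False, (3, 4): True}
insert, delete = insert_delete(blocks, edges, 1, antloc, laterin, later_edge)
print([e for e in edges if insert[e]])   # [(3, 4)]
print([b for b in blocks if delete[b]])  # [4]
```

The partially redundant computation in block 4 is replaced by a single insertion on edge (3, 4), so every path through the diamond evaluates e exactly once.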
Altogether, only three global and uni-directional systems of Boolean equations need to be solved
iteratively: the systems for AVAIL, ANT, and LATER. When the original method of Morel and
Renvoise [5] is applied, three uni-directional systems of equations for AVAIL, PAVAIL (partial
availability), and ANT, and in addition the bi-directional system for PP (placement possible) have to be
solved. The method of Knoop, Rüthing, and Steffen [4] requires the solution of four uni-directional systems
of equations.
4. THEORETICAL RESULTS
The correctness and optimality of the code motion method presented in the previous section will now
be proved. Although these proofs use ideas similar to those in [4], they were developed independently.
LEMMA 1.
LATERINj = TRUE if and only if all paths π[entry, j[ reaching block j carry placeability. LATERi, j =
TRUE if and only if all paths π[entry, i] carry placeability.
PROOF. LATERINj = TRUE implies that every path π[entry, j] reaching block j contains some edge
(i, i') with EARLIESTi, i' = TRUE and that ANTLOCk = FALSE for all blocks k on π[i', j[, i.e. that there
is no computation on π[i', j[. On the other hand, EARLIESTi, i' = TRUE implies ANTINi' = TRUE, i.e.
every path [i', exit] contains a computation, say in block k, and the subpath to the first computation on
that path is transparent. From these facts we conclude TRANSPπ[i', j[ = TRUE and thus π carries
placeability. The case LATERi, j = TRUE is treated similarly.
Assume now that all paths reaching j carry placeability. It is now a direct consequence of the
definitions that a maximal solution of the system of equations for LATER and LATERIN delivers
LATERINj = TRUE. □
LEMMA 2.
Every path π = π[entry, i[ reaching a computation in a block i (i.e. ANTLOCi = TRUE) carries
availability or placeability.
If every path reaching a block i carries availability (i.e. AVIN i = TRUE) then none of them carries
placeability.
Note that a path may carry both availability and placeability. An example is a path π[1, 7[ in the flow
graph of Figure 1 below.
LEMMA 3.
A transparent path π]i, j'[ starting with an edge (i, i') and ending with an edge (j, j') such that
EARLIESTi, i' and EARLIESTj, j' are both TRUE must contain some computation.
Similarly, a transparent path π]i, j'[ starting with an edge (i, i') and ending with an edge (j, j') such
that INSERTi, i' and INSERTj, j' are both TRUE must also contain some computation.
PROOF. EARLIESTj, j' = TRUE and TRANSPj = TRUE imply ANTOUTj = FALSE. If there were no
computation on π this would propagate to block i' such that ANTINi' would be FALSE, a contradiction to
EARLIESTi, i' = TRUE.
By lemma 1 INSERTi, i' = TRUE implies that every path reaching block i carries placeability, i.e.
there is a transparent path starting with an edge (k, k'), ending at block i, and satisfying EARLIESTk, k' =
TRUE. Further, there are no computations on that path between k' and i. A similar situation holds for
(j, j'). Thus, the case for INSERT is reduced to the case for EARLIEST. □
THEOREM 1: CORRECTNESS.
Insertions are only done at places where the computation was anticipatable. Only those computations
are deleted which would be redundant when all insertions have been done. After all insertions and
deletions have been done no path contains more computations of an expression than before.
PROOF. Consider an edge (i, j) with INSERTi, j = TRUE. Lemma 1 states that every path π ending with
edge (i, j) carries placeability, i.e. there is an edge (k, k') on π such that EARLIESTk, k' = TRUE and
TRANSPπ[k', j[ = TRUE. EARLIESTk, k' = TRUE implies ANTINk' = TRUE which would be impossible
in the case of ANTINj = FALSE. Thus ANTINj = TRUE, which proves the first statement of the
theorem.
By definition ANTINj = TRUE says that each path π[j, exit] contains a computation, say in some block
k on π (i.e. ANTLOCk = TRUE), such that TRANSPπ[j, k[ = TRUE. Since we started our discussion from
an edge (i, j) with INSERTi, j = TRUE we know that LATERINj = FALSE and this propagates to k such
that we also have LATERINk = FALSE which implies DELETEk = TRUE. This proves that each insertion
is accompanied by a deletion on the same path.
Lemma 3 states that no two INSERTs reach the same computation.
It remains to be proved that all deletions are correct. Let k be a basic block with ANTLOCk = TRUE.
DELETEk = TRUE then implies LATERINk = FALSE. Lemma 2 states that all paths reaching k either
carry availability or placeability. From LATERINk = FALSE we conclude that all paths carrying
placeability contain edges for which INSERT is TRUE. After these insertions have been done the first
computation in block k is redundant. □
THEOREM 2: OPTIMALITY.
There is no placement causing fewer computations in some paths than the placement given by the values
of INSERT and DELETE.
PROOF. We only need to show that remaining transparent paths from one computation to another
cannot be prohibited by any other placement.
Assume the contrary. Then there is a transparent path π]i, j[ from one computation of an expression e
in a basic block i (COMPi = TRUE) to another in a block j (ANTLOCj = TRUE). Let k' be the earliest
block on π]i, j[ with ANTINk' = TRUE. Let k be the predecessor of k' on π. Then either k = i or
ANTOUTk = FALSE.
In case of AVOUTk = FALSE, which excludes k = i, the only safe way to make the computation in j
redundant would be to insert computations on π[k', j]. But then π would still be a transparent path from
one computation (in i) to another (the inserted one). Thus, in this case, the partial redundancy in j cannot
be eliminated by safe insertions. An example of this situation is a path [1, 7] in the flow graph of Figure
1 below.
In case of AVOUTk = TRUE all paths reaching k carry availability. Thus, by lemma 2, none of them
can carry placeability, i.e. LATERINj = FALSE. If j = k' we then have DELETEj = TRUE. Otherwise,
since ANTINk' = TRUE, TRANSP and ANTOUT are both TRUE everywhere on π[k', j] which implies
that EARLIEST must be FALSE for every edge on π[k', j]. This leads again to LATERINj = FALSE and
hence DELETEj = TRUE.
THEOREM 3.
Insertions are as late as possible while a maximal number of partial redundancies is still
eliminated.
A consequence of Theorem 3 is that live ranges of pseudo registers are kept as short as possible, leading to
minimal stress for register assignment.
PROOF. From lemma 1 we know that LATERINj = TRUE is equivalent to "all paths reaching j carry
placeability". INSERTi, j = TRUE implies LATERINj = FALSE, i.e. some path reaching j does not carry
placeability. From theorem 1 we know that paths reaching j and not carrying placeability must carry
availability. Insertions later than in edge (i, j) would then be partially redundant.
THEOREM 4.
Let e be an expression and f a sub-expression of e. Then INSERTi, j (e) => AVOUTi (f) + INSERTi, j (f).
Theorem 4 states that at places where an expression e is inserted all its sub-expressions are either also
inserted or are already available.
PROOF. We assume that at the beginning, together with the computation of e also all its sub-expressions
are computed. Further, operands of f are also operands of e. Thus, when some operands of f are modified
then also some operands of e are modified. From this we derive the following implications (see also [3]):
COMPi (e) => COMPi (f)
TRANSPi (e) => TRANSPi (f)
ANTLOCi (e) => ANTLOCi (f)
AVOUTi (e) => AVOUTi (f)
ANTINi (e) => ANTINi (f)
Further, a path carrying availability for e also carries availability for f.
From the equations of EARLIESTi, j we conclude
EARLIESTi, j (e) => EARLIESTi, j (f) + AVOUTi (f), if i is entry block
=> EARLIESTi, j (f) + AVOUTi (f) + TRANSPi (f) · ANTOUTi (f), otherwise
Thus, a path carrying placeability for e also either carries placeability or availability for f.
INSERTi, j (e) = TRUE implies that all paths reaching the exit of i carry placeability for e and there is
some path reaching j carrying availability for e. Thus, all paths reaching the exit of i also carry either
placeability or availability for f and there is some path reaching j carrying availability for f. Hence, after
all insertions have been done, f has either also been inserted in (i, j) or has become available at exit of i.
□
5. PRACTICAL RESULTS
With the new code motion method presented in this paper the example from [2] has been
recomputed. Figure 1 shows the result of the new algorithm applied to the example given in [2]. As
explained in [2], methods based on the algorithm of Morel and Renvoise would move the computation
from block 7 to the end of block 4 (redundant hoisting through a loop).
Fig. 1: Preventing redundant hoisting through a loop
The method presented in this paper also delivers EARLIEST3, 4 = TRUE. But then we have
LATER3, 4 = LATER4, 5 = LATER5, 6 = LATER6, 5 = LATER6, 7 = TRUE and thus the computation in
block 7 is not hoisted.
Next we discuss a real-world example. Figure 2 shows the flow graph of the routine
xy_round in Knuth's METAFONT as it appeared during code generation for the Siemens-Nixdorf
mainframes (compatible with the IBM 370 architecture) as target. We concentrate on three
expressions a, b, c which are compiler-generated address calculations. The original algorithm of
Morel and Renvoise would insert computations of a and b at the end of node 5 and a computation of c at
the end of node 10 (marked as "MR: ..." in Figure 2). The computations in nodes 13, 17, and 23 would
then become redundant and would be deleted. The method presented here, however, inserts computations
of a in edges (11, 17), (15, 17), (16, 17), and (18, 19) (marked as "DS: ..." in Figure 2) and deletes the
redundant computations of a in nodes 17 and 23. The computations of b and c remain in their nodes 17
and 23, respectively. This example shows that cases where the original method of Morel and Renvoise
unnecessarily increases live ranges of registers do occur in real-world programs. Besides the address
calculations a, b, and c there are many other computations in the basic blocks 5 to 23 which require many
registers. Thus, it is important to keep the live ranges of the address registers a, b, and c as short as
possible.
Edge Placements
Placing new computations in edges means introducing new basic blocks. This, however, requires
additional jump instructions. Therefore, edge placements should be prevented as far as possible.
If an edge (i, j) is the only in-edge of its destination j, i.e. |PREDj| = 1, then LATERINj = LATERi, j
and hence INSERTi, j = FALSE.
Fig. 2: Flow graph of the routine xy_round (insertions marked "MR: a, b", "MR: c", and "DS: a")
If ∏ j∈SUCCi INSERTi, j = TRUE then all the insertions can be done at the end of block i rather than in
edges (i, j). This is always true in the case of an edge which is the only out-edge of its source i, i.e.
|SUCCi| = 1. For example, in the flow graph of Figure 2 all the insertions can be done at the end of the
nodes 11, 15, 16, and 18 rather than in edges.
Thus, edge placements are at most required for critical edges, that is, edges (i, j) with |SUCCi| > 1
and also |PREDj| > 1. If the code is to be generated for a pipelined processor architecture there are often
empty delay slots at the beginning of block j or at end of block i. Computations can be inserted in such
delay slots without needing extra cycles to be executed. If all computations to be inserted in an edge (i, j)
fit in empty delay slots at the beginning of block j, then the computations should be inserted there rather
than in edge (i, j). Since then their execution will need no extra time, it does not matter that they will be
partly redundant. Similarly, if for all computations to be inserted in an edge (i, j) we have ANTOUTi and
they all fit in delay slots at the end of block i (e.g. branch delay slots), then these computations should be
inserted there rather than in edge (i, j). Again it does not matter that these computations will then be
partly redundant.
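The choice between placing a computation at the end of a block and creating a new block on an edge can be sketched as follows. This Python fragment is only an illustration of the rule stated above (insert at the end of block i when INSERT holds for all out-edges of i); it ignores the delay-slot alternative, and the function name, the string results, and the example graph are hypothetical.

```python
def placement_sites(succs, insert):
    """Decide for each edge with INSERT = TRUE where the computation can go."""
    sites = {}
    for (i, j), flag in insert.items():
        if not flag:
            continue
        # If INSERT holds on every out-edge of i, the end of block i suffices.
        if all(insert.get((i, s), False) for s in succs[i]):
            sites[(i, j)] = "end of block %d" % i
        else:
            sites[(i, j)] = "new block on edge (%d, %d)" % (i, j)
    return sites


# Hypothetical graph with entry 0: block 3 has the single successor 4,
# block 1 branches to 2 and 4, so (1, 4) is a critical edge.
succs = {0: [1, 3], 1: [2, 4], 2: [4], 3: [4], 4: []}
insert = {(0, 1): False, (0, 3): False, (1, 2): False,
          (1, 4): True, (2, 4): False, (3, 4): True}
print(placement_sites(succs, insert))
# {(1, 4): 'new block on edge (1, 4)', (3, 4): 'end of block 3'}
```

Only the insertion on the critical edge (1, 4) needs a new basic block; the insertion on (3, 4) is absorbed into block 3.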
6. CONCLUSION
Allowing edge placement and splitting the determination of insertions into two phases made it possible
to find an improved method for elimination of partial redundancies. It could be shown that the new
method is optimal, i.e. no other method is able to eliminate more partial redundancies by only safe code
motions. It was further shown that live ranges of pseudo-registers are kept as short as possible, leading to
minimal stress for register assignment. The presented method is also efficient; only three uni-directional
systems of Boolean equations must be iteratively solved to determine insertions and deletions of
computations.
With the new method, elimination of partial redundancies as a universal method for achieving several
code optimizations at once should now have become much more attractive for practical use in compilers.
ACKNOWLEDGEMENT
The authors wish to thank their supervisors W. Ffielinghaus and P. Jilek, who encouraged them to
write this paper. Further they wish to thank B. Steffen for their helpful comments on an earlier version of
this paper.
REFERENCES
1. Dhamdhere, D. M. A Fast Algorithm for Code Movement Optimisation. SIGPLAN Notices 23, 10
(1988), 172-180.
2. Dhamdhere, D. M. Practical adaptation of the global optimization algorithm of Morel and Renvoise.
ACM Trans. Program. Lang. Syst. 13, 2 (1991), 291-294.
3. Drechsler, K.-H., Stadel, M. P. A solution to a problem with Morel and Renvoise's "global
optimization by suppression of partial redundancies". ACM Trans. Program. Lang. Syst. 10, 4 (1988),
635-640.
4. Knoop, J., Rüthing, O., Steffen, B. Lazy code motion. SIGPLAN Notices 27, 7 (1992), 224-234.
5. Morel, E., and Renvoise, C. Global optimization by suppression of partial redundancies. Commun.
ACM 22, 2 (1979), 96-103, 111-126.