CD Unit-5
Machine-Independent Optimization
Cmrcet N.Nagaveni
UNIT–V
Machine-Independent Optimization: The Principal Sources of Optimization, Introduction to
Data-Flow Analysis, Foundations of Data-Flow Analysis, Constant Propagation, Partial-Redundancy
Elimination, Loops in Flow Graphs.
• There are a number of optimization techniques that a compiler can use to improve the
performance of a program without changing its output.
• An optimization technique should not alter the semantics of the program.
• Common-subexpression elimination, copy propagation, dead-code elimination, and constant
folding are common examples of such semantics-preserving transformations.
The example below shows the optimized code after eliminating the common subexpressions.
Fig (a): Before elimination Fig (b): After common-subexpression elimination
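Since the figure itself is not reproduced in these notes, the following is a small hedged sketch in C of the same idea; the variable and function names are illustrative, not taken from the figure.
/* Hedged sketch of common-subexpression elimination. */
int before_cse(int a, int b, int c)
{
    int t1 = b + c;       /* first computation of b + c */
    int d  = t1 * a;
    int t2 = b + c;       /* redundant: operands unchanged since t1 was computed */
    int e  = t2 * a;
    return d + e;
}

int after_cse(int a, int b, int c)
{
    int t1 = b + c;       /* b + c computed only once */
    int d  = t1 * a;
    int e  = t1 * a;      /* reuses t1; the assignment t2 = b + c is eliminated */
    return d + e;
}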
Example 2: Quicksort
In this example a fragment of a sorting program called quicksort is used to illustrate several
important code-improving transformations.
void quicksort(int m, int n)
/* recursively sorts a[m] through a[n] */
{
    int i, j;
    int v, x;
    if (n <= m) return;
    /* fragment begins here */
    i = m - 1; j = n; v = a[n];
    while (1) {
        do i = i + 1; while (a[i] < v);
        do j = j - 1; while (a[j] > v);
        if (i >= j) break;
        x = a[i]; a[i] = a[j]; a[j] = x;   /* swap a[i], a[j] */
    }
    x = a[i]; a[i] = a[n]; a[n] = x;       /* swap a[i], a[n] */
    /* fragment ends here */
    quicksort(m, j); quicksort(i + 1, n);
}
Figure 5.1: C code for quicksort
Intermediate code for the marked fragment of the program is shown in Fig. 5.2.
(1) i = m-1                 (16) t7 = 4*i
(2) j = n                   (17) t8 = 4*j
(3) t1 = 4*n                (18) t9 = a[t8]
(4) v = a[t1]               (19) a[t7] = t9
(5) i = i+1                 (20) t10 = 4*j
(6) t2 = 4*i                (21) a[t10] = x
(7) t3 = a[t2]              (22) goto (5)
(8) if t3 < v goto (5)      (23) t11 = 4*i
(9) j = j-1                 (24) x = a[t11]
(10) t4 = 4*j               (25) t12 = 4*i
(11) t5 = a[t4]             (26) t13 = 4*n
(12) if t5 > v goto (9)     (27) t14 = a[t13]
(13) if i >= j goto (23)    (28) a[t12] = t14
(14) t6 = 4*i               (29) t15 = 4*n
(15) x = a[t6]              (30) a[t15] = x
Figure 5.2: Three-address code for fragment in Fig. 5.1
In this example we assume that integers occupy four bytes. The assignment x = a[i] is translated into the
two three-address statements
t6 = 4*i
x = a[t6]
as shown in steps (14) and (15).
Similarly, a[ j ] = x becomes
t10 = 4*j
a[t10] = x
in steps (20) and (21).
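As a hedged illustration of the addressing arithmetic assumed above (4-byte integers), the pair t6 = 4*i; x = a[t6] corresponds to the following C; the function name and casts are only for illustration.
#include <assert.h>

int a[100];                          /* assumed global array, as in the quicksort fragment */

int read_elem(int i)
{
    assert(sizeof(int) == 4);        /* the notes assume integers occupy four bytes */
    int t6 = 4 * i;                  /* step (14): byte offset of a[i]              */
    return *(int *)((char *)a + t6); /* step (15): x = a[t6]                        */
}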
Figure 5.3 is the flow graph for the program in Fig. 5.2. Block B1 is the entry node. All conditional and
unconditional jumps to statements in Fig. 5.2 have been replaced in Fig. 5.3 by jumps to the block of
which the statements are leaders. In Fig. 5.3, there are three loops. Blocks B2 and B3 are loops by
themselves. Blocks B2, B3, B4, and B5 together form a loop, with B2 the only entry point.
(a) Before:                  (b) After:
t6 = 4*i                     t6 = 4*i
x = a[t6]                    x = a[t6]
t7 = 4*i                     t8 = 4*j
t8 = 4*j                     t9 = a[t8]
t9 = a[t8]                   a[t6] = t9
a[t7] = t9                   a[t8] = x
t10 = 4*j                    goto B2
a[t10] = x
goto B2
Figure 5.4: Local common-subexpression elimination
Figure 5.5: B5 and B6 after common-subexpression elimination
5.2.3 Copy Propagation:
It is the process of replacing the occurrences of targets of direct assignments with their values. A direct
assignment is an instruction of the form x = y, which simply assigns the value of y to x.
For example, given
y = x
z = 3 + y
copy propagation replaces the use of y with x, yielding
z = 3 + x
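A minimal hedged sketch of the same transformation in C (the function and variable names are illustrative) is shown below.
int before_copy_prop(int x)
{
    int y = x;        /* direct assignment: y is a copy of x */
    int z = 3 + y;    /* use of the copy                     */
    return z;
}

int after_copy_prop(int x)
{
    int z = 3 + x;    /* y replaced by x; the copy y = x becomes dead code */
    return z;
}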
5.2.4 Dead-Code Elimination:
Dead code is code whose result is never used or that can never be executed; removing it does not change the program's output. Consider:
int foo(void)
{
    int a = 24;
    int b = 25;      /* assignment to dead variable */
    int c;
    c = a << 2;
    return c;
    b = 24;          /* unreachable code */
    return 0;
}
Simple analysis of the uses of values would show that the value of b after the first assignment is not
used inside foo.
Furthermore, b is declared as a local variable inside foo, so its value cannot be used outside foo.
Thus, the variable b is dead and an optimizer can reclaim its storage space and eliminate its
initialization.
Furthermore, because the first return statement is executed unconditionally, no feasible execution
path reaches the second assignment to b. Thus, the assignment is unreachable and can be removed.
Example 2:
• In the example below, the value assigned to i is never used, and the dead store can be
eliminated.
• The first assignment to global is dead, and the third assignment to global is unreachable; both
can be eliminated.
int global;
void f()
{
    int i;
    i = 1;          /* dead store */
    global = 1;     /* dead store */
    global = 2;
    return;
    global = 3;     /* unreachable */
}
After dead-code elimination:
int global;
void f()
{
    global = 2;
    return;
}
5.2.5. Strength Reduction:
• Strength reduction is an optimization technique in which expensive operations are replaced
with equivalent but less expensive operations.
• The classic example of strength reduction converts "strong" multiplications inside a loop into
"weaker" additions – something that frequently occurs in array addressing. By doing this the
execution speed can be increased.
for (i = 1; i <= 5; i++)
{
    x = 4 * i;
}
The instruction x = 4*i in the loop can be replaced by the equivalent addition x = x + 4, provided x is initialized before the loop.
Code after strength reduction is shown below.
x = 0;
for (i = 1; i <= 5; i++)
    x = x + 4;
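The array-addressing case mentioned above can be sketched as follows; this is a hedged illustration assuming 4-byte integers, and the function names, array, and bound are illustrative, not part of the original notes.
/* Strength reduction of the byte offset 4*i (cf. t2 = 4*i in Fig. 5.2). */
int sum_before(int a[], int n)
{
    int s = 0;
    for (int i = 0; i < n; i++) {
        int offset = 4 * i;                 /* multiplication on every iteration */
        s += *(int *)((char *)a + offset);  /* a[i] accessed via its byte offset */
    }
    return s;
}

int sum_after(int a[], int n)
{
    int s = 0, offset = 0;
    for (int i = 0; i < n; i++) {
        s += *(int *)((char *)a + offset);  /* same access as before             */
        offset = offset + 4;                /* multiplication replaced by addition */
    }
    return s;
}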
5.3. Loop Optimization:
Loops are a very important place for optimizations, especially the inner loops where programs tend
to spend the bulk of their time. The running time of a program may be improved if we decrease the
number of instructions in an inner loop, even if we increase the amount of code outside that loop.
There are three techniques:
1. Code Motion
2. Elimination of induction variables
3. Strength reduction
1. Code Motion:
Code motion reduces the number of instructions executed inside a loop by moving instructions
outside the loop. It moves loop-invariant computations, i.e., those instructions or expressions
that produce the same value regardless of how many times the loop is executed, and places
them just before the loop begins (in the loop's preheader).
Example:
while (x != n - 2)
{
    x = x + 2;
}
Here the expression n - 2 is a loop-invariant computation, i.e., the value it evaluates to is
independent of the number of times the while loop executes; in other words, the value of n
remains unchanged inside the loop. Code motion places the computation of n - 2 before the
while loop begins, as shown below.
m = n - 2;
while (x != m)
{
    x = x + 2;
}
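2. Elimination of Induction Variables:
An induction variable is a variable whose value changes by a fixed amount on every iteration of a loop. When several induction variables move in lock-step, all but one of them, together with the code that maintains them, can be eliminated. The original figure for this example is not reproduced here; the following "before" version is a hedged reconstruction consistent with the optimized code that follows, with the arrays a and b assumed to be globals.
int a[10], b[10];       /* assumed global arrays */

void fun(void)
{
    int i, j, k;
    i = 0; j = 0; k = 0;
    while (i < 10) {
        a[j] = b[k];    /* j == i and k == i on every iteration */
        i = i + 1;
        j = j + 1;      /* redundant induction variable         */
        k = k + 1;      /* redundant induction variable         */
    }
    return;
}
After eliminating the redundant induction variables j and k, the loop uses only i: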
void fun(void)
{
    int i;
    for (i = 0; i < 10; i++)
        a[i] = b[i];
    return;
}
Thus induction-variable elimination reduces the code and improves run-time performance.
3. Strength Reduction:
• Strength reduction is an optimization technique in which expensive operations
are replaced with equivalent but less expensive operations.
• The classic example of strength reduction converts "strong" multiplications inside
a loop into "weaker" additions – something that frequently occurs in array
addressing. By doing this the execution speed can be increased. For example,
consider the code
for (i = 1; i <= 5; i++)
{
    x = 4 * i;
}
The instruction x = 4*i in the loop can be replaced by the equivalent addition instruction
x = x + 4
Fig (a): Before Instruction Scheduling.
First we apply local optimization on B1 and B2 independently, as shown in the figure above. B1 contains
common subexpressions and copies, which are eliminated. After B1, local optimization is
performed independently on B2.
After performing local optimization, global optimization is performed on B1 and B2. In this case B2
contains T6 = a+b and T7 = c+T6, whose values are already calculated in B1. So instead of recomputing
these expressions we can use T1 and T2 directly in B2. After global optimization the modified code is
shown in the figure above.
Constant Propagation
• Constants assigned to a variable can be propagated through the flow graph and substituted at
the use of the variable.
• Another important optimization to be considered is constant propagation.
• Similar to copy propagation, if a variable is assigned a constant and the variable is used in a
subsequent expression, the use can be replaced by the constant, which may make the assignment of
the constant to the variable dead code.
Consider the forward propagation of assignments of the form x = c, where c is a constant. Constant
propagation replaces each use of x with the constant c wherever possible, provided the definition x = c
reaches (is available at) the point of replacement.
Example:
In the code fragment below, the value of x can be propagated to the use of x.
x = 3;
y = x + 4;
Below is the code fragment after constant propagation and constant folding.
x = 3;
y = 7;
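A hedged C sketch of the combined effect of constant propagation, constant folding, and subsequent dead-code elimination; the function names are illustrative only.
int before_const_prop(void)
{
    int x = 3;
    int y = x + 4;      /* x is known to hold the constant 3 here */
    return y;
}

int after_const_prop(void)
{
    return 7;           /* 3 + 4 folded to 7; x eliminated as dead code */
}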
Notes:
Some compilers perform constant propagation within basic blocks; some compilers perform constant
propagation in more complex control flow.
Some compilers perform constant propagation for integer constants, but not floating-point constants.
Few compilers perform constant propagation through bitfield assignments.
Few compilers perform constant propagation for address constants through pointer assignments.
5.7. Data Flow Analysis:
Data flow analysis is a technique of gathering all the information about the program and distributing
this information to all the blocks of a flow graph.
It determines the information regarding the definition and use of data in program. This technique is
used for optimization.
• Definition Point: a point X in a program where a variable is defined (assigned a value).
• Reference Point: a point X in a program where a data item is referenced (used).
• Evaluation Point: a point X in a program where an expression is evaluated.
Available Expression: An expression is said to be available at a program point X if every path from the
entry node to X evaluates the expression, and none of its operands is redefined after the last such
evaluation before reaching X.
In the above figure the expression b*c is available in blocks B2 and B3. Available expressions are used
to eliminate common subexpressions.
Reaching Definition: A definition D reaches a point X if there is a path from the point immediately
following D to X such that D is not killed (redefined) along that path.
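A small hedged C illustration (not from the notes) of definition points, a reference point, and reaching definitions:
int use_of_d(int cond)
{
    int d = 5;          /* definition point D1                                  */
    if (cond)
        d = 10;         /* definition point D2: kills D1 along this path        */
    return d + 1;       /* reference point: both D1 and D2 reach this use,
                           D1 via the path where cond is false, D2 via the other */
}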
Foundations of Data-Flow Analysis
1 Semilattices
2 Transfer Functions
3 The Iterative Algorithm for General Frameworks
4 Meaning of a Data-Flow Solution
Having shown several useful examples of the data-flow abstraction, we now study the family of data-flow
schemas as a whole, abstractly. We shall answer several basic questions about data-flow algorithms
formally:
Under what circumstances is the iterative algorithm used in data-flow analysis correct?
In Section 9.2, we addressed each of the questions above informally when describing the reaching-
definitions problem. Instead of answering the same questions for each subsequent problem from scratch,
we relied on analogies with the problems we had already discussed to explain the new problems. Here we
present a general approach that answers all these questions, once and for all, rigorously, and for a large
family of data-flow problems. We first identify the properties desired of data-flow schemas and prove the
implications of these properties on the correctness, precision, and convergence of the data-flow algorithm,
as well as the meaning of the solution. Thus, to understand old algorithms or formulate new ones, we
simply show that the proposed data-flow problem definitions have certain properties, and the answers to all
the above difficult questions are available immediately.
The concept of having a common theoretical framework for a class of schemas also has practical
implications. The framework helps us identify the reusable components of the algorithm in
our software design. Not only is coding effort reduced, but programming errors are reduced by not having
to recode similar details several times.
1. Semilattices
A semilattice is a set V and a binary meet operator ∧ such that for all x, y, and z in V:
1. x ∧ x = x (meet is idempotent).
2. x ∧ y = y ∧ x (meet is commutative).
3. x ∧ (y ∧ z) = (x ∧ y) ∧ z (meet is associative).
A semilattice has a top element, denoted ⊤, such that for all x in V, ⊤ ∧ x = x.
Optionally, a semilattice may have a bottom element, denoted ⊥, such that for all x in V, ⊥ ∧ x = ⊥.
Partial Orders
As we shall see, the meet operator of a semilattice defines a partial order on the values of the domain. A
relation ≤ is a partial order on a set V if for all x, y, and z in V:
1. x ≤ x (the partial order is reflexive).
2. If x ≤ y and y ≤ x, then x = y (the partial order is antisymmetric).
3. If x ≤ y and y ≤ z, then x ≤ z (the partial order is transitive).
The pair (V, ≤) is called a poset, or partially ordered set. It is also convenient to have a < relation for a
poset, defined as x < y if and only if (x ≤ y) and (x ≠ y).
The Partial Order for a Semilattice
It is useful to define a partial order ≤ for a semilattice (V, ∧). For all x and y in V, we define
x ≤ y if and only if x ∧ y = x.
Because the meet operator ∧ is idempotent, commutative, and associative, the ≤ order as defined is
reflexive, antisymmetric, and transitive. To see why, observe that:
• Reflexivity: for all x, x ≤ x. The proof is that x ∧ x = x since meet is idempotent.
• Antisymmetry: if x ≤ y and y ≤ x, then x = y. In proof, x ≤ y means x ∧ y = x and y ≤ x means
y ∧ x = y. By commutativity of ∧, x = (x ∧ y) = (y ∧ x) = y.
• Transitivity: if x ≤ y and y ≤ z, then x ≤ z. In proof, x ≤ y and y ≤ z mean that x ∧ y = x and y ∧ z = y.
Then (x ∧ z) = ((x ∧ y) ∧ z) = (x ∧ (y ∧ z)) = (x ∧ y) = x, using associativity of meet. Since x ∧ z = x
has been shown, we have x ≤ z, proving transitivity.
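A small worked check (not part of the original notes), using the reaching-definitions domain where the meet is set union, shows why larger sets sit lower in this order:
% Worked check, assuming V = 2^{\{d_1,d_2\}} and meet = set union
\[
\{d_1,d_2\} \wedge \{d_1\} \;=\; \{d_1,d_2\} \cup \{d_1\} \;=\; \{d_1,d_2\}
\;\;\Longrightarrow\;\; \{d_1,d_2\} \le \{d_1\},
\]
\[
\{d_1\} \wedge \{d_1,d_2\} \;=\; \{d_1,d_2\} \;\ne\; \{d_1\}
\;\;\Longrightarrow\;\; \{d_1\} \not\le \{d_1,d_2\}.
\]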
Example 9.18: The meet operators used in the examples in Section 9.2 are set union and set intersection.
They are both idempotent, commutative, and associative. For set union, the top element is ∅ (the empty
set) and the bottom element is U (the universal set), since for any subset x of U, ∅ ∪ x = x and
U ∪ x = U. For set intersection, ⊤ is U and ⊥ is ∅. V, the domain of values of the semilattice, is the set
of all subsets of U, which is sometimes called the power set of U and denoted 2^U.
For all x and y in V, x ∪ y = x implies x ⊇ y; therefore, the partial order imposed by set union is ⊇, set
containment. Correspondingly, the partial order imposed by set intersection is ⊆, set inclusion. That is, for
set intersection, sets with fewer elements are considered to be smaller in the partial order. However, for set
union, sets with more elements are considered to be smaller in the partial order. To say that sets larger in
size are smaller in the partial order is counterintuitive; however, this situation is an unavoidable
consequence of the definitions.
As discussed in Section 9.2, there are usually many solutions to a set of data-flow equations, with the
greatest solution (in the sense of the partial order ≤) being the most precise. For example, in reaching
definitions, the most precise among all the solutions to the data-flow equations is the one with the smallest
number of definitions, which corresponds to the greatest element in the partial order defined by the meet
operation, union. In available expressions, the most precise solution is the one with the largest number of
expressions. Again, it is the greatest solution in the partial order defined by intersection as the meet
operation. •
Greatest Lower Bounds
There is another useful relationship between the meet operation and the partial ordering it imposes.
Suppose (V, ∧) is a semilattice. A greatest lower bound (or glb) of domain elements x and y is an
element g such that
1. g ≤ x,
2. g ≤ y, and
3. if z is any element such that z ≤ x and z ≤ y, then z ≤ g.
It turns out that the meet of x and y is their only greatest lower bound. To see why, let g = x ∧ y. Observe
that:
• g ≤ x because (x ∧ y) ∧ x = x ∧ y. The proof involves simple uses of associativity, commutativity, and
idempotence. That is,
g ∧ x = ((x ∧ y) ∧ x) = (x ∧ (y ∧ x)) = (x ∧ (x ∧ y)) = ((x ∧ x) ∧ y) = (x ∧ y) = g.
• g ≤ y by a similar argument.
• Suppose z is any element such that z ≤ x and z ≤ y. We claim z ≤ g, and therefore, z cannot be a glb of x
and y unless it is also g. In proof: (z ∧ g) = (z ∧ (x ∧ y)) = ((z ∧ x) ∧ y). Since z ≤ x, we know (z ∧ x) = z,
so (z ∧ g) = (z ∧ y). Since z ≤ y, we know z ∧ y = z, and therefore z ∧ g = z.
We have proven z ≤ g and conclude that g = x ∧ y is the only glb of x and y.
(Note: if we had defined the partial order to be ≥ instead of ≤, the same counterintuitive situation would
arise when the meet was intersection, although not for union.)
(Note: in symmetry to the glb operation on elements of a poset, we may define the least upper bound (or
lub) of elements x and y to be that element b such that x ≤ b, y ≤ b, and if z is any element such that x ≤ z
and y ≤ z, then b ≤ z. One can show that there is at most one such element b, if it exists. In a true lattice
there are two operations on domain elements: the meet ∧, which we have seen, and the join, denoted ∨,
which gives the lub of two elements (and therefore must always exist in the lattice). We have been
discussing only "semi" lattices, where only one of the meet and join operators exists; that is, our
semilattices are meet semilattices. One could also speak of join semilattices, where only the join operator
exists, and some literature on program analysis does use the notation of join semilattices. Since the
traditional data-flow literature speaks of meet semilattices, we do so here as well.)
Lattice Diagrams
It often helps to draw the domain V as a lattice diagram, which is a graph whose nodes are the elements
of V, and whose edges are directed downward, from x to y if y ≤ x. For example, Fig. 9.22 shows the
set V for a reaching-definitions data-flow schema where there are three definitions: d1, d2, and d3. Since ≤
is ⊇ here, an edge is directed downward from any subset of these three definitions to each of its supersets.
Since ≤ is transitive, we conventionally omit the edge from x to y as long as there is another path from x to
y left in the diagram. Thus, although {d1, d2, d3} ≤ {d1}, we do not draw this edge, since it is represented
by the path through {d1, d2}, for example.
It is also useful to note that we can read the meet off such diagrams. Since x ∧ y is the glb, it is always the
highest z for which there are paths downward to z from both x and y. For example, if x is {d1} and y is
{d2}, then z in Fig. 9.22 is {d1, d2}, which makes sense, because the meet operator is union. The top
element will appear at the top of the lattice diagram; that is, there is a path downward from ⊤ to each
element. Likewise, the bottom element will appear at the bottom, with a path downward from every
element to ⊥.
Product Lattices
While Fig. 9.22 involves only three definitions, the lattice diagram of a typical program can be quite large.
The set of data-flow values is the power set of the definitions, which therefore contains 2^n elements if
there are n definitions in the program. However, whether a definition reaches a program point is
independent of the reachability of the other definitions. We may thus express the lattice of definitions in
terms of a "product lattice," built from one simple lattice for each definition. That is:
1. If there were only one definition d in the program, then the lattice would have two elements: {}, the
empty set, which is the top element, and {d}, which is the bottom element.
2. The meet ∧ for the product lattice of (A, ∧_A) and (B, ∧_B) is defined as follows. If (a, b) and (a', b')
are domain elements of the product lattice, then
(a, b) ∧ (a', b') = (a ∧_A a', b ∧_B b')                    (9.19)
and the partial order is
(a, b) ≤ (a', b') if and only if a ≤_A a' and b ≤_B b'.     (9.20)
So we might ask under what circumstances does (a ∧_A a', b ∧_B b') = (a, b)? That happens exactly when
a ∧_A a' = a and b ∧_B b' = b. But these two conditions are the same as a ≤_A a' and b ≤_B b'.
The product of lattices is an associative operation, so one can show that the rules (9.19) and (9.20) extend
to any number of lattices. That is, if we are given lattices (A_i, ∧_i) for i = 1, 2, ..., k, then the product of
all k lattices, in this order, has domain A_1 × A_2 × ... × A_k, a meet operator defined by
(a_1, a_2, ..., a_k) ∧ (b_1, b_2, ..., b_k) = (a_1 ∧_1 b_1, a_2 ∧_2 b_2, ..., a_k ∧_k b_k)
and a partial order defined by
(a_1, a_2, ..., a_k) ≤ (b_1, b_2, ..., b_k) if and only if a_i ≤ b_i for all i.
Height of a Semilattice
We may learn something about the rate of convergence of a data-flow analysis algorithm by studying the
"height" of the associated semilattice. An ascending chain in a poset (V, <) is a sequence where x1 < x2 <
... < xn. The height of a semilattice is the largest number of < relations in any ascending chain; that is, the
height is one less than the number of elements in the chain. For
example, the height of the reaching definitions semilattice for a program with
n definitions is n.
Showing convergence of an iterative data-flow algorithm is much easier if the semilattice has finite height.
Clearly, a lattice consisting of a finite set of values will have a finite height; it is also possible for a lattice
with an infinite number of values to have a finite height. The lattice used in the constant propagation
algorithm is one such example that we shall examine closely in Section 9.4.
2. Transfer Functions
The family of transfer functions F : V → V in a data-flow framework has the following properties:
1. F has an identity function I, such that I(x) = x for all x in V.
2. F is closed under composition; that is, for any two functions f and g in F, the function h defined by
h(x) = g(f(x)) is in F.
To make an iterative algorithm for data-flow analysis work, we need the data-flow framework to satisfy
one more condition. We say that a framework is monotone if, when we apply any transfer function f in F to
two members of V, the first being no greater than the second, then the first result is no greater than the
second result. Formally, f is monotone if x ≤ y implies f(x) ≤ f(y); an equivalent condition is
f(x ∧ y) ≤ f(x) ∧ f(y).
3. The Iterative Algorithm for General Frameworks
We can generalize Algorithm 9.11 to make it work for a large variety of data-flow problems.
Algorithm 9.25: Iterative solution to general data-flow frameworks.
INPUT: A data-flow framework with the following components:
1. A data-flow graph, with specially labeled ENTRY and EXIT nodes,
2. A direction of the data-flow D,
3. A set of values V,
4. A meet operator ∧,
5. A set of functions F, where f_B in F is the transfer function for block B, and
6. A constant value v_ENTRY or v_EXIT in V, representing the boundary condition for forward and
backward frameworks, respectively.
OUTPUT: Values in V for IN[B] and OUT[B] for each block B in the data-flow graph.
METHOD: The algorithms for solving forward and backward data-flow problems are shown in Fig.
9.23(a) and 9.23(b), respectively. As with the familiar iterative data-flow algorithms from Section 9.2, we
compute IN and OUT for each block by successive approximation.
It is possible to write the forward and backward versions of Algorithm 9.25 so that a function
implementing the meet operation is a parameter, as is a function that implements the transfer function for
each block. The flow graph itself and the boundary value are also parameters. In this way, the compiler
implementor can avoid recoding the basic iterative algorithm for each data-flow framework used by the
optimization phase of the compiler.
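Figure 9.23(a) is not reproduced in these notes; the following is a hedged C sketch of the same successive-approximation idea for a reaching-definitions-style forward framework, where values are bit sets and the meet is set union. The Block and BitSet types, the gen/kill fields, and NUM_BLOCKS are assumptions for illustration, not a fixed API.
#include <stdbool.h>

#define NUM_BLOCKS 8             /* assumed number of blocks; block 0 is ENTRY      */

typedef unsigned int BitSet;     /* one bit per definition (at most 32 definitions) */

typedef struct {
    int    npred;                /* number of predecessors                          */
    int    pred[NUM_BLOCKS];     /* predecessor block indices                       */
    BitSet gen, kill;            /* parameters of this block's transfer function    */
} Block;

static BitSet meet(BitSet a, BitSet b)            { return a | b; }  /* union       */
static BitSet transfer(const Block *b, BitSet in) { return b->gen | (in & ~b->kill); }

void solve_forward(Block blocks[NUM_BLOCKS], BitSet IN[NUM_BLOCKS], BitSet OUT[NUM_BLOCKS])
{
    OUT[0] = 0;                                  /* boundary value v_ENTRY               */
    for (int b = 1; b < NUM_BLOCKS; b++)
        OUT[b] = 0;                              /* initialize every OUT to TOP (empty set) */

    bool changed = true;
    while (changed) {                            /* successive approximation             */
        changed = false;
        for (int b = 1; b < NUM_BLOCKS; b++) {
            BitSet in = 0;                       /* TOP is the identity of the meet      */
            for (int p = 0; p < blocks[b].npred; p++)
                in = meet(in, OUT[blocks[b].pred[p]]);
            IN[b] = in;
            BitSet out = transfer(&blocks[b], in);
            if (out != OUT[b]) { OUT[b] = out; changed = true; }
        }
    }
}
In a real implementation the meet and transfer functions would be passed as parameters, exactly as the paragraph above suggests, so that the same loop serves every framework.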
We can use the abstract framework discussed so far to prove a number of useful properties of the iterative
algorithm:
• If the framework is monotone, then the solution found is the maximum fixedpoint (MFP) of the
data-flow equations. A maximum fixedpoint is a solution with the property that in any other solution, the
values of IN[B] and OUT[B] are ≤ the corresponding values of the MFP.
• If the framework is monotone and its semilattice is of finite height, then the algorithm is guaranteed to
converge.
We shall argue these points assuming that the framework is forward. The case of backwards frameworks is
essentially the same. The first property is easy to show. If the equations are not satisfied by the time the
while-loop ends, then there will be at least one change to an OUT (in the forward case) or IN (in the
backward case), and we must go around the loop again.
4. Meaning of a Data-Flow Solution
We now know that the solution found using the iterative algorithm is the maximum fixedpoint, but what
does the result represent from a program-semantics point of view? To understand the solution of a data-
flow framework (D, F, V, ∧), let us first describe what an ideal solution to the framework would be. We
show that the ideal cannot be obtained in general, but that Algorithm 9.25 approximates the ideal
conservatively.
5.8. Partial Redundancy Elimination
Partial-redundancy elimination subsumes common-subexpression elimination and loop-invariant code
motion.
Fully redundant expressions are computed more than once along every path, without any change in
operands, whereas partially redundant expressions are computed more than once along only some paths,
again without any change in operands.
For example, loop-invariant code is partially redundant and can be eliminated by using a code-motion
technique.
Another example of partially redundant code:
if (condition)
{
    a = y OP z;
}
else
{
    ...
}
c = y OP z;
We assume that the values of operands (y and z) are not changed from assignment of variable a to
variable c. Here, if the condition statement is true, then y OP z is computed twice, otherwise once. Code
motion can be used to eliminate this redundancy, as shown below:
if (condition)
{
...
tmp = y OP z;
a = tmp;
...
}
else
{
...
tmp = y OP z;
}
c = tmp;
Here, whether the condition is true or false, y OP z is computed only once.
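The next example refers to a fragment whose "before" version is not shown in these notes; the following is a hedged reconstruction, inferred from the optimized code that follows.
if (condition)
{
    // code which does not alter j
    i = j + 1;          /* j + 1 computed here on this path  */
}
else
{
    // code which does not alter j
}
k = j + 1;              /* ... and computed again here       */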
This code is partially redundant, as the expression j + 1 may be computed twice depending on the condition.
To optimize this code, we can apply partial-redundancy elimination (inserting the computation on the path
where it was missing and reusing it as a common subexpression) to produce the following optimized code:
if (condition)
{
    // code which does not alter j
    n = j + 1;
    i = n;
}
else
{
    // code which does not alter j
    n = j + 1;
}
k = n;
Here, we have removed the redundant computation of j + 1 in k = j + 1 by storing j + 1 in the temporary
variable n, so that j + 1 is computed only once along any path, improving performance.
5.9. Loops in Flow Graph
• Loops are important because programs spend most of their time executing them, and
optimizations that improve the performance of loops can have a significant impact.
• Thus, it is essential that we identify loops and treat them specially. Loops also affect the running
time of program analyses.
• Loop analysis is based on the dominators. Dominators are used to determine the loops in a
control flow graph.
• Dominators: We say node d of a flow graph dominates node n, written d dom n, if every path
from the entry node of the flow graph to n goes through d. Note that under this definition, every
node dominates itself.
Consider the flow graph in the figure above, with entry node 1. The entry node dominates every node
(this statement is true for every flow graph).
Node 2 dominates only itself, since control can reach any other node along a path that begins with 1 -
> 3.
Node 3 dominates all but 1 and 2.
Node 4 dominates all but 1, 2 and 3, since all paths from 1 must begin with 1 → 2 → 3 → 4 or 1 → 3 → 4.
Nodes 5 and 6 dominate only themselves, since flow of control can skip around either by going
through the other.
Finally, 7 dominates 7, 8, 9, and 10; 8 dominates 8, 9, and 10; 9 and 10 dominate only themselves.
Dominator information can be represented in a tree called the dominator tree. In this tree, the entry
node is the root, and each node d dominates only its descendants.
Fig. Dominator Tree
The existence of dominator trees follows from a property of dominators: each node n has a unique
immediate dominator m that is the last dominator of n (other than n itself) on any path from the entry
node to n.
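Although the notes do not give an algorithm, dominators can be computed with the same iterative style as the data-flow frameworks above, using D(n) = {n} ∪ (intersection of D(p) over all predecessors p of n). A hedged C sketch follows; node 0 is assumed to be the entry, and the array sizes and names are illustrative.
#include <stdbool.h>

#define NUM_NODES 10                        /* assumed node count; node 0 is the entry   */

typedef unsigned int NodeSet;               /* bit i is set  <=>  node i is in the set    */

void dominators(int npred[NUM_NODES], int pred[NUM_NODES][NUM_NODES], NodeSet dom[NUM_NODES])
{
    NodeSet all = (NodeSet)((1u << NUM_NODES) - 1);
    dom[0] = 1u << 0;                       /* the entry node is dominated only by itself */
    for (int n = 1; n < NUM_NODES; n++)
        dom[n] = all;                       /* start from the full set and shrink         */

    bool changed = true;
    while (changed) {
        changed = false;
        for (int n = 1; n < NUM_NODES; n++) {
            NodeSet d = all;
            for (int p = 0; p < npred[n]; p++)
                d &= dom[pred[n][p]];       /* meet = intersection over predecessors      */
            d |= 1u << n;                   /* every node dominates itself                */
            if (d != dom[n]) { dom[n] = d; changed = true; }
        }
    }
}
The immediate dominator of each node (and hence the dominator tree) can then be read off as the closest strict dominator in the resulting sets.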