CTCD Unit 5

Unit V - CODE OPTIMIZATION

1. PRINCIPAL SOURCES OF OPTIMISATION

A transformation of a program is called local if it can be performed by looking only at the statements in a basic block; otherwise, it is called global.

Many transformations can be performed at both the local and global levels. Local
transformations are usually performed first.

Function-Preserving Transformations

There are a number of ways in which a compiler can improve a program without changing
the function it computes.

The transformations

• Common subexpression elimination,
• Copy propagation,
• Dead-code elimination, and
• Constant folding

are common examples of such function-preserving transformations. The other transformations come up primarily when global optimizations are performed.

• Frequently, a program will include several calculations of the same value, such as an
offset in an array. Some of the duplicate calculations cannot be avoided by the
programmer because they lie below the level of detail accessible within the source
language.
✓ Common Subexpression Elimination:
• An occurrence of an expression E is called a common sub-expression if E was previously
computed, and the values of variables in E have not changed since the previous
computation. We can avoid recomputing the expression if we can use the previously
computed value.
• For example

t1 := 4*i

t2 := a[t1]

t3 := 4*j

t4 := 4*i

t5 := n

t6 := b[t4] + t5

The above code can be optimized using common subexpression elimination as

t1 := 4*i

t2 := a[t1]

t3 := 4*j

t5 := n

t6 := b[t1] + t5

The common subexpression t4 := 4*i is eliminated because its value has already been computed into t1, and the value of i has not changed between that computation and this use.
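The transformation can be mechanized by remembering, for each (operator, operand, operand) triple computed in the block, the name that currently holds its value, and by discarding an entry whenever one of its operands is reassigned. The following is a minimal Python sketch of that idea, using a hypothetical tuple form (target, op, arg1, arg2) for three-address instructions; it is an illustration, not a complete pass.

# Local common-subexpression elimination over one basic block.
# Instructions are hypothetical (target, op, arg1, arg2) tuples.
def eliminate_common_subexpressions(block):
    available = {}          # (op, arg1, arg2) -> name currently holding that value
    optimized = []
    for target, op, a1, a2 in block:
        key = (op, a1, a2)
        if key in available:
            # The expression was computed earlier and its operands are
            # unchanged since then, so reuse the stored result.
            optimized.append((target, "copy", available[key], None))
        else:
            optimized.append((target, op, a1, a2))
        # Assigning to `target` invalidates expressions that use it and
        # expressions whose value was held only in `target`.
        available = {k: v for k, v in available.items()
                     if target not in (k[1], k[2]) and v != target}
        if target not in (a1, a2):
            available.setdefault(key, target)
    return optimized

if __name__ == "__main__":
    block = [("t1", "*", "4", "i"),
             ("t2", "[]", "a", "t1"),
             ("t3", "*", "4", "j"),
             ("t4", "*", "4", "i"),       # 4*i is a common subexpression
             ("t6", "[]", "b", "t4")]
    for instr in eliminate_common_subexpressions(block):
        print(instr)

Here t4 := 4*i is rewritten as a copy of t1; a following copy-propagation pass (the next transformation) would then replace t4 by t1 in t6 := b[t4], giving the optimized code shown above.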

✓ Copy Propagation:

• Assignments of the form f := g are called copy statements, or copies for short. The idea behind the copy-propagation transformation is to use g for f, wherever possible after the copy statement f := g. In other words, copy propagation means using one variable in place of another.

This may not appear to be an improvement, but as we shall see it gives us an opportunity to eliminate x.

• For example:

x = Pi;
……
A = x * r * r;

The optimization using copy propagation can be done as follows:

A = Pi * r * r;

Here the variable x is eliminated.
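Within a basic block, copy propagation can be sketched as maintaining a table of active copies f := g and substituting g for later uses of f until either name is reassigned. A minimal Python sketch, reusing the hypothetical tuple instruction form from the earlier example:

# Minimal local copy propagation: after f := g, later uses of f are
# rewritten to g until either f or g is reassigned.
def propagate_copies(block):
    copy_of = {}                      # f -> g for active copies f := g
    result = []
    for target, op, a1, a2 in block:
        a1 = copy_of.get(a1, a1)      # substitute sources for copied names
        a2 = copy_of.get(a2, a2)
        result.append((target, op, a1, a2))
        # Reassigning a name invalidates copies that involve it.
        copy_of = {f: g for f, g in copy_of.items()
                   if target != f and target != g}
        if op == "copy":
            copy_of[target] = a1
    return result

if __name__ == "__main__":
    block = [("x", "copy", "Pi", None),
             ("A", "*", "x", "r"),     # becomes A := Pi * r
             ("A", "*", "A", "r")]
    print(propagate_copies(block))

After this, x := Pi is no longer used and can be removed by the next transformation, dead-code elimination.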

✓ Dead-Code Elimination:

• A variable is live at a point in a program if its value can be used subsequently; otherwise, it is dead at that point. A related idea is dead or useless code: statements that compute values that never get used. While the programmer is unlikely to introduce any dead code intentionally, it may appear as the result of previous transformations. An optimization can be done by eliminating dead code.

Example:

i = 0;

if (i == 1)

a = b + 5;

Here, the ‘if’ statement is dead code because its condition can never be satisfied.
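For assignments inside a basic block, dead code can be detected by a backward scan: keep an assignment only if its target is used later in the block or is live on exit, assuming the statement has no side effects. A minimal Python sketch under those assumptions, using the same hypothetical tuple form:

# Minimal local dead-code elimination: walk the block backwards and keep an
# assignment only if its target is used later in the block or is live on exit.
def eliminate_dead_code(block, live_on_exit):
    live = set(live_on_exit)
    kept = []
    for target, op, a1, a2 in reversed(block):
        if target in live:
            kept.append((target, op, a1, a2))
            live.discard(target)
            live.update(x for x in (a1, a2) if x is not None)
        # otherwise the assignment is dead and is dropped
    return list(reversed(kept))

if __name__ == "__main__":
    block = [("x", "copy", "Pi", None),
             ("A", "*", "Pi", "r"),
             ("A", "*", "A", "r")]
    # x is never used afterwards, so x := Pi is removed.
    print(eliminate_dead_code(block, live_on_exit={"A"}))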

✓ Constant folding:

• When the value of an expression is known at compile time, tests and code that depend only on that value can be eliminated from the object code. More generally, deducing at compile time that the value of an expression is a constant and using the constant instead is known as constant folding.
• One advantage of copy propagation is that it often turns the copy statement into dead code.
• For example,
a = 3.14157/2 can be replaced by

a = 1.570785, thereby eliminating a division operation.
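A sketch of folding in the same hypothetical tuple IR: an arithmetic instruction whose operands are both numeric literals is replaced at compile time by a copy of the computed value. The literal-as-string convention is an assumption for this illustration.

# Minimal constant folding: evaluate operations whose operands are literals
# at compile time and replace them with the resulting constant.
OPS = {"+": lambda a, b: a + b, "-": lambda a, b: a - b,
       "*": lambda a, b: a * b, "/": lambda a, b: a / b}

def is_literal(x):
    try:
        float(x)
        return True
    except (TypeError, ValueError):
        return False

def fold_constants(block):
    folded = []
    for target, op, a1, a2 in block:
        if op in OPS and is_literal(a1) and is_literal(a2):
            value = OPS[op](float(a1), float(a2))
            folded.append((target, "copy", repr(value), None))
        else:
            folded.append((target, op, a1, a2))
    return folded

if __name__ == "__main__":
    # a := 3.14157 / 2 becomes a := 1.570785, removing the runtime division.
    print(fold_constants([("a", "/", "3.14157", "2")]))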

✓ Loop Optimizations:

• We now give a brief introduction to a very important place for optimizations, namely
loops, especially the inner loops where programs tend to spend the bulk of their time.
The running time of a program may be improved if we decrease the number of
instructions in an inner loop, even if we increase the amount of code outside that loop.
• Three techniques are important for loop optimization:
I. Code motion, which moves code outside a loop;
II. Induction-variable elimination, which eliminates redundant induction variables from inner loops;
III. Reduction in strength, which replaces an expensive operation by a cheaper one, such as a multiplication by an addition.

✓ Code Motion:

• An important modification that decreases the amount of code in a loop is code motion.

This transformation takes an expression that yields the same result independent of the
number of times a loop is executed ( a loop-invariant computation) and places the expression
before the loop. Note that the notion “before the loop” assumes the existence of an entry for
the loop. For example, evaluation of limit-2 is a loop-invariant computation in the following
while-statement:

while (i <= limit-2) /* statement does not change limit*/

Code motion will result in the equivalent of

t= limit-2;

while (i<=t) /* statement does not change limit or t */

✓ Induction Variables :

• Loops are usually processed inside out. For example, consider the loop around B3 in the flow graph of the figure.
• Note that the values of j and t4 remain in lock-step; every time the value of j decreases by 1, that of t4 decreases by 4, because 4*j is assigned to t4. Such identifiers are called induction variables.
• When there are two or more induction variables in a loop, it may be possible to get rid of all but one by the process of induction-variable elimination. For the inner loop around B3 in the figure we cannot get rid of either j or t4 completely; t4 is used in B3 and j in B4.

However, we can illustrate reduction in strength and a part of the process of induction-variable elimination. Eventually j will be eliminated when the outer loop of B2 - B5 is considered.

Example:

As the relationship t4 = 4*j surely holds after such an assignment to t4 in the figure, and t4 is not changed elsewhere in the inner loop around B3, it follows that just after the statement j := j-1 the relationship t4 = 4*j - 4 must hold. We may therefore replace the assignment t4 := 4*j by t4 := t4 - 4. The only problem is that t4 does not have a value when we enter block B3 for the first time. Since we must maintain the relationship t4 = 4*j on entry to block B3, we place an initialization of t4 at the end of the block where j itself is initialized, shown by the dashed addition to block B1 in the second figure.

The replacement of a multiplication by a subtraction will speed up the object code if multiplication takes more time than addition or subtraction, as is the case on many machines.
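The effect of this transformation can be checked independently of any compiler: recomputing 4*j on every iteration is equivalent to initializing t4 once, where j is initialized, and subtracting 4 each time j is decremented. The Python sketch below verifies that equivalence on a stand-in loop; the loop bound and the trace are illustrative assumptions, not the quicksort fragment of the missing figure.

# Demonstration that the strength-reduced form keeps t4 = 4*j in lock-step
# with the induction variable j.
def original(n):
    trace = []
    j = n
    while j > 0:
        j = j - 1
        t4 = 4 * j            # multiplication on every iteration
        trace.append(t4)
    return trace

def strength_reduced(n):
    trace = []
    j = n
    t4 = 4 * j                # initialized once, where j is initialized
    while j > 0:
        j = j - 1
        t4 = t4 - 4           # cheaper subtraction replaces 4*j
        trace.append(t4)
    return trace

assert original(10) == strength_reduced(10)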

✓ Reduction in Strength:
• Reduction in strength replaces expensive operations by equivalent cheaper ones on
the target machine. Certain machine instructions are considerably cheaper than others
and can often be used as special cases of more expensive operators.
• For example, x² is invariably cheaper to implement as x*x than as a call to an
exponentiation routine. Fixed-point multiplication or division by a power of two is
cheaper to implement as a shift. Floating-point division by a constant can be
implemented as multiplication by a constant, which may be cheaper.
2. OPTIMIZATION OF BASIC BLOCKS

There are two types of basic block optimizations. They are:

✓ Structure-Preserving Transformations
✓ Algebraic Transformations

Structure-Preserving Transformations:
The primary structure-preserving transformations on basic blocks are:

✓ Common subexpression elimination
✓ Dead code elimination
✓ Renaming of temporary variables
✓ Interchange of two independent adjacent statements

➢ Common sub-expression elimination:

Common subexpressions need not be computed over and over again. Instead, they can be computed once, kept in store, and referenced from there when encountered again, provided of course that the values of the variables in the expression have not changed in between.
Example:
a := b+c
b := a-d
c := b+c
d := a-d

This basic block can be transformed to

a := b+c
b := a-d
c := b+c
d := b

The second occurrence of a-d is a common subexpression (neither a nor d changes between the two computations), so d := a-d becomes d := b. The second occurrence of b+c is not, because b is redefined by the second statement.

➢ Dead code elimination:

It is possible that a large amount of dead (useless) code exists in a program. This is especially likely when variables or procedures introduced during the construction or error-correction of a program are, once declared and defined, never removed even though they no longer serve any purpose. Eliminating such code definitely optimizes the program.

➢ Renaming of temporary variables:

• A statement t := b+c, where t is a temporary name, can be changed to u := b+c, where u is another temporary name, provided all uses of t are changed to u.
• In this way we can transform a basic block into an equivalent block called a normal-form block.

➢ Interchange of two independent adjacent statements:

• Two adjacent statements
t1 := b+c

t2 := x+y

can be interchanged or reordered within the basic block when they are independent, that is, when neither x nor y is t1 and neither b nor c is t2, so that the value of t1 does not affect the value of t2 and vice versa.

➢ Algebraic Transformations:
• Algebraic identities represent another important class of optimizations on basic blocks. This includes simplifying expressions or replacing expensive operations by cheaper ones, i.e. reduction in strength.
• Another class of related optimizations is constant folding. Here we evaluate constant expressions at compile time and replace the constant expressions by their values. Thus the expression 2*3.14 would be replaced by 6.28.
• The relational operators <=, >=, <, >, and = sometimes generate unexpected common subexpressions.
• Associative laws may also be applied to expose common subexpressions. For example, if the source code has the assignments
a := b+c
e := c+d+b

the following intermediate code may be generated:

a := b+c
t := c+d
e := t+b
• Example:
x := x+0 can be removed
x := y**2 can be replaced by the cheaper statement x := y*y

3. PEEPHOLE OPTIMIZATION

• A statement-by-statement code-generation strategy often produces target code that contains redundant instructions and suboptimal constructs. The quality of such target code can be improved by applying “optimizing” transformations to the target program.
• A simple but effective technique for improving the target code is peephole optimization, a method for trying to improve the performance of the target program by examining a short sequence of target instructions (called the peephole) and replacing these instructions by a shorter or faster sequence, whenever possible.
• The peephole is a small, moving window on the target program. The code in the peephole need not be contiguous, although some implementations do require this. It is characteristic of peephole optimization that each improvement may spawn opportunities for additional improvements.
• We shall give the following examples of program transformations that are characteristic of peephole optimizations:

✓ Redundant-instruction elimination
✓ Flow-of-control optimizations
✓ Algebraic simplifications
✓ Use of machine idioms
✓ Unreachable Code

Redundant Loads and Stores:

If we see the instruction sequence
(1) MOV R0,a
(2) MOV a,R0
we can delete instruction (2), because whenever (2) is executed, (1) will ensure that the value of a is already in register R0. If (2) had a label, we could not be sure that (1) was always executed immediately before (2), and so we could not remove (2).
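A sketch of how such a peephole check might be implemented, assuming target instructions are kept as (label, text) pairs in which text looks like "MOV R0,a"; both the representation and the pass are illustrative, not a production peephole optimizer.

# Peephole pass that removes a redundant "MOV a,R0" when it immediately
# follows "MOV R0,a". A load that carries a label is kept, since control
# may reach it from elsewhere.
def parse_mov(text):
    if not text.startswith("MOV "):
        return None
    src, _, dst = text[4:].partition(",")
    return src.strip(), dst.strip()

def remove_redundant_loads(code):
    out = []
    for label, text in code:
        if out and label is None:
            prev = parse_mov(out[-1][1])
            cur = parse_mov(text)
            # "MOV R0,a" followed by "MOV a,R0": the value of a is already
            # in R0, so the second instruction can be deleted.
            if prev and cur and prev == (cur[1], cur[0]):
                continue
        out.append((label, text))
    return out

if __name__ == "__main__":
    code = [(None, "MOV R0,a"), (None, "MOV a,R0"), (None, "ADD R0,b")]
    print(remove_redundant_loads(code))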

Unreachable Code:
• Another opportunity for peephole optimization is the removal of unreachable instructions. An unlabeled instruction immediately following an unconditional jump may be removed. This operation can be repeated to eliminate a sequence of instructions. For example, for debugging purposes, a large program may have within it certain segments that are executed only if a variable debug is 1. In C, the source code might look like:
#define debug 0
….
if ( debug ) {
Print debugging information
}
• In the intermediate representation the if-statement may be translated as:

if debug = 1 goto L1
goto L2
L1: print debugging information
L2: …………………………(a)

• One obvious peephole optimization is to eliminate jumps over jumps. Thus, no matter what the value of debug, (a) can be replaced by:
if debug ≠ 1 goto L2
print debugging information
L2: ……………………………(b)
• Since debug is defined to be the constant 0, constant propagation replaces the test in (b), giving:
if 0 ≠ 1 goto L2
print debugging information
L2: ……………………………(c)
• As the argument of the first statement of (c) evaluates to a constant true, it can be replaced by goto L2. Then all the statements that print debugging information are manifestly unreachable and can be eliminated one at a time.
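A sketch of that final elimination step, assuming the intermediate code is a list of strings in which a label is a separate "Lk:" entry: every unlabeled instruction that follows an unconditional goto, up to the next label, can be dropped.

# Remove unreachable instructions: anything after an unconditional goto and
# before the next label cannot be executed.
def remove_unreachable(code):
    out = []
    skipping = False
    for instr in code:
        if instr.endswith(":"):            # a label: control may reach here
            skipping = False
        if not skipping:
            out.append(instr)
        if instr.startswith("goto "):      # fall-through after this is dead
            skipping = True
    return out

if __name__ == "__main__":
    # After the constant-true test is replaced by goto L2, the print is dead:
    code = ["goto L2", "print debugging information", "L2:", "halt"]
    print(remove_unreachable(code))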

Flow-of-Control Optimizations:
• The unnecessary jumps can be eliminated in either the intermediate code or
the target code by the following types of peephole optimizations. We can
replace the jump sequence
goto L1
….
L1: goto L2

by the sequence
goto L2
….
L1: goto L2

• If there are now no jumps to L1, then it may be possible to eliminate the statement L1: goto L2, provided it is preceded by an unconditional jump. Similarly, the sequence

if a < b goto L1
….
L1: goto L2

can be replaced by

if a < b goto L2
….
L1: goto L2

• Finally, suppose there is only one jump to L1 and L1 is preceded by an unconditional goto. Then the sequence

goto L1
……..
L1: if a < b goto L2
L3: …………………………………..(1)

may be replaced by

if a < b goto L2
goto L3
…….
L3: ………………………………….(2)

• While the number of instructions in (1) and (2) is the same, we sometimes skip
the unconditional jump in (2), but never in (1). Thus (2) is superior to (1) in
execution time.
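A sketch of the first of these replacements, assuming the intermediate code is a list of strings in which a label is a separate "Lk:" entry: a jump whose target label is immediately followed by an unconditional goto is retargeted to the final destination. Removing a now-unreferenced "L1: goto L2" would be a separate pass.

# Retarget jumps whose destination is an unconditional jump:
# "goto L1" where L1: goto L2 becomes "goto L2".
def collapse_jump_chains(code):
    # Map each label to the instruction that immediately follows it.
    follows = {}
    for i, instr in enumerate(code):
        if instr.endswith(":") and i + 1 < len(code):
            follows[instr[:-1]] = code[i + 1]

    def final_target(label, seen=()):
        nxt = follows.get(label, "")
        if nxt.startswith("goto ") and label not in seen:
            return final_target(nxt[5:].strip(), seen + (label,))
        return label

    out = []
    for instr in code:
        if instr.startswith("goto ") or " goto " in instr:
            head, _, label = instr.rpartition("goto ")
            instr = head + "goto " + final_target(label.strip())
        out.append(instr)
    return out

if __name__ == "__main__":
    code = ["goto L1", "x := x + 1", "L1:", "goto L2", "L2:", "halt"]
    print(collapse_jump_chains(code))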

Algebraic Simplification:

• There is no end to the amount of algebraic simplification that can be attempted through peephole optimization. Only a few algebraic identities occur frequently enough that it is worth considering implementing them. For example, statements such as
x := x + 0
or
x := x * 1
are often produced by straightforward intermediate code-generation algorithms, and they can be eliminated easily through peephole optimization.

Reduction in Strength:

• Reduction in strength replaces expensive operations by equivalent cheaper ones on the target machine. Certain machine instructions are considerably cheaper than others and can often be used as special cases of more expensive operators.
• For example, x² is invariably cheaper to implement as x*x than as a call to an exponentiation routine. Fixed-point multiplication or division by a power of two is cheaper to implement as a shift. Floating-point division by a constant can be implemented as multiplication by a constant, which may be cheaper.
x² → x*x
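A sketch of these two replacements on the hypothetical tuple IR used earlier; the shift rewrite assumes fixed-point (integer) operands, and whether it is actually cheaper depends on the target machine.

# Strength reduction on the tuple IR: x**2 becomes x*x, and fixed-point
# multiplication by a power of two becomes a left shift.
def is_power_of_two(s):
    return s.isdigit() and int(s) > 0 and int(s) & (int(s) - 1) == 0

def reduce_strength(block):
    out = []
    for target, op, a1, a2 in block:
        if op == "**" and a2 == "2":
            out.append((target, "*", a1, a1))          # x**2 -> x*x
        elif op == "*" and is_power_of_two(a2):
            shift = int(a2).bit_length() - 1
            out.append((target, "<<", a1, str(shift))) # x*2^k -> x<<k
        else:
            out.append((target, op, a1, a2))
    return out

if __name__ == "__main__":
    print(reduce_strength([("y", "**", "x", "2"), ("z", "*", "i", "8")]))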

Use of Machine Idioms:

• The target machine may have hardware instructions to implement certain specific operations efficiently. For example, some machines have auto-increment and auto-decrement addressing modes. These add or subtract one from an operand before or after using its value.
• The use of these modes greatly improves the quality of code when pushing or popping a stack, as in parameter passing. These modes can also be used in code for statements like i := i+1.
i := i+1 → i++
i := i-1 → i--

4. THE DAG REPRESENTATION FOR BASIC BLOCKS

• A DAG for a basic block is a directed acyclic graph with the following labels on its nodes:
1. Leaves are labeled by unique identifiers, either variable names or constants.
2. Interior nodes are labeled by an operator symbol.
3. Nodes may also be given a sequence of identifiers as labels, recording the names that hold the computed value.
• DAGs are useful data structures for implementing transformations on basic blocks.
• A DAG gives a picture of how the value computed by a statement is used in subsequent statements.
• It provides a good way of determining common subexpressions.
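A sketch of DAG construction for a basic block in Python, again over the hypothetical (target, op, arg1, arg2) tuples: leaves are created for the initial values of names, an interior node is reused when an identical (operator, children) combination already exists, and each node keeps the list of identifiers whose current value it represents. This illustrates the idea rather than the textbook's full algorithm (constants, ambiguous assignments, and array references are ignored).

# Minimal DAG construction for a basic block.
class Node:
    def __init__(self, op, children=()):
        self.op = op                 # operator, or a name/constant for a leaf
        self.children = children
        self.identifiers = []        # names whose current value is this node

def build_dag(block):
    node_for_name = {}               # current node attached to each name
    interior = {}                    # (op, child ids) -> existing node
    nodes = []

    def leaf(name):
        if name not in node_for_name:
            n = Node(name)
            node_for_name[name] = n
            nodes.append(n)
        return node_for_name[name]

    for target, op, a1, a2 in block:
        kids = tuple(leaf(a) for a in (a1, a2) if a is not None)
        key = (op, tuple(id(k) for k in kids))
        node = interior.get(key)
        if node is None:             # no identical node yet: create one
            node = Node(op, kids)
            interior[key] = node
            nodes.append(node)
        # Detach `target` from its old node and attach it to this one.
        old = node_for_name.get(target)
        if old is not None and target in old.identifiers:
            old.identifiers.remove(target)
        node.identifiers.append(target)
        node_for_name[target] = node
    return nodes

if __name__ == "__main__":
    dag = build_dag([("a", "+", "b", "c"),
                     ("b", "-", "a", "d"),
                     ("c", "+", "b", "c"),
                     ("d", "-", "a", "d")])
    for n in dag:
        print(n.op, [c.op for c in n.children], n.identifiers)

Running it on the block a := b+c; b := a-d; c := b+c; d := a-d from Section 2 attaches both b and d to the same subtraction node, which is exactly the common subexpression found there.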
5. Global Data-Flow Analysis

Knowledge about the behavior of a variable is essential for performing function-preserving transformations. In addition to the behavior of the variables, control-flow information is also required to do transformations across basic blocks. To apply global optimizations on basic blocks, data-flow information is collected by solving systems of data-flow equations.

Data-flow equations

A variable is defined at a particular statement, and its value may reach a subsequent sequence of instructions. When a variable is defined at a statement, we say the definition is generated, indicated by gen[S], meaning that statement S generates this definition. The variable, with the value it has, is forwarded to all other statements until some other statement kills this definition of the variable. Thus, to determine the reaching definitions for a sequence of statements S, the following equation is defined in terms of four functions:

out[S] = gen[S] U (in[S] – kill[S])

In the above equation, out[S] is the set of definitions available after statement S. It is computed from the definitions the statement itself generates (gen[S]) together with the definitions that enter the statement (in[S]) and are not killed by it (kill[S]). In other words, the information at the end of a statement S is either generated within the statement, or enters at the beginning and is not killed as control flows through the statement.
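At the flow-graph level the same gen/kill formulation is solved block by block: in[B] is the union of out[P] over the predecessors P of B, and out[B] = gen[B] U (in[B] – kill[B]), iterated until nothing changes. The Python sketch below performs that fixed-point iteration; the three-block graph and its gen/kill sets are hypothetical, not the figure referred to in the text.

# Iterative solution of the reaching-definitions equations over a flow graph.
# For each block B:  in[B]  = union of out[P] over predecessors P
#                    out[B] = gen[B] | (in[B] - kill[B])
def reaching_definitions(blocks, preds, gen, kill):
    in_set = {b: set() for b in blocks}
    out_set = {b: set(gen[b]) for b in blocks}
    changed = True
    while changed:                        # iterate to a fixed point
        changed = False
        for b in blocks:
            if preds[b]:
                in_set[b] = set().union(*(out_set[p] for p in preds[b]))
            else:
                in_set[b] = set()
            new_out = gen[b] | (in_set[b] - kill[b])
            if new_out != out_set[b]:
                out_set[b] = new_out
                changed = True
    return in_set, out_set

if __name__ == "__main__":
    blocks = ["B1", "B2", "B3"]
    preds = {"B1": [], "B2": ["B1", "B3"], "B3": ["B2"]}
    gen = {"B1": {"d1", "d2"}, "B2": {"d3"}, "B3": {"d4"}}
    # Hypothetical kill sets: other definitions of the same variables.
    kill = {"B1": {"d3"}, "B2": {"d1"}, "B3": {"d2"}}
    ins, outs = reaching_definitions(blocks, preds, gen, kill)
    print(ins)
    print(outs)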

Factors for setting up data-flow equations

The notion of killing and generating definitions depends on the desired information, that is, on the data-flow analysis problem to be solved. For some problems the function out[S] needs to be defined in terms of in[S], and for others in[S] needs to be defined in terms of out[S]. Data flow is affected by the control flow of the program. The function out[S] is based on the assumption that there is a unique end point. In addition, we need to consider that assignments through pointer variables, procedure calls, and assignments to array variables also influence the data flow. Consider the example given in the figure, involving three basic blocks, where each statement is viewed as a definition of its left-hand-side variable.

A “point” is defined as the position between two adjacent statements; in addition, the position before the first statement and after the last statement of a block are also points. For example, in the figure, basic block B1 has 3 points while B2 has 2 points. If we consider all the blocks, each will have many points. If we merge the last point of a block with the first point of its successor block, then we obtain a sequence of statements that can be examined together to perform some optimization. A path, on the other hand, is the sequence of points, and the statements between them, along some flow of control between two points.

Reaching Definitions
A definition of a variable ‘x’ is a statement that assigns, or may assign, a value to ‘x’. The definition is unambiguous if it is a simple assignment to ‘x’; if ‘x’ may be assigned through a procedure parameter or through a pointer, the definition is ambiguous. Definition ‘d’ reaches a point ‘p’ if there is a path from the point immediately following ‘d’ to ‘p’ and ‘d’ is not killed along that path; a definition of ‘x’ is killed when another definition of ‘x’ occurs between the two points. Consider the figure, for which we have three definitions; the table gives the definitions of the various blocks.
Data-flow analysis of structured programs
Let us consider the data-flow analysis of structured programs. In order to understand the data flow, we need to know the various forms of statements. The assumption for these statements is that each has a single entry and a single exit point.

Statements can be simple assignment statements, if-else statements, do-while statements, or a sequence of these. A while statement can be interpreted in terms of the do-while statement itself. The following productions define the various types of statements, where S is the start symbol and E is an expression. For simplicity, assume that an expression is either the sum of two variables or a single variable. We also assume that each of these statement forms has a unique header, which is the beginning of its control flow.
• S → id := E | S ; S | if E then S else S | do S while E
• E → id + id | id
Let us consider one form of statement at a time, beginning with the assignment statement defined by the first production above.
Consider the figure, which shows the control flow for a simple assignment statement containing a single definition ‘d’ of the variable ‘a’. The data-flow equations for S are:
gen[S] = {d}. This statement creates the definition ‘d’, and hence it is included.
kill[S] = Da – {d}. This definition kills all other definitions of ‘a’; Da denotes the set of all definitions of ‘a’ in the program.
