compiler_optimization-notes
Intermediate code (IC) optimization is an optional phase in most compilers, enabled by the user when
desired. For example, in gcc, optimization is enabled using the optimization switch “-Ox”, where O
denotes optimization and x denotes a level, an integer from 0 to 3. Level 0 indicates no optimization,
while levels 1, 2 and 3 request progressively more optimization. Level 2 is the optimization level most
commonly deployed. Level 3 is used for architectures with advanced features, such as vector
processors, multiple processors, etc.
[Figure: Source Program -> Front end -> Int code -> Optimization Phase -> Int code -> Back end -> Target Program]
As shown in the figure above, the main objective of the optimization phase is to improve the quality of
intermediate code generated by the front end of the compiler.
We have already observed, during the SDTS of various linguistic features, that intermediate code
generation concentrates on semantic correctness and equivalence and not on efficient code. Therefore
the IC generated during SDTS often has redundant code fragments that can be eliminated or rewritten
through analysis of the IC. For example,
• use of a large number of temporaries to evaluate an expression, with no concern for reusing
temporaries that hold already evaluated expression values,
• generation of a large number of “goto ____” statements during partial evaluation of conditional
expressions in the translation of control flow structures,
• multiple references to the same multi-dimensional array element, etc.
1. Processing of Intermediate Code to extract control flow information
The SDTS for all the features considered by us, except for declaration processing, results in generation
of intermediate code. As illustrated in the figure above, the optimization phase of a compiler, which is
optional, transforms the naive intermediate code to more efficient intermediate code by a sequence of
transformations.
Before the transformation process starts, the first task in the optimization phase is to extract the
control flow information present in the intermediate code. The result is a directed graph, known as the
control flow graph (cfg).
Consider the following source code and its gimple intermediate code generated by gcc.
Note that the intermediate code generated by us through our SDTS is similar to gimple in spirit. There
are a few major differences:
• Temporaries generated by us are named ti, while gimple generally uses _i instead; gimple also
sometimes uses temporary variables such as D.2329.
• We label each intermediate code statement with sequentially increasing numbers, such as 10, 11, …,
while gimple uses alphanumeric labels, such as <D.dddd>. Further, not every gimple statement is
labeled, only a few selected ones are.
• Control flow instructions have a slightly different structure in our intermediate code format and
in gimple. Note that semantically they are equivalent.
3. We have attached a label number to each gimple statement, similar to what we add in our SDTS.
This is done so that each gimple statement can be referred to by its label. Gimple statements that
are already labeled are not given a numeric label.
4. The nodes (also called basic blocks) of a cfg are drawn as rectangles. The notation “48:51”
denotes that all gimple statements with labels 48 through 51 are part of the basic block.
The basic blocks have been numbered in the order they are encountered in the algorithm. The basic
block numbers, the corresponding gimple label and the first code number assigned by us are
summarized in the following table.
Basic block   Gimple label   Target of code no.   Header code no.   Block contents (start no : last no)
<bb 1>        ---            ---                  20                20 : 34
<bb 2>        D.2323         34                   47                47 : 47
<bb 3>        D.2322         47                   35                35 : 36
<bb 4>        D.2326         36                   37                37 : 42
<bb 5>        D.2327         36                   43                43 : 45
<bb 6>        D.2328         42                   46                46 : 46
<bb 7>        D.2324         47                   48                48 : 51
A final pass over the basic blocks and their contents as given in column 2 of the table above, generates
the cfg in terms of the basic blocks and their connected edges. The final cfg is shown below.
The control transfer statements, both conditional and unconditional, have been added as the last
statements of basic blocks wherever they are relevant, for readability only.
Properties of a basic block :
• All statements in a basic block are executed sequentially.
• Control transfer from other basic blocks is permitted only to the header of the block, that is,
the first statement in the block.
• Control transfer out of a basic block is not permitted, except at its last statement.
Note that the cfg constructed by us using the algorithm and the cfg reported by gcc are identical in
content except for the numbering of the basic blocks (nodes).
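The basic block properties listed above suggest where a new block must begin: at the first statement, at every jump target, and at the statement following a jump. The sketch below marks such "leader" statements. It is a minimal illustration of the standard leader rules and may differ in detail from the algorithm these notes refer to; the types `struct instr`, `enum kind` and the function `mark_leaders` are hypothetical names introduced only for this example.

```c
#include <assert.h>

/* Hypothetical IC statement: only its control-flow fields matter here. */
enum kind { PLAIN, GOTO, IF_GOTO };

struct instr {
    enum kind kind;
    int target;              /* index of the jump target; ignored for PLAIN */
};

/* Sets leader[i] = 1 for every statement that starts a basic block and
 * returns the number of basic blocks found. */
int mark_leaders(const struct instr *code, int n, int *leader)
{
    int count = 0;

    for (int i = 0; i < n; i++)
        leader[i] = 0;
    if (n > 0)
        leader[0] = 1;                        /* first statement */

    for (int i = 0; i < n; i++) {
        if (code[i].kind == PLAIN)
            continue;
        assert(code[i].target >= 0 && code[i].target < n);
        leader[code[i].target] = 1;           /* target of a jump */
        if (i + 1 < n)
            leader[i + 1] = 1;                /* statement after a jump */
    }

    for (int i = 0; i < n; i++)
        count += leader[i];
    return count;
}
```

Each basic block then runs from one leader up to, but not including, the next leader, which matches the "start no : last no" ranges used in the table above.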
3. Use of Basic Blocks / control flow graph in Optimization: The control flow graph exhibits the
control structure in the program explicitly. Loops in the control flow graph depict parts of the code that
are expected to be executed repeatedly. Therefore improvement in the contents of basic blocks that
constitute a loop at compile time should result in significant savings at execution time.
• The cfg is useful for another significant optimization, known as detection of unreachable code.
Unreachable code comprises one or more basic blocks which are not reachable from the start
node of the cfg. A basic block x, other than the start node (x ≠ start), is certainly unreachable
if predecessor(x) = ∅; in general, x is unreachable if no path from start leads to x. Finding all
unreachable nodes in a cfg is a simple graph-theoretic problem in directed graphs and can be
solved by a traversal from start. All such nodes can be safely removed from the cfg without
changing the semantics of the underlying program. This optimization is named elimination of
unreachable code.
• Detecting all loops in a cfg, including nested loops, is also a well-known graph-theoretic
problem in directed graphs. One may use DFS to detect and extract the loops in a cfg in terms
of the set of basic blocks that constitute each loop. For the cfg of our example, there is exactly
one loop, described by the set of basic blocks {<bb 2>, <bb 3>, <bb 4>, <bb 5>, <bb 6>},
which can only be entered through the block <bb 2>.
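Both graph problems mentioned above can be sketched in a few lines. The code below is an illustration under stated assumptions, not the method used by gcc: the cfg is stored as a small adjacency matrix, `find_unreachable` runs a DFS from the start block, and `natural_loop` collects the blocks of a loop given its back edge (tail -> header), which the caller is assumed to have identified already. All names (`struct cfg`, `MAX_BLOCKS`, and the three functions) are hypothetical.

```c
#include <assert.h>
#include <string.h>

#define MAX_BLOCKS 16

struct cfg {
    int n;                               /* blocks are numbered 0 .. n-1 */
    int edge[MAX_BLOCKS][MAX_BLOCKS];    /* edge[u][v] != 0 means u -> v */
};

/* Depth-first search from block u, marking every block it can reach. */
static void dfs(const struct cfg *g, int u, int *seen)
{
    seen[u] = 1;
    for (int v = 0; v < g->n; v++)
        if (g->edge[u][v] && !seen[v])
            dfs(g, v, seen);
}

/* Sets reachable[b] for each block reachable from start and returns the
 * number of unreachable blocks. */
int find_unreachable(const struct cfg *g, int start, int *reachable)
{
    int unreachable = 0;

    assert(g->n <= MAX_BLOCKS && start >= 0 && start < g->n);
    memset(reachable, 0, sizeof(int) * (size_t)g->n);
    dfs(g, start, reachable);
    for (int b = 0; b < g->n; b++)
        if (!reachable[b])
            unreachable++;
    return unreachable;
}

/* Given a back edge tail -> header, marks the header plus every block that
 * can reach tail without passing through header. Returns the loop size. */
int natural_loop(const struct cfg *g, int tail, int header, int *in_loop)
{
    int stack[MAX_BLOCKS], top = 0, size = 1;

    assert(g->n <= MAX_BLOCKS);
    memset(in_loop, 0, sizeof(int) * (size_t)g->n);
    in_loop[header] = 1;
    if (!in_loop[tail]) {
        in_loop[tail] = 1;
        size++;
        stack[top++] = tail;
    }
    while (top > 0) {
        int v = stack[--top];
        for (int u = 0; u < g->n; u++) {
            if (g->edge[u][v] && !in_loop[u]) {   /* u is a predecessor of v */
                in_loop[u] = 1;
                size++;
                stack[top++] = u;
            }
        }
    }
    return size;
}
```

If the edges of the example cfg are taken to be bb1->bb2, bb2->bb3, bb2->bb7, bb3->bb4, bb3->bb5, bb4->bb6, bb5->bb6 and bb6->bb2 (an assumption read off the table of block contents and jump targets), then calling `natural_loop` on the back edge bb6->bb2 marks exactly the five blocks bb2 through bb6.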
Before local Optimization
43 : b = a * c;
44 : d = a * c;
45 : c = b + d;
After local Optimization
43 : t1 = a * c;
44 : b = t1;
44a : d = t1;
45 : c = t1 + t1;
[Figure: two paths reach the statement t1 = a * b; one path defines a = 2, b = 3 and the other
defines a = 10, b = 2]
Analysis : The compiler has to perform an analysis of the IC to determine, at the program point of
interest, say p, the definitions of all its operands that reach the point p. The analysis performed by the
compiler is known as “Reaching Definitions Analysis” and is beyond the scope of this course. However,
once the analysis has been performed, the compiler can easily check whether each of the two operands
has a single reaching definition, and in that case can safely perform constant folding.
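The safety check described above can be sketched as follows, assuming the reaching definitions of each operand at p have already been computed elsewhere. `struct reaching_def`, `operand_constant` and `fold_multiply` are hypothetical names used only for illustration; the sketch handles only a multiplication such as t1 = a * b.

```c
#include <assert.h>

/* Hypothetical summary of one definition that reaches program point p. */
struct reaching_def {
    int is_constant;     /* 1 if the definition has the form "x = <constant>" */
    long value;          /* the constant; meaningful only if is_constant */
};

/* An operand is treated as a compile-time constant at p only if exactly one
 * definition reaches p and that definition assigns a constant. */
int operand_constant(const struct reaching_def *defs, int count, long *out)
{
    assert(count >= 0);
    if (count != 1 || !defs[0].is_constant)
        return 0;
    *out = defs[0].value;
    return 1;
}

/* Tries to fold "t = a * b" at p. Returns 1 and stores the product in
 * *folded when folding is safe, and returns 0 otherwise. */
int fold_multiply(const struct reaching_def *a_defs, int a_count,
                  const struct reaching_def *b_defs, int b_count,
                  long *folded)
{
    long a, b;

    if (!operand_constant(a_defs, a_count, &a))
        return 0;
    if (!operand_constant(b_defs, b_count, &b))
        return 0;
    *folded = a * b;
    return 1;
}
```

With a single definition a = 2 and a single definition b = 3 reaching p, the call folds a * b to 6. When two definitions of a (2 and 10) and two of b (3 and 2) reach p along different paths, the function declines to fold.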
Home work : Analyse and report whether all the 4 instances of constant folding in the example given
are safe.
This optimization is known as Constant Propagation. The main objective is to propagate constants
detected at compile time for further optimizations including constant folding.
Statement of Constant Propagation :
In an intermediate code statement of the form “a := b op c”, an operand, b or c, is to be replaced by
its value if it is found to be constant at compile time. For example, if b has the value v1, then the
IC is changed to “a := v1 op c”.
Benefits : At the least, a memory access is saved at run time; this optimization may also expose more
instances of constant folding.
Constraints : This optimization has the same constraints as those mentioned for constant folding, that
is, the operand value(s) reaching the statement is (are) unique. A similar counterexample can be created
for this optimization also.
Analysis : The “Reaching Definitions Analysis” mentioned earlier, can also be used to perform this
optimization safely.
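As a rough illustration, the sketch below applies constant propagation inside a single basic block, where statements execute sequentially and so each use has one reaching definition from within the block. It does not perform the global Reaching Definitions Analysis. The quadruple layout (`struct quad`, `struct operand`), the use of '=' for a plain copy, and the function names are assumptions made for this example.

```c
#include <assert.h>

#define NUM_VARS 8

/* Hypothetical operand: either a literal constant or a variable index. */
struct operand {
    int is_const;
    long value;          /* used when is_const is 1 */
    int var;             /* used when is_const is 0 */
};

/* Hypothetical quadruple "dst = lhs op rhs"; op '=' means "dst = lhs". */
struct quad {
    int dst;
    char op;
    struct operand lhs, rhs;
};

/* Replaces a variable operand by its known constant value, if any. */
static int substitute(struct operand *o, const int *known, const long *value)
{
    if (o->is_const)
        return 0;
    assert(o->var >= 0 && o->var < NUM_VARS);
    if (!known[o->var])
        return 0;
    o->is_const = 1;
    o->value = value[o->var];
    return 1;
}

/* Propagates constants through one basic block, in statement order.
 * Returns the number of operands that were replaced. */
int propagate_constants(struct quad *code, int n)
{
    int known[NUM_VARS] = {0};
    long value[NUM_VARS];
    int replaced = 0;

    for (int i = 0; i < n; i++) {
        struct quad *q = &code[i];
        assert(q->dst >= 0 && q->dst < NUM_VARS);
        replaced += substitute(&q->lhs, known, value);
        if (q->op != '=')
            replaced += substitute(&q->rhs, known, value);
        if (q->op == '=' && q->lhs.is_const) {
            known[q->dst] = 1;            /* dst now holds a known constant */
            value[q->dst] = q->lhs.value;
        } else {
            known[q->dst] = 0;            /* dst is redefined, value unknown */
        }
    }
    return replaced;
}
```

For the block "a = 2; b = a + c; a = d; e = a * b", only the use of a in the second statement is replaced, because a is redefined from an unknown value before its use in the last statement.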
Statement of Loop invariant Code Motion : Given a loop L of the cfg and a computation or IC of the
form “a = b op c” at some point p in some basic block of L, determine if “a = b op c” is a loop
invariant of L. An IC “a = b op c” is a loop invariant of a loop L if all the definitions of the
operands b and c that reach p are placed outside L. Such code can then be safely moved out of L to a
suitable predecessor / successor of L.
Benefits : Movement of loop invariant code outside a loop at compile time reduces the run-time
computation effort from m*x to x (where m is the number of iterations of the loop at run time, and x is
the one-time cost of the computation).
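A hand-applied illustration of this transformation is given below. The functions `sum_before` and `sum_after` are hypothetical and are not taken from the prog*.c files; they only show the shape of the code before and after the motion. Hoisting is harmless here because the multiplication has no side effects, so evaluating it once ahead of the loop cannot change the result.

```c
#include <assert.h>

/* Before: "a = b * c" is recomputed in each of the m iterations, although
 * neither b nor c is defined inside the loop. */
long sum_before(const int *x, int m, int b, int c)
{
    long total = 0;

    assert(m >= 0);
    for (int i = 0; i < m; i++) {
        int a = b * c;               /* loop invariant computation */
        total += x[i] + a;
    }
    return total;
}

/* After: the invariant computation is moved in front of the loop and is
 * evaluated once instead of m times. */
long sum_after(const int *x, int m, int b, int c)
{
    long total = 0;
    int a = b * c;                   /* hoisted out of the loop */

    assert(m >= 0);
    for (int i = 0; i < m; i++)
        total += x[i] + a;
    return total;
}
```

Both versions return the same value for the same arguments; only the number of multiplications performed at run time differs.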
Optimization 4 : Dead Code Elimination is yet another optimization performed by a compiler. As the
name suggests, code whose result is not used in the program after its definition at some point p may
be safely removed without changing the semantics of the original program. This optimization involves
two tasks,
• detect a computation, defined at program point p, that is not used on any path starting
from p to the rest of the program till its exit point,
• change the IC by removing the corresponding statement.
This optimization also, like all others that we discuss, must not change the semantics of the IC. For the
modified program, prog4.c, three definitions, which are highlighted, are dead from their point of
definition and hence can be safely removed.
After constant folding & propagation : prog4.c
#include <stdio.h>
#include <stdbool.h>
int main()
{ int a = 2, b = 3, c = 40, d, i, j;
  int x[10]={10, 20, 30, 40, 50, 60, 70, 80, 90, 100};
  d = 6;
  c = 300;
  d = 4;
  for ( i = 1; i < 11; i+=2 )
  { if (i%2)
    { x[i] = x[i-1] + 6; }
    else
    { x[(i+4)%10] = x[i]+x[i-1];} };
  if (false)
    for (j = 1; j < 11; j++) x[(j+5)%10]=x[j]+5;
  else
    for (j = 0; j < 10; j++)
      printf(" x[%d] = %d \n", j, x[j]);
  return 0;
}
Situations for optimization : prog5.c
#include <stdio.h>
#include <stdbool.h>
int main()
{ int a = 2, b = 3, c = 40, d, i, j;
  int x[10]={10, 20, 30, 40, 50, 60, 70, 80, 90, 100};
  // d = 6; dead variable
  // c = 300; dead variable
  // d = 4; dead variable
  for ( i = 1; i < 11; i+=2 )
  { if (i%2)
    { x[i] = x[i-1] + 6; }
    else
    { x[(i+4)%10] = x[i]+x[i-1];} };
  if (false)
    for (j = 1; j < 11; j++) x[(j+5)%10]=x[j]+5;
  else
    for (j = 0; j < 10; j++)
      printf(" x[%d] = %d \n", j, x[j]);
  return 0;
}
Statement of Dead Code Elimination : IC statements whose results are not used after their definition
at point p, on any path in the cfg starting from p to the exit block, can be safely removed.
Benefits : Saves memory and also reduces execution time.
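For a single basic block, the detection task can be sketched as a backward scan that tracks which variables are still needed. This is a simplified illustration, not the global analysis a compiler performs over the whole cfg: it assumes statements have no side effects and that the set of variables used after the block (`live_out`) is supplied by the caller. `struct stmt` and `mark_dead` are hypothetical names.

```c
#include <assert.h>

/* Hypothetical statement "dst = f(src1, src2)". Variables are indices in
 * 0..31; a source of -1 means "no operand" (for example "d = 6"). */
struct stmt {
    int dst, src1, src2;
    int dead;                        /* set to 1 if the statement is removable */
};

/* Scans the block backwards. live_out has bit v set if variable v is used
 * after the block. Returns the number of statements marked dead. */
int mark_dead(struct stmt *code, int n, unsigned live_out)
{
    unsigned live = live_out;
    int removed = 0;

    for (int i = n - 1; i >= 0; i--) {
        struct stmt *s = &code[i];
        assert(s->dst >= 0 && s->dst < 32);
        assert(s->src1 < 32 && s->src2 < 32);
        if (!(live & (1u << s->dst))) {
            s->dead = 1;             /* result is never used afterwards */
            removed++;
            continue;
        }
        s->dead = 0;
        live &= ~(1u << s->dst);     /* this definition satisfies later uses */
        if (s->src1 >= 0)
            live |= 1u << s->src1;   /* operands are needed before this point */
        if (s->src2 >= 0)
            live |= 1u << s->src2;
    }
    return removed;
}
```

Applied to the sequence d = 6; c = 300; d = 4 followed by a statement that uses neither c nor d, with neither variable used later, all three assignments are marked dead, in line with the three highlighted definitions of prog4.c.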
Statement of Unreachable Code Elimination : Statements that are unreachable from the start node of
the cfg are known as “unreachable code”. Such code may be found either directly by control flow
analysis, or after the application of optimizations which cause some conditional expression to have a
constant value, True or False, at compile time.
Benefits : Saves both memory and execution time. Also identifies potential unintentional bugs in the
design / code.
Constraints : Correctness depends on the compile time evaluations, done under various
optimizations by the compiler, which have resulted in some code being marked as unreachable.
Analysis : Control flow analysis is the key analysis performed by the compiler to detect code that is
unreachable in the cfg. Global data flow analysis is required to determine code that is
reachable from the start block but will never be executed at run time because some conditional
expression has been analysed and found to have acquired a constant value at compile time.
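The second situation, a branch whose condition has become constant, is illustrated by the hand-written pair below. It mirrors the "if (false) ... else ..." structure of the example program but is not a copy of it; `total_before` and `total_after` are hypothetical functions introduced for this sketch.

```c
#include <assert.h>

/* Before: the condition is the constant 0, so the "then" branch can never
 * execute even though it is connected to the rest of the cfg. */
int total_before(const int *x, int n)
{
    int total = 0;

    assert(n >= 0);
    if (0) {
        for (int j = 0; j < n; j++)
            total -= x[j];           /* unreachable at run time */
    } else {
        for (int j = 0; j < n; j++)
            total += x[j];
    }
    return total;
}

/* After: the constant test and the unreachable branch are removed. */
int total_after(const int *x, int n)
{
    int total = 0;

    assert(n >= 0);
    for (int j = 0; j < n; j++)
        total += x[j];
    return total;
}
```

The two functions compute the same result for every input, so removing the branch does not change the semantics of the program.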
In summary, six different optimizations were applied above in some order to convert the given source
program to the final form shown in the following table. Readers are urged to compile and run all the six
different versions of the program, from prog1.c to prog6.c. Verify whether all the programs produce the
same output.