Code Optimization - Module 4B
Design Of a Compiler
What is Code Optimization?
• Optimization is a program transformation technique that tries to improve the code so that it consumes fewer resources (i.e. CPU, memory) and runs faster.
• In optimization, high-level general programming constructs are replaced by very efficient low-level code.
A code optimizing process must follow the three rules given below:
1. The output code must not change the meaning of the program in any way.
• It should not change the output produced for any input.
• It should not introduce an error.
2. Optimization should increase the speed of the program and, if possible, the program should demand fewer resources.
3. Optimization should itself be fast and should not delay the overall compiling process.
Improvements can be made at various phases:
• Source Code:
- Algorithm transformations can produce spectacular improvements
• Intermediate Code:
- The compiler can improve loops, procedure calls and address calculations
- Typically only optimizing compilers include this phase
• Target Code:
- The compiler can use registers efficiently
Features of optimized code:
• Executes faster
• Code size is reduced
• Uses memory efficiently
• Yields better performance
• Reduces the time and space complexity
Organization of an optimizing compiler
[Figure: Intermediate Code Generator -> Code Optimizer -> Target Code Generator. The code optimizer consists of control flow analysis, data flow analysis, and transformation.]
Flow analysis - Organization of an optimizing compiler
• Flow analysis is a fundamental prerequisite for many important types of code improvement.
• Generally, control flow analysis precedes data flow analysis.
• Control flow analysis (CFA) represents the flow of control, usually in the form of graphs. CFA constructs include:
• Control flow graph - graphical representation of control flow or computation during execution.
• Call graph - represents calling relationships between subroutines.
• Data flow analysis (DFA) is the process of asserting and collecting information prior to program execution about the possible modification, preservation, and use of certain entities (such as values or attributes of variables) in a computer program.
Basic blocks
• Basic blocks are sequences of intermediate code in which flow of control enters at the beginning and leaves at the end, without any halt or possibility of branching except at the end.
Example
• The following sequence of three-address statements forms a basic block:
t1 := a * a
t2 := a * b
t3 := 2 * t2
t4 := t1 + t3
t5 := b * b
t6 := t4 + t5
• Basic blocks are represented as directed acyclic graphs (DAGs), which are in turn …
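A basic block begins with a leader, which is identified in one of the following ways:
1. The first statement is a leader.
2. Any statement that is the target of a goto or conditional goto statement is a leader.
3. Any statement that immediately follows a goto or conditional goto statement is a leader.
• Note that jump statements point to basic blocks and not quadruples, so as to make code movement easy.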
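A minimal sketch of partitioning three-address code into basic blocks, assuming the usual leader rules (the first statement, any jump target, and any statement that immediately follows a jump). The statement syntax, the `halt` statement and the function names `find_leaders` and `basic_blocks` are illustrative assumptions, not part of the slides.

```python
import re

def find_leaders(code):
    """Return the sorted indices of leader statements in a list of 3AC strings."""
    # Map each label (e.g. "L1:") to the index of the statement it marks.
    labels = {}
    for idx, stmt in enumerate(code):
        m = re.match(r"\s*(\w+):(?!=)", stmt)
        if m:
            labels[m.group(1)] = idx

    leaders = {0} if code else set()              # the first statement
    for idx, stmt in enumerate(code):
        m = re.search(r"\bgoto\s+(\w+)", stmt)
        if m:
            if m.group(1) in labels:
                leaders.add(labels[m.group(1)])   # the target of a jump
            if idx + 1 < len(code):
                leaders.add(idx + 1)              # the statement after a jump
    return sorted(leaders)

def basic_blocks(code):
    """Split the code at its leaders; each block runs up to the next leader."""
    leaders = find_leaders(code)
    bounds = leaders + [len(code)]
    return [code[bounds[k]:bounds[k + 1]] for k in range(len(leaders))]

# Hypothetical fragment in the style of the dot-product example.
code = [
    "prod = 0",
    "i = 1",
    "L1: t1 = 4 * i",
    "prod = prod + t1",
    "i = i + 1",
    "if i <= 20 goto L1",
    "halt",
]
blocks = basic_blocks(code)
```

Here the fragment splits into three blocks: the initialisation, the loop body starting at `L1`, and the statement after the conditional jump.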
Basic Block Example
• Source code for dot product of two vectors a and b of length 20
{
prod = 0;
• Three-address code of the source code
prod = 0
i = 1
L1: t1 = 4 * i
Global Data Flow Analysis
• Collect information about the whole program.
• Data flow equations: a set of equations solved by data flow analysis to gather data flow information.
• Data flow analysis should never tell us that a transformation is safe when in fact it is not.
• When doing data flow analysis we must be
• Conservative
• Do not consider information that may not preserve the behavior of the program
• Aggressive
• Try to collect information that is as exact as possible, so we can get the greatest benefit from the optimizations.
Global Iterative Data Flow Analysis
• Global:
• Performed on the flow graph
• Goal is to collect information at the beginning and end of each basic block
• Iterative:
• Construct data flow equations that describe how information flows through each basic block and solve them by iteratively converging on a solution.
• Components of data flow equations
• Sets containing collected information
• in set: information coming into the BB from outside (following flow of data)
• gen set: information generated/collected within the BB
• kill set: information that, due to action within the BB, will affect what has been collected outside the BB
• out set: information leaving the BB
• Functions (operations on these sets)
• Transfer functions describe how information changes as it flows through a basic block
• Meet functions describe how information from multiple paths is combined.
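As one concrete instance of the in/gen/kill/out scheme, the sketch below solves reaching definitions iteratively. The choice of reaching definitions, the three-block flow graph, and the definition names d1..d4 are assumptions made for illustration; the slides do not fix a particular analysis.

```python
def reaching_definitions(blocks, preds, gen, kill):
    """Iterate the data flow equations until no in or out set changes."""
    in_sets = {b: set() for b in blocks}
    out_sets = {b: set(gen[b]) for b in blocks}
    changed = True
    while changed:
        changed = False
        for b in blocks:
            # Meet function: union of the out sets of all predecessors.
            new_in = set()
            for p in preds[b]:
                new_in |= out_sets[p]
            # Transfer function: out = gen U (in - kill).
            new_out = gen[b] | (new_in - kill[b])
            if new_in != in_sets[b] or new_out != out_sets[b]:
                in_sets[b], out_sets[b] = new_in, new_out
                changed = True
    return in_sets, out_sets

# Hypothetical flow graph: B1 -> B2, B2 -> B2 (loop), B2 -> B3.
# B1: d1: i = 1, d2: s = 0    B2: d3: s = s + i, d4: i = i + 1    B3: no definitions
blocks = ["B1", "B2", "B3"]
preds = {"B1": [], "B2": ["B1", "B2"], "B3": ["B2"]}
gen = {"B1": {"d1", "d2"}, "B2": {"d3", "d4"}, "B3": set()}
kill = {"B1": {"d3", "d4"}, "B2": {"d1", "d2"}, "B3": set()}
in_sets, out_sets = reaching_definitions(blocks, preds, gen, kill)
```

All four definitions reach the top of B2 (from B1 on entry, from B2 itself around the loop), while only d3 and d4 leave the loop and reach B3.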
• The code optimizer sits between the intermediate code generator and the target code generator.
– Can do control flow analysis.
– Can do data flow analysis.
– Does transformations to improve the intermediate code.
Major Classifications of Code Optimization techniques
Machine Dependent Optimization
Machine dependent optimizations are based on register allocation and utilization of special machine-instruction sequences.
They involve CPU registers and may use absolute memory references rather than relative references.
Peephole optimization
Register Allocation and Instruction Selection (Special Hardware features)
Scope Of Optimization
• Peephole analysis
• Within one or a few instructions
• Local analysis
• Within a basic block
• Global analysis
• Entire procedure or within a certain scope
• Inter-procedural analysis
• Beyond a procedure, consider the entire program
Classification of optimization
There are primarily 3 types of optimizations:
(1) Local optimization
• Applies to a basic block in isolation
(2) Global optimization
• Applies across basic blocks
(3) Peep-hole optimization
• Applies across boundaries
Local optimization
• Optimization performed within a basic block.
• This is the simplest form of optimization.
• No need to analyze the whole procedure body.
• Just analyze the basic blocks of the procedure.
The local optimization techniques include:
• Constant Folding
• Constant Propagation
• Algebraic Simplification
• Operator Strength Reduction
• Copy Propagation
Constant Folding
• Evaluate constant expressions at compile time.
c := 1 + 3  ->  c := 4
• Expressions with constant operands can be evaluated at compile time, thus improving run-time performance and reducing code size by avoiding evaluation at run-time.
Example:
int f ()
{
    return 3 + 5;
}
Below is the code fragment after constant folding.
int f ()
{
    return 8;
}
Constant Propagation
• Constants assigned to a variable can be propagated through the flow graph and substituted at the uses of the variable.
Before:
b := 3
c := 1 + b
d := b + c
After:
b := 3
c := 1 + 3
d := 3 + c
• Variables that have a constant value, e.g. b := 3
• Later uses of b can be replaced by the constant, if b is not changed in between.
Example:
• In the code fragment below, the value of x can be propagated to the use of x.
x = 3;
y = x + 4;
• Below is the code fragment after constant propagation and constant folding.
x = 3;
y = 7;
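A minimal sketch of constant propagation combined with constant folding inside a single basic block. The tuple encoding `(dest, left, op, right)` and the function name `propagate_and_fold` are assumptions chosen for illustration, and only integer `+`, `-` and `*` are handled.

```python
import operator

OPS = {"+": operator.add, "-": operator.sub, "*": operator.mul}

def propagate_and_fold(block):
    """block: list of (dest, left, op, right); op and right are None for 'dest := left'."""
    consts = {}      # variable name -> constant value known at this point
    result = []
    for dest, left, op, right in block:
        # Constant propagation: substitute known constants at each use.
        left = consts.get(left, left)
        right = consts.get(right, right)
        if op is None:
            value = left
        elif isinstance(left, int) and isinstance(right, int):
            value = OPS[op](left, right)          # constant folding
        else:
            value = None
        if isinstance(value, int):
            consts[dest] = value
            result.append((dest, value, None, None))
        else:
            consts.pop(dest, None)                # dest is no longer a known constant
            result.append((dest, left, op, right))
    return result

# b := 3; c := 1 + b; d := b + c; y := x + 4 (x is not a known constant)
block = [("b", 3, None, None), ("c", 1, "+", "b"),
         ("d", "b", "+", "c"), ("y", "x", "+", 4)]
optimized = propagate_and_fold(block)
```

Because folding runs right after each substitution, `c` becomes 4 and `d` becomes 7, while `y := x + 4` is left untouched.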
Algebraic Simplification
-(-i)  ->  i
• Use algebraic properties to simplify expressions
• Some expressions can be simplified by replacing them with an equivalent expression that is more efficient.
Example:
The code fragment below contains expressions that can be simplified.
void f (int i)
{
    a[0] = i + 0;
    a[1] = i * 0;
    a[2] = i - i;
    a[3] = 1 + i + 1;
}
Below is the code fragment after expression simplification.
void f (int i)
{
    a[0] = i;
    a[1] = 0;
    a[2] = 0;
    a[3] = 2 + i;
}
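The identities used above can be sketched as a small rewrite function over one binary operation. The encoding (operands are variable names or integer constants) and the name `simplify` are assumptions for illustration, and the rules are only claimed for integer arithmetic.

```python
def simplify(op, a, b):
    """Simplify 'a op b'; operands are variable names (str) or integer constants."""
    if op == "+":
        if a == 0:
            return b              # 0 + x -> x
        if b == 0:
            return a              # x + 0 -> x
    elif op == "-":
        if b == 0:
            return a              # x - 0 -> x
        if a == b:
            return 0              # x - x -> 0
    elif op == "*":
        if a == 0 or b == 0:
            return 0              # x * 0 -> 0
        if a == 1:
            return b              # 1 * x -> x
        if b == 1:
            return a              # x * 1 -> x
    return (op, a, b)             # no rule applies; keep the operation
```

An expression such as `1 + i + 1` needs reassociation before these rules help, which this sketch does not attempt.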
Operator Strength Reduction
y := x * 2  ->  y := x + x
• Replace expensive operations with simpler ones, i.e. replacement of an operator with a less costly one.
• Typical cases of strength reduction occur in address calculations of array references.
• Example: multiplications replaced by additions
Before:
for i=1 to 10
{
    …
    x = i*5
    …
}
After:
temp = 5;
for i=1 to 10
{
    …
    x = temp
    …
    temp = temp + 5
}
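To check that the strength-reduced loop computes the same values, the two versions can be run side by side. This is only a demonstration of the equivalence; the function names are hypothetical and the loop bodies elided with "…" are replaced by recording the value of x.

```python
def multiply_version():
    """Original loop: one multiplication per iteration."""
    xs = []
    for i in range(1, 11):
        x = i * 5
        xs.append(x)
    return xs

def reduced_version():
    """Strength-reduced loop: the multiplication is replaced by a running addition."""
    xs = []
    temp = 5
    for i in range(1, 11):
        x = temp
        xs.append(x)
        temp = temp + 5
    return xs
```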
Copy Propagation
• Given an assignment x = y, replace later uses of x with uses of y, provided there are no intervening assignments to x or y.
• Example
Before:
x[i] = a;
sum = x[i] + a;
After:
x[i] = a;
sum = a + a;
• Example
Before:
x := y;
s := x * f(x)
After:
s := y * f(y)
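A minimal sketch of copy propagation within one basic block, assuming statements are given as `(dest, expression string)` pairs. The encoding and the name `copy_propagate` are illustrative assumptions; the copy statement itself is kept, since removing it is a separate dead code step.

```python
import re

IDENT = r"[A-Za-z_]\w*"

def copy_propagate(block):
    """block: list of (dest, expr) pairs, expr being a string over identifiers."""
    copies = {}      # x -> y for every copy 'x := y' that is still valid
    out = []
    for dest, expr in block:
        # Replace each use of a copied variable by the variable it was copied from.
        expr = re.sub(IDENT, lambda m: copies.get(m.group(0), m.group(0)), expr)
        # An assignment to dest invalidates every copy that mentions dest.
        copies = {x: y for x, y in copies.items() if dest not in (x, y)}
        if re.fullmatch(IDENT, expr) and expr != dest:
            copies[dest] = expr              # record the new copy 'dest := expr'
        out.append((dest, expr))
    return out

propagated = copy_propagate([("x", "y"), ("s", "x * f(x)")])
blocked = copy_propagate([("x", "y"), ("y", "5"), ("s", "x + 1")])
```

In the second call the intervening assignment to `y` invalidates the copy, so the use of `x` is left alone.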
• Examples:
• No control flows into a basic block
• A variable is dead at a point, i.e. its value is not used anywhere later in the program
• An assignment is dead, i.e. the assignment assigns a value to a dead variable
• Ineffective statements:
Before:
x := y + 1    (x is immediately redefined in the 3rd line without use, therefore eliminate)
y := 5
x := 2 * z
if (false)
{a := 5}
After:
y := 5
x := 2 * z
if (false)
{}
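Dead assignments inside a basic block can be found with one backward scan that tracks which variables are live. The sketch below assumes statements without side effects, encoded as `(dest, operands)`, and a given set of variables live on exit; the names are hypothetical.

```python
def eliminate_dead_assignments(block, live_out):
    """block: list of (dest, operands); live_out: variables live after the block."""
    live = set(live_out)
    kept = []
    for dest, operands in reversed(block):
        if dest in live:
            kept.append((dest, operands))
            live.discard(dest)        # this statement defines dest ...
            live.update(operands)     # ... and uses its operands
        # Otherwise dest is dead at this point and the assignment is dropped.
    kept.reverse()
    return kept

# x := y + 1; y := 5; x := 2 * z, with x and y live on exit.
block = [("x", ("y",)), ("y", ()), ("x", ("z",))]
result = eliminate_dead_assignments(block, {"x", "y"})
```

The first assignment to `x` is removed because `x` is redefined before any use.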
Other types of Local Optimizations
• The following two optimizations can be applied only on the DAG or tree representation of a basic block
Example of a Directed Acyclic Graph (DAG)
DAG representation of Basic Block (BB)
• Leaves are labeled with a unique identifier (variable name or constant)
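Example: DAG for BB
• t1 := 4 * i
DAG: a node * labeled t1, with children 4 and i.
• t1 := 4 * i
t3 := 4 * i
t2 := t1 + t3
DAG: a node + labeled t2, whose two operands are the same node * labeled t1, t3, with children 4 and i.
• if (i <= 20) goto L1
DAG: a node <= labeled (L1), with children i and 20.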
Construction of DAGs for BB
• Input: Basic block, B
• Output: A DAG for B containing the following information:
1) A label for each node
2) For leaves the labels are identifiers or constants
3) For interior nodes the labels are operators
Data structure and functions:
• Node:
1) Label: label of the node
2) Left: pointer to the left child node
3) Right: pointer to the right child node
4) List: list of additional labels (empty for leaves)
• Node(id): returns the most recent node created for id, else returns undef
• Create(id, l, r): creates a node with label id, with l as left child and r as right child. l and r are optional parameters.
Construction of DAGs for BB
Algorithm:
For each 3AC A in BB, where A is of one of the following forms:
1. x = y op z
2. x = op y
3. x = y
Step 1.
if ((ny = Node(y)) == undef)
    ny = Create(y);
if (A == type 1) and ((nz = Node(z)) == undef)
    nz = Create(z);
Step 2.
If (A == type 1) // x = y op z
    Find a node labelled 'op' with left and right as ny and nz respectively [determination of common sub-expression]
    If (not found) n = Create(op, ny, nz);
If (A == type 2) // x = op y
    Find a node labelled 'op' with a single child as ny
    If (not found) n = Create(op, ny);
If (A == type 3) n = Node(y); // x = y
Step 3.
Remove x from Node(x).list
Add x in n.list
Node(x) = n;
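The construction can be sketched in Python as follows. The tuple encoding `(x, op, y, z)`, the class layout and the linear search for an existing node are assumptions made to keep the example short; a production compiler would typically use a hash table for the lookup.

```python
class Node:
    def __init__(self, label, left=None, right=None):
        self.label = label
        self.left = left
        self.right = right
        self.ids = []        # list of additional labels attached to this node

def build_dag(block):
    """block: list of (x, op, y, z); op and z are None for 'x = y', z is None for 'x = op y'."""
    node = {}                # Node(id): most recent node for each identifier
    nodes = []               # every node, in creation order

    def create(label, left=None, right=None):
        n = Node(label, left, right)
        nodes.append(n)
        return n

    for x, op, y, z in block:
        # Step 1: make sure the operands have nodes.
        ny = node.get(y)
        if ny is None:
            ny = node[y] = create(y)
        nz = None
        if z is not None:
            nz = node.get(z)
            if nz is None:
                nz = node[z] = create(z)
        # Step 2: reuse an existing operator node (common sub-expression) or create one.
        if op is None:
            n = ny
        else:
            n = next((m for m in nodes
                      if m.label == op and m.left is ny and m.right is nz), None)
            if n is None:
                n = create(op, ny, nz)
        # Step 3: move the label x to node n.
        old = node.get(x)
        if old is not None and x in old.ids:
            old.ids.remove(x)
        n.ids.append(x)
        node[x] = n
    return nodes

block = [("t1", "*", "4", "i"), ("t2", "[]", "a", "t1"), ("t3", "*", "4", "i"),
         ("t4", "[]", "b", "t3"), ("t5", "+", "t2", "t4"), ("i", None, "t5", None)]
dag = build_dag(block)
star = [n for n in dag if n.label == "*"]
plus = [n for n in dag if n.label == "+"]
```

Only one `*` node is created for the two occurrences of `4 * i`, and it carries both labels t1 and t3.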
Example: DAG construction from BB
Step 1: t1 := 4 * i
DAG: node * (t1) with children 4 and i.
Step 2: t2 := a [ t1 ]
DAG: node [] (t2) with children a and * (t1).
Step 3: t3 := 4 * i
DAG: no new node; t3 is attached to the existing * node, which is now labeled t1, t3.
Step 4: t4 := b [ t3 ]
DAG: node [] (t4) with children b and * (t1, t3).
Step 5: t5 := t2 + t4
DAG: node + (t5) with children [] (t2) and [] (t4).
Step 6: i := t5
DAG: no new node; i is attached to the + node, which is now labeled t5, i.
• Observations:
• A leaf node for the initial value of an id
• A node n for each statement s
• The children of node n are the last definition (prior to s) of the operands of n
Optimization of Basic Blocks
• Common sub-expression elimination: by construction of DAG
• Note: for common sub-expression elimination, we are actually targeting expressions that compute the same value.
a = b + c
b = b - d
c = c + d
e = b + c
Here a = b + c and e = b + c are common expressions, but they do not generate the same result.
• DAG representation identifies expressions that yield the same result
DAG for the block above:
+ (e) with children - (b) and + (c)
+ (a) with children b0 and c0
- (b) with children b0 and d0
+ (c) with children c0 and d0
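Optimization of Basic Blocks
• Dead code elimination: Code generation from DAG eliminates dead code.
Original block:
a := b + c
b := a - d
d := a - d
c := d + c
DAG for the block:
+ (c) with children - (b, d) and c0
- (b, d) with children + (a) and d0
+ (a) with children b0 and c0
b is not live, so the label b is dropped and the generated code is:
a := b + c
d := a - d
c := d + c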
Global Optimization
• Optimization across basic blocks within a procedure/function
• Could be restricted to a smaller scope, for example a loop
• Data-flow analysis is done to perform optimization across basic blocks
• Each basic block is a node in the flow graph of the program.
• These optimizations can be extended to an entire control-flow graph
• Most compilers implement global optimizations, which rest on well-founded theory and give practical gains
Interprocedural Optimizations
• Time consuming
Peep-hole optimization
• Optimization technique that operates on one or a few instructions at a time.
• Performs machine dependent improvements
• Peeps into a single instruction or a sequence of two to three instructions (the peephole) and replaces it with a more efficient alternative (shorter or faster instructions).
• The peephole is a small moving window on the target program
• Characteristics of peep-hole optimizations:
• Redundant-instruction (loads and stores) elimination
• Flow-of-control optimizations - elimination of multiple jumps
• Elimination of unreachable code
• Algebraic simplifications
• Reducing operator strength
• Use of machine idioms
Eliminating Redundant Loads and Stores
MOV R, a
MOV a, R
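A minimal sketch of a peephole pass that removes the second instruction of such a pair. It assumes instructions encoded as `(label, opcode, src, dst)` and that the second move may only be dropped when it carries no label, since a labeled instruction could be reached by a jump that skips the first move. The encoding and function name are hypothetical.

```python
def eliminate_redundant_moves(instrs):
    """instrs: list of (label, opcode, src, dst); label is None when absent."""
    out = []
    for ins in instrs:
        label, opcode, src, dst = ins
        if out and label is None and opcode == "MOV":
            _, prev_op, prev_src, prev_dst = out[-1]
            # 'MOV a, R' right after 'MOV R, a' copies a value that is already in place.
            if prev_op == "MOV" and (prev_src, prev_dst) == (dst, src):
                continue
        out.append(ins)
    return out

pair = [(None, "MOV", "R", "a"), (None, "MOV", "a", "R")]
labeled = [(None, "MOV", "R", "a"), ("L1", "MOV", "a", "R")]
```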
if a < b goto L2
...
L1: goto L2
Eliminating Unreachable Code
• An unlabeled instruction that immediately follows an unconditional jump can be removed, and this operation can be repeated in order to eliminate a sequence of instructions.
int debug = 0
if (debug) {
    print debugging information
}
This may be translated as
if debug == 1 goto L1
goto L2
L1: print debugging information
L2:
Eliminating the jump over a jump gives
if debug != 1 goto L2
print debugging information
L2:
Since debug is 0, the statements that print the debugging information are unreachable and can be eliminated.
Strength reduction
• Example:
Using Machine Idioms
• The target machine may have hardware instructions to implement certain specific operations
efficiently.
• Detecting situations that permit the use of these instructions can reduce execution time
significantly.
2. Loop Optimization
Common Sub-expression elimination
• Common sub-expression elimination is an optimization that searches for instances of identical expressions (i.e. they all evaluate to the same value), and
• Analyses whether it is worthwhile replacing them with a single variable holding the computed value.
Example:
Before:
a := b * c
…
…
x := b * c + 5
After:
temp := b * c
a := temp
…
x := temp + 5
Identify common sub-expressions present in different expressions, compute them once, and use the result in all the places.
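Common Sub-expression elimination
• Example 1:
Before:
a := b + c
c := b + c
d := b + c
After:
a := b + c
c := a
d := b + c    (c was redefined in the second statement, so this b + c is a different value and must be recomputed)
• Example 2:
1  x := a + b
2  a := b
3  …
4  z := a + b + 10
"a + b" is not a common sub-expression in 1 and 4, because a is redefined in 2.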
Dead code Optimization:
• Dead code elimination removes code that does not affect the program's results.
• Removing such code has two benefits.
• It shrinks program size.
• It avoids executing irrelevant operations, which reduces running time.
Unreachable Code - Dead code Optimization
• In computer programming, unreachable code or dead code is code that exists in the source code of a program but can never be executed.
Program Code:
if (a>b)
m=a
Optimized Code:
if (a>b) m=a
main()
{
    int a, b, c, r;
    a=5;
    b=6;
    c=a + b;
    r=2; r++;    (adding time & space complexity)
    printf("%d", c);
}
Loop optimization
• Loop optimization plays an important role in improving the performance of the source code by reducing overheads associated with executing loops.
• Programs tend to spend the bulk of their time in inner loops.
• The running time of a program may be improved if we decrease the number of instructions in an inner loop, even if we increase the amount of code outside that loop.
Before:
i = 1
s = 0
do {
    s = s + i
    a = 5
    i = i + 1
} while (i <= n)
After:
i = 1
s = 0
a = 5
do {
    s = s + i
    i = i + 1
} while (i <= n)
Bringing a = 5 outside the do-while loop is called code motion.
Code Motion - Loop Optimization
• Example
for (i=0; i<n; i++)
    a[i] = a[i] + x/y;
• Three address code
for (i=0; i<n; i++)
{
    c = x/y;
    a[i] = a[i] + c;
}
• After code motion
c = x/y;
for (i=0; i<n; i++)
    a[i] = a[i] + c;
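The effect of moving the invariant computation x/y out of the loop can be checked by running both versions and counting divisions. This is a demonstration under assumed names (`before_motion`, `after_motion`), not a compiler pass.

```python
def before_motion(a, x, y):
    """Loop with the invariant x / y recomputed on every iteration."""
    a = list(a)
    divisions = 0
    for i in range(len(a)):
        c = x / y
        divisions += 1
        a[i] = a[i] + c
    return a, divisions

def after_motion(a, x, y):
    """Loop after code motion: x / y is computed once, before the loop."""
    a = list(a)
    c = x / y
    divisions = 1
    for i in range(len(a)):
        a[i] = a[i] + c
    return a, divisions
```

One caveat worth noting: if the loop runs zero times and y is 0, the hoisted version performs a division that the original never executed, so a compiler has to make sure the moved computation cannot introduce a new error.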
Code hoisting - Loop Optimization
“x ** 2“ is computed once in both cases, but the code size in the second case reduces.
Induction variable elimination
• If there are multiple induction variables in a loop, we can eliminate the ones which are used only in the test condition
The code fragment below has three induction variables (i1, i2, and i3) that can be replaced with one induction variable.
int a[SIZE];
int b[SIZE];
void f (void)
{
    int i1, i2, i3;
    for (i1 = 0, i2 = 0, i3 = 0; i1 < SIZE; i1++)
        a[i2++] = b[i3++];
    return;
}
The code fragment below shows the loop after induction variable elimination.
int a[SIZE];
int b[SIZE];
void f (void)
{
    int i1;
    for (i1 = 0; i1 < SIZE; i1++)
        a[i1] = b[i1];
    return;
}
Induction variable elimination
• Example
Before:
s := 0;
for (i=0; i<n; i++)
{
    s := 4 * i;
    …
}
After:
s := 0;
e := 4*n;
while (s < e)
{
    …
    s := s + 4;
}
Loop Fusion - Loop Optimization
• Example
Before Loop Fusion
for (i=0; i<n; i++) {
    A[i] = B[i] + 1
}
for (i=0; i<n; i++) {
    C[i] = A[i] / 2
}
for (i=0; i<n; i++) {
    D[i] = 1 / C[i+1]
}
Fusing all three loops:
for (i=0; i<n; i++) {
    A[i] = B[i] + 1
    C[i] = A[i] / 2
    D[i] = 1 / C[i+1]
}
Is this correct? Actually, the third loop cannot be fused, because D[i] reads C[i+1], which the fused loop has not yet computed in iteration i:
for (i=0; i<n; i++) {
    A[i] = B[i] + 1
    C[i] = A[i] / 2
}
for (i=0; i<n; i++) {
    D[i] = 1 / C[i+1]
}
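Running the three variants on a small input shows why only the first two loops can be fused. The sketch assumes C has n+1 elements initialised to 1.0 so that the read of C[i+1] is always defined; that initial value and the function names are assumptions for illustration.

```python
def three_loops(B, c_init=1.0):
    """Original program: three separate loops."""
    n = len(B)
    A, C, D = [0.0] * n, [c_init] * (n + 1), [0.0] * n
    for i in range(n):
        A[i] = B[i] + 1
    for i in range(n):
        C[i] = A[i] / 2
    for i in range(n):
        D[i] = 1 / C[i + 1]
    return D

def fused_first_two(B, c_init=1.0):
    """Legal fusion: only the first two loops are merged."""
    n = len(B)
    A, C, D = [0.0] * n, [c_init] * (n + 1), [0.0] * n
    for i in range(n):
        A[i] = B[i] + 1
        C[i] = A[i] / 2
    for i in range(n):
        D[i] = 1 / C[i + 1]
    return D

def fused_all(B, c_init=1.0):
    """Illegal fusion: D[i] reads C[i+1] before the loop has written it."""
    n = len(B)
    A, C, D = [0.0] * n, [c_init] * (n + 1), [0.0] * n
    for i in range(n):
        A[i] = B[i] + 1
        C[i] = A[i] / 2
        D[i] = 1 / C[i + 1]
    return D
```

With B = [1, 3, 5], the fully fused loop only ever sees the initial value of C[i+1], so its result differs from the original program.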
Loop unrolling or Loop collapsing - Loop Optimization
Example:
In the code fragment below, the double-nested loop on i and j can be collapsed into a single-nested
loop.
int a[100][300];