
Code Optimization

5/21/2019 VNIT Nagpur 1


What is Code Optimization?

 A program transformation that preserves
  correctness and improves the performance of the
  input program.
 Performance measures include execution time and
  space.

5/21/2019 VNIT Nagpur 2


Levels at which code optimization is
possible
Code Optimization may be performed at multiple
levels of program representation like:

1. Source Code

2. Intermediate Code

3. Target Machine Code


5/21/2019 VNIT Nagpur 3
Machine Dependent & Machine
Independent Optimizations
 Machine-dependent optimizations refer to those
  transformations that require knowledge of the
  target machine.

 Machine-independent optimizations refer to those
  transformations that can be carried out without
  knowledge of the target machine.

5/21/2019 VNIT Nagpur 4


Speed Vs Size Tradeoff

 Normally the time-critical parts of an application are
  optimized for speed and the rest for size.
 If we believe the 90-10 rule, that 90% of the time is
  spent in 10% of the code, then we optimize that
  10% of the code for speed and the rest for size.

5/21/2019 VNIT Nagpur 5


Control Flow & Data Flow Analysis

 Control Flow Analysis: determines the control
  paths in a program.

 Data Flow Analysis: determines the flow of data
  values.

5/21/2019 VNIT Nagpur 6


Building Control Flow Graph(CFG)

 To build a control flow graph, the three-address code
  (TAC) is first partitioned into basic blocks.
 A basic block is a sequence of consecutive intermediate-
  language statements that can be entered only at the
  beginning and that control leaves only after the last
  statement has been executed.

5/21/2019 VNIT Nagpur 7


Building Control Flow Graph(CFG)

 This requires identifying leader statements using the
  following rules (a code sketch of this partitioning is
  given below):
1. The first statement in the program is a leader
   statement.
2. A target of a branch (conditional as well as
   unconditional) is a leader statement.
3. Any statement that immediately follows a branch
   (conditional or unconditional) is a leader.
5/21/2019 VNIT Nagpur 8
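
The three rules above translate into a single marking pass over the TAC. The following C sketch is only an illustration of that pass; the struct tac_instr encoding of a TAC statement (is_branch, is_conditional, target) is an assumed representation, not one defined in these slides.

/* Hypothetical TAC encoding: a sketch of leader marking, not the slides' own code. */
#include <stdbool.h>

struct tac_instr {
    bool is_branch;        /* conditional or unconditional branch             */
    bool is_conditional;   /* true for "if ... goto", false for plain "goto"  */
    int  target;           /* index of the branch target, -1 if none          */
};

/* Mark leaders[i] = true for every statement that starts a basic block. */
void find_leaders(const struct tac_instr *code, int n, bool *leaders)
{
    for (int i = 0; i < n; i++) leaders[i] = false;
    if (n > 0) leaders[0] = true;                 /* rule 1: first statement          */
    for (int i = 0; i < n; i++) {
        if (code[i].is_branch) {
            if (code[i].target >= 0 && code[i].target < n)
                leaders[code[i].target] = true;   /* rule 2: branch target            */
            if (i + 1 < n)
                leaders[i + 1] = true;            /* rule 3: statement after a branch */
        }
    }
    /* A basic block then runs from a leader up to, but not including, the next leader. */
}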
Building Control Flow Graph(CFG)
 Example:
Program                              TAC
{                                    (1)  prod = 0
  prod = 0;                          (2)  i = 1
  i = 1;                             (3)  t1 = 4 * i
  do {                               (4)  t2 = a[t1]
    prod = prod + a[i] * b[i];       (5)  t3 = 4 * i
    i = i + 1;                       (6)  t4 = b[t3]
  }                                  (7)  t5 = t2 * t4
  while (i <= 20)                    (8)  t6 = prod + t5
}                                    (9)  prod = t6
                                     (10) t7 = i + 1
                                     (11) i = t7
                                     (12) if i <= 20 goto (3)

5/21/2019 VNIT Nagpur 9


Building Control Flow Graph(CFG)
Leaders identified

Rule (1)   (1)  prod = 0
           (2)  i = 1
Rule (2)   (3)  t1 = 4 * i
           (4)  t2 = a[t1]
           (5)  t3 = 4 * i
           (6)  t4 = b[t3]
           (7)  t5 = t2 * t4
           (8)  t6 = prod + t5
           (9)  prod = t6
           (10) t7 = i + 1
           (11) i = t7
           (12) if i <= 20 goto (3)
Rule (3)   (13) …….

5/21/2019 VNIT Nagpur 10


Building Control Flow Graph(CFG)
A basic block corresponding to a leader consists of a leader, plus all
statements up to but not including the next leader or up to the end of
the program.
B1:   (1)  prod = 0
      (2)  i = 1

B2:   (3)  t1 = 4 * i
      (4)  t2 = a[t1]
      (5)  t3 = 4 * i
      (6)  t4 = b[t3]
      (7)  t5 = t2 * t4
      (8)  t6 = prod + t5
      (9)  prod = t6
      (10) t7 = i + 1
      (11) i = t7
      (12) if i <= 20 goto (3)

B3:   (13) …….
5/21/2019 VNIT Nagpur 11
Building Control Flow Graph(CFG)
 There is a directed edge from basic block B1 to
basic block B2 in the CFG if:
1. The leader of B2 immediately follows the last
statement of B1, and the last statement of B1 is
not an unconditional branch.
2. The leader of B2 is a target of the last statement of
B1 which is a branch.
 A basic block whose leader is the first intermediate
language statement is called the start node.

5/21/2019 VNIT Nagpur 12


Building Control Flow Graph(CFG)
Control Flow Graph for the example:

B1 → B2
B2 → B2   (the conditional goto (3) branches back to B2's own leader)
B2 → B3

5/21/2019 VNIT Nagpur 13


Identifying Loops
 To identify the loops, we use the concept of
dominance.
 A node a in a CFG dominates a node b if every path
from the start node to node b goes through a.
 The dominator set of node b, dom(b), is a set
formed by all nodes that dominate b.
 A dominator tree is a useful way to represent the
  dominance relation.
 In a dominator tree the start node is the root, and
  each node d dominates only its descendants in the
  tree. (An iterative computation of the dominator sets
  is sketched below.)
5/21/2019 VNIT Nagpur 14
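
The dominator sets themselves can be computed with the standard iterative formulation dom(b) = {b} ∪ (the intersection of dom(p) over all predecessors p of b), starting from "all nodes" for every node except the start node. The C sketch below uses one unsigned bit mask per node for graphs of at most 32 nodes; the flow-graph encoding (npred/pred arrays, node 0 as the start node) is an assumed illustration, not the slides' notation.

/* Iterative dominator computation over a flow graph with at most 32 nodes.
 * dom[i] is a bit mask: bit j set means node j dominates node i.
 * Node 0 is taken to be the start node; the graph encoding is hypothetical. */
#define MAXN 32

void compute_dominators(int n, const int npred[], int pred[][MAXN], unsigned dom[])
{
    unsigned all = (n == 32) ? 0xFFFFFFFFu : ((1u << n) - 1u);

    dom[0] = 1u << 0;                       /* the start node dominates only itself  */
    for (int i = 1; i < n; i++) dom[i] = all;

    int changed = 1;
    while (changed) {
        changed = 0;
        for (int i = 1; i < n; i++) {
            unsigned d = all;
            for (int k = 0; k < npred[i]; k++)
                d &= dom[pred[i][k]];       /* intersect the dominators of all preds */
            d |= 1u << i;                   /* every node dominates itself           */
            if (d != dom[i]) { dom[i] = d; changed = 1; }
        }
    }
}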
Example

A flow graph with nodes 1–10 (left) and its dominator tree rooted at
node 1, the start node (right).
5/21/2019 VNIT Nagpur 15
Identifying Loops
 A strongly-connected component (SCC) of a graph
G = (N, E, s) is a subgraph G’ = (N’, E’, s’) in
which there is a path from each node in N’ to every
node in N’.

 A strongly-connected component G’ =(N’, E’, s’) of a


flowgraph G = (N, E, s) is a loop with entry s’ if s’
dominates all nodes in N’.

5/21/2019 VNIT Nagpur 16


Identifying Loops
 Example of a SCC, which is not a loop.
1

2 3

 In the above flow graph, nodes 2 and 3 form a strongly
  connected component, but they are not a loop, because
  no node in the subgraph dominates all the other nodes.

5/21/2019 VNIT Nagpur 17


Identifying Loops
Back edge detection
 An edge (b, a) of a flow graph G is a back edge if the
  head of the edge dominates the tail of the edge. Here a
  is the head and b is the tail.
Natural Loop
 Given a back edge (b, a), the natural loop associated with
  (b, a), with entry at node a, is the subgraph formed by a
  plus all nodes that can reach b without going through a.

5/21/2019 VNIT Nagpur 18


Identifying Loops
 One way to find natural loops is (see the sketch
  below):
1. Find a back edge (b, a).
2. Find the nodes that are dominated by a.
3. Among the nodes dominated by a, look for those
   that can reach b.

5/21/2019 VNIT Nagpur 19
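
In other words, the natural loop is collected by a backward search from b that is never allowed to pass through a. A C sketch, reusing the bit-mask flow-graph encoding assumed above (again an illustration, not the slides' own notation):

/* Natural loop of a back edge (b, a): node a plus all nodes that can reach b
 * without going through a. Backward search along predecessor edges.          */
#define MAXN 32

unsigned natural_loop(int a, int b, const int npred[], int pred[][MAXN])
{
    unsigned loop = (1u << a) | (1u << b);   /* entry a and tail b are in the loop */
    int stack[MAXN], top = 0;

    if (b != a) stack[top++] = b;
    while (top > 0) {
        int m = stack[--top];
        for (int k = 0; k < npred[m]; k++) {
            int p = pred[m][k];
            if (!(loop & (1u << p))) {       /* not yet collected (and never a,    */
                loop |= 1u << p;             /* since a is marked from the start)  */
                stack[top++] = p;
            }
        }
    }
    return loop;                             /* bit i set  =>  node i is in the loop */
}

Applied to the example flow graph of the following slides, this search yields the entire graph for the back edge (9, 1) and the set {7, 8, 10} for the back edge (10, 7).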


Identifying Loops

For the back edge (9, 1) in this flow graph (the 10-node example above),
the associated natural loop is the entire graph.

Back edge: (9, 1)        Natural loop: the entire graph

5/21/2019 VNIT Nagpur 20


Identifying Loops

For the back edge (10, 7) in the same flow graph, the associated natural
loop is the subgraph consisting of nodes 7, 8 and 10.

Back edge: (10, 7)       Natural loop: {7, 8, 10}

5/21/2019 VNIT Nagpur 21


Data Flow Analysis
Computing Reaching Definitions
 The goal is to determine the set of definitions reaching a
  point in a program.
 For this we must take into consideration both the data flow
  and the control flow in the program, and set up data-flow
  equations for each basic block.
 To set up the data-flow equations we need to compute the
  gen, kill, in and out sets, defined as follows:

5/21/2019 VNIT Nagpur 22


Data Flow Analysis
gen(B) – set of definitions generated within block B.
kill(B) – set of definitions block B can kill.
in(B) – set of definitions capable of reaching up to a point
  immediately before the first statement of block B
  (i.e. at the start of B).
out(B) – set of definitions capable of reaching up to a point
  immediately after the last statement of block B
  (i.e. at the end of B).

5/21/2019 VNIT Nagpur 23


Data Flow Analysis

B1:  d1: i := m-1          gen(B1) = {d1, d2, d3}
     d2: j := n            kill(B1) = {d4, d5, d6, d7}
     d3: a := u1

B2:  d4: i := i+1          gen(B2) = {d4, d5}
     d5: j := j-1          kill(B2) = {d1, d2, d7}

B3:  d6: a := u2           gen(B3) = {d6}
                           kill(B3) = {d3}

B4:  d7: i := u3           gen(B4) = {d7}
                           kill(B4) = {d1, d4}

5/21/2019 VNIT Nagpur 24


Data Flow Analysis
Data-flow equations for computing reaching
definitions

in(B) = ∪ out(P) for all predecessors P of B

out(B) = (in(B) – kill(B)) ∪ gen(B)

5/21/2019 VNIT Nagpur 25
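
With the sets encoded as bit strings (one bit per definition, exactly as in the tables on the following slides), the two equations can be solved by a simple round-robin iteration until nothing changes. The C sketch below is a generic solver; the block/edge encoding (npred/pred arrays) is an assumed illustration, not the slides' notation.

/* Iterative reaching-definitions solver. Every set of definitions is an
 * unsigned bit string (bit d set => definition d is in the set), mirroring
 * the d1..d7 bit strings used in the example that follows.                  */
#define MAXB 16

void reaching_definitions(int nblocks,
                          const unsigned gen[], const unsigned kill[],
                          const int npred[], int pred[][MAXB],
                          unsigned in[], unsigned out[])
{
    for (int b = 0; b < nblocks; b++) {          /* initialization:            */
        in[b]  = 0u;                             /* in  = empty set            */
        out[b] = gen[b];                         /* out = gen (nothing killed) */
    }

    int changed = 1;
    while (changed) {
        changed = 0;
        for (int b = 0; b < nblocks; b++) {
            unsigned new_in = 0u;
            for (int k = 0; k < npred[b]; k++)
                new_in |= out[pred[b][k]];                    /* in(B)  = union of out(P)   */
            unsigned new_out = (new_in & ~kill[b]) | gen[b];  /* out(B) = (in - kill) U gen */
            if (new_in != in[b] || new_out != out[b]) {
                in[b] = new_in; out[b] = new_out; changed = 1;
            }
        }
    }
}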


Initialization:

in(B1) = Φ        out(B1) = {d1, d2, d3}
in(B2) = Φ        out(B2) = {d4, d5}
in(B3) = Φ        out(B3) = {d6}
in(B4) = Φ        out(B4) = {d7}

To simplify the representation, the in(B) and out(B) sets are represented by
bit strings. Assuming the representation d1 d2 d3 d4 d5 d6 d7 we obtain:

Initial values
Block    in(B)        out(B)
B1       000 0000     111 0000
B2       000 0000     000 1100
B3       000 0000     000 0010
B4       000 0000     000 0001

5/21/2019 VNIT Nagpur 26


(The flow graph B1–B4, the gen/kill sets and the initial values are as
given on the previous slides.)

First iteration
Block    in(B)        out(B)
B1       000 0000     111 0000
B2       111 0010     001 1110
B3       011 1100     010 1110
B4       011 1100     011 0101

5/21/2019 VNIT Nagpur 27


(Starting from the first-iteration values above, the in sets are recomputed
from the out sets of the predecessors, and the out sets from gen and kill.)

Second iteration
Block    in(B)        out(B)
B1       000 0000     111 0000
B2       111 1110     001 1110
B3       001 1110     000 1110
B4       001 1110     001 0111
5/21/2019 VNIT Nagpur 28
Final Values

in(B1) = Φ                              out(B1) = {d1, d2, d3}
in(B2) = {d1, d2, d3, d4, d5, d6}       out(B2) = {d3, d4, d5, d6}
in(B3) = {d3, d4, d5, d6}               out(B3) = {d4, d5, d6}
in(B4) = {d3, d4, d5, d6}               out(B4) = {d3, d5, d6, d7}

(These are the sets corresponding to the second-iteration bit strings, at
which point the values no longer change.)
5/21/2019 VNIT Nagpur 29
Data Flow Analysis
Computing u-d chains
 If the use of a name a in block B is preceded by a
  definition of a in block B, then the u-d chain of a
  contains just the last definition of a prior to this use
  of a in block B.
 If the use of a name a in block B is not preceded by a
  definition of a in block B, then the u-d chain for this
  use of a consists of all definitions of a in in(B).

5/21/2019 VNIT Nagpur 30


Data Flow Analysis
Identifying loop-invariant computations
 A three-address statement x = y op z specifies a loop-
  invariant computation if:
1. All definitions of y in the u-d chain for this use of y
   lie outside the loop.
2. All definitions of z in the u-d chain for this use of z
   lie outside the loop.
 To move a loop-invariant statement outside the loop
  we need to create a pre-header (see the example
  below).

5/21/2019 VNIT Nagpur 31
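
As a concrete illustration of the transformation (the C fragment below is an assumed example, not one from the slides): the product x * y uses only names that are not defined inside the loop, so by the two conditions above it is loop invariant and can be hoisted into a pre-header.

#include <stddef.h>

/* Before: x * y is recomputed on every iteration although neither x nor y
 * changes inside the loop.                                                 */
void before(double *a, const double *b, size_t n, double x, double y)
{
    for (size_t i = 0; i < n; i++)
        a[i] = x * y + b[i];
}

/* After loop-invariant code motion: the invariant computation is evaluated
 * once in a pre-header placed just before the loop.                        */
void after(double *a, const double *b, size_t n, double x, double y)
{
    double t = x * y;                 /* pre-header */
    for (size_t i = 0; i < n; i++)
        a[i] = t + b[i];
}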


Data Flow Analysis
d1: i := m-1
B1 d2: j := n
d3: a := u1

Pre-header

B2 d4: i := i+1
d5: j :=j - 1

B3
d6: a := u2

B4 d7: i := u3

5/21/2019 VNIT Nagpur 32


Legality of code motion

x = 1

x = 2

y = x

5/21/2019 VNIT Nagpur 33


Legality of code motion
x = 1

x = 2    Pre-header

y = x

5/21/2019 VNIT Nagpur 34


Legality of code motion
x = 1

x = 3

x = 2

y = x

5/21/2019 VNIT Nagpur 35


Legality of code motion
x = 1

x = 3    Pre-header

x = 2

y = x

5/21/2019 VNIT Nagpur 36


Data Flow Analysis
Computing available expressions

 An expression x op y is available at a point p if its
  value has been computed (and not subsequently
  invalidated) on every path ending at p.

 To set up the data-flow equations we need to compute
  the gen, kill, in and out sets, defined as follows:

5/21/2019 VNIT Nagpur 37


Data Flow Analysis
gen(B) – An expression x op y will be in gen(B) if x op y is
  evaluated in B and subsequently neither x nor y is
  redefined in B.
kill(B) – An expression x op y is killed by B if there is an
  assignment to x or y (or both), and subsequently
  x op y is not recomputed in B.
in(B) – set of expressions available at the point immediately
  before the first statement of block B (i.e. at the start of B).
out(B) – set of expressions available at the point immediately
  after the last statement of block B.

5/21/2019 VNIT Nagpur 38


Data Flow Analysis
Data-flow equations for computing available
expressions

in(B) = ∩ out(P) for all predecessors P of B

out(B) = (in(B) – kill(B)) ∪ gen(B)

5/21/2019 VNIT Nagpur 39
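
A solver for these equations has the same shape as the reaching-definitions sketch given earlier, with two differences: the meet over predecessors is intersection rather than union, and every block except the entry block starts at the universal set U of all expressions. Again the block/edge encoding is an assumed illustration.

/* Iterative available-expressions solver. Every set of expressions is a bit
 * string (bit e set => expression e is available). Block 0 is assumed to be
 * the entry block.                                                           */
#define MAXB 16

void available_expressions(int nblocks, unsigned U,
                           const unsigned gen[], const unsigned kill[],
                           const int npred[], int pred[][MAXB],
                           unsigned in[], unsigned out[])
{
    in[0]  = 0u;                                 /* nothing is available on entry */
    out[0] = gen[0];
    for (int b = 1; b < nblocks; b++) {          /* all other blocks start at U   */
        in[b]  = U;
        out[b] = (U & ~kill[b]) | gen[b];
    }

    int changed = 1;
    while (changed) {
        changed = 0;
        for (int b = 1; b < nblocks; b++) {      /* the entry block never changes */
            unsigned new_in = U;
            for (int k = 0; k < npred[b]; k++)
                new_in &= out[pred[b][k]];                    /* in(B)  = intersection of out(P) */
            unsigned new_out = (new_in & ~kill[b]) | gen[b];  /* out(B) = (in - kill) U gen      */
            if (new_in != in[b] || new_out != out[b]) {
                in[b] = new_in; out[b] = new_out; changed = 1;
            }
        }
    }
}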


Data Flow Analysis
B1:  a = b+c
     d = e*f
     g = a+c

B2:  h = a+c          B3:  b = a+d
                           e = f+1

B4:  j = a+c
     k = f+1
     h = e*f

5/21/2019 VNIT Nagpur 40


Data Flow Analysis

Block    gen                  kill
B1       {b+c, e*f, a+c}      {a+d}
B2       {a+c}                Φ
B3       {a+d, f+1}           {b+c, e*f}
B4       {a+c, e*f, f+1}      Φ

5/21/2019 VNIT Nagpur 41


Data Flow Analysis
Initially, for block B1:
in(B1) = Φ
out(B1) = gen(B1) = {b+c, e*f, a+c}

For B2, B3 and B4:
in(B2) = in(B3) = in(B4) = U = set of all expressions;
U = {b+c, e*f, a+c, a+d, f+1}.

5/21/2019 VNIT Nagpur 42


Data Flow Analysis
Initially
Block    in (initial values)               out = (in(B) – kill(B)) ∪ gen(B)
B1       Φ                                 {b+c, e*f, a+c}
B2       {b+c, e*f, a+c, a+d, f+1}         {b+c, e*f, a+c, a+d, f+1}
B3       {b+c, e*f, a+c, a+d, f+1}         {a+c, a+d, f+1}
B4       {b+c, e*f, a+c, a+d, f+1}         {b+c, e*f, a+c, a+d, f+1}

5/21/2019 VNIT Nagpur 43


Data Flow Analysis
After 1st iteration
Block    in = ∩ out(P)             out = (in(B) – kill(B)) ∪ gen(B)
B1       Φ                         {b+c, e*f, a+c}
B2       {b+c, e*f, a+c}           {b+c, e*f, a+c}
B3       {a+c}                     {a+c, a+d, f+1}
B4       {a+c, a+d, f+1}           {e*f, a+c, a+d, f+1}

5/21/2019 VNIT Nagpur 44


Data Flow Analysis
After 2nd iteration
Block    in = ∩ out(P)             out = (in(B) – kill(B)) ∪ gen(B)
B1       Φ                         {b+c, e*f, a+c}
B2       {b+c, e*f, a+c}           {b+c, e*f, a+c}
B3       {a+c}                     {a+c, a+d, f+1}
B4       {a+c}                     {e*f, a+c, f+1}

5/21/2019 VNIT Nagpur 45


Elimination of common sub-expression
B1:  a = b+c
     d = e+f
     g = a+c

B2:  h = x + y        B3:  b = x + y
                           ….

B4:  j = x + y    ← common sub-expression
     ….

5/21/2019 VNIT Nagpur 46


Elimination of common sub-expression
B1:  a = b+c
     d = e+f
     g = a+c

B2:  temp = x + y     B3:  temp = x + y
     h = temp              b = temp
                           ….

B4:  j = temp
     ….

5/21/2019 VNIT Nagpur 47


Data Flow analysis
Computing live variables
 Liveness definition: a variable x is live at a point p
  if x is used on some path starting from p.

   point p
     …
   a = x

5/21/2019 VNIT Nagpur 48


Data Flow analysis
Data-flow equations for computing live variables

out(B) = ∪ in(S) for every successor S of B

in(B) = (out(B) – def(B)) ∪ use(B)

5/21/2019 VNIT Nagpur 49


Data Flow Analysis
Where
def(B) = set of variables defined in block B

use(B) = set of variables used in block B prior to any


definition of those variables.

5/21/2019 VNIT Nagpur 50


Data Flow Analysis
B1:  a = 0

B2:  b = a + 1
     c = c + b
     a = b * 2
     if a < 9 goto B2

B3:  return c

5/21/2019 VNIT Nagpur 51


Data Flow Analysis
Initially
Block    use      def        out    in
B1       Φ        {a}        Φ      Φ
B2       {a,c}    {a,b,c}    Φ      {a,c}
B3       {c}      Φ          Φ      {c}

5/21/2019 VNIT Nagpur 52


Data Flow Analysis
After 1st iteration
Block    out      in
B1       {a,c}    {c}
B2       {a,c}    {a,c}
B3       Φ        {c}

5/21/2019 VNIT Nagpur 53
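
The worked example above can be reproduced with a few lines of C. The sketch below encodes the three blocks, their use/def sets and the successor edges B1 → B2 and B2 → {B2, B3} as bit masks (a, b, c are bits 0, 1, 2) and iterates the two equations until they stabilize; the encoding is ours, but the resulting in/out sets match the table.

/* Backward liveness solver for the example:
 *   B1: a = 0;   B2: b = a+1; c = c+b; a = b*2; if a < 9 goto B2;   B3: return c */
#include <stdio.h>

#define A 1u            /* bit 0 */
#define B 2u            /* bit 1 */
#define C 4u            /* bit 2 */

int main(void)
{
    enum { NB = 3 };
    unsigned use[NB] = { 0u, A | C,     C  };     /* use(B1), use(B2), use(B3) */
    unsigned def[NB] = { A,  A | B | C, 0u };     /* def(B1), def(B2), def(B3) */
    int nsucc[NB]    = { 1, 2, 0 };               /* B1 -> B2;  B2 -> B2, B3   */
    int succ[NB][2]  = { {1, 0}, {1, 2}, {0, 0} };

    unsigned in[NB] = {0}, out[NB] = {0};
    int changed = 1;
    while (changed) {
        changed = 0;
        for (int b = NB - 1; b >= 0; b--) {                  /* backward problem          */
            unsigned new_out = 0u;
            for (int k = 0; k < nsucc[b]; k++)
                new_out |= in[succ[b][k]];                   /* out(B) = union of in(S)   */
            unsigned new_in = use[b] | (new_out & ~def[b]);  /* in(B) = (out - def) U use */
            if (new_in != in[b] || new_out != out[b]) {
                in[b] = new_in; out[b] = new_out; changed = 1;
            }
        }
    }
    for (int b = 0; b < NB; b++)      /* expected: in = {c}, {a,c}, {c}; out = {a,c}, {a,c}, {} */
        printf("B%d: in=0x%x out=0x%x\n", b + 1, in[b], out[b]);
    return 0;
}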


5/21/2019 VNIT Nagpur 54
Loop Restructuring
Techniques

5/21/2019 VNIT Nagpur 55


Techniques for Performance
Improvements
 Since the 90-10 rule applies, the inner loops are
  the obvious targets of the techniques we use
  for performance improvement.
 Hence these techniques are collectively referred to
  as loop restructuring techniques.
 Some of them are used to reduce the size of the
  code, while others are used to speed up
  execution.

5/21/2019 VNIT Nagpur 56


Loop Restructuring

 It refers to the transformations applied to a loop


nest in the program to improve the execution
efficiency.

5/21/2019 VNIT Nagpur 57


Loop Restructuring Techniques for
speed.

5/21/2019 VNIT Nagpur 58


Unswitching
 It refers to removing loop-independent
  conditionals from a loop.
Example
for(i=1; i<=N; i++)
  for(j=1; j<=N; j++)
    if (a[i] > 0)
      b[i][j] = b[i][j-1] * a[i] + b[i];
    else
      b[i][j] = 0;
Before Unswitching

5/21/2019 VNIT Nagpur 59


Unswitching contd..

 Before unswitching the conditional is tested N²
  times.

 The conditional being tested is independent of the
  inner loop.

 Hence it is possible to move this conditional
  outside the inner loop.

5/21/2019 VNIT Nagpur 60


Unswitching Contd..
for(i=1; i<=N; i++)
  if(a[i] > 0)
    for(j=1; j<=N; j++)
      b[i][j] = b[i][j-1] * a[i] + b[i];
  else
    for(j=1; j<=N; j++)
      b[i][j] = 0;
After Unswitching
5/21/2019 VNIT Nagpur 61
Unswitching Contd..

Advantage
 The above transformation reduces the number of
  times the conditional is tested to N. That is, it
  reduces the execution frequency of the
  conditional statement.
Disadvantages
1. The loop structure may become more complex.
2. Code size increases.
3. It might prevent data reuse.


5/21/2019 VNIT Nagpur 62
Loop Fusion
 It refers to combining two or more loops into a
single loop.
Example
for (i=1; i<N; i++)
  a[i] = a[i] + k1;
for (j=1; j<N; j++)
  d[j] = a[j] - b[j] + k2;
Before Fusion

5/21/2019 VNIT Nagpur 63


Loop Fusion Contd..

for (i=1; i<N; i++)
{
  a[i] = a[i] + k1;
  d[i] = a[i] - b[i] + k2;
}
After Fusion

5/21/2019 VNIT Nagpur 64


Loop Fusion Contd..
Advantages
 It saves increment and branch instructions.
 It may improve register reuse.
Disadvantage
 It may lead to the formation of loops with a more
  complex flow of control.

5/21/2019 VNIT Nagpur 65


Loop Fusion Contd..
Requirements for Loop Fusion
1. Loops must have identical iteration count.

2. Loops must be adjacent.

3. Loops must be control flow equivalent. That means


if one executes the other also executes.

5/21/2019 VNIT Nagpur 66


Loop Peeling
 It refers to removing the first or the last iteration
of the loop into separate code.
Example

for(i=1;i<=N;i++)
a[i] = (x + y) * b[i];

Before Peeling

5/21/2019 VNIT Nagpur 67


Loop Peeling Contd..
if (N >= 1)  -------------------------------- (1)
  a[1] = (x + y) * b[1];
for(i=2; i<=N; i++)
  a[i] = (x + y) * b[i];
After Peeling

 The test at (1) is required if there is no guarantee
  that the iteration count is always positive.

5/21/2019 VNIT Nagpur 68


Loop Peeling Contd..
 This transformation is used to enable loop fusion.
Example
for (i=1; i<=N; i++)
  a[i] = a[i] + k1;
for (j=1; j<=N-1; j++)
  d[j] = a[j] - b[j] + k2;
 Here the iteration counts are not the same (N versus
  N-1). Hence loop fusion is not possible directly.

5/21/2019 VNIT Nagpur 69


Loop Peeling Contd..
 If we remove the first iteration of the first loop
into the separate code as shown below, loop
fusion becomes possible.
a[1] = a[1] + k1;
for (i=1; i<N; i++)
  a[i+1] = a[i+1] + k1;
for (j=1; j<=N-1; j++)
  d[j] = a[j] - b[j] + k2;

5/21/2019 VNIT Nagpur 70


Loop Peeling Contd..
After fusion we get the following code.
a[1] = a[1] + k1;
for (i=1; i<N; i++)
{
  a[i+1] = a[i+1] + k1;
  d[i] = a[i] - b[i] + k2;
}
 Loop peeling leads to code size expansion.

5/21/2019 VNIT Nagpur 71


Loop Interchanging
 It refers to reversing the nesting order of the
nested loops, if the outer loop iterates many times
and the inner loop iterates only a few times.
Example
for (i=1;i<=N;i++)
for(j=1;j<=M;j++)
a[i][j]=a[i][j-1]+b[i][j];
If N >> M, then interchange the ith loop with jth loop as
shown below:

5/21/2019 VNIT Nagpur 72


Loop Interchanging Contd..
for (j=1;j<=M;j++)
for(i=1;i<=N;i++)
a[i][j]=a[i][j-1]+b[i][j];

 Before interchanging, the inner loop is started N
  times, therefore the startup cost of the inner loop is
  paid N times (i.e. the iteration variable j is initialized
  N times).

5/21/2019 VNIT Nagpur 73


Loop Interchanging Contd..

 After interchange the startup cost of inner loop is


M (M << N).

 It can change the spatial locality of memory


references.

5/21/2019 VNIT Nagpur 74


Loop Reversal

 It refers to running a loop backward to reverse all


dependence directions. Used to allow loop fusion.
Example
for(i=1;i<=N;i++)
a[i] = b[i] + 1;
for(i=1;i<=N;i++)
c[i] = 1/a[i+1];
 Fusion is not possible because c[i] depends on
a[i+1].
5/21/2019 VNIT Nagpur 75
Loop Reversal

 The dependence: c[i] in the second loop uses a[i+1],
  which is computed by iteration i+1 of the first loop;
  fusing the loops directly would therefore read a[i+1]
  before it has been computed.

5/21/2019 VNIT Nagpur 76


Loop Reversal Contd..
for(i=N;i>=1;i--)
a[i] = b[i] + 1;
for(i=N;i>=1;i--)
c[i] =1/a[i+1];
Loop fusion is now possible as follows:
for(i=N;i>=1;i--)
{
a[i] = b[i] + 1;
c[i] =1/a[i+1];
}
5/21/2019 VNIT Nagpur 77
Index Set Splitting
 It is a loop transformation that divides the index set
(range) of the loop into sub-ranges. Each sub-range
is then handled as a separate loop.
Example
for(i=0;i<=100;i++)
{
if(i < 5) a[i] = 2 * a[i];
else a[i] = 5 * a[i];
b[i] = a[i] * a[i];
}
5/21/2019 VNIT Nagpur 78
Index Set Splitting Contd..
 The above loop can be transformed into the
following.
for(i=0;i<5;i++)
{ a[i] = 2 * a[i];
b[i] = a[i] * a[i]; }
for(i=5;i<=100;i++)
{ a[i] = 5 * a[i];
b[i] = a[i] * a[i]; }

5/21/2019 VNIT Nagpur 79


Index Set Splitting Contd..
 In the above example the split point was 5, and
  was known a priori.
 It may happen that the split point is not known a
  priori, as is the case in the example given below.
for(i=0; i<=100; i++)
{ if(i < m) a[i] = 2 * a[i];
  else a[i] = 5 * a[i];
  b[i] = a[i] * a[i]; }

5/21/2019 VNIT Nagpur 80


Index Set Splitting Contd..

 This can be transformed into the following code.


for(i=0;i<m;i++)
{
a[i] = 2 * a[i]; b[i] = a[i] * a[i];
}
for(i=m;i<=100;i++)
{
a[i] = 5 * a[i]; b[i] = a[i] * a[i];
}
5/21/2019 VNIT Nagpur 81
Index Set Splitting Contd..
 The above transformation is not correct when
  m < 0 or m > 100.
 In the original loop, irrespective of the value of m,
  exactly 101 iterations (i = 0 to 100) are executed.
 But in the transformed code, when m < 0 the
  second loop executes more than 101 times, and
  when m > 100 the first loop executes more than
  101 times.

5/21/2019 VNIT Nagpur 82


Index Set Splitting Contd..
 To overcome this problem the split point must be
  clamped to the original bounds, i.e. the range is split
  into [0, min(m,101)) and [max(0,m), 100]. The
  transformed code therefore will be:
for(i=0; i<min(m,101); i++)
{ a[i] = 2 * a[i]; b[i] = a[i] * a[i]; }
for(i=max(0,m); i<=100; i++)
{ a[i] = 5 * a[i]; b[i] = a[i] * a[i]; }

5/21/2019 VNIT Nagpur 83


Index Set Splitting Contd..

 Index set splitting is used:

1. To enable loop fusion.


2. To remove conditionals on index variables from
inside the loop.

 The disadvantage of index set splitting is code size


expansion.

5/21/2019 VNIT Nagpur 84


Loop Unrolling
 It refers to replicating the body of the loop.
Example
while(i<=100)
{
a[i] = 0; i++;
}
Before Unrolling

5/21/2019 VNIT Nagpur 85


Loop Unrolling Contd..

while(i<=100)
{
a[i] = 0; i++;
a[i] = 0; i++;
}
After Unrolling

5/21/2019 VNIT Nagpur 86


Loop Unrolling

 The advantage of loop unrolling is that the test of
  whether the iteration variable has reached its final
  value is performed only 50 times.

 The disadvantage is that it leads to an increase in
  code size.

5/21/2019 VNIT Nagpur 87


Loop Fission/Loop Distribution
 It refers to breaking a loop into two or more smaller
loops.
Example
for (i=1; i<=N; i++)
{ s1 : a[i] = a[i] + b[i-1];
  s2 : b[i] = c[i-1] * x + y;
  s3 : c[i] = 1/b[i];
  s4 : d[i] = sqrt(c[i]); }

5/21/2019 VNIT Nagpur 88


Loop Fission Contd..

 The dependence graph for the statements in the
  loop is as follows: s1 uses b[i-1] (defined by s2),
  s2 uses c[i-1] (defined by s3), s3 uses b[i] (defined
  by s2), and s4 uses c[i] (defined by s3); so there are
  dependence edges s2 → s1, s3 → s2, s2 → s3 and
  s3 → s4.

5/21/2019 VNIT Nagpur 89


Loop Fission cont..

 The statements s2 and s3 form a strongly
  connected component in the above
  dependence graph, and hence will be part of
  the same loop.
 s1 and s4 will go into separate loops.
 To find a legal order for these loops we form the
  acyclic condensation of the dependence graph,
  shown below:

5/21/2019 VNIT Nagpur 90


Loop Fission Contd..

{s2, s3} → s1
{s2, s3} → s4

5/21/2019 VNIT Nagpur 91


Loop Fission Contd..

 The loop containing statements s2 and s3 is required
  to be executed before the loop containing s1 and the
  loop containing s4.
 The loop containing s1 and the loop containing s4 can
  be executed in either order.
 The transformed code therefore is the one shown
  below:

5/21/2019 VNIT Nagpur 92


Loop Fission Contd..
for (i=1; i<=N; i++)
{ s2 : b[i] = c[i-1] * x + y;
  s3 : c[i] = 1/b[i];
}
for (i=1; i<=N; i++)
  s1 : a[i] = a[i] + b[i-1];
for (i=1; i<=N; i++)
  s4 : d[i] = sqrt(c[i]);

5/21/2019 VNIT Nagpur 93


Loop Restructuring Techniques for
reduction in space requirement

5/21/2019 VNIT Nagpur 94


Common Subexpression Elimination

 In the expression "(a+b)- a+b)/4", "common


subexpression" refers to the duplicated "(a+b)“

 Since "(a+b)" won't change, it should be


calculated only once.

5/21/2019 VNIT Nagpur 95


Dead Code Elimination
 Removes instructions that do not affect the
  behaviour of the program.
 For example, definitions that have no uses
  constitute dead code (see the example below).
 This reduces code size and eliminates unnecessary
  computation.

5/21/2019 VNIT Nagpur 96
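
A small assumed example of the transformation: the assignment to t has no uses, so removing it cannot change the observable behaviour of the program.

/* Before dead-code elimination: t is computed but never used. */
int before(int x, int y)
{
    int t = x * y;      /* definition with no uses: dead code */
    return x + y;
}

/* After dead-code elimination: the useless computation is gone. */
int after(int x, int y)
{
    return x + y;
}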


Controlling function inlining
 When some code invokes a procedure, it is possible
  to insert the body of the procedure directly into the
  calling code rather than transferring control to it.
  This is called function inlining (illustrated below).
 This saves the overhead of the procedure call and
  also provides opportunities for many parameter-
  specific optimizations, but comes at the cost of space:
  the procedure body is duplicated each time the
  procedure is inlined.

5/21/2019 VNIT Nagpur 97
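
A small assumed C example of the transformation: the calls to square() are replaced by its body, removing the call overhead at the cost of duplicating the body at each call site.

static int square(int x) { return x * x; }

/* Before inlining: each use pays the procedure-call overhead. */
int sum_of_squares(int a, int b)
{
    return square(a) + square(b);
}

/* After inlining: the body of square() is substituted at both call sites. */
int sum_of_squares_inlined(int a, int b)
{
    return (a * a) + (b * b);
}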


Controlling function inlining Contd…

 Generally, inlining is useful in performance-critical
  code that makes a large number of calls to small
  procedures: a "fewer jumps" optimization.

 Hence, if function inlining is controlled, it leads to
  more compact code.

5/21/2019 VNIT Nagpur 98


Cross Jump Elimination
 It refers to combining two or more instances of
  identical code.

 For example, multiple return points from a function
  often generate identical code, and can be optimized
  into a single return sequence.

5/21/2019 VNIT Nagpur 99


Tail Call Optimization
 A tail call is a call immediately before a return
 Normally when we have a tail call, the function will
be called, and when it returns to the caller, the caller
returns again.
 Tail call optimization avoids this by restoring the
saved context before jumping to the tail call. The
called function will now return directly to the caller's
caller, saving a return sequence.
 For example consider the following code:

5/21/2019 VNIT Nagpur 100


Tail Call Optimization Contd…

B() { …….. C(); }

 Assume that A calls B.

 While executing B, if the context saved on entry to B
  is restored and then a jump to C is executed, C will
  return directly to A.

5/21/2019 VNIT Nagpur 101


Tail Recursion Elimination

 The compiler also supports tail call recursion
  (tail recursion elimination), which is possible when
  the tail call is made to the same function.
 In this case it is possible to skip the entry and exit
  sequences altogether, converting the call into a loop
  (see the sketch below).
 Tail call optimization is done by armcc only (as tcc
  has a limited branch range); tail call recursion by
  both armcc and tcc.

5/21/2019 VNIT Nagpur 102
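
A small assumed example of the conversion: a tail-recursive sum over 1..n, and the loop the compiler can turn it into by updating the parameters in place and branching back to the top instead of calling itself.

/* Tail-recursive: the recursive call is the last action of the function. */
long sum_rec(long n, long acc)
{
    if (n == 0)
        return acc;
    return sum_rec(n - 1, acc + n);   /* tail call to the same function */
}

/* After tail-recursion elimination: the entry/exit sequence is skipped and
 * the call becomes a loop.                                                */
long sum_loop(long n, long acc)
{
    for (;;) {
        if (n == 0)
            return acc;
        acc = acc + n;                /* parameters are updated in place ...       */
        n   = n - 1;                  /* ... and control branches back to the test */
    }
}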


Pure function detection
 A pure function is a function that always evaluates
the same result value given the same argument
value(s).
 The function’s result value cannot depend on any
hidden information or state that may change as
program execution proceeds or between different
executions of the program, nor can it depend on any
external input from I/O devices.
 Evaluation of the result does not cause any
semantically observable side effect or output, such as
mutation of mutable objects or output to I/O
devices.
5/21/2019 VNIT Nagpur 103
Pure function detection Contd…
 A pure function can be subject to common
  subexpression elimination and loop optimization
  just as an arithmetic operator would be.
 If fun() is a pure function, then an expression
  fun(a) + fun(a) can be optimized by calling fun()
  only once, as illustrated below.

5/21/2019 VNIT Nagpur 104
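
As an illustration (assuming a GCC-compatible compiler; the attribute and the function names are our own example, not from the slides), a function can be declared pure so that the compiler is allowed to treat repeated calls with the same argument as a common subexpression:

/* Declared pure: the result depends only on the arguments (and on memory the
 * function does not modify), and the call has no side effects.              */
__attribute__((pure)) int fun(int a);

int twice(int a)
{
    /* With the pure attribute the compiler may evaluate fun(a) once and
     * reuse the result, as described above.                               */
    return fun(a) + fun(a);
}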


5/21/2019 VNIT Nagpur 105
