Unit 5 Part 2

A Basic Block in Compiler Design is a sequence of instructions with a single entry and exit point, represented using three-address code. Basic blocks can be identified by leader statements, and optimizations can be applied to improve code efficiency, including common sub-expression elimination and loop optimization techniques. Directed Acyclic Graphs (DAGs) are used to represent the structure of basic blocks and visualize the flow of values, aiding in optimization.

Uploaded by

umairknp2

Basic Blocks

 A Basic Block in Compiler Design is a block of code containing a sequence of instructions with exactly one entry point and one exit point.
 Single Entry Point: There is only one way to enter a basic block. Control flow can only reach the beginning of the basic block; there is no jump into the middle of the block.
 Single Exit Point: There is only one way to exit a basic block, which is after its last statement; there is no jump out from the middle of the block.

Representation of Basic Block:

A basic block can be represented using three-address code, which breaks down complex
statements into simpler instructions. Each basic block is usually identified by a leader statement,
and the instructions within the block are organized linearly. Control flow between basic blocks is
often represented using directed edges in a control flow graph.

Consider the three-address code below for the expression a = b * c + 5 / 6

t1 = b * c;
t2 = 5 / 6;
t3 = t1 + t2;
The above code contains no branch or jump statements; it is a straight-line sequence. The only way to enter this block of code is at its first statement, and the last statement is its only exit point.
Consider another sequence of code:

(1) if A < B goto (4)
(2) T1 = 0
(3) goto (5)
(4) T1 = 1
The above code contains jump statements, so the whole sequence cannot form a single basic block; it must be split at its leaders.
Rules for finding Basic Blocks

Any given code can be partitioned into basic blocks using the following rules-

1. Determining Leaders-

Following statements of the code are called Leaders–

 First statement of the code.
 Any statement that is the target of a conditional or unconditional goto statement.
 Any statement that appears immediately after a goto statement.
2. Determining Basic Blocks-
 All the statements from a leader (including the leader) up to, but not including, the next leader form one basic block.
 The first statement of the code is called the first leader.
 The block containing the first leader is called the Initial block.
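These rules can be turned into a short routine over numbered three-address statements. The following Python sketch is illustrative, not part of the original notes; it assumes statements are given as a list of strings with jump targets written in the `goto (N)` form used in the examples below.

```python
import re

def find_leaders(code):
    """Apply the three leader rules. code[i] is the text of
    statement i+1 (statements are numbered from 1)."""
    leaders = {1}                                 # Rule 1: first statement
    for i, stmt in enumerate(code, start=1):
        m = re.search(r'goto \((\d+)\)', stmt)
        if m:
            leaders.add(int(m.group(1)))          # Rule 2: target of a goto
            if i + 1 <= len(code):
                leaders.add(i + 1)                # Rule 3: statement after a goto
    return sorted(leaders)

def basic_blocks(code):
    """Each block runs from one leader up to (but excluding) the next."""
    ls = find_leaders(code) + [len(code) + 1]
    return [(ls[k], ls[k + 1] - 1) for k in range(len(ls) - 1)]
```

Applied to the dot-product code in the example that follows, it reports leaders at statements (1) and (3), giving two basic blocks.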

Example

Consider the source code for the dot product of two vectors a and b of length 20.

begin
prod := 0;
i := 1;
do begin
prod := prod + a[i] * b[i];
i := i + 1;
end
while i <= 20
end
The three-address code for the above block of code can be generated as follows:
(1) prod := 0
(2) i := 1
(3) t1 := 4 * i
(4) t2 := a[t1] /*compute a[i] */
(5) t3 := 4 * i
(6) t4 := b[t3] /*compute b[i] */
(7) t5 := t2 * t4
(8) t6 := prod+t5
(9) prod := t6
(10) t7 := i + 1
(11) i := t7
(12) if i<=20 goto (3)

There are two basic blocks formed:

Basic Block 1: STATEMENT (1) TO (2)
Basic Block 2: STATEMENT (3) TO (12)
 Explanation
 Let us identify the leader statements to form the Basic Blocks:
 Statement (1) is always a leader, being the first statement of the code.
 Statement (3) is a leader, as it is the target of the conditional goto in statement (12).
 No statement follows the goto in (12), so it introduces no further leader.
 Each block extends from a leader up to the statement before the next leader, so statements (1)-(2) form the initial block and statements (3)-(12) form the second block, each with a single entry and exit point.
Flow Graphs-

 A flow graph is a directed graph with control flow information added to the basic blocks.
 The basic blocks serve as nodes of the flow graph.
 There is a directed edge from block B1 to block B2 if control can flow from B1 to B2: either B2 immediately follows B1 in the code and B1 does not end with an unconditional jump, or the last statement of B1 jumps to the leader of B2.
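Given the basic blocks, the flow-graph edges can be computed from the last statement of each block. The following Python sketch is illustrative only; it assumes the same `goto (N)` statement format as the examples, with blocks given as (start, end) statement ranges.

```python
import re

def flow_edges(code, blocks):
    """Add an edge B -> B' if B' can execute immediately after B:
    either B' starts at the target of a jump ending B, or control
    falls through (B's last statement is not an unconditional goto)."""
    start_to_block = {lo: idx for idx, (lo, hi) in enumerate(blocks)}
    edges = set()
    for idx, (lo, hi) in enumerate(blocks):
        last = code[hi - 1]
        m = re.search(r'goto \((\d+)\)', last)
        if m:
            edges.add((idx, start_to_block[int(m.group(1))]))
        # fall-through edge, unless the block ends with an unconditional goto
        if not last.strip().startswith('goto') and idx + 1 < len(blocks):
            edges.add((idx, idx + 1))
    return sorted(edges)
```

For the dot-product example this yields an edge from the initial block to the loop block, and a self-loop on the loop block from its conditional goto.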
Example1.
Compute the basic blocks for the given three address statements-

(1) PROD = 0
(2) I = 1
(3) T2 = addr(A) - 4
(4) T4 = addr(B) - 4
(5) T1 = 4 x I
(6) T3 = T2[T1]
(7) T5 = T4[T1]
(8) T6 = T3 x T5
(9) PROD = PROD + T6
(10) I = I + 1
(11) IF I <=20 GOTO (5)
Solution-
We have-
PROD = 0 is a leader, since the first statement of the code is a leader.
T1 = 4 x I is a leader, since the target of a conditional goto statement is a leader.
Now, the given code can be partitioned into two basic blocks as-

Block 1: statements (1) to (4)

Block 2: statements (5) to (11)
Example 2. Draw a flow graph for the three address statements given in problem-01.

Solution-

Firstly, we compute the basic blocks (already done above).


Secondly, we assign the flow control information.
The required flow graph has two nodes, Block 1 and Block 2, with an edge from Block 1 to Block 2 (fall-through) and an edge from Block 2 to itself (the conditional goto back to statement (5)).

OPTIMIZATION OF BASIC BLOCKS

There are two types of basic block optimizations. They are :


• Structure-Preserving Transformations
• Algebraic Transformations

Structure-Preserving Transformations:
The primary Structure-Preserving Transformation on basic blocks are:

• Common sub-expression elimination
• Dead code elimination
• Renaming of temporary variables
• Interchange of two independent adjacent statements
Common sub-expression elimination: Common sub-expressions need not be computed over and over again. Instead they can be computed once and kept in store, from where the value is referenced.

Example:
a := b+c
b := a-d
c := b+c
d := a-d
The 1st and 3rd statements appear to compute the same expression b+c, but b is redefined by the 2nd statement in between, so they are not a common sub-expression. The 2nd and 4th statements both compute a-d, with a and d unchanged in between, so the basic block can be transformed to
a := b+c
b := a-d
c := b+c
d := b
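The bookkeeping behind this transformation can be sketched as a small local pass over three-address statements. This Python sketch is illustrative; the `(dest, op, arg1, arg2)` tuple format is an assumption made for the example. It reuses an earlier result only while neither of its operands, nor the variable holding it, has been reassigned.

```python
def local_cse(block):
    """block: list of (dest, op, arg1, arg2) three-address statements.
    Replace a recomputation with a copy when the same (op, arg1, arg2)
    value is still available in some variable."""
    available = {}   # (op, arg1, arg2) -> variable currently holding that value
    out = []
    for dest, op, a1, a2 in block:
        key = (op, a1, a2)
        if key in available:
            out.append((dest, 'copy', available[key], None))
        else:
            out.append((dest, op, a1, a2))
        # the assignment to dest invalidates every cached expression
        # that reads dest or whose result lives in dest
        available = {k: v for k, v in available.items()
                     if v != dest and dest not in (k[1], k[2])}
        if out[-1][1] != 'copy' and dest not in (a1, a2):
            available[key] = dest
    return out
```

On the example block above, only a-d is reused (d becomes a copy of b); b+c is not reused because b is redefined between its two occurrences.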

 Dead code elimination:


It is possible that a large amount of dead (useless) code exists in a program. This is especially common when variables and procedures introduced during the construction or error-correction of a program are never removed, even though they serve no further purpose. Eliminating such code optimizes the program.

 Renaming of temporary variables:


A statement t := b+c, where t is a temporary name, can be changed to u := b+c, where u is another temporary name, provided all uses of t are changed to u. Applying this systematically transforms a basic block into an equivalent block called its normal-form block.

 Interchange of two independent adjacent statements:


• Two statements
t1 := b+c
t2 := x+y
can be interchanged or reordered in the basic block provided neither statement uses the value computed by the other (here, t2 does not use t1, t1 does not use t2, and they do not assign to the same variable).

 Algebraic Transformations:
Algebraic identities represent another important class of optimizations on basic blocks.
This includes simplifying expressions or replacing expensive operation by cheaper ones i.e.
reduction in strength. Another class of related optimizations is constant folding. Here we
evaluate constant expressions at compile time and replace the constant expressions by their
values. Thus the expression 2*3.14 would be replaced by 6.28.
The relational operators <=, >=, <, > and = sometimes generate unexpected common
sub expressions. Associative laws may also be applied to expose common sub expressions. For
example, if the source code has the assignments
a :=b+c
e :=c+d+b
the following intermediate code may be generated:

a := b+c
t := c+d
e := t+b

Example:
x:=x+0 can be removed
x:=y**2 can be replaced by a cheaper statement x:=y*y

The compiler writer should examine the language specification carefully to determine what
rearrangements of computations are permitted, since computer arithmetic does not always obey
the algebraic identities of mathematics. Thus, a compiler may evaluate x*y-x*z as x*(y-z) but it
may not evaluate a+(b-c) as (a+b)-c.
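Within what the language permits, constant folding and the identities above can be sketched as a recursive rewrite over expression trees. This Python sketch is illustrative only; the tuple representation of expressions is an assumption made for the example.

```python
def fold(node):
    """Constant-fold and apply simple algebraic identities on
    ('op', left, right) expression tuples; leaves are numbers
    or variable-name strings."""
    if not isinstance(node, tuple):
        return node
    op, l, r = node[0], fold(node[1]), fold(node[2])
    if isinstance(l, (int, float)) and isinstance(r, (int, float)):
        # constant folding: evaluate the expression at compile time
        return {'+': l + r, '-': l - r, '*': l * r, '**': l ** r}[op]
    if op == '+' and r == 0:
        return l                    # x + 0 can be removed
    if op == '**' and r == 2:
        return ('*', l, l)          # reduction in strength: x**2 -> x*x
    return (op, l, r)
```

For instance, `('*', 2, 3.14)` folds to 6.28, `('+', 'x', 0)` folds to `'x'`, and `('**', 'y', 2)` is rewritten as the cheaper `('*', 'y', 'y')`.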

Code optimization: Code optimization is a program transformation strategy that improves the intermediate code so that the program uses as little memory as possible, minimises CPU time, and runs faster.

Optimization is classified broadly into two types:

 Machine-Independent
 Machine-Dependent
 Machine Independent Optimization: This code optimization phase attempts to improve
the intermediate code to get a better target code as the output. The part of the intermediate
code which is transformed here does not involve any CPU registers or absolute memory
locations.

 Machine Dependent Optimization: Machine-dependent optimization is done after


the target code has been generated and when the code is transformed according to the
target machine architecture. It involves CPU registers and may have absolute memory
references rather than relative references. Machine-dependent optimizers put efforts to take
maximum advantage of the memory hierarchy.

Loop Optimization
Loop Optimization is the process of increasing execution speed and reducing the overheads
associated with loops. It plays an important role in improving cache performance and making
effective use of parallel processing capabilities. Most execution time of a scientific program is
spent on loops.
Loop optimization is a machine-independent optimization, whereas peephole optimization is a machine-dependent optimization technique.

Most programs run as a loop in the system. It becomes necessary to optimize the loops in order
to save CPU cycles and memory. Loops can be optimized by the following techniques:

1. Code Motion (Frequency Reduction)


In frequency reduction, the amount of code in the loop is decreased. A statement or expression,
which can be moved outside the loop body without affecting the semantics of the program, is
moved outside the loop.
Example:
Before optimization:
while(i<100)
{
a = Sin(x)/Cos(x) + i;
i++;
}

After optimization:

t = Sin(x)/Cos(x);
while(i<100)
{
a = t + i;
i++;
}

2. Induction Variable Elimination


A variable whose value changes on every iteration of a loop is known as an induction variable: with each iteration, its value is incremented or decremented by some constant amount.
Example:
Before optimization:
B1:
i := i+1
x := 3*i
y := a[x]
if y < 15, goto B2

In the above example, i and x move in lockstep: each time i is incremented by 1, x is incremented by 3. So i and x are induction variables, and the multiplication 3*i can be replaced by an addition (with x initialised before entering the loop).
After optimization:
B1:
i := i+1
x := x+3
y := a[x]
if y < 15, goto B2
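A quick way to see that the replacement is safe is to run both forms and compare the sequence of x values. The following Python sketch is illustrative; the loop structure and bound are invented for the example.

```python
def original(n):
    """As written: x is recomputed from i with a multiplication each pass."""
    i, xs = 0, []
    while i < n:
        i = i + 1
        x = 3 * i
        xs.append(x)
    return xs

def optimized(n):
    """After induction variable elimination: x is updated incrementally.
    x must be initialised to 3*i before the loop; i starts at 0, so x = 0."""
    i, x, xs = 0, 0, []
    while i < n:
        i = i + 1
        x = x + 3      # one addition replaces the multiplication 3*i
        xs.append(x)
    return xs
```

Both functions produce the same sequence 3, 6, 9, ... for any bound.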

3. Strength Reduction
Strength reduction deals with replacing expensive operations with cheaper ones like
multiplication is costlier than addition, so multiplication can be replaced by addition in the
loop.
Example:
Before optimization:
while (x<10)
{
y := 3 * x+1;
a[y] := a[y]-2;
x := x+2;
}
After optimization:
t := 3 * x+1;
while (x<10)
{
y := t;
a[y] := a[y]-2;
x := x+2;
t := t+6;
}
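The before and after loops can be checked for equivalence by simulating both. This Python sketch is illustrative; modelling the array a as a dictionary is an assumption made for the example.

```python
def before():
    """As written: 3*x+1 is recomputed with a multiply on every iteration."""
    x, a = 0, {}
    while x < 10:
        y = 3 * x + 1
        a[y] = a.get(y, 0) - 2
        x = x + 2
    return a

def after():
    """After strength reduction: the multiply is hoisted, and t is
    maintained by addition, since 3*(x+2)+1 == (3*x+1) + 6."""
    x, a = 0, {}
    t = 3 * x + 1              # computed once before the loop
    while x < 10:
        y = t
        a[y] = a.get(y, 0) - 2
        x = x + 2
        t = t + 6              # cheap addition replaces the multiply
    return a
```

Both versions touch exactly the indices 1, 7, 13, 19, 25 and leave identical contents in a.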

4. Loop Invariant Method


In the loop invariant method, a computation whose operands do not change inside the loop is moved outside the loop. Computing the same expression on every iteration is overhead to the system, so performing it once outside the loop reduces computation overhead and hence optimizes the code.
Example:
Before optimization:
for (int i=0; i<10;i++)
t= i+(x/y);
...
end;

After optimization:
s = x/y;
for (int i=0; i<10;i++)
t= i+ s;
...
end;
5. Loop Unrolling
Loop unrolling is a loop transformation technique that helps to optimize the execution time of a program by reducing the number of iterations: the loop body is replicated, which eliminates loop-control and loop-test instructions.
Example:
Before optimization:

for (int i=0; i<5; i++)


printf("Pankaj\n");

After optimization:

printf("Pankaj\n");
printf("Pankaj\n");
printf("Pankaj\n");
printf("Pankaj\n");
printf("Pankaj\n");

6. Loop Jamming
Loop jamming is combining two or more loops that iterate over the same range into a single loop. It reduces the loop-control overhead incurred by running several separate loops.
Example:
Before optimization:

for(int i=0; i<5; i++)


a = i + 5;
for(int i=0; i<5; i++)
b = i + 10;

After optimization:

for(int i=0; i<5; i++)


{
a = i + 5;
b = i + 10;
}

7. Loop Fission
Loop fission improves the locality of reference. In loop fission, a single loop is divided into multiple loops over the same index range, and each resulting loop contains one particular part of the original loop body.
Example:
Before optimization:
for(x=0;x<10;x++)
{
a[x]=…
b[x]=…
}
After optimization:

for(x=0;x<10;x++)
a[x]=…
for(x=0;x<10;x++)
b[x]=…

8. Loop Interchange
In loop interchange, inner loops are exchanged with outer loops. This optimization technique
also improves the locality of reference.
Example:
Before optimization:

for(x=0;x<10;x++)
for(y=0;y<10;y++)
a[y][x]=…

After optimization:
for(y=0;y<10;y++)
for(x=0;x<10;x++)
a[y][x]=…

9. Loop Reversal
Loop reversal reverses the order in which values are assigned to the index variable. This helps in removing dependencies.
Example:
Before optimization:

for(x=0;x<10;x++)
a[9-x]=…

After optimization:

for(x=9;x>=0;x--)
a[x]=…
10. Loop Splitting
Loop splitting simplifies a loop by dividing it into multiple loops that have the same body but iterate over different, contiguous index ranges. Loop splitting helps in reducing dependencies and hence making the code more optimized.
Example:
Before optimization:

for(x=0;x<10;x++)
if(x<5)
a[x]=…
else
b[x]=…

After optimization:

for(x=0;x<5;x++)
a[x]=…
for(;x<10;x++)
b[x]=…

11. Loop Peeling


Loop peeling is a special case of loop splitting, in which a problematic iteration (typically the first or last) is executed separately before entering the loop.
Before optimization:

for(x=0;x<10;x++)
if(x==0)
a[x]=…
else
b[x]=…

After optimization:

a[0]=…
for(x=1;x<10;x++)
b[x]=…

12. Unswitching
Unswitching moves a loop-invariant condition out from inside the loop; this is done by duplicating the loop and placing one version of it inside each conditional clause.
Before optimization:

for(x=0;x<10;x++)
if(s>t)
a[x]=…
else
b[x]=…

After optimization:

if(s>t)
for(x=0;x<10;x++)
a[x]=…
else
for(x=0;x<10;x++)
b[x]=…

DAG: A DAG is a graph containing directed edges but no cycles, ensuring no path leads back
to the starting node.

The Directed Acyclic Graph (DAG) is used to represent the structure of a basic block, to visualize the flow of values computed within the block, and to support optimization techniques on basic blocks. To apply an optimization technique to a basic block, a DAG is constructed from the three-address code produced by intermediate code generation.

Characteristics of DAG

 The graph’s leaves each have a unique identifier, which can be a variable name or a constant.
 The interior nodes of the graph are labelled with an operator symbol.
 In addition, nodes are given a string of identifiers to use as labels for storing the computed
value.
 Directed Acyclic Graphs have their own definitions for transitive closure and transitive
reduction.
 Directed Acyclic Graphs have topological orderings defined.
 Topological sorting for Directed Acyclic Graph (DAG) is a linear ordering of vertices
such that for every directed edge u-v, vertex u comes before v in the ordering.
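A topological ordering can be computed with Kahn's algorithm, which repeatedly emits a node with no remaining incoming edge; this works precisely because a DAG has no cycles. An illustrative Python sketch:

```python
from collections import deque

def topological_order(nodes, edges):
    """Kahn's algorithm over a DAG given as a node list and (u, v) edges:
    every edge u -> v places u before v in the returned order."""
    indeg = {n: 0 for n in nodes}
    succ = {n: [] for n in nodes}
    for u, v in edges:
        succ[u].append(v)
        indeg[v] += 1
    queue = deque(n for n in nodes if indeg[n] == 0)
    order = []
    while queue:
        u = queue.popleft()
        order.append(u)
        for v in succ[u]:
            indeg[v] -= 1
            if indeg[v] == 0:       # all of v's predecessors are emitted
                queue.append(v)
    return order   # shorter than nodes only if the graph had a cycle
```

For the diamond a→b, a→c, b→d, c→d, every returned order starts at a, ends at d, and respects each edge.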

DAG representation of basic blocks

A DAG for a basic block is a directed acyclic graph with the following labels on nodes:

1. Leaves are labeled by unique identifiers, either variable names or constants.


2. Interior nodes are labeled by an operator symbol.
3. Nodes are also optionally given a sequence of identifiers for labels to store the
computed values.

• DAGs are useful data structures for implementing transformations on basic blocks.
• It gives a picture of how the value computed by a statement is used in subsequent statements.
• It provides a good way of determining common sub - expressions.

Algorithm for construction of DAG

Input: A basic block

Output: A DAG for the basic block containing the following information:

1. A label for each node. For leaves, the label is an identifier. For interior nodes, an operator
symbol.

2. For each node, a list of attached identifiers to hold the computed values.

Each statement of the block has one of three forms:
Case (i) x := y OP z
Case (ii) x := OP y
Case (iii) x := y

Method:

Step 1:

If node(y) is undefined, create a leaf labelled y and let node(y) be this leaf.

For case (i), if node(z) is undefined, create a leaf labelled z and let node(z) be this leaf.

Step 2:

For the case(i), create a node(OP) whose left child is node(y) and right child is node(z).
(Checking for common sub expression). Let n be this node.

For case(ii), determine whether there is node(OP) with one child node(y). If not create such a
node.

For case(iii), node n will be node(y).

Step 3:

Delete x from the list of identifiers for node(x). Append x to the list of attached identifiers for the
node n found in step 2 and set node(x) to n.
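The three steps can be sketched as a small Python routine. This is an illustrative sketch, not the notes' own code; nodes are stored as `(label, left, right)` tuples, and the common sub-expression check of step 2 is simply a cache lookup on that tuple.

```python
def build_dag(block):
    """block: list of (x, op, y, z) statements with op/z possibly None
    (case i: x := y OP z; case ii: x := OP y; case iii: x := y).
    Returns the node list and a map from identifier to its node index."""
    nodes = []            # node index -> (label, left_index, right_index)
    node_of = {}          # current node for each identifier (step 3)
    cache = {}            # (label, left, right) -> node index, for CSE

    def leaf(name):
        # step 1: create node(y) / node(z) if undefined
        if name not in node_of:
            key = (name, None, None)
            if key not in cache:
                cache[key] = len(nodes)
                nodes.append(key)
            node_of[name] = cache[key]
        return node_of[name]

    for x, op, y, z in block:
        if op is None:                      # case (iii): n is node(y)
            n = leaf(y)
        else:                               # cases (i) and (ii)
            l = leaf(y)
            r = leaf(z) if z is not None else None
            key = (op, l, r)
            if key not in cache:            # step 2: reuse node(OP) if present
                cache[key] = len(nodes)
                nodes.append(key)
            n = cache[key]
        node_of[x] = n                      # step 3: attach x to node n
    return nodes, node_of
```

Run on the block a:=b+c; b:=a-d; c:=b+c; d:=a-d, it attaches b and d to the same node (a-d is a common sub-expression) but gives a and c different nodes, since b was redefined in between.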

Example: Consider the block of three-address statements in Fig 4.6

Stages in DAG Construction


Fig. 4.5 Steps in DAG construction process

Fig. 4.6 Code Block

Application of DAGs:

1. We can automatically detect common sub expressions.

2. We can determine which identifiers have their values used in the block.

3. We can determine which statements compute values that could be used outside the block.
Value Numbering

1. Common sub-expression elimination is an important optimization technique that we will


use to explore several aspects of optimization methods.
2. The goal of this optimization is to identify expressions that are guaranteed to produce
identical values at runtime and arrange to only perform the associated computation once
(when the first instance of the expression is encountered).
3. This involves tracking the flow of information through variables. Consider the code:

x = a + b;
y = c + d;
a = e;
z = a + b;
w = b + y;
v = b + c + d;

The two occurrences of "a+b" are not common subexpressions, because the value of a may change between the evaluation of the first and second copies of the expression. On the other hand, the last two expressions, "b + y" and "b + c + d", can be identified as common subexpressions even though they are not textually identical, because they are guaranteed to produce identical values.

4. We will begin by restricting our attention to CSE in straight line code. Then, once the
details are understood, we can see how to deal with programs (like most real ones) that
include control constructs.
5. Eliminating common sub-expressions involves two non-trivial sub-problems.
o It isn't enough to find expressions that look identical.
 Expressions that look identical may produce distinct results if the values
of variables referenced by the two expressions change between their
evaluations.
 Expressions that don't look identical may produce identical results, for example:
x = 1 + y;
a[y + 1] = a[x] + 1;

o Even if we could proceed by simply looking for textually identical expressions, it


isn't immediately clear how we could find them all efficiently.
6. One scheme that can be used to identify CSEs is called Value Numbering.
Example:
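A minimal sketch of the scheme, assuming statements are given as `(dest, op, operands)` triples with `op = None` for a plain copy (this format is an assumption made for the example). Two expressions with the same operator and the same operand value numbers are guaranteed to compute the same value; here, v = b + c + d is written as v = b + y, using the fact noted above that y already holds c + d.

```python
def value_number(block):
    """Return, for each statement, whether its expression is a
    redundant recomputation of an already-numbered value."""
    vn = {}              # variable -> value number
    table = {}           # canonical expression key -> value number
    counter = [0]

    def fresh():
        counter[0] += 1
        return counter[0]

    def operand(a):
        if a not in vn:
            vn[a] = fresh()          # unknown input gets its own number
        return vn[a]

    redundant = []
    for dest, op, args in block:
        if op is None:               # copy: dest shares the source's number
            vn[dest] = operand(args[0])
            redundant.append(False)
            continue
        # sort operand numbers so commutative expressions match
        key = (op,) + tuple(sorted(operand(a) for a in args))
        redundant.append(key in table)
        vn[dest] = table.setdefault(key, fresh())
    return redundant
```

On the code above, z = a + b is correctly not flagged (a was reassigned, so its value number changed), while the final b + y is flagged as redundant with w.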

GLOBAL DATAFLOW ANALYSIS


In order to do code optimization and a good job of code generation, the compiler needs to collect information about the program as a whole and to distribute this information to each block in the flow graph. For example, knowing the "reaching definitions", such as where a variable like debug was last defined before reaching a given block, allows the compiler to perform transformations safely. Reaching definitions are just one example of the data-flow information that an optimizing compiler collects by a process known as data-flow analysis.

Data-flow information can be collected by setting up and solving systems of equations of the form:

out [S] = gen [S] U ( in [S] - kill [S] )

That is, the definitions reaching the end of S are those generated within S, together with those reaching the beginning of S that are not killed in S.
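These equations can be solved iteratively until a fixed point is reached, taking in[B] to be the union of out[P] over the predecessors P of B. The following Python sketch is illustrative; the block names and gen/kill contents are invented for the example.

```python
def solve_dataflow(blocks, edges, gen, kill):
    """Iterate out[B] = gen[B] U (in[B] - kill[B]) to a fixed point,
    with in[B] the union of out[P] over predecessors P of B."""
    preds = {b: [] for b in blocks}
    for u, v in edges:
        preds[v].append(u)
    IN = {b: set() for b in blocks}
    OUT = {b: set(gen[b]) for b in blocks}
    changed = True
    while changed:
        changed = False
        for b in blocks:
            IN[b] = set().union(*[OUT[p] for p in preds[b]])
            new_out = gen[b] | (IN[b] - kill[b])
            if new_out != OUT[b]:
                OUT[b] = new_out
                changed = True
    return IN, OUT
```

For a two-block loop (B1 falls into B2, B2 loops on itself), if B1 generates d1 and d2, B2 generates d3, and d1/d3 are definitions of the same variable that kill each other, the solver converges with d1 removed from out[B2].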


Flow Graph

GEN and KILL set

gen [S] is the set of definitions generated within S that reach its end; kill [S] is the set of all other definitions of the same variables, which S invalidates.

Reaching Definition:

A definition d reaches a point p if there is a path from the point immediately after d to p along which d is not killed, i.e. the variable it defines is not redefined.

Computation of IN and OUT

in [B] is the union of out [P] taken over all predecessor blocks P of B, and out [B] = gen [B] U ( in [B] - kill [B] ); the equations are solved iteratively until the sets stop changing.