Unit 4

The document discusses intermediate code generation in compiler design. It explains that intermediate code acts as a representation between the source code and target code. It allows for source code optimizations by modifying the intermediate code. The document then discusses different representations of intermediate code including syntax trees, postfix notation, and three-address code. It provides examples and explanations of each representation.

Uploaded by

savatid730
Unit – 4

Intermediate Code Generation

1) Intermediate Code Generation


In the analysis-synthesis model of a compiler, the front end translates a source
program into a machine-independent intermediate code, and the back end then uses
this intermediate code to generate the target code (which can be understood by
the machine).

Intermediate code is used for the following reasons:

• To act as glue between the front end and the back end (that is, between source code and machine code).
• To lower the level of abstraction from the source level.
• To make optimization easier: the performance of the program is improved by transforming the intermediate code instead of modifying the source code.

2) Intermediate Code Representation:

a) Syntax Tree
b) Postfix Notation
c) Three-Address Code

a) Syntax Tree
• A syntax tree is a condensed form of a parse tree.
• The operator and keyword nodes of the parse tree are moved to their parents, and a chain of single productions is replaced by a single link. In the syntax tree, the internal nodes are operators and their children are operands.
• To form a syntax tree, first put parentheses in the expression; this makes it easy to recognize which operand belongs to which operator.
• It maintains the structure of the source construct.
• It is suitable for high-level representations.

Example

x = (a + b * c) / (a – b * c)

Find the syntax tree for this expression.
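
The syntax tree for the example can be sketched in code. This is a minimal illustration, not the figure from the notes: the class names Leaf and Node and the helper show are hypothetical, and the assignment is treated as an ordinary binary operator "=".

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Leaf:
    """An operand: an identifier or a constant."""
    name: str


@dataclass(frozen=True)
class Node:
    """An internal node: an operator with two operand subtrees."""
    op: str
    left: object
    right: object


def show(tree, depth=0):
    """Return the tree as indented text, each operator above its operands."""
    pad = "  " * depth
    if isinstance(tree, Leaf):
        return pad + tree.name + "\n"
    return pad + tree.op + "\n" + show(tree.left, depth + 1) + show(tree.right, depth + 1)


# x = (a + b * c) / (a - b * c)
a, b, c = Leaf("a"), Leaf("b"), Leaf("c")
numerator = Node("+", a, Node("*", b, c))
denominator = Node("-", a, Node("*", b, c))
tree = Node("=", Leaf("x"), Node("/", numerator, denominator))

print(show(tree), end="")
```

The root is the assignment, its right child is the division, and b * c appears twice because a tree (unlike a DAG) does not share common sub-expressions.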

b) Postfix Notation

• Postfix order is the post-order traversal of the syntax tree: left subtree, right subtree, then root (LRr).
• The ordinary (infix) way of writing the sum of a and b is with the operator in the middle: a + b. The postfix notation for the same expression places the operator at the right end: a b +.
• In general, if e1 and e2 are any postfix expressions and + is any binary operator, the result of applying + to the values denoted by e1 and e2 is written in postfix notation as e1 e2 +.
• No parentheses are needed in postfix notation, because the position and arity (number of arguments) of the operators permit only one way to decode a postfix expression.
• Postfix notation is a useful form of intermediate code when the language consists mostly of expressions.
• Postfix notation is also called 'suffix notation' or 'reverse Polish notation'.
• Postfix notation is a linear representation of a syntax tree.
• Likewise, the infix product of x and y is written x * y, and its postfix form is x y *.
• In postfix notation, the operator follows its operands.

Example – The postfix representation of the expression (a – b) * (c + d) + (a – b)

Ans : a b – c d + * a b – +
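
The claim that a postfix expression can be decoded in only one way can be checked with a stack. The following is a minimal sketch: the function name eval_postfix and the env dictionary of variable values are assumptions made for illustration, tokens are separated by spaces, and the ASCII "-" is used for subtraction.

```python
import operator

# Each binary operator pops two operands and pushes one result.
BINARY = {
    "+": operator.add,
    "-": operator.sub,
    "*": operator.mul,
    "/": operator.truediv,
}


def eval_postfix(text, env=None):
    """Evaluate a space-separated postfix expression using a stack."""
    env = env or {}
    stack = []
    for token in text.split():
        if token in BINARY:
            right = stack.pop()   # the second operand is on top
            left = stack.pop()
            stack.append(BINARY[token](left, right))
        elif token in env:
            stack.append(env[token])
        else:
            stack.append(float(token))
    if len(stack) != 1:
        raise ValueError("malformed postfix expression")
    return stack[0]


values = {"a": 5, "b": 3, "c": 2, "d": 4}
print(eval_postfix("a b - c d + * a b - +", values))
```

With the sample values, (a - b) * (c + d) + (a - b) is 2 * 6 + 2, and the stack evaluation reaches the same result without any parentheses in the input.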

Example: 1+2/3-4*5

1. Start with 1+2/3-4*5
2. Parenthesize (using standard precedence) to get (1+(2/3))-(4*5)
3. Apply the postfix rule above (both operands first, then the operator) to calculate P{(1+(2/3))-(4*5)}, where P{X} means
“convert the infix expression X to postfix”.
4. P{(1+(2/3))-(4*5)}
5. P{(1+(2/3))} P{(4*5)} -
6. P{1+(2/3)} P{4*5} -
7. P{1} P{2/3} + P{4} P{5} * -
8. 1 P{2} P{3} / + 4 5 * -
9. 1 2 3 / + 4 5 * -

Example: Now do (1+2)/3-4*5

1. Parenthesize to get ((1+2)/3)-(4*5)
2. Calculate P{((1+2)/3)-(4*5)}
3. P{((1+2)/3)} P{(4*5)} -
4. P{(1+2)/3} P{4*5} -
5. P{(1+2)} P{3} / P{4} P{5} * -
6. P{1+2} 3 / 4 5 * -
7. P{1} P{2} + 3 / 4 5 * -
8. 1 2 + 3 / 4 5 * -
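
The conversion P{X} in the two worked examples can be sketched as a post-order walk. As an assumption made only to keep the sketch short, Python's own ast module is borrowed as the parser, since its precedence for +, -, * and / matches the standard precedence used above; the function name to_postfix is hypothetical.

```python
import ast

# Map Python's operator node types to the symbols used in the notes.
OPS = {ast.Add: "+", ast.Sub: "-", ast.Mult: "*", ast.Div: "/"}


def to_postfix(expr):
    """Convert an infix expression to postfix: P{X op Y} = P{X} P{Y} op."""

    def walk(node):
        if isinstance(node, ast.BinOp):
            # Left operand, right operand, then the operator (post-order).
            return walk(node.left) + walk(node.right) + [OPS[type(node.op)]]
        if isinstance(node, ast.Name):
            return [node.id]
        if isinstance(node, ast.Constant):
            return [str(node.value)]
        raise ValueError("unsupported expression")

    return " ".join(walk(ast.parse(expr, mode="eval").body))


print(to_postfix("1+2/3-4*5"))
print(to_postfix("(1+2)/3-4*5"))
```

Parsing first and walking the tree afterwards mirrors step 2 of the examples, where the expression is fully parenthesized before P is applied.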
c) Three-Address Code

• A statement involving no more than three references (two for operands and one for the result) is known as a three-address statement.
• A sequence of three-address statements is known as three-address code.
• A three-address statement has the form x = y op z, where x, y and z each have an address (memory location).
• A statement may contain fewer than three references and is still called a three-address statement.
• An instruction contains at most three addresses.
• It is suitable for both high-level and low-level representations.

Example1 – The three address code for the expression a+b*c+d:

T1=b*c
T2=a+T1
T3=T2+d

T1, T2 and T3 are temporary variables.
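
Generating such statements can be sketched as a walk over the expression tree that creates one temporary per operator. This is an illustrative sketch only: Python's ast module stands in for the compiler's parser, and the function name three_address is hypothetical.

```python
import ast

OPS = {ast.Add: "+", ast.Sub: "-", ast.Mult: "*", ast.Div: "/"}


def three_address(expr):
    """Return three-address statements for an arithmetic expression."""
    code = []

    def new_temp():
        # One temporary per emitted statement: T1, T2, ...
        return f"T{len(code) + 1}"

    def walk(node):
        """Emit code for node and return the name that holds its value."""
        if isinstance(node, ast.Name):
            return node.id
        if isinstance(node, ast.Constant):
            return str(node.value)
        if isinstance(node, ast.UnaryOp) and isinstance(node.op, ast.USub):
            arg = walk(node.operand)
            temp = new_temp()
            code.append(f"{temp} = uminus {arg}")
            return temp
        if isinstance(node, ast.BinOp):
            left = walk(node.left)
            right = walk(node.right)
            temp = new_temp()
            code.append(f"{temp} = {left} {OPS[type(node.op)]} {right}")
            return temp
        raise ValueError("unsupported expression")

    walk(ast.parse(expr, mode="eval").body)
    return code


for line in three_address("a+b*c+d"):
    print(line)
```

For a+b*c+d the sketch emits the same three statements as Example1, because operands are translated before the operator that uses them.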

Example2 Convert the expression a * – (b + c) into three address code.

T1 = b + c
T2 = uminus T1
T3 = a * T2

Example3 Write Three Address Code for the following expression-

-(a x b) + (c + d) – (a + b + c + d)
Solution-

Three Address Code for the given expression is-

(1) T1 = a x b

(2) T2 = uminus T1

(3) T3 = c + d

(4) T4 = T2 + T3

(5) T5 = a + b

(6) T6 = T3 + T5

(7) T7 = T4 – T6

Example4 Write Three Address Code for the following expression-

If A < B then 1 else 0

Solution-

Three Address Code for the given expression is-

(1) If (A < B) goto (4)

(2) T1 = 0

(3) goto (5)

(4) T1 = 1

(5)

Example5 Write Three Address Code for the following expression-

If A < B and C < D then t = 1 else t = 0


Solution-

Three Address Code for the given expression is-

(1) If (A < B) goto (3)

(2) goto (4)

(3) If (C < D) goto (6)

(4) t = 0

(5) goto (7)

(6) t = 1

(7)

Example6 Write Three Address Code for the following expression-

a := (-c * b) + (-c * d)

Three-address code is as follows:

t1 := -c
t2 := b*t1
t3 := -c
t4 := d * t3
t5 := t2 + t4
a := t5

Example7 Write Three Address Code for the following expression-

while (A < C and B > D) do
    if A = 1 then C = C + 1
    else
        while A <= D
            do A = A + B

Solution

Three address code for the given code is-

1. if (A < C) goto (3)
2. goto (15)
3. if (B > D) goto (5)
4. goto (15)
5. if (A = 1) goto (7)
6. goto (10)
7. T1 = C + 1
8. C = T1
9. goto (1)
10. if (A <= D) goto (12)
11. goto (1)
12. T2 = A + B
13. A = T2
14. goto (10)
15.

The commonly used representations for implementing Three Address Code are-
1. Quadruples
2. Triples
3. Indirect Triples

1. Quadruples-

In the quadruple representation, each instruction is split into the following four fields-
op, arg1, arg2, result
Here-
• The op field stores the internal code of the operator.
• The arg1 and arg2 fields store the two operands.
• The result field stores the result of the operation.

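2. Triples-

In the triple representation, each instruction has only three fields: op, arg1 and arg2.
• The result of an instruction is referred to by the position (number) of that instruction.
• Temporary variables are therefore not used.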

3. Indirect Triples-

• This representation is an enhancement over the triple representation.
• It uses an additional instruction array that lists pointers to the triples in the desired order.
• Thus, the order of execution is given by the pointers in this array rather than by the positions of the triples themselves.
• This allows an optimizer to re-position a sub-expression by reordering the pointers, without changing the references inside the triples.

Translate the following expression to quadruple, triple and indirect triple-

a = b x –c + b x –c

Solution-

Three Address Code for the given expression is-

T1 = uminus c
T2 = b x T1
T3 = uminus c
T4 = b x T3
T5 = T2 + T4
a = T5

Now, we write the required representations-

Quadruple Representation-

Location  Op      Arg1  Arg2  Result
(1)       uminus  c           T1
(2)       x       b     T1    T2
(3)       uminus  c           T3
(4)       x       b     T3    T4
(5)       +       T2    T4    T5
(6)       =       T5          a

Triple Representation-

Location  Op      Arg1  Arg2
(1)       uminus  c
(2)       x       b     (1)
(3)       uminus  c
(4)       x       b     (3)
(5)       +       (2)   (4)
(6)       =       a     (5)

Indirect Triple Representation-

Statement
35   (1)
36   (2)
37   (3)
38   (4)
39   (5)
40   (6)

Location  Op      Arg1  Arg2
(1)       uminus  c
(2)       x       b     (1)
(3)       uminus  c
(4)       x       b     (3)
(5)       +       (2)   (4)
(6)       =       a     (5)
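
The three representations can be sketched as plain data structures. This is a minimal illustration under stated assumptions: the names Quad, Triple and quads_to_triples are hypothetical, an empty field is stored as None, and a reference to triple (n) is stored as the integer n so that it cannot be confused with a variable name.

```python
from collections import namedtuple

Quad = namedtuple("Quad", "op arg1 arg2 result")
Triple = namedtuple("Triple", "op arg1 arg2")

# Quadruples for a = b x -c + b x -c
quads = [
    Quad("uminus", "c", None, "T1"),
    Quad("x", "b", "T1", "T2"),
    Quad("uminus", "c", None, "T3"),
    Quad("x", "b", "T3", "T4"),
    Quad("+", "T2", "T4", "T5"),
    Quad("=", "T5", None, "a"),
]


def quads_to_triples(quads):
    """Replace every temporary by the number of the triple that computes it."""
    position = {}   # temporary name -> 1-based triple number
    triples = []
    for q in quads:
        arg1 = position.get(q.arg1, q.arg1)
        arg2 = position.get(q.arg2, q.arg2)
        if q.op == "=":
            # Assignment: the target name and the value being stored.
            triples.append(Triple("=", q.result, arg1))
        else:
            triples.append(Triple(q.op, arg1, arg2))
            position[q.result] = len(triples)
    return triples


triples = quads_to_triples(quads)

# Indirect triples: a separate list of pointers gives the execution order.
# Statement numbers 35..40 are taken from the table above.
statements = {35 + i: i + 1 for i in range(len(triples))}

for number, t in enumerate(triples, start=1):
    print(number, t)
print(statements)
```

Reordering code in the indirect form only changes the statements mapping; the triples themselves, and the references inside them, stay where they are.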

Backpatching

Backpatching is the process of filling in information that was left unspecified
when an instruction was generated. In code generation, this information is the
target label of a jump. When three-address code is produced for boolean
expressions and control-flow statements, the address that a goto must jump to is
often not yet known at the point where the goto is emitted, so assigning the
positions of all labels in a single pass is difficult. One way around this is to
use two passes: the first pass leaves these addresses unspecified and the second
pass fills them in. Backpatching achieves the same effect with semantic actions
during code generation: each jump with a missing target is recorded, and the
target is filled in as soon as the address of the label becomes known.
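
The idea can be sketched on the statement "If A < B and C < D then t = 1 else t = 0" from Example5. This is a simplified illustration, not a full translation scheme: the helper names emit, next_instr and backpatch are assumptions, and an unknown jump target is written as the placeholder "_" until it is patched.

```python
code = []   # code[i - 1] holds instruction (i)


def emit(text):
    """Append an instruction and return its 1-based number."""
    code.append(text)
    return len(code)


def next_instr():
    """Number that the next emitted instruction will receive."""
    return len(code) + 1


def backpatch(jumps, target):
    """Fill the missing target of every listed jump instruction."""
    for i in jumps:
        code[i - 1] = code[i - 1].replace("_", f"({target})")


# Jumps are emitted before their targets are known.
a_true = [emit("if (A < B) goto _")]
a_false = [emit("goto _")]
backpatch(a_true, next_instr())      # A < B holds: go on to test C < D
c_true = [emit("if (C < D) goto _")]
backpatch(a_false, next_instr())     # A < B fails: the else part starts here
emit("t = 0")
skip_then = [emit("goto _")]
backpatch(c_true, next_instr())      # both conditions hold: the then part
emit("t = 1")
backpatch(skip_then, next_instr())   # jump over the then part

for number, text in enumerate(code, start=1):
    print(f"({number}) {text}")
```

The code is generated in a single pass and every placeholder is filled as soon as the number of the target instruction is known; the listing matches statements (1) to (6) of Example5.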

Basic Blocks-

A basic block is a sequence of statements that always execute one after the
other: control enters only at the first statement and leaves only at the last.
To partition intermediate code into basic blocks, the instructions that are
leaders are determined first. Each leader starts a basic block, which extends up
to (but does not include) the next leader.
Following are the rules used for finding a leader:

1. The first three-address instruction of the intermediate code is a leader.
2. Instructions that are targets of unconditional or conditional jump/goto statements are leaders.
3. Instructions that immediately follow unconditional or conditional jump/goto statements are leaders.
Example Of Basic Block-

Three Address Code for the expression a = b + c + d is-

T1 = b + c
T2 = T1 + d
a = T2

Here,
• All the statements execute in a sequence one after the other.
• Thus, they form a basic block.

Example Of Not A Basic Block-

Three Address Code for the expression If A<B then 1 else 0 is-

(1) If (A < B) goto (4)
(2) T1 = 0
(3) goto (5)
(4) T1 = 1
(5)

Here,
• The statements do not execute in a sequence one after the other.
• Thus, they do not form a basic block.

Compute the basic blocks for the given three address statements-
(1) PROD = 0
(2) I = 1
(3) T2 = addr(A) – 4
(4) T4 = addr(B) – 4
(5) T1 = 4 x I
(6) T3 = T2[T1]
(7) T5 = T4[T1]
(8) T6 = T3 x T5
(9) PROD = PROD + T6
(10) I = I + 1
(11) IF I <=20 GOTO (5)

Solution-

We have-
• PROD = 0 is a leader since the first statement of the code is a leader.
• T1 = 4 x I is a leader since the target of the conditional goto statement is a leader.

Now, the given code can be partitioned into two basic blocks as-
• B1: statements (1) to (4)
• B2: statements (5) to (11)
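
The three leader rules can be sketched as follows. This is an illustration under stated assumptions: instructions are plain strings, a jump is recognised by the text "goto (n)", and the function names jump_target, find_leaders and basic_blocks are hypothetical.

```python
import re

code = [
    "PROD = 0",
    "I = 1",
    "T2 = addr(A) - 4",
    "T4 = addr(B) - 4",
    "T1 = 4 x I",
    "T3 = T2[T1]",
    "T5 = T4[T1]",
    "T6 = T3 x T5",
    "PROD = PROD + T6",
    "I = I + 1",
    "IF I <= 20 GOTO (5)",
]


def jump_target(instr):
    """Return the instruction number a goto jumps to, or None."""
    match = re.search(r"goto\s*\((\d+)\)", instr, re.IGNORECASE)
    return int(match.group(1)) if match else None


def find_leaders(code):
    leaders = {1}                              # rule 1: first instruction
    for number, instr in enumerate(code, start=1):
        target = jump_target(instr)
        if target is None:
            continue
        if target <= len(code):
            leaders.add(target)                # rule 2: target of a jump
        if number < len(code):
            leaders.add(number + 1)            # rule 3: follows a jump
    return sorted(leaders)


def basic_blocks(code):
    """Each block runs from one leader up to just before the next leader."""
    leaders = find_leaders(code)
    bounds = leaders + [len(code) + 1]
    return [list(range(bounds[i], bounds[i + 1])) for i in range(len(leaders))]


print(find_leaders(code))
print(basic_blocks(code))
```

For the given statements the only jump is in (11), so the leaders are (1) and (5), and the blocks are statements (1) to (4) and statements (5) to (11).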
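Flow Graph

Solution (for the three address statements given above)-

• Firstly, we compute the basic blocks (already done above).
• Secondly, we add the flow of control between the blocks.

With B1 = statements (1) to (4) and B2 = statements (5) to (11), the required flow graph has the edges-

B1 → B2 (control falls through from statement (4) to statement (5))
B2 → B2 (the conditional goto in statement (11) jumps back to statement (5))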

Directed Acyclic Graph-

A Directed Acyclic Graph (DAG) is a special kind of abstract syntax tree in which a common sub-expression is represented by a single shared node.

• Each node represents a unique value.
• It does not contain any cycles, hence the name acyclic.

Optimization Of Basic Blocks-

A DAG is a very useful data structure for implementing transformations on basic blocks.

• A DAG is constructed for optimizing the basic block.
• A DAG is usually constructed from three-address code.
• Transformations such as dead code elimination and common sub-expression elimination are then applied.

Applications-

DAGs are used for the following purposes-

• To determine the expressions which have been computed more than once (called common sub-expressions).
• To determine the names whose computation has been done outside the block but which are used inside the block.
• To determine the statements of the block whose computed value can be made available outside the block.

Construction of DAGs-

Following rules are used for the construction of DAGs-

Rule-01:

In a DAG,
• Interior nodes always represent the operators.
• Exterior nodes (leaf nodes) always represent the names, identifiers or constants.

Rule-02:

While constructing a DAG,
• A check is made to find if there exists any node with the same value.
• A new node is created only when there does not exist any node with the same value.
• This action helps in detecting the common sub-expressions and avoiding the re-computation of the same.

Rule-03:

The assignment instructions of the form x:=y are not performed unless they are necessary.


Ex.1
T1 = a + b
T2 = T1 + c
T3 = T1 x T2

Ex.2
T1:= 4*I0
T2:= a[T1]
T3:= 4*I0
T4:= b[T3]
T5:= T2 * T4
T6:= prod + T5
prod:= T6
T7:= I0 + 1
I0:= T7
if I0 <= 20 goto 1
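
The construction rules can be sketched on the statements of Ex.2 (without the final conditional jump). This is an illustration under stated assumptions: each statement is encoded as a hypothetical tuple (result, op, arg1, arg2), array indexing is written as the operator "[]", and the function name build_dag is not from the notes.

```python
def build_dag(block):
    """Return (nodes, labels): the DAG nodes and the node of each name."""
    nodes = []      # node id -> key describing the node
    index = {}      # key -> node id, used to find an existing node (Rule-02)
    labels = {}     # name -> id of the node that currently holds its value

    def node_for(key):
        if key not in index:                  # create only if not present
            index[key] = len(nodes)
            nodes.append(key)
        return index[key]

    def operand(name):
        if name not in labels:                # first use: a leaf (Rule-01)
            labels[name] = node_for(("leaf", name, None))
        return labels[name]

    for result, op, arg1, arg2 in block:
        if op == "=":
            # Rule-03: a copy adds a label to an existing node, not a new node.
            labels[result] = operand(arg1)
        else:
            left = operand(arg1)
            right = operand(arg2)
            labels[result] = node_for((op, left, right))
    return nodes, labels


block = [
    ("T1", "*", "4", "I0"),
    ("T2", "[]", "a", "T1"),
    ("T3", "*", "4", "I0"),
    ("T4", "[]", "b", "T3"),
    ("T5", "*", "T2", "T4"),
    ("T6", "+", "prod", "T5"),
    ("prod", "=", "T6", None),
    ("T7", "+", "I0", "1"),
    ("I0", "=", "T7", None),
]

nodes, labels = build_dag(block)
print(labels["T1"] == labels["T3"])   # 4*I0 is a common sub-expression
print(len(nodes))
```

T1 and T3 end up attached to the same node, which is how the DAG exposes 4*I0 as a common sub-expression. The sketch does not treat a + b and b + a as the same value.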

Dead Code Elimination in DAG

The DAG for the basic block
x = a[i]
a[j] = y
z = a[i]

is shown in Fig. 8.14. The node N for x is created first, but when the node labeled
[]= is created, N is killed, because the assignment to a[j] may change the value of
a[i] (j could be equal to i). Thus, when the node for z is created, it cannot be
identified with N, and a new node with the same operands a0 and i0 must be
created instead.
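
The killing of node N can be sketched as follows. This is a simplified illustration, not the construction from the figure: the tuple encodings ("load", x, a, i) for x = a[i] and ("store", a, j, y) for a[j] = y, the label "=[]" for an array read, and the function name build are all assumptions.

```python
def build(block):
    """Build DAG nodes for array reads and writes, killing stale reads."""
    nodes, index, labels = [], {}, {}

    def node_for(key, reuse=True):
        if reuse and key in index:
            return index[key]
        nodes.append(key)
        index[key] = len(nodes) - 1
        return index[key]

    def operand(name):
        if name not in labels:
            labels[name] = node_for(("leaf", name))
        return labels[name]

    for stmt in block:
        if stmt[0] == "load":                  # x = a[i]
            _, target, array, subscript = stmt
            key = ("=[]", operand(array), operand(subscript))
            labels[target] = node_for(key)
        elif stmt[0] == "store":               # a[j] = y
            _, array, subscript, value = stmt
            base = operand(array)
            key = ("[]=", base, operand(subscript), operand(value))
            node_for(key, reuse=False)
            # Kill: earlier reads of this array can no longer be reused.
            for old in [k for k in index if k[0] == "=[]" and k[1] == base]:
                del index[old]
    return nodes, labels


block = [
    ("load", "x", "a", "i"),
    ("store", "a", "j", "y"),
    ("load", "z", "a", "i"),
]
nodes, labels = build(block)
print(labels["x"] != labels["z"])
```

A killed node stays in the graph (x still refers to it) but is removed from the lookup table, so z = a[i] gets a new node with the same operands instead of sharing N.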
