
Lecture Notes Compiler Design Chapter-6

Chapter Six discusses Intermediate Code Generation, focusing on intermediate representations, code generation techniques, and the use of three-address code. It outlines the importance of intermediate languages and various types of intermediate representations, including syntax trees and directed acyclic graphs (DAGs). The chapter also details the implementation of three-address statements, including their syntax and various forms such as quadruples and triples.

Uploaded by

youngtitan700

CHAPTER SIX

Intermediate Code Generation

Outline
• Intermediate representations
• Intermediate code generation
• Intermediate languages
• Syntax-Directed Translation of Abstract Syntax Trees
• Abstract Syntax Trees versus DAGs
• Three-Address Code
• Three-Address Statements
• Syntax-Directed Translation into Three-Address Code
• Implementation of Three-Address Statements:
– Quads, triples, indirect triples
• Three-address code for an assignment statement and an expression
• Three-address code for declarations
• Translation scheme to generate three-address code
• Addressing array elements

Intermediate Representations
• In a compiler, the front end translates the source program into an intermediate representation, and the back end generates the target code from this intermediate representation.
• Using a machine-independent intermediate code (IC) has two benefits:
– retargeting to another machine is facilitated;
– optimization can be done on the machine-independent code.
• If type checking is done in a separate pass, the compiler is multi-pass.
• If IC generation and type checking are done at the same time, the compiler is one-pass.

Intermediate Representations
• Decisions in IR design affect the speed and efficiency of the compiler
• Some important IR properties
• Ease of generation
• Ease of manipulation
• Procedure size
• Level of abstraction
• The importance of different properties varies between compilers
– Selecting an appropriate IR for a compiler is critical

Position of IC
[Figure: position of the intermediate code generator]
Multi-pass: token stream → Parser → syntax tree → Type checker → parse tree → IC gen → IC code
One-pass: Parser, Type checker and IC gen work together on the parse tree

Intermediate Code Generation
• The intermediate language can be one of many different languages; the designer of the compiler decides which one to use.
– A syntax tree can be used as an intermediate language.
– Postfix notation can be used as an intermediate language.
– Three-address code (quadruples) can be used as an intermediate language.
• We will use three-address code to discuss intermediate code generation.
• Three-address statements are close to machine instructions, but they are not actual machine instructions.
– Some programming languages have well-defined intermediate languages.
• Java – Java Virtual Machine
• Prolog – Warren Abstract Machine
• In fact, there are byte-code emulators that execute instructions in these intermediate languages.

Types of Intermediate Representations
Three major categories
• Structural
– Graphically oriented
– Heavily used in source-to-source translators
– Tend to be large
– Examples: trees, DAGs
• Linear
– Pseudo-code for an abstract machine
– Level of abstraction varies
– Simple, compact data structures
– Easier to rearrange code
– Examples: three-address code, stack machine code
• Hybrid
– Combination of graphs and linear code
– Example: control-flow graph

Intermediate languages
• Syntax tree
• While parsing the input, a syntax tree can be constructed.
• A syntax tree (abstract tree) is a condensed form of the parse tree that is useful for representing language constructs.
• For example, for the string a+b, the parse tree in (a) below can be represented by the syntax tree shown in (b).
• The keywords (syntactic sugar) that existed in the parse tree no longer exist in the syntax tree.

(a) Parse tree: the root E has three children: E (deriving a), +, and E (deriving b).
(b) Abstract tree: the root + has two children, a and b.

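Abstract Syntax Trees
Example: a*(b+c)
Parse tree: the root E has children E, * and E. The left E derives a. The right E derives ( E ), and the inner E has children E, + and E, which derive b and c.
Abstract syntax tree: the root * has children a and +; the node + has children b and c.
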
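Abstract Syntax Trees versus DAGs
Example: a := b * -c + b * -c
Tree: the root := has children a and +. The node + has two separate subtrees; each is a * node with children b and uminus, and each uminus has the child c.
DAG: the root := has children a and +. Both operands of + point to one shared * node with children b and uminus; uminus has the child c.
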
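The difference between the tree and the DAG can be sketched in a few lines of Python. This is only an illustration: the class name Dag and the method node are hypothetical, and the sketch assumes a node is identified by its operator and its children, so an identical subexpression is created once and then reused.

```python
# Hypothetical sketch: Dag and node() are illustrative names, not from the slides.
class Dag:
    def __init__(self):
        self.nodes = []   # node id -> (op, left, right)
        self.index = {}   # (op, left, right) -> id of the node that already holds it

    def node(self, op, left=None, right=None):
        key = (op, left, right)
        if key not in self.index:        # create a node only if no identical one exists
            self.index[key] = len(self.nodes)
            self.nodes.append(key)
        return self.index[key]


def build_example(dag):
    # a := b * -c + b * -c, with the operand b * -c requested twice
    def operand():
        return dag.node("*", dag.node("id", "b"),
                        dag.node("uminus", dag.node("id", "c")))
    return dag.node(":=", dag.node("id", "a"),
                    dag.node("+", operand(), operand()))


dag = Dag()
root = build_example(dag)
print(len(dag.nodes))   # 7 nodes, where the tree form needs 11
```

Both operands of + receive the same node id, which is exactly the sharing shown in the DAG.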
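Syntax Tree Representation
Example: a := b * -c + b * -c
[Figure: the syntax tree drawn twice, once with the bare names a, b and c at the leaves and once with the leaves labelled id a, id b and id c. The interior nodes are :=, +, * and uminus.]
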
Three-Address Code
a := b * -c + b * -c

Linearized representation of a syntax tree:
t1 := - c
t2 := b * t1
t3 := - c
t4 := b * t3
t5 := t2 + t4
a := t5

Linearized representation of a syntax DAG:
t1 := - c
t2 := b * t1
t5 := t2 + t2
a := t5

Three-Address Code
• A three-address statement has the form:
x := y op z
where x, y and z are names, constants or compiler-generated temporaries; op is any operator.

• We may also use the following notation for three-address code (a convenient notation because it looks like a machine code instruction):
op y,z,x
which applies operator op to y and z, and stores the result in x.

• We use the term "three-address code" because each statement usually contains three addresses (two for operands, one for the result).

Three-Address Code…
• In three-address code:
• At most one operator may appear on the right side of an assignment, so an expression such as x + y * z cannot be written as a single statement.
• Similar to postfix notation, three-address code is a linear representation of a syntax tree.
• The expression x + y * z is therefore translated into two statements:

t1 = y * z
t2 = x + t1

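The one-operator rule can be tried out with a small Python sketch. It is an illustration under stated assumptions: Python's own ast module stands in for the compiler front end, the function name to_three_address is hypothetical, and only names, constants and the four binary operators + - * / are handled.

```python
import ast
import itertools


def to_three_address(source):
    """Translate one assignment such as 'r = x + y * z' into three-address statements."""
    counter = itertools.count(1)
    code = []
    symbols = {ast.Add: "+", ast.Sub: "-", ast.Mult: "*", ast.Div: "/"}

    def place(node):
        # Return the name that holds the value of node, emitting code when needed.
        if isinstance(node, ast.Name):
            return node.id
        if isinstance(node, ast.Constant):
            return str(node.value)
        if isinstance(node, ast.BinOp):
            left, right = place(node.left), place(node.right)
            temp = f"t{next(counter)}"          # one fresh temporary per operator
            code.append(f"{temp} = {left} {symbols[type(node.op)]} {right}")
            return temp
        raise ValueError("unsupported expression")

    stmt = ast.parse(source).body[0]            # assumes a single simple assignment
    code.append(f"{stmt.targets[0].id} = {place(stmt.value)}")
    return code


print(to_three_address("r = x + y * z"))
# ['t1 = y * z', 't2 = x + t1', 'r = t2']
```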
Three-Address Statements
Binary Operator:

op y,z,result or result := y op z

• Where op is a binary arithmetic or logical operator.
• This binary operator is applied to y and z, and the result of the operation is stored in result.
Ex: add a,b,c
mul a,b,c
addr a,b,c
addi a,b,c

Three-Address Statements (cont.)
Unary Operator:

op y,,result or result := op y

• Where op is a unary arithmetic or logical operator.
• This unary operator is applied to y, and the result of the operation is stored in result.
Ex: uminus a,,c
not a,,c
inttoreal a,,c

Three-Address Statements (cont.)
Copy/Move Operator: mov y,,result or result := y
where the content of y is copied into result.
Ex: mov a,,c
movi a,,c
movr a,,c

Unconditional Jumps: jmp ,,L or goto L
We jump to the three-address statement with the label L, and execution continues from that statement.
Ex: jmp ,,L1 // jump to L1
jmp ,,7 // jump to the statement 7

Three-Address Statements (cont.)
Conditional Jumps: jmprelop y,z,L or if y relop z goto L
If the result of y relop z is true, we jump to the three-address statement with the label L, and execution continues from that statement. If the result is false, execution continues from the statement following this conditional jump.
Ex: jmpgt y,z,L1 // jump to L1 if y>z
jmpgte y,z,L1 // jump to L1 if y>=z
jmpe y,z,L1 // jump to L1 if y==z
jmpne y,z,L1 // jump to L1 if y!=z

The relational operator can also be a unary operator.
jmpnz y,,L1 // jump to L1 if y is not zero
jmpz y,,L1 // jump to L1 if y is zero
jmpt y,,L1 // jump to L1 if y is true
jmpf y,,L1 // jump to L1 if y is false

Three-Address Statements (cont.)
Procedure Parameters: param x,, or param x
Procedure Calls: call p,n, or call p,n
where x is an actual parameter and the procedure p is invoked with n parameters.
Ex: the call p(x1,...,xn) is translated into
param x1,,
param x2,,
...
param xn,,
call p,n,

Ex: the call f(x+1,y) is translated into
add x,1,t1
param t1,,
param y,,
call f,2,

Three-Address Statements (cont.)
Indexed Assignments:
move y[i],,x or x := y[i]
move x,,y[i] or y[i] := x

Address and Pointer Assignments:
moveaddr y,,x or x := &y
movecont y,,x or x := *y

Three-Address Statements (summary)

• Assignment statements: x := y op z, x := op y
• Indexed assignments: x := y[i], x[i] := y
• Pointer assignments: x := &y, x := *y, *x := y
• Copy statements: x := y
• Unconditional jumps: goto L
• Conditional jumps: if y relop z goto L
• Function calls: param x, call p, n, return y

Syntax-Directed Translation into Three-Address Code
• Syntax-directed translation can be used to generate the three-address code.
• Generally, either:
• the three-address code is generated as an attribute of the attributed parse tree, or
• the semantic actions have side effects that write the three-address statements to a file.
• When the three-address code is generated, it is often necessary to use temporary variables and temporary names.

Syntax-Directed Translation into Three-Address Code
• The following functions are used to generate three-address code:
• newtemp() - each time this function is called, it returns a distinct name that can be used for a temporary variable.
– returns t1, t2,…, tn in response to successive calls
• newlabel() - each time this function is called, it returns a distinct name that can be used as a label.
• gen() - generates a single three-address statement from the information it is given: variable names and operations.

Syntax-Directed Translation into Three-Address Code
• gen produces a three-address statement by concatenating all of its parameters.
• For example:
• If id1.lexeme = x, id2.lexeme = y and id3.lexeme = z:
• gen (id1.lexeme, ':=', id2.lexeme, '+', id3.lexeme)
• will produce the three-address statement: x := y + z
• Note: variables and attribute values are evaluated by gen before being concatenated with the other parameters.

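The three helper functions can be sketched in Python as follows. This is a minimal sketch, not the definitive implementation: the names newtemp, newlabel and gen come from the slides, while the use of module-level counters, the label prefix L and the choice to join the parts with single spaces are assumptions made for the illustration.

```python
import itertools

_temps = itertools.count(1)    # assumed: one global counter for temporaries
_labels = itertools.count(1)   # assumed: one global counter for labels


def newtemp():
    """Return a fresh temporary name: t1, t2, ... on successive calls."""
    return f"t{next(_temps)}"


def newlabel():
    """Return a fresh label name: L1, L2, ... on successive calls."""
    return f"L{next(_labels)}"


def gen(*parts):
    """Concatenate the already evaluated parts into one three-address statement."""
    return " ".join(str(p) for p in parts)


print(gen("x", ":=", "y", "+", "z"))   # x := y + z
```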
Syntax-Directed Translation into Three-Address Code
• Deal with assignments.
• Use attributes:
– E.place: the name that will hold the value of E
• An identifier is assumed to already have the place attribute defined.
– E.code: holds the three-address statements that evaluate E (this is the 'translation' attribute).

Syntax-Directed Translation into Three-Address Code
Productions:
S → id := E
| while E do S
E → E + E
| E * E
| - E
| ( E )
| id
| num

Attributes used in the semantic rules:
S.code: three-address code for S
S.begin: label to start of S or nil
S.after: label to end of S or nil
E.code: three-address code for E
E.place: a name holding the value of E

Code generation: gen(E.place ':=' E1.place '+' E2.place) produces, for example, t3 := t1 + t2

Implementation of Three-Address Statements:
• The description of three-address instructions specifies the components of each type of instruction.
• However, it does not specify the representation of these instructions in a data structure.
• In a compiler, these statements can be implemented as objects or as records with fields for the operator and the operands.
• Three such representations are:
– Quadruples
– Triples and
– Indirect triples

Implementation of Three-Address Statements…
– Quadruples: A quadruple (or just "quad") has four fields, which we call op, arg1, arg2, and result.
– Triples: A triple has only three fields, which we call op, arg1, and arg2.
– Indirect triples: consist of a listing of pointers to triples, rather than a listing of the triples themselves.

• The benefit of quadruples over triples can be seen in an optimizing compiler, where instructions are often moved around.
• With quadruples, if we move an instruction that computes a temporary t, then the instructions that use t require no change.
• With triples, the result of an operation is referred to by its position, so moving an instruction may require changing all references to that result.
• This problem does not occur with indirect triples.

Implementation of Three-Address Statements: Quads
a := b * -c + b * -c

Three-address code:
t1 := - c
t2 := b * t1
t3 := - c
t4 := b * t3
t5 := t2 + t4
a := t5

Quads (quadruples):
#    op      Arg1  Arg2  Res
(0)  uminus  c           t1
(1)  *       b     t1    t2
(2)  uminus  c           t3
(3)  *       b     t3    t4
(4)  +       t2    t4    t5
(5)  :=      t5          a

The original FORTRAN compiler used "quads".
Pro: easy to rearrange code for global optimization, explicit names
Cons: lots of temporaries

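A quad table like the one above can be held in a list of records. The sketch below is one possible Python layout: the field names follow the slide, while the class name Quad and the helper show are hypothetical. It also illustrates the "easy to rearrange" property: two independent quads are swapped and no other entry has to change, because operands are referred to by name.

```python
from typing import NamedTuple, Optional


class Quad(NamedTuple):
    op: str
    arg1: str
    arg2: Optional[str]   # None for unary operators and copies
    result: str


quads = [
    Quad("uminus", "c", None, "t1"),
    Quad("*", "b", "t1", "t2"),
    Quad("uminus", "c", None, "t3"),
    Quad("*", "b", "t3", "t4"),
    Quad("+", "t2", "t4", "t5"),
    Quad(":=", "t5", None, "a"),
]


def show(q):
    """Print a quad back in 'result := ...' form."""
    if q.op == ":=":
        return f"{q.result} := {q.arg1}"
    if q.arg2 is None:
        return f"{q.result} := {q.op} {q.arg1}"
    return f"{q.result} := {q.arg1} {q.op} {q.arg2}"


# Quads (1) and (2) do not depend on each other, so they can be swapped
# without editing any other quad.
reordered = [quads[0], quads[2], quads[1]] + quads[3:]
for q in reordered:
    print(show(q))
```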
Implementation of Three-Address Statements: Triples
a := b * -c + b * -c

Three-address code:
t1 := - c
t2 := b * t1
t3 := - c
t4 := b * t3
t5 := t2 + t4
a := t5

Triples:
#    op      Arg1  Arg2
(0)  uminus  c
(1)  *       b     (0)
(2)  uminus  c
(3)  *       b     (2)
(4)  +       (1)   (3)
(5)  :=      a     (4)

Implicit names occupy no space.
Pro: temporaries are implicit; 25% less space consumed than quads
Cons: difficult to rearrange code

More triple representations

x[i] = y:
#    op      Arg1  Arg2
(0)  []=     x     i
(1)  assign  (0)   y

x = y[i]:
#    op      Arg1  Arg2
(0)  =[]     y     i
(1)  assign  x     (0)

• The major tradeoff between quads and triples is compactness versus ease of manipulation
– In the past, compile time and space were critical
– Today, speed may be more important

Implementation of Three-Address Statements: Indirect Triples
a := b * -c + b * -c

Pointers to triples:
#    stmt
(0)  (14)
(1)  (15)
(2)  (16)
(3)  (17)
(4)  (18)
(5)  (19)

Triples:
#     op      Arg1  Arg2
(14)  uminus  c
(15)  *       b     (14)
(16)  uminus  c
(17)  *       b     (16)
(18)  +       (15)  (17)
(19)  :=      a     (18)

Pro: temporaries are implicit and code is easier to rearrange
Cons: uses more space than triples

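The relation between the three representations can be sketched in Python. This is a hedged illustration: the function name quads_to_triples is hypothetical, triples are numbered from 0 rather than from 14, and the sketch assumes that a copy (:=) always targets a program variable, as in the running example.

```python
def quads_to_triples(quads):
    """Convert (op, arg1, arg2, result) quadruples into (op, arg1, arg2) triples."""
    position = {}   # temporary name -> index of the triple that computes it
    triples = []

    def ref(arg):
        # A temporary is replaced by the position of its defining triple.
        return position.get(arg, arg)

    for op, arg1, arg2, result in quads:
        if op == ":=":
            # Copy into a program variable: the target becomes arg1, as on the slide.
            triples.append((op, result, ref(arg1)))
        else:
            triples.append((op, ref(arg1), ref(arg2)))
            position[result] = len(triples) - 1
    return triples


quads = [
    ("uminus", "c", None, "t1"),
    ("*", "b", "t1", "t2"),
    ("uminus", "c", None, "t3"),
    ("*", "b", "t3", "t4"),
    ("+", "t2", "t4", "t5"),
    (":=", "t5", None, "a"),
]
triples = quads_to_triples(quads)

# Indirect triples: a separate statement list points into the triple table.
statements = list(range(len(triples)))
# Reordering only permutes the statement list; the triples stay untouched.
statements[1], statements[2] = statements[2], statements[1]

print(triples[4])    # ('+', 1, 3)
print(statements)    # [0, 2, 1, 3, 4, 5]
```

Integer arguments are positions of other triples, and string arguments are program names.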
Syntax-Directed Translation into Three-Address Code
Three-address code for an assignment statement and an expression
Productions and semantic actions:
S → id := E
S.code := E.code || gen (id.lexeme, ':=', E.place); S.begin = S.after = nil
E → E1 + E2
E.place := newtemp();
E.code := E1.code || E2.code || gen (E.place, ':=', E1.place, '+', E2.place)
E → E1 * E2
E.place := newtemp();
E.code := E1.code || E2.code || gen (E.place, ':=', E1.place, '*', E2.place)
E → - E1
E.place := newtemp();
E.code := E1.code || gen (E.place, ':= uminus', E1.place)
E → ( E1 )
E.place := E1.place
E.code := E1.code
E → id
E.place := id.lexeme
E.code := '' /* empty code */
E → num
E.place := newtemp();
E.code := gen (E.place, ':=', num.value)

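The semantic actions above can be mimicked by a recursive Python function that returns the pair (place, code) for each expression node. This is only a sketch under assumptions: the expression is given as nested tuples such as ("+", e1, e2), ("uminus", e1), ("id", name) and ("num", value), which is a hypothetical encoding, and a temporary is allocated after the children have been translated.

```python
import itertools


def translate_assignment(name, expr):
    """Return the three-address code for 'name := expr'."""
    temps = itertools.count(1)

    def newtemp():
        return f"t{next(temps)}"

    def visit(e):
        """Return (place, code) for expression node e."""
        kind = e[0]
        if kind == "id":                      # E -> id: no code, place is the name
            return e[1], []
        if kind == "num":                     # E -> num: load the constant
            place = newtemp()
            return place, [f"{place} := {e[1]}"]
        if kind == "uminus":                  # E -> - E1
            p1, c1 = visit(e[1])
            place = newtemp()
            return place, c1 + [f"{place} := uminus {p1}"]
        p1, c1 = visit(e[1])                  # E -> E1 + E2 or E1 * E2
        p2, c2 = visit(e[2])
        place = newtemp()
        return place, c1 + c2 + [f"{place} := {p1} {kind} {p2}"]

    place, code = visit(expr)
    return code + [f"{name} := {place}"]      # S -> id := E


operand = ("*", ("id", "b"), ("uminus", ("id", "c")))
for line in translate_assignment("a", ("+", operand, operand)):
    print(line)
```

For a := b * -c + b * -c this prints the six statements of the syntax-tree version, with uminus written out as the operator name.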
Syntax-Directed Translation (cont.)
Three-address code for a while statement

S → while E do S1
S.begin = newlabel();
S.after = newlabel();
S.code = gen(S.begin ':') || E.code ||
gen('if' E.place '=' '0' 'goto' S.after) || S1.code ||
gen('goto' S.begin) ||
gen(S.after ':')

Layout of the generated code:
S.begin: E.code
if E.place = 0 goto S.after
S1.code
goto S.begin
S.after: …

Syntax-Directed Translation (cont.)

S → if E then S1 else S2
S.else = newlabel();
S.after = newlabel();
S.code = E.code ||
gen('if' E.place '=' '0' 'goto' S.else) ||
S1.code ||
gen('goto' S.after) ||
gen(S.else ':') || S2.code ||
gen(S.after ':')
E → E1 < E2
E.place = newtemp();
E.code = E1.code || E2.code ||
gen (E.place, ':=', E1.place, '<', E2.place)

Code for flow-of-control statements

(a) if-then
E.code (jumps to E.true or to E.false)
E.true: S1.code
E.false: ...

(b) if-then-else
E.code (jumps to E.true or to E.false)
E.true: S1.code
goto S.next
E.false: S2.code
S.next: ...

Syntax-Directed Translation (cont.)
The same schemes written with the jmpf and jmp statements:
S → while E do S1
S.begin = newlabel();
S.after = newlabel();
S.code = gen(S.begin ':') || E.code ||
gen('jmpf' E.place ',,' S.after) || S1.code ||
gen('jmp' ',,' S.begin) ||
gen(S.after ':')
S → if E then S1 else S2
S.else = newlabel();
S.after = newlabel();
S.code = E.code ||
gen('jmpf' E.place ',,' S.else) || S1.code ||
gen('jmp' ',,' S.after) ||
gen(S.else ':') || S2.code ||
gen(S.after ':')

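The while scheme can be sketched as a Python function that stitches already generated code lists together. The function name gen_while is hypothetical, the condition and body code are passed in as ready-made lists of statements, and labels are emitted on lines of their own; all of these are assumptions made for the illustration, which uses the "if ... = 0 goto" form of the jump.

```python
import itertools

_labels = itertools.count(1)


def newlabel():
    return f"L{next(_labels)}"


def gen_while(cond_code, cond_place, body_code):
    """Lay out the code for 'while E do S1' following the translation scheme."""
    begin, after = newlabel(), newlabel()
    return ([f"{begin}:"]
            + cond_code                                   # E.code
            + [f"if {cond_place} = 0 goto {after}"]       # leave the loop when E is false
            + body_code                                   # S1.code
            + [f"goto {begin}", f"{after}:"])


# while a < b do a := (a + b) * c
code = gen_while(["t1 := a < b"], "t1",
                 ["t2 := a + b", "t3 := t2 * c", "a := t3"])
for line in code:
    print(line)
```

The if-then-else scheme follows the same pattern with the labels S.else and S.after.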
Exercises
• Draw the decorated parse tree and generate three-address code by using the translation schemes given:

a) A := B + C
b) A := C * (B + D)
c) while a < b do a := (a + b) * c
d) while a < b do a := a + b
e) a := b * -c + b * -c

Three-address code for A := B + C
S.code = E.code || gen(id.lexeme = E.place), which gives t1=B+C, A=t1

Decorated parse tree, root first:
S
– id: id.lexeme = A
– :=
– E: E.place = t1, E.code = t1=B+C (from t1 = E1.place + E2.place)
  – E1: E.place = B, E.code = '' (id.lexeme = B)
  – +
  – E2: E.place = C, E.code = '' (id.lexeme = C)

Three-address code for A := C * (B + D)
S.code => t1=B+D, t2=C*t1, A=t2

Decorated parse tree, root first:
S
– id: id.lexeme = A
– :=
– E: E.place = t2, E.code = t1=B+D, t2=C*t1
  – E: E.place = C, E.code = '' (id.lexeme = C)
  – *
  – E: E.place = E1.place = t1, E.code = E1.code
    – (
    – E1: E.place = t1, E.code = t1=B+D
      – E: E.place = B, E.code = '' (id.lexeme = B)
      – +
      – E: E.place = D, E.code = '' (id.lexeme = D)
    – )

Exercises

Source:
while a < b
a = (a + b) * c

Three-address code:
L1: t1 := a < b
if t1 = 0 goto L2
t2 := a + b
t3 := t2 * c
a := t3
goto L1
L2:

Source:
i := 2 * n + k
while i do
i := i - k

Three-address code:
t1 := 2
t2 := t1 * n
t3 := t2 + k
i := t3
L1: if i = 0 goto L2
t4 := i - k
i := t4
goto L1
L2:

How is this code derived? Draw the decorated parse tree.

Three-address code for Declarations
• The compiler uses a declaration as a source of type information, which it stores in the symbol table.
• While processing the declaration, the compiler:
– reserves a memory location for each variable and
– stores the relative address of each variable in the symbol table.
• In this section, we use variables, attributes and procedures that help with the processing of declarations.

Three-address code for Declarations
• The compiler maintains a global offset variable that indicates the first address not yet allocated.
• Initially, offset is assigned 0.
• Each time an address is allocated to a variable, the offset is incremented by the width of the data object denoted by the name.
• The procedure enter (name, type, address) creates a symbol table entry for name, and gives it the type type and the relative address address.
• The synthesized attributes type and width of non-terminal T indicate the type and the number of memory units taken by objects of that type.

Translation scheme for declaration
P → M D
M → ε { offset=0 }
D → D ; D
D → id : T { enter(id.name,T.type,offset); offset=offset+T.width }
T → int { T.type=int; T.width=4 }
T → real { T.type=real; T.width=8 }
T → array[num] of T1 { T.type=array(num.val,T1.type); T.width=num.val*T1.width }
T → ^ T1 { T.type=pointer(T1.type); T.width=4 }

where enter creates a symbol table entry with the given values.

Example
integer = 4 bytes, real = 8 bytes, pointer = 4 bytes
Var :
x : integer
y : real
t : array [10] of integer
v : ^ integer

id   type               offset
x    integer            0
y    real               4
t    array[10] of int   12
v    pointer (integer)  52

The offset of v is 52 because the array t occupies 10 * 4 = 40 bytes starting at offset 12.

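The offset bookkeeping of the translation scheme can be reproduced with a short Python sketch. The function names width and layout and the tuple encoding of array and pointer types are hypothetical; the widths (integer 4, real 8, pointer 4) are the ones given in the example.

```python
def width(t):
    """Width in bytes of a type: 'integer', 'real', ('array', n, elem) or ('pointer', elem)."""
    if t == "integer":
        return 4
    if t == "real":
        return 8
    if t[0] == "array":
        return t[1] * width(t[2])     # T.width = num.val * T1.width
    if t[0] == "pointer":
        return 4
    raise ValueError(f"unknown type: {t!r}")


def layout(decls):
    """Process declarations in order and return (symbol table, next free offset)."""
    symtab, offset = {}, 0            # M -> epsilon { offset = 0 }
    for name, t in decls:
        symtab[name] = (t, offset)    # enter(id.name, T.type, offset)
        offset += width(t)            # offset = offset + T.width
    return symtab, offset


decls = [
    ("x", "integer"),
    ("y", "real"),
    ("t", ("array", 10, "integer")),
    ("v", ("pointer", "integer")),
]
symtab, total = layout(decls)
for name, (t, off) in symtab.items():
    print(name, off)
```

The printed offsets are 0, 4, 12 and 52, matching the table, and the next free offset after v is 56.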
Addressing array elements
• Elements of arrays can be accessed quickly if the elements are stored in a block of consecutive locations.

A one-dimensional array A:
[Figure: the array as a block of consecutive cells starting at baseA; the first cell has index low, cell i is marked, and each cell is width units wide.]

baseA is the address of the first location of the array A,
width is the width of each array element,
low is the index of the first array element.

location of A[i] = baseA + (i-low) * width

Addressing Array elements (cont.)
baseA+(i-low)*width
can be rewritten as i*width + (baseA-low*width)
– i*width must be computed at run-time
– (baseA-low*width) can be computed at compile-time

• So, the location of A[i] can be computed at run-time by evaluating the formula i*width+c, where c is (baseA-low*width), which is evaluated at compile-time.
• The intermediate code generator should produce the code to evaluate this formula i*width+c (one multiplication and one addition operation).

Addressing Array elements (cont.)
• Example for an array declared as A : array [10..20] of integer;
• if it is stored at the address 100,
address of A[15] = 100 + (15 – 10) * 4 = 120

For a reference … := A[i]:
address of A[i] = baseA + (i – low) * w
= i * w + c
where c = baseA – low * w
with low = 10; w = 4

t1 := c // c = baseA – 10 * 4
t2 := i * 4
t3 := t1[t2]
… := t3

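The split between compile-time and run-time work can be checked numerically with the values of this example (baseA = 100, low = 10, width = 4). The function names precompute_c and element_address are hypothetical; the sketch only evaluates the two formulas from the slides.

```python
def precompute_c(base, low, width):
    """Compile-time part: c = baseA - low * width."""
    return base - low * width


def element_address(i, width, c):
    """Run-time part: one multiplication and one addition."""
    return i * width + c


c = precompute_c(base=100, low=10, width=4)
print(c)                            # 60
print(element_address(15, 4, c))    # 120

# The rewritten formula agrees with baseA + (i - low) * width for every valid index.
for i in range(10, 21):
    assert element_address(i, 4, c) == 100 + (i - 10) * 4
```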
Addressing Array elements: Grammar

S → L := E
E → E + E
| E * E
| - E
| ( E )
| L
L → id [ E ]
| id

Synthesized attributes:
E.place: name of temp holding value of E
L.place: lvalue (= name of temp)
L.offset: index into array (= name of temp); null indicates a non-array simple id

Three-address code for assignment statement and expressions (including array references)
S → L := E
if L.offset = nil then /* L is a simple id */
S.code := L.code || E.code || gen (L.place, ':=', E.place);
else
S.code := L.code || E.code || gen (L.place, '[', L.offset, '] :=', E.place);
E → E1 + E2
E.place := newtemp();
E.code := E1.code || E2.code || gen (E.place, ':=', E1.place, '+', E2.place)
E → E1 * E2
E.place := newtemp();
E.code := E1.code || E2.code || gen (E.place, ':=', E1.place, '*', E2.place)
E → - E1
E.place := newtemp();
E.code := E1.code || gen (E.place, ':= uminus', E1.place)

Three-address code for assignment statement and expressions…
E → (E1)
E.place := E1.place;
E.code := E1.code
E → L
if L.offset = nil then /* L is simple */
begin
E.place := L.place;
E.code := L.code
end
else
begin
E.place := newtemp();
E.code := L.code || gen (E.place, ':=', L.place, '[', L.offset, ']')
end

Three-address code for assignment statement and expressions…
L → id [E]
L.place := newtemp();
L.offset := newtemp();
L.code := E.code || gen (L.place, ':=', base (id.lexeme) - width (id.lexeme) * low (id.lexeme)) || gen (L.offset, ':=', E.place, '*', width (id.lexeme));
L → id
p := lookup (id.lexeme)
if p <> nil then L.place := p.lexeme else error
L.offset := nil; /* for simple identifier */
L.code := '' /* empty code */

Example
• Three-address code generation for the input x := A [y]
• A is stored at the address 100, its values are integers (width = 4) and low = 1.
• The semantic actions will generate the following three-address code.
t1 := 96 // base - width * low = 100 - 4 * 1
t2 := y * 4
t3 := t1 [t2]
x := t3

Exercise: produce the attributed parse tree (decorated parse tree).

Example
• Three-address code generation for input:
tab1 [i + k] := x + tab2 [j]
• tab1 is stored at the address 100 and its elements are integers
• tab2 is stored at the address 200 and its elements are integers
• The constants 96 and 196 below correspond to width = 4 and low = 1, as in the previous example.
• The semantic actions will generate the following three-address code.
t1=i+k
t2=96
t3=t1*4
t4=196
t5=j*4
t6=t4[t5]
t7=x+t6
t2[t3]=t7
Exercise: produce the attributed parse tree (decorated parse tree).
