Unit-4-2
UNIT-4
SRMIST, Vadapalani Campus
Intermediate Code Generation
● In the analysis-synthesis model of a compiler, the front end translates a source
program into an intermediate representation from which the back end generates
target code.
● Although a source program can be translated directly into the target language,
using a machine-independent intermediate form has some benefits:
1. Retargeting is facilitated; a compiler for a different machine can be created
by attaching a back end for the new machine to an existing front end.
2. A machine-independent code optimizer can be applied to the intermediate
representation.
● Syntax-directed methods can be used to translate programming language
constructs such as declarations, assignments, and flow-of-control
statements into intermediate code.
● Assume that the source program has already been parsed and statically
checked.
● Most of the syntax-directed definitions (SDDs) can be implemented during either
bottom-up or top-down parsing.
Intermediate Languages
● Syntax trees and postfix notation are two kinds of intermediate
representations.
● A third, called three-address code, will be used here.
● The semantic rules for generating three-address code from common
programming language constructs are similar to those for constructing
syntax trees or for generating postfix notation.
● Two representations of the syntax tree
● Each node is represented as a record with a field for its operator and additional
fields for pointers to its children.
● In Fig (b), nodes are allocated from an array of records, and the index or position
of the node serves as the pointer to the node.
● All the nodes in the syntax tree can be visited by following pointers, starting
from the root at position 10.
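The array-of-records representation can be sketched in Python as follows. This is only an illustration: it assumes the tree is for the assignment a := b * -c + b * -c used later in these notes, and the node order, the `Node` record, and the helpers `show` and `reachable` are hypothetical names chosen here, not taken from the figure.

```python
from typing import NamedTuple, Optional, Union

class Node(NamedTuple):
    op: str                               # operator field
    left: Union[int, str, None] = None    # child index, or the name for an id leaf
    right: Optional[int] = None           # child index, if any

# Assumed layout: children are stored before their parents, root last.
nodes = [
    Node("id", "b"),        # 0
    Node("id", "c"),        # 1
    Node("uminus", 1),      # 2
    Node("*", 0, 2),        # 3
    Node("id", "b"),        # 4
    Node("id", "c"),        # 5
    Node("uminus", 5),      # 6
    Node("*", 4, 6),        # 7
    Node("+", 3, 7),        # 8
    Node("id", "a"),        # 9
    Node("assign", 9, 8),   # 10 (root)
]
ROOT = 10

def show(i: int) -> str:
    """Rebuild the source text by following child indices from node i."""
    n = nodes[i]
    if n.op == "id":
        return n.left
    if n.op == "uminus":
        return f"-{show(n.left)}"
    if n.op == "assign":
        return f"{show(n.left)} := {show(n.right)}"
    return f"({show(n.left)} {n.op} {show(n.right)})"

def reachable(i: int) -> list:
    """List every node index visited when starting from node i."""
    n = nodes[i]
    found = [i]
    if n.op != "id":
        for child in (n.left, n.right):
            if child is not None:
                found += reachable(child)
    return found
```

Because an array index plays the role of a pointer, `show(ROOT)` and `reachable(ROOT)` touch every record by starting at position 10 and following the stored indices.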
Three-Address Code
• Three-address code is a linearized representation of a syntax tree or a dag in which explicit names
correspond to the interior nodes of the graph.
• The syntax tree and dag are represented by the three-address code sequences as given below.
• Variable names can appear directly in three-address statements, so there are no statements
corresponding to the leaves.
• The reason for the term "three-address code" is that each statement usually contains three
addresses, two for the operands and one for the result.
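As a rough illustration of the difference between linearizing a syntax tree and a dag, the sketch below walks an expression and emits one statement per interior node. It assumes the example assignment a := b * -c + b * -c; the function `linearize`, its `share` flag, and the tuple encoding of expressions are hypothetical choices made for this sketch.

```python
def linearize(target, expr, share):
    """Emit three-address statements for `target := expr`.

    expr is a name (str) or a tuple (op, operand...). With share=True,
    identical subexpressions reuse one temporary, as in a dag.
    """
    code, memo, temps = [], {}, []

    def visit(e):
        if isinstance(e, str):
            return e                      # leaf: the name is used directly
        if share and e in memo:
            return memo[e]                # dag: node already has a name
        op, *args = e
        places = [visit(a) for a in args]
        t = f"t{len(temps) + 1}"          # explicit name for this interior node
        temps.append(t)
        if len(places) == 1:
            code.append(f"{t} := {op}{places[0]}")
        else:
            code.append(f"{t} := {places[0]} {op} {places[1]}")
        memo[e] = t
        return t

    code.append(f"{target} := {visit(expr)}")
    return code

# b * -c + b * -c
EXPR = ("+", ("*", "b", ("-", "c")), ("*", "b", ("-", "c")))
tree_code = linearize("a", EXPR, share=False)
dag_code = linearize("a", EXPR, share=True)
```

With `share=False` the common subexpression b * -c is computed twice (six statements); with `share=True` it is computed once (four statements). The temporary numbering in the dag version is whatever this sketch happens to produce and need not match the numbering in the slide's figure.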
Types of Three Address Statements
7. Indexed assignments of the form x := y[i] and x[i] := y.
The first statement, x := y[i], sets x to the value in the location i memory units beyond
location y.
The second statement, x[i] := y, sets the contents of the location i units beyond x to the
value of y.
In both instructions, x, y, and i refer to data objects.
8. Address and pointer assignments of the form x := &y, x := *y and *x := y.
Statement x := &y sets the value of x to be the location of y.
Here y is a name, perhaps a temporary, that denotes an expression with an l-value, and x is a
pointer name or temporary.
Statement x := *y sets the r-value of x to the contents of the location pointed to by y, where
y is a pointer or a temporary whose r-value is a location.
Statement *x := y sets the r-value of the object pointed to by x to the r-value of y.
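The effect of these five statement forms can be mimicked with a toy memory model. The sketch below is an assumption-laden illustration, not part of the original notes: memory is a flat Python list, every name is given a made-up address in `addr`, and the helper names (`index_load`, `deref_store`, and so on) are hypothetical.

```python
memory = [0] * 16                                  # toy flat memory
addr = {"x": 0, "y": 1, "i": 2, "p": 3, "a": 8}    # assumed locations; a is an array base

def rval(name):
    """r-value of a name: the contents of its location."""
    return memory[addr[name]]

def store(name, value):
    memory[addr[name]] = value

def index_load(x, y, i):       # x := y[i]
    store(x, memory[addr[y] + rval(i)])

def index_store(x, i, y):      # x[i] := y
    memory[addr[x] + rval(i)] = rval(y)

def address_of(x, y):          # x := &y
    store(x, addr[y])

def deref_load(x, y):          # x := *y
    store(x, memory[rval(y)])

def deref_store(x, y):         # *x := y
    memory[rval(x)] = rval(y)
```

For instance, after `store("i", 3)` and `store("y", 42)`, calling `index_store("a", "i", "y")` writes 42 into the cell three units beyond the base of a, and `index_load("x", "a", "i")` reads it back into x.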
SDD into Three-Address Code
• When three-address code is generated, temporary names are made up for the interior nodes of
a syntax tree.
• The value of nonterminal E on the left side of E → E1 + E2 will be computed into a new temporary t.
• The three-address code for id := E consists of code to evaluate E into some temporary t,
followed by the assignment id.place := t.
The synthesized attribute S.code represents the three-address code for the assignment S.
The nonterminal E has two attributes:
1. E.place, the name that will hold the value of E,
2. E.code, the sequence of three-address statements evaluating E.
The function newtemp returns a sequence of distinct names t1, t2, …, tn in response to
successive calls.
For convenience, the notation gen(x ':=' y '+' z) is used
to represent the three-address statement x := y + z.
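The attributes E.place and E.code, together with newtemp and gen, can be sketched as a recursive translator. This is a minimal illustration under stated assumptions: expressions are given as nested tuples rather than parsed, and `make_newtemp`, `translate_E`, and `translate_S` are hypothetical names introduced here.

```python
import itertools

def make_newtemp():
    """Return a newtemp function yielding t1, t2, ... on successive calls."""
    counter = itertools.count(1)
    return lambda: f"t{next(counter)}"

def gen(*parts):
    """Build one three-address statement from its pieces."""
    return " ".join(parts)

def translate_E(e, newtemp):
    """Return (E.place, E.code) for expression e."""
    if isinstance(e, str):                 # E -> id: no code, place is the name
        return e, []
    if e[0] == "uminus":                   # E -> - E1
        p1, c1 = translate_E(e[1], newtemp)
        place = newtemp()
        return place, c1 + [gen(place, ":=", "uminus", p1)]
    op, e1, e2 = e                         # E -> E1 op E2
    p1, c1 = translate_E(e1, newtemp)
    p2, c2 = translate_E(e2, newtemp)
    place = newtemp()
    return place, c1 + c2 + [gen(place, ":=", p1, op, p2)]

def translate_S(ident, e):
    """Return S.code for S -> id := E."""
    place, code = translate_E(e, make_newtemp())
    return code + [gen(ident, ":=", place)]

# a := b + c * d
s_code = translate_S("a", ("+", "b", ("*", "c", "d")))
```

Each call for an interior node concatenates the code of its operands and then appends one statement that stores the result in a fresh temporary, which mirrors how E.code is assembled from E1.code and E2.code.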
Flow-of-control statements can be added to the language of assignments by productions and
semantic rules.
The code for S → while E do S1 is generated using new attributes S.begin and S.after to mark
the first statement in the code for E and the statement following the code for S.
● These attributes represent labels created by a function newlabel that returns a
new label every time it is called.
● Note that S.after becomes the label of the statement that comes after the code for
the while statement.
● Assume that a non-zero expression represents true; that is, when the value of E
becomes zero, control leaves the while statement.
● Expressions that govern the flow of control may in general be boolean expressions
containing relational and logical operators.
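The layout of the code for a while statement can be sketched as below, using two fresh labels for S.begin and S.after and the convention that zero means false. The helper names `make_newlabel` and `translate_while`, and the label spelling L1, L2, are assumptions made for this illustration.

```python
import itertools

def make_newlabel():
    """Return a newlabel function yielding L1, L2, ... on successive calls."""
    counter = itertools.count(1)
    return lambda: f"L{next(counter)}"

def translate_while(e_place, e_code, s1_code, newlabel):
    """Return S.code for S -> while E do S1.

    e_place and e_code are E.place and E.code; s1_code is S1.code.
    """
    begin = newlabel()     # S.begin: first statement of the code for E
    after = newlabel()     # S.after: statement following the loop
    return (
        [f"{begin}:"]
        + e_code
        + [f"if {e_place} = 0 goto {after}"]   # E is zero: leave the loop
        + s1_code
        + [f"goto {begin}", f"{after}:"]       # re-test E after the body
    )

loop = translate_while("t1", ["t1 := n - i"], ["i := i + 1"], make_newlabel())
```

Here the loop body runs as long as n - i is non-zero; the jump back to S.begin re-evaluates E before each iteration.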
Implementation of Three-Address Statements
● A three-address statement is an abstract form of intermediate code.
● In a compiler, these statements can be implemented as records with fields for the
operator and the operands.
● There are 3 such representations: quadruples, triples, and indirect triples.
Quadruples
● A quadruple is a record structure with four fields: op, arg1, arg2, and result.
● The op field contains an internal code for the operator.
● The three-address statement x := y op z is represented by placing y in arg1, z in
arg2, and x in result.
● Statements with unary operators like x := -y or x := y do not use arg2.
● Operators like param use neither arg2 nor result.
● Conditional and unconditional jumps put the target label in result.
● The quadruples for the assignment a := b * - c + b * - c are given on the next slide.
● The contents of fields arg1, arg2, and result are normally pointers to the
symbol-table entries for the names represented by these fields.
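Following the field conventions above, the quadruples for a := b * -c + b * -c might be laid out as in this sketch. It makes two simplifying assumptions: plain strings stand in for symbol-table pointers, and the record type `Quad` and helper `to_statement` are hypothetical names.

```python
from typing import NamedTuple, Optional

class Quad(NamedTuple):
    op: str                 # internal code for the operator
    arg1: str
    arg2: Optional[str]     # unused by unary operators and copies
    result: str

quads = [
    Quad("uminus", "c", None, "t1"),
    Quad("*", "b", "t1", "t2"),
    Quad("uminus", "c", None, "t3"),
    Quad("*", "b", "t3", "t4"),
    Quad("+", "t2", "t4", "t5"),
    Quad(":=", "t5", None, "a"),
]

def to_statement(q: Quad) -> str:
    """Render a quadruple back as a three-address statement."""
    if q.op == ":=":
        return f"{q.result} := {q.arg1}"
    if q.arg2 is None:
        return f"{q.result} := {q.op} {q.arg1}"
    return f"{q.result} := {q.arg1} {q.op} {q.arg2}"
```

Because every temporary has an explicit name in the result field, statements can be moved around without touching the quadruples that use their values.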
Triples
● To avoid entering temporary names into the symbol table, we can refer to a temporary
value by the position of the statement that computes it.
● Here three-address statements can be represented by records with only three
fields: op, arg1 and arg2.
● The fields arg1 and arg2, for the arguments of op, are either pointers to the symbol
table or pointers into the triple structure (for temporary values).
● Since three fields are used, this intermediate code format is known as triples.
Parenthesized numbers represent pointers into the triple structure, while symbol-table
pointers are represented by the names themselves.
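One way to see how triples drop the temporary names is to derive them from quadruples, replacing each temporary by the position of the statement that computes it. The sketch below assumes that every non-copy quadruple writes a compiler temporary; the `Ref` wrapper (playing the role of a parenthesized number) and `quads_to_triples` are hypothetical names.

```python
from typing import NamedTuple

class Ref(NamedTuple):
    index: int      # pointer into the triple structure, shown as (index)

# (op, arg1, arg2, result) for a := b * -c + b * -c
QUADS = [
    ("uminus", "c", None, "t1"),
    ("*", "b", "t1", "t2"),
    ("uminus", "c", None, "t3"),
    ("*", "b", "t3", "t4"),
    ("+", "t2", "t4", "t5"),
    (":=", "t5", None, "a"),
]

def quads_to_triples(quads):
    """Rewrite quadruples as (op, arg1, arg2) triples."""
    where = {}       # temporary name -> position of the triple computing it
    triples = []

    def arg(a):
        # Temporaries become positions; other names stay as symbol-table names.
        return Ref(where[a]) if a in where else a

    for op, a1, a2, result in quads:
        if op == ":=":
            triples.append(("assign", result, arg(a1)))
        else:
            where[result] = len(triples)
            triples.append((op, arg(a1), arg(a2)))
    return triples

triples = quads_to_triples(QUADS)
```

The final copy becomes an assign triple whose first argument is the name a and whose second argument points at the triple that computed the sum.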
Indirect triples
● Another implementation of three-address code is to list pointers to triples, rather than
listing the triples themselves.
● This implementation is naturally called indirect triples.
● For example, an array named statement can list pointers to triples in the desired order.
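A small sketch of indirect triples: the triples stay where they are, and a separate statement list fixes the execution order. The toy evaluator `run`, the sample values, and the convention that an integer argument is a pointer into the triple structure are assumptions made for this illustration (so integer constants are not supported here).

```python
# Triples for a := b * -c + b * -c; int arguments point at other triples.
triples = [
    ("uminus", "c", None),   # (0)
    ("*", "b", 0),           # (1)
    ("uminus", "c", None),   # (2)
    ("*", "b", 2),           # (3)
    ("+", 1, 3),             # (4)
    ("assign", "a", 4),      # (5)
]

def run(statement, env):
    """Execute the triples in the order given by the statement list."""
    env = dict(env)
    values = {}                      # triple position -> computed value

    def val(arg):
        return values[arg] if isinstance(arg, int) else env[arg]

    for idx in statement:
        op, a1, a2 = triples[idx]
        if op == "uminus":
            values[idx] = -val(a1)
        elif op == "*":
            values[idx] = val(a1) * val(a2)
        elif op == "+":
            values[idx] = val(a1) + val(a2)
        elif op == "assign":
            env[a1] = val(a2)
    return env

original = run([0, 1, 2, 3, 4, 5], {"b": 3, "c": 2})
reordered = run([2, 3, 0, 1, 4, 5], {"b": 3, "c": 2})
```

Reordering only permutes the statement list; the triples and the positions they refer to are untouched, which is the practical advantage over plain triples when code is moved.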