Unit-4-2

The document discusses intermediate code generation in compilers. It describes different intermediate representations like syntax trees, postfix notation, and three-address code. It also provides examples of different types of three-address statements and how a syntax-directed definition can be used to generate three-address code from a source program.

Uploaded by

Jefferson Aaron
Copyright
© © All Rights Reserved
18CSC304J- COMPILER DESIGN

UNIT-4

SRMIST, Vadapalani Campus


UNIT-IV SYLLABUS
1. Intermediate Code Generation
2. Intermediate Languages - prefix - postfix
3. Quadruple - triple - indirect triples Representation
4. Syntax tree - Evaluation of expression - three-address code
5. Synthesized attributes – Inherited attributes
6. Intermediate languages – Declarations
7. Assignment Statements
8. Boolean Expressions, Case Statements
9. Back patching – Procedure calls
10. Code Generation
11. Issues in the design of code generator
12. The target machine – Runtime Storage management
13. A simple Code generator
14. Code Generation Algorithm
15. Register and Address Descriptors
16. Generating Code of Assignment Statements
17. Cross Compiler – T diagrams
18. Issues in Cross compilers

Intermediate Code Generation
● In the analysis-synthesis model of a compiler, the front end translates a source
program into an intermediate representation from which the back end generates
target code.
● Although a source program can be translated directly into the target language,
using a machine-independent intermediate form has some benefits:
1. Retargeting is facilitated; a compiler for a different machine can be created
by attaching a back end for the new machine to an existing front end.
2. A machine-independent code optimizer can be applied to the intermediate
representation.

● Syntax-directed methods can be used to translate programming language
constructs such as declarations, assignments, and flow-of-control
statements into intermediate code.
● Assume that the source program has already been parsed and statically
checked.
● Most of the syntax-directed definitions (SDDs) can be implemented during
either bottom-up or top-down parsing.

Intermediate Languages
● Syntax trees and postfix notation are two kinds of intermediate
representations.
● A third, called three-address code, will be used here.
● The semantic rules for generating three-address code from common
programming language constructs are similar to those for constructing
syntax trees or for generating postfix notation.

CS416 Compiler Design


Graphical Representations

● A syntax tree depicts the natural hierarchical structure of a source program.
● A dag gives the same information but in a more compact way because common
subexpressions are identified.
● Figure: a syntax tree and a dag for the assignment statement
a := b * -c + b * -c
Postfix notation

● Postfix notation is a linearized representation of a syntax tree; it is a list of
the nodes of the tree in which a node appears immediately after its children.
● The postfix notation for the syntax tree on the previous slide is
a b c uminus * b c uminus * + assign
● The edges in a syntax tree do not appear explicitly in postfix notation.
● They can be recovered from the order in which the nodes appear and the number
of operands that the operator at a node expects.
● Recovering the edges is similar to evaluating an expression in postfix notation
using a stack.
● Syntax trees for assignment statements are produced by a syntax-directed
definition (SDD).
● It is an extension of the SDD that constructs syntax trees for expressions.
● Nonterminal S generates an assignment statement.
● The two binary operators + and * are examples of the full operator set in a typical
language.
● Operator associativities and precedences are the usual ones, even though they
have not been put into the grammar.
● This definition constructs the syntax tree from the input a := b * -c + b * -c.
● The same SDD will produce the dag representation if the functions mkunode(op,
child) and mknode(op, left, right) return a pointer to an existing node whenever
possible, instead of constructing a new node.
● The token id has an attribute place that points to the symbol-table entry for the
identifier.
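
The remark about mkunode and mknode returning existing nodes can be sketched as follows. This is a minimal illustration, not the slide's own definition: the tuple encoding of nodes, the `nodes` and `index` tables, and the helper `_intern` are assumptions made for the example, and mkleaf is assumed to be the leaf constructor.

```python
# Hypothetical node store: nodes are tuples, and their list index is the "pointer".
nodes = []   # nodes[i] is ("id", name), (op, child) or (op, left, right)
index = {}   # maps a node tuple to its position in `nodes`

def _intern(node):
    """Return the index of an identical existing node, or append a new one."""
    if node not in index:
        index[node] = len(nodes)
        nodes.append(node)
    return index[node]

def mkleaf(name):
    return _intern(("id", name))

def mkunode(op, child):
    return _intern((op, child))

def mknode(op, left, right):
    return _intern((op, left, right))

# a := b * -c + b * -c
lhs = mknode("*", mkleaf("b"), mkunode("uminus", mkleaf("c")))
rhs = mknode("*", mkleaf("b"), mkunode("uminus", mkleaf("c")))
root = mknode("assign", mkleaf("a"), mknode("+", lhs, rhs))
```

Because the second `b * -c` finds the node built for the first one, `lhs` and `rhs` are the same index and the dag holds 7 nodes, where the full syntax tree would need 11.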

● Two representations of the syntax tree
● Each node is represented as a record with a field for its operator and
additional fields for pointers to its children.
● In Fig (b), nodes are allocated from an array of records and the index or
position of the node serves as the pointer to the node.
● All the nodes in the syntax tree can be visited by following pointers, starting
from the root at position 10.
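
The array-of-records representation might look like the sketch below. The exact layout is an assumption (operands allocated before their operators, which puts the root at position 10 as the slide states), and the three-field record shape is chosen only for illustration.

```python
# Each record is (op, arg1, arg2). For operators, arg fields hold array indices
# of the children; for leaves, arg1 holds the identifier name (assumed encoding).
records = [
    ("id", "b", None),     # 0
    ("id", "c", None),     # 1
    ("uminus", 1, None),   # 2
    ("*", 0, 2),           # 3
    ("id", "b", None),     # 4
    ("id", "c", None),     # 5
    ("uminus", 5, None),   # 6
    ("*", 4, 6),           # 7
    ("+", 3, 7),           # 8
    ("id", "a", None),     # 9
    ("assign", 9, 8),      # 10 (root)
]

def reachable(root):
    """Visit every node by following index 'pointers' from the root."""
    order = []

    def walk(i):
        op, arg1, arg2 = records[i]
        if op != "id":
            for child in (arg1, arg2):
                if child is not None:
                    walk(child)
        order.append(i)   # children first, then the node itself

    walk(root)
    return order

visited = reachable(10)
```

Starting at index 10 reaches all eleven records, which is the point of the last bullet above.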
Three-Address Code

● Three-address code is a sequence of statements of the general form
x := y op z
● where x, y, and z are names, constants, or compiler-generated temporaries;
● op stands for any operator, such as a fixed- or floating-point arithmetic operator,
or a logical operator on boolean-valued data.
● A source language expression like x + y * z might be translated into the sequence
t1 := y * z
t2 := x + t1
● where t1 and t2 are compiler-generated temporary names.
● The use of names for the intermediate values computed by a program allows
three-address code to be easily rearranged, unlike postfix notation.

• Three-address code is a linearized representation of a syntax tree or a dag in which explicit names
correspond to the interior nodes of the graph.
• The syntax tree and the dag for a := b * -c + b * -c are represented by the following
three-address code sequences:
Syntax tree: t1 := -c, t2 := b * t1, t3 := -c, t4 := b * t3, t5 := t2 + t4, a := t5
Dag: t1 := -c, t2 := b * t1, t5 := t2 + t2, a := t5
• Variable names can appear directly in three-address statements, so there are no statements
corresponding to the leaves.

The reason for the term "three-address code" is that each statement usually contains three
addresses, two for the operands and one for the result.
Types of Three Address Statements

● Three-address statements are similar to assembly code.
● Statements can have symbolic labels and there are statements for flow of control.
● A symbolic label represents the index of a three-address statement in the array
holding intermediate code.
● Actual indices can be substituted for the labels either by making a separate pass
or by using "backpatching".
Some of the common three-address statements used are:
1. Assignment statements of the form x := y op z, where op is a binary arithmetic or
logical operation.
2. Assignment instructions of the form x := op y, where op is a unary operation.
Essential unary operations include unary minus, logical negation, shift operators,
and conversion operators that, for example, convert a fixed-point number to a
floating-point number.
3. Copy statements of the form x := y, where the value of y is assigned to x.
4. The unconditional jump goto L. The three-address statement with label L is the
next to be executed.
5. Conditional jumps such as if x relop y goto L. This instruction applies a relational
operator (<, =, >=, etc.) to x and y and executes the statement with label L next if x
stands in relation relop to y. Otherwise, the statement following the jump is executed next.
6. param x and call p, n for procedure calls, and return y, where y, representing a
returned value, is optional. They are typically used as the following sequence of
three-address statements, generated as part of a call of the procedure p(x1, x2, ..., xn):
param x1
param x2
...
param xn
call p, n
The integer n, indicating the number of actual parameters in
"call p, n", is not redundant because calls can be nested.

7. Indexed assignments of the form x := y[i] and x[i] := y.
The first statement, x := y[i], sets x to the value in the location i memory units beyond
location y.
The second statement, x[i] := y, sets the contents of the location i units beyond x to the
value of y.
In both instructions, x, y, and i refer to data objects.
8. Address and pointer assignments of the form x := &y, x := *y and *x := y.
The statement x := &y sets the value of x to be the location of y.
Here y is a name, perhaps a temporary, that denotes an expression with an l-value, and x is
a pointer name or temporary.
In the statement x := *y, y is a pointer or a temporary whose r-value is a location; the
r-value of x is made equal to the contents of that location.
The statement *x := y sets the r-value of the object pointed to by x to the r-value of y.
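
The "separate pass" that substitutes actual indices for symbolic labels could be sketched as below. The tuple encoding of statements and the function name `resolve_labels` are assumptions made for this example; the only convention relied on is that the last field of a goto or if statement is its target label.

```python
def resolve_labels(code):
    """Replace symbolic jump targets with statement indices in a separate pass.

    `code` is a list of (label_or_None, statement) pairs, where a statement is a
    tuple and the last field of 'goto' / 'if' statements is the target label.
    """
    # First, record the index of every labelled statement.
    positions = {label: i for i, (label, _) in enumerate(code) if label is not None}
    # Then rewrite every jump so that it refers to an index instead of a label.
    resolved = []
    for _, stmt in code:
        if stmt[0] in ("goto", "if"):
            stmt = stmt[:-1] + (positions[stmt[-1]],)
        resolved.append(stmt)
    return resolved

program = [
    ("L1", ("if", "i", ">=", "n", "L2")),   # L1: if i >= n goto L2
    (None, (":=", "t1", "i", "+", "1")),    #     t1 := i + 1
    (None, (":=", "i", "t1")),              #     i := t1
    (None, ("goto", "L1")),                 #     goto L1
    ("L2", ("return",)),                    # L2: return
]

resolved = resolve_labels(program)
```

After the pass, the conditional jump targets index 4 and the unconditional jump targets index 0. Backpatching reaches the same result without a second pass over the code.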

SDD into Three-Address Code

• When three-address code is generated, temporary names are made up for the interior nodes of
a syntax tree.
• The value of nonterminal E on the left side of E → E1 + E2 will be computed into a new temporary t.
• The three-address code for id := E consists of code to evaluate E into some temporary t,
followed by the assignment id.place := t.

The S-attributed definition generates three-address code for assignment statements.

Given input a := b * -c + b * -c, it produces the three-address code sequence for the
syntax tree of that statement, ending with the copy a := t5.

The synthesized attribute S.code represents the three-address code for the assignment S.
The nonterminal E has two attributes:
1. E.place, the name that will hold the value of E,
2. E.code, the sequence of three-address statements evaluating E.
The function newtemp returns a sequence of distinct names t1, t2, …, tn in response to
successive calls.
For convenience, the notation gen(x ':=' y '+' z) is used
to represent the three-address statement x := y + z.
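
The attributes E.place and E.code and the function newtemp can be mimicked in a few lines. This is a sketch under assumptions: expression trees are encoded as nested tuples, each function returns a (place, code) pair in place of the two attributes, and formatted strings stand in for gen.

```python
import itertools

_temps = itertools.count(1)

def newtemp():
    """Return a distinct temporary name on each call: t1, t2, ..."""
    return f"t{next(_temps)}"

def gen_expr(node):
    """Return (E.place, E.code) for an expression tree.

    Assumed encoding: a name is a string, unary minus is ("uminus", e),
    and a binary operation is (op, e1, e2).
    """
    if isinstance(node, str):                 # E -> id
        return node, []
    if node[0] == "uminus":                   # E -> - E1
        place1, code1 = gen_expr(node[1])
        t = newtemp()
        return t, code1 + [f"{t} := -{place1}"]
    op, left, right = node                    # E -> E1 op E2
    place1, code1 = gen_expr(left)
    place2, code2 = gen_expr(right)
    t = newtemp()
    return t, code1 + code2 + [f"{t} := {place1} {op} {place2}"]

def gen_assign(name, expr):                   # S -> id := E
    place, code = gen_expr(expr)
    return code + [f"{name} := {place}"]

term = ("*", "b", ("uminus", "c"))
code = gen_assign("a", ("+", term, term))
```

For a := b * -c + b * -c this yields six statements, from `t1 := -c` through `t5 := t2 + t4` and finally `a := t5`, one temporary per interior node of the syntax tree.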

Flow-of-control statements can be added to the language of assignments by productions and
semantic rules.
The code for S → while E do S1 is generated using new attributes S.begin and S.after to mark
the first statement in the code for E and the statement following the code for S.

● These attributes represent labels created by a function newlabel that returns a
new label every time it is called.
● Note that S.after becomes the label of the statement that comes after the code for
the while statement.
● Assume that a non-zero expression represents true; i.e. when the value of E
becomes zero, control leaves the while statement
● Expressions that govern the flow of control may in general be boolean expressions
containing relational and logical operators
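
A sketch of how the while statement might be translated with newlabel is given below. The code layout (test at S.begin, exit jump when E is zero, jump back after the body) is an assumption based on the usual textbook scheme, since the slide's semantic rule is not reproduced in the text; the function and parameter names are illustrative.

```python
import itertools

_labels = itertools.count(1)

def newlabel():
    """Return a new label on every call: L1, L2, ..."""
    return f"L{next(_labels)}"

def gen_while(e_code, e_place, s1_code):
    """Build the code for S -> while E do S1 from already generated pieces."""
    begin = newlabel()   # S.begin: first statement of the code for E
    after = newlabel()   # S.after: statement following the whole loop
    return (
        [f"{begin}:"]
        + e_code
        + [f"if {e_place} = 0 goto {after}"]   # zero means false, so leave the loop
        + s1_code
        + [f"goto {begin}", f"{after}:"]
    )

# while (n - i) do i := i + 1
code = gen_while(["t1 := n - i"], "t1", ["t2 := i + 1", "i := t2"])
```

The resulting list starts with `L1:`, tests `t1` against zero, and ends with `goto L1` followed by `L2:`.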

● Postfix notation can be obtained by adapting the semantic rules.
● The postfix notation for an identifier is the identifier itself.
● The rules for the other productions concatenate only the operator after the code
for the operands.
● E.g., associated with the production E → -E1 is the semantic rule
E.code := E1.code || 'uminus'
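
The rule "operand code first, then the operator" can be sketched directly, together with the stack-based recovery of the tree mentioned earlier for postfix notation. The nested-tuple tree encoding and the `ARITY` table are assumptions made for this example.

```python
def postfix(node):
    """Concatenate the postfix code of the operands, then append the operator."""
    if isinstance(node, str):          # an identifier is its own postfix form
        return [node]
    op, *operands = node
    out = []
    for operand in operands:
        out += postfix(operand)
    return out + [op]

# Number of operands each operator expects (assumed operator set).
ARITY = {"uminus": 1, "*": 2, "+": 2, "assign": 2}

def rebuild(tokens):
    """Recover the tree from postfix with a stack, as in postfix evaluation."""
    stack = []
    for tok in tokens:
        n = ARITY.get(tok, 0)
        if n == 0:
            stack.append(tok)          # operand: push it
        else:
            operands = stack[-n:]      # operator: pop as many operands as it expects
            del stack[-n:]
            stack.append((tok, *operands))
    return stack[0]

term = ("*", "b", ("uminus", "c"))
tree = ("assign", "a", ("+", term, term))
tokens = postfix(tree)
```

Joining `tokens` with spaces gives `a b c uminus * b c uminus * + assign`, and `rebuild(tokens)` returns the original tree.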

Implementation of Three-Address Statements
● A three address statement is an abstract form of intermediate code.
● In a compiler, these statements can be implemented as records with fields for the
operator and the operands.
● There are 3 such representations: quadruples, triples, and indirect triples.
Quadruples
● A quadruple is a record structure with four fields, op, arg1, arg2, and result
● The op field contains an internal code for the operator
● The three-address statement x : = y op z is represented by placing y in arg1, z in
arg2, and x in result.
● Statements with unary operators like x : = -y or x : = y do not use arg2.
● Operators like param use neither arg2 nor result.
● Conditional and unconditional jumps put the target label in result
● The quadruples for the assignment a := b * -c + b * -c are given on the next slide.
● The contents of fields arg1, arg2, and result are normally pointers to the
symbol-table entries for the names represented by these fields. If so, temporary
names must be entered into the symbol table as they are created.
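
As a rough illustration, the quadruples for a := b * -c + b * -c could be written as records like the ones below. Plain strings stand in for symbol-table pointers, and the `Quad` type, the operator codes and the `show` helper are assumptions made for the example.

```python
from collections import namedtuple

# A quadruple has four fields; unused fields are left as None.
Quad = namedtuple("Quad", "op arg1 arg2 result")

quads = [
    Quad("uminus", "c", None, "t1"),
    Quad("*", "b", "t1", "t2"),
    Quad("uminus", "c", None, "t3"),
    Quad("*", "b", "t3", "t4"),
    Quad("+", "t2", "t4", "t5"),
    Quad(":=", "t5", None, "a"),     # copy statement: arg2 is not used
]

def show(q):
    """Print a quadruple back as a three-address statement."""
    if q.op == ":=":
        return f"{q.result} := {q.arg1}"
    if q.arg2 is None:
        return f"{q.result} := {q.op} {q.arg1}"
    return f"{q.result} := {q.arg1} {q.op} {q.arg2}"

listing = [show(q) for q in quads]
```

Every temporary (t1 to t5) appears in a result field, which is why this representation needs the temporaries to have symbol-table entries.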
Triples
● To avoid entering temporary names into the symbol table, we can refer to a temporary
value by the position of the statement that computes it.
● Three-address statements can then be represented by records with only three
fields: op, arg1 and arg2.
● The fields arg1 and arg2, for the arguments of op, are either pointers to the symbol
table or pointers into the triple structure (for temporary values).
● Since three fields are used, this intermediate code format is known as triples.
Parenthesized numbers represent pointers into the triple structure, while symbol-table
pointers are represented by the names themselves.

The copy statement a := t5 is encoded in the triple representation by placing a in
the arg1 field and using the operator assign.
● A ternary operation like x[i] := y requires two entries in the triple structure,
while x := y[i] is naturally represented as two operations.
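
The triple representation might be sketched as follows. The encoding is an assumption for illustration: an integer argument is a pointer into the triple structure, a string is a symbol-table name, and the `[]=` operator name for the indexed store is hypothetical.

```python
# Triples for a := b * -c + b * -c; the position of a triple names its value.
triples = [
    ("uminus", "c", None),   # (0)
    ("*", "b", 0),           # (1)
    ("uminus", "c", None),   # (2)
    ("*", "b", 2),           # (3)
    ("+", 1, 3),             # (4)
    ("assign", "a", 4),      # (5) copy: a in arg1, operator assign
]

# The ternary operation x[i] := y needs two entries.
indexed_store = [
    ("[]=", "x", "i"),       # (0) designates the location x[i]
    ("assign", 0, "y"),      # (1) stores y into that location
]

def operand(arg):
    """Show triple pointers as parenthesized numbers, names as themselves."""
    return f"({arg})" if isinstance(arg, int) else arg

def show(table, i):
    op, arg1, arg2 = table[i]
    parts = [op, operand(arg1)]
    if arg2 is not None:
        parts.append(operand(arg2))
    return f"({i}) " + " ".join(parts)

lines = [show(triples, i) for i in range(len(triples))]
```

No temporary names appear anywhere: triple (4) refers to the two products simply as (1) and (3).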

Indirect triples
● Another implementation of three-address code is to list pointers to triples,
rather than listing the triples themselves.
● This implementation is naturally called indirect triples.
● E.g., an array named statement can be used to list pointers to triples in the
desired order.
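
A small sketch of indirect triples: the triples stay where they are, and a separate `statement` list gives the execution order. The tiny evaluator `run`, the 0-based indices and the sample values are assumptions added only to show that reordering the pointer list leaves the triples, and the references between them, untouched.

```python
triples = [
    ("uminus", "c", None),   # (0)
    ("*", "b", 0),           # (1)
    ("uminus", "c", None),   # (2)
    ("*", "b", 2),           # (3)
    ("+", 1, 3),             # (4)
    ("assign", "a", 4),      # (5)
]

# Pointers to triples in the desired order.
statement = [0, 1, 2, 3, 4, 5]
# A reordering that still respects the dependencies; `triples` is unchanged.
reordered = [2, 3, 0, 1, 4, 5]

def run(order, table, env):
    """Evaluate triples in the given order; ints refer to earlier triple results."""
    values = {}

    def val(arg):
        return values[arg] if isinstance(arg, int) else env[arg]

    for i in order:
        op, arg1, arg2 = table[i]
        if op == "uminus":
            values[i] = -val(arg1)
        elif op == "*":
            values[i] = val(arg1) * val(arg2)
        elif op == "+":
            values[i] = val(arg1) + val(arg2)
        elif op == "assign":
            env[arg1] = val(arg2)
    return env

first = run(statement, triples, {"b": 3, "c": 2})
second = run(reordered, triples, {"b": 3, "c": 2})
```

Both orders compute the same value for a. With plain triples, moving a statement would change its position and force every triple that refers to it to be rewritten.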