3unit cd IntermediateCode_Part1

The document discusses intermediate code generation in compilers, focusing on syntax trees, three-address code, type checking, and control flow. It explains the construction of Directed Acyclic Graphs (DAGs) for expressions, the value-number method for DAG construction, and various forms of three-address code including quadruples and triples. Additionally, it covers type expressions, type equivalence, and the storage layout for local names in programming languages.

Intermediate Code Generation

Dr. N. Kalyani
Professor, CSE
Contents
Intermediate-Code Generation:
Variants of Syntax Trees
Three-Address Code
Types and Declarations
Type checking
Control Flow
Back Patching
Switch Statements
Intermediate Code for Procedures
Parsing & Intermediate Code generation
• In the analysis-synthesis model of a compiler, the front end
analyzes a source program and creates an intermediate
representation, from which the back end generates target code.
• Details of the source language are confined to the front end, and details
of the target machine to the back end.
• Parsing, static checking, and intermediate-code generation are conceptually
done in sequence, but all of them can be combined and folded into
parsing.
Intermediate Code Representation
• Common intermediate representations include syntax trees and
three-address code.
• Syntax trees are high level; they depict the natural hierarchical
structure of the source program and are well suited to tasks
like static type checking and evaluation ordering.
• A low-level representation is suitable for machine-dependent
tasks like register allocation and instruction selection.
• The name "three-address code" comes from instructions of the general
form x = y op z with three addresses: two for the operands y
and z and one for the result x.
• Three-address code can range from high to low-level,
depending on the choice of operators.
Variants of Syntax Trees
1. Directed Acyclic Graphs for Expressions
2. The Value-Number Method for Constructing DAG's
1. Directed Acyclic Graphs for Expressions

• Like the syntax tree for an expression, a DAG has leaves
corresponding to atomic operands and interior nodes
corresponding to operators.
• The difference is that a node N in a DAG has more than one
parent if N represents a common subexpression.
• In a syntax tree, the tree for the common subexpression
would be replicated as many times as the subexpression
appears in the original expression.
• A DAG gives the compiler important clues regarding the
generation of efficient code to evaluate the expressions.
DAG for the expression

a + a * (b - c) + (b - c) * d
Construction of DAG

• The same SDD can construct either syntax trees or DAGs.
• It constructs syntax trees when the functions Leaf and Node
create a fresh node each time they are called.
• It constructs a DAG if, before creating a new node,
these functions first check whether an identical node
already exists.
• If a previously created identical node exists, the
existing node is returned.
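
The check-before-create idea can be sketched in Python. This is only an illustration under stated assumptions, not the SDD from the slides: the class name `DagBuilder`, its methods `leaf` and `node`, and the use of tuples as nodes are all hypothetical choices made for the example.

```python
class DagBuilder:
    """Hypothetical helper: hands back an existing node when an identical one exists."""

    def __init__(self):
        self.nodes = {}  # maps a node's signature to the single shared node

    def _lookup_or_create(self, signature):
        # setdefault returns the stored node if present, else stores this one.
        return self.nodes.setdefault(signature, signature)

    def leaf(self, name):
        return self._lookup_or_create(("leaf", name))

    def node(self, op, left, right):
        return self._lookup_or_create((op, left, right))


# Build the DAG for a + a * (b - c) + (b - c) * d
g = DagBuilder()
a, b, c, d = (g.leaf(name) for name in "abcd")
first = g.node("-", b, c)
left = g.node("+", a, g.node("*", a, first))
second = g.node("-", b, c)          # an identical node already exists
root = g.node("+", left, g.node("*", second, d))

print(first is second)   # True: the common subexpression b - c is shared
print(len(g.nodes))      # 9 nodes, versus 13 in the plain syntax tree
```

Dropping the lookup (always creating a fresh tuple) would give the ordinary syntax tree, which is the only difference between the two constructions.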
SDD to produce Syntax tree or DAG
2. The Value-Number Method for Constructing DAG's
Algorithm : The value-number method for constructing the nodes of a
DAG.
INPUT : Label op, node l, and node r.
OUTPUT : The value number of a node in the array with signature
(op, l, r).
METHOD :
• Search the array for a node M with label op, left child l, and right
child r.
• If there is such a node, return the value number of M.
• If not, create in the array a new node N with label op, left child l,
and right child r, and return its value number.
Essential data structure to construct DAG
• The hash table is one of several data structures that support
dictionaries efficiently.
• To construct a hash table for the nodes of a DAG, we need a
hash function h that computes the index of the bucket for a
signature (op, l, r).
• The bucket index h(op, l, r) is computed deterministically from
op, l, and r, so that we may repeat the calculation and always get
to the same bucket index for node (op, l, r).
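
A minimal sketch of the value-number method, assuming a Python list as the node array and a Python dict standing in for the hash table (the dict's built-in hashing plays the role of h). The class name `ValueNumberTable` and the convention that a leaf stores its name or constant in the l field are assumptions for illustration.

```python
class ValueNumberTable:
    """Nodes live in an array; a node's value number is its index in that array."""

    def __init__(self):
        self.nodes = []   # array of (op, l, r) signatures
        self.index = {}   # hash table: signature -> value number

    def value_number(self, op, l=None, r=None):
        signature = (op, l, r)
        if signature in self.index:      # a node M with this signature exists
            return self.index[signature]
        self.nodes.append(signature)     # otherwise create a new node N
        self.index[signature] = len(self.nodes) - 1
        return self.index[signature]


# Nodes for the expression i + 10, requested twice
table = ValueNumberTable()
i = table.value_number("id", "i")        # value number 0
ten = table.value_number("num", 10)      # value number 1
first = table.value_number("+", i, ten)  # value number 2
again = table.value_number("+", i, ten)  # found, no new node

print(first, again, len(table.nodes))    # 2 2 3
```

Because children are referred to by value number, the signature of an interior node is a small triple of integers and labels, which is cheap to hash and compare.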
Exercise

Construct the DAG for the expression

((a + y) - ((x + y) * (x - y))) + ((x + y) * (x - y))


Three-Address Code

1. Addresses and Instructions

2. Quadruples

3. Triples

4. Static Single-Assignment Form


Three-Address Code

• In three-address code, there is at most one operator on the right side of an instruction.


• A source-language expression like x + y * z might be translated into the
sequence of three-address instructions

    t1 = y * z
    t2 = x + t1

where t1 and t2 are compiler-generated temporary names.


• The use of names for the intermediate values computed by a
program allows three-address code to be rearranged easily.
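
The translation of an expression into three-address code can be sketched as a post-order walk that creates one temporary per operator. This is an illustrative sketch: the function name `gen_tac` and the nested-tuple expression encoding `(op, left, right)` are assumptions, not part of the slides.

```python
import itertools


def gen_tac(expr, code, temps):
    """Append three-address instructions for expr to code; return its address."""
    if isinstance(expr, str):      # a name is already an address
        return expr
    op, left, right = expr
    l = gen_tac(left, code, temps)
    r = gen_tac(right, code, temps)
    t = f"t{next(temps)}"          # fresh compiler-generated temporary
    code.append(f"{t} = {l} {op} {r}")
    return t


code = []
gen_tac(("+", "x", ("*", "y", "z")), code, itertools.count(1))
print("\n".join(code))
# t1 = y * z
# t2 = x + t1
```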
DAG for the expression

a + a * (b - c) + (b - c) * d
Addresses and Instructions to build Three-address code
Three-address code can be implemented using records called
quadruples and triples.
An address can be one of the following:
• A name: For convenience, we allow source-program names to
appear as addresses in three-address code. In an implementation,
a source name is replaced by a pointer to its symbol-table entry.
• A constant: In practice, a compiler must deal with many
different types of constants and variables.
• A compiler-generated temporary: It is useful, especially in
optimizing compilers, to create a distinct name each time a
temporary is needed. These temporaries can be combined, if
possible, when registers are allocated to variables.
Addresses and Instructions to build Three-address code

A symbolic label represents the index of a three-address instruction in
the sequence of instructions. Actual indexes can be substituted for the
labels, either by making a separate pass or by "backpatching."
The common three-address instruction forms are:
1. Assignment instructions of the form x = y op z, where op is a
binary arithmetic or logical operation, and x, y, and z are
addresses.
2. Assignments of the form x = op y, where op is a unary operation.
Essential unary operations include unary minus, logical negation,
shift operators, and conversion operators (int, float, etc).
3. Copy instructions of the form x = y, where x is assigned the
value of y.
4. An unconditional jump goto L. The three-address instruction
with label L is the next to be executed.
5. Conditional jumps of the form if x goto L and ifFalse x goto L.
These instructions execute the instruction with label L next if x is
true and false, respectively. Otherwise, the following three-address
instruction in sequence is executed next, as usual.
6. Conditional jumps such as if x relop y goto L, which apply a
relational operator (<, ==, >=, etc.) to x and y, and execute the
instruction with label L next if x stands in relation relop to y. If
not, the three-address instruction following if x relop y goto L is
executed next, in sequence.
7. Procedure calls and returns are implemented using the following
instructions: param x for parameters; call p, n and y = call p, n
for procedure and function calls, respectively; and return y, where
y, representing a returned value, is optional. Their typical use is as
the sequence of three-address instructions

    param x1
    param x2
    ...
    param xn
    call p, n

generated for a call of the procedure p(x1, x2, ..., xn), where n is the
number of actual parameters.
8. Indexed copy instructions of the form x = y[i] and x[i] = y.
The instruction x = y[i] sets x to the value in the location
i memory units beyond location y. The instruction x[i] = y
sets the contents of the location i units beyond x to the
value of y.
9. Address and pointer assignments of the form x = &y, x = *y,
and *x = y.
• x = &y sets the r-value of x to be the location (l-value) of y.
• x = *y sets the r-value of x to the contents of the location
that y points to; y is presumably a pointer or a temporary
whose r-value is a location.
• *x = y sets the r-value of the object pointed to by x to the
r-value of y.
Addresses and Instructions to build Three-address code

Consider the statement do i = i + 1; while (a[i] < v);
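
One possible translation of this statement, run by a tiny interpreter, is sketched below. Several things are assumptions made for the example: each array element is taken to occupy 8 memory units, the array is modelled as a dict from offset to value, jump targets are instruction positions rather than symbolic labels, and the tuple layout `(op, arg1, arg2, result)` and the function name `run` are hypothetical.

```python
# Position-numbered three-address code for: do i = i + 1; while (a[i] < v);
program = [
    ("+",   "i",  1,    "t1"),   # 0: t1 = i + 1
    ("=",   "t1", None, "i"),    # 1: i = t1
    ("*",   "i",  8,    "t2"),   # 2: t2 = i * 8   (assumed element width 8)
    ("=[]", "a",  "t2", "t3"),   # 3: t3 = a[t2]
    ("if<", "t3", "v",  0),      # 4: if t3 < v goto 0
]


def run(program, env, arrays):
    """Execute the program; names are looked up in env, constants used as-is."""
    def val(x):
        return env[x] if isinstance(x, str) else x

    pc = 0
    while pc < len(program):
        op, arg1, arg2, result = program[pc]
        pc += 1                              # default: fall through to next
        if op == "+":
            env[result] = val(arg1) + val(arg2)
        elif op == "*":
            env[result] = val(arg1) * val(arg2)
        elif op == "=":
            env[result] = val(arg1)
        elif op == "=[]":
            env[result] = arrays[arg1][val(arg2)]
        elif op == "if<" and val(arg1) < val(arg2):
            pc = result                      # jump to the target position
    return env


final = run(program, {"i": 0, "v": 5}, {"a": {0: 1, 8: 3, 16: 5, 24: 9}})
print(final["i"])   # 2: the loop stops at the first element that is not < v
```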


2. Quadruples
A quadruple (or just "quad") has four fields: op, arg1, arg2, and result.
• The op field contains an internal code for the operator, arg1 and arg2
are the arguments, and the fourth field is the result.
For instance, the three-address instruction x = y + z is represented by
placing + in op, y in arg1, z in arg2, and x in result.

The following are some exceptions to this rule:


1. Instructions with unary operators like x = minus y or x = y do not
use arg2. Note that for a copy statement like x = y, op is =, while for
most other operations, the assignment operator is implied.

2. Operators like param use neither arg2 nor result.

3. Conditional and unconditional jumps put the target label in result.


2. Quadruples
Three-address code for the assignment a = b * - c + b * - c
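
The quadruples for this assignment can be produced by a small generator. The sketch below assumes a nested-tuple encoding of the expression, with `("minus", x)` for unary minus; the names `Quad`, `gen_quads`, and `assign` are hypothetical.

```python
import itertools
from collections import namedtuple

Quad = namedtuple("Quad", "op arg1 arg2 result")


def gen_quads(expr, quads, temps):
    """Append quadruples for expr; return the address holding its value."""
    if isinstance(expr, str):
        return expr
    if len(expr) == 2:                     # unary operator: arg2 is unused
        op, operand = expr
        arg1, arg2 = gen_quads(operand, quads, temps), None
    else:
        op, left, right = expr
        arg1 = gen_quads(left, quads, temps)
        arg2 = gen_quads(right, quads, temps)
    result = f"t{next(temps)}"
    quads.append(Quad(op, arg1, arg2, result))
    return result


def assign(target, expr):
    quads = []
    value = gen_quads(expr, quads, itertools.count(1))
    quads.append(Quad("=", value, None, target))   # copy: op is "="
    return quads


term = ("*", "b", ("minus", "c"))
quads = assign("a", ("+", term, term))
for q in quads:
    print(q.op, q.arg1, q.arg2, q.result)
```

The generator works on a tree, so b * - c is computed twice (t1 to t4); starting from a DAG instead would share that work.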
3. Triples
• A triple has only three fields, which we call op, arg1, and arg2.
• Using triples, we refer to the result of an operation x op y by its
position, rather than by an explicit temporary name.
• For example, where quadruples would name a temporary t1 for the result of
the first instruction, a triple representation refers to position (0).
Parenthesized numbers represent pointers into the triple structure itself.
• Triples are equivalent to the signatures used in the value-number method,
so the DAG and triple representations of expressions are
equivalent.
• The equivalence ends with expressions, since syntax-tree
variants and three-address code represent control flow quite
differently.
3. Triples

a = (b * - c) + (b * - c)
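
A sketch of how the triples for this assignment might be generated. As an assumption of the example, an integer in an argument field denotes the position of another triple, a string denotes a name, and the function name `gen_triples` is hypothetical.

```python
def gen_triples(expr, triples):
    """Return a name, or the position (an int) of the triple computing expr."""
    if isinstance(expr, str):
        return expr
    op, *operands = expr
    args = [gen_triples(e, triples) for e in operands]
    args += [None] * (2 - len(args))       # a unary operator leaves arg2 empty
    triples.append((op, args[0], args[1]))
    return len(triples) - 1                # the result is named by position


triples = []
term = ("*", "b", ("minus", "c"))
value = gen_triples(("+", term, term), triples)
triples.append(("=", "a", value))

for position, (op, arg1, arg2) in enumerate(triples):
    print(f"({position})", op, arg1, arg2)
```

No temporaries appear: triple (1) refers to the result of minus c as position 0, and the final assignment refers to the sum as position 4.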
Indirect triples
• Indirect triples consist of a listing of pointers to triples, rather than
a listing of triples themselves.
• With indirect triples, an optimizing compiler can move an
instruction by reordering the instruction list, without affecting the
triples themselves.
• When implemented in Java, an array of instruction objects is
analogous to an indirect triple representation, since Java treats the
array elements as references to objects.
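
The benefit of the extra level of indirection can be sketched as follows, assuming the triples for a = (b * - c) + (b * - c) are stored in a list and the instruction list holds their positions. The variable names and the particular reordering are illustrative only.

```python
# Triples for a = (b * - c) + (b * - c); integers in arg fields are positions.
triples = [
    ("minus", "c", None),   # (0)
    ("*", "b", 0),          # (1)
    ("minus", "c", None),   # (2)
    ("*", "b", 2),          # (3)
    ("+", 1, 3),            # (4)
    ("=", "a", 4),          # (5)
]
instructions = list(range(len(triples)))   # execution order: pointers to triples
before = list(triples)

# Move triple (2) ahead of triple (1) by editing only the instruction list.
instructions[1], instructions[2] = instructions[2], instructions[1]

print(instructions)        # [0, 2, 1, 3, 4, 5]
print(triples == before)   # True: no triple and no position reference changed
```

With plain triples the same move would change the positions of (1) and (2), so every argument field that refers to them, here in (3) and (4), would have to be rewritten.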
Exercise: translate each of the following
a) a + - (a - b - c)
b) a = b[i] + c[j]
into

1. A syntax tree.
2. Quadruples.
3. Triples.
4. Indirect triples.
Static Single-Assignment Form
• Static single-assignment form (SSA) is an intermediate representation
that facilitates certain code optimizations.
• Two distinctive aspects distinguish SSA from three-address code.
• The first is that all assignments in SSA are to variables with distinct
names; hence the term static single-assignment.
• Note that subscripts distinguish each definition of variables p and q in the
SSA representation.
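• The second is that the same variable may be defined in two different
control-flow paths in a program; SSA uses a notational convention called the
φ-function to combine the two definitions.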
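
The renaming behind the first aspect can be sketched for straight-line code. This is a simplified illustration under stated assumptions: it handles no control flow (so no φ-functions), it forms the new names by appending a counter to the original name, and the function name `to_ssa` and the `(target, expression)` input format are hypothetical.

```python
import re


def to_ssa(instructions):
    """Rename variables in straight-line code so each name is assigned once."""
    version = {}   # variable -> subscript of its latest definition

    def rename(match):
        name = match.group()
        # A use refers to the latest definition; undefined names stay as-is.
        return f"{name}{version[name]}" if name in version else name

    ssa = []
    for target, expr in instructions:
        renamed = re.sub(r"[A-Za-z_]\w*", rename, expr)
        version[target] = version.get(target, 0) + 1   # new definition
        ssa.append(f"{target}{version[target]} = {renamed}")
    return ssa


code = [("p", "a + b"), ("q", "p - c"), ("p", "q * d"),
        ("p", "e - p"), ("q", "p + q")]
ssa = to_ssa(code)
print("\n".join(ssa))
# p1 = a + b
# q1 = p1 - c
# p2 = q1 * d
# p3 = e - p2
# q2 = p3 + q1
```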
Types and Declarations

1 Type Expressions

2 Type Equivalence

3 Declarations

4 Storage Layout for Local Names

5 Sequences of Declarations
Types and Declaration
Type checking
• Type checking uses logical rules to reason about the behavior of a
program at run time.
• Specifically, it ensures that the types of the operands match the
type expected by an operator.
• Example: the && operator expects its two operands to be Booleans;
the result is also of type Boolean.

Translation Applications
• From the type of a name, a compiler can determine the storage
that will be needed for that name at run time.
• Type information is also needed to calculate the address denoted
by an array reference and to insert explicit type conversions.
1. Type Expressions
• Types have structure, which we shall represent using type
expressions.
• A type expression is either a basic type or is formed by
applying an operator called a type constructor to a type
expression.
• The sets of basic types and constructors depend on the
language to be checked.
1. Type Expressions
Definition of type expressions:

• A basic type is a type expression. Typical basic types for a
language include Boolean, char, integer, float, and void.
• A type name is a type expression.
• A type expression can be formed by applying the array type
constructor to a number and a type expression.
• A record is a data structure with named fields. A type expression
can be formed by applying the record type constructor to the
field names and their types.
• A type expression can be formed by using the type constructor
→ for function types. We write s → t for "function from type
s to type t."
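
One way to represent such type expressions in a program is as small immutable records, one class per constructor. The class names `Basic`, `Array`, `Record`, and `Function` and the helper `show` are assumptions made for this sketch.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Basic:
    name: str


@dataclass(frozen=True)
class Array:
    size: int
    elem: object      # the element type expression


@dataclass(frozen=True)
class Record:
    fields: tuple     # tuple of (field name, type expression) pairs


@dataclass(frozen=True)
class Function:
    param: object
    result: object


def show(t):
    """Render a type expression in the notation used on the slides."""
    if isinstance(t, Basic):
        return t.name
    if isinstance(t, Array):
        return f"array({t.size}, {show(t.elem)})"
    if isinstance(t, Record):
        return "record(" + ", ".join(f"{n}: {show(ft)}" for n, ft in t.fields) + ")"
    return f"{show(t.param)} → {show(t.result)}"


integer = Basic("integer")
matrix = Array(2, Array(3, integer))      # the type of int[2][3]
print(show(matrix))                       # array(2, array(3, integer))
print(show(Function(integer, Basic("float"))))   # integer → float
```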
Type Names and Recursive Types
Once a class is defined, its name can be used as a type name.
Example: consider Node in the program fragment
public class Node { ... }
public Node n;
Names can be used to define recursive types, which are needed for
data structures such as linked lists.
class Cell { int info; Cell next; ... }
Similar recursive types can be defined using records and pointers.
If s and t are type expressions, then their Cartesian product s × t is
a type expression.
2. Type Equivalence
When type expressions are represented by graphs, two types are
structurally equivalent if and only if one of the following conditions
is true:
• They are the same basic type.
• They are formed by applying the same constructor to structurally
equivalent types.
• One is a type name that denotes the other.
• If type names are treated as standing for themselves, then the
first two conditions in the above definition lead to name
equivalence of type expressions.
• Name-equivalent expressions are assigned the same value
number if the value-number method is used.
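
The three conditions for structural equivalence can be sketched as a recursive check. The encoding is an assumption of the example: a basic type or type name is a string, a constructed type is a tuple such as `("array", 3, "integer")`, `names` maps a type name to the type it denotes, and recursive type names are not handled. The function names are hypothetical.

```python
def resolve(t, names):
    """Replace a type name by the type it denotes (condition 3)."""
    while isinstance(t, str) and t in names:
        t = names[t]
    return t


def struct_equiv(s, t, names=None):
    names = names or {}
    s, t = resolve(s, names), resolve(t, names)
    if not (isinstance(s, tuple) and isinstance(t, tuple)):
        return s == t                      # condition 1: same basic type
    # Condition 2: same constructor applied to equivalent components.
    return (len(s) == len(t) and s[0] == t[0]
            and all(struct_equiv(a, b, names) for a, b in zip(s[1:], t[1:])))


row = ("array", 3, "integer")
names = {"row": row}
print(struct_equiv(row, ("array", 3, "integer")))   # True
print(struct_equiv(row, ("array", 3, "float")))     # False
print(struct_equiv("row", row, names))              # True: name denotes the type
print(struct_equiv("row", row))                     # False: name stands for itself
```

The last call passes no name table, so "row" stands for itself and only the first two conditions apply, which is the name-equivalence behavior described above.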
3. Declarations
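The grammar below is simplified: it declares just one name at a time;
declarations with lists of names can be handled separately.

    D → T id ; D | ε
    T → B C | record '{' D '}'
    B → int | float
    C → ε | [ num ] C

The fragment of this grammar that deals with basic and array types was used to
illustrate inherited attributes.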
• Nonterminal D generates a sequence of declarations.
• Nonterminal T generates basic, array, or record types.
• Nonterminal B generates one of the basic types int and float.
• Nonterminal C, for "component," generates strings of zero or more
integers, each integer surrounded by brackets.
4. Storage Layout for Local Names

parse tree for the type int [2][3]
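
A sketch of how the type and width of a declaration such as int[2][3] might be computed, and how relative addresses might then be assigned to a sequence of declared names. The widths (4 bytes for int, 8 for float), the function names `array_type` and `layout`, and the input format are assumptions made for illustration.

```python
WIDTH = {"int": 4, "float": 8}   # assumed widths in bytes


def array_type(basic, dims):
    """Return (type expression, width) for a declaration like int[2][3]."""
    t, w = basic, WIDTH[basic]
    for n in reversed(dims):      # the innermost dimension is applied first
        t, w = f"array({n}, {t})", n * w
    return t, w


def layout(decls):
    """Assign each name the next free relative address (offset)."""
    offset, table = 0, {}
    for name, basic, dims in decls:
        t, w = array_type(basic, dims)
        table[name] = (t, offset)
        offset += w               # storage for names is laid out consecutively
    return table, offset


print(array_type("int", [2, 3]))   # ('array(2, array(3, int))', 24)

table, total = layout([("x", "float", []), ("a", "int", [2, 3]), ("i", "int", [])])
print(table["a"], table["i"], total)
# ('array(2, array(3, int))', 8) ('int', 32) 36
```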


5. Sequences of Declarations
