
Intermediate Code Generation

Compiler Design (CS 3007)

S. Pyne1 & T. K. Mishra2

Assistant Professor1,2
Computer Science and Engineering Department
National Institute of Technology Rourkela
{pynes,mishrat}@nitrkl.ac.in

September 22, 2020

S. Pyne1 & T. K. Mishra2 (NITRKL) Aut’20, Compiler Design (CS 3007) September 22, 2020 1 / 42
Front and back ends

Intermediate code
Machine independent representation of source code
Simplifies conversion from source code to target code
Analysis-synthesis model - the front end analyzes source code (syntax and semantic analysis), the back end synthesizes target code
Front end - intermediate code from source code
Back end - target code from intermediate code
Intermediate representation
Combines the front end for language i with the back end for machine j
m × n compilers can be built by writing just m front ends and n back ends

Implementation

Traversal of syntax trees


Three address code representation - x = y op z
Syntax tree - closer to high level language
Three address code - range from high- to low-level
Example - for loop statements
Syntax tree represents statement components
Three address code has labels and jump instruction
Intermediate code representation
An actual programming language like C, or
Internal data structures shared by different phases
Original C++ compiler
Front end generated intermediate code in C language
Back end compiled intermediate C code

Syntax trees

Syntax nodes represent constructs in source program


Children of a node represent the components of the construct
Directed acyclic graph (DAG) identifies common subexpressions
DAG for the expression a + a ∗ (b − c) + (b − c) ∗ d

Subexpression b − c occurs twice


b − c is represented by one node labeled −
Node labeled − has two parents for a ∗ (b − c) and (b − c) ∗ d
SDD for syntax tree and DAG

Functions Leaf and Node create new nodes


A DAG results if Node returns an existing identical node instead of creating a new one
Node(op, left, right)
Checks whether a node labeled op with children left and right already exists
If found, Node returns the existing node; otherwise it creates a new node

DAG construction

DAG construction - Value-Number Method
INPUT: Label op, node l, and node r.
OUTPUT: The value number of a node in the array with signature
(op, l, r ).
METHOD: Search the array for a node M with label op, left child l, and
right child r.
1 If there is such a node, return the value number of M.
2 If not, create in the array a new node N with label op, left child l, and right child r, and return its value number.

Data Structure - Value-Number Method

Hash table for the nodes of a DAG


Hash function h(op, l, r) computes the index of the bucket for the signature (op, l, r)
h(op, l, r) is computed deterministically from (op, l, r), so repeated searches for the same signature reach the same bucket
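
The value-number method with a hash table can be sketched as follows. This is a minimal illustration, not the slides' own implementation: the class name Dag, the use of a Python dict as the hash table, and the ("id", name, None) signature for leaves are all assumptions made for the example.

```python
class Dag:
    """Value-number DAG: nodes live in an array, a dict maps signatures to indices."""

    def __init__(self):
        self.nodes = []   # value number of a node = its index in this array
        self.index = {}   # hash table: signature (op, l, r) -> value number

    def _lookup(self, signature):
        if signature not in self.index:        # not found: create a new node
            self.index[signature] = len(self.nodes)
            self.nodes.append(signature)
        return self.index[signature]           # found: reuse its value number

    def leaf(self, name):
        return self._lookup(("id", name, None))

    def node(self, op, left, right):
        return self._lookup((op, left, right))


# Build the DAG for a + a * (b - c) + (b - c) * d
dag = Dag()
a, b, c, d = (dag.leaf(n) for n in "abcd")
b_minus_c = dag.node("-", b, c)
first = dag.node("+", a, dag.node("*", a, b_minus_c))
root = dag.node("+", first, dag.node("*", dag.node("-", b, c), d))
```

The second request for b - c returns the existing value number, so the DAG has 9 nodes where the plain syntax tree would have 13.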

Three-address Code
A linearized representation of a syntax tree or a DAG in which explicit
names correspond to the interior nodes of the graph
There is at most one operator on the right side of an instruction; that
is, no built-up arithmetic expressions are permitted
Three-address code for the expression: a + a ∗ (b − c) + (b − c) ∗ d

t1, t2,· · · , t5 are compiler-generated temporary names


Three-address code is well suited to target-code generation and optimization
Names for intermediate values allow three-address code to be rearranged easily
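
As a rough sketch of how the temporaries t1 to t5 arise, the following walks the expression bottom-up and names each distinct interior node once. The nested-tuple encoding of the expression and the function name three_address are assumptions made for this example.

```python
def three_address(expr):
    """Emit three-address code for a nested (op, left, right) tuple expression."""
    code = []    # generated instructions, in order
    names = {}   # subexpression -> temporary that already holds its value

    def visit(e):
        if isinstance(e, str):       # a leaf: a source-program name
            return e
        if e not in names:           # first visit: emit code for this node
            op, left, right = e
            l, r = visit(left), visit(right)
            names[e] = f"t{len(names) + 1}"
            code.append(f"{names[e]} = {l} {op} {r}")
        return names[e]              # later visits reuse the temporary

    visit(expr)
    return code


b_minus_c = ("-", "b", "c")
expr = ("+", ("+", "a", ("*", "a", b_minus_c)), ("*", b_minus_c, "d"))
code = three_address(expr)
for line in code:
    print(line)
```

Because b - c is looked up before it is recomputed, it is assigned to t1 once and reused in both products.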
Three-address Code - Addresses

An address can be one of the following


A name
source-program names may appear as addresses in three-address code
in an implementation, a source name is replaced by a pointer to its symbol-table entry, where
all information about the name is kept
A constant
A compiler-generated temporary
in optimizing compilers it is useful to create a distinct name each time a temporary is needed
temporaries can be combined when registers are allocated to variables

Three-address Code - Instructions

1 Assignment instructions x = y op z for binary operations


2 Assignments x = op y for unary operations
3 Copy instructions x = y, x is assigned the value of y
4 An unconditional jump goto L
5 Conditional jumps if x = true goto L and if x = false goto L
6 Conditional jumps if x relop y goto L, relop ∈ {<, <=, ==, !=, >=, >}
7 Procedure call and return for p(x1 , x2 , · · · xn )
Procedure call - call p, n
Function call - y = call p, n
Three address instructions
param x1
param x2
···
param xn
call p, n

Three-address Code - Instructions
8 Indexed copy instructions
x = y[i] sets x to the value in the location i memory units beyond location y
x[i] = y sets contents of location i units beyond x to value of y
9 Address and pointer assignments
x = &y sets the r-value of x to be the location (l-value) of y
x = ∗y - r-value of x is made equal to contents of location pointed by y
∗x = y sets the r-value of the object pointed to by x to the r-value of y
l-value left side of assignments
r-value right side of assignments
l-value - location, r-value - content
Variable x: l-value is memory location &x, r-value is content at &x
Constant x: r-value is x, no l-value
Pointer x: l-value is &x, r-value is the location x points to; ∗x has that location as l-value and its content as r-value
x[i]: l-value is &x[i] or x + i, r-value is x[i] or i[x] or ∗(x + i)
Common “l-value required” errors: 5 = x and x + y = z
An Example

Consider the statement


do i = i+1; while (a[i] < v);
Two translations are possible: jump targets given as symbolic labels or as position numbers
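
The two forms can be illustrated with a small sketch that starts from the symbolic-label form and rewrites it with position numbers. The element width of 8 in t2 = i * 8, the starting position 100, and the helper name number_positions are assumptions for this example.

```python
# Symbolic-label form of: do i = i+1; while (a[i] < v);
# Each entry is (label or None, instruction). Element width 8 is assumed.
symbolic = [
    ("L", "t1 = i + 1"),
    (None, "i = t1"),
    (None, "t2 = i * 8"),
    (None, "t3 = a[t2]"),
    (None, "if t3 < v goto L"),
]


def number_positions(instrs, start=100):
    """Rewrite symbolic labels as position numbers beginning at `start`."""
    position = {label: start + k
                for k, (label, _) in enumerate(instrs) if label}
    out = []
    for k, (_, text) in enumerate(instrs):
        words = text.split()
        if len(words) >= 2 and words[-2] == "goto":
            words[-1] = str(position[words[-1]])   # label -> position number
        out.append(f"{start + k}: {' '.join(words)}")
    return out


numbered = number_positions(symbolic)
for line in numbered:
    print(line)
```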

Quadruples

Three-address instructions do not specify the representation of instructions in a data structure
A quadruple (or quad) has four fields - op, arg1, arg2, and result
The op field contains an internal code for the operator
Three-address instruction x = y + z is represented by placing + in op, y in arg1, z in arg2, and x in result
Some exceptions
1 Unary operators (x = minus y) and copies (x = y) do not use arg2
2 Operators like param use neither arg2 nor result
3 Conditional and unconditional jumps put the target label in result

An Example

Consider the assignment statement: a = b * - c + b * - c;


Its three-address code has six instructions, each stored as one quadruple

In general, quadruple fields for names hold pointers to their symbol-table entries


Temporary names can be either
entered into symbol table like programmer-defined names, or
implemented as objects of a class Temp with its own methods
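
A minimal sketch of the quadruples for a = b * - c + b * - c is given below. Plain strings stand in for the symbol-table pointers, and the Quad type and the show helper are assumptions made for illustration.

```python
from collections import namedtuple

Quad = namedtuple("Quad", "op arg1 arg2 result")

quads = [
    Quad("minus", "c", None, "t1"),   # unary operator: arg2 unused
    Quad("*", "b", "t1", "t2"),
    Quad("minus", "c", None, "t3"),
    Quad("*", "b", "t3", "t4"),
    Quad("+", "t2", "t4", "t5"),
    Quad("=", "t5", None, "a"),       # copy: arg2 unused
]


def show(q):
    """Render a quadruple back as a three-address instruction."""
    if q.op == "=":
        return f"{q.result} = {q.arg1}"
    if q.arg2 is None:
        return f"{q.result} = {q.op} {q.arg1}"
    return f"{q.result} = {q.arg1} {q.op} {q.arg2}"


listing = [show(q) for q in quads]
```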

Triples

A triple has three fields - op, arg1 , and arg2


In quadruples the result field is used primarily for temporary names
A triple refers to the result of an operation x op y by its position
t1 in the quadruple form is referred to as position (0) in the triple form
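
To make the position-based references concrete, here is a sketch that converts the quadruples for a = b * - c + b * - c into triples. In this illustration an integer argument means "result of the triple at that position" and a string means a name; that encoding and the function name to_triples are assumptions.

```python
quads = [
    ("minus", "c", None, "t1"),
    ("*", "b", "t1", "t2"),
    ("minus", "c", None, "t3"),
    ("*", "b", "t3", "t4"),
    ("+", "t2", "t4", "t5"),
    ("=", "t5", None, "a"),
]


def to_triples(quads):
    position = {}   # temporary name -> position of the triple computing it
    triples = []
    for op, arg1, arg2, result in quads:
        a1 = position.get(arg1, arg1)   # temporaries become positions
        a2 = position.get(arg2, arg2)
        if op == "=":                   # copy: the target name goes in arg1
            triples.append((op, result, a1))
        else:
            position[result] = len(triples)
            triples.append((op, a1, a2))
    return triples


triples = to_triples(quads)
for k, t in enumerate(triples):
    print(f"({k})", t)
```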

Indirect Triples
In triples the result of an operation is referred to by its position
Moving an instruction may require changing all references to that result
Indirect triples solve this problem
Indirect triples consist of a listing of pointers to triples
Uses an array to list pointers to triples in the desired order

Indirect triples enable an optimizing compiler to move instructions by reordering the pointer list without affecting the triples
A Java implementation of indirect triples can be an array of references
to triple objects
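
A small sketch of the idea, using list indices in place of pointers (an assumption for this example): the triples stay where they are, and only the separate instruction list is reordered.

```python
# Triples for a = b * - c + b * - c; integer arguments are positions.
triples = [
    ("minus", "c", None),   # (0)
    ("*", "b", 0),          # (1)
    ("minus", "c", None),   # (2)
    ("*", "b", 2),          # (3)
    ("+", 1, 3),            # (4)
    ("=", "a", 4),          # (5)
]
before = list(triples)

# Instruction list: "pointers" (here, indices) into the triples array.
order = [0, 1, 2, 3, 4, 5]

# Move the second "minus c" ahead of the first multiplication.
# The two instructions are independent, so the swap is safe.
order[1], order[2] = order[2], order[1]

scheduled = [triples[k] for k in order]
```

After the swap, triple (3) still refers to position 2 and needs no update, which is the point of the extra level of indirection.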
Static Single-Assignment (SSA) Form
SSA is an intermediate representation for certain code optimizations
SSA differs from three-address code in two ways
1 All assignments in SSA are to variables with distinct names - static
single-assignment

Subscripts distinguish each definition of variables p and q


2 SSA uses a φ-function to combine two definitions of a variable reached along
different control-flow paths in a source program

φ(x1, x2) = x1 if flag is true, x2 if flag is false
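
The subscripting in point 1 can be sketched for straight-line code as follows. The input program, the function name to_ssa, and the choice to leave never-defined names (a, b, c, d, e) without subscripts are assumptions for this example; it does not insert φ-functions, which are only needed where control-flow paths join.

```python
import re


def to_ssa(lines):
    """Rename straight-line 'x = expr' code so every target is defined once."""
    version = {}   # variable -> subscript of its latest definition
    out = []
    for line in lines:
        target, expr = (s.strip() for s in line.split("=", 1))
        # Uses refer to the latest definition; undefined names stay as-is.
        expr = re.sub(
            r"[A-Za-z_]\w*",
            lambda m: f"{m.group()}{version[m.group()]}"
            if m.group() in version else m.group(),
            expr,
        )
        version[target] = version.get(target, 0) + 1
        out.append(f"{target}{version[target]} = {expr}")
    return out


ssa = to_ssa(["p = a + b", "q = p - c", "p = q * d", "p = e - p", "q = p + q"])
for line in ssa:
    print(line)
```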
Types and Declarations

Type checking
Uses rules to reason about the behavior of a program at run time
Ensures that types of operands match type expected by an operator
For example, the && operator in Java expects its two operands to be
booleans; the result is also of type boolean
Translation Applications
From type of a name, a compiler determines the storage needed by it
at run time
Type information is needed to calculate the address denoted by an
array reference
To insert explicit type conversions
To choose the right version of an arithmetic operator

Type Expressions

A type expression is either a basic type or is formed by applying an operator called a type constructor to a type expression
The sets of basic types and constructors depend on the language to
be checked
For example, the array type int[2][3]
can be read as “array of 2 arrays of 3 integers each”
written as a type expression array(2, array(3, integer))
This type is represented by the tree

The operator array takes two parameters - a number and a type

Definition of type expressions
A basic type is a type expression.
Basic types in a programming language - boolean, char, integer, float,
and void
A type name is a type expression.
Type expression can be formed by applying array type constructor to
a number and a type expression
A record is a data structure with named fields. Its type expression is
formed by applying the record type constructor to the field names
and their types.
A type expression can be formed using type constructor → for
function types. s → t denotes “function from type s to type t”
If s and t are type expressions, then their Cartesian product s × t is a
type expression. Used to represent a list or tuple of types (e.g. for
function parameters).
Type expressions may contain variables whose values are type
expressions.
Type Equivalence

Two types are structurally equivalent iff one of the following is true
They are the same basic type
They are formed by applying the same constructor to structurally
equivalent types
One is a type name that denotes the other
If type names are treated as standing for themselves
the first two conditions in the above definition lead to name equivalence of
type expressions
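
The three conditions for structural equivalence translate almost directly into a recursive check. In this sketch a type expression is a string (basic type or type name) or a tuple (constructor, components...), and `names` maps type names to their definitions; this encoding and the names resolve and equivalent are assumptions. It also assumes type-name definitions are not cyclic.

```python
def resolve(t, names):
    """Replace a type name by the type expression it denotes."""
    while isinstance(t, str) and t in names:
        t = names[t]
    return t


def equivalent(s, t, names):
    """Structural equivalence of two type expressions."""
    s, t = resolve(s, names), resolve(t, names)
    if not (isinstance(s, tuple) and isinstance(t, tuple)):
        return s == t   # basic types (or array sizes) must match exactly
    # Same constructor applied to structurally equivalent components.
    return (s[0] == t[0] and len(s) == len(t)
            and all(equivalent(x, y, names) for x, y in zip(s[1:], t[1:])))


names = {"matrix": ("array", 2, ("array", 3, "integer"))}
same = equivalent("matrix", ("array", 2, ("array", 3, "integer")), names)
different = equivalent(("array", 2, "integer"), ("array", 3, "integer"), names)
```

Dropping the call to resolve, so that names stand for themselves, would turn this check into name equivalence.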

Declarations

The following grammar deals with basic and array types


D → T id ; D | ε
T → B C | record '{' D '}'
B → int | float
C → ε | [ num ] C
Nonterminal D generates a sequence of declarations
Nonterminal T generates basic, array, or record types
Nonterminal B generates one of basic types int and float
Nonterminal C , for “component”, generates strings of zero or more
integers, each integer surrounded by brackets
An array type consists of a basic type specified by B, followed by
array components specified by C
A record type (the second production for T ) is a sequence of
declarations for the fields of the record, surrounded by curly braces

Storage Layout for Local Names
Type of a name determines amount of storage it needs at run time
At compile time, these storage amounts are used to assign a relative address to each
name. Type and relative address are saved in the symbol-table entry for
the name
Data of varying length, such as strings or dynamic arrays whose size
cannot be determined until run time, is handled by reserving a fixed
amount of storage for a pointer to the data
Run-time storage management deals with data of varying length
Consider storage in blocks of contiguous bytes where a byte is the
smallest unit of addressable memory
A byte is eight bits, and some number of bytes form a machine word
Multibyte objects are stored in consecutive bytes and given the
address of first byte
Width of a type is number of storage units needed for objects of that
type. A basic type, such as char, int, or float, requires an integral
number of bytes. For easy access, storage for aggregates such as
arrays and classes is allocated in one contiguous block of bytes
Computing Storage for Local Names
SDT for computing type and width of basic and array types

SDT uses synthesized attributes type and width for each nonterminal
Variables t and w pass type and width information from a B node in
a parse tree to the node for C → ε
Action between B and C sets t to B.type and w to B.width
If B → int then B.type is set to integer and B.width is set to 4
If B → float then B.type is set to float and B.width is set to 8
If C → ε then t becomes C.type and w becomes C.width
Otherwise, C specifies an array component
Syntax-directed translation of array types

The parse tree for the type int[2][3] is shown by dotted lines
The solid lines show how the type and width are passed
The variables t and w are assigned the values of B.type and B.width
Then the subtree with the C nodes is examined
The values of t and w are used at the node for C → ε to start
evaluation of synthesized attributes up the chain of C nodes
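
The flow of t and w can be mimicked with a short recursive function. This is only a sketch: the tuple encoding ("array", n, type), the BASIC table, and the function names are assumptions, and the recursion on the list of dimensions stands in for the chain of C nodes.

```python
# Basic types with the widths used in the rules above.
BASIC = {"int": ("integer", 4), "float": ("float", 8)}


def array_type(basic, dims):
    """Return (type, width) for a declaration such as int[2][3]."""
    t, w = BASIC[basic]   # t and w carry B.type and B.width

    def C(dims):
        if not dims:      # C -> ε: C.type = t, C.width = w
            return t, w
        inner_type, inner_width = C(dims[1:])   # C -> [ num ] C1
        return ("array", dims[0], inner_type), dims[0] * inner_width

    return C(dims)


ty, width = array_type("int", [2, 3])
```

For int[2][3] this yields array(2, array(3, integer)) with width 2 × 3 × 4 = 24.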

Sequences of Declarations
In C and Java, all declarations in a procedure can be processed as a group
Declarations may be distributed within a Java procedure, but are still processed when the procedure is analyzed
Use a variable offset to keep track of next relative address
Following SDT deals with a sequence of declarations of the form T id

Semantic action within D → T id ; D1 creates a symbol-table entry


top denotes current symbol table
top.put creates a symbol-table entry for id.lexeme with type T.type
and relative address offset in its data area
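
The role of offset can be sketched in a few lines. A dict stands in for the symbol table top, and the widths are the ones used for int and float earlier; the function name lay_out and the sample declarations are assumptions.

```python
WIDTH = {"int": 4, "float": 8}


def lay_out(declarations):
    """Assign relative addresses to a sequence of (type, name) declarations."""
    top = {}     # symbol table: name -> (type, relative address)
    offset = 0   # next available relative address
    for type_name, name in declarations:
        top[name] = (type_name, offset)   # top.put(id.lexeme, T.type, offset)
        offset += WIDTH[type_name]        # offset = offset + T.width
    return top, offset


table, size = lay_out([("int", "i"), ("float", "x"), ("int", "j")])
```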
Fields in Records and Classes
Field names within a record must be distinct
Offset or relative address for a field name is relative to data area for
that record

In the first action


The call Env.push(top) pushes the current symbol table denoted by
top onto a stack
top is then set to a new symbol table
offset is pushed onto a stack called Stack
offset is then set to 0
In the second action
T.type is set to record(top) and T.width is set to offset
top and offset are then restored to their pushed values to complete the
translation of this record type
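
The save-and-restore discipline of the two actions can be sketched as below. Python lists play the role of the stacks Env and Stack, a record type is written ("record", [declarations]), and all names are assumptions made for this illustration.

```python
WIDTH = {"int": 4, "float": 8}


def lay_out(declarations):
    env_stack, offset_stack = [], []   # stand-ins for Env and Stack
    top, offset = {}, 0

    def type_of(t):
        nonlocal top, offset
        if isinstance(t, str):
            return t, WIDTH[t]
        # First action: save the enclosing table and offset, start fresh.
        env_stack.append(top)
        offset_stack.append(offset)
        top, offset = {}, 0
        declare(t[1])
        # Second action: T.type = record(top), T.width = offset, then restore.
        result = (("record", top), offset)
        top, offset = env_stack.pop(), offset_stack.pop()
        return result

    def declare(decls):
        nonlocal offset
        for t, name in decls:
            ty, width = type_of(t)
            top[name] = (ty, offset)
            offset += width

    declare(declarations)
    return top, offset


point = ("record", [("float", "x"), ("float", "y")])
table, size = lay_out([("float", "x"), (point, "p"), ("int", "n")])
```

The field x inside the record lives in its own symbol table with offset 0, so it does not clash with the outer x.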
Translation of Expressions

S.code and E.code denote three-address code for S and E


E.addr denotes the address that will hold the value of E
Incremental Translation

A single sequence of instructions is created by successive calls to gen

The action for E → E1 + E2 calls gen to generate an add instruction
only after the instructions computing E1 and E2 have been generated

Example - Generating 3-address code using gen

     Input         Stack          addr         Generated Code
1.   A:=-B*(C+D)
2.   :=-B*(C+D)    id             A
3.   -B*(C+D)      id:=           A
4.   B*(C+D)       id:=-          A
5.   *(C+D)        id:=-id        A B
6.   *(C+D)        id:=-E         A B          T1:=-B
7.   *(C+D)        id:=E          A T1
8.   (C+D)         id:=E*         A T1
9.   C+D)          id:=E*(        A T1
10.  +D)           id:=E*(id      A T1 C
11.  +D)           id:=E*(E       A T1 C
12.  D)            id:=E*(E+      A T1 C
13.  )             id:=E*(E+id    A T1 C D
14.  )             id:=E*(E+E     A T1 C D     T2:=C+D
15.  )             id:=E*(E       A T1 T2
16.                id:=E*(E)      A T1 T2
17.                id:=E*E        A T1 T2      T3:=T1*T2
18.                id:=E          A T3         A:=T3
19.                A              A
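
The generated-code column of the trace can be reproduced with a sketch that calls gen only after the operands have been translated. Here the parse is replaced by a pre-built nested tuple, and the names translate, gen, new_temp, and addr are assumptions for the example.

```python
def translate(target, expr):
    """Emit code for `target := expr`, where expr is a nested tuple."""
    code = []

    def gen(text):
        code.append(text)   # append to the single instruction sequence

    def new_temp():
        # In this sketch every instruction emitted so far defines one temporary.
        return f"T{len(code) + 1}"

    def addr(e):
        """Return E.addr, generating code for the operands first."""
        if isinstance(e, str):   # an identifier is its own address
            return e
        if len(e) == 2:          # unary operator
            op, operand = e
            a = addr(operand)
            t = new_temp()
            gen(f"{t}:={op}{a}")
            return t
        op, left, right = e      # binary operator
        a1, a2 = addr(left), addr(right)
        t = new_temp()
        gen(f"{t}:={a1}{op}{a2}")
        return t

    gen(f"{target}:={addr(expr)}")
    return code


code = translate("A", ("*", ("-", "B"), ("+", "C", "D")))
```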

Addressing Array Elements
Array elements are stored in consecutive locations for quick access
In C and Java, n array elements are indexed 0, 1, · · · , n − 1
Address of A[i] is base + i × w
base is relative address of A[0]
w is width of each array element
For a 2-d array A[n1][n2], address of A[i1][i2] is base + (i1 × n2 + i2) × w
For a k-d array A[n1][n2] · · · [nk], address of A[i1][i2] · · · [ik] is
base + ((· · · ((i1 × n2 + i2) × n3 + i3) · · · ) × nk + ik) × w
In general, the elements of a 1-d array are numbered low, low + 1, · · · , high
Number of elements, n = high − low + 1
base = addr (A[low ])
addr (A[i]) = base + (i − low ) × w
Let addr (A[i]) = i × w + c
Compile time precalculation c = base − low × w
c = base when low = 0
c is saved in symbol table entry for A
Hence using c from symbol table relative address of A[i] is i × w + c
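
A short sketch of both formulas follows. The function names and the sample base addresses are assumptions; the first function precomputes c once, the second evaluates the k-d row-major formula in its nested form for zero-based indices.

```python
def make_accessor(base, low, w):
    """1-d array with indices low..high: precompute c, then addr = i*w + c."""
    c = base - low * w          # done once, at "compile time"
    return lambda i: i * w + c


def address_kd(base, w, dims, idx):
    """Row-major address of A[i1]...[ik] for A[n1]...[nk], zero-based."""
    e = 0
    for n, i in zip(dims, idx):
        e = e * n + i           # ((i1 * n2 + i2) * n3 + i3) ...
    return base + e * w


addr = make_accessor(base=100, low=1, w=4)
first, third = addr(1), addr(3)
elem = address_kd(0, 4, (2, 3), (1, 2))
```

The first factor n1 is multiplied by zero in the loop, so it never affects the address, as in the formula above.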
Layouts of 2-d Array

Compile-time precalculation for addressing elements of multidimensional static arrays
Compile-time precalculation is not applicable to dynamic arrays
c cannot be computed without knowing low and high at compile time
2-d array stored in row-major (row-by-row) or column-major
(column-by-column) form
In general for multidimensional arrays
for row-major form rightmost subscript varies fastest while scanning a
block of storage
for column-major form leftmost subscript varies fastest
Translation of Array References

L.addr - offset for array reference


L.array - a pointer to the symbol-table entry for the array name
L.array.base - determines l-value of an array reference
L.type - type of the subarray generated by L
t.width - width of type t
t.elem - element type
Type Checking

Type checking can take on two forms: synthesis and inference


Type synthesis builds up type of an expression from types of its
subexpressions
Type of E1 + E2 is defined in terms of types of E1 and E2
A typical rule for type synthesis has the form
if f has type s → t and x has type s,
then expression f (x) has type t
Type inference determines the type of a language construct from the way it is
used
null(x) tells us that x must be a list, since null tests whether a list is empty
A typical rule for type inference has the form
if f (x) is an expression,
then for some α and β, f has type α → β and x has type α
Type inference is needed for languages like ML, which check types but do
not require names to be declared

Type Conversion
How is x + i evaluated when x is a float and i is an int? The int operand is converted to float
Conversion from int to float in 2 * 3.14: t1 = (float) 2; t2 = t1 * 3.14;
Conversion is implicit (a coercion) if done by the compiler
Conversion is explicit (a cast) if written by the programmer
In Java, widening conversions preserve information and narrowing
conversions can lose information

The semantic action for checking E → E1 + E2 uses two functions


1 max(t1, t2) returns the maximum (least upper bound) of types t1 and t2 in the widening hierarchy
2 widen(a, t, w) returns address a if types t and w are the same; otherwise
it generates an instruction to convert the contents of a from type t to type w
widen for int and float

Generates an instruction to do the conversion


Places the result in a temporary

Type Conversions into Expression Evaluation

Temporary variable a1 is either


E1.addr, if the type of E1 is not converted to the type of E, or
a new temporary variable returned by widen, if converted
Similarly, a2 is E2.addr if not converted, otherwise a new temporary variable
In general, adding values of two different types may require converting both to a third type
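
The interplay of max, widen, and the action for E → E1 + E2 can be sketched for the two types int and float. The two-level hierarchy, the helper names max_type, new_temp, and add, and the global list code are assumptions made for this illustration.

```python
HIERARCHY = ["integer", "float"]   # assumed widening order, narrowest first
code = []                          # instructions emitted so far


def new_temp():
    new_temp.count += 1
    return f"t{new_temp.count}"


new_temp.count = 0


def max_type(t1, t2):
    """The wider of two types in the hierarchy."""
    return max(t1, t2, key=HIERARCHY.index)


def widen(a, t, w):
    """Return an address holding the value of `a` (of type t) as type w."""
    if t == w:
        return a
    if t == "integer" and w == "float":
        temp = new_temp()
        code.append(f"{temp} = (float) {a}")
        return temp
    raise TypeError(f"cannot widen {t} to {w}")


def add(addr1, type1, addr2, type2):
    """Semantic action for E -> E1 + E2; returns (E.addr, E.type)."""
    e_type = max_type(type1, type2)
    a1 = widen(addr1, type1, e_type)
    a2 = widen(addr2, type2, e_type)
    e_addr = new_temp()
    code.append(f"{e_addr} = {a1} + {a2}")
    return e_addr, e_type


result = add("x", "float", "i", "integer")
```

For x + i with x a float and i an int, only i is converted, so a1 is x itself and a2 is a new temporary.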

Overloading of Functions and Operators

An overloaded symbol has different meanings depending on its context


Overloading is resolved when a unique meaning is determined for each
occurrence of a name
The + operator in Java denotes either string concatenation or
addition, depending on the types of its operands
User-defined functions can be overloaded, as in
void err() { · · · }
void err(String s) { · · · }
Type-synthesis rule for overloaded functions
if f can have type si → ti, for 1 ≤ i ≤ n, where si ≠ sj for i ≠ j
and x has type sk , for some 1 ≤ k ≤ n
then expression f (x) has type tk
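
The rule amounts to a lookup among the alternative types of f. In the sketch below the list of (s, t) pairs, the function name apply_type, and the overloaded function square are hypothetical and serve only as an illustration.

```python
def apply_type(signatures, arg_type):
    """Type of f(x) when f may have any of the types s_i -> t_i."""
    matches = [t for s, t in signatures if s == arg_type]
    if len(matches) != 1:   # no s_k matches, or the s_i were not distinct
        raise TypeError(f"no unique meaning for argument type {arg_type}")
    return matches[0]


# Hypothetical overloaded function: integer -> integer and float -> float.
square = [("integer", "integer"), ("float", "float")]
float_result = apply_type(square, "float")
```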

Type Inference and Polymorphic Functions
Polymorphic - any code fragment that can be executed with arguments of
different types
Parametric polymorphism by parameters or type variables
Example - ML program for finding length of a list
fun length(x) =
if null(x) then 0 else length(tl(x)) + 1;
length(["sun", "mon", "tue"]) + length([10, 9, 8, 7])
evaluates to 3 + 4 = 7
Abstract syntax tree for the function definition
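
For comparison, the ML length function above can be transcribed into Python, where the same body also works for lists of any element type. This is only an analogue: Python checks types at run time, whereas ML infers the polymorphic type of length at compile time.

```python
def length(x):
    """Recursive list length, mirroring the ML definition."""
    # null(x) becomes `not x`; tl(x) becomes x[1:]
    return 0 if not x else length(x[1:]) + 1


total = length(["sun", "mon", "tue"]) + length([10, 9, 8, 7])
```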

Thank you
