MIT 6.035 S10 Lec05
Intermediate Formats
Martin Rinard
Laboratory for Computer Science
Massachusetts Institute of Technology
Program Representation Goals
• Enable Program Analysis and Transformation
– Semantic Checks, Correctness Checks, Optimizations
• Structure Translation to Machine Code
– Sequence of Steps
[Diagram: Parse Tree -> (Semantic Analysis) -> High Level Intermediate Representation -> Low Level Intermediate Representation -> Machine Code]
High Level IR
• Preserves Object Structure
• Preserves Structured Flow of Control
• Primary Goal: Analyze Program
Low Level IR
• Moves Data Model to Flat Address Space
• Eliminates Structured Control Flow
• Suitable for Low Level Compilation Tasks
– Register Allocation
– Instruction Selection
Examples of Object Representation and Program Execution
(This happens when program runs)
Example Vector Class
class vector {
  int v[];
  void add(int x) {
    int i;
    i = 0;
    while (i < v.length) { v[i] = v[i]+x; i = i+1; }
  }
}
Representing Arrays
• Items Stored Contiguously In Memory
• Length Stored In First Word
[Diagram: array in memory as the words 3 7 4 8 (length 3, then items 7, 4, 8)]
• Color Code
– Red - generated by compiler automatically
– Blue, Yellow, Lavender - program data or code
– Magenta - executing code or data
Representing Vector Objects
• First Word Points to Class Information
– Method Table, Garbage Collector Data
• Next Words Have Object Fields
– For vectors, Next Word is Reference to Array
[Diagram: vector object (Class Info, reference to array) and its array 3 7 4 8]
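The array and object layouts above can be sketched in code. This is a minimal illustration, assuming a word-addressed flat memory modeled as an int[]; the class FlatHeap and all of its method names are hypothetical and do not come from the lecture.

```java
// Hypothetical model: a word-addressed flat memory held in an int[].
final class FlatHeap {
    private final int[] mem = new int[64];
    private int next = 1; // address 0 is reserved so that 0 can act as a null reference

    // Allocate n contiguous words and return the address of the first one.
    int alloc(int n) {
        int addr = next;
        next += n;
        return addr;
    }

    // Array layout: length in the first word, items stored contiguously after it.
    int newArray(int... items) {
        int addr = alloc(items.length + 1);
        mem[addr] = items.length;
        for (int k = 0; k < items.length; k++) {
            mem[addr + 1 + k] = items[k];
        }
        return addr;
    }

    // Vector object layout: word 0 points to class information,
    // word 1 holds the reference to the array (field v).
    int newVector(int classInfo, int arrayRef) {
        int addr = alloc(2);
        mem[addr] = classInfo;
        mem[addr + 1] = arrayRef;
        return addr;
    }

    int arrayLength(int arrayRef) { return mem[arrayRef]; }

    int arrayLoad(int arrayRef, int index) { return mem[arrayRef + 1 + index]; }

    int vectorArray(int vectorRef) { return mem[vectorRef + 1]; }
}
```

Under these assumptions, reading v.length for a vector is two memory loads: one to fetch the array reference from the object, one to fetch the length word of the array.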
Invoking Vector Add Method
vect.add(1);
• Create Activation Record
– this onto stack
– parameters onto stack
– space for locals on stack
[Diagram: activation record with slots this, x = 1, and i; this refers to the vector object (Class Info, reference to array 3 7 4 8)]
Executing Vector Add Method
void add(int x) {
  int i;
  i = 0;
  while (i < v.length) {
    v[i] = v[i]+x;
    i = i+1;
  }
}
[Diagram sequence: activation record (this, x = 1, i) and array (length, then items) as the method runs]
– on entry: i unset, array 3 7 4 8
– after i = 0: i = 0, array 3 7 4 8
– after v[0] = v[0]+x: i = 0, array 3 8 4 8
– after i = i+1: i = 1, array 3 8 4 8
– at loop exit: i = 3, array 3 8 5 9
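The execution shown above can be replayed on a flat memory. The sketch below is only an illustration under stated assumptions: one int[] stands in for both heap and stack, the class info word is a placeholder, and the class name AddTrace and the slot order (this, x, i) are hypothetical choices that mirror the diagram.

```java
// Hypothetical sketch: one int[] serves as both heap and stack.
final class AddTrace {
    // Returns the array items after vect.add(x), starting from the given items.
    static int[] run(int[] items, int x) {
        int[] mem = new int[32 + items.length];

        // Heap: array at address 1 (length word, then the items).
        int arr = 1;
        mem[arr] = items.length;
        for (int k = 0; k < items.length; k++) {
            mem[arr + 1 + k] = items[k];
        }

        // Heap: vector object (class info word, then reference to array).
        int vect = arr + 1 + items.length;
        mem[vect] = 0;        // placeholder for class info
        mem[vect + 1] = arr;  // field v

        // Stack: activation record with slots for this, x, and i.
        int fp = vect + 2;
        mem[fp] = vect;       // this
        mem[fp + 1] = x;      // parameter x
        mem[fp + 2] = 0;      // local i, after i = 0

        // while (i < v.length) { v[i] = v[i]+x; i = i+1; }
        while (mem[fp + 2] < mem[mem[mem[fp] + 1]]) {
            int v = mem[mem[fp] + 1];        // load field v through this
            int elem = v + 1 + mem[fp + 2];  // address of v[i]
            mem[elem] = mem[elem] + mem[fp + 1];
            mem[fp + 2] = mem[fp + 2] + 1;
        }

        int[] out = new int[items.length];
        for (int k = 0; k < items.length; k++) {
            out[k] = mem[arr + 1 + k];
        }
        return out;
    }
}
```

Calling AddTrace.run(new int[] {7, 4, 8}, 1) leaves the items as 8, 5, 9, matching the final diagram. Every variable access in the loop is an explicit load or store at an offset from either the activation record or an object, which is the code the compiler has to generate.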
What does the compiler have to do
to make all of this work?
Compilation Tasks
• Determine Format of Objects and Arrays
• Determine Format of Call Stack
• Generate Code to Read Values
– this, parameters, locals, array elements, object fields
• Generate Code to Evaluate Expressions
• Generate Code to Write Values
• Generate Code for Control Constructs
Further Complication - Inheritance
Object Extension
Inheritance Example - Point Class
class point {
  int c;
  int getColor() { return(c); }
  int distance() { return(0); }
}
Point Subclasses
class cartesianPoint extends point {
  int x, y;
  int distance() { return(x*x + y*y); }
}
class polarPoint extends point {
  int r, t;
  int distance() { return(r*r); }
  int angle() { return(t); }
}
Implementing Object Fields
• Each object is a contiguous piece of memory
• Fields from inheritance hierarchy allocated sequentially
[Diagram: polarPoint object: Class Info, c = 2, r = 1, t = 2]
Point Objects
[Diagram: one object of each class]
– point: Class Info, c = 2
– cartesianPoint: Class Info, c = 1, x = 4, y = 6
– polarPoint: Class Info, c = 4, r = 1, t = 3
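One way to picture how a compiler could derive these layouts is to compute field offsets from the class hierarchy. The sketch below is an assumption-laden illustration: the class ClassLayout, its methods, and the convention that word 0 holds class info followed by parent fields and then the class's own fields are hypothetical, chosen to match the diagrams.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: computes where each field lives inside an object.
final class ClassLayout {
    final String name;
    final ClassLayout parent;                        // null for a class with no superclass
    final List<String> fields = new ArrayList<>();   // all fields in memory order

    ClassLayout(String name, ClassLayout parent, String... ownFields) {
        this.name = name;
        this.parent = parent;
        // Fields from parent classes come first, so a subclass object
        // can be used wherever a parent object is expected.
        if (parent != null) {
            fields.addAll(parent.fields);
        }
        for (String f : ownFields) {
            fields.add(f);
        }
    }

    // Word offset from the start of the object; word 0 is class info.
    int offsetOf(String field) {
        int idx = fields.indexOf(field);
        if (idx < 0) {
            throw new IllegalArgumentException("no field " + field + " in " + name);
        }
        return 1 + idx;
    }

    int sizeInWords() {
        return 1 + fields.size();
    }
}
```

With point declared as having field c, and polarPoint extending it with r and t, field c sits at the same offset in both layouts. That shared offset is what lets the code for getColor work unchanged on any subclass object.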
Compilation Tasks
• Determine Object Format in Memory
– Fields from Parent Classes
– Fields from Current Class
• Generate Code for Methods
– Field, Local Variable and Parameter Accesses
– Method Invocations
Symbol Tables - Key Concept in Compilation
• Compiler Uses Symbol Tables to Produce
– Object Layout in Memory
– Code to
• Access Object Fields
• Access Local Variables
• Access Parameters
• Invoke Methods
Symbol Tables During Translation From Parse Tree to IR
• Symbol Tables Map Identifiers to Descriptors (information about identifiers)
• Basic Operation: Lookup
– Given A String, find Descriptor
– Typical Implementation: Hash Table
• Examples
– Given a class name, find class descriptor
– Given variable name, find descriptor
• local descriptor, parameter descriptor, field descriptor
Hierarchy In Symbol Tables
• Hierarchy Comes From
– Nested Scopes - Local Scope Inside Field Scope
– Inheritance - Child Class Inside Parent Class
• Symbol Table Hierarchy Reflects These Hierarchies
• Lookup Proceeds Up Hierarchy Until Descriptor is Found
Hierarchy in vector add Method
[Diagram: three symbol tables; lookup proceeds from locals to parameters to fields]
Symbol Table for Fields of vector Class
– v: descriptor for field v
Symbol Table for Parameters of add
– x: descriptor for parameter x
– this: descriptor for this
Symbol Table for Locals of add
– i: descriptor for local i
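The hierarchical lookup for the vector add method can be sketched as a chain of hash tables. This is a minimal illustration, assuming descriptors are plain strings; the class SymbolTable and its methods declare and lookup are hypothetical names, not the lecture's implementation.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch: each table maps identifiers to descriptors
// (strings here) and points to its enclosing table.
final class SymbolTable {
    private final Map<String, String> entries = new HashMap<>();
    private final SymbolTable parent;   // null for the outermost table

    SymbolTable(SymbolTable parent) {
        this.parent = parent;
    }

    void declare(String name, String descriptor) {
        entries.put(name, descriptor);
    }

    // Lookup proceeds up the hierarchy until a descriptor is found.
    // Returns null if no enclosing table declares the name.
    String lookup(String name) {
        for (SymbolTable t = this; t != null; t = t.parent) {
            String d = t.entries.get(name);
            if (d != null) {
                return d;
            }
        }
        return null;
    }
}
```

For the add method, the locals table would have the parameters table as its parent, and the parameters table would have the fields table as its parent. A lookup of v from inside the method body then misses twice before finding the field descriptor.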
Descriptors
• Field, Parameter and Local Descriptors Refer to Type Descriptors
– Base type descriptor: int, boolean
– Array type descriptor, which contains reference to type descriptor for array elements
– Class descriptor
• Relatively Simple Type Descriptors
• Base Type Descriptors and Array Descriptors Stored in Type Symbol Table
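A small sketch of type descriptors and a type symbol table follows. It is only an illustration: the class names TypeDescriptor, BaseType, ArrayType and TypeTable are hypothetical, class descriptors are left out for brevity, and keying array descriptors by a printed name such as int[] is an assumption.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical descriptor classes for base and array types.
abstract class TypeDescriptor {
    abstract String describe();
}

final class BaseType extends TypeDescriptor {
    final String name;
    BaseType(String name) { this.name = name; }
    @Override String describe() { return name; }
}

final class ArrayType extends TypeDescriptor {
    final TypeDescriptor element;   // reference to type descriptor for array elements
    ArrayType(TypeDescriptor element) { this.element = element; }
    @Override String describe() { return element.describe() + "[]"; }
}

// Type symbol table: holds base type descriptors and array descriptors.
final class TypeTable {
    private final Map<String, TypeDescriptor> types = new HashMap<>();

    TypeTable() {
        types.put("int", new BaseType("int"));
        types.put("boolean", new BaseType("boolean"));
    }

    // Returns null if the name is not a known type.
    TypeDescriptor lookup(String name) {
        return types.get(name);
    }

    // Array descriptors are created on first use and shared afterwards,
    // so two uses of int[] refer to the same descriptor.
    TypeDescriptor arrayOf(TypeDescriptor element) {
        String key = element.describe() + "[]";
        TypeDescriptor t = types.get(key);
        if (t == null) {
            t = new ArrayType(element);
            types.put(key, t);
        }
        return t;
    }
}
```

Sharing one descriptor per type means a later type check can compare descriptor references instead of comparing type names.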
Example Type Symbol Table
[Diagram: class_decl for vector and the descriptors built for it]
class descriptor for vector
– field symbol table: v -> field descriptor
– method symbol table: add -> method descriptor for add
method descriptor for add
– parameter symbol table: x -> parameter descriptor, this -> this descriptor
– local variable symbol table: i -> local descriptor
Representing Code in High-Level Intermediate Representation
Basic Idea
• Move towards assembly language
• Preserve high-level structure
– object format
– structured control flow
– distinction between parameters, locals and fields
• High-level abstractions of assembly language
– load and store nodes
– access abstract locals, parameters and fields, not memory locations directly
Representing Expressions
• Expression Trees Represent Expressions
– Internal Nodes - Operations like +, -, etc.
– Leaves - Load Nodes Represent Variable Accesses
• Load Nodes
– ldf node for field accesses - field descriptor
• (implicitly accesses this - could add a reference to accessed object)
– ldl node for local variable accesses - local descriptor
– ldp node for parameter accesses - parameter descriptor
– lda node for array accesses
• expression tree for array
• expression tree for index
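Example
x*x + y*y
[Expression tree: + at the root with two * nodes as children; the leaves are load nodes]
[Expression tree: + with children lda and ldp; lda has children ldf and ldl]
– ldf: field descriptor for v in field symbol table for vector class
– ldl: local descriptor for i in local symbol table of vector add
– ldp: parameter descriptor for x in parameter symbol table of vector add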
Special Case: Array Length Operator
• len node represents length of array
– expression tree for array
• Example: v.length
[Expression tree: len with child ldf (field descriptor for v)]
From Abstract Syntax Trees to Intermediate Representation
while (i < v.length)
  v[i] = v[i]+x;
[Diagram sequence: the IR tree is built bottom-up, one node at a time; final tree shown with children indented]
while
  <
    ldl (local descriptor for i)
    len
      ldf (field descriptor for v)
  sta
    ldf (field descriptor for v)
    ldl (local descriptor for i)
    +
      lda
        ldf (field descriptor for v)
        ldl (local descriptor for i)
      ldp (parameter descriptor for x)
Abbreviated Notation
while (i < v.length)
  v[i] = v[i]+x;
while
  <
    ldl i
    len
      ldf v
  sta
    ldf v
    ldl i
    +
      lda
        ldf v
        ldl i
      ldp x
From Abstract Syntax Trees to IR
• Recursively Traverse Abstract Syntax Tree
• Build Up Representation in Bottom-Up Manner
– Look Up Variable Identifiers in Symbol Tables
– Build Load Nodes to Access Variables
– Build Expressions Out of Load Nodes and Operator Nodes
– Build Store Nodes for Assignment Statements
– Combine Store Nodes with Flow of Control Nodes
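The traversal described above can be sketched as a small recursive translator. This is an illustration under stated assumptions, not the lecture's code: the classes Ir, Ast and Translator are hypothetical, the symbol table is reduced to a map from a name to one of "field", "local" or "param", and only array stores are handled as assignment targets.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// IR node: op is ldf/ldl/ldp/lda/len/sta/while or an operator such as + and <.
final class Ir {
    final String op;
    final String name;                 // identifier for ldf/ldl/ldp, otherwise null
    final List<Ir> kids = new ArrayList<>();

    Ir(String op, String name, Ir... kids) {
        this.op = op;
        this.name = name;
        for (Ir k : kids) this.kids.add(k);
    }

    @Override public String toString() {   // prefix form, e.g. (len (ldf v))
        StringBuilder sb = new StringBuilder("(").append(op);
        if (name != null) sb.append(' ').append(name);
        for (Ir k : kids) sb.append(' ').append(k);
        return sb.append(')').toString();
    }
}

// Hypothetical AST node: kind is var, index, length, binop, assign, or while.
final class Ast {
    final String kind;
    final String text;                 // variable name or operator symbol, else null
    final Ast[] kids;

    Ast(String kind, String text, Ast... kids) {
        this.kind = kind;
        this.text = text;
        this.kids = kids;
    }
}

final class Translator {
    private final Map<String, String> symbols;   // name -> "field" | "local" | "param"

    Translator(Map<String, String> symbols) { this.symbols = symbols; }

    // Look the identifier up and pick the matching load node.
    Ir load(String name) {
        String kind = symbols.get(name);
        if ("field".equals(kind)) return new Ir("ldf", name);
        if ("local".equals(kind)) return new Ir("ldl", name);
        if ("param".equals(kind)) return new Ir("ldp", name);
        throw new IllegalStateException("undeclared identifier: " + name);
    }

    // Children are translated first, so the tree is built bottom-up.
    Ir expr(Ast a) {
        switch (a.kind) {
            case "var":    return load(a.text);
            case "index":  return new Ir("lda", null, expr(a.kids[0]), expr(a.kids[1]));
            case "length": return new Ir("len", null, expr(a.kids[0]));
            case "binop":  return new Ir(a.text, null, expr(a.kids[0]), expr(a.kids[1]));
            default: throw new IllegalArgumentException("not an expression: " + a.kind);
        }
    }

    Ir stmt(Ast a) {
        switch (a.kind) {
            case "assign": {
                Ast target = a.kids[0];
                if (!target.kind.equals("index")) {
                    throw new UnsupportedOperationException("sketch handles array stores only");
                }
                // sta children: array expression, index expression, value expression.
                return new Ir("sta", null,
                        expr(target.kids[0]), expr(target.kids[1]), expr(a.kids[1]));
            }
            case "while":  return new Ir("while", null, expr(a.kids[0]), stmt(a.kids[1]));
            default: throw new IllegalArgumentException("not a statement: " + a.kind);
        }
    }
}
```

Given an AST for while (i < v.length) v[i] = v[i]+x; with v registered as a field, i as a local and x as a parameter, stmt returns a tree whose toString form is (while (< (ldl i) (len (ldf v))) (sta (ldf v) (ldl i) (+ (lda (ldf v) (ldl i)) (ldp x)))). A fuller version would add stf, stl and stp store nodes for other assignment targets.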
Summary
High-Level Intermediate Representation
• Goal: represent program in an intuitive way that supports future compilation tasks
• Representing program data
– Symbol tables
– Hierarchical organization
• Representing computation
– Expression trees
– Various types of load and store nodes
– Structured flow of control
• Traverse abstract syntax tree to build IR
Dynamic Dispatch
if (x == 0) {
  p = new point();
} else if (x < 0) {
  p = new cartesianPoint();
} else if (x > 0) {
  p = new polarPoint();
}
y = p.distance();
Which distance method is invoked?
• if p is a point: return(0)
• if p is a cartesianPoint: return(x*x + y*y)
• if p is a polarPoint: return(r*r)
• Invoked Method Depends on Type of Receiver!
Implementing Dynamic Dispatch
• Basic Mechanism: Method Table
[Diagram: method table for point objects]
– getColor -> getColor method for point
– distance -> distance method for point
[Diagram: method symbol table entries distance and angle refer to the method descriptor for distance and the method descriptor for angle]
Lookup In Method Symbol Tables
• Starts with method table of declared class of receiver object
p.getColor();
• finds getColor in point method symbol table
Static Versus Dynamic Lookup
• Static lookup done at compile time for type checking and code generation
• Dynamic lookup done when program runs to dispatch method call
• Static and dynamic lookup results may differ!
[Diagram: method table for polarPoint objects]
– getColor -> getColor method for point
– distance -> distance method for polarPoint
– angle -> angle method for polarPoint
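A method table and the two kinds of lookup can be sketched as follows. This is an illustration under assumptions: the classes Obj, ClassInfo and Dispatch are hypothetical, fields are stored as an int[] with c first, and an overriding method is assumed to reuse the slot its parent assigned.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.ToIntFunction;

// Hypothetical object: class information first, then the fields.
final class Obj {
    final ClassInfo cls;
    final int[] fields;

    Obj(ClassInfo cls, int... fields) {
        this.cls = cls;
        this.fields = fields;
    }
}

final class ClassInfo {
    final String name;
    final List<String> slots = new ArrayList<>();               // method name per slot
    final List<ToIntFunction<Obj>> table = new ArrayList<>();   // code per slot

    ClassInfo(String name, ClassInfo parent) {
        this.name = name;
        // A subclass starts with a copy of its parent's method table.
        if (parent != null) {
            slots.addAll(parent.slots);
            table.addAll(parent.table);
        }
    }

    // An overriding method replaces the inherited entry in the same slot;
    // a new method is appended in a fresh slot.
    ClassInfo define(String method, ToIntFunction<Obj> code) {
        int slot = slots.indexOf(method);
        if (slot >= 0) {
            table.set(slot, code);
        } else {
            slots.add(method);
            table.add(code);
        }
        return this;
    }

    // Static lookup: done once, at compile time, on the declared class.
    int slotOf(String method) {
        int slot = slots.indexOf(method);
        if (slot < 0) {
            throw new IllegalArgumentException("no method " + method + " in " + name);
        }
        return slot;
    }
}

final class Dispatch {
    // Dynamic lookup: index the receiver's own method table at run time.
    static int invoke(Obj receiver, int slot) {
        return receiver.cls.table.get(slot).applyAsInt(receiver);
    }
}
```

In this sketch the compiler would resolve p.distance() to a slot number using the declared class point, while the call itself goes through whatever table the receiver carries. That is how the static and dynamic lookup results can differ for the same call site.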
Eliminating Parse Tree Construction
• Parser actions build symbol tables
– Reduce actions build tables in bottom-up fashion
– Actions correspond to activities that take place in top-down fashion in parse tree traversal
• Eliminates intermediate construction of parse tree - improves performance
• Also less code to write (but code may be harder to write than if just traverse parse tree)
class vector { int v[]; void add(int x) { int i; ... }}
[Diagram sequence: reduce actions build descriptors bottom-up as the parser recognizes field_decl (int v), param_decl (int x), var_decl (int i), statements, and method_decl (add); final state shown]
class symbol table
– vector -> class descriptor for vector
class descriptor for vector
– v -> field descriptor
– add -> method descriptor for add
method descriptor for add
– x -> parameter descriptor
– this -> this descriptor
– i -> local descriptor
– code for add method
Nested Scopes
• So far, have seen several kinds of nesting
– Method symbol tables nested inside class symbol tables
– Local symbol tables nested inside method symbol tables
• Nesting disambiguates potential name clashes
– Same name used for class field and local variable
– Name refers to local variable inside method
Nested Code Scopes
• Symbol tables can be nested arbitrarily deeply with code nesting:
class bar {
  baz x;
  int foo(int x) {
    double x = 5.0;
    { float x = 10.0;
      { int x = 1; ... x ...}
      ... x ...
    }
    ... x ...
  }
}
Note: Name clashes with nesting can reflect programming error. Compilers often generate warning messages if it occurs.
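Arbitrarily deep nesting can be handled with a stack of tables, one per open scope. The sketch below is a minimal illustration: the class ScopeStack and its methods are hypothetical, descriptors are reduced to type names, and reporting a clash through the return value of declare is an assumption about how a warning might be triggered.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch: one hash table per open scope, innermost on top.
final class ScopeStack {
    private final Deque<Map<String, String>> scopes = new ArrayDeque<>();

    // Called when the compiler enters a class body, method, or block.
    void enter() {
        scopes.push(new HashMap<>());
    }

    // Called when the compiler leaves the scope; its names disappear.
    void exit() {
        scopes.pop();
    }

    // Declares name in the innermost scope (enter must have been called).
    // Returns true if the name hides a declaration that was already
    // visible, which is where a compiler could emit a warning.
    boolean declare(String name, String type) {
        boolean hides = lookup(name) != null;
        scopes.peek().put(name, type);
        return hides;
    }

    // Searches from the innermost scope outward; returns null if undeclared.
    String lookup(String name) {
        for (Map<String, String> scope : scopes) {   // iterates most recent first
            String t = scope.get(name);
            if (t != null) {
                return t;
            }
        }
        return null;
    }
}
```

Replaying class bar with this sketch, each use of x resolves to the declaration in the nearest enclosing scope: int in the innermost block, then float, then double once the inner blocks are exited.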
What is a Parse Tree?
• Parse Tree Records Results of Parse
• External nodes are terminals/tokens
• Internal nodes are non-terminals
For information about citing these materials or our Terms of Use, visit: http://ocw.mit.edu/terms.