Lecture 25
Construction
Evaluation Methods
Dynamic methods
Build the parse tree
Build the dependence graph
Topologically sort the graph
Define attributes in topological order
Evaluation Methods
Rule-based (treewalk)
Analyze attribute rules at compiler-generation time
Determine a fixed (static) ordering
Evaluate nodes in that order
Evaluation Methods
Oblivious (passes, dataflow)
Ignore the rules and the parse tree
Pick a convenient order (at design time) and use it
Problems
Attribute grammars have not achieved widespread use due to a myriad of problems:
non-local computation
traversing the parse tree
storage management for short-lived attributes
lack of high-quality, inexpensive tools
Ad-Hoc Analysis
In rule-based evaluators, a sequence of actions is associated with grammar productions
Organizing the actions required for context-sensitive analysis around the structure of the grammar leads to a powerful, albeit ad-hoc, approach that is used in most parsers
A snippet of code (an action) is associated with each production and executes at parse time
In top-down parsers, the snippet is added to the appropriate parsing routine
In a bottom-up shift-reduce parser, the actions are performed each time the parser performs a reduction
LR(1) Skeleton Parser
stack.push(dummy); stack.push(0);
done = false; token = scanner.next();
while (!done) {
  s = stack.top();
  if (Action[s,token] == "reduce A→β") {
    stack.pop(2|β|);   // pop |β| symbol,state pairs
    s = stack.top();
    stack.push(A);
    stack.push(Goto[s,A]);
  }
  else if (Action[s,token] == "shift i") {
    stack.push(token); stack.push(i);
    token = scanner.next();
  }
  else if (Action[s,token] == "accept") done = true;
  else report a syntax error
}
With ad-hoc actions added, the reduce case invokes the code snippet for the production:
if (Action[s,token] == "reduce A→β") {
  stack.pop(2|β|);
  invoke the code snippet for A→β
  s = stack.top();
  stack.push(A);
  stack.push(Goto[s,A]);
}
Production        Code snippet
Sign → +          Sign.val ← 1
Sign → –          Sign.val ← –1
List → Bit        List.val ← Bit.val
Implementing Ad-Hoc Scheme
We will adopt the notation used by YACC for snippets and for passing values
Recall that the skeleton LR(1) parser stored pairs ⟨symbol, state⟩ on the stack
We can replace these with triples ⟨value, symbol, state⟩
On a reduction by A → β, the parser pops 3|β| items from the stack rather than 2|β|
It pushes the value along with the symbol
YACC file for a calculator
%token NUMBER LPAREN RPAREN
%token PLUS MINUS TIMES DIVIDE
%%
expr : expr PLUS expr
| expr MINUS expr
| expr TIMES expr
| expr DIVIDE expr
| LPAREN expr RPAREN
| MINUS expr
| NUMBER
;
%%
%{
#include<iostream>
%}
%union {int val;}
%token NUMBER LPAREN RPAREN EQUAL
%token PLUS MINUS TIMES DIVIDE
/* associativity and precedence:
in order of increasing precedence */
%nonassoc EQUAL
%left PLUS MINUS
%left TIMES DIVIDE
%left UMINUS /* dummy token to use as
precedence marker */
%type <val> NUMBER expr
%%
struct and union
struct rec {
  int x;     // 4 bytes
  double y;  // 8 bytes
  char c;    // 1 byte
} v, w;
field sizes sum to 13 bytes (actual sizeof v and sizeof w are typically larger, because the compiler inserts alignment padding)
v.x = 10; v.y = 0.345; v.c = '#';
w.x = 20; w.y = 24.05; w.c = '$';
union urec {
  int x;     // 4 bytes
  double y;  // 8 bytes
  char c;    // 1 byte
} v, w;
sizeof v: 8 bytes, sizeof w: 8 bytes
assign to one of the 3 fields: v.x, v.y, or v.c
struct v layout (fields stored sequentially):
x (4 bytes), then y (8 bytes), then c (1 byte)
union v layout (fields overlap at the same address):
x (4 bytes), y (8 bytes), and c (1 byte) all start at offset 0
union urec {
  int x;     // 4 bytes
  double y;  // 8 bytes
  char c;    // 1 byte
} v;
short int whichOne;
v.x = 10;   whichOne = 1;
v.y = 25.4; whichOne = 2;
v.c = 'a';  whichOne = 3;
Shift-reduce evaluation of 3 + 2 * 4 (where r8 reduces a number to E, r4 is E → E * E, and r2 is E → E + E):
1   .3+2*4   shift
2   3.+2*4   reduce(r8)  E.val = 3
3   E.+2*4   shift
4   E+.2*4   shift
5   E+2.*4   reduce(r8)  E.val = 2
6   E+E.*4   shift
7   E+E*.4   shift
8   E+E*4.   reduce(r8)  E.val = 4
9   E+E*E.   reduce(r4)  E.val = 2 * 4 = 8
10  E+E.     reduce(r2)  E.val = 3 + 8 = 11
11  E.       accept
Intermediate Representations (IR)
IR
Compilers are organized as a series of passes
This creates the need for an intermediate representation for the code being compiled
Compilers use some internal form (an IR) to represent the code being analysed and translated
Many compilers use more than one IR during the course of compilation
The IR must be expressive enough to record all of the useful facts that might be passed between passes of the compiler
During translation, the compiler derives facts that have no representation in the source code, for example, the addresses of variables and procedures
Typically, the compiler augments the IR with a set of tables that record additional information
These tables are considered part of the IR
Selecting an appropriate IR for a compiler project requires an understanding of the source language, the target machine, and the properties of the programs to be compiled
Thus, a source-to-source translator might keep its internal information in a form quite close to the source
In contrast, a compiler that produces assembly code might use a form close to the target machine's instruction set
IR Taxonomy
IRs fall into three organizational categories:
1. Graphical IRs encode the compiler's knowledge in a graph
2. Linear IRs resemble pseudocode for some abstract machine
3. Hybrid IRs combine elements of both graphical (structural) and linear IRs
Graphical IRs
Parse trees are graphs that represent the source-code form of the program
The structure of the tree corresponds to the syntax of the source code
Parse trees are used primarily in discussions of parsing and in attribute-grammar systems, where they are the primary IR
In most other applications, compilers use one of the more concise alternatives
An Abstract Syntax Tree (AST) retains the essential structure of the parse tree but eliminates extraneous nodes
AST for a = b*-c + b*-c:
          =
        /   \
       a     +
           /   \
          *     *
         / \   / \
        b   - b   -
            |     |
            c     c
ASTs have been used in many practical compiler systems:
• source-to-source systems
• automatic parallelization tools
• pretty-printing
An AST is more concise than a parse tree
It faithfully retains the structure of the original source code
Consider the AST for x*2 + x*2*y
AST for x*2 + x*2*y:
          +
        /   \
       *     *
      / \   / \
     x   2 *   y
          / \
         x   2
The task of building an AST fits neatly into an ad hoc syntax-directed translation scheme
Assume that the compiler has routines mknode and mkleaf for creating tree nodes
Production      Semantic Rule
E → E1 + E2     E.nptr = mknode('+', E1.nptr, E2.nptr)
E → E1 * E2     E.nptr = mknode('*', E1.nptr, E2.nptr)
E → – E1        E.nptr = mknode('–', E1.nptr)
E → ( E1 )      E.nptr = E1.nptr
E → num         E.nptr = mkleaf('num', num.val)
Production      Semantic Rule (yacc)
E → E1 + E2     $$.nptr = mknode('+', $1.nptr, $3.nptr)
E → E1 * E2     $$.nptr = mknode('*', $1.nptr, $3.nptr)
E → – E1        $$.nptr = mknode('–', $2.nptr)
E → ( E1 )      $$.nptr = $2.nptr
E → num         $$.nptr = mkleaf('num', $1.val)
Intermediate Languages
We will use another IR, called three-address code, for actual code generation
The semantic rules for generating three-address code for common programming-language constructs are similar to those for ASTs
Linear IRs
The alternative to a graphical IR is a linear IR
An assembly-language program is a form of linear code
It consists of a sequence of instructions that execute in order of appearance
Two linear IRs used in modern compilers are
• stack-machine code
• three-address code
Linear IR for x – 2 * y
stack-machine       three-address
push 2              t1 ← 2
push y              t2 ← y
multiply            t3 ← t1 * t2
push x              t4 ← x
subtract            t5 ← t4 – t3
Stack-Machine Code
Stack-machine code is sometimes called one-address code
It assumes the presence of an operand stack
Most operations take their operands from the stack and push results back onto the stack
Stack-machine code is compact; it eliminates many names from the IR, which shrinks the program in IR form
All results and arguments are transitory unless explicitly moved to memory
Stack-machine code is simple to generate and execute
Smalltalk-80 and Java use bytecode, which is abstract stack-machine code
The bytecode is either interpreted or translated into target machine code (JIT)
Three-Address Code
In three-address code, most operations have the form
x ← y op z
with an operator (op), two operands (y and z), and one result (x)
Some operators, such as an immediate load and a jump, will need fewer arguments