Lecture 25
Compiler Construction
Evaluation Methods

Dynamic methods
• Build the parse tree
• Build the dependence graph
• Topologically sort the graph
• Define attributes in topological order (see the sketch below)
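A minimal sketch of the last two steps, assuming the dependence graph has already been built; the node layout and names here are hypothetical, not from the lecture:

#include <functional>
#include <queue>
#include <vector>

// Each attribute instance is a node in the dependence graph; its rule
// runs once every attribute it depends on has been defined.
struct AttrNode {
    std::function<void()> evaluate;   // the attribute's semantic rule
    std::vector<int> successors;      // attributes that depend on this one
    int unresolved = 0;               // dependences not yet evaluated
};

// Kahn-style topological evaluation of the dependence graph.
void evaluateAttributes(std::vector<AttrNode>& g) {
    std::queue<int> ready;
    for (int i = 0; i < (int)g.size(); ++i)
        if (g[i].unresolved == 0) ready.push(i);
    while (!ready.empty()) {
        int n = ready.front(); ready.pop();
        g[n].evaluate();                      // all inputs are defined now
        for (int s : g[n].successors)
            if (--g[s].unresolved == 0) ready.push(s);
    }
    // Any node still unresolved here lies on a dependence cycle:
    // the attribute grammar is circular for this parse tree.
}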
Evaluation Methods

Rule-based (treewalk)
• Analyze the attribute rules at compiler-generation time
• Determine a fixed (static) ordering
• Evaluate nodes in that order
Evaluation Methods

Oblivious (passes, dataflow)
• Ignore the rules and the parse tree
• Pick a convenient order (at design time) and use it
Problems

Attribute grammars have not achieved widespread use, due to a myriad of problems:
• non-local computation
• traversing the parse tree
• storage management for short-lived attributes
• lack of high-quality, inexpensive tools
Ad-Hoc Analysis

• In rule-based evaluators, a sequence of actions is associated with grammar productions
Ad-Hoc Analysis

• Organizing the actions required for context-sensitive analysis around the structure of the grammar leads to a powerful, albeit ad hoc, approach that is used in most parsers
Ad-Hoc Analysis

• A snippet of code (an action) is associated with each production and executes at parse time
• In top-down parsers, the snippet is added to the appropriate parsing routine
Ad-Hoc Analysis

• In a bottom-up, shift-reduce parser, the actions are performed each time the parser performs a reduction
LR(1) Skeleton Parser

stack.push(dummy); stack.push(0);
done = false; token = scanner.next();
while (!done) {
    s = stack.top();
    if( Action[s,token] == "reduce A→β" ) {
        stack.pop(2 × |β|);          // two stack entries per symbol in β
        s = stack.top();
        stack.push(A);
        stack.push(Goto[s,A]);
    }
    else if( Action[s,token] == "shift i" ) {
        stack.push(token); stack.push(i);
        token = scanner.next();
    }
    else if( Action[s,token] == "accept" )
        done = true;
    else
        report a syntax error;
}
LR(1) Skeleton Parser

if( Action[s,token] == "reduce A→β" ) {
    stack.pop(2 × |β|);
    invoke the code snippet for A→β
    s = stack.top();
    stack.push(A);
    stack.push(Goto[s,A]);
}
Production            Code snippet

Number → Sign List    Number.val ← Sign.val × List.val
Sign → +              Sign.val ← 1
Sign → –              Sign.val ← –1
List → Bit            List.val ← Bit.val
List0 → List1 Bit     List0.val ← 2 × List1.val + Bit.val
Bit → 0               Bit.val ← 0
Bit → 1               Bit.val ← 1
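For example, on input –101 the snippets fire bottom-up: Bit.val is 1, 0, 1; List.val grows as 1, then 2 × 1 + 0 = 2, then 2 × 2 + 1 = 5; Sign.val = –1; so Number.val = –1 × 5 = –5.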
Implementing Ad-Hoc Scheme

• The parser needs a mechanism to pass the values of attributes from definitions in one snippet to uses in another
• We will adopt the notation used by YACC for snippets and for passing values
Implementing Ad-Hoc Scheme

• Recall that the skeleton LR(1) parser stored two values on the stack for each symbol: ⟨symbol, state⟩
• We can replace these pairs with triples ⟨value, symbol, state⟩
• On a reduction by A → β, the parser pops 3|β| items from the stack rather than 2|β|
• It pushes the value along with the symbol, as in the sketch below
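A sketch of the modified reduce case; the helpers topValues and topState and the per-production snippet are hypothetical names, not part of the original skeleton:

else if( Action[s,token] == "reduce A→β" ) {
    rhsValues = stack.topValues(|β|);           // the β values, i.e. $1 … $|β|
    result = run the snippet for A→β on rhsValues;  // e.g. $$ = $1 + $3
    stack.pop(3 × |β|);                         // three entries per symbol now
    s = stack.topState();
    stack.push(result);                         // value
    stack.push(A);                              // symbol
    stack.push(Goto[s,A]);                      // state
}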
YACC file for a calculator

%token NUMBER LPAREN RPAREN
%token PLUS MINUS TIMES DIVIDE
%%
expr : expr PLUS expr
     | expr MINUS expr
     | expr TIMES expr
     | expr DIVIDE expr
     | LPAREN expr RPAREN
     | MINUS expr
     | NUMBER
     ;
%%
%{
#include <iostream>
%}
%union {int val;}
%token NUMBER LPAREN RPAREN EQUAL
%token PLUS MINUS TIMES DIVIDE
/* associativity and precedence, in order of increasing precedence */
%nonassoc EQUAL
%left PLUS MINUS
%left TIMES DIVIDE
%left UMINUS   /* dummy token used as a precedence marker */
%type <val> NUMBER expr
%%
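A sketch of the rules section that would follow this prologue, with actions that pass values through $$ and $n; this is a plausible completion under the %union above, not the lecture's own listing:

expr : expr PLUS expr          { $$ = $1 + $3; }
     | expr MINUS expr         { $$ = $1 - $3; }
     | expr TIMES expr         { $$ = $1 * $3; }
     | expr DIVIDE expr        { $$ = $1 / $3; }
     | LPAREN expr RPAREN      { $$ = $2; }
     | MINUS expr %prec UMINUS { $$ = -$2; }
     | NUMBER                  { $$ = $1; }
     ;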
struct and union

struct rec {
    int x;     // 4 bytes
    double y;  // 8 bytes
    char c;    // 1 byte
} v, w;

sizeof v: 13 bytes of data (compilers typically pad this to 24 for alignment)
sizeof w: the same

v.x = 10; v.y = 0.345; v.c = '#';
w.x = 20; w.y = 24.05; w.c = '$';
struct and union

union urec {
    int x;     // 4 bytes
    double y;  // 8 bytes
    char c;    // 1 byte
} v, w;

sizeof v: 8 bytes, sizeof w: 8 bytes

A program assigns to one of the 3 fields: v.x, v.y, or v.c
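A quick way to check these sizes; a minimal sketch, and the exact numbers depend on the compiler and ABI:

#include <iostream>

struct rec  { int x; double y; char c; };
union  urec { int x; double y; char c; };

int main() {
    // On a typical 64-bit ABI the struct is padded to 24 bytes for
    // alignment; the union is the size of its largest member, 8 bytes.
    std::cout << "sizeof(rec)  = " << sizeof(rec)  << '\n'
              << "sizeof(urec) = " << sizeof(urec) << '\n';
}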
struct v

struct rec { int x; double y; char c; } v;

Layout: x (4 bytes), then y (8 bytes), then c (1 byte), each field at its own offset.

union v

union urec { int x; double y; char c; } v;

Layout: x (4 bytes), y (8 bytes), and c (1 byte) all start at offset 0 and overlap; the union occupies 8 bytes.
struct and union

union urec {
    int x;     // 4 bytes
    double y;  // 8 bytes
    char c;    // 1 byte
} v;
short int whichOne;

v.x = 10;   whichOne = 1;
v.y = 25.4; whichOne = 2;
v.c = 'a';  whichOne = 3;
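A sketch of how a program can use such a discriminator safely; the Tagged wrapper and print function are illustrative, not from the slides:

#include <iostream>

// whichOne records which member of the union is currently live.
struct Tagged {
    union { int x; double y; char c; } v;
    short whichOne;   // 1 = x, 2 = y, 3 = c
};

void print(const Tagged& t) {
    switch (t.whichOne) {        // always dispatch on the tag, never guess
        case 1: std::cout << t.v.x << '\n'; break;
        case 2: std::cout << t.v.y << '\n'; break;
        case 3: std::cout << t.v.c << '\n'; break;
    }
}

int main() {
    Tagged t;
    t.v.y = 25.4;
    t.whichOne = 2;
    print(t);   // prints 25.4
}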
Evaluating 3 + 2 × 4 in a shift-reduce parser (• marks the parser's position; E.val values accumulate on the value stack):

 1   •3+2×4    shift
 2   3•+2×4    reduce(r8)   E.val=3
 3   E•+2×4    shift
 4   E+•2×4    shift
 5   E+2•×4    reduce(r8)   E.val=3, E.val=2
 6   E+E•×4    shift
 7   E+E×•4    shift
 8   E+E×4•    reduce(r8)   E.val=3, E.val=2, E.val=4
 9   E+E×E•    reduce(r4)   E.val=3, E.val=8
10   E+E•      reduce(r2)   E.val=11 = 3 + 2 × 4
11   E•        accept
Intermediate Representations (IR)
IR

• Compilers are organized as a series of passes
• This creates the need for an intermediate representation for the code being compiled
IR

• Compilers use some internal form (an IR) to represent the code being analysed and translated
• Many compilers use more than one IR during the course of compilation
IR

• The IR must be expressive enough to record all of the useful facts that might be passed between passes of the compiler
IR

• During translation, the compiler derives facts that have no representation in the source code
• For example, the addresses of variables and procedures
IR

• Typically, the compiler augments the IR with a set of tables that record additional information
• These tables are considered part of the IR
IR

• Selecting an appropriate IR for a compiler project requires an understanding of the source language, the target machine, and the properties of the programs to be compiled
IR

• Thus, a source-to-source translator might keep its internal information in a form quite close to the source
IR

• In contrast, a compiler that produces assembly code might use a form close to the target machine's instruction set
IR Taxonomy

IRs fall into three organizational categories:
1. Graphical IRs encode the compiler's knowledge in a graph
2. Linear IRs resemble pseudocode for some abstract machine
3. Hybrid IRs combine elements of both graphical (structural) and linear IRs
Graphical IRs

• Parse trees are graphs that represent the source-code form of the program
• The structure of the tree corresponds to the syntax of the source code
Graphical IRs

• Parse trees are used primarily in discussions of parsing and in attribute-grammar systems, where they are the primary IR
• In most other applications, compilers use one of the more concise alternatives
Graphical IRs

• Abstract syntax trees (ASTs) retain the essential structure of the parse tree but eliminate extraneous nodes
Graphical IRs

AST for a = b*-c + b*-c:

            =
          /   \
         a     +
             /   \
            *     *
           / \   / \
          b   - b   -
              |     |
              c     c
Graphical IRs

ASTs have been used in many practical compiler systems:
• source-to-source systems
• automatic parallelization tools
• pretty-printing
Graphical IRs

• An AST is more concise than a parse tree
• It faithfully retains the structure of the original source code
• Consider the AST for x*2 + x*2*y
Graphical IRs

             +
           /   \
          *     *
         / \   / \
        x   2 *   y
             / \
            x   2

The AST contains two distinct copies of x*2.
Graphical IRs

             +
            / \
           /   *
          |   / \
          |  /   y
          | /
           *
          / \
         x   2

A directed acyclic graph (DAG) is a contraction of the AST that avoids duplication: the + and the remaining * both refer to a single, shared node for x*2.
If the value of x does not change between the uses of x*2, the compiler can generate code that evaluates the subtree once and uses the result twice. One way to build the DAG is sketched below.
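A common way to build the DAG directly is hash consing: the node constructor memoizes (op, left, right) triples, so the second request for x*2 returns the node built for the first. A minimal sketch, with all names hypothetical:

#include <map>
#include <string>
#include <tuple>

struct DagNode {
    std::string op;          // operator or leaf name ("x", "2", "*", ...)
    DagNode *left, *right;   // null for leaves
};

DagNode* dagNode(const std::string& op, DagNode* l = nullptr,
                 DagNode* r = nullptr) {
    static std::map<std::tuple<std::string, DagNode*, DagNode*>,
                    DagNode*> seen;
    auto key = std::make_tuple(op, l, r);
    auto it = seen.find(key);
    if (it != seen.end()) return it->second;   // reuse: sharing makes it a DAG
    return seen[key] = new DagNode{op, l, r};
}

Building x*2 + x*2*y through dagNode yields one * node for x*2; the + node and the outer * node both point at it.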
Graphical IRs

• The task of building an AST fits neatly into an ad hoc syntax-directed translation scheme
• Assume that the compiler has routines mknode and mkleaf for creating tree nodes, as in the rules below
Production      Semantic Rule

E → E1 + E2     E.nptr = mknode('+', E1.nptr, E2.nptr)
E → E1 × E2     E.nptr = mknode('×', E1.nptr, E2.nptr)
E → – E1        E.nptr = mknode('–', E1.nptr)
E → ( E1 )      E.nptr = E1.nptr
E → num         E.nptr = mkleaf('num', num.val)
Production      Semantic Rule (yacc)

E → E1 + E2     $$.nptr = mknode('+', $1.nptr, $3.nptr)
E → E1 × E2     $$.nptr = mknode('×', $1.nptr, $3.nptr)
E → – E1        $$.nptr = mknode('–', $2.nptr)
E → ( E1 )      $$.nptr = $2.nptr
E → num         $$.nptr = mkleaf('num', $1.val)
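A sketch of what mknode and mkleaf might look like; the node layout is illustrative, and real compilers vary:

#include <memory>
#include <vector>

struct Node {
    char op;    // '+', '×', '–', or 'n' for a num leaf
    int  val;   // meaningful only when op == 'n'
    std::vector<std::shared_ptr<Node>> kids;
};

std::shared_ptr<Node> mknode(char op, std::shared_ptr<Node> l,
                             std::shared_ptr<Node> r = nullptr) {
    auto n = std::make_shared<Node>();
    n->op = op;
    n->val = 0;
    n->kids.push_back(std::move(l));        // unary ops pass only l
    if (r) n->kids.push_back(std::move(r));
    return n;
}

std::shared_ptr<Node> mkleaf(int v) {
    auto n = std::make_shared<Node>();
    n->op = 'n';                            // a num leaf carries its value
    n->val = v;
    return n;
}

With these, the snippet for E → E1 + E2 simply calls mknode('+', …) on the pointers produced for E1 and E2.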
Intermediate Languages

• We will use another IR, called three-address code, for actual code generation
• The semantic rules for generating three-address code for common programming-language constructs are similar to those for building an AST
Linear IRs

• The alternative to a graphical IR is a linear IR
• An assembly-language program is a form of linear code
• It consists of a sequence of instructions that execute in order of appearance
Linear IRs

Two linear IRs used in modern compilers are
• stack-machine code
• three-address code
Linear IRs

Linear IR for x – 2 × y

stack-machine        three-address
push 2               t1 ← 2
push y               t2 ← y
multiply             t3 ← t1 × t2
push x               t4 ← x
subtract             t5 ← t4 – t3

(In the stack-machine code, subtract takes x from the top of the stack and the 2 × y beneath it, yielding x – 2 × y.)
Stack-Machine Code

• Stack-machine code is sometimes called one-address code
• It assumes the presence of an operand stack
Stack-Machine Code

• Most operations take their operands from the stack and push their results back onto the stack
• Stack-machine code is compact; it eliminates many names from the IR
• This shrinks the program in IR form
Stack-Machine Code

• All results and arguments are transitory unless explicitly moved to memory
• Stack-machine code is simple to generate and execute, as the interpreter sketch below suggests
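A toy interpreter for the five-instruction fragment above; the opcode names and the subtract convention follow this lecture's example, and everything else is a hypothetical sketch:

#include <cctype>
#include <map>
#include <stack>
#include <string>
#include <vector>

struct Ins { std::string op, arg; };   // arg is used only by push

int run(const std::vector<Ins>& code, std::map<std::string, int>& vars) {
    std::stack<int> s;
    for (const auto& i : code) {
        if (i.op == "push") {
            // push either a literal or a variable's current value
            bool literal = std::isdigit((unsigned char)i.arg[0]);
            s.push(literal ? std::stoi(i.arg) : vars[i.arg]);
        } else {
            int top = s.top(); s.pop();
            int next = s.top(); s.pop();
            // multiply is commutative; subtract follows the slide's
            // convention of top − next (x above 2 × y gives x − 2 × y)
            s.push(i.op == "multiply" ? next * top : top - next);
        }
    }
    return s.top();   // value of the whole expression
}

For example, run({{"push","2"}, {"push","y"}, {"multiply"}, {"push","x"}, {"subtract"}}, vars) returns vars["x"] − 2 × vars["y"].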
Stack-Machine Code

• Smalltalk-80 and Java use bytecodes, which are abstract stack-machine code
• The bytecode is either interpreted or translated into target machine code (JIT)
Three-Address Code

In three-address code, most operations have the form

    x ← y op z

with an operator (op), two operands (y and z), and one result (x)
Three-Address Code

• Some operators, such as an immediate load and a jump, need fewer arguments, as the sketch below shows
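One concrete encoding is the classic quadruple, with room for an operator, two operands, and a result; unary and immediate forms simply leave a field empty. A sketch with illustrative field and opcode names:

#include <string>
#include <vector>

struct Quad {
    std::string op;       // "+", "loadI", "jump", ...
    std::string arg1;     // y, or the only argument
    std::string arg2;     // z; empty for unary and immediate forms
    std::string result;   // x, or a branch target
};

// x – 2 × y from the earlier slide, as a quadruple list:
const std::vector<Quad> code = {
    {"loadI", "2",  "",   "t1"},
    {"load",  "y",  "",   "t2"},
    {"mult",  "t1", "t2", "t3"},
    {"load",  "x",  "",   "t4"},
    {"sub",   "t4", "t3", "t5"},
};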
