Slides modified from Louden Book, Dr. Scherger, & Y Chung (NTHU), and Fischer, Leblanc

The Structure of A Compiler (1)
Analysis of the source program
Synthesis of a machine-language program
[Figure: Code Generator -> Target machine code]

The Structure of A Compiler
[Figure: Source Code -> ... -> Syntax Tree -> Semantic Analyzer -> Annotated Tree -> Source Code Optimizer -> Intermediate Code -> Code Generator -> Target Code -> Target Code Optimizer -> Target Code. The Literal Table, Symbol Table, and Error Handler are used by all phases of the compiler.]

This makes the compilation syntax-directed
Semantic routines finish the analysis
  Verify static semantics are followed
  Variables declared, compatible operands (type and number), etc.
Semantic routines also start the synthesis
  Generate either IR or target machine code
The semantic action is attached to the productions (or sub trees of a syntax tree).

Don't need literal parse tree
  Intermediate nodes for precedence and associativity
  ε-rules
Just enough info to drive semantic processing
  Or even recreate input
Semantic processing performed by traversing the tree 1 or more times
Attributes attached to nodes aid semantic processing
4 Chapter 1: Introduction January, 2010 5 Chapter 6:Semantic Analysis April, 2011
1
11/8/2012
7.1.1
Using a Syntax Tree Representation of a Parse (1)
Parsing: [Figure: parse tree rooted at <assign>; visible nodes include <factor> and id]
Semantic processing:
  build and decorate the Abstract Syntax Tree (AST)
Non-terminals used for ease of parsing may be omitted in the abstract syntax tree.
abstract syntax tree:
  :=
    id
    +
      *
        const
        id
      id

Abstract Syntax Tree
Abstract syntax tree for Y:=3*X+I
[Figure: tree with nodes +, *, Id, Const, Id]

Abstract Syntax Tree
Abstract syntax tree for Y:=3*X+I with initial values
[Figure: the same tree with leaves Const(3), Id(X), Id(I)]
7.1.1
Using a Syntax Tree Representation of a Parse (2)
Semantic routines traverse (post-order) the AST, computing attributes of the nodes of AST.
Initially, only leaves (i.e. terminals, e.g. const, id) have attributes
Ex. Y := 3*X + I
  :=
    id(Y)
    +
      *
        const(3)
        id(X)
      id(I)

Abstract Syntax Tree
Initially, attributes only at leaves
Attributes propagate during the static semantic checking
  Processing declarations to build symbol table
  Find symbols in ST to get attributes to attach
  Determining expression/operand types
Declarations propagate top-down
Expressions propagate bottom-up
A tree is decorated after sufficient info for code generation has propagated.

Abstract Syntax Tree
Abstract syntax tree for Y:=3*X+I with propagated values
  :=(itof)
    Id(Y)(f)
    +(i)
      *(i)
        Const(3,i)
        Id(X,i)
      Id(I,i)
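The bottom-up propagation described above can be sketched in C. This is a minimal illustration, not the slides' implementation: the node layout, the names NodeKind, TypeTag, needs_itof, and decorate, and the rule that mixed int/float operands yield float are all assumptions made for this example.

```c
#include <stddef.h>

/* Hypothetical node kinds and type tags, named for this sketch only. */
typedef enum { N_CONST, N_ID, N_ADD, N_MUL, N_ASSIGN } NodeKind;
typedef enum { T_UNKNOWN, T_INT, T_FLOAT } TypeTag;

typedef struct Node {
    NodeKind kind;
    TypeTag type;       /* attribute: preset at leaves, computed at inner nodes */
    int needs_itof;     /* attribute: 1 if an int operand must become float */
    struct Node *left, *right;
} Node;

/* Post-order traversal: decorate both children, then the node itself. */
TypeTag decorate(Node *n)
{
    if (n == NULL) return T_UNKNOWN;
    if (n->kind == N_CONST || n->kind == N_ID)
        return n->type;                     /* leaves already carry a type */

    TypeTag lt = decorate(n->left);
    TypeTag rt = decorate(n->right);

    if (n->kind == N_ASSIGN) {
        n->type = lt;                       /* assignment takes the target's type */
        n->needs_itof = (lt == T_FLOAT && rt == T_INT);
    } else {
        n->type = (lt == T_FLOAT || rt == T_FLOAT) ? T_FLOAT : T_INT;
        n->needs_itof = (n->type == T_FLOAT && lt != rt);
    }
    return n->type;
}
```

For Y := 3*X + I with Y declared float and X, I declared int, the * and + nodes come out as int and the := node is marked for an int-to-float conversion, matching the decorated tree on the slide.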
7.1.1
Using a Syntax Tree Representation of a Parse (3)
The attributes are then propagated to other nodes using some functions, e.g.
  build symbol table
  attach attributes of nodes
  check types, etc.
bottom-up / top-down propagation
[Figure: tree for <program> with a declaration subtree feeding the symbol table and a <stmt> subtree := (id, + (* (const, id), id)); the exp. nodes are annotated with types]
check types: integer * or floating *
Need to consult symbol table for types of id's.

7.1.1
Using a Syntax Tree Representation of a Parse (4)
After attribute propagation is done, the tree is decorated and ready for code generation;
use another pass over the decorated AST to generate code.
Actually, these can be combined in a single pass
  Build the AST
  Decorate the AST
  Generate the target code
What we have described is essentially the Attribute Grammars (AG) (Details in chap. 14)

Static Semantic Checks
Static semantics can be checked at compile time
Check only propagated attributes
  Type compatibility across assignment
    Int B;
    B := 5.2;   illegal
    B := 3;     legal
Use attributes and structure
  Correct number and types of parameters
    procedure foo(int a, float b, int c, float d);
    int C;
    float D;
    call foo(C,D,3,2.9)       legal
    call foo(C,D,3.3, 2.9)    illegal
    call foo(1,2,3,4,5)       illegal
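The parameter check from the Static Semantic Checks slide can be sketched as a comparison of two type lists. This is a minimal sketch under stated assumptions: the names Type and call_is_legal are invented for the example, and no implicit conversions are allowed, which is how the slide's foo examples are classified.

```c
#include <stddef.h>

typedef enum { TY_INT, TY_FLOAT } Type;

/* Returns 1 if the argument types match the declared parameter types
   exactly in number and in type, else 0. No implicit conversions are
   assumed in this sketch. */
int call_is_legal(const Type *params, size_t nparams,
                  const Type *args, size_t nargs)
{
    if (nparams != nargs) return 0;         /* wrong number of arguments */
    for (size_t i = 0; i < nparams; i++)
        if (params[i] != args[i]) return 0; /* type mismatch at position i */
    return 1;
}
```

With parameters (int, float, int, float), the argument list (int, float, int, float) is accepted, (int, float, float, float) is rejected on type, and a five-argument list is rejected on count. A real compiler would obtain both lists from the symbol table and the decorated AST.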
7.1.2
Compiler Organization Alternatives (2)
We prefer the code generator completely hides machine details and semantic routines are independent of machines.
Can be violated to produce better code.
  Suppose there are several classes of registers, each for a different purpose.
  Better for register allocation to be performed by semantic routines than code generator since semantic routines have a broader view of the AST.

Compiler Organization
one-pass with peephole optimization
  Optimizer makes a pass over generated machine code, looking at a small number of instructions at a time
  Allows for simple code generation
  Peephole: looking at only a few instructions at a time
  Effectively a separate pass
  Simple but effective
  Simplifies code generator since there is a pass of post-processing.

Compiler Organization
one-pass analysis and IR synthesis plus a code generation pass
  Adds flexibility
  Explicit IR created & sent to code generator
  IR typically simple
  Optimization can examine as much of IR as wanted
  Less machine-dependent analysis
  So easier to retarget
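A peephole pass of the kind described above can be sketched with a two-instruction window. Everything here is an assumption for illustration: the toy instruction set (LOAD, STORE, ADD), the Instr layout, and the single rewrite rule, which drops a LOAD that immediately follows a STORE of the same register to the same address. The sketch also assumes no jump targets the dropped LOAD.

```c
#include <stddef.h>
#include <string.h>

typedef enum { OP_LOAD, OP_STORE, OP_ADD } Op;

typedef struct {
    Op op;
    int reg;            /* register number */
    char addr[16];      /* name of the memory operand */
} Instr;

/* Slide a two-instruction window over the code and drop a LOAD that
   immediately follows a STORE of the same register to the same address.
   Compacts the array in place and returns the new instruction count. */
size_t peephole(Instr *code, size_t n)
{
    size_t out = 0;
    for (size_t i = 0; i < n; i++) {
        if (out > 0 &&
            code[i].op == OP_LOAD &&
            code[out - 1].op == OP_STORE &&
            code[out - 1].reg == code[i].reg &&
            strcmp(code[out - 1].addr, code[i].addr) == 0)
            continue;                   /* value is already in the register */
        code[out++] = code[i];
    }
    return out;
}
```

Because the pass only ever looks at the previous kept instruction, the code generator can stay simple and emit the redundant pair without worrying about it.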
7.1.2
Compiler Organization
Multi-pass analysis
  Scan, then parse, then check declarations, then static semantics
  Usually used to save space (memory usage or compiler)
Multi-pass synthesis
  Separate out machine dependence
  Better optimization
    Generate IR
    Do machine independent optimization
    Generate machine code
    Machine dependent optimization
Many complicated optimization and code generation algorithms require multiple passes
  i.e. optimizations that need a more global view
    for I = 1 to N
      foo = 35*bar(i)+16;
    bar(i) { return 3;};

7.1.2
Compiler Organization Alternatives (7)
Multi-language and multi-target compilers
  Components may be shared and parameterized.
  Ex : Ada uses Diana (language-dependent IR)
  Ex : GCC uses two IRs.
    one is high-level tree-oriented
    the other (RTL) is more machine-oriented
[Figure: front ends for FORTRAN, PASCAL, ADA, C, ..... feed language- and machine-independent IRs and machine-independent optimization, followed by back ends for SUN, PC, main-frame, .....]

7.1.3
Single Pass (1)
In Micro of chap 2, scanning, parsing and semantic processing are interleaved in a single pass.
  (+) simple front-end
  (+) less storage if no explicit trees
  (-) immediately available information is limited since no complete tree is built.
Relationships
[Figure: scanner passes tokens to the parser; the parser calls semantic rtn 1, semantic rtn 2, ..., semantic rtn k, which exchange semantic records]
7.1.3
Single Pass (2)
Each terminal and non-terminal has a semantic record.
Semantic records may be considered as the attributes of the terminals and non-terminals.
Terminals
  the semantic records are created by the scanner.
Semantic records are transmitted among semantic routines via a semantic stack.
  ex. A -> B C D #SR
[Figure: semantic stack contents for this production]

7.1.3
Single Pass (3)
1 pass = 1 post-order traversal of the parse tree
  parsing actions -- build parse trees
  semantic actions -- post-order traversal
[Figure: parse tree with nodes <assign>, ID:=<exp>, <exp>, +, id (B), <term>, const (1), shown with semantic stack snapshots holding A, B, and 1]
  gencode(+,B,1,tmp1)
  gencode(:=,A,tmp1)

Chapter 6 - Semantic Analysis
Parser verifies that a program is syntactically correct and constructs a syntax tree (or other intermediate representation).
... language requirements (is “meaningful”) and collects and computes information needed for code generation.
Fall, 2002 CS 153 - Chapter 6 27
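The way semantic records travel on a semantic stack can be sketched for the two gencode calls shown on the Single Pass (3) slide. This is a hedged illustration: the records are reduced to plain names, and sem_push, sem_pop, action_add, action_assign, and the code_out buffer are all hypothetical names for this example. There are no overflow checks.

```c
#include <stdio.h>
#include <string.h>

#define STACK_MAX 32
#define NAME_LEN 16

char sem_stack[STACK_MAX][NAME_LEN];   /* semantic records, reduced to names */
int sem_top = 0;
int temp_count = 0;
char code_out[256];                    /* generated code, one line per call */

void sem_push(const char *name)
{
    snprintf(sem_stack[sem_top++], NAME_LEN, "%s", name);
}

void sem_pop(char *dst)
{
    snprintf(dst, NAME_LEN, "%s", sem_stack[--sem_top]);
}

void emit(const char *line)
{
    strncat(code_out, line, sizeof code_out - strlen(code_out) - 1);
}

/* Action for an addition: pop both operands, emit the add into a
   fresh temporary, and push the temporary as the result record. */
void action_add(void)
{
    char rhs[NAME_LEN], lhs[NAME_LEN], tmp[NAME_LEN], line[80];
    sem_pop(rhs);
    sem_pop(lhs);
    snprintf(tmp, sizeof tmp, "tmp%d", ++temp_count);
    snprintf(line, sizeof line, "gencode(+,%s,%s,%s)\n", lhs, rhs, tmp);
    emit(line);
    sem_push(tmp);
}

/* Action for an assignment: pop the source and the target, emit the copy. */
void action_assign(void)
{
    char src[NAME_LEN], dst[NAME_LEN], line[80];
    sem_pop(src);
    sem_pop(dst);
    snprintf(line, sizeof line, "gencode(:=,%s,%s)\n", dst, src);
    emit(line);
}
```

Pushing A, B, and 1 as the parser recognizes them and then calling action_add followed by action_assign leaves the stack empty and the two gencode lines from the slide in code_out.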
Attributes need not be kept in the syntax tree:

GRAMMAR RULE                  SEMANTIC RULES
decl -> type var-list
type -> int                   dtype = integer
type -> float                 dtype = real
var-list1 -> id , var-list2   insert(id.name, dtype)
var-list -> id                insert(id.name, dtype)

dtype is global
Use a symbol table to store the type of each identifier

New traversal code:
typekind dtype; /* global */
void evalType (SyntaxTree t)
{ switch (t->kind)
  { case decl:
      dtype = t->lchild->dtype;
      evalType(t->rchild);
      break;
    case id:
      insert(t->name,dtype);
      if (t->sibling != NULL)
        evalType(t->sibling);
      break;
  } /* end switch */
} /* end evalType */

Even better, use a parameter instead of a global variable:
void evalDecl(SyntaxTree t)
{ evalType(t->rchild, t->lchild->dtype);
}
void evalType(SyntaxTree t, typekind dtype)
{ insert(t->name,dtype);
  if (t->sibling != NULL)
    evalType(t->sibling,dtype);
}

Note: inherited attributes can often be turned into parameters to recursive traversal functions, while synthesized attributes can be turned into returned values.
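The parameter-passing version of the traversal can be made self-contained with a toy symbol table. The evalDecl and evalType bodies follow the slide; the tree node layout, the fixed-size table, and the lookup helper are assumptions added so the sketch compiles on its own.

```c
#include <stddef.h>
#include <string.h>

typedef enum { integer, real } typekind;

/* Hypothetical node layout: only the fields the traversal touches. */
typedef struct treenode {
    const char *name;           /* identifier name (id nodes) */
    typekind dtype;             /* type attribute (type nodes) */
    struct treenode *lchild;    /* decl: the type node */
    struct treenode *rchild;    /* decl: first id of the var-list */
    struct treenode *sibling;   /* id: next id in the var-list */
} *SyntaxTree;

/* Toy symbol table: a fixed array standing in for a real one. */
#define MAXSYMS 32
static struct { const char *name; typekind dtype; } symtab[MAXSYMS];
static int nsyms = 0;

void insert(const char *name, typekind dtype)
{
    if (nsyms < MAXSYMS) {
        symtab[nsyms].name = name;
        symtab[nsyms].dtype = dtype;
        nsyms++;
    }
}

/* Returns 1 and stores the type in *out if name is in the table, else 0. */
int lookup(const char *name, typekind *out)
{
    for (int i = 0; i < nsyms; i++)
        if (strcmp(symtab[i].name, name) == 0) {
            *out = symtab[i].dtype;
            return 1;
        }
    return 0;
}

/* dtype is an inherited attribute, passed down as a parameter. */
void evalType(SyntaxTree t, typekind dtype)
{
    insert(t->name, dtype);
    if (t->sibling != NULL)
        evalType(t->sibling, dtype);
}

void evalDecl(SyntaxTree t)
{
    evalType(t->rchild, t->lchild->dtype);
}
```

For a declaration such as float x, y, calling evalDecl on the decl node records both x and y as real without any global dtype variable.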
Solution: Introduce a new non-terminal.
  <prog> -> <head> begin <stmt> end
  <head> -> #start
YACC automatically performs such transformations.

  enum kind tag;
  union {
    op_rec_type OP_REC;
    exp_rec_type EXP_REC;
    stmt_rec_type STMT_REC;
    ......
  }
} sem_rec_type;

The solution at next page…….
Sample hash function code:

#define SIZE 211 // typically a prime number
#define SHIFT 4
int hash ( char * key )
{ int temp = 0;
  int i = 0;
  while (key[i] != '\0')
  { temp = ((temp << SHIFT) + key[i]) % SIZE;
    ++i;
  }
  return temp;
}

Easy way to get O(1) behavior when exiting a scope: use a linked list (or tree or…) of hash tables, one hash table for each scope:
[Figure: linked hash tables, one per scope, with entries i (char), j (char *), temp (char), f (function), i (int), size (int), j (int)]

Some structure similar to the previous slide is actually required in C++, Ada, and other languages where scopes can be arbitrarily re-entered (C++ has the scope resolution operator ::), since individual scopes must be attached to names, allowing them to be “called”:

class A { void f(); };
...
void A::f() // go back inside A
{ ... }
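The linked list of hash tables mentioned above can be sketched as follows. This is one possible arrangement, not the slides' code: Scope, Entry, enter_scope, exit_scope, declare, and lookup are names invented here, types are stored as strings for brevity, and the hash function is the slide's with an added unsigned char cast so the bucket index cannot go negative.

```c
#include <stdlib.h>
#include <string.h>

#define SIZE 211
#define SHIFT 4

typedef struct Entry {
    const char *name;
    const char *type;
    struct Entry *next;         /* chain within one bucket */
} Entry;

typedef struct Scope {
    Entry *buckets[SIZE];
    struct Scope *enclosing;    /* link to the enclosing scope's table */
} Scope;

static Scope *current = NULL;   /* innermost open scope */

static int hash(const char *key)
{
    int temp = 0;
    for (int i = 0; key[i] != '\0'; i++)
        temp = ((temp << SHIFT) + (unsigned char)key[i]) % SIZE;
    return temp;
}

/* Open a new innermost scope; returns 0 if allocation fails, else 1. */
int enter_scope(void)
{
    Scope *s = calloc(1, sizeof *s);
    if (s == NULL) return 0;
    s->enclosing = current;
    current = s;
    return 1;
}

/* Unlinking the table is the constant-time step; releasing its
   memory afterwards is proportional to the table's contents. */
void exit_scope(void)
{
    Scope *s = current;
    if (s == NULL) return;
    current = s->enclosing;
    for (int b = 0; b < SIZE; b++) {
        Entry *e = s->buckets[b];
        while (e != NULL) {
            Entry *next = e->next;
            free(e);
            e = next;
        }
    }
    free(s);
}

/* Add a name to the innermost scope; returns 0 on failure, else 1. */
int declare(const char *name, const char *type)
{
    if (current == NULL) return 0;
    Entry *e = malloc(sizeof *e);
    if (e == NULL) return 0;
    int h = hash(name);
    e->name = name;
    e->type = type;
    e->next = current->buckets[h];
    current->buckets[h] = e;
    return 1;
}

/* Search from the innermost scope outward; NULL if not found. */
const char *lookup(const char *name)
{
    int h = hash(name);
    for (Scope *s = current; s != NULL; s = s->enclosing)
        for (Entry *e = s->buckets[h]; e != NULL; e = e->next)
            if (strcmp(e->name, name) == 0) return e->type;
    return NULL;
}
```

Declaring i as int in an outer scope and as char in an inner one makes lookup return the char entry until exit_scope is called, after which the outer int entry is visible again.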
Fall, 2002 CS 153 - Chapter 6 - Part 2
Two additional scope issues (of many):
Recursion: insertion into table must occur before processing is complete:
  // lookup of f in body must work:
  void f() { … f() … }
Relaxation of declaration before use rule (C++ and Java class scopes): all insertions must occur before all lookups (two passes required):
  class A
  { int f() { return x; } int x; }

One more scope issue: dynamic scope
Some languages use a run-time version of scope that does not follow the layout of the program on the page, but the execution path: LISP, perl.
Symbol table then must be part of runtime system, providing lookup of names during execution (it better be really fast in this case).

Called “dynamic scope” (vs. the more usual lexical or static scope).
A questionable design choice for any but the most dynamic, interpreted languages, since there can then be no static semantic analysis (no static type checking, for example)
Running the symbol table during execution also slows down execution speed substantially
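Dynamic scope can be simulated in C with a run-time binding stack, which shows why the symbol table has to live in the runtime system. This is only a simulation under stated assumptions: dyn_bind, dyn_unbind, dyn_lookup, and the functions f and g are hypothetical, and C itself is statically scoped.

```c
#include <stddef.h>
#include <string.h>

#define MAX_BINDINGS 64

typedef struct { const char *name; int value; } Binding;

static Binding bindings[MAX_BINDINGS];   /* run-time symbol table */
static int nbindings = 0;

/* Executed when a declaration is reached at run time; 0 if full. */
int dyn_bind(const char *name, int value)
{
    if (nbindings >= MAX_BINDINGS) return 0;
    bindings[nbindings].name = name;
    bindings[nbindings].value = value;
    nbindings++;
    return 1;
}

/* Executed when the declaring block or function exits. */
void dyn_unbind(void)
{
    if (nbindings > 0) nbindings--;
}

/* The most recent binding on the execution path wins. */
int dyn_lookup(const char *name, int *out)
{
    for (int i = nbindings - 1; i >= 0; i--)
        if (strcmp(bindings[i].name, name) == 0) {
            *out = bindings[i].value;
            return 1;
        }
    return 0;
}

/* f reads x; which x it sees depends on who called it. */
int f(void)
{
    int v = 0;
    dyn_lookup("x", &v);
    return v;
}

/* g declares its own x, then calls f. */
int g(void)
{
    dyn_bind("x", 2);
    int r = f();
    dyn_unbind();
    return r;
}
```

With a global x bound to 1, a direct call to f yields 1, while a call through g yields 2. Under static scope f would see the global x in both cases, so the difference cannot be resolved at compile time.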
Sample TINY code building the symbol table:

case AssignK:
case ReadK:
  if (st_lookup(t->attr.name) == -1)
    /* not yet in table, so treat as new definition */
    st_insert(t->attr.name,t->lineno,location++);
  else
    /* already in table, so ignore location,
       add line number of use only */
    st_insert(t->attr.name,t->lineno,0);
  break;

C-Minus Symbol Table
Use basic structure of TINY
Store tree pointers
Add enterScope() and exitScope()
  List of tables structure helpful (slide 15)
Add nesting level to tree nodes
Add pointer to declaration in all ID nodes (found by lookup)
Use best ADT methods (hide all details of actual symtab structure)

Sample C-Minus symtab.h:

/* Start a new scope; return 0 if malloc fails,
   else 1 */
int st_enterScope(void);

/* Remove all declarations in the current scope */
void st_exitScope(void);

/* Insert def nodes from the syntax tree;
   return 0 if malloc fails, else 1 */
int st_insert( TreePtr );

/* Return the defnode of a variable, parameter, or
   function, or NULL if not found */
TreePtr st_lookup ( char * name );
{ /* Position 3 */

Lookup of c after position 1 produces the following tree with link:
[Figure: syntax tree with declaration nodes labeled d1 through d7 and a link from the use of c to its declaration]

Symbol Table at Position 2:
[Figure: symbol tables for nestLevel 3, nestLevel 2, nestLevel 1, and nestLevel 0]

Lookups of a, b, c, and d after position 2 produce the following tree with links:
[Figure: the same syntax tree with links from the uses of a, b, c, and d to their declarations]
11/11/02 K. Louden, CS 153, Fall 2002 124 11/11/02 K. Louden, CS 153, Fall 2002 125 11/11/02 K. Louden, CS 153, Fall 2002 126
[Figure: symbol tables for nestLevel 2, nestLevel 1, and nestLevel 0 beside the syntax tree; visible entries include main (d7), a (d1), b (d2), and c (d3)]