
The Structure of a Compiler

The document discusses the structure and stages of a compiler. It summarizes: 1) A compiler must perform analysis of the source program and synthesis of a machine-language program. 2) Semantic processing builds an abstract syntax tree representing the input program and performs semantic analysis by traversing the tree. 3) Semantic routines attached to syntax tree productions interpret meaning based on structure and verify semantics, generating intermediate representation or target code.

Uploaded by

lizeth cruz dl
Copyright
© All Rights Reserved

11/8/2012

Compiler Design and Construction
Semantic Analysis

Slides modified from Louden Book, Dr. Scherger, & Y Chung (NTHU), and Fischer, Leblanc

The Structure of a Compiler (1)
- Any compiler must perform two major tasks:
  - Analysis of the source program
  - Synthesis of a machine-language program
[Figure: the Compiler, split into Analysis and Synthesis.]

The Structure of a Compiler (2)
[Figure: Source Program (Character Stream) → Scanner → Tokens → Parser → Syntactic Structure → Semantic Routines → Intermediate Representation → Optimizer → Code Generator → Target machine code. The Symbol and Attribute Tables are used by all phases of the compiler.]
Compiler Stages
[Figure: Source Code → Scanner → Tokens → Parser → Syntax Tree → Semantic Analyzer → Annotated Tree → Source Code Optimizer → Intermediate Code → Code Generator → Target Code → Target Code Optimizer → Target Code. The Literal Table, Symbol Table, and Error Handler serve all stages.]
(Chapter 1: Introduction, January 2010)

Semantic Processing
- Semantic routines interpret meaning based on the syntactic structure of the input.
  - This makes the compilation syntax-directed.
- Semantic routines finish the analysis.
  - Verify that the static semantics are followed: variables declared, compatible operands (type and number of operands), etc.
- Semantic routines also start the synthesis.
  - Generate either IR or target machine code.
- The semantic action is attached to the productions (or to subtrees of a syntax tree).
(Chapter 6: Semantic Analysis, April 2011)

Abstract Syntax Tree
- The first step in semantic processing is to build a syntax tree representing the input program (modern compilers do this).
- The literal parse tree is not needed; the syntax tree can omit:
  - intermediate nodes that exist only for precedence and associativity
  - ε-rules
- It keeps just enough information to drive semantic processing,
  - or even to recreate the input.
- Semantic processing is performed by traversing the tree one or more times.
- Attributes attached to nodes aid semantic processing.


7.1.1 Using a Syntax Tree Representation of a Parse (1)
- Parsing:
  - Build the parse tree.
  - Non-terminals for operator precedence and associativity are included.
- Semantic processing:
  - Build and decorate the Abstract Syntax Tree (AST).
  - Non-terminals used only for ease of parsing may be omitted in the abstract syntax tree.

Parse tree:
    <assign>
      <target>
        id
      :=
      <exp>
        <exp>
          <term>
            <term>
              <factor>
                const
            *
            <factor>
              id
        +
        <term>
          <factor>
            id

Abstract syntax tree:
    :=
      id
      +
        *
          const
          id
        id

Abstract syntax tree for Y:=3*X+I with initial values:
    :=
      Id(Y)
      +
        *
          Const(3)
          Id(X)
        Id(I)

Abstract Syntax Tree
- Initially, attributes exist only at the leaves.
- Attributes propagate during static semantic checking:
  - Processing declarations to build the symbol table
  - Finding symbols in the symbol table to get attributes to attach
  - Determining expression/operand types
- Declarations propagate top-down.
- Expressions propagate bottom-up.
- A tree is decorated once sufficient information for code generation has propagated.

Abstract syntax tree for Y:=3*X+I with propagated values (i = integer, f = float, itof = integer-to-float conversion):
    :=(itof)
      Id(Y)(f)
      +(i)
        *(i)
          Const(3,i)
          Id(X,i)
        Id(I,i)

7.1.1 Using a Syntax Tree Representation of a Parse (2)
- Semantic routines traverse the AST in post-order, computing attributes of the AST nodes.
- Initially, only leaves (i.e., terminals such as const and id) have attributes.
Ex. Y := 3*X + I
    :=
      id(Y)
      +
        *
          const(3)
          id(X)
        id(I)
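The post-order decoration described above can be sketched in C. This is a minimal illustration, not code from the slides: the node kinds, the helpers `mk`, `lookup_type` and `decorate`, and the assumed declarations `float Y; int X, I;` are all hypothetical, chosen so that the result matches the `:=(itof)` annotation in the figure.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical AST for Y := 3*X + I; kinds and types are illustrative only. */
typedef enum { N_ASSIGN, N_ADD, N_MUL, N_CONST, N_ID } NodeKind;
typedef enum { T_INT, T_FLOAT } Type;

typedef struct Node {
    NodeKind kind;
    struct Node *left, *right;  /* children; NULL for leaves */
    const char *name;           /* identifier name (N_ID only) */
    Type type;                  /* attribute filled in by decorate() */
    int itof;                   /* on :=, set when an int value is stored into a float */
} Node;

static Node mk(NodeKind kind, Node *left, Node *right, const char *name) {
    Node n = { kind, left, right, name, T_INT, 0 };
    return n;
}

/* Toy symbol table, assuming the declarations: float Y; int X, I; */
static Type lookup_type(const char *name) {
    return strcmp(name, "Y") == 0 ? T_FLOAT : T_INT;
}

/* Post-order traversal: decorate both children, then the node itself,
   so leaves get attributes first and operators combine them bottom-up. */
static void decorate(Node *n) {
    if (n == NULL) return;
    decorate(n->left);
    decorate(n->right);
    switch (n->kind) {
    case N_CONST:
        n->type = T_INT;                      /* integer literal such as 3 */
        break;
    case N_ID:
        n->type = lookup_type(n->name);       /* attribute from the symbol table */
        break;
    case N_ADD:
    case N_MUL:                               /* int op int gives int, else float */
        n->type = (n->left->type == T_INT && n->right->type == T_INT) ? T_INT : T_FLOAT;
        break;
    case N_ASSIGN:                            /* target type wins; record any conversion */
        n->type = n->left->type;
        n->itof = (n->left->type == T_FLOAT && n->right->type == T_INT);
        break;
    }
}
```

Because `decorate` visits children before the parent, every operator sees fully typed operands, which is why expression attributes are described as propagating bottom-up.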


7.1.1 Using a Syntax Tree Representation of a Parse (3)
- The attributes are then propagated to other nodes using some functions, e.g.:
  - build the symbol table
  - attach attributes to nodes
  - check types, etc.
- Propagation is bottom-up or top-down.
[Figure: for <program> → declaration <stmt>, the declaration fills the symbol table. In the statement's tree (:= over id and +; + over * and id; * over const and id), each id must consult the symbol table for its type, expression types flow upward, and the type check at * decides between integer * and floating *.]

7.1.1 Using a Syntax Tree Representation of a Parse (4)
- After attribute propagation is done, the tree is decorated and ready for code generation; another pass over the decorated AST generates the code.
- Actually, these can be combined in a single pass:
  - Build the AST
  - Decorate the AST
  - Generate the target code
- What we have described is essentially Attribute Grammars (AG) (details in chap. 14).

Static Semantic Checks
- Static semantics can be checked at compile time.
- Check only propagated attributes:
  - Type compatibility across assignment
        int B;
        B := 5.2;    illegal
        B := 3;      legal
- Use attributes and structure:
  - Correct number and types of parameters
        procedure foo(int a, float b, int c, float d);
        int C;
        float D;
        call foo(C,D,3,2.9)      legal
        call foo(C,D,3.3,2.9)    illegal
        call foo(1,2,3,4,5)      illegal
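A sketch of the two static checks above, assuming the only types are integer and float and that an int value may be widened to float (as the `itof` conversion on Y := 3*X + I suggests). The names `Type`, `assignable` and `call_ok` are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

typedef enum { T_INT, T_FLOAT } Type;

/* Assignment compatibility: int := float is illegal (B := 5.2),
   float := int is assumed legal via an implicit int-to-float conversion. */
static int assignable(Type target, Type value) {
    return target == value || (target == T_FLOAT && value == T_INT);
}

/* A call is legal if the argument count matches and every actual
   argument is assignable to the corresponding formal parameter. */
static int call_ok(const Type *formals, size_t nformals,
                   const Type *actuals, size_t nactuals) {
    if (nformals != nactuals) return 0;                      /* wrong number */
    for (size_t k = 0; k < nformals; k++)
        if (!assignable(formals[k], actuals[k])) return 0;   /* wrong type */
    return 1;
}
```

Both checks use only attributes that propagation has already placed on the tree (the declared parameter types and the synthesized argument types), which is what makes them static.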

Dynamic Semantic Checks
- Some checks can't be done at compile time:
  - Array bounds, arithmetic errors, valid addresses of pointers, variables initialized before use.
- Some languages allow explicit dynamic semantic checks,
  - e.g. assert denominator not = 0.
- These are handled by the semantic routines inserting code to check for these semantics.
- Violating dynamic semantics results in exceptions.

Translation
- The translation task uses attributes as data, but it is driven by the structure.
- Translation output can take several forms:
  - Machine code
  - Intermediate representation
  - The decorated tree itself
- The output is sent to the optimizer or code generator.

Compiler Organization
- One-pass compiler
  - A single pass is used for both analysis and synthesis.
  - Scanning, parsing, checking, and translation are all interleaved.
  - No explicit IR is generated.
  - Semantic routines must generate machine code.
  - Only simple optimizations can be performed.
  - Tends to be less portable.
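To illustrate the dynamic semantic checks described above, the sketch below shows in plain C the kind of code a semantic routine might insert around a division and an array access. The status codes and function names are hypothetical; a real compiler would emit equivalent target or IR instructions rather than C calls, and would raise an exception instead of returning a code.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical status codes for the inserted run-time checks. */
enum { CHK_OK = 0, CHK_DIV_ZERO = 1, CHK_BOUNDS = 2 };

/* What code for "q := a / b" might do with an inserted check:
   test the denominator before dividing. */
static int checked_div(int a, int b, int *q) {
    if (b == 0) return CHK_DIV_ZERO;   /* a real runtime would raise an exception */
    *q = a / b;
    return CHK_OK;
}

/* What code for "x := arr[i]" might do with an inserted bounds check. */
static int checked_index(const int *arr, size_t len, size_t i, int *x) {
    if (i >= len) return CHK_BOUNDS;
    *x = arr[i];
    return CHK_OK;
}
```

The checks cost time on every execution, which is one reason compilers try to prove them unnecessary statically when they can.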


7.1.2 Compiler Organization Alternatives (2)
- We prefer that the code generator completely hide machine details and that the semantic routines be independent of machines.
- This can be violated to produce better code.
  - Suppose there are several classes of registers, each for a different purpose.
  - Then it is better for register allocation to be performed by the semantic routines than by the code generator, since the semantic routines have a broader view of the AST.

Compiler Organization
- One-pass with peephole optimization
  - The optimizer makes a pass over the generated machine code, looking at a small number of instructions at a time.
  - Allows for simple code generation.
  - Peephole: looking at only a few instructions at a time.
  - Effectively a separate pass.
  - Simple but effective.
  - Simplifies the code generator, since there is a pass of post-processing.

Compiler Organization
- One-pass analysis and IR synthesis plus a code generation pass
  - Adds flexibility.
  - An explicit IR is created and sent to the code generator.
  - The IR is typically simple.
  - Optimization can examine as much of the IR as wanted.
  - Less machine-dependent analysis, so easier to retarget.
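A toy version of the peephole idea: slide a two-instruction window over generated code and rewrite known wasteful patterns. The accumulator instruction set (`LOAD`, `STORE`, `ADD`, `JUMP`, `LABEL`) and both patterns are assumptions made for this sketch, not taken from the slides.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical accumulator-machine instructions. */
typedef enum { LOAD, STORE, ADD, JUMP, LABEL } Op;
typedef struct { Op op; const char *arg; } Instr;

/* One peephole pass with a two-instruction window. Rewrites code[] in
   place and returns the new length.
   Pattern 1: STORE x; LOAD x  ->  STORE x   (accumulator already holds x)
   Pattern 2: JUMP L;  LABEL L ->  LABEL L   (jump to the next instruction) */
static size_t peephole(Instr *code, size_t n) {
    size_t out = 0;
    for (size_t k = 0; k < n; k++) {
        if (out > 0) {
            Instr *prev = &code[out - 1];
            if (prev->op == STORE && code[k].op == LOAD &&
                strcmp(prev->arg, code[k].arg) == 0)
                continue;                     /* drop the redundant LOAD */
            if (prev->op == JUMP && code[k].op == LABEL &&
                strcmp(prev->arg, code[k].arg) == 0) {
                *prev = code[k];              /* the LABEL replaces the JUMP */
                continue;
            }
        }
        code[out++] = code[k];
    }
    return out;
}
```

Because the window only looks at adjacent instructions, the code generator can stay naive and leave such local clean-up to this separate post-processing pass.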

Compiler Organization
- Multi-pass analysis
  - Scan, then parse, then check declarations, then static semantics.
  - Usually used to save space (memory usage of the compiler).
- Multi-pass synthesis
  - Separate out machine dependence.
  - Better optimization:
    - Generate IR.
    - Do machine-independent optimization.
    - Generate machine code.
    - Do machine-dependent optimization.
- Many complicated optimization and code generation algorithms require multiple passes,
  - i.e. optimizations that need a more global view, e.g.:
        for I = 1 to N
            foo = 35*bar(i)+16;
        bar(i) { return 3;};

7.1.2 Compiler Organization Alternatives (7)
- Multi-language and multi-target compilers
  - Components may be shared and parameterized.
  - Ex: Ada uses Diana (a language-dependent IR).
  - Ex: GCC uses two IRs:
    - one is high-level and tree-oriented;
    - the other (RTL) is more machine-oriented.
[Figure: front ends for FORTRAN, PASCAL, ADA, C, ... produce language- and machine-independent IRs, which go through machine-independent optimization and then to back ends for SUN, PC, main-frame, ...]

7.1.3 Single Pass (1)
- In Micro of chap. 2, scanning, parsing and semantic processing are interleaved in a single pass.
  - (+) Simple front end.
  - (+) Less storage if no explicit trees are built.
  - (-) Immediately available information is limited, since no complete tree is built.
- Relationships:
[Figure: the parser calls the scanner to get tokens and calls semantic routines 1 through k; the semantic routines exchange semantic records.]


7.1.3 Single Pass (2)
- Each terminal and non-terminal has a semantic record.
- Semantic records may be considered as the attributes of the terminals and non-terminals.
  - Terminals: the semantic records are created by the scanner.
  - Non-terminals: the semantic records are created by a semantic routine when a production is recognized.
- Semantic records are transmitted among semantic routines via a semantic stack.
ex. A := B + 1
    <assign> → ID := <exp>
    <exp> → <exp> + <term>
    <exp> → id
    <term> → const
[Figure: the parse tree for A := B + 1 with ID(A), id(B) and const(1), and the semantic stack as the records for A, B and 1 are pushed. The routine for + calls gencode(+,B,1,tmp1), and the routine for := calls gencode(:=,A,tmp1).]

7.1.3 Single Pass (3)
- 1 pass = 1 post-order traversal of the parse tree
  - parsing actions: build parse trees
  - semantic actions: post-order traversal

Chapter 6 - Semantic Analysis
- The parser verifies that a program is syntactically correct and constructs a syntax tree (or other intermediate representation).
- The semantic analyzer checks that the program satisfies all other static language requirements (is "meaningful") and collects and computes information needed for code generation.
(Fall, 2002 CS 153 - Chapter 6)
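The semantic-stack behaviour in the A := B + 1 example from Single Pass (2) can be sketched as follows. Records are reduced to operand names, and `push`, `pop`, `gen_add` and `gen_assign` are hypothetical routines; `gencode` only mimics the tuple notation on the slide by appending text to a buffer.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Minimal semantic record: just the name of an operand (id, literal or temp). */
typedef struct { char name[16]; } SemRec;

static SemRec stack[32];   /* the semantic stack */
static int top = 0;
static int ntemps = 0;
static char out[256];      /* generated tuples, one per line */

static void push(const char *name) {
    assert(top < 32);
    snprintf(stack[top++].name, sizeof stack[0].name, "%s", name);
}

static SemRec pop(void) {
    assert(top > 0);
    return stack[--top];
}

/* Append "(op,a,b,c)" or, when c is NULL, "(op,a,b)". */
static void gencode(const char *op, const char *a, const char *b, const char *c) {
    size_t len = strlen(out);
    if (c != NULL) snprintf(out + len, sizeof out - len, "(%s,%s,%s,%s)\n", op, a, b, c);
    else           snprintf(out + len, sizeof out - len, "(%s,%s,%s)\n", op, a, b);
}

/* Routine for <exp> -> <exp> + <term>: pop two operand records, emit the
   addition, and push a record for the temporary holding the result. */
static void gen_add(void) {
    SemRec right = pop();
    SemRec left = pop();
    char tmp[16];
    snprintf(tmp, sizeof tmp, "tmp%d", ++ntemps);
    gencode("+", left.name, right.name, tmp);
    push(tmp);
}

/* Routine for <assign> -> ID := <exp>: pop value and target, emit the store. */
static void gen_assign(void) {
    SemRec value = pop();
    SemRec target = pop();
    gencode(":=", target.name, value.name, NULL);
}
```

In a single-pass compiler the parser would call `push` when the scanner delivers ID(A), id(B) and const(1), and the two routines when the corresponding productions are recognized.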

Important Semantic Information
- Symbol table: collects declaration and scope information to satisfy the "declaration before use" rule, and to establish the data type and other properties of names in a program.
- Data types and type checking: compute data types for all typed language entities and check that the language rules on types are satisfied.

How to build the symbol table and check types:
- Analyze the scope rules for the language and determine an appropriate table structure for maintaining this information.
- Analyze the type requirements and translate them into rules that can be applied recursively on a syntax tree.

Theoretical framework for semantic analysis
- Focus on attributes: computable properties of language constructs that are needed to satisfy language requirements and/or generate code.
- Describe the computation of attributes using equations or algorithms.
- Associate these equations to grammar rules and/or kinds of nodes in a syntax tree.


- Analyze the structure of the equations to determine an order in which the attributes can be computed (tree traversals of the syntax tree: preorder, postorder, inorder, or some combination of them).
- Such a set of equations as described is called an attribute grammar.
- While much can be done without a formal framework, the formality of equations can help the process considerably.
- Nevertheless, there is currently no tool in standard use that allows this process to be automated (languages differ too much in their requirements).

Example of an attribute grammar
Grammar:
    exp → exp + term | exp - term | term
    term → term * factor | factor
    factor → ( exp ) | number
Attribute Grammar:
    GRAMMAR RULE                 SEMANTIC RULES
    exp1 → exp2 + term           exp1.val = exp2.val + term.val
    exp1 → exp2 - term           exp1.val = exp2.val - term.val
    exp → term                   exp.val = term.val
    term1 → term2 * factor       term1.val = term2.val * factor.val
    term → factor                term.val = factor.val
    factor → ( exp )             factor.val = exp.val
    factor → number              factor.val = number.val
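One way to compute the synthesized attribute `val` of this attribute grammar is to let each parsing function return it. This is a sketch under stated assumptions: the left-recursive rules are rewritten as loops for recursive descent (which keeps left associativity), numbers are non-negative integers, and there is no whitespace or error handling. The function names are hypothetical.

```c
#include <assert.h>
#include <ctype.h>

static const char *p;                  /* next unread input character */

static int exp_val(void);

/* factor -> ( exp ) | number */
static int factor_val(void) {
    if (*p == '(') {
        p++;                           /* consume '(' */
        int val = exp_val();           /* factor.val = exp.val */
        p++;                           /* consume ')' */
        return val;
    }
    int val = 0;                       /* factor.val = number.val */
    while (isdigit((unsigned char)*p))
        val = val * 10 + (*p++ - '0');
    return val;
}

/* term -> term * factor | factor, left recursion written as a loop */
static int term_val(void) {
    int val = factor_val();            /* term.val = factor.val */
    while (*p == '*') {
        p++;
        val = val * factor_val();      /* term1.val = term2.val * factor.val */
    }
    return val;
}

/* exp -> exp + term | exp - term | term */
static int exp_val(void) {
    int val = term_val();              /* exp.val = term.val */
    while (*p == '+' || *p == '-') {
        char op = *p++;
        int t = term_val();
        val = (op == '+') ? val + t : val - t;   /* exp1.val = exp2.val +/- term.val */
    }
    return val;
}

static int evaluate(const char *src) {
    p = src;
    return exp_val();
}
```

For example, `evaluate("10-2-3")` yields 5, because the loop combines operands left to right just as the left-recursive rule exp1 → exp2 - term does.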

Notes:
- Different instances of the same nonterminal must be subscripted to distinguish them.
- Some attributes must have been precomputed (by the scanner or parser), e.g. number.val.
- These particular attribute equations look a lot like a yacc specification, because they represent a bottom-up attribute computation.

A Second Example
Grammar:
    decl → type var-list
    type → int | float
    var-list → id , var-list | id
Attribute Grammar:
    GRAMMAR RULE                   SEMANTIC RULES
    decl → type var-list           var-list.dtype = type.dtype
    type → int                     type.dtype = integer
    type → float                   type.dtype = real
    var-list1 → id , var-list2     id.dtype = var-list1.dtype
                                   var-list2.dtype = var-list1.dtype
    var-list → id                  id.dtype = var-list.dtype

Notes
- The data type typically propagates down a syntax tree via declarations.
  - This is no longer something yacc can handle directly.
- Such an attribute is called inherited, while bottom-up calculation is called synthesized.
- The syntax tree is a standard synthesized attribute computable by yacc; other attributes are computed on the tree.


Dependency graph
- Indicates the order in which attributes must be computed.
- Synthesized attributes always flow from children to parents, and can always be computed by a postorder traversal.
- Inherited attributes can flow any other way.
- L-attributed: a left-to-right traversal suffices to compute attributes. However, this may involve a combination of preorder, inorder, and postorder traversal.

Data type dependencies (by grammar rule):
- decl → type var-list:
      var-list.dtype = type.dtype
  [Figure: under decl, an arrow from type.dtype to var-list.dtype.]
- var-list1 → id , var-list2:
      id.dtype = var-list1.dtype
      var-list2.dtype = var-list1.dtype
  [Figure: arrows from var-list1.dtype down to id.dtype and to var-list2.dtype.]

L-attributed dependencies have three basic mechanisms:
  (a) Inheritance from parent to siblings
  (b) Inheritance from sibling to sibling via the parent
  (c) Sibling inheritance via sibling pointers
[Figure: a node A with children B and C, showing attribute a passing (a) from A to B and C, (b) from B to C through A, and (c) from B directly to C.]

Sample tree structure:
    typedef enum {decl,type,id} nodekind;
    typedef enum {integer,real} typekind;
    typedef struct treeNode
    { nodekind kind;
      struct treeNode * lchild, * rchild, * sibling;
      typekind dtype;  /* for type and id nodes */
      char * name;     /* for id nodes only */
    } * SyntaxTree;

Sample tree instance:
String: float x, y
Tree:
    decl
      lchild: type (dtype = real)
      rchild: id (x), whose sibling is id (y)

Traversal code:
    void evalType (SyntaxTree t)
    { switch (t->kind)
      { case decl:
          t->rchild->dtype = t->lchild->dtype;
          evalType(t->rchild);
          break;
        case id:
          if (t->sibling != NULL)
          { t->sibling->dtype = t->dtype;
            evalType(t->sibling);
          }
          break;
        default: /* type nodes need no work */
          break;
      } /* end switch */
    } /* end evalType */


Attributes need not be kept in the syntax tree:
    GRAMMAR RULE                   SEMANTIC RULES
    decl → type var-list
    type → int                     dtype = integer
    type → float                   dtype = real
    var-list1 → id , var-list2     insert(id.name, dtype)
    var-list → id                  insert(id.name, dtype)
- dtype is global.
- Use a symbol table to store the type of each identifier.

New traversal code:
    typekind dtype; /* global */
    void evalType (SyntaxTree t)
    { switch (t->kind)
      { case decl:
          dtype = t->lchild->dtype;
          evalType(t->rchild);
          break;
        case id:
          insert(t->name,dtype);
          if (t->sibling != NULL)
            evalType(t->sibling);
          break;
        default:
          break;
      } /* end switch */
    } /* end evalType */

Even better, use a parameter instead of a global variable:
    void evalDecl(SyntaxTree t)
    { evalType(t->rchild, t->lchild->dtype);
    }

    void evalType(SyntaxTree t, typekind dtype)
    { insert(t->name,dtype);
      if (t->sibling != NULL)
        evalType(t->sibling,dtype);
    }

Note: inherited attributes can often be turned into parameters to recursive traversal functions, while synthesized attributes can be turned into returned values.
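A compilable version of the parameter-passing variant, assuming a stand-in symbol table: `table`, `insert` and `lookup` are hypothetical (the slides call `insert` without showing it). The tree for `float x, y` is built by hand in the usage example.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

typedef enum { decl, type, id } nodekind;
typedef enum { integer, real } typekind;

typedef struct treeNode {
    nodekind kind;
    struct treeNode *lchild, *rchild, *sibling;
    typekind dtype;   /* for type and id nodes */
    char *name;       /* for id nodes only */
} *SyntaxTree;

/* Hypothetical stand-in symbol table: a small array of (name, type) pairs. */
static struct { const char *name; typekind dtype; } table[16];
static int nentries = 0;

static void insert(const char *name, typekind dtype) {
    assert(nentries < 16);
    table[nentries].name = name;
    table[nentries].dtype = dtype;
    nentries++;
}

static int lookup(const char *name, typekind *dtype) {
    for (int k = nentries - 1; k >= 0; k--)
        if (strcmp(table[k].name, name) == 0) { *dtype = table[k].dtype; return 1; }
    return 0;
}

/* The inherited attribute dtype travels down the sibling list as a parameter. */
static void evalType(SyntaxTree t, typekind dtype) {
    insert(t->name, dtype);
    if (t->sibling != NULL)
        evalType(t->sibling, dtype);
}

static void evalDecl(SyntaxTree t) {
    evalType(t->rchild, t->lchild->dtype);
}
```

Passing `dtype` as an argument removes the global variable and makes each recursive call's inherited value explicit.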

Alternative to a difficult inherited situation (not recommended):
Theorem (Knuth [1968]). Given an attribute grammar, all inherited attributes can be changed into synthesized attributes by suitable modification of the grammar, without changing the language of the grammar.

Example:
New grammar for types:
    decl → var-list id
    var-list → var-list id , | type
    type → int | float
New tree for float x, y might be:
    decl
      var-list (dtype = real)
        var-list (dtype = real)
          type (dtype = real)
        id (x)
        ,
      id (y)

Our approach:
- Compute inherited stuff first (the symbol table) in a separate pass.
- Then type inference and type checking turn into a purely synthesized attribute computation, since all uses of names have their types already computed.
- Next:
  - Symbol table structure
  - Synthesized type rules


7.2.2 LR(1) - (1)
- Semantic routines are invoked only when a structure is recognized.
- LR parsing:
  - A structure is recognized when the RHS is reduced to the LHS.
  - Therefore, action symbols must be placed at the end.
Ex:
    <stmt> → if <cond> then <stmt> end #ifThen
    <stmt> → if <cond> then <stmt> else <stmt> end #ifThenElse

7.2.2 LR(1) - (2)
- After shifting "if <cond>", the parser cannot decide which of #ifThen and #ifThenElse should be invoked.
- cf. In LL parsing, the structure is recognized when a non-terminal is expanded.

7.2.2 LR(1) - (3)
- However, sometimes we do need to perform semantic actions in the middle of a production.
Ex:
    <stmt> → if <exp> then <stmt> end
    (generate code for <exp>; a conditional jump is needed here; then generate code for <stmt>)
Solution: use two productions:
    <stmt> → <if head> then <stmt> end #finishIf
    <if head> → if <exp> #startIf
<if head> is a semantic hook (used only for semantic processing).
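A sketch of what #startIf and #finishIf might do with the two-production solution. The emitted instruction `JUMP_IF_FALSE`, the textual output, and the label stack are assumptions for illustration; the label stack stands in for the semantic record of <if head>, so nested ifs get distinct labels.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

static char code[512];               /* emitted code, one instruction per line */
static int label_stack[32];          /* stands in for <if head>'s semantic record */
static int nlabels = 0, next_label = 0;

static void emit(const char *text) {
    size_t len = strlen(code);
    snprintf(code + len, sizeof code - len, "%s", text);
}

/* #startIf: runs when <if head> -> if <exp> is reduced, i.e. after the
   code for <exp> and before the code for the then-part. */
static void startIf(void) {
    char line[32];
    int l = ++next_label;
    assert(nlabels < 32);
    label_stack[nlabels++] = l;
    snprintf(line, sizeof line, "JUMP_IF_FALSE L%d\n", l);
    emit(line);
}

/* #finishIf: runs when <stmt> -> <if head> then <stmt> end is reduced. */
static void finishIf(void) {
    char line[32];
    assert(nlabels > 0);
    snprintf(line, sizeof line, "L%d:\n", label_stack[--nlabels]);
    emit(line);
}
```

Since `<if head>` is reduced as soon as `<exp>` is complete, the LR parser gets a reduction at exactly the point where the conditional jump must be generated.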

7.2.2 LR(1) - (4)
- Another problem: what if the action is not at the end?
- Ex:
      <prog> → #start begin <stmt> end
  - We need to call #start.
- Solution: introduce a new non-terminal.
      <prog> → <head> begin <stmt> end
      <head> → #start
- YACC automatically performs such transformations.

7.2.3 Semantic Record Representation - (1)
Since we need to use a stack to store semantic records, all semantic records must have the same type:
- variant record in Pascal
- union type in C
Ex:
    enum kind {OP, EXP, STMT, ERROR};
    typedef struct {
        enum kind tag;
        union {
            op_rec_type OP_REC;
            exp_rec_type EXP_REC;
            stmt_rec_type STMT_REC;
            ......
        };
    } sem_rec_type;

7.2.3 Semantic Record Representation - (2)
- How to handle errors?
  - Ex. A semantic routine needs to create a record for each identifier in an expression.
  - What if the identifier is not declared?
  - The solution is on the next page.


7.2.3 Semantic Record Representation - (3)
- Solution 1: make a bogus record.
  - This method may create a chain of meaningless error messages due to the bogus record.
- Solution 2: create an ERROR semantic record.
  - No error message will be printed when an ERROR record is encountered.
- WHO controls the semantic stack?
  - action routines
  - parser

7.2.4 Action-controlled semantic stack - (1)
- Action routines take parameters from the semantic stack directly and push results onto the stack.
- Implementing stacks:
  1. array
  2. linked list
- Usually, the stack is transparent: any records in the stack may be accessed by the semantic routines.
  - (-) difficult to change

7.2.4 Action-controlled semantic stack - (2)
- Two other disadvantages:
  - (-) Action routines need to manage the stack.
  - (-) Control of the stack is distributed among action routines.
    - Each action routine pops some records and pushes 0 or 1 record.
    - If any action routine makes a mistake, the whole stack is corrupt.
- The solution is on the next page.
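Solution 2 can be sketched with a tagged record: an undeclared identifier produces one message and an ERROR record, and any routine that receives an ERROR operand quietly returns ERROR again. The record layout (only the `EXP` and `ERROR` tags), the `declared` test, and the function names are hypothetical simplifications of `sem_rec_type`.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Simplified tagged semantic record (only two of the tags from the slide). */
enum kind { EXP, ERROR };
typedef struct { enum kind tag; char text[32]; } sem_rec_type;

static int nmessages = 0;     /* error messages actually printed */

/* Toy declaration test: assume only a and b are declared. */
static int declared(const char *name) {
    return strcmp(name, "a") == 0 || strcmp(name, "b") == 0;
}

/* Record for an identifier: an undeclared name produces one message
   and an ERROR record. */
static sem_rec_type process_id(const char *name) {
    sem_rec_type r = { EXP, "" };
    if (!declared(name)) {
        fprintf(stderr, "error: %s is not declared\n", name);
        nmessages++;
        r.tag = ERROR;
        return r;
    }
    snprintf(r.text, sizeof r.text, "%s", name);
    return r;
}

/* Binary operation: ERROR operands are passed on silently, so one bad
   identifier does not trigger a chain of further messages. */
static sem_rec_type gen_infix(sem_rec_type l, char op, sem_rec_type r) {
    sem_rec_type res = { ERROR, "" };
    if (l.tag == ERROR || r.tag == ERROR) return res;
    res.tag = EXP;
    snprintf(res.text, sizeof res.text, "(%s%c%s)", l.text, op, r.text);
    return res;
}
```

With a bogus record (Solution 1) the `*` in `(a + x) * b` would also be checked against a made-up operand; here the ERROR tag short-circuits that, so only the undeclared `x` is reported.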

7.2.4 Action-controlled semantic stack - (3)
- Solution 1: let the parser control the stack.
- Solution 2: introduce additional stack routines.
  - Ex: Parser → Stack routines → Parameter-driven action routines
- If action routines do not control the stack, we can use an opaque (or abstract) stack: only push() and pop() are provided.
  - (+) clean interface
  - (-) less efficient

7.2.5 parser-controlled stack - (1)
- LR: the semantic stack and parse stack operate in parallel [shifts and reduces in the same way].
  - Ex: <stmt> → if <exp> then <stmt> end
    [Figure: parse stack holding if, <exp>, then, <stmt>, with a parallel semantic stack holding one record per parse-stack entry.]
  - The two stacks may be combined.
  - Ex: YACC generates such a parser-controlled semantic stack:
        <exp> → <exp> + <term>  { $$.value = $1.value + $3.value; }

7.2.5 parser-controlled stack - (2)
- LL parser-controlled semantic stack
  - Every time a production A → B C D is predicted, B, C and D are pushed on the parse stack and records for them are allocated on the semantic stack.
  [Figure: semantic stack with A at index 7 (left), B, C and D at indices 9, 10 and 11 (right and current at 9), and top at 12.]
  - We need four pointers for the semantic stack (left, right, current, top).
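An opaque stack in C can be expressed with an incomplete struct type, so callers can use only the provided operations. In this sketch the records are plain `int`s and the names `SemStack`, `sem_new`, `sem_push`, `sem_pop` and `sem_free` are hypothetical; normally the struct definition would be hidden in its own source file.

```c
#include <assert.h>
#include <stdlib.h>

/* Public interface: callers see only an incomplete type and four calls. */
typedef struct SemStack SemStack;
SemStack *sem_new(void);
void sem_push(SemStack *s, int rec);
int sem_pop(SemStack *s);
void sem_free(SemStack *s);

/* Implementation: in a real project this would live in its own .c file,
   so no action routine could index into the middle of the stack. */
struct SemStack {
    int *items;          /* records (plain ints in this sketch) */
    size_t top, cap;
};

SemStack *sem_new(void) {
    SemStack *s = malloc(sizeof *s);
    if (s == NULL) return NULL;
    s->top = 0;
    s->cap = 8;
    s->items = malloc(s->cap * sizeof *s->items);
    if (s->items == NULL) { free(s); return NULL; }
    return s;
}

void sem_push(SemStack *s, int rec) {
    if (s->top == s->cap) {                        /* full: double the array */
        int *bigger = realloc(s->items, 2 * s->cap * sizeof *bigger);
        if (bigger == NULL) abort();
        s->items = bigger;
        s->cap *= 2;
    }
    s->items[s->top++] = rec;
}

int sem_pop(SemStack *s) {
    assert(s->top > 0);
    return s->items[--s->top];
}

void sem_free(SemStack *s) {
    if (s != NULL) { free(s->items); free(s); }
}
```

The extra function call per access is the "(-) less efficient" side of the trade-off; the payoff is that a buggy action routine can no longer corrupt records it does not own.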


7.2.5 parser-controlled stack - (3)
- However, when a new production B → E F G is predicted, the four pointers will be overwritten.
- Therefore, create a new EOP record for the four pointers on the parse stack.
- When an EOP record appears on the stack top, restore the four pointers, which essentially pops records off the semantic stack.
- An example is on the next page.

7.2.5 parser-controlled stack - (4)
[Figure: after A → B C D is predicted, the parse stack holds B, C, D above an EOP(...) record, and the semantic stack holds A at 7 (left), B at 9 (right, current), C at 10, D at 11, with top at 12. After B → E F G is predicted, the parse stack holds E, F, G, EOP(7,9,9,12), C, D, EOP(...), and the semantic stack holds B at 9 (left), E at 12 (right, current), F at 13, G at 14, with top at 15.]

7.2.5 parser-controlled stack - (5)
Note:
- All push() and pop() operations are done by the parser, not by the action routines.
- Semantic records are passed to the action routines by parameters.
- Example:
      <primary> → ( <exp> ) #copy ($2,$$)
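The four-pointer bookkeeping can be checked against the numbers in the figure. In this sketch (all names hypothetical), predicting a production saves the pointers as an EOP record, and reaching that EOP restores them. The step that advances `current` past the finished left-hand side after restoring is our assumption about how processing continues, not something the slides state.

```c
#include <assert.h>

/* The four pointers into the semantic stack (indices). */
typedef struct { int left, right, current, top; } Ptrs;

static Ptrs cur;            /* live pointer values */
static Ptrs eop[64];        /* saved EOP records (stand-in for EOP entries on the parse stack) */
static int neop = 0;

/* Predict LHS -> X1 ... Xn while cur.current indexes the LHS record. */
static void predict(int n) {
    assert(neop < 64);
    eop[neop++] = cur;          /* push EOP(left, right, current, top) */
    cur.left = cur.current;     /* the LHS record */
    cur.right = cur.top;        /* X1 gets the first free slot */
    cur.top += n;               /* reserve one record per RHS symbol */
    cur.current = cur.right;    /* start working on X1 */
}

/* Move on to the next RHS symbol's record. */
static void advance(void) { cur.current++; }

/* EOP reached the top of the parse stack: restoring the saved pointers
   pops the RHS records; then continue with the symbol after the LHS
   (this last step is an assumption of the sketch). */
static void end_of_production(void) {
    assert(neop > 0);
    cur = eop[--neop];
    advance();
}
```

Starting with A as record 7 and top at 9, predicting A → B C D gives (7, 9, 9, 12), and predicting B → E F G saves exactly the EOP(7,9,9,12) shown in the figure.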

7.2.5 parser-controlled stack - (6)
- Initial information is stored in the semantic record of the LHS.
- After the RHS is processed, the resulting information is stored back in the semantic record of the LHS.
[Figure: information flow (attributes). Initially the semantic stack holds A with B, C, D above it; finally only A remains, holding the results.]

7.2.5 parser-controlled stack - (7)
- (-) The semantic stack may grow very big.
  - Certain non-terminals never use semantic records, e.g. <stmt list> and <id list>.
  - We may insert #reuse before the last non-terminal in each of their productions.
  - Example:
        <stmt list> → <stmt> #reuse <stmt tail>
        <stmt tail> → <stmt> #reuse <stmt tail>
        <stmt tail> → ε
- Evaluation: a parser-controlled semantic stack is easy with LR, but not so with LL.

7.3 Intermediate representation and code generation
Two possibilities:
1. semantic routines → code generation → machine code
   - (+) no extra pass for code generation
   - (+) allows simple 1-pass compilation
2. semantic routines → IR → code generation → machine code
   - (+) allows higher-level operations, e.g. open block, call procedures
   - Target machine is abstracted to some virtual machine
   - Allows language-oriented primitives
   - Code generation separated from semantic routines
   - Semantic routines don't care about temporary registers
   - Reduces machine dependence (isolated to code generation)
   - Optimization can be done at the intermediate level
   - Optimization independent of target machine
   - Simpler and better optimization (IR more high-level)


IR vs Machine Code
- Generating machine code: advantages
  - No overhead of an extra pass to translate IR
  - Conceptually simple compilation model
- Bottom line
  - IR is valuable if optimization or portability is an important issue
  - Machine code is much simpler

Forms of IR – Postfix Notation
- Concise
- Simple translation
- Useful for interpreters and target machines with a stack architecture
- Not particularly good for optimization or code generation
- Expression oriented, not so good for other uses
- Example:
      Code          Postfix
      a+b           ab+
      a+b*c         abc*+
      (a+b)*c       ab+c*
      a:=b*c+b*d    abc*bd*+:=

Forms of IR – Three-Address Codes
- Virtual machine having operations with 3 operands: 2 source, 1 destination
- Explicitly reference intermediates
- Triples: op, arg1, arg2
  - More concise
  - Position dependency makes moving/removing triples hard, such as during optimization
  - Intermediate results are referenced by the instruction #
- Quadruples: op, arg1, arg2, arg3
  - Intermediate results use temporary names
  - More convenient for code generation than postfix
- Example: a := b*c + b*d
      Triples              Quadruples
      (1) (* b c)          (1) (* b c t1)
      (2) (* b d)          (2) (* b d t2)
      (3) (+ (1) (2))      (3) (+ t1 t2 t3)
      (4) (:= (3) a)       (4) (:= t3 a _)
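Postfix suits interpreters and stack machines because evaluation needs only one stack: push each operand, and for each operator pop two values and push the result. The sketch assumes single-digit operands and only `+`, `-` and `*`; `eval_postfix` is a hypothetical name.

```c
#include <assert.h>
#include <ctype.h>

/* Evaluate postfix code with one operand stack, as an interpreter or a
   stack-architecture target would. Single-digit operands only; the
   input is assumed to be well formed. */
static int eval_postfix(const char *code) {
    int stack[64];
    int top = 0;
    for (const char *c = code; *c != '\0'; c++) {
        if (isdigit((unsigned char)*c)) {
            assert(top < 64);
            stack[top++] = *c - '0';          /* operand: push it */
            continue;
        }
        assert(top >= 2);
        int b = stack[--top];                 /* right operand */
        int a = stack[--top];                 /* left operand */
        switch (*c) {
        case '+': stack[top++] = a + b; break;
        case '-': stack[top++] = a - b; break;
        case '*': stack[top++] = a * b; break;
        default:  assert(!"unknown operator"); return 0;
        }
    }
    assert(top == 1);
    return stack[0];
}
```

For instance, with b = 2, c = 3, d = 4 the postfix form of b*c+b*d is `23*24*+`, which evaluates to 14.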

Forms of IR – Three-Address Codes
Float a,d; Int b,c;
a:=b*c+b*d
    Triples                       Quadruples
    (1)(MULTI,Addr(b),Addr(c))    (1)(MULTI,Addr(b),Addr(c),t1)
    (2)(FLOAT,Addr(b),-)          (2)(FLOAT,Addr(b),t2,-)
    (3)(MULTF,(2),Addr(d))        (3)(MULTF,t2,Addr(d),t3)
    (4)(FLOAT,(1),-)              (4)(FLOAT,t1,t4,-)
    (5)(ADDF,(4),(3))             (5)(ADDF,t4,t3,t5)
    (6)(:=,(5),Addr(a))           (6)(:=,t5,Addr(a),-)
- Can also add more detail, such as type or address.

Forms of IR – Tuples
- Tuples allow a variable number of operands.
  - A generalization of quadruples
  - More complex and more powerful
- Ex: a:=b*c+b*d
      (1)(MULTI,Addr(b),Addr(c),t1)
      (2)(FLOAT,Addr(b),t2)
      (3)(MULTF,t2,Addr(d),t3)
      (4)(FLOAT,t1,t4)
      (5)(ADDF,t4,t3,t5)
      (6)(:=,t5,Addr(a))

Forms of IR – Trees
- Syntax trees can also be used.
  - A directed acyclic graph (DAG) is an option.
  - Can use an abstract syntax tree.
- Tree transformations for optimizations
- These forms translate the input; the other 3 forms transform it.
- Ex. Ada uses Diana.
- Ex: a := b*c + b*d
  [Figure: the statement as a syntax tree, with a separate leaf for each use of b, and as a DAG, in which both * nodes share a single b leaf.]
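Quadruples like those in the three-address example can be produced by a post-order walk that allocates a fresh temporary for each operator. The tree type and the `gen`/`gen_assign` names are hypothetical, and operands are plain names rather than `Addr(...)` fields or typed operations such as MULTI/MULTF.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical expression tree: leaves have op == 0 and a variable name,
   inner nodes hold an operator character and two children. */
typedef struct Tree { char op; const char *name; const struct Tree *l, *r; } Tree;

static char out[512];          /* emitted quadruples, one per line */
static int ntemp = 0;

/* Post-order walk: emit (op arg1 arg2 result) for each operator and
   leave in buf the name that holds the node's value. */
static void gen(const Tree *t, char *buf, size_t size) {
    if (t->op == 0) {                          /* leaf: the variable itself */
        snprintf(buf, size, "%s", t->name);
        return;
    }
    char a[16], b[16];
    gen(t->l, a, sizeof a);
    gen(t->r, b, sizeof b);
    snprintf(buf, size, "t%d", ++ntemp);       /* fresh temporary */
    size_t len = strlen(out);
    snprintf(out + len, sizeof out - len, "(%c %s %s %s)\n", t->op, a, b, buf);
}

/* Statement "target := expr": emit (:= value target _). */
static void gen_assign(const char *target, const Tree *expr) {
    char v[16];
    gen(expr, v, sizeof v);
    size_t len = strlen(out);
    snprintf(out + len, sizeof out - len, "(:= %s %s _)\n", v, target);
}
```

Because every intermediate result gets its own name, the quadruples can later be reordered or deleted without renumbering references, unlike the position-dependent triples.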


Symbol Table  One way to finesse the issue of what


 Major data structure after syntax tree. information to put into the table is to
 Specific information stored in
just keep pointers in the table that
 An inherited attribute that may be kept symbol table depends heavily on
point to declaration nodes in the
globally. language, but generally includes:
syntax tree. Then symbol table code
 May be needed before semantic – Data type
doesn’t need to be changed when
analysis (or some form of it, as in C), – Scope (see below) changing the information, since it is
but makes sense to put off computing – Size (bytes, array length) stored in the node, not directly in the
it until necessary. – Potential or actual location information table. This is the approach taken in
 Stores declaration information using
(addresses, offsets - see later) the TINY compiler, and should be
name as primary key. carried over to C-Minus.
Fall, 2002 CS 153 - Chapter 6 - Part 2 73 Fall, 2002 CS 153 - Chapter 6 - Part 2 74 Fall, 2002 CS 153 - Chapter 6 - Part 2 75

Scope Information
- Requires that the symbol table have some kind of "delete" operation in addition to lookup and insert, since exiting a scope requires that declarations be removed from view (that is, lookups no longer find them, though they may still be referenced elsewhere).
- The delete operation should not in general re-process individual declarations: exitScope() should do them all in O(1).

C has simple scope structure:
- All names must be declared before use (although multiple declarations are possible).
- Scopes are nested in a stack-like fashion, and cannot be re-entered after exit (simple delete is possible).
- Scope information can be kept simply as a number: the nesting level (needed during semantic analysis because redeclaration in the same scope is illegal in C).

Example:
    typedef int z;       /* "external" (global) scope: nestLevel 0 */
    int y;
    /* this is legal C! */
    void x(double x)     /* nestLevel 1 begins with params */
    { char* x;           /* nestLevel 2 begins with function body */
      { char x;          /* nestLevel 3 */
      }
    }


Not all compilers get it right that parameters have a separate scope from the function body in C. But gcc does:
    C:\classes\cs153\f02>gcc -c scope.c
    scope.c: In function `x':
    scope.c:6: warning: declaration of `x' shadows a parameter
At least all names occupy a single "namespace" in C, so one symbol table is enough (compare to Java).

Java has 5 "namespaces", depending on the type of declaration:
    package A; // legal Java!!!
    class A
    { A A(A A)
      { A: for(;;)
           { if (A.A(A) == A) break A; }
        return A;
      }
    }

Further complication in Java: local redeclaration even in nested scopes is illegal:
    class A
    { A A(A A)
      { for(;;)
        { A A; // oops, now illegal!
          if (A.A(A) == A) break;
        }
        return A;
      }
    }

Symbol table data structure properties:
- All operations should be very fast (preferably O(1)).
- Must be able to disambiguate overloaded name use (depending on language): add type, scope, nesting info to lookup.
- Must not be affected by typical programmer "clustered" names: x1, x11, x12, etc.

Best bet:
- Use a hash table (or a list or tree or hash table of hash tables).
- Separate chains are better than a closed array (chains handled as little stacks, insertions and deletions always at the front).
- The hash function needs to use all characters in a name (to avoid collisions), and involve character position too!

Example:
    Indices   Buckets   Lists of Items
    0         →         i
    1         →         size → j
    2
    3         →         temp
    4


Sample hash function code:
    #define SIZE 211 // typically a prime number
    #define SHIFT 4

    int hash ( char * key )
    { int temp = 0;
      int i = 0;
      while (key[i] != '\0')
      { /* the cast avoids negative values when char is signed */
        temp = ((temp << SHIFT) + (unsigned char) key[i]) % SIZE;
        ++i;
      }
      return temp;
    }

Easy way to get O(1) behavior when exiting a scope: use a linked list (or tree or ...) of hash tables, one hash table for each scope:
[Figure: a list of per-scope hash tables, innermost scope first, with entries such as i (char), j (char *), temp (char), size (int), i (int), j (int), and f (function).]

Some structure similar to the previous slide is actually required in C++, Ada, and other languages where scopes can be arbitrarily re-entered (C++ has the scope resolution operator ::), since individual scopes must be attached to names, allowing them to be "called":
    class A { void f(); };

    void A::f() // go back inside A
    { ... }
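A sketch of the list-of-hash-tables idea, reusing the hash function above. Each scope is one table with separate chains (insertions at the front); `exitScope` only unlinks the innermost table, which is the O(1) step, and hands it back so the caller can keep or free it. All names other than `hash`, `SIZE` and `SHIFT` are hypothetical, and types are stored as strings only for brevity.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

#define SIZE 211   /* typically a prime number */
#define SHIFT 4

/* The hash function from the slide; the cast keeps values non-negative. */
static int hash(const char *key) {
    int temp = 0;
    for (int i = 0; key[i] != '\0'; ++i)
        temp = ((temp << SHIFT) + (unsigned char)key[i]) % SIZE;
    return temp;
}

typedef struct Bucket {          /* one item in a separate chain */
    const char *name;
    const char *type;            /* e.g. "int", "char *" (strings for brevity) */
    struct Bucket *next;
} Bucket;

typedef struct Scope {           /* one hash table per scope */
    Bucket *table[SIZE];
    struct Scope *outer;         /* enclosing scope */
} Scope;

static Scope *current = NULL;    /* innermost open scope */

static void enterScope(void) {
    Scope *s = calloc(1, sizeof *s);   /* all chains start empty */
    if (s == NULL) abort();
    s->outer = current;
    current = s;
}

/* O(1): unlink the innermost table; its names vanish from lookups. */
static Scope *exitScope(void) {
    Scope *s = current;
    assert(s != NULL);
    current = s->outer;
    return s;
}

static void freeScope(Scope *s) {
    for (int h = 0; h < SIZE; h++) {
        Bucket *b = s->table[h];
        while (b != NULL) { Bucket *nx = b->next; free(b); b = nx; }
    }
    free(s);
}

/* Insert at the front of the chain in the current scope. */
static void insert(const char *name, const char *type) {
    assert(current != NULL);
    Bucket *b = malloc(sizeof *b);
    if (b == NULL) abort();
    int h = hash(name);
    b->name = name;
    b->type = type;
    b->next = current->table[h];
    current->table[h] = b;
}

/* Innermost scope first, so inner declarations hide outer ones. */
static const char *lookup(const char *name) {
    int h = hash(name);
    for (Scope *s = current; s != NULL; s = s->outer)
        for (Bucket *b = s->table[h]; b != NULL; b = b->next)
            if (strcmp(b->name, name) == 0) return b->type;
    return NULL;
}
```

Keeping the unlinked table around (instead of freeing it) is what a compiler for C++ or Ada would need to re-enter the scope later, as with `A::f`.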

Two additional scope issues (of many):
- Recursion: insertion into table must occur before processing is complete:

    // lookup of f in body must work:
    void f() { … f() … }

- Relaxation of declaration before use rule (C++ and Java class scopes): all insertions must occur before all lookups (two passes required):

    class A
    { int f() { return x; } int x; };

One more scope issue: dynamic scope
- Some languages use a run-time version of scope that does not follow the layout of the program on the page, but the execution path: LISP, Perl.
- Symbol table then must be part of runtime system, providing lookup of names during execution (it had better be really fast in this case).

- Called “dynamic scope” (vs. the more usual lexical or static scope).
- A questionable design choice for any but the most dynamic, interpreted languages, since there can then be no static semantic analysis (no static type checking, for example).
- Running the symbol table during execution also slows down execution speed substantially.
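The difference between one pass and two passes over a class scope can be shown with a toy model, assuming each member is reduced to its name plus at most one name used in its body (`Member`, `unresolved_one_pass` and `unresolved_two_pass` are hypothetical names, not a real parser). In the one-pass version a member is inserted before its own body is checked, which is what makes recursion work.

```c
#include <string.h>

/* Toy class member: its name and at most one name used in its body. */
typedef struct {
    const char *name;
    const char *uses;   /* NULL if the body uses no name */
} Member;

/* Is name among the first count members? */
static int declared(const Member *m, int count, const char *name)
{
    for (int i = 0; i < count; ++i)
        if (strcmp(m[i].name, name) == 0)
            return 1;
    return 0;
}

/* One pass: each member is inserted before its own body is processed
   (so recursion works), but later members are not yet visible. */
int unresolved_one_pass(const Member *m, int n)
{
    int errors = 0;
    for (int i = 0; i < n; ++i)
        if (m[i].uses != NULL && !declared(m, i + 1, m[i].uses))
            ++errors;
    return errors;
}

/* Two passes: pass 1 inserts every member (here all n entries count as
   declared), pass 2 resolves every use against the complete class scope. */
int unresolved_two_pass(const Member *m, int n)
{
    int errors = 0;
    for (int i = 0; i < n; ++i)
        if (m[i].uses != NULL && !declared(m, n, m[i].uses))
            ++errors;
    return errors;
}
```

For `class A { int f() { return x; } int x; };`, modeled as `{ {"f", "x"}, {"x", NULL} }`, one pass reports one unresolved use (`x`), while two passes report none.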


Example of dynamic scope (C syntax):

    #include <stdio.h>

    int i = 1;

    void f(void)
    { printf("%d\n",i); }

    int main(void)
    { int i = 2;
      /* the following call prints 1 using normal lexical
         scoping, but prints 2 (the value of the local i)
         using dynamic scope */
      f();
      return 0;
    }

TINY symbol table:
- All names are global: there are no scopes.
- Declaration is by use: if a lookup fails, perform an insert.
- Virtually no information has to be kept (all names are int vars), so I had to invent something to store in the symbol table (line numbers).
- No deletes!

TINY symtab.h:

    /* Insert line numbers and memory locs
       into the symbol table */
    void st_insert( char * name, int lineno, int loc );

    /* return the memory location of a variable
       or -1 if not found */
    int st_lookup ( char * name );

    /* Procedure printSymTab prints a formatted
       listing of the symbol table contents
       to the listing file */
    void printSymTab(FILE * listing);
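A minimal simulation of the dynamic-scope example, assuming a runtime stack of (name, value) bindings (a toy form of what is sometimes called deep binding; all names here are hypothetical). Under dynamic scope, f() takes the newest live binding of i; under lexical scope, f() sees only the bindings visible where it is written, which here are the globals. A real compiler resolves the lexical case at compile time; the sketch replays it at run time only for comparison.

```c
#include <string.h>

#define MAX_BINDINGS 64

/* Runtime stack of (name, value) bindings, newest on top. */
typedef struct { const char *name; int value; } Binding;

static Binding stack[MAX_BINDINGS];
static int top = 0;   /* number of live bindings */

static void push_binding(const char *name, int value)   /* no overflow check */
{
    stack[top].name = name;
    stack[top].value = value;
    ++top;
}

/* Search bindings [0, limit) from newest to oldest; -1 means unbound. */
static int find_binding(const char *name, int limit)
{
    for (int k = limit - 1; k >= 0; --k)
        if (strcmp(stack[k].name, name) == 0)
            return stack[k].value;
    return -1;
}

/* Replays the example program and returns the value f() would print. */
int run_example(int dynamic)
{
    top = 0;
    push_binding("i", 1);     /* int i = 1;  (global) */
    int globals = top;        /* f is written at global level */
    push_binding("i", 2);     /* main: int i = 2;  then main calls f() */
    /* Dynamic: newest live binding. Lexical: only the globals. */
    int printed = dynamic ? find_binding("i", top)
                          : find_binding("i", globals);
    top = globals;            /* main returns: pop its local i */
    return printed;
}
```

`run_example(0)` returns 1 (lexical) and `run_example(1)` returns 2 (dynamic), matching the comment in the example.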

Sample TINY code building the symbol table:

    case AssignK:
    case ReadK:
      if (st_lookup(t->attr.name) == -1)
        /* not yet in table, so treat as new definition */
        st_insert(t->attr.name,t->lineno,location++);
      else
        /* already in table, so ignore location,
           add line number of use only */
        st_insert(t->attr.name,t->lineno,0);
      break;

C-Minus Symbol Table
- Use basic structure of TINY
- Store tree pointers
- Add enterScope() and exitScope()
- List of tables structure helpful (slide 15)
- Add nesting level to tree nodes
- Add pointer to declaration in all ID nodes (found by lookup)
- Use best ADT methods (hide all details of actual symtab structure)

Sample C-Minus symtab.h:

    /* Start a new scope; return 0 if malloc fails,
       else 1 */
    int st_enterScope(void);

    /* Remove all declarations in the current scope */
    void st_exitScope(void);

    /* Insert def nodes from the syntax tree;
       return 0 if malloc fails, else 1 */
    int st_insert( TreePtr );

    /* Return the defnode of a variable, parameter, or
       function, or NULL if not found */
    TreePtr st_lookup ( char * name );
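A simplified sketch of the TINY-style interface declared in symtab.h, assuming a single linked list rather than a hash table, and names that are not copied. `record_use` mirrors the `AssignK`/`ReadK` case: the first use of a name allocates a memory location, and later uses only add a line number. `record_use` and `st_use_count` are hypothetical helpers, not TINY code.

```c
#include <stdlib.h>
#include <string.h>

typedef struct LineList { int lineno; struct LineList *next; } LineList;

typedef struct Sym {
    char *name;          /* not copied: caller keeps the string alive */
    int memloc;          /* memory location assigned on first insert */
    LineList *lines;     /* line numbers of uses, newest first */
    struct Sym *next;
} Sym;

static Sym *symtab = NULL;

static Sym *find_sym(char *name)
{
    for (Sym *s = symtab; s != NULL; s = s->next)
        if (strcmp(s->name, name) == 0)
            return s;
    return NULL;
}

/* Return the memory location of a variable, or -1 if not found. */
int st_lookup(char *name)
{
    Sym *s = find_sym(name);
    return s == NULL ? -1 : s->memloc;
}

/* First insert of a name records loc; later inserts only add a line
   number and ignore loc. Allocation failure silently drops the insert. */
void st_insert(char *name, int lineno, int loc)
{
    Sym *s = find_sym(name);
    if (s == NULL) {
        s = malloc(sizeof *s);
        if (s == NULL)
            return;
        s->name = name;
        s->memloc = loc;
        s->lines = NULL;
        s->next = symtab;
        symtab = s;
    }
    LineList *l = malloc(sizeof *l);
    if (l == NULL)
        return;
    l->lineno = lineno;
    l->next = s->lines;
    s->lines = l;
}

static int location = 0;   /* next unused memory location */

/* The AssignK/ReadK logic: declaration by use. */
void record_use(char *name, int lineno)
{
    if (st_lookup(name) == -1)
        st_insert(name, lineno, location++);   /* new definition */
    else
        st_insert(name, lineno, 0);            /* just another use */
}

/* Hypothetical helper: number of recorded uses of a name. */
int st_use_count(char *name)
{
    int n = 0;
    Sym *s = find_sym(name);
    for (LineList *l = s ? s->lines : NULL; l != NULL; l = l->next)
        ++n;
    return n;
}
```

After `record_use("x", 1)`, `record_use("y", 2)` and `record_use("x", 5)`, `st_lookup("x")` is 0, `st_lookup("y")` is 1, and `x` has two recorded uses.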


Data types and type checking
- A data type is constructed recursively out of simple or base types (int, char, double, etc.) and type constructors that create “new” types out of a group of existing ones: struct, union, * (“pointer to”), enum, [ ] (“array of”), etc.
- Types in code are checked by examining the “compatibility” of the types of the components, and by determining a “result” type, if any, from these.

C Example
- Suppose a function is declared as

    char * f(double d)

- Data type of f is then char*()(double) (function from double to char*)
- The call f(2) type checks because f is a function, 2 is an int, and int is compatible in C with double (can be silently converted). The result then must be of type char*

In terms of syntax tree:
[Figure: syntax tree for the call f(2): a call node of type char* with children id: f (type char*()(double)) and num: 2 (type int); annotations: (1) f is a function, (2) int is compatible with double, (3) the result has type char*]
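The three checks on the call f(2) can be sketched over a small type tree, assuming one-parameter functions and a simplified compatibility rule (any two arithmetic types are compatible; other types must match exactly). The type names and `check_call` are hypothetical; C's real rules are more detailed.

```c
#include <stddef.h>

typedef enum { T_INT, T_DOUBLE, T_CHAR, T_POINTER, T_FUNCTION } TypeKind;

typedef struct Type {
    TypeKind kind;
    const struct Type *base;    /* pointee, or function result type */
    const struct Type *param;   /* function parameter type, else NULL */
} Type;

static const Type int_type      = { T_INT, NULL, NULL };
static const Type double_type   = { T_DOUBLE, NULL, NULL };
static const Type char_type     = { T_CHAR, NULL, NULL };
static const Type char_ptr_type = { T_POINTER, &char_type, NULL };
/* char*()(double): function from double to char* */
static const Type f_type        = { T_FUNCTION, &char_ptr_type, &double_type };

static int is_arithmetic(const Type *t)
{
    return t->kind == T_INT || t->kind == T_DOUBLE || t->kind == T_CHAR;
}

/* Exact structural match of two type trees. */
static int same_type(const Type *a, const Type *b)
{
    if (a == NULL || b == NULL)
        return a == b;
    return a->kind == b->kind && same_type(a->base, b->base)
        && same_type(a->param, b->param);
}

/* Arithmetic types convert silently; otherwise types must match exactly. */
static int compatible(const Type *param, const Type *arg)
{
    return (is_arithmetic(param) && is_arithmetic(arg)) || same_type(param, arg);
}

/* Type of a call node: (1) the callee must be a function, (2) the argument
   must be compatible with the parameter, (3) the call has the result type.
   Returns NULL on a type error. */
const Type *check_call(const Type *callee, const Type *arg)
{
    if (callee->kind != T_FUNCTION)
        return NULL;
    if (!compatible(callee->param, arg))
        return NULL;
    return callee->base;
}
```

`check_call(&f_type, &int_type)` returns `&char_ptr_type`, while passing a `char*` argument, or calling something that is not a function, returns NULL.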

Type compatibility of constructed types
- Generally depends on a notion of when two types are “equal” (equivalent), or at least closely related.
- C example:

    struct {} x,z;
    struct {} y;
    y = x; // illegal! (different types)
    z = x; // ok! Same types

- On the other hand:

    struct A {} x;
    struct A y;
    y = x; // now it’s ok!

- struct A {} x; declares a type (with name “struct A”) and a variable x
- Reusing name struct A gives same type
- Writing struct {} defines a type with a hidden internal name (so it can’t be referred to).

Type Equivalence Algorithm
- Structural equivalence: as long as the types have the same structure, they are equivalent.
- Name equivalence: types are equivalent only if they are identical as names
- Declaration equivalence: types are equivalent if they lead back (through renaming) to the same original use of a type constructor.
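The three equivalence notions can be compared on the same type nodes, assuming a toy representation in which a typedef is an alias node pointing at the type it renames and a struct has at most one field. `structural_equiv`, `name_equiv` and `declaration_equiv` are hypothetical names for illustration, not part of any compiler API.

```c
#include <stddef.h>
#include <string.h>

typedef enum { K_INT, K_POINTER, K_STRUCT, K_ALIAS } Kind;

typedef struct Type {
    Kind kind;
    const char *name;            /* declared name, or NULL if anonymous */
    const struct Type *target;   /* pointee (K_POINTER) or renamed type (K_ALIAS) */
    const struct Type *field;    /* the field type of a K_STRUCT, or NULL */
} Type;

/* Follow typedef renamings back to the original type-constructor use. */
static const Type *resolve(const Type *t)
{
    while (t != NULL && t->kind == K_ALIAS)
        t = t->target;
    return t;
}

/* Structural: same constructor with equivalent parts; names ignored.
   (No cycle check, so recursive types would need extra bookkeeping.) */
int structural_equiv(const Type *a, const Type *b)
{
    a = resolve(a);
    b = resolve(b);
    if (a == b)
        return 1;
    if (a == NULL || b == NULL || a->kind != b->kind)
        return 0;
    if (a->kind == K_POINTER)
        return structural_equiv(a->target, b->target);
    if (a->kind == K_STRUCT)
        return structural_equiv(a->field, b->field);
    return 1;   /* two base types of the same kind */
}

/* Name: the types must carry identical names; renaming is not allowed. */
int name_equiv(const Type *a, const Type *b)
{
    return a->name != NULL && b->name != NULL && strcmp(a->name, b->name) == 0;
}

/* Declaration: both lead back (through renaming) to the same constructor use. */
int declaration_equiv(const Type *a, const Type *b)
{
    return resolve(a) == resolve(b);
}
```

Modeling `struct A {}; typedef struct A A; typedef struct {} B;`, the sketch finds `struct A`, `A` and `B` all structurally equivalent, `struct A` and `A` declaration equivalent but not `B`, and no two of them name equivalent.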


Equivalence Example (C syntax)
- struct A {};
- typedef struct A A;
- typedef struct {} B;
- struct A x; A y; B z;
- x, y, z all structurally equivalent
- x, y declaration equivalent, but z is not declaration equivalent to these
- none are name equivalent

C uses a combination of structural and declaration equivalence:
- Declaration equivalence for struct and union
- Structural equivalence for arrays, pointers, and functions
- enum isn’t even a type constructor, but constructs a named subrange of int (unlike C++ - see next slide)

Digression: Enums in C and C++
- An enum in C is not a real type constructor:

    enum A {one,two,three} x;
    enum B {four,five,six} y;
    x = y; /* ok in C */

- In C++ this assignment is an error:

    C:\classes\cs153\f02>gxx enum.cpp
    enum.cpp: In function `int main()':
    enum.cpp:7: cannot convert `B' to `A' in assignment

- Note how the error message implies that C++ automatically generates a typedef enum A A!

Representing types internally in a compiler
- Since types are built up recursively, a tree structure must be used (syntax tree gets another major node kind: datatype).
- Some languages (FORTRAN, TINY, C-Minus) have flat type spaces, so that an enum can be used: int, intarray, function.

- Functions generally are type constructors too, but their types do not have to be built explicitly, since the return type and parameter types are available in the syntax tree for checking (unless, of course, function types can be explicitly written, as in C: typedef char*(F)(double) - see the next slide).

Digression on C function types
- There are two kinds of function types in C that are almost identical (and that can almost be used interchangeably) - function constants and function pointers:

    typedef char* F(double);
    typedef char* (*G)(double);

- F is a “constant” function type (a prototype), while G is a “pointer to function” type, or function variable:

    F f;      // a prototype for a func f
    G g = f;  // g is var init’ed to f
    f = g;    // illegal - f is const
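The F/G distinction compiles as written in C. In this sketch, the body of `f` and the array `labels` are invented for illustration, and `apply_f` and `apply_g` are hypothetical helpers showing that a parameter of function type F behaves exactly like one of pointer type G.

```c
/* F is a function type, G a pointer-to-function type. */
typedef char *F(double);
typedef char *(*G)(double);

static char *labels[] = { "negative", "non-negative" };

F f;   /* declares f exactly as char *f(double); would */

char *f(double d)
{
    return d < 0 ? labels[0] : labels[1];
}

G g = f;   /* a function variable, initialized to f */
/* f = g;  does not compile: f names a function, not an assignable object */

/* A parameter declared with function type F is adjusted to pointer type G,
   so these two helpers are identical in effect. */
char *apply_f(F ff, double d) { return ff(d); }
char *apply_g(G gg, double d) { return (*gg)(d); }
```

Calls such as `f(2)`, `(*f)(2)`, `(&f)(2)` and `g(2)` all reach the same function (the `int` argument 2 is converted to `double`); uncommenting `f = g;` produces a compile error.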


- In many ways, this mirrors the close relationship in C between pointers and arrays:

    int x[10];
    int* y = x; // ok
    x = y;      // illegal

- In calls and params it really doesn’t matter which type you use or assume: f(2), (*f)(2) and (&f)(2) all work fine, and void p(F ff) and void p(G gg) are identical in effect.

Recursive types
- Present special problems:

    struct A { int x; struct A next; };

  is illegal, because it would represent an “infinite” type (just as void f(void) { f(); } represents an “infinite” call).
- In C one must interpose a pointer:

    struct A { int x; struct A* next; };

- Some languages use a union instead.
- Others (like Java) have implicit pointers.

Other issues (a sample)
- Should array size be part of its type? (C says no)
- How far should compatibility of types go? (Should any two pointers be compatible?)
- Dynamic typing: constructing types during execution.
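The pointer version of struct A can be exercised with a short linked list. `push`, `sum` and `free_list` are hypothetical helpers; the point is that each node has a fixed, finite size because `next` is only a pointer.

```c
#include <stdlib.h>

/* One int plus one pointer: a finite size, unlike
   struct A { int x; struct A next; }. */
struct A {
    int x;
    struct A *next;
};

/* Prepend a node; aborts if memory runs out (keeps the sketch short). */
struct A *push(struct A *list, int x)
{
    struct A *node = malloc(sizeof *node);
    if (node == NULL)
        abort();
    node->x = x;
    node->next = list;
    return node;
}

/* Walk the chain of next pointers until the NULL terminator. */
int sum(const struct A *list)
{
    int total = 0;
    for (; list != NULL; list = list->next)
        total += list->x;
    return total;
}

void free_list(struct A *list)
{
    while (list != NULL) {
        struct A *next = list->next;
        free(list);
        list = next;
    }
}
```

`push(push(push(NULL, 1), 2), 3)` builds the list 3, 2, 1, whose `sum` is 6.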

Type checking in TINY
- Only two types: int and bool
- Only need to check if statement, repeat statement, assignment, and a few other cases
- Type errors may create a “void” type. Suppress error messages in the presence of void.

Sample TINY type checking code

    switch (t->kind.exp)
    { case OpK:
        if ((t->child[0]->type != Integer) ||
            (t->child[1]->type != Integer))
          typeError(t,"Op applied to non-integer");
        if ((t->attr.op == EQ) || (t->attr.op == LT))
          t->type = Boolean;
        else
          t->type = Integer;
        break;
      /* ... other cases ... */
    }

Type Checking in C-Minus
- Go through Appendix A carefully, writing out all type rules
- As in TINY, there are only a few types (other than functions). And there are no explicit function types, or function variables or parameters. Also no recursive types. And no typedefs.
- Answer questions such as: is x = y legal if x and y are both arrays?
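A variation on the OpK case, sketched with a hypothetical minimal node type (`Node`, `Op`, `checkOp` and `errorCount` are assumptions; only the Void/Integer/Boolean type names follow TINY). Unlike the slide's code, it sets an erroneous node's type to Void and stays silent when an operand is already Void, which is one way to implement the suppression rule.

```c
#include <stddef.h>

typedef enum { Void, Integer, Boolean } ExpType;
typedef enum { PLUS, MINUS, TIMES, OVER, EQ, LT } Op;

typedef struct Node {
    Op op;                  /* operator (ignored for leaves) */
    struct Node *child[2];
    ExpType type;           /* leaves arrive typed; operators get typed here */
} Node;

static int errorCount = 0;

static void typeError(Node *t, const char *message)
{
    (void) t;        /* a real checker would report t's line number */
    (void) message;  /* ... and print the message */
    ++errorCount;
}

/* Postorder check for an operator node (children already checked). */
void checkOp(Node *t)
{
    ExpType left = t->child[0]->type;
    ExpType right = t->child[1]->type;
    if (left == Void || right == Void)
        t->type = Void;                          /* error already reported */
    else if (left != Integer || right != Integer) {
        typeError(t, "Op applied to non-integer");
        t->type = Void;                          /* quiets enclosing nodes */
    }
    else if (t->op == EQ || t->op == LT)
        t->type = Boolean;
    else
        t->type = Integer;
}
```

Checking `a < b` with integer leaves gives Boolean; `(a < b) + a` then reports one error and becomes Void; and `((a < b) + a) * a` becomes Void without a second message.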


Example from Appendix A

18. expression → var = expression | simple-expression
19. var → ID | ID [ expression ]

An expression is a variable reference followed by an assignment operator (=) and an expression, or just a simple expression. The assignment has the usual storage semantics: the location of the variable represented by var is found, then the subexpression to the right of the assignment is evaluated, and the value of the subexpression is stored at the given location. This value is also returned as the value of the entire expression. A var is either a simple (integer) variable or a subscripted array variable. A negative subscript causes the program to halt (unlike C). However, upper bounds of subscripts are not checked.

Making syntax tree traversals easy: use “generic” traversal function:

    static void traverse( TreeNode * t,
                   void (* preProc) (TreeNode *),
                   void (* postProc) (TreeNode *) )
    { if (t != NULL)
      { preProc(t);
        { int i;
          for (i=0; i < MAXCHILDREN; i++)
            traverse(t->child[i],preProc,postProc);
        }
        postProc(t);
        traverse(t->sibling,preProc,postProc);
      }
    }

    // builds symtab in preorder:
    traverse(syntaxTree,insertNode,nullProc);

    // checks types in postorder:
    traverse(syntaxTree,nullProc,checkNode);

    void nullProc( TreeNode * t)
    {}

etc. . .
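The generic traversal can be run on a tiny tree to see the two orders. This sketch keeps `traverse` and `nullProc` as above but assumes a hypothetical `TreeNode` with a one-character label and a `record` visitor that logs the visit order.

```c
#include <stddef.h>

#define MAXCHILDREN 3

typedef struct TreeNode {
    char label;                          /* stands in for real node contents */
    struct TreeNode *child[MAXCHILDREN];
    struct TreeNode *sibling;
} TreeNode;

static void traverse(TreeNode *t,
                     void (*preProc)(TreeNode *),
                     void (*postProc)(TreeNode *))
{
    if (t != NULL) {
        preProc(t);
        for (int i = 0; i < MAXCHILDREN; i++)
            traverse(t->child[i], preProc, postProc);
        postProc(t);
        traverse(t->sibling, preProc, postProc);
    }
}

static void nullProc(TreeNode *t) { (void) t; }

/* Hypothetical visitor: append each node's label to a string. */
static char order[32];
static size_t length = 0;

static void record(TreeNode *t)
{
    if (length + 1 < sizeof order) {
        order[length++] = t->label;
        order[length] = '\0';
    }
}

/* Returns the visit order: preorder if postorder == 0, else postorder. */
const char *visit_order(TreeNode *root, int postorder)
{
    length = 0;
    order[0] = '\0';
    if (postorder)
        traverse(root, nullProc, record);
    else
        traverse(root, record, nullProc);
    return order;
}
```

For a root `a` with children `b` and `c`, where `b` has sibling `d`, the preorder call yields "abdc" and the postorder call yields "bdca".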

Analyze.h - a two-step process:

    /* Function buildSymtab constructs the symbol
     * table by preorder traversal of the syntax tree
     */
    void buildSymtab(TreeNode *);

    /* Procedure typeCheck performs type checking
     * by a postorder syntax tree traversal
     */
    void typeCheck(TreeNode *);

What should C-Minus Print under TraceAnalyze?
- Possibly a representation of the symbol table, as in TINY
- But also another representation of the tree with types added
- PrintTree could be modified to do this, or a new PrintTypes function added to util.h/util.c

An Example of C-Minus Symbol Table Construction and the use of the symbol table to link uses of names to their defs.
CS 153 - Fall, 2002 - K. Louden - 11/10/02



The Example:

    int a; /*d1*/
    int b[10]; /*d2*/
    int c /*d3*/ (int a[] /*d4*/, int c /*d5*/)
    { /* Position 1 */ if (c)
      { int d; /*d6*/ /* Position 2 */
        d = a[c] + b[c];
        return d; }
      return 0; }
    void main(void) /*d7*/
    { /* Position 3 */
      output(c(b,a));
    }

Syntax tree:
[Figure: syntax tree of the example program, with declaration nodes d1 to d7]

Symbol Table at Position 1:
[Figure: one table per nesting level; nestLevel 2: empty; nestLevel 1: a → d4, c → d5; nestLevel 0: input, output, a → d1, b → d2, c → d3]
11/11/02 K. Louden, CS 153, Fall 2002

Lookup of c after position 1 produces the following tree with link:
[Figure: syntax tree of the example, with the c in “if (c)” linked to d5]

Symbol Table at Position 2:
[Figure: one table per nesting level; nestLevel 3: d → d6; nestLevel 2: empty; nestLevel 1: a → d4, c → d5; nestLevel 0: input, output, a → d1, b → d2, c → d3]

Lookups of a, b, c, and d after position 2 produce the following tree with links:
[Figure: syntax tree of the example, with a linked to d4, b to d2, c to d5, and d to d6]


Symbol Table at Position 3:
[Figure: one table per nesting level; nestLevel 2 and nestLevel 1: empty; nestLevel 0: input, output, a → d1, b → d2, c → d3, main → d7]

Lookups of output, a, b, and c after pos. 3 produce the following tree with links:
[Figure: syntax tree of the example, with output linked to the predefined output function, a to d1, b to d2, and c to d3]
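The links in these figures follow from the usual rule that the innermost declaration wins. As a cross-check, this hypothetical sketch replays the example's declarations with a single stack of entries tagged by nesting level (a simpler structure than the per-level tables in the figures) and reports which declaration each lookup finds. `def_at` and the `"predefined"` label for input and output are assumptions for illustration.

```c
#include <stddef.h>
#include <string.h>

#define MAXSYMS 32

typedef struct { const char *name; const char *def; int level; } Entry;

static Entry table[MAXSYMS];
static int count = 0;
static int nestLevel = 0;

static void enterScope(void) { ++nestLevel; }

static void exitScope(void)   /* drop every entry of the innermost level */
{
    while (count > 0 && table[count - 1].level == nestLevel)
        --count;
    --nestLevel;
}

static void declare(const char *name, const char *def)
{
    table[count].name = name;
    table[count].def = def;
    table[count].level = nestLevel;
    ++count;
}

static const char *lookup(const char *name)   /* innermost declaration wins */
{
    for (int k = count - 1; k >= 0; --k)
        if (strcmp(table[k].name, name) == 0)
            return table[k].def;
    return NULL;
}

/* Replays the declarations up to Position 1, 2 or 3 and returns the label
   ("d1".."d7", or "predefined") that a use of name links to there. */
const char *def_at(int position, const char *name)
{
    count = 0;
    nestLevel = 0;
    declare("input", "predefined");
    declare("output", "predefined");
    declare("a", "d1");
    declare("b", "d2");
    declare("c", "d3");       /* function c: visible in its own body */
    enterScope();             /* parameters of c */
    declare("a", "d4");
    declare("c", "d5");
    enterScope();             /* body of c */
    if (position == 1)
        return lookup(name);
    enterScope();             /* inner block */
    declare("d", "d6");
    if (position == 2)
        return lookup(name);
    exitScope();
    exitScope();
    exitScope();              /* back at nesting level 0 */
    declare("main", "d7");
    enterScope();             /* parameters of main (none) */
    enterScope();             /* body of main */
    return lookup(name);      /* Position 3 */
}
```

At Position 2, `a`, `b`, `c` and `d` resolve to d4, d2, d5 and d6; at Position 3, `a`, `b` and `c` resolve to d1, d2 and d3, and `d` is not found.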

