0% found this document useful (0 votes)
16 views

CSC 409 Note 2

Compiler construction

Uploaded by

idriswakil123
Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PDF, TXT or read online on Scribd
0% found this document useful (0 votes)
16 views

CSC 409 Note 2

Compiler construction

Uploaded by

idriswakil123
Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PDF, TXT or read online on Scribd
You are on page 1/ 12

COMPILER CONSTRUCTION (CSC 409)

2. A simple One Pass Compiler:


2.0 INTRODUCTION: In computer programming, a one-pass compiler is a compiler that passes
through the parts of each compilation unit only once, immediately translating each part into its final
machine code. This is in contrast to a multi-pass compiler which converts the program into one or
more intermediate representations in steps between source code and machine code, and which
reprocesses the entire compilation unit in each sequential pass.
2.1 OVERVIEW
• Language Definition
o Appearance of programming language :
Vocabulary : Regular expression
Syntax: Backus-Naur Form(BNF) or Context Free Form (CFG)
o Semantics : Informal language or some examples

Fig 2.1. Structure of our compiler front end


2.2 SYNTAX DEFINITION
• To specify the syntax of a language : CFG and BNF
o Example : if-else statement in C has the form of statement → if ( expression )
statement else statement
• An alphabet of a language is a set of symbols.
o Examples: {0,1} for a binary number system (language) ={0,1,100,101,...}
{a,b,c} for language={a,b,c, ac,abcc..}
{if,(,),else ...} for a if statements={if(a==1) goto 10, if--}
• A string over an alphabet
o is a sequence of zero or more symbols from the alphabet.
o Examples : 0,1,10,00,11,111,0202 ... strings for a alphabet {0,1}
o Null string is a string which does not have any symbol of alphabet.
• Language
o Is a subset of all the strings over a given alphabet.
o Alphabets Ai Languages Li for Ai
A0={0,1} L0={0,1,100,101,...}

1
A1={a,b,c} L1={a,b,c, ac, abcc..}
A2={all of C tokens} L2= {all sentences of C program }
• Example 2.1. Grammar for expressions consisting of digits and plus and minus signs.
o Language of expressions L={9-5+2, 3-1, ...}
o The productions of grammar for this language L are:
list → list + digit
list → list - digit
list → digit
digit → 0|1|2|3|4|5|6|7|8|9
o list, digit : Grammar variables, Grammar symbols
o 0,1,2,3,4,5,6,7,8,9,-,+ : Tokens, Terminal symbols
• Convention specifying grammar
o Terminal symbols : bold face string if, num, id
o Nonterminal symbol, grammar symbol : italicized names, list, digit ,A,B
• Grammar G=(N,T,P,S)
o N : a set of nonterminal symbols
o T : a set of terminal symbols, tokens
o P : a set of production rules
o S : a start symbol, S∈N
• Grammar G for a language L={9-5+2, 3-1, ...}
o G=(N,T,P,S)
N={list,digit}
T={0,1,2,3,4,5,6,7,8,9,-,+}
P : list -> list + digit
list -> list - digit
list -> digit
digit -> 0|1|2|3|4|5|6|7|8|9
S=list
• Some definitions for a language L and its grammar G
• Derivation :
A sequence of replacements S⇒α1⇒α2⇒…⇒αn is a derivation of αn.
Example, A derivation 1+9 from the grammar G

2
• left most derivation
list ⇒ list + digit ⇒ digit + digit ⇒ 1 + digit ⇒ 1 + 9
• right most derivation
list ⇒ list + digit ⇒ list + 9 ⇒ digit + 9 ⇒ 1 + 9
• Language of grammar L(G)
L(G) is a set of sentences that can be generated from the grammar G.
L(G)={x| S ⇒* x} where x ∈ a sequence of terminal symbols
• Example: Consider a grammar G=(N,T,P,S):
N={S} T={a,b}
S=S P ={S → aSb | ε }
• is aabb a sentecne of L(g)? (derivation of string aabb)
S⇒aSb⇒aaSbb⇒aaεbb⇒aabb(or S⇒* aabb) so, aabbεL(G)
• there is no derivation for aa, so aa∉L(G)
• note L(G)={anbn| n≧0} where anbn meas n a's followed by n b's.
• Parse Tree
A derivation can be conveniently represented by a derivation tree( parse tree).
o The root is labeled by the start symbol.
o Each leaf is labeled by a token or ε.
o Each interior none is labeled by a nonterminal symbol.
o When a production A→x1… xn is derived, nodes labeled by x1… xn are made as children
nodes of node labeled by A.
• root : the start symbol
• internal nodes : nonterminal
• leaf nodes : terminal
o Example G:
list -> list + digit | list - digit | digit
digit -> 0|1|2|3|4|5|6|7|8|9
• left most derivation for 9-5+2,
list ⇒ list+digit ⇒ list-digit+digit ⇒ digit-digit+digit ⇒ 9-digit+digit
⇒ 9-5+digit ⇒ 9-5+2
• right most derivation for 9-5+2,
list ⇒ list+digit ⇒ list+2 ⇒ list-digit+2 ⇒ list-5+2

3
⇒ digit-5+2 ⇒ 9-5+2
parse tree for 9-5+2

Ambiguity
• A grammar is said to be ambiguous if the grammar has more than one parse tree for a given string of
tokens.
• Example 2.5. Suppose a grammar G that cannot distinguish between lists and digits as in
Example 2.1.
• G : string → string + string | string - string |0|1|2|3|4|5|6|7|8|9

Associativity of operator
A operator is said to be left associative if an operand with operators on both sides of it is
taken by the operator to its left.

4
eg) 9+5+2≡(9+5)+2, a=b=c≡a=(b=c)
• Left Associative Grammar :
list → list + digit | list - digit
digit →0|1|…|9
• Right Associative Grammar :
right → letter = right | letter
letter → a|b|…|z

Precedence of operators
We say that an operator(*) has higher precedence than other operator(+) if the operator(*) takes
operands before other operator(+) does.
• ex. 9+5*2≡9+(5*2), 9*5+2≡(9*5)+2
• left associative operators : + , - , * , /
• right associative operators : = , **

Syntax of full expressions


operator associative precedence
+, - left 1 low
*, / left 2 high

• expr → expr + term | expr - term | term


term → term * factor | term / factor | factor
factor → digit | ( expr )

5
digit → 0 | 1 | … | 9

Syntax of statements
o stmt → id = expr ;
| if (expr) stmt ;
| if (expr) stmt else stmt ;
| while (expr) stmt ;
expr → expr + term | expr - term | term
term → term * factor | term / factor | factor
factor → digit | (expr)
digit → 0 | 1 | … | 9

2.3 SYNTAX-DIRECTED TRANSLATION(SDT)


A formalism for specifying translations for programming language constructs.
(attributes of a construct: type, string, location, etc)
• Syntax directed definition(SDD) for the translation of constructs
• Syntax directed translation scheme(SDTS) for specifying translation
Postfix notation for an expression E
• If E is a variable or constant, then the postfix nation for E is E itself ( E.t≡E ).
• if E is an expression of the form E1 op E2 where op is a binary operator
o E1' is the postfix of E1,
o E2' is the postfix of E2
o then E1' E2' op is the postfix for E1 op E2
• if E is (E1), and E1' is a postfix
then E1' is the postfix for E

Syntax-Directed Definition (SDD) for translation


• SDD is a set of semantic rules predefined for each productions respectively for translation.
• A translation is an input-output mapping procedure for translation of an input X,

6
o construct a parse tree for X.
o synthesize attributes over the parse tree.
 Suppose a node n in parse tree is labeled by X and X.a denotes the value of attribute
a of X at that node.
 Compute X's attributes X.a using the semantic rules associated with X.
Example 2.6. SDD for infix to postfix translation

An example of synthesized attributes for input X=9-5+2

Fig 2.6. Attribute values at nodes in a parse tree.


Syntax-directed Translation Schemes (SDTS)
• A translation scheme is a context-free grammar in which program fragments called translation actions
are embedded within the right sides of the production.

7
• {print("+");} : translation(semantic) action.
• SDTS generates an output for each sentence x generated by underlying grammar by executing actions
in the order they appear during depth-first traversal of a parse tree for x.
1. Design translation schemes(SDTS) for translation
2. Translate :
a) parse the input string x and
b) emit the action result encountered during the depth-first traversal of parse tree.

Fig 2.8. An extra leaf is constructed for a semantic action.

Example 2.8.
• SDD vs. SDTS for infix to postfix translation.

8
• Action translating for input 9-5+2

1) Parse.
2) Translate.
Do we have to maintain the whole parse tree ?
No, Semantic actions are performed during parsing, and we don't need the nodes (whose
semantic actions done).

2.4 PARSING
if token string x ∈ L(G), then parse tree
else error message
Top-Down parsing
1. At node n labeled with nonterminal A, select one of the productions whose left part is
A and construct children of node n with the symbols on the right side of that production.
2. Find the next node at which a sub-tree is to be constructed.
ex. G: type → simple
|↑id

9
|array [ simple ] of type
simple → integer
|char
|num dotdot num

Fig 2.10. Top-down parsing while scanning the input from left to right.

10
Fig 2.11. Steps in the top-down construction of a parse tree.
• The selection of production for a nonterminal may involve trial-and-error. =>
backtracking

11
but process is too difficult. It needs 18 steps including 5 backtrackings.
• procedure of top-down parsing
let a pointed grammar symbol and pointed input symbol be g, a respectively.
o if( g ∈ N ) select and expand a production whose left part equals to g next to
current production.
else if( g = a ) then make g and a be a symbol next to current symbol.
else if( g ≠a ) back tracking
 let the pointed input symbol a be the symbol that moves back to steps same with the
number of current symbols of underlying production
 eliminate the right side symbols of current production and let the pointed symbol g
be the left side symbol of current production.

12

You might also like