Compilers Lecture 5
Introduction to parsing
Parsing Process
Call the scanner to get tokens
Build a parse tree from the stream of tokens
A parse tree shows the syntactic structure of the source
program.
Add information about identifiers in the symbol table
Report errors when they are found, and recover from them
Context-free grammars
Regular languages are the weakest formal languages in
wide use. They cannot describe arbitrarily nested
constructs, so they are not used to specify syntax.
The rules used to specify the syntax of a programming
language can be formalized using the concept of
context-free grammar (CFG).
Programming languages have recursive structure.
Context-free grammars are a natural notation for this
recursive structure.
Formal Definition of a CFG
G = (V, T, P, S)
V: finite set of nonterminal symbols
T: finite set of terminal symbols; V and T are disjoint
P: finite set of productions of the form
A -> α, where A ∈ V and α ∈ (T ∪ V)*
S ∈ V: start symbol
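A minimal Python sketch of the 4-tuple, for illustration only. The class name Grammar, its field names and the check method are assumptions and not part of the lecture; the sample grammar has the two productions S -> 01 and S -> 0S1.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Grammar:
    nonterminals: frozenset   # V
    terminals: frozenset      # T
    productions: tuple        # P: pairs (head, body), body is a tuple of symbols
    start: str                # S

    def check(self):
        """Raise ValueError if the four components violate the definition."""
        if self.nonterminals & self.terminals:
            raise ValueError("V and T must be disjoint")
        if self.start not in self.nonterminals:
            raise ValueError("S must be a member of V")
        symbols = self.nonterminals | self.terminals
        for head, body in self.productions:
            if head not in self.nonterminals:
                raise ValueError(f"head {head!r} is not in V")
            if any(sym not in symbols for sym in body):
                raise ValueError(f"body of {head!r} uses an unknown symbol")
        return True


g = Grammar(
    nonterminals=frozenset({"S"}),
    terminals=frozenset({"0", "1"}),
    productions=(("S", ("0", "1")), ("S", ("0", "S", "1"))),
    start="S",
)
g.check()
```

Each production is stored as a (head, body) pair so that the condition A ∈ V and α ∈ (T ∪ V)* can be checked symbol by symbol.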
Context in CFGs
The notion of context in CFGs has nothing to do with
the ordinary meaning of the word context in language.
It only means that the nonterminal on the left-hand
side of a rule stands by itself (free of context).
A -> B C
That is, an A can be rewritten as a B followed by a C
regardless of the context in which the A is found.
CFG Formalism
Terminals = symbols of the alphabet of the language
being defined.
Variables = nonterminals = a finite set of other
symbols, each of which represents a language.
Start symbol = the variable whose language is the one
being defined.
Productions
A production has the form variable (head) -> string of
variables and terminals (body).
Convention:
A, B, C,… and also S are variables.
a, b, c,… are terminals.
…, X, Y, Z are either terminals or variables.
…, w, x, y, z are strings of terminals only.
α, β, γ,… are strings of terminals and/or variables.
CFG Example
Here is a formal CFG for { 0^n 1^n | n ≥ 1 }.
Terminals = {0, 1}.
Variables = {S}.
Start symbol = S.
Productions =
S -> 01
S -> 0S1
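The two productions can be exercised with a short Python sketch. The function names derive and accepts are illustrative assumptions; derive applies S -> 0S1 repeatedly and finishes with S -> 01, while accepts undoes the productions from the outside in.

```python
def derive(n):
    """Return the string obtained by using S -> 0S1 (n - 1) times, then S -> 01."""
    if n < 1:
        raise ValueError("n must be at least 1")
    form = "S"
    for _ in range(n - 1):
        form = form.replace("S", "0S1")   # S -> 0S1
    return form.replace("S", "01")        # S -> 01


def accepts(s):
    """Return True if s can be derived from S."""
    if s == "01":
        return True                       # matches S -> 01
    if len(s) >= 4 and s[0] == "0" and s[-1] == "1":
        return accepts(s[1:-1])           # matches S -> 0S1
    return False


print(derive(3))          # 000111
print(accepts("0101"))    # False
```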
CFG Example
Many CFGs are possible for English; here is an example
(fragment):
S -> NP VP
VP -> V NP
NP -> DetP NP | AdjP NP
AdjP -> Adj | Adv AdjP
NP -> N
N -> boy | girl
V -> sees | likes
Adj -> big | small
Adv -> very
DetP -> a | the
Derivations in a CFG
Grammar:
S -> NP VP
VP -> V NP
NP -> DetP N | AdjP NP
AdjP -> Adj | Adv AdjP
N -> boy | girl
V -> sees | likes
Adj -> big | small
Adv -> very
DetP -> a | the
Derivation of "the boy likes a girl" (some lines apply
more than one production):
S
=> NP VP
=> DetP N VP
=> the boy VP
=> the boy likes NP
=> the boy likes DetP N
=> the boy likes a girl
Derivations of CFGs
A CFG is a string rewriting system: we derive a string
(= derived structure).
The derivation history is represented by a
phrase-structure tree (= derivation structure).
Derived string: the boy likes a girl
Phrase-structure tree:
S
  NP
    DetP: the
    N: boy
  VP
    V: likes
    NP
      DetP: a
      N: girl
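The difference between the derived string and the derivation structure can be sketched in Python. This is an illustration under assumptions: the dictionary encoding of the grammar, the nested-tuple tree and the names expand and leaves are not from the lecture. Each number in the list selects which alternative of a production to use, in leftmost order.

```python
GRAMMAR = {
    "S": [["NP", "VP"]],
    "VP": [["V", "NP"]],
    "NP": [["DetP", "N"], ["AdjP", "NP"]],
    "AdjP": [["Adj"], ["Adv", "AdjP"]],
    "N": [["boy"], ["girl"]],
    "V": [["sees"], ["likes"]],
    "Adj": [["big"], ["small"]],
    "Adv": [["very"]],
    "DetP": [["a"], ["the"]],
}


def expand(symbol, choices):
    """Expand symbol depth-first, left to right, taking alternatives from choices.

    Returns a nested tuple (symbol, child, ...); a terminal is a plain string.
    """
    if symbol not in GRAMMAR:
        return symbol
    body = GRAMMAR[symbol][next(choices)]
    return (symbol, *[expand(child, choices) for child in body])


def leaves(tree):
    """Read the derived string off the tree, left to right."""
    if isinstance(tree, str):
        return [tree]
    return [word for child in tree[1:] for word in leaves(child)]


tree = expand("S", iter([0, 0, 1, 0, 0, 1, 0, 0, 1]))
print(" ".join(leaves(tree)))   # the boy likes a girl
```

The tree keeps the whole derivation history, while the leaves alone give only the derived string.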
Grammar: Example
List of parameters in:
Function definition
<Fdef> -> function id ( <argList> )
<argList> -> id , <argList> | id
Example: function sub(a,b,c)
Function call
<Fcall> -> id ( <parList> )
<parList> -> <par> , <parList> | <par>
<par> -> id | const
Example: sub(a,1,2)
Right-recursive list:
<argList> => id , <argList>
=> id , id , <argList>
=> …
The derived strings have the form (id ,)* id
Left-recursive list:
<argList> -> <argList> , id | id
<parList> -> <parList> , <par> | <par>
<argList> => <argList> , id
=> <argList> , id , id
=> …
The derived strings have the form id (, id)*
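A recursive-descent sketch in Python for the rules <Fdef> -> function id ( <argList> ) and <argList> -> id , <argList> | id. It is only an illustration: the tokenizer, the identifier pattern and the function names are assumptions that the lecture does not define.

```python
import re

ID_PATTERN = r"[A-Za-z_]\w*"   # assumed shape of an identifier


def tokenize(text):
    """Split text into identifiers, integer constants and the symbols ( ) ,"""
    return re.findall(ID_PATTERN + r"|\d+|[(),]", text)


def is_id(token):
    return re.fullmatch(ID_PATTERN, token) is not None


def parse_arg_list(tokens, pos):
    """<argList> -> id , <argList> | id. Returns (names, next position)."""
    if pos >= len(tokens) or not is_id(tokens[pos]):
        raise SyntaxError("identifier expected")
    names = [tokens[pos]]
    pos += 1
    if pos < len(tokens) and tokens[pos] == ",":
        rest, pos = parse_arg_list(tokens, pos + 1)   # the recursive alternative
        names += rest
    return names, pos


def parse_fdef(text):
    """<Fdef> -> function id ( <argList> ). Returns (name, parameter names)."""
    tokens = tokenize(text)
    if (len(tokens) < 3 or tokens[0] != "function"
            or not is_id(tokens[1]) or tokens[2] != "("):
        raise SyntaxError("expected: function id (")
    args, pos = parse_arg_list(tokens, 3)
    if tokens[pos:] != [")"]:
        raise SyntaxError("expected closing parenthesis")
    return tokens[1], args


print(parse_fdef("function sub(a,b,c)"))   # ('sub', ['a', 'b', 'c'])
```

The right-recursive rule maps directly onto a recursive function. The left-recursive form would recurse forever if coded the same way, so it is usually handled with a loop instead.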
Leftmost Derivation
Each step of the derivation is a replacement of the
leftmost nonterminal in a sentential form.
E
=> E O E
=> (E) O E
=> (E O E) O E
=> (id O E) O E
=> (id + E) O E
=> (id + id) O E
=> (id + id) * E
=> (id + id) * id
Rightmost Derivation
Each step of the derivation is a replacement of the
rightmost nonterminal in a sentential form.
E
=> E O E
=> E O id
=> E * id
=> (E) * id
=> (E O E) * id
=> (E O id) * id
=> (E + id) * id
=> (id + id) * id
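Both derivation orders can be replayed with one Python sketch. It assumes the grammar E -> E O E | ( E ) | id and O -> + | *, which the derivations suggest but the slide does not state; the names RULES and derive are illustrative. Each number selects the alternative applied to the leftmost or rightmost nonterminal.

```python
RULES = {
    "E": [["E", "O", "E"], ["(", "E", ")"], ["id"]],
    "O": [["+"], ["*"]],
}


def derive(choices, rightmost=False):
    """Return all sentential forms, rewriting one nonterminal per step."""
    form = ["E"]
    steps = ["E"]
    for choice in choices:
        positions = [k for k, sym in enumerate(form) if sym in RULES]
        i = positions[-1] if rightmost else positions[0]
        form[i:i + 1] = RULES[form[i]][choice]
        steps.append(" ".join(form))
    return steps


left = derive([0, 1, 0, 2, 0, 2, 1, 2])
right = derive([0, 2, 1, 1, 0, 2, 0, 2], rightmost=True)
print(left[-1])    # ( id + id ) * id
print(right[-1])   # ( id + id ) * id
```

The two runs pass through different sentential forms but end in the same string.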
Parse Tree
A parse tree is a labeled tree, corresponding to a
derivation, in which
the interior nodes are labeled by nonterminals
the leaf nodes are labeled by terminals
the children of an interior node represent the
replacement of the associated nonterminal in one step
of the derivation
(Figure: example parse tree for an expression built
from id, + and *.)
Abstract Syntax Tree for Expression
Expression: id + id * id
Parse tree:
E
  E: id
  +
  E
    E: id
    *
    E: id
Abstract syntax tree:
+
  id
  *
    id
    id
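The step from a parse tree to an abstract syntax tree can be sketched in Python for binary expressions. The nested-tuple encoding and the function name to_ast are assumptions made for this illustration: nonterminal labels are dropped and each operator becomes the root of its subtree.

```python
def to_ast(node):
    """Convert a parse tree (label, child, ...) into an AST (op, left, right)."""
    if isinstance(node, str):
        return node                       # a terminal such as "id"
    children = node[1:]
    if len(children) == 1:
        return to_ast(children[0])        # E -> id: skip the E node
    left, op, right = children            # E -> E op E
    return (op, to_ast(left), to_ast(right))


# Parse tree for id + id * id, grouped as id + (id * id).
parse_tree = ("E", ("E", "id"), "+", ("E", ("E", "id"), "*", ("E", "id")))
print(to_ast(parse_tree))   # ('+', 'id', ('*', 'id', 'id'))
```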
Abstract Syntax Tree for If Statement
Statement: if ( true ) return else return
Parse tree:
ifStatement
  if
  (
  exp: true
  )
  st: return
  elsePart
    else
    st: return
Abstract syntax tree:
if
  true
  return
  return
Ambiguous Grammar
A grammar is ambiguous if some string it generates has
two or more different parse trees.
Ambiguous grammars make parsing inconsistent: the same
program can be given different structures.
Example: Ambiguous Grammar
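Given the following CFG:
E -> E + E
E -> E - E
E -> E * E
E -> E / E
E -> id
The string id + id * id has two parse trees.
Tree 1, grouped as id + (id * id):
E
  E: id
  +
  E
    E: id
    *
    E: id
Tree 2, grouped as (id + id) * id:
E
  E
    E: id
    +
    E: id
  *
  E: id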
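The ambiguity can be checked by counting parse trees. The Python sketch below assumes the grammar E -> E op E | id with op one of + - * /; the function name count_parse_trees is illustrative. It tries every operator position as the top-level split and multiplies the counts of the two sides.

```python
from functools import lru_cache

OPERATORS = {"+", "-", "*", "/"}


def count_parse_trees(tokens):
    """Count the parse trees of tokens under E -> E op E | id."""
    tokens = tuple(tokens)

    @lru_cache(maxsize=None)
    def count(lo, hi):
        """Number of ways E derives tokens[lo:hi]."""
        if hi - lo == 1:
            return 1 if tokens[lo] == "id" else 0   # E -> id
        total = 0
        for k in range(lo + 1, hi - 1):             # E -> E op E, op at index k
            if tokens[k] in OPERATORS:
                total += count(lo, k) * count(k + 1, hi)
        return total

    return count(0, len(tokens))


print(count_parse_trees("id + id * id".split()))   # 2
```

A count above 1 for any string shows that the grammar is ambiguous.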
Ambiguity in Expressions
Which operation is to be done first?
This is solved by precedence.
An operator with higher precedence is done before one
with lower precedence.
In the grammar, an operator with higher precedence is
placed in a rule further from the start symbol.
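Precedence
Ambiguous grammar:
E -> E + E
E -> E - E
E -> E * E
E -> E / E
E -> id
It gives id + id * id two parse trees, grouped as
id + (id * id) and as (id + id) * id.
Grammar with precedence:
E -> E + E
E -> E - E
E -> F
F -> F * F
F -> F / F
F -> id
Parse tree for id + id * id:
E
  E
    F: id
  +
  E
    F
      F: id
      *
      F: id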
Precedence (cont’d)
E -> E + E | E - E | F
F -> F * F | F / F | X
X -> ( E ) | id
Parse tree for (id + id) * id * id:
E
  F
    F
      X
        (
        E
          E
            F
              X: id
          +
          E
            F
              X: id
        )
    *
    F
      F
        X: id
      *
      F
        X: id
Associativity
Left-associative operators
E -> E + F | E - F | F
F -> F * X | F / X | X
X -> ( E ) | id
Parse tree for (id + id) * id / id:
E
  F
    F
      F
        X
          (
          E
            E
              F
                X: id
            +
            F
              X: id
          )
      *
      X: id
    /
    X: id
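A Python sketch of a parser for the grammar E -> E + F | E - F | F, F -> F * X | F / X | X, X -> ( E ) | id. It is an illustration under assumptions: the left-recursive rules are implemented with loops, a common technique that the lecture has not introduced, and the function names and the tuple form of the result are hypothetical. The loops build the tree leftwards, which gives left associativity; calling parse_f from parse_e gives * and / higher precedence.

```python
def parse(tokens):
    """Parse a token list and return a nested tuple (op, left, right) or "id"."""
    pos = 0

    def peek():
        return tokens[pos] if pos < len(tokens) else None

    def advance():
        nonlocal pos
        pos += 1
        return tokens[pos - 1]

    def parse_e():                       # E -> E + F | E - F | F
        node = parse_f()
        while peek() in ("+", "-"):
            op = advance()
            node = (op, node, parse_f())
        return node

    def parse_f():                       # F -> F * X | F / X | X
        node = parse_x()
        while peek() in ("*", "/"):
            op = advance()
            node = (op, node, parse_x())
        return node

    def parse_x():                       # X -> ( E ) | id
        if peek() == "(":
            advance()
            node = parse_e()
            if peek() != ")":
                raise SyntaxError("expected )")
            advance()
            return node
        if peek() == "id":
            return advance()
        raise SyntaxError(f"unexpected token {peek()!r}")

    tree = parse_e()
    if pos != len(tokens):
        raise SyntaxError(f"unexpected token {peek()!r}")
    return tree


print(parse("( id + id ) * id / id".split()))
# ('/', ('*', ('+', 'id', 'id'), 'id'), 'id')
```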
Ambiguity in Dangling Else
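St -> IfSt | ...
IfSt -> if ( E ) St | if ( E ) St else St
E -> 0 | 1 | …
The statement if (0) if (1) St else St has two readings:
{ if (0) { if (1) St } else St }
{ if (0) { if (1) St else St } }
Parse tree 1 (the else belongs to the outer if):
IfSt
  if
  (
  E: 0
  )
  St
    IfSt
      if
      (
      E: 1
      )
      St
  else
  St
Parse tree 2 (the else belongs to the inner if):
IfSt
  if
  (
  E: 0
  )
  St
    IfSt
      if
      (
      E: 1
      )
      St
      else
      St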
Disambiguating Rules for Dangling Else
St -> MatchedSt | UnmatchedSt
UnmatchedSt -> if (E) St |
if (E) MatchedSt else UnmatchedSt
MatchedSt -> if (E) MatchedSt else MatchedSt | ...
E -> 0 | 1
if (0) if (1) St else St
= if (0) { if (1) St else St }
Parse tree:
St
  UnmatchedSt
    if
    (
    E: 0
    )
    St
      MatchedSt
        if
        (
        E: 1
        )
        MatchedSt
        else
        MatchedSt
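In a hand-written parser the same result is often obtained without rewriting the grammar: after parsing the statement that follows if ( E ), the parser takes an else immediately when it sees one. The Python sketch below illustrates that greedy choice; it is not the grammar rewrite itself, and the function name, the token format and the tuple form of the tree are assumptions.

```python
def parse_statement(tokens, pos=0):
    """Parse one statement from a list of tokens; return (tree, next position).

    A tree is "St" or ("if", condition, then_branch, else_branch or None).
    """
    if pos < len(tokens) and tokens[pos] == "St":
        return "St", pos + 1
    if tokens[pos:pos + 2] != ["if", "("] or tokens[pos + 3:pos + 4] != [")"]:
        raise SyntaxError("expected St or if ( E )")
    condition = tokens[pos + 2]
    if condition not in ("0", "1"):
        raise SyntaxError("E must be 0 or 1")
    then_branch, pos = parse_statement(tokens, pos + 4)
    else_branch = None
    # Greedy choice: an else seen here is taken by the innermost open if.
    if pos < len(tokens) and tokens[pos] == "else":
        else_branch, pos = parse_statement(tokens, pos + 1)
    return ("if", condition, then_branch, else_branch), pos


tree, _ = parse_statement("if ( 0 ) if ( 1 ) St else St".split())
print(tree)   # ('if', '0', ('if', '1', 'St', 'St'), None)
```

The else ends up inside the inner if, which matches the reading if (0) { if (1) St else St }.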
Questions?