Lec03 parserCFG
• The syntax analyzer creates the syntactic structure of the given source program.
• This syntactic structure is usually a parse tree.
• The syntax analyzer is also known as a parser.
• The syntax of a programming language is described by a context-free grammar (CFG). We will
use BNF (Backus-Naur Form) notation in the description of CFGs.
• The syntax analyzer (parser) checks whether a given source program satisfies the rules
implied by a context-free grammar or not.
– If it satisfies, the parser creates the parse tree of that program.
– Otherwise, the parser reports error messages.
• A context-free grammar
– gives a precise syntactic specification of a programming language.
– is designed in an initial phase of the design of a compiler.
– can be directly converted into a parser by some tools.
Parser
Parsers (cont.)
1. Top-Down Parser
– the parse tree is created top to bottom, starting from the root.
2. Bottom-Up Parser
– the parse tree is created bottom to top, starting from the leaves.
• Both top-down and bottom-up parsers scan the input from left to right
(one symbol at a time).
• Efficient top-down and bottom-up parsers can be implemented only for
sub-classes of context-free grammars.
– LL for top-down parsing
– LR for bottom-up parsing
Context-Free Grammars
• Inherently recursive structures of a programming language are defined
by a context-free grammar.
• In a context-free grammar, we have:
– A finite set of terminals (in our case, this will be the set of tokens)
– A finite set of non-terminals (syntactic-variables)
– A finite set of production rules in the following form
• A → α where A is a non-terminal and
α is a string of terminals and non-terminals (including the empty string)
– A start symbol (one of the non-terminal symbols)
• Example:
E → E+E | E-E | E*E | E/E | -E
E → (E)
E → id
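One way to make the four components concrete is to write the example grammar down as plain data. The sketch below is only illustrative: the names GRAMMAR, START, nonterminals and terminals are assumptions of this example, not part of the lecture, and the binary and unary minus are both written with the same '-' character.

```python
# Each non-terminal maps to its list of alternatives;
# each alternative is a list of grammar symbols.
GRAMMAR = {
    "E": [
        ["E", "+", "E"],
        ["E", "-", "E"],
        ["E", "*", "E"],
        ["E", "/", "E"],
        ["-", "E"],
        ["(", "E", ")"],
        ["id"],
    ],
}
START = "E"  # the start symbol must be one of the non-terminals


def nonterminals(grammar):
    """Non-terminals are exactly the left-hand sides of the productions."""
    return set(grammar)


def terminals(grammar):
    """Terminals are the symbols that never appear as a left-hand side."""
    symbols = {sym for alts in grammar.values() for alt in alts for sym in alt}
    return symbols - set(grammar)


print(sorted(nonterminals(GRAMMAR)))  # ['E']
print(sorted(terminals(GRAMMAR)))     # ['(', ')', '*', '+', '-', '/', 'id']
```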
Derivations
E ⇒ E+E
• E+E derives from E
– we can replace E by E+E
– to be able to do this, we must have a production rule E → E+E in our grammar.
CFG - Terminology
• L(G) is the language of G (the language generated by G) which is a set
of sentences.
• A sentence of L(G) is a string of terminal symbols of G.
• If S is the start symbol of G, then
ω is a sentence of L(G) iff S ⇒+ ω, where ω is a string of terminals of G.
• If S ⇒* α, then
– if α contains non-terminals, it is called a sentential form of G;
– if α does not contain non-terminals, it is called a sentence of G.
Derivation Example
E ⇒ -E ⇒ -(E) ⇒ -(E+E) ⇒ -(id+E) ⇒ -(id+id)
OR
E ⇒ -E ⇒ -(E) ⇒ -(E+E) ⇒ -(E+id) ⇒ -(id+id)
• At each derivation step, we can choose any of the non-terminals in the sentential form
of G for the replacement.
• If we always choose the left-most non-terminal in each derivation step, the derivation
is called a left-most derivation.
• If we always choose the right-most non-terminal in each derivation step, the derivation
is called a right-most derivation.
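The two derivations of -(id+id) can be replayed mechanically. The following is a minimal sketch, assuming a reduced grammar with only the four alternatives used here; the function name derive and the pick parameter are hypothetical helpers introduced for this example.

```python
# Reduced version of the example grammar; the index of each
# alternative is used to select a production at every step.
GRAMMAR = {
    "E": [
        ["E", "+", "E"],   # 0
        ["-", "E"],        # 1
        ["(", "E", ")"],   # 2
        ["id"],            # 3
    ],
}


def derive(start, grammar, choices, pick):
    """Apply one production per entry of `choices`.

    `pick` selects which non-terminal position to rewrite:
    min gives a left-most derivation, max a right-most one.
    """
    forms = [[start]]
    for alt_index in choices:
        form = forms[-1]
        positions = [i for i, sym in enumerate(form) if sym in grammar]
        i = pick(positions)
        forms.append(form[:i] + grammar[form[i]][alt_index] + form[i + 1:])
    return ["".join(form) for form in forms]


choices = [1, 2, 0, 3, 3]  # -E, (E), E+E, id, id
leftmost = derive("E", GRAMMAR, choices, pick=min)
rightmost = derive("E", GRAMMAR, choices, pick=max)

print(" => ".join(leftmost))   # E => -E => -(E) => -(E+E) => -(id+E) => -(id+id)
print(" => ".join(rightmost))  # E => -E => -(E) => -(E+E) => -(E+id) => -(id+id)
```

Both runs use the same productions and end in the same sentence; only the order in which the two E symbols of E+E are rewritten differs.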
Left-Most and Right-Most Derivations
Left-Most Derivation
E ⇒lm -E ⇒lm -(E) ⇒lm -(E+E) ⇒lm -(id+E) ⇒lm -(id+id)
Right-Most Derivation
E ⇒rm -E ⇒rm -(E) ⇒rm -(E+E) ⇒rm -(E+id) ⇒rm -(id+id)
• We will see that the top-down parsers try to find the left-most
derivation of the given source program.
• We will see that the bottom-up parsers try to find the right-most
derivation of the given source program in the reverse order.
Parse Tree
• Inner nodes of a parse tree are non-terminal symbols.
• The leaves of a parse tree are terminal symbols.
[Figure: parse trees built step by step for the derivation E ⇒ -E ⇒ -(E) ⇒ -(E+E) ⇒ -(id+E) ⇒ -(id+id)]
Ambiguity
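• A grammar that produces more than one parse tree for some sentence is called ambiguous.
E ⇒ E+E ⇒ id+E ⇒ id+E*E ⇒ id+id*E ⇒ id+id*id
E ⇒ E*E ⇒ E+E*E ⇒ id+E*E ⇒ id+id*E ⇒ id+id*id
[Figure: the two parse trees for id+id*id, one with E + E at the root, the other with E * E at the root]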
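Ambiguity can be checked for a concrete sentence by counting its parse trees. The sketch below assumes the reduced grammar E → E+E | E*E | id and a hypothetical helper count_parse_trees; it is a small counting routine for this one grammar, not a general ambiguity test.

```python
from functools import lru_cache


def count_parse_trees(tokens):
    """Count the parse trees of `tokens` under E -> E+E | E*E | id."""
    tokens = tuple(tokens)

    @lru_cache(maxsize=None)
    def count(i, j):
        # Number of ways E can derive tokens[i:j].
        if j - i == 1:
            return 1 if tokens[i] == "id" else 0
        total = 0
        # Try every operator as the top-level operator of this span.
        for k in range(i + 1, j - 1):
            if tokens[k] in ("+", "*"):
                total += count(i, k) * count(k + 1, j)
        return total

    return count(0, len(tokens))


print(count_parse_trees(["id", "+", "id", "*", "id"]))  # 2
print(count_parse_trees(["id", "+", "id"]))             # 1
```

A count greater than 1 for any sentence is enough to show that the grammar is ambiguous.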
Ambiguity (cont.)
• For most parsers, the grammar must be unambiguous.
• unambiguous grammar
⇒ unique selection of the parse tree for a sentence
Ambiguity (cont.)
[Figure: two parse trees, numbered 1 and 2, for a stmt built from nested if statements (sub-trees E2, S1, S2)]
Ambiguity (cont.)
• We prefer the second parse tree (each else matches the closest if).
• So, we have to disambiguate our grammar to reflect this choice.
Ambiguity – Operator Precedence
• Ambiguous grammars (because of ambiguous operators) can be
disambiguated according to the precedence and associativity rules.
Left Recursion
• A grammar is left-recursive if it has a non-terminal A such that there is
a derivation
A ⇒+ Aα for some string α
Immediate Left-Recursion
A → Aα | β where β does not start with A
⇓ eliminate immediate left recursion
A → βA’
A’ → αA’ | ε an equivalent grammar
In general,
A → Aα1 | ... | Aαm | β1 | ... | βn where β1 ... βn do not start with A
⇓ eliminate immediate left recursion
A → β1A’ | ... | βnA’
A’ → α1A’ | ... | αmA’ | ε an equivalent grammar
Immediate Left-Recursion -- Example
E → E+T | T
T → T*F | F
F → id | (E)
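Applying the rule for immediate left recursion to this grammar can be sketched as follows. The function eliminate_immediate and the tuple-based grammar encoding are assumptions of this example; the empty string is written as an empty tuple and printed as eps.

```python
EPSILON = ()  # the empty string, written as an empty tuple of symbols


def eliminate_immediate(nt, alternatives):
    """Remove immediate left recursion from the productions of `nt`.

    Assumes no alternative is exactly `nt` itself (A -> A) and that at
    least one alternative does not start with `nt`.
    """
    alphas = [alt[1:] for alt in alternatives if alt[:1] == (nt,)]
    betas = [alt for alt in alternatives if alt[:1] != (nt,)]
    if not alphas:  # no immediate left recursion: keep the productions
        return {nt: list(alternatives)}
    new_nt = nt + "'"
    return {
        nt: [beta + (new_nt,) for beta in betas],
        new_nt: [alpha + (new_nt,) for alpha in alphas] + [EPSILON],
    }


grammar = {
    "E": [("E", "+", "T"), ("T",)],
    "T": [("T", "*", "F"), ("F",)],
    "F": [("id",), ("(", "E", ")")],
}

result = {}
for nt, alts in grammar.items():
    result.update(eliminate_immediate(nt, alts))

for nt, alts in result.items():
    print(nt, "->", " | ".join(" ".join(alt) if alt else "eps" for alt in alts))
# E -> T E'
# E' -> + T E' | eps
# T -> F T'
# T' -> * F T' | eps
# F -> id | ( E )
```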
Left-Recursion -- Problem
S → Aa | b
A → Sc | d This grammar is not immediately left-recursive,
but it is still left-recursive.
S ⇒ Aa ⇒ Sca or
A ⇒ Sc ⇒ Aac causes a left-recursion
Eliminate Left-Recursion – Example2
S → Aa | b
A → Ac | Sd | f
- Order of non-terminals: A, S
for A:
- we do not enter the inner loop.
- Eliminate the immediate left-recursion in A
A → SdA’ | fA’
A’ → cA’ | ε
for S:
- Replace S → Aa with S → SdA’a | fA’a
So, we will have S → SdA’a | fA’a | b
- Eliminate the immediate left-recursion in S
S → fA’aS’ | bS’
S’ → dA’aS’ | ε
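The steps of this example follow the usual two-loop procedure: for each non-terminal in the chosen order, first substitute the productions of the earlier non-terminals (the inner loop), then remove immediate left recursion. The sketch below assumes that procedure and a grammar without ε-productions or cycles; the function names and the tuple encoding are hypothetical.

```python
EPSILON = ()  # the empty string


def eliminate_immediate(nt, alternatives):
    """Remove immediate left recursion from the productions of `nt`."""
    alphas = [alt[1:] for alt in alternatives if alt[:1] == (nt,)]
    betas = [alt for alt in alternatives if alt[:1] != (nt,)]
    if not alphas:
        return {nt: list(alternatives)}
    new_nt = nt + "'"
    return {
        nt: [beta + (new_nt,) for beta in betas],
        new_nt: [alpha + (new_nt,) for alpha in alphas] + [EPSILON],
    }


def eliminate_left_recursion(grammar, order):
    """Process the non-terminals in `order`, as in the example above."""
    result = {}
    done = []  # non-terminals that have already been processed
    for a_i in order:
        alts = list(grammar[a_i])
        # Inner loop: replace a_i -> a_j gamma by the alternatives of a_j.
        for a_j in done:
            expanded = []
            for alt in alts:
                if alt[:1] == (a_j,):
                    expanded += [delta + alt[1:] for delta in result[a_j]]
                else:
                    expanded.append(alt)
            alts = expanded
        result.update(eliminate_immediate(a_i, alts))
        done.append(a_i)
    return result


grammar = {
    "S": [("A", "a"), ("b",)],
    "A": [("A", "c"), ("S", "d"), ("f",)],
}
result = eliminate_left_recursion(grammar, order=["A", "S"])

for nt, alts in result.items():
    print(nt, "->", " | ".join(" ".join(alt) if alt else "eps" for alt in alts))
# A -> S d A' | f A'
# A' -> c A' | eps
# S -> f A' a S' | b S'
# S' -> d A' a S' | eps
```

With the order A, S the inner loop is skipped for A because no non-terminal precedes it, which matches the first step of the example.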
Left-Factoring
• A predictive parser (a top-down parser without backtracking) insists
that the grammar must be left-factored.
Left-Factoring (cont.)
• In general,
A → αβ1 | αβ2 where α is non-empty and the first symbols
of β1 and β2 (if they have one) are different.
• When processing α, we cannot know whether to expand
A to αβ1 or
A to αβ2
Left-Factoring -- Algorithm
• For each non-terminal A with two or more alternatives (production
rules) with a common non-empty prefix, say
A → αβ1 | ... | αβn | γ1 | ... | γm
convert it into
A → αA’ | γ1 | ... | γm
A’ → β1 | ... | βn
Left-Factoring – Example1
A → abB | aB | cdg | cdeB | cdfB
⇓
A → aA’ | cdg | cdeB | cdfB
A’ → bB | B
⇓
A → aA’ | cdA’’
A’ → bB | B
A’’ → g | eB | fB
Left-Factoring – Example2
A → ad | a | ab | abc | b
⇓
A → aA’ | b
A’ → d | ε | b | bc
⇓
A → aA’ | b
A’ → d | ε | bA’’
A’’ → ε | c
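Both left-factoring examples can be reproduced with a short routine. This is a sketch under stated assumptions: alternatives are grouped by their first symbol, the longest common prefix of each group is factored out, and new non-terminals are named by adding primes. The names common_prefix and left_factor are hypothetical, and duplicate alternatives are not handled.

```python
def common_prefix(alts):
    """Longest sequence of symbols shared by the start of every alternative."""
    prefix = []
    for symbols in zip(*alts):
        if len(set(symbols)) != 1:
            break
        prefix.append(symbols[0])
    return tuple(prefix)


def left_factor(grammar):
    """Left-factor every non-terminal, including newly created ones."""
    result = {}
    work = list(grammar.items())  # queue of (non-terminal, alternatives)
    primes = {}                   # base name -> number of primes used so far
    while work:
        nt, alts = work.pop(0)
        groups = {}               # first symbol -> alternatives starting with it
        for alt in alts:
            groups.setdefault(alt[0] if alt else None, []).append(alt)
        new_alts = []
        for key, group in groups.items():
            if key is None or len(group) == 1:
                new_alts.extend(group)  # nothing to factor
                continue
            prefix = common_prefix(group)
            base = nt.rstrip("'")
            primes[base] = primes.get(base, 0) + 1
            new_nt = base + "'" * primes[base]
            new_alts.append(prefix + (new_nt,))
            # The remainders become the alternatives of the new non-terminal.
            work.append((new_nt, [alt[len(prefix):] for alt in group]))
        result[nt] = new_alts
    return result


def show(grammar):
    for nt, alts in grammar.items():
        print(nt, "->", " | ".join(" ".join(alt) if alt else "eps" for alt in alts))


example1 = {"A": [("a", "b", "B"), ("a", "B"), ("c", "d", "g"),
                  ("c", "d", "e", "B"), ("c", "d", "f", "B")]}
example2 = {"A": [("a", "d"), ("a",), ("a", "b"), ("a", "b", "c"), ("b",)]}

factored1 = left_factor(example1)
factored2 = left_factor(example2)
show(factored1)
show(factored2)
```

Unlike the step-by-step presentation, the routine factors all groups of one non-terminal in a single pass, but the resulting grammars are the same as the final grammars of Example1 and Example2.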
Non-Context Free Language Constructs
• Some constructs in programming languages are not context-free.
This means that we cannot write a context-free grammar for these
constructs.