CD - Ch.2
1. Top-Down Parser
– the parse tree is created top to bottom, starting from the root.
2. Bottom-Up Parser
– the parse tree is created bottom to top, starting from the leaves.
• Both top-down and bottom-up parsers scan the input from left to right (one symbol at a time).
• Efficient top-down and bottom-up parsers can be implemented only for sub-classes of context-free
grammars.
– LL for top-down parsing
– LR for bottom-up parsing
Context-Free Grammars
• Inherently recursive structures of a programming language are defined by a context-free grammar.
• Example:
E → E+E | E-E | E*E | E/E | -E
E → (E)
E → id
Derivations
E ⇒ E+E
• E+E derives from E
– we can replace E by E+E
– to be able to do this, we must have a production rule E → E+E in our grammar.
E ⇒ E+E ⇒ id+E ⇒ id+id
• A sequence of replacements of non-terminal symbols is called a derivation of id+id from E.
• At each derivation step, we can choose any of the non-terminals in the sentential form for the replacement.
• If we always choose the left-most non-terminal in each derivation step, this derivation is called a left-most derivation.
• If we always choose the right-most non-terminal in each derivation step, this derivation is called a right-most derivation.
Left-Most and Right-Most Derivations
Left-Most Derivation
E ⇒ -E ⇒ -(E) ⇒ -(E+E) ⇒ -(id+E) ⇒ -(id+id)
Right-Most Derivation
E ⇒ -E ⇒ -(E) ⇒ -(E+E) ⇒ -(E+id) ⇒ -(id+id)
• We will see that the top-down parsers try to find the left-most derivation of the given source program.
• We will see that the bottom-up parsers try to find the right-most derivation of the given source program in reverse order.
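The two strategies differ only in which non-terminal is replaced at each step. The sketch below is a minimal illustration, assuming sentential forms are Python lists of symbols and that E is the only non-terminal; `derive` and `NONTERMINALS` are hypothetical names, not part of any parser library. It applies the same five right-hand sides once left-most and once right-most.

```python
NONTERMINALS = {"E"}  # assumption: the example grammar has the single non-terminal E


def derive(start, steps, leftmost=True):
    """Apply each right-hand side in `steps` to the left-most (or right-most)
    non-terminal of the current sentential form; return every form produced."""
    form = [start]
    forms = ["".join(form)]
    for rhs in steps:
        positions = [i for i, sym in enumerate(form) if sym in NONTERMINALS]
        if not positions:
            raise ValueError("no non-terminal left to replace")
        i = positions[0] if leftmost else positions[-1]
        form = form[:i] + list(rhs) + form[i + 1:]
        forms.append("".join(form))
    return forms


# E -> -E, E -> (E), E -> E+E, E -> id, E -> id
steps = [["-", "E"], ["(", "E", ")"], ["E", "+", "E"], ["id"], ["id"]]
print(" => ".join(derive("E", steps, leftmost=True)))
print(" => ".join(derive("E", steps, leftmost=False)))
```

The two printed lines are the left-most and right-most derivations above, with => in place of ⇒; they differ only at the fifth form, -(id+E) versus -(E+id).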
Parse Tree
• Inner nodes of a parse tree are non-terminal symbols.
• The leaves of a parse tree are terminal symbols.
• A parse tree can be seen as a graphical representation of a derivation.
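E ⇒ -E ⇒ -(E) ⇒ -(E+E) ⇒ -(id+E) ⇒ -(id+id)
[Figure: the parse tree after each step of this derivation. In the final tree, the root E has children - and E; that E has children (, E and ); the innermost E has children E, + and E, and each of those two E nodes has the single child id.]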
Ambiguity
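• A grammar that produces more than one parse tree for some sentence is called an ambiguous grammar.
• Example: the sentence id+id*id has two left-most derivations, and hence two parse trees:
E ⇒ E+E ⇒ id+E ⇒ id+E*E ⇒ id+id*E ⇒ id+id*id
E ⇒ E*E ⇒ E+E*E ⇒ id+E*E ⇒ id+id*E ⇒ id+id*id
[Figure: the two parse trees. In the first, + is at the root and id*id is its right subtree; in the second, * is at the root and id+id is its left subtree.]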
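One way to see the ambiguity concretely is to count parse trees. The sketch below is a hypothetical `count_parse_trees` helper that uses interval dynamic programming over the token list; as a simplifying assumption it covers only E → E+E | E*E | id (the -, /, unary-minus and parenthesis rules are left out).

```python
from functools import lru_cache


def count_parse_trees(tokens):
    """Count the parse trees of `tokens` under E -> E+E | E*E | id.
    (The -, /, unary-minus and parenthesis rules are omitted in this sketch.)"""
    tokens = tuple(tokens)

    @lru_cache(maxsize=None)
    def count(i, j):
        # number of ways E can derive the slice tokens[i:j]
        if j - i == 1:
            return 1 if tokens[i] == "id" else 0
        total = 0
        for k in range(i + 1, j - 1):
            if tokens[k] in ("+", "*"):
                # tokens[k] is the operator at the root: E -> E op E
                total += count(i, k) * count(k + 1, j)
        return total

    return count(0, len(tokens))


print(count_parse_trees(["id", "+", "id", "*", "id"]))  # 2: the sentence is ambiguous
```

For an unambiguous grammar the same count would be at most 1 for every sentence; here it grows quickly with the number of operators (five trees for id+id+id+id).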
Ambiguity (cont.)
• For most parsers, the grammar must be unambiguous.
• Unambiguous grammar
⇒ a unique parse tree for each sentence
• We should eliminate the ambiguity in the grammar during the design phase of the compiler.
• An unambiguous grammar should be written to eliminate the ambiguity.
• To disambiguate an ambiguous grammar, we prefer one of the parse trees of each ambiguous sentence and restrict the grammar to this choice.
Ambiguity (cont.)
stmt → if expr then stmt |
if expr then stmt else stmt | otherstmts
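[Figure: two parse trees for the statement if E1 then if E2 then S1 else S2. In the first, else S2 is attached to the outer if; in the second, it is attached to the inner if.]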
Ambiguity (cont.)
• We prefer the second parse tree (else matches the closest unmatched if).
• So, we have to disambiguate our grammar to reflect this choice.
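Immediate Left-Recursion
• In general,
A → A α1 | ... | A αm | β1 | ... | βn, where β1 ... βn do not start with A
⇓ eliminate immediate left-recursion
A → β1 A’ | ... | βn A’
A’ → α1 A’ | ... | αm A’ | ε
• Example: the left-recursive expression grammar
E → E+T | T
T → T*F | F
F → id | (E)
⇓
E → T E’
E’ → +T E’ | ε
T → F T’
T’ → *F T’ | ε
F → id | (E)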
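A minimal sketch of the general rule for immediate left-recursion, assuming each alternative is a tuple of grammar symbols, () stands for ε, and the new non-terminal is named by appending a prime (written ' in code); `eliminate_immediate_left_recursion` and `show` are hypothetical helpers. Applied to E → E+T | T and T → T*F | F it yields the E and T rules of the example (symbols printed space-separated).

```python
EPSILON = ()  # an empty right-hand side stands for ε


def eliminate_immediate_left_recursion(a, alternatives):
    """Rewrite  A -> A a1 | ... | A am | b1 | ... | bn  as
         A  -> b1 A' | ... | bn A'
         A' -> a1 A' | ... | am A' | ε
    Each alternative is a tuple of grammar symbols."""
    alphas = [alt[1:] for alt in alternatives if alt and alt[0] == a]
    betas = [alt for alt in alternatives if not alt or alt[0] != a]
    if not alphas:
        return {a: list(alternatives)}  # A is not immediately left-recursive
    a_prime = a + "'"  # assumption: this name is not already used in the grammar
    return {
        a: [beta + (a_prime,) for beta in betas],
        a_prime: [alpha + (a_prime,) for alpha in alphas] + [EPSILON],
    }


def show(rules):
    for nt, alts in rules.items():
        print(nt, "->", " | ".join(" ".join(alt) if alt else "ε" for alt in alts))


show(eliminate_immediate_left_recursion("E", [("E", "+", "T"), ("T",)]))
show(eliminate_immediate_left_recursion("T", [("T", "*", "F"), ("F",)]))
```

The output is E -> T E', E' -> + T E' | ε, T -> F T', T' -> * F T' | ε. F is returned unchanged because none of its alternatives starts with F.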
Left-Recursion -- Problem
• A grammar may not be immediately left-recursive, but it can still be left-recursive.
• Eliminating only the immediate left-recursion does not guarantee a grammar that is free of left-recursion.
S → Aa | b
A → Sc | d
This grammar is not immediately left-recursive, but it is still left-recursive:
S ⇒ Aa ⇒ Sca or
A ⇒ Sc ⇒ Aac causes a left-recursion.
Eliminate Left-Recursion -- Example
S → Aa | b
A → Ac | Sd | f
- Order of non-terminals: S, A
for S:
  - we do not enter the inner loop.
  - there is no immediate left-recursion in S.
for A:
  - Replace A → Sd with A → Aad | bd
    So, we will have A → Ac | Aad | bd | f
  - Eliminate the immediate left-recursion in A
    A → bdA’ | fA’
    A’ → cA’ | adA’ | ε
- Order of non-terminals: A, S
for A:
  - we do not enter the inner loop.
  - Eliminate the immediate left-recursion in A
    A → SdA’ | fA’
    A’ → cA’ | ε
for S:
  - Replace S → Aa with S → SdA’a | fA’a
    So, we will have S → SdA’a | fA’a | b
  - Eliminate the immediate left-recursion in S
    S → fA’aS’ | bS’
    S’ → dA’aS’ | ε
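The procedure followed in both walk-throughs (fix an order of the non-terminals, substitute the current productions of each earlier Aj into every Ai → Aj γ, then remove the immediate left-recursion of Ai) can be sketched as follows. This is an illustration under stated assumptions: the grammar has no cycles (A ⇒+ A) and no ε-productions, which the example grammar satisfies; primes are written ' and the primed names are assumed to be unused; the helper names are hypothetical.

```python
EPSILON = ()  # an empty right-hand side stands for ε


def eliminate_immediate(a, alts):
    """A -> A a1 | ... | b1 | ...  becomes  A -> b1 A' | ...,  A' -> a1 A' | ... | ε."""
    alphas = [alt[1:] for alt in alts if alt and alt[0] == a]
    betas = [alt for alt in alts if not alt or alt[0] != a]
    if not alphas:
        return {a: alts}
    a_prime = a + "'"  # assumption: the primed name is still free
    return {a: [b + (a_prime,) for b in betas],
            a_prime: [al + (a_prime,) for al in alphas] + [EPSILON]}


def eliminate_left_recursion(grammar, order):
    """Order the non-terminals A1..An as given; for each Ai, substitute the current
    Aj-productions (j < i) into every Ai -> Aj gamma, then remove Ai's immediate
    left-recursion. Assumes no cycles (A =>+ A) and no ε-productions."""
    g = {nt: list(alts) for nt, alts in grammar.items()}
    for i, ai in enumerate(order):
        for aj in order[:i]:  # the inner loop; empty for the first non-terminal
            expanded = []
            for alt in g[ai]:
                if alt and alt[0] == aj:
                    expanded.extend(delta + alt[1:] for delta in g[aj])
                else:
                    expanded.append(alt)
            g[ai] = expanded
        g.update(eliminate_immediate(ai, g[ai]))
    return g


GRAMMAR = {"S": [("A", "a"), ("b",)], "A": [("A", "c"), ("S", "d"), ("f",)]}
for order in (["S", "A"], ["A", "S"]):
    result = eliminate_left_recursion(GRAMMAR, order)
    print(order)
    for nt, alts in result.items():
        print("  ", nt, "->", " | ".join("".join(alt) or "ε" for alt in alts))
```

With the order S, A this prints S -> Aa | b, A -> bdA' | fA', A' -> cA' | adA' | ε; with the order A, S it prints S -> fA'aS' | bS', A -> SdA' | fA', A' -> cA' | ε, S' -> dA'aS' | ε. Both results are free of left-recursion, but they are different grammars, so the chosen order matters.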
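Left-Factoring
• A predictive parser (a top-down parser without backtracking) requires the grammar to be left-factored. For example, in
stmt → if expr then stmt else stmt |
       if expr then stmt
when we see if, we cannot know which production rule to choose to rewrite stmt in the derivation.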
Left-Factoring (cont.)
• In general,
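A → αβ1 | αβ2, where α is non-empty and the first symbols
of β1 and β2 (if they have one) are different.
• When processing α, we cannot know whether to expand
A to αβ1 or
A to αβ2
• For each non-terminal A with two or more alternatives that share a common non-empty prefix α, i.e.
A → αβ1 | ... | αβn | γ1 | ... | γm
convert it into
A → αA’ | γ1 | ... | γm
A’ → β1 | ... | βn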
Left-Factoring – Example1
A → abB | aB | cdg | cdeB | cdfB
⇓
A → aA’ | cdg | cdeB | cdfB
A’ → bB | B
⇓
A → aA’ | cdA’’
A’ → bB | B
A’’ → g | eB | fB
Left-Factoring – Example2
A → ad | a | ab | abc | b
⇓
A → aA’ | b
A’ → d | ε | b | bc
⇓
A → aA’ | b
A’ → d | ε | bA’’
A’’ → ε | c
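The conversion rule, applied repeatedly as in the two examples, can be sketched as below. This is a minimal illustration, assuming alternatives are tuples of symbols, () stands for ε, alternatives are grouped by their first symbol, and each group is factored by its longest common prefix; new non-terminals get fresh primed names (written ' in code). `left_factor`, `common_prefix` and `show` are hypothetical helpers.

```python
def common_prefix(alts):
    """Longest common prefix of several symbol tuples."""
    prefix = []
    for column in zip(*alts):  # zip stops at the shortest alternative
        if all(sym == column[0] for sym in column):
            prefix.append(column[0])
        else:
            break
    return tuple(prefix)


def left_factor(grammar):
    """Repeatedly replace  A -> a b1 | ... | a bn | g1 | ... | gm  with
    A -> a A' | g1 | ... | gm  and  A' -> b1 | ... | bn.
    Alternatives are symbol tuples; () stands for ε."""
    g = {nt: list(alts) for nt, alts in grammar.items()}
    worklist = list(g)
    while worklist:
        a = worklist.pop(0)
        groups = {}  # alternatives grouped by their first symbol, in original order
        for alt in g[a]:
            groups.setdefault(alt[:1], []).append(alt)
        new_alts = []
        for first, group in groups.items():
            if not first or len(group) == 1:
                new_alts.extend(group)  # ε, or no shared first symbol: keep as is
                continue
            alpha = common_prefix(group)
            fresh = a + "'"
            while fresh in g:  # pick an unused primed name
                fresh += "'"
            g[fresh] = [alt[len(alpha):] for alt in group]
            new_alts.append(alpha + (fresh,))
            worklist.append(fresh)  # the new non-terminal may need factoring too
        g[a] = new_alts
    return g


def show(g):
    for nt, alts in g.items():
        print(nt, "->", " | ".join("".join(alt) or "ε" for alt in alts))


show(left_factor({"A": [("a", "d"), ("a",), ("a", "b"), ("a", "b", "c"), ("b",)]}))
```

On Example2 this prints A -> aA' | b, A' -> d | ε | bA'', A'' -> ε | c. Factoring by the longest common prefix reaches Example1's final grammar (A → aA’ | cdA’’) in one step, whereas the slide factors out a first and cd separately.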
Non-Context Free Language Constructs
• There are some language constructions in programming languages that are not context-free. This means that we cannot write a context-free grammar for these constructions.
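Top-Down Parsing with Backtracking -- Example
S → aBc
B → bc | b
input: abc
[Figure: the parser expands S → aBc and first tries B → bc; after matching a, b and c it expects the final c of S → aBc, but the input is exhausted, so it fails and backtracks. It then tries B → b, the final c matches, and the parse succeeds.]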
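A backtracking top-down parser for this grammar can be sketched with Python generators: each symbol yields every input position where it can end, so when a later symbol fails, control returns to the earlier generator, which tries its next alternative. This is an illustrative sketch, assuming one token per character, alternatives tried in the order written, and a grammar that is not left-recursive (a left-recursive rule would make it recurse forever); the function names are hypothetical.

```python
# Grammar from the example: S -> aBc, B -> bc | b (alternatives tried in this order)
GRAMMAR = {"S": [("a", "B", "c")], "B": [("b", "c"), ("b",)]}


def parse_symbol(sym, text, pos):
    """Yield every input position at which `sym`, started at `pos`, can end."""
    if sym not in GRAMMAR:  # terminal: must match the next input character
        if pos < len(text) and text[pos] == sym:
            yield pos + 1
        return
    for rhs in GRAMMAR[sym]:  # later alternatives are only reached on backtracking
        yield from parse_sequence(rhs, text, pos)


def parse_sequence(symbols, text, pos):
    """Yield every end position for matching `symbols` in order from `pos`."""
    if not symbols:
        yield pos
        return
    for mid in parse_symbol(symbols[0], text, pos):
        yield from parse_sequence(symbols[1:], text, mid)


def accepts(text, start="S"):
    # any() stops at the first derivation that consumes the whole input
    return any(end == len(text) for end in parse_symbol(start, text, 0))


print(accepts("abc"))  # B -> bc fails at the final c; backtracking to B -> b succeeds
```

This explores alternatives exhaustively, which can take exponential time in the worst case; that cost is one motivation for the predictive parsers that follow.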
Predictive Parser
a grammar ⇒ eliminate left-recursion ⇒ left-factor ⇒ a grammar suitable for predictive parsing (an LL(1) grammar); there is no 100% guarantee.
• When rewriting a non-terminal in a derivation step, a predictive parser can uniquely choose a production rule by looking only at the current symbol (the current token) in the input string.
Predictive Parser (example)
stmt → if ...... |
while ...... |
begin ...... |
for .....
• When we are trying to rewrite the non-terminal stmt, if the current token is if, we have to choose the first production rule.
• When we are trying to rewrite the non-terminal stmt, we can uniquely choose the production rule by just looking at the current token.
• Even after we eliminate left-recursion and left-factor the grammar, it may still not be suitable for predictive parsing (it may not be an LL(1) grammar).
Recursive Predictive Parsing
• Each non-terminal corresponds to a procedure.
• Example: A → aBb (this is the only production rule for A)
proc A {
    - match the current token with a, and move to the next token;
    - call ‘B’;
    - match the current token with b, and move to the next token;
}
Recursive Predictive Parsing (cont.)
A → aBb | bAB
proc A {
    case of the current token {
        ‘a’: - match the current token with a, and move to the next token;
             - call ‘B’;
             - match the current token with b, and move to the next token;
        ‘b’: - match the current token with b, and move to the next token;
             - call ‘A’;
             - call ‘B’;
    }
}
Recursive Predictive Parsing (cont.)
• When to apply ε-productions.
A → aA | bB | ε
• If all other productions fail, we should apply an ε-production. For example, if the current token is not a or
b, we may apply the ε-production.
• The most correct choice: we should apply an ε-production for a non-terminal A when the current token is in the follow set of A (the terminals that can appear immediately after A in some sentential form).
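The first and follow sets used in these decisions can be computed by fixed-point iteration: keep adding symbols until no set changes. A sketch under stated assumptions: the grammar is a dict from non-terminals to tuples of symbols, () stands for ε, the string "ε" marks that something can derive the empty string, and "$" marks the end of the input. As input it uses the example grammar A → aBe | cBd | C, B → bB | ε, C → f. The function names are hypothetical.

```python
EPS = "ε"  # marks that a symbol or sequence can derive the empty string
END = "$"  # end-of-input marker

# Example grammar: A -> aBe | cBd | C,  B -> bB | ε,  C -> f   (() stands for ε)
GRAMMAR = {
    "A": [("a", "B", "e"), ("c", "B", "d"), ("C",)],
    "B": [("b", "B"), ()],
    "C": [("f",)],
}


def first_of_sequence(symbols, first):
    """FIRST of a symbol sequence; contains EPS if the whole sequence can vanish."""
    result = set()
    for sym in symbols:
        if sym not in first:  # terminal
            result.add(sym)
            return result
        result |= first[sym] - {EPS}
        if EPS not in first[sym]:
            return result
    result.add(EPS)
    return result


def compute_first(grammar):
    first = {nt: set() for nt in grammar}
    changed = True
    while changed:  # iterate until no set grows
        changed = False
        for nt, alts in grammar.items():
            for alt in alts:
                new = first_of_sequence(alt, first)
                if not new <= first[nt]:
                    first[nt] |= new
                    changed = True
    return first


def compute_follow(grammar, start, first):
    follow = {nt: set() for nt in grammar}
    follow[start].add(END)
    changed = True
    while changed:
        changed = False
        for nt, alts in grammar.items():
            for alt in alts:
                for i, sym in enumerate(alt):
                    if sym not in grammar:
                        continue  # FOLLOW is only needed for non-terminals
                    rest = first_of_sequence(alt[i + 1:], first)
                    new = rest - {EPS}
                    if EPS in rest:  # whatever follows nt can also follow sym
                        new |= follow[nt]
                    if not new <= follow[sym]:
                        follow[sym] |= new
                        changed = True
    return follow


first = compute_first(GRAMMAR)
follow = compute_follow(GRAMMAR, "A", first)
print(sorted(first["C"]), sorted(follow["B"]))  # ['f'] ['d', 'e']
```

For this grammar FIRST(C) = {f} and FOLLOW(B) = {d, e}: exactly the tokens on which a recursive predictive parser should call C and apply B → ε.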
Recursive Predictive Parsing (Example)
A → aBe | cBd | C
B → bB | ε
C → f
proc C {
    match the current token with f, and move to the next token;
}
proc A {
    case of the current token {
        a: - match the current token with a, and move to the next token;
           - call B;
           - match the current token with e, and move to the next token;
        c: - match the current token with c, and move to the next token;
           - call B;
           - match the current token with d, and move to the next token;
        f: - call C            (f is in the first set of C)
    }
}
proc B {
    case of the current token {
        b: - match the current token with b, and move to the next token;
           - call B
        e,d: do nothing        (e and d are in the follow set of B)
    }
}
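The three procedures translate almost line for line into a runnable recursive predictive parser. A minimal sketch, assuming one token per character and a hypothetical "$" end marker. As a design choice not in the pseudocode, B reports an error directly on an unexpected token instead of doing nothing and leaving the error for the caller's next match. The class and function names are hypothetical.

```python
class ParseError(Exception):
    """Raised when the current token does not fit any production rule."""


class Parser:
    """Recursive predictive parser for  A -> aBe | cBd | C,  B -> bB | ε,  C -> f.
    Each non-terminal is one method; the current token alone selects the rule."""

    def __init__(self, text):
        self.tokens = list(text) + ["$"]  # one token per character; "$" marks the end
        self.pos = 0

    @property
    def current(self):
        return self.tokens[self.pos]

    def match(self, terminal):
        """Match the current token with `terminal` and move to the next token."""
        if self.current != terminal:
            raise ParseError(f"expected {terminal!r}, found {self.current!r}")
        self.pos += 1

    def A(self):
        if self.current == "a":
            self.match("a")
            self.B()
            self.match("e")
        elif self.current == "c":
            self.match("c")
            self.B()
            self.match("d")
        elif self.current == "f":  # f is in the first set of C
            self.C()
        else:
            raise ParseError(f"unexpected {self.current!r} while parsing A")

    def B(self):
        if self.current == "b":
            self.match("b")
            self.B()
        elif self.current in ("e", "d"):  # follow set of B: apply B -> ε
            pass
        else:
            raise ParseError(f"unexpected {self.current!r} while parsing B")

    def C(self):
        self.match("f")


def accepts(text):
    parser = Parser(text)
    try:
        parser.A()
        parser.match("$")  # the whole input must be consumed
    except ParseError:
        return False
    return True


print(accepts("abbe"), accepts("cd"), accepts("f"), accepts("abd"))  # True True True False
```

No procedure ever retries an alternative: each decision is made once, from the current token, which is what distinguishes this parser from the backtracking one.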
Non-Recursive Predictive Parsing -- LL(1) Parser
• Non-Recursive predictive parsing is a table-driven parser.
• It is a top-down parser.
• It is also known as LL(1) Parser.
[Figure: model of the LL(1) parser, showing the input buffer and the parsing table]
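A table-driven LL(1) parser replaces the recursive procedures with an explicit stack and a parsing table M[non-terminal, current token]. The sketch below reuses the example grammar A → aBe | cBd | C, B → bB | ε, C → f, with the table filled in by hand from FIRST(A) = {a, c, f}, FIRST(C) = {f} and FOLLOW(B) = {e, d}. Assumptions: one token per character, "$" as the end marker and stack bottom, and hypothetical names `TABLE`, `ll1_parse` and `ParseError`.

```python
class ParseError(Exception):
    """Raised when the table has no entry or a terminal does not match."""


EPSILON = ()  # an empty right-hand side stands for ε

# M[non-terminal][current token] for  A -> aBe | cBd | C,  B -> bB | ε,  C -> f.
# Filled in by hand: FIRST(A) = {a, c, f}, FIRST(C) = {f}, FOLLOW(B) = {e, d}.
TABLE = {
    "A": {"a": ("a", "B", "e"), "c": ("c", "B", "d"), "f": ("C",)},
    "B": {"b": ("b", "B"), "e": EPSILON, "d": EPSILON},
    "C": {"f": ("f",)},
}


def ll1_parse(text, start="A"):
    """Table-driven LL(1) parse; returns the productions used, in the order of a
    left-most derivation."""
    tokens = list(text) + ["$"]  # one token per character; "$" ends the input
    stack = ["$", start]         # the end of the list is the top of the stack
    pos = 0
    output = []
    while True:
        top = stack.pop()
        current = tokens[pos]
        if top == "$":
            if current != "$":
                raise ParseError(f"extra input starting at {current!r}")
            return output
        if top in TABLE:  # non-terminal: consult the parsing table
            rhs = TABLE[top].get(current)
            if rhs is None:
                raise ParseError(f"no entry M[{top}, {current}]")
            output.append(f"{top} -> {''.join(rhs) or 'ε'}")
            stack.extend(reversed(rhs))  # left-most symbol ends up on top
        elif top == current:  # terminal on top matches the input: consume it
            pos += 1
        else:
            raise ParseError(f"expected {top!r}, found {current!r}")


print(ll1_parse("abbe"))  # ['A -> aBe', 'B -> bB', 'B -> bB', 'B -> ε']
```

The productions come out in left-most derivation order, as expected of a top-down parser; an empty table entry signals a syntax error.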
www.paruluniversity.ac.in