Lec02-Syntax Analysis and LL
SECTION 2.1: CONTEXT FREE GRAMMAR
SYNTAX ANALYZER
PARSERS (CONT.)
1. Top-Down Parser
Parse trees are built from the root to the leaves (top to bottom).
Input to the parser is scanned from left to right, one symbol at a time.
2. Bottom-Up Parser
Parse trees are built starting from the leaves and working up to the root.
Input to the parser is scanned from left to right, one symbol at a time.
CONTEXT-FREE GRAMMARS (CFG)
G = (T, V, S, P)
T (terminals) = {a, b}
V (variables) = {S, A}
S is the start symbol
P (productions):
S → aAa
S → b
A → a
TERMINAL SYMBOLS
Terminals include:
➢ Lowercase letters early in the alphabet
➢ Operator symbols such as +, %
➢ Punctuation symbols such as ( ) , ;
➢ Digits 0, 1, 2, …
➢ Boldface strings such as id or if
DERIVATION OF A STRING
String ‘w’ of terminals is generated by the grammar if:
Starting with the start variable, one can apply productions
and end up with ‘w’.
A sequence of replacements of non-terminal symbols or a
sequence of strings so obtained is a derivation of ‘w’.
DERIVATION OF A STRING
⇒* : derives in zero or more steps
⇒+ : derives in one or more steps
DERIVATION OF A STRING
S ⇒ aSa ⇒ aaSaa ⇒ aaaSaaa ⇒ aaabaaa (derived in multiple steps)
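The multi-step derivation above can be replayed mechanically. The sketch below assumes the grammar is S → aSa | b (inferred from the derivation, not stated on the slide) and that every symbol is a single character, with upper-case letters as non-terminals; the function name `leftmost_derivation` is illustrative only.

```python
# Hypothetical grammar inferred from the derivation above: S -> aSa | b.
PRODUCTIONS = {"S": ["aSa", "b"]}


def leftmost_derivation(start, choices):
    """Replace the left-most non-terminal once per chosen production body.

    Symbols are single characters; upper-case letters are non-terminals.
    Returns the list of sentential forms, starting with `start`.
    """
    form = start
    forms = [form]
    for body in choices:
        # Position of the left-most non-terminal in the current form.
        index = next(i for i, ch in enumerate(form) if ch.isupper())
        head = form[index]
        if body not in PRODUCTIONS[head]:
            raise ValueError(f"{head} -> {body} is not a production")
        form = form[:index] + body + form[index + 1:]
        forms.append(form)
    return forms


steps = leftmost_derivation("S", ["aSa", "aSa", "aSa", "b"])
print(" => ".join(steps))  # S => aSa => aaSaa => aaaSaaa => aaabaaa
```

Every element of `steps` except the last is a sentential form that still contains a non-terminal; the last one is a sentence of the grammar.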
SENTENCE AND SENTENTIAL FORM
➢ Left-Most Derivation
➢ Right- Most Derivation
Left-most derivation of id+id*id (production used at each step in parentheses):
E ⇒ E+E (E→E+E)
⇒ id+E (E→id)
⇒ id+E*E (E→E*E)
⇒ id+id*E (E→id)
⇒ id+id*id (E→id)
RIGHT-MOST DERIVATIONS
Consider the Grammar:
E→ E+E/E-E/E*E/E/(E)/id
Derive the string ‘id+id *id’
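Right-most derivation (production used at each step in parentheses):
E ⇒ E*E (E→E*E)
⇒ E*id (E→id)
⇒ E+E*id (E→E+E)
⇒ E+id*id (E→id)
⇒ id+id*id (E→id)
[Figure: parse tree with root E → E * E, where the left E expands to E + E and every remaining E expands to id]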
SECTION 2.2: AMBIGUOUS GRAMMAR
AMBIGUOUS GRAMMAR
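A grammar is ambiguous if it has:
More than one left-most derivation or more than one right-most derivation for a given sentence, i.e. the sentence can be derived in more than one way using LMD or RMD, and so has more than one parse tree.
[Figure: two parse trees, labelled 1 and 2, for the same stmt built from E2, S1 and S2]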
AMBIGUITY (CONT.)
• We prefer the second parse tree (else matches with closest if).
• So, we have to disambiguate our grammar to reflect this choice.
SECTION 2.3: LEFT RECURSION AND LEFT FACTORING
LEFT RECURSION
A → Aα | β where β does not start with A
Eliminate immediate left recursion:
A → βA'
A' → αA' | ε (an equivalent grammar)
In general,
A → Aα1 | ... | Aαm | β1 | ... | βn where β1 ... βn do not start with A
Eliminate immediate left recursion:
A → β1A' | ... | βnA'
A' → α1A' | ... | αmA' | ε (an equivalent grammar)
REMOVING IMMEDIATE LEFT-RECURSION
E → E+T | T (A → Aα | β)
A is E; α is +T and β is T
Applying the rule we get
E → T E' (A → βA')
E' → +T E' | ε (A' → αA' | ε)
T → T*F | F (A → Aα | β)
A is T; α is *F and β is F
Applying the rule we get
T → F T' (A → βA')
T' → *F T' | ε (A' → αA' | ε)
Final Output
E → T E'
E' → +T E' | ε
T → F T'
T' → *F T' | ε
F → id | (E)
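The rule applied above can be written as a small function. This is only a sketch under stated assumptions: each alternative is a list of symbol strings, the empty list stands for ε, and the new non-terminal is named by appending a prime; `eliminate_immediate` is an illustrative name, not something from the lecture.

```python
def eliminate_immediate(head, alternatives):
    """Rewrite A -> A a1 | ... | A am | b1 | ... | bn as
       A  -> b1 A' | ... | bn A'
       A' -> a1 A' | ... | am A' | epsilon
    Alternatives are lists of symbols; [] stands for epsilon.
    """
    new_head = head + "'"
    # alpha parts: what follows the leading A in left-recursive alternatives
    alphas = [alt[1:] for alt in alternatives if alt and alt[0] == head]
    # beta parts: alternatives that do not start with A
    betas = [alt for alt in alternatives if not alt or alt[0] != head]
    if not alphas:
        return {head: alternatives}  # nothing to eliminate
    return {
        head: [beta + [new_head] for beta in betas],
        new_head: [alpha + [new_head] for alpha in alphas] + [[]],
    }


e_rules = eliminate_immediate("E", [["E", "+", "T"], ["T"]])
t_rules = eliminate_immediate("T", [["T", "*", "F"], ["F"]])
print(e_rules)
print(t_rules)
```

Running it on E → E+T | T and T → T*F | F gives the E, E', T and T' rules of the final output above; F → id | (E) has no left recursion and is returned unchanged.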
NO IMMEDIATE LEFT-RECURSION BUT GRAMMAR IS LEFT RECURSIVE
Consider the grammar
S → Aa | b
A → Sc | d
There is no immediate left recursion in the grammar, but substitution gives S ⇒ Aa ⇒ Sca, so the grammar is left recursive.
We need to check for and eliminate both immediate left recursion and left recursion that only appears after substitution.
NO IMMEDIATE LEFT-RECURSION BUT GRAMMAR IS LEFT RECURSIVE
Consider the grammar
S → Aa | b
A → Ac | Sd | f
Order of non-terminals: S, A
for S:
- there is no immediate left recursion in S.
for A:
- Replace A → Sd with A → Aad | bd
So, we will have A → Ac | Aad | bd | f
Eliminate the immediate left recursion in A
α1 is c; α2 is ad; β1 is bd and β2 is f
A → bdA' | fA'
A' → cA' | adA' | ε
Final Output
S → Aa | b
A → bdA' | fA'
A' → cA' | adA' | ε
NO IMMEDIATE LEFT-RECURSION BUT GRAMMAR IS LEFT RECURSIVE
Rule: A → Aα | β becomes A → βA', A' → αA' | ε
Consider the grammar
S → Aa | b
A → Ac | Sd | f
Order of non-terminals: A, S
for A:
Eliminate the immediate left recursion in A → Ac | Sd | f
α is c; β1 is Sd and β2 is f
A → SdA' | fA'
A' → cA' | ε
for S:
- Replace S → Aa with S → SdA'a | fA'a
So, we will have S → SdA'a | fA'a | b
Eliminate the immediate left recursion in S
α is dA'a; β1 is fA'a and β2 is b
S → fA'aS' | bS'
S' → dA'aS' | ε
Final Output
S → fA'aS' | bS'
S' → dA'aS' | ε
A → SdA' | fA'
A' → cA' | ε
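Both worked examples follow the same procedure: fix an order of the non-terminals, substitute earlier non-terminals into later ones, then remove immediate left recursion. A minimal sketch of that procedure is given below, assuming alternatives are lists of symbols with the empty list standing for ε; the function names are illustrative and the code is not taken from the slides.

```python
def eliminate_immediate(head, alternatives):
    """A -> A alpha | beta  becomes  A -> beta A',  A' -> alpha A' | epsilon."""
    new_head = head + "'"
    alphas = [alt[1:] for alt in alternatives if alt and alt[0] == head]
    betas = [alt for alt in alternatives if not alt or alt[0] != head]
    if not alphas:
        return {head: alternatives}
    return {
        head: [beta + [new_head] for beta in betas],
        new_head: [alpha + [new_head] for alpha in alphas] + [[]],
    }


def eliminate_left_recursion(grammar, order):
    """Process non-terminals in `order`; [] stands for epsilon."""
    g = {nt: [list(alt) for alt in alts] for nt, alts in grammar.items()}
    for i, a_i in enumerate(order):
        for a_j in order[:i]:
            # Replace each A_i -> A_j gamma by A_i -> delta gamma
            # for every current alternative A_j -> delta.
            expanded = []
            for alt in g[a_i]:
                if alt and alt[0] == a_j:
                    expanded += [delta + alt[1:] for delta in g[a_j]]
                else:
                    expanded.append(alt)
            g[a_i] = expanded
        g.update(eliminate_immediate(a_i, g[a_i]))
    return g


GRAMMAR = {"S": [["A", "a"], ["b"]], "A": [["A", "c"], ["S", "d"], ["f"]]}
by_s_then_a = eliminate_left_recursion(GRAMMAR, ["S", "A"])
by_a_then_s = eliminate_left_recursion(GRAMMAR, ["A", "S"])
print(by_s_then_a)
print(by_a_then_s)
```

The two orders give different but equivalent grammars, matching the two final outputs shown for S → Aa | b, A → Ac | Sd | f.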
PRACTICE QUESTION: LEFT RECURSION
A→Bxy|x
B→CD
C→A| c
D→ d
ELIMINATE LEFT-RECURSION -- ALGORITHM
If there is a grammar
A → αβ1 | αβ2
where α is non-empty and the first symbols of β1 and β2 (if they have one) are different.
LEFT-FACTORING -- ALGORITHM
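For each non-terminal A with two or more alternatives that share a common non-empty prefix α, say
A → αβ1 | ... | αβn | γ1 | ... | γm
convert it into
A → αA' | γ1 | ... | γm
A' → β1 | ... | βn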
LEFT-FACTORING – EXAMPLE1
A → aA' | cdA''
A' → bB | B
A'' → g | eB | fB
LEFT-FACTORING – EXAMPLE2
A → ad | a | ab | abc | b
α is a; β1 is d; β2 is ε; β3 is b; β4 is bc
A → aA' | b
A' → d | ε | b | bc
α is b; β1 is ε; β2 is c
A → aA' | b
A' → d | ε | bA''
A'' → ε | c
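Left factoring can be automated by grouping the alternatives of each non-terminal by their first symbol and pulling out the longest common prefix of each group. The sketch below assumes alternatives are lists of symbols, the empty list stands for ε, and new non-terminals are named by appending primes; `left_factor` and `common_prefix` are illustrative names.

```python
def common_prefix(sequences):
    """Longest prefix shared by all the given symbol lists."""
    prefix = []
    for symbols in zip(*sequences):
        if len(set(symbols)) != 1:
            break
        prefix.append(symbols[0])
    return prefix


def left_factor(grammar):
    """Repeat A -> alpha b1 | alpha b2 | g  =>  A -> alpha A' | g,  A' -> b1 | b2."""
    g = {nt: [list(alt) for alt in alts] for nt, alts in grammar.items()}
    work = list(g)
    while work:
        head = work.pop(0)
        groups = {}  # first symbol -> alternatives starting with it
        for alt in g[head]:
            groups.setdefault(alt[0] if alt else None, []).append(alt)
        new_alts = []
        for first, group in groups.items():
            if first is None or len(group) < 2:
                new_alts += group  # nothing to factor in this group
                continue
            alpha = common_prefix(group)
            new_head = head + "'"
            while new_head in g:  # pick an unused primed name
                new_head += "'"
            g[new_head] = [alt[len(alpha):] for alt in group]
            new_alts.append(alpha + [new_head])
            work.append(new_head)  # the new rule may need factoring too
        g[head] = new_alts
    return g


factored = left_factor(
    {"A": [["a", "d"], ["a"], ["a", "b"], ["a", "b", "c"], ["b"]]}
)
print(factored)
```

On A → ad | a | ab | abc | b this produces A → aA' | b, A' → d | ε | bA'' and A'' → ε | c, the same result as the two manual steps of Example 2.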
NON-CONTEXT FREE LANGUAGE
CONSTRUCTS
Some constructs in programming languages are not context-free.
This means that we cannot write a context-free grammar for
these constructs.
CHALLENGES OF ERROR HANDLER
The error handler in a parser has goals that are
simple to state but challenging to realize:
Report the presence of errors clearly and accurately.
Recover from each error quickly enough to detect
subsequent errors.
Add minimal overhead to the processing of correct
programs.
ERROR RECOVERY STRATEGIES
Panic Mode
Parser discards input symbols one at a time until one of a
designated set of synchronizing tokens (e.g. ‘;’) is found.
Phrase Level
Parser performs local correction on the remaining input.
Replacement can correct any input string, but has drawbacks.
Error Productions
The grammar is augmented with error productions, and the parser is constructed from the augmented grammar.
Error diagnostics can be generated to indicate the erroneous
construct.
Global Correction
Finds a minimal sequence of changes to obtain a globally least-cost
correction.
SECTION 2.4 : TOP DOWN PARSING
TOP-DOWN PARSING
CHALLENGES IN TOP-DOWN PARSING
In general, we can't.
TOP-DOWN PARSING
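Top-down parser
Recursive-Descent Parsing
Backtracking is needed (if a choice of a production rule does not work, we backtrack and try another alternative)
Not efficient
Predictive Parsing
No backtracking
Efficient
Backtracking example:
S → aBc
B → bc | b
Input: abc
First try B → bc: S ⇒ aBc ⇒ abcc, which does not match the input (fails, backtrack)
Then try B → b: S ⇒ aBc ⇒ abc, which matches the input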
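The backtracking behaviour in the example can be sketched with Python generators: each non-terminal yields every input position at which it could end, so a later failure makes the caller resume with the next alternative. The grammar dictionary and function names below are illustrative, and the sketch assumes the grammar is not left recursive (otherwise it would recurse forever).

```python
# Grammar from the example: S -> aBc, B -> bc | b
GRAMMAR = {"S": [["a", "B", "c"]], "B": [["b", "c"], ["b"]]}


def parse(symbol, text, pos):
    """Yield every position where `symbol` can end when it starts at `pos`."""
    if symbol not in GRAMMAR:  # terminal: must match the input literally
        if text.startswith(symbol, pos):
            yield pos + len(symbol)
        return
    for alternative in GRAMMAR[symbol]:  # try the alternatives in order
        yield from parse_sequence(alternative, text, pos)


def parse_sequence(symbols, text, pos):
    """Yield every end position for a sequence of grammar symbols."""
    if not symbols:
        yield pos
        return
    for middle in parse(symbols[0], text, pos):
        # If the rest fails for this `middle`, the loop resumes parse()
        # and tries its next alternative: this is the backtracking step.
        yield from parse_sequence(symbols[1:], text, middle)


def accepts(text):
    return any(end == len(text) for end in parse("S", text, 0))


print(accepts("abc"))
print(accepts("ab"))
```

For the input abc, the alternative B → bc is tried first, the final c then fails to match, and the parser falls back to B → b, which succeeds.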
RECURSIVE PREDICTIVE PARSING
A → aBb (this is only the production rule for A)
proc A {
- match the current token with a, and move to the next token;
- call ‘B’;
- match the current token with b, and move to the next token;
}
RECURSIVE PREDICTIVE PARSING
(CONT.)
A → aBb | bAB
proc A {
case of the current token
{
‘a’: - match the current token with a, and move to the next token;
- call ‘B’;
- match the current token with b, and move to the next token;
‘b’: - match the current token with b, and move to the next token;
- call ‘A’;
- call ‘B’;
}
}
RECURSIVE PREDICTIVE PARSING
(CONT.)
When to apply ε-productions.
A → aA | bB | ε
RECURSIVE PREDICTIVE PARSING (EXAMPLE)
A → aBe | cBd | C
B → bB | ε
C → f
proc A {
  case of the current token {
    a: - match the current token with a, and move to the next token;
       - call B;
       - match the current token with e, and move to the next token;
    c: - match the current token with c, and move to the next token;
       - call B;
       - match the current token with d, and move to the next token;
    f: - call C          (f is the first set of C)
  }
}
proc B {
  case of the current token {
    b: - match the current token with b, and move to the next token;
       - call B
    e,d: do nothing      (e, d is the follow set of B)
  }
}
proc C { match the current token with f, and move to the next token; }
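The three procedures above translate almost line for line into a small class. This is a sketch under stated assumptions: tokens are single characters, `$` is appended as an end marker, and the names `Parser`, `match` and `accepts` are illustrative.

```python
class Parser:
    """Recursive predictive parser for
         A -> aBe | cBd | C
         B -> bB | epsilon
         C -> f
    """

    def __init__(self, text):
        self.tokens = list(text) + ["$"]  # "$" marks the end of the input
        self.pos = 0

    @property
    def current(self):
        return self.tokens[self.pos]

    def match(self, expected):
        if self.current != expected:
            raise SyntaxError(f"expected {expected!r}, found {self.current!r}")
        self.pos += 1

    def A(self):
        if self.current == "a":
            self.match("a"); self.B(); self.match("e")
        elif self.current == "c":
            self.match("c"); self.B(); self.match("d")
        elif self.current == "f":  # f is in the first set of C
            self.C()
        else:
            raise SyntaxError(f"unexpected {self.current!r} in A")

    def B(self):
        if self.current == "b":
            self.match("b"); self.B()
        elif self.current in ("e", "d"):  # follow set of B: apply B -> epsilon
            pass
        else:
            raise SyntaxError(f"unexpected {self.current!r} in B")

    def C(self):
        self.match("f")


def accepts(text):
    parser = Parser(text)
    try:
        parser.A()
        parser.match("$")  # the whole input must be consumed
        return True
    except SyntaxError:
        return False


print(accepts("abbe"), accepts("cd"), accepts("f"), accepts("abd"))
```

No backtracking is needed: in every procedure the current token alone selects the alternative, and the ε-production of B is chosen exactly when the current token is in the follow set of B.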
TOP-DOWN, PREDICTIVE PARSING: LL(1)
stmt → if ...... |
while ...... |
begin ...... |
for .....
When we are trying to expand the non-terminal stmt, if the
current token is if, we have to choose the first production rule.
When we are trying to expand the non-terminal stmt, we can
uniquely choose the production rule by just looking at the
current token.
We eliminate the left recursion in the grammar, and left
factor it. But the result may still not be suitable for predictive parsing
(not an LL(1) grammar).
NON-RECURSIVE PREDICTIVE PARSING -- LL(1) PARSER
Non-recursive predictive parsing is a table-driven parser.
It is a top-down parser.
It is also known as the LL(1) parser.
[Figure: the non-recursive predictive parser reads the input buffer, uses a stack and the parsing table, and produces the output]
LL(1) PARSER
Input buffer
Contains the string to be parsed. We will assume that its end is marked with a
special symbol $.
Output
A production rule representing a step of the derivation sequence (left-most
derivation) of the string in the input buffer.
Stack
Contains the grammar symbols
At the bottom of the stack, there is a special end marker symbol $.
Initially the stack contains only the symbol $ and the starting symbol S.
$S initial stack
When the stack is emptied (i.e. only $ is left in the stack), the parsing is completed.
Parsing table
A two-dimensional array M[A,a]
Each row is a non-terminal symbol
Each column is a terminal symbol or the special symbol $
Each entry holds a production rule.
LL(1) PARSER – PARSER ACTIONS
The symbol at the top of the stack (say X) and the current symbol in the input
string (say a) determine the parser action.
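There are four possible parser actions.
1. If X and a are both $
➔ parser halts (successful completion).
2. If X and a are the same terminal symbol (different from $)
➔ parser pops X from the stack and moves to the next symbol in the input buffer.
3. If X is a non-terminal
➔ parser looks at the parsing table entry M[X,a]. If M[X,a] holds a production
rule X→Y1Y2...Yk, it pops X from the stack and pushes Yk,Yk-1,...,Y1 onto the
stack. The parser also outputs the production rule X→Y1Y2...Yk to represent a
step of the derivation.
4. None of the above
➔ error: an empty entry M[X,a], or a terminal X on top of the stack that differs from a.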
COMPUTE FIRST FOR ANY STRING X
Initially, for all non-terminals A, set
FIRST(A) = { t | A → tα for some α }
Consider the grammar:
S → aC | bB
B → b
C → c
FIRST(S) = {a, b}; FIRST(B) = {b} and FIRST(C) = {c}
FIRST COMPUTATION WITH EPSILON
For all NT A where A → ε is a production, add ε to FIRST(A).
E.g. S → a | ε
FIRST(S) = {a, ε}
For each production A → α, where α is a string of NTs whose FIRST sets all contain ε, set
FIRST(A) = FIRST(A) ∪ { ε }.
E.g. S → AB | c ; A → a | ε ; B → b | ε
FIRST(S) = {a, b, c, ε} ; FIRST(A) = {a, ε} ; FIRST(B) = {b, ε}
For each production A → αtβ, where α is a string of NTs whose FIRST sets all contain ε and t is a terminal, set
FIRST(A) = FIRST(A) ∪ { t }
E.g. S → ABcD ; A → a | ε ; B → b | ε ; D → d
FIRST(S) = {a, b, c} ; FIRST(A) = {a, ε} ; FIRST(B) = {b, ε} ; FIRST(D) = {d}
For each production A → αBβ, where α is a string of NTs whose FIRST sets all contain ε, set
FIRST(A) = FIRST(A) ∪ (FIRST(B) - { ε }).
E.g. S → ABDc | f ; A → a | ε ; B → b | ε ; D → d
FIRST(S) = {a, b, d, f} ; FIRST(A) = {a, ε} ; FIRST(B) = {b, ε} ; FIRST(D) = {d}
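The FIRST rules above amount to a fixed-point computation: keep applying them until no set grows. The sketch below assumes a grammar is a dictionary from non-terminal to a list of alternatives, each alternative a list of symbols with the empty list standing for ε, and treats any symbol that is not a key of the dictionary as a terminal; the function names are illustrative.

```python
EPS = "ε"


def first_of_string(symbols, first):
    """FIRST of a string of grammar symbols, given the FIRST sets so far."""
    result = set()
    for symbol in symbols:
        symbol_first = first.get(symbol, {symbol})  # terminal: FIRST(t) = {t}
        result |= symbol_first - {EPS}
        if EPS not in symbol_first:
            return result  # this symbol cannot vanish, so stop here
    result.add(EPS)  # every symbol can derive epsilon (or the string is empty)
    return result


def compute_first(grammar):
    """Apply the FIRST rules repeatedly until nothing changes."""
    first = {nonterminal: set() for nonterminal in grammar}
    changed = True
    while changed:
        changed = False
        for head, alternatives in grammar.items():
            for alternative in alternatives:
                new = first_of_string(alternative, first)
                if not new <= first[head]:
                    first[head] |= new
                    changed = True
    return first


# Last example above: S -> ABDc | f ; A -> a | ε ; B -> b | ε ; D -> d
example = {
    "S": [["A", "B", "D", "c"], ["f"]],
    "A": [["a"], []],
    "B": [["b"], []],
    "D": [["d"]],
}
first_sets = compute_first(example)
print(sorted(first_sets["S"]))  # ['a', 'b', 'd', 'f']
```

The same function reproduces the other examples as well, for instance FIRST(S) = {a, b, c, ε} for S → AB | c.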
COMPUTE FOLLOW FOR ANY STRING X
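RULE 1: If S is the start symbol ➔ $ is in FOLLOW(S)
RULE 2: If A → αBβ is a production rule ➔ everything in FIRST(β) except ε is in FOLLOW(B)
RULE 3: If A → αB is a production rule, or A → αBβ is a production rule and ε is in FIRST(β) ➔ everything in FOLLOW(A) is in FOLLOW(B)
We apply these rules until nothing more can be added to any follow set.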
FIRST AND FOLLOW SET EXAMPLE
Consider the grammar
C → P F class id X Y
P → public | ε
F → final | ε
X → extends id | ε
Y → implements I | ε
I → id J
J → , I | ε
FIRST EXAMPLE
E → TE'
E' → +TE' | ε
T → FT'
T' → *FT' | ε
F → (E) | id
FIRST(F) = {(, id}
FIRST(T') = {*, ε}
FIRST(T) = {(, id}
FIRST(E') = {+, ε}
FIRST(E) = {(, id}
FOLLOW EXAMPLE
1. If S is the start symbol ➔ $ is in FOLLOW(S)
2. If A → αBβ is a production rule
➔ everything in FIRST(β) is in FOLLOW(B) except ε
3(i) If A → αB is a production rule, or
3(ii) A → αBβ is a production rule and ε is in FIRST(β)
➔ everything in FOLLOW(A) is in FOLLOW(B).
Consider the following grammar:
E → TE'
E' → +TE' | ε
T → FT'
T' → *FT' | ε
F → (E) | id
with
FIRST(F) = {(, id}
FIRST(T') = {*, ε}
FIRST(T) = {(, id}
FIRST(E') = {+, ε}
FIRST(E) = {(, id}
E → TE'
  Rule 1: $ is in FOLLOW(E)
  Rule 2: A → αBβ: α is ε; B is T and β is E'
  Rule 3(i): A → αB: α is T; B is E'
  Rule 3(ii): A → αBβ: α is ε; B is T and β is E'; FIRST of β has ε
E' → +TE' | ε
  Rule 2: A → αBβ: α is +; B is T and β is E'
  Rule 3(i): A → αB: α is +T; B is E'
  Rule 3(ii): A → αBβ: α is +; B is T; β is E'; FIRST of β has ε
T → FT'
  Rule 2: A → αBβ: α is ε; B is F and β is T'
  Rule 3(i): A → αB: α is F; B is T'
  Rule 3(ii): A → αBβ: α is ε; B is F and β is T'; FIRST of β has ε
T' → *FT' | ε
  Rule 2: A → αBβ: α is *; B is F and β is T'
  Rule 3(i): A → αB: α is *F; B is T'
  Rule 3(ii): A → αBβ: α is *; B is F; β is T'; FIRST of β has ε
F → (E) | id
  Rule 2: A → αBβ: α is ‘(’; B is E and β is ‘)’
FOLLOW(E) = { $, ) }
FOLLOW(E') = { $, ) }
FOLLOW(T) = { +, ), $ }
FOLLOW(T') = { +, ), $ }
FOLLOW(F) = { +, *, ), $ }
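The FOLLOW rules can be applied mechanically in the same fixed-point style. The sketch below takes the FIRST sets listed for this grammar as given input rather than recomputing them; it assumes alternatives are lists of symbols with the empty list standing for ε, and the function names are illustrative.

```python
EPS, END = "ε", "$"

GRAMMAR = {
    "E": [["T", "E'"]],
    "E'": [["+", "T", "E'"], []],
    "T": [["F", "T'"]],
    "T'": [["*", "F", "T'"], []],
    "F": [["(", "E", ")"], ["id"]],
}
# FIRST sets as listed in the example.
FIRST = {
    "E": {"(", "id"}, "E'": {"+", EPS},
    "T": {"(", "id"}, "T'": {"*", EPS},
    "F": {"(", "id"},
}


def first_of_string(symbols, first):
    """FIRST of a string of symbols; the empty string gives {ε}."""
    result = set()
    for symbol in symbols:
        symbol_first = first.get(symbol, {symbol})  # terminal: FIRST(t) = {t}
        result |= symbol_first - {EPS}
        if EPS not in symbol_first:
            return result
    result.add(EPS)
    return result


def compute_follow(grammar, first, start):
    follow = {nonterminal: set() for nonterminal in grammar}
    follow[start].add(END)  # rule 1
    changed = True
    while changed:
        changed = False
        for head, alternatives in grammar.items():
            for alternative in alternatives:
                for i, symbol in enumerate(alternative):
                    if symbol not in grammar:
                        continue  # FOLLOW is only defined for non-terminals
                    beta_first = first_of_string(alternative[i + 1:], first)
                    new = beta_first - {EPS}  # rule 2
                    if EPS in beta_first:  # rule 3: beta is empty or nullable
                        new |= follow[head]
                    if not new <= follow[symbol]:
                        follow[symbol] |= new
                        changed = True
    return follow


follow_sets = compute_follow(GRAMMAR, FIRST, "E")
for nonterminal in GRAMMAR:
    print(nonterminal, sorted(follow_sets[nonterminal]))
```

Rule 3(i) needs no special case here, because `first_of_string` returns {ε} for an empty β.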
CONSTRUCTING LL(1) PARSING TABLE -- ALGORITHM
for each production rule A → α of a grammar G
  for each terminal a in FIRST(α)
  ➔ add A → α to M[A,a]
  If ε in FIRST(α)
  ➔ for each terminal a in FOLLOW(A) add A → α to M[A,a]
  If ε in FIRST(α) and $ in FOLLOW(A)
  ➔ add A → α to M[A,$]
      id          +           *           (           )           $
E     E → TE'                             E → TE'
E'                E' → +TE'                           E' → ε      E' → ε
T     T → FT'                             T → FT'
T'                T' → ε      T' → *FT'               T' → ε      T' → ε
F     F → id                              F → (E)
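The table-filling algorithm can be sketched directly from its three steps. The code below takes the FIRST and FOLLOW sets of the expression grammar as given, represents ε as the empty list, and stores a list of productions in every cell so that multiply defined entries would be visible; `build_table` is an illustrative name.

```python
EPS = "ε"

GRAMMAR = {
    "E": [["T", "E'"]],
    "E'": [["+", "T", "E'"], []],
    "T": [["F", "T'"]],
    "T'": [["*", "F", "T'"], []],
    "F": [["(", "E", ")"], ["id"]],
}
FIRST = {"E": {"(", "id"}, "E'": {"+", EPS}, "T": {"(", "id"},
         "T'": {"*", EPS}, "F": {"(", "id"}}
FOLLOW = {"E": {"$", ")"}, "E'": {"$", ")"}, "T": {"+", ")", "$"},
          "T'": {"+", ")", "$"}, "F": {"+", "*", ")", "$"}}


def first_of_string(symbols, first):
    """FIRST of a string of symbols; the empty string gives {ε}."""
    result = set()
    for symbol in symbols:
        symbol_first = first.get(symbol, {symbol})  # terminal: FIRST(t) = {t}
        result |= symbol_first - {EPS}
        if EPS not in symbol_first:
            return result
    result.add(EPS)
    return result


def build_table(grammar, first, follow):
    """Map (non-terminal, terminal) to the list of applicable productions."""
    table = {}
    for head, alternatives in grammar.items():
        for alpha in alternatives:
            alpha_first = first_of_string(alpha, first)
            columns = alpha_first - {EPS}
            if EPS in alpha_first:
                columns |= follow[head]  # also covers $ when $ is in FOLLOW(A)
            for terminal in columns:
                table.setdefault((head, terminal), []).append(alpha)
    return table


table = build_table(GRAMMAR, FIRST, FOLLOW)
conflicts = {cell: rules for cell, rules in table.items() if len(rules) > 1}
print(len(table), "entries,", len(conflicts), "conflicts")
```

A cell holding more than one production is a multiply defined entry; for this grammar there are none, so each of the 13 filled cells holds exactly one production.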
LL(1) PARSER – EXAMPLE 1
Stack       Input     Output
$E          id+id$    E → TE'
$E'T        id+id$    T → FT'
$E'T'F      id+id$    F → id
$E'T'id     id+id$
$E'T'       +id$      T' → ε
$E'         +id$      E' → +TE'
$E'T+       +id$
$E'T        id$       T → FT'
$E'T'F      id$       F → id
$E'T'id     id$
$E'T'       $         T' → ε
$E'         $         E' → ε
$           $         Accept
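The trace above is what a table-driven loop produces. A minimal sketch follows, with the parsing table of the expression grammar typed in by hand (ε written as the empty list) and input given as a list of tokens; `ll1_parse` and the dictionary layout are illustrative choices.

```python
TABLE = {
    ("E", "id"): ["T", "E'"], ("E", "("): ["T", "E'"],
    ("E'", "+"): ["+", "T", "E'"], ("E'", ")"): [], ("E'", "$"): [],
    ("T", "id"): ["F", "T'"], ("T", "("): ["F", "T'"],
    ("T'", "+"): [], ("T'", "*"): ["*", "F", "T'"],
    ("T'", ")"): [], ("T'", "$"): [],
    ("F", "id"): ["id"], ("F", "("): ["(", "E", ")"],
}
NONTERMINALS = {"E", "E'", "T", "T'", "F"}


def ll1_parse(tokens, start="E"):
    """Return the productions used, in left-most derivation order."""
    tokens = list(tokens) + ["$"]
    stack = ["$", start]  # bottom of the stack first
    pos = 0
    output = []
    while True:
        top, current = stack[-1], tokens[pos]
        if top == "$" and current == "$":
            return output  # action 1: accept
        if top in NONTERMINALS:  # action 3: expand using the table
            body = TABLE.get((top, current))
            if body is None:
                raise SyntaxError(f"no entry M[{top},{current}]")
            stack.pop()
            stack.extend(reversed(body))  # push Yk ... Y1 so Y1 ends on top
            output.append((top, body))
        elif top == current:  # action 2: match a terminal
            stack.pop()
            pos += 1
        else:  # action 4: error
            raise SyntaxError(f"expected {top!r}, found {current!r}")


for head, body in ll1_parse(["id", "+", "id"]):
    print(head, "→", " ".join(body) or "ε")
```

For the input id+id the printed productions are the ones in the Output column of the trace, in the same order.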
LL(1) PARSER – EXAMPLE 2
S → aBa
B → bB | ε
      a           b           $
S     S → aBa
B     B → ε       B → bB
Final configuration: stack $, input $ ➔ Accept, successful completion
LL(1) PARSER – EXAMPLE2 (CONT.)
Derivation (left-most): S ⇒ aBa ⇒ abBa ⇒ abbBa ⇒ abba
parse tree
S
├─ a
├─ B
│  ├─ b
│  └─ B
│     ├─ b
│     └─ B
│        └─ ε
└─ a
A GRAMMAR WHICH IS NOT LL(1)
Problem ➔ ambiguity
A GRAMMAR WHICH IS NOT LL(1) (CONT.)
What do we have to do if the resulting parsing table contains
multiply defined entries?
If we didn't eliminate left recursion, eliminate the left recursion in the grammar.
If the grammar is not left factored, we have to left factor the grammar.
If the new grammar's parsing table still contains multiply defined entries, that
grammar is ambiguous or it is inherently not an LL(1) grammar.
A left recursive grammar cannot be an LL(1) grammar.
A → Aα | β
➔ any terminal that appears in FIRST(β) also appears in FIRST(Aα)
because Aα ⇒ βα.